C# (CSharp) UPlagSolution.AlgorithmModules StopWordsHandler Examples

Programming language: C# (CSharp)

Namespace/package name: UPlagSolution.AlgorithmModules

Class/type: StopWordsHandler

Examples at hotexamples.com: 2

C# (CSharp) UPlagSolution.AlgorithmModules StopWordsHandler - 2 examples found. These are real-world C# (CSharp) examples of UPlagSolution.AlgorithmModules.StopWordsHandler extracted from open source projects.

Frequently used methods

IsStopWord(1)

Example #1

File: Tokeniser.cs Project: ZoobiDoobi/Urdu-Documents-Plagiarism-Detection

        // Requires: using System.Collections.Generic; using System.Text.RegularExpressions;

        /// <summary>
        /// Tokenizes the content of a single document by splitting it on a regular expression.
        /// Stop words are removed in the same pass.
        /// </summary>
        /// <param name="documentContents">The full text of one document.</param>
        /// <returns>A string array in which each element is one token.</returns>
        public string[] Tokenize(string documentContents)
        {
            // Matches a space, Urdu and Latin punctuation marks, and ASCII digits.
            string pattern = "[ ۔،؛:)(!؟/؎{}‘’0123456789]";
            Regex regex = new Regex(pattern);

            string[] tokens = regex.Split(documentContents);

            // Holds the words that remain after punctuation and stop word removal.
            List<string> processedList = new List<string>();

            for (int i = 0; i < tokens.Length; i++)
            {
                // Split already removes every separator character, so this match check is a
                // defensive guard; the Trim() test drops empty and whitespace-only tokens.
                MatchCollection mc = regex.Matches(tokens[i]);

                if (mc.Count <= 0 && tokens[i].Trim().Length > 0 && !StopWordsHandler.IsStopWord(tokens[i]))
                {
                    processedList.Add(tokens[i]);
                }
            }
            return processedList.ToArray();
        }
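
The split-then-filter step above can also be written as a single LINQ pipeline. The following is a minimal sketch, not the project's code: `TokenizerSketch` and the `stopWords` parameter are hypothetical names, and the stop word set is passed in explicitly because the source of `StopWordsHandler` is not shown in these examples. The separator pattern is copied from Example #1.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of UPlagSolution.AlgorithmModules.
public static class TokenizerSketch
{
    // Same separator set as Example #1: space, punctuation marks, and ASCII digits.
    private static readonly Regex Separators = new Regex("[ ۔،؛:)(!؟/؎{}‘’0123456789]");

    // Splits the text on separators, then drops blank tokens and stop words.
    // The stop word set is an assumption: the caller supplies it instead of
    // relying on StopWordsHandler.IsStopWord.
    public static string[] Tokenize(string documentContents, ISet<string> stopWords)
    {
        return Separators.Split(documentContents)
            .Where(t => !string.IsNullOrWhiteSpace(t) && !stopWords.Contains(t))
            .ToArray();
    }
}
```

With ASCII input for readability, `TokenizerSketch.Tokenize("the cat (and the dog) 2 times", new HashSet<string> { "the", "and" })` yields the tokens `cat`, `dog`, and `times`. Passing the set as a parameter keeps the sketch testable without any shared state, at the cost of differing from the static call used in the original method.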

Example #2

File: Tokeniser.cs Project: ZoobiDoobi/Urdu-Documents-Plagiarism-Detection

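 public Tokeniser()
 {
     // The instance is never used afterwards; Tokenize calls the static
     // StopWordsHandler.IsStopWord, so only the constructor's side effects (if any) matter.
     StopWordsHandler stopWordHandler = new StopWordsHandler();
 }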
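
One pattern that would make a discarded instance like this meaningful is an instance constructor that fills a static collection, which a static `IsStopWord` then reads. The sketch below illustrates that pattern only. It is an assumption about how such a class could be built, not the project's actual `StopWordsHandler`; the class name `StopWordsHandlerSketch` and the three Urdu entries are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: NOT the project's StopWordsHandler, whose source is not shown here.
public class StopWordsHandlerSketch
{
    // Shared by all instances and read by the static IsStopWord method.
    private static HashSet<string> _stopWords;

    // Instance constructor with a static side effect: it fills the shared set
    // the first time any instance is created. Not thread-safe as written.
    public StopWordsHandlerSketch()
    {
        if (_stopWords == null)
        {
            // Illustrative entries only; a real list would likely be loaded from a file.
            _stopWords = new HashSet<string>(StringComparer.Ordinal) { "اور", "میں", "سے" };
        }
    }

    public static bool IsStopWord(string word)
    {
        // Until an instance has been created the set is null, so nothing is a stop word.
        return _stopWords != null && word != null && _stopWords.Contains(word.Trim());
    }
}
```

Under this design, calling `IsStopWord` before any instance exists silently returns false for every word, which is why a caller would need to construct the handler first. A static constructor or a `static readonly` field initializer would remove that ordering dependency.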