C# (CSharp) SearchEngine Indexer.RemoveStopWords Examples

Programming language: C# (CSharp)

Namespace/package name: SearchEngine

Class/type: Indexer

Method/function: RemoveStopWords

Examples at hotexamples.com: 2

Two real-world examples of SearchEngine.Indexer.RemoveStopWords in C# (CSharp), extracted from open-source projects.

Frequently used methods

writeTextDic(3)

writeTextChache(2)

ProcessBatch(2)

RemoveStopWords(2)

getPostingString(1)

tempPosting(1)

tempPost(1)

postingList(1)

mergefile(1)

indexDocs(1)

AddWordsToIndexList(1)

endSession(1)

CalculateIDF(1)

Update(1)

StartIndexing(1)

IndexAllParallel(1)

Delete(1)

Calculatetfstar(1)

CalculateLength(1)

Write(1)

Example #1

File: Ranker.cs Project: Jacob-Holm-Mortensen/Wi

        public List<KeyValuePair<int, double>> GetPagesWithWords(string words, Dictionary<string, Dictionary<int, double>> index)
        {
            // Split the query on whitespace, then drop stop words
            List<string> split = i.RemoveStopWords(words.Split(new char[0], StringSplitOptions.RemoveEmptyEntries).ToList());
            Dictionary<string, Dictionary<int, double>> output, tf, tfidf = new Dictionary<string, Dictionary<int, double>>();
            Dictionary<string, double>      idf     = new Dictionary<string, double>();
            List<KeyValuePair<int, double>> pages   = new List<KeyValuePair<int, double>>();
            Dictionary<int, List<double>>   vectors = new Dictionary<int, List<double>>();

            tf    = tfCalc(index);
            idf   = idfCalc(index);
            tfidf = tfidfCalc(tf, idf);
            // implement vector compare

            // Use tfidf comparison
            output = tfidf.Where(x => split.Any(z => z == x.Key)).ToDictionary(x => x.Key, x => x.Value);

            // Make vectors
            //vectors = CreateVectors(output, tfidf);

            // Make vector comparison

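            if (output.Count > 0)
            {
                // Start with the pages for the first matched term, then keep only
                // pages that appear under every matched term (intersection)
                pages = output[output.Keys.First()].ToList();
                foreach (var key in output.Keys.ToList())
                {
                    pages = pages.Where(x => output[key].ContainsKey(x.Key)).ToList();
                }
                // Sort by score, highest first
                pages.Sort((pair1, pair2) => pair2.Value.CompareTo(pair1.Value));
            }
            return pages;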
        }
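
In the example above, `RemoveStopWords` receives the query tokens as a `List<string>` and returns the filtered list; its body is not shown on this page. The following is a minimal sketch of what such a method might look like, not the project's implementation. The class name `StopWordFilter`, the hard-coded stop-word set, and the case-insensitive matching are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StopWordFilter
{
    // Hypothetical stop-word set; the project's actual list is not shown in the source.
    private static readonly HashSet<string> StopWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "a", "an", "and", "the", "of", "in", "is", "to"
        };

    // Returns the tokens that are not stop words, preserving their order.
    public static List<string> RemoveStopWords(List<string> tokens)
    {
        return tokens.Where(t => !StopWords.Contains(t)).ToList();
    }
}
```

With this sketch, passing the tokens `the`, `history`, `of`, `Denmark` returns `history`, `Denmark`. A `HashSet<string>` keeps each lookup close to constant time, which matters when the same filter runs over every query and every indexed page.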

Example #2

        private void FetchData(string url)
        {
            List <string> hyperlinks = new List <string>();
            string        content    = string.Empty;

            HtmlWeb      web = new HtmlWeb();
            HtmlDocument doc;

            try
            {
                doc = web.Load(url);
            }
            catch (Exception)
            {
                return;
            }

            Task hyperlinkTask = Task.Run(() =>
            {
                HtmlNodeCollection hyperNodes = doc.DocumentNode.SelectNodes("//a[@href]");

                if (hyperNodes != null)
                {
                    foreach (HtmlNode link in hyperNodes)
                    {
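                        // Read the href attribute directly instead of parsing OuterHtml,
                        // which breaks when href is not the first quoted attribute
                        string href = link.GetAttributeValue("href", string.Empty);

                        if (href.StartsWith("/"))
                        {
                            // Resolve root-relative links against the page URL
                            href = new Uri(new Uri(url), href).ToString();
                        }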

                        if (href.StartsWith("http"))
                        {
                            hyperlinks.Add(href);
                        }
                    }
                }

                SortHyperLinks(new Uri(url).Host, hyperlinks);
            });

            // Preprocess the content
            Task preprocesseringTask = Task.Run(() =>
            {
                HtmlNodeCollection contentNodes = doc.DocumentNode.SelectNodes("//body");

                if (contentNodes != null)
                {
                    // Reuse the nodes selected above instead of querying the document again
                    foreach (HtmlNode text in contentNodes)
                    {
                        if (!string.IsNullOrWhiteSpace(text.InnerText))
                        {
                            content += text.InnerText.Trim().Replace("&nbsp;", " ");
                        }
                    }
                }

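                // Collapse whitespace runs, keep only letters, digits, spaces and hyphens,
                // then lowercase the text and strip stop words before storing it
                content   = Regex.Replace(content, @"\s+", " ");
                Regex rgx = new Regex("[^a-zA-Z0-9 ÆØÅ æøå -]");
                content   = rgx.Replace(content, "");
                content   = Indexer.RemoveStopWords(content.ToLower());
                ContentHandler.AddContent(content, url);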
            });
        }
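
Unlike Example #1, this call passes a single lowercased, whitespace-normalized string to a static `RemoveStopWords` and gets a string back. The sketch below shows one way such an overload could work; it is not taken from the project. The class name `TextStopWordFilter` and the stop-word set are hypothetical, and the sketch assumes the input is already lowercased, as it is in the call above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TextStopWordFilter
{
    // Hypothetical lowercase stop-word set; the project's actual list is not shown.
    private static readonly HashSet<string> StopWords =
        new HashSet<string>(StringComparer.Ordinal)
        {
            "a", "an", "and", "the", "of", "in", "is", "to"
        };

    // Splits on spaces, drops stop words, and joins the rest with single spaces.
    // Assumes the caller has already lowercased the text.
    public static string RemoveStopWords(string text)
    {
        var kept = text
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => !StopWords.Contains(word));
        return string.Join(" ", kept);
    }
}
```

With this sketch, `TextStopWordFilter.RemoveStopWords("the history of denmark")` returns `history denmark`.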