C# (CSharp) SearchEngine Indexer.RemoveStopWords Examples

Programming language: C# (CSharp)

Namespace/package name: SearchEngine

Class/type: Indexer

Method/function: RemoveStopWords

Examples at hotexamples.com: 2

C# (CSharp) SearchEngine Indexer.RemoveStopWords - 2 examples found. These are the top-rated real-world C# (CSharp) examples of SearchEngine.Indexer.RemoveStopWords, extracted from open source projects.

Frequently used methods

writeTextDic(3)

writeTextChache(2)

ProcessBatch(2)

RemoveStopWords(2)

getPostingString(1)

tempPosting(1)

tempPost(1)

postingList(1)

mergefile(1)

indexDocs(1)

AddWordsToIndexList(1)

endSession(1)

CalculateIDF(1)

Update(1)

StartIndexing(1)

IndexAllParallel(1)

Delete(1)

Calculatetfstar(1)

CalculateLength(1)

Write(1)

Example #1

File: Ranker.cs Project: Jacob-Holm-Mortensen/Wi

        // Returns the pages that contain every query term found in the index,
        // sorted by descending tf-idf weight of the first matched term.
        // `i` is assumed to be an Indexer instance held by the enclosing class (not shown in this excerpt).
        public List<KeyValuePair<int, double>> GetPagesWithWords(string words, Dictionary<string, Dictionary<int, double>> index)
        {
            // Split the query on whitespace and drop stop words.
            List<string> split = i.RemoveStopWords(words.Split(new char[0], StringSplitOptions.RemoveEmptyEntries).ToList());
            List<KeyValuePair<int, double>> pages = new List<KeyValuePair<int, double>>();

            // Compute tf, idf and tf-idf weights for the whole index.
            Dictionary<string, Dictionary<int, double>> tf    = tfCalc(index);
            Dictionary<string, double>                  idf   = idfCalc(index);
            Dictionary<string, Dictionary<int, double>> tfidf = tfidfCalc(tf, idf);

            // Keep only the tf-idf entries for terms that appear in the query.
            Dictionary<string, Dictionary<int, double>> output = tfidf.Where(x => split.Contains(x.Key)).ToDictionary(x => x.Key, x => x.Value);

            // TODO: vector comparison is not implemented yet.
            //vectors = CreateVectors(output, tfidf);

            if (output.Count > 0)
            {
                // Start from the pages of the first matched term, then keep only
                // the pages that also contain every other matched term.
                pages = output[output.Keys.First()].ToList();
                foreach (var key in output.Keys.ToList())
                {
                    pages = pages.Where(x => output[key].ContainsKey(x.Key)).ToList();
                }
                // Highest weight first.
                pages.Sort((pair1, pair2) => pair2.Value.CompareTo(pair1.Value));
            }
            return pages;
        }
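
The listing above calls `i.RemoveStopWords` on a list of query terms, but the page does not show that method. The following is a minimal sketch of what a list-based filter could look like, assuming a small hard-coded English stop-word set. The class name `StopWordFilterSketch` and the word list are hypothetical and are not taken from the Jacob-Holm-Mortensen/Wi project.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Indexer instance used in Example #1.
public class StopWordFilterSketch
{
    // Assumed stop-word list; a real indexer would likely load a larger one.
    private static readonly HashSet<string> StopWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "a", "an", "and", "the", "of", "in", "to", "is"
        };

    // Returns the input terms in their original order, minus stop words.
    public List<string> RemoveStopWords(List<string> words)
    {
        return words.Where(w => !StopWords.Contains(w)).ToList();
    }
}
```

Under these assumptions, passing the terms of "the history of Denmark" (split on whitespace as in the listing) would keep only `history` and `Denmark`. A `HashSet` is used here so that each membership check is a constant-time lookup rather than a scan of the list.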

Example #2

        // Downloads a page, then extracts its outgoing links and its cleaned text on two background tasks.
        private void FetchData(string url)
        {
            List<string> hyperlinks = new List<string>();
            string       content    = string.Empty;

            HtmlWeb      web = new HtmlWeb();
            HtmlDocument doc;

            try
            {
                doc = web.Load(url);
            }
            catch (Exception)
            {
                // Skip pages that cannot be loaded.
                return;
            }

            // Collect the absolute http(s) links on the page.
            // Note: neither task is awaited, so FetchData returns before they finish.
            Task hyperlinkTask = Task.Run(() =>
            {
                HtmlNodeCollection hyperNodes = doc.DocumentNode.SelectNodes("//a[@href]");

                if (hyperNodes != null)
                {
                    foreach (HtmlNode link in hyperNodes)
                    {
                        // Read the href attribute directly instead of splitting the raw HTML on quotes,
                        // which breaks when href is not the first quoted attribute.
                        string href = link.GetAttributeValue("href", string.Empty);

                        // Resolve root-relative links against the page URL.
                        if (href.StartsWith("/") && Uri.TryCreate(new Uri(url), href, out Uri resolved))
                        {
                            href = resolved.ToString();
                        }

                        if (href.StartsWith("http"))
                        {
                            hyperlinks.Add(href);
                        }
                    }
                }

                SortHyperLinks(new Uri(url).Host, hyperlinks);
            });

            // Preprocess the text content.
            Task preprocesseringTask = Task.Run(() =>
            {
                HtmlNodeCollection contentNodes = doc.DocumentNode.SelectNodes("//body");

                if (contentNodes != null)
                {
                    foreach (HtmlNode text in contentNodes)
                    {
                        if (!string.IsNullOrWhiteSpace(text.InnerText))
                        {
                            // Turn non-breaking-space entities into plain spaces so words are not glued together.
                            content += text.InnerText.Trim().Replace("&nbsp;", " ");
                        }
                    }
                }

                // Collapse whitespace, strip everything except letters, digits, spaces and hyphens,
                // then lowercase the text and drop stop words.
                content = Regex.Replace(content, @"\s+", " ");
                content = Regex.Replace(content, "[^a-zA-Z0-9ÆØÅæøå -]", "");
                content = Indexer.RemoveStopWords(content.ToLower());
                ContentHandler.AddContent(content, url);
            });
        }
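
In this example `Indexer.RemoveStopWords` is called statically on a lowercased string and returns a string, unlike the list-based call in Example #1. The page does not include its implementation, so the following is only a sketch of one way such a method might work. The class name `StopWordTextFilterSketch` and the stop-word list are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the static Indexer.RemoveStopWords used in Example #2.
public static class StopWordTextFilterSketch
{
    // Assumed mixed English/Danish stop words, since the regex in the example keeps æ, ø and å.
    private static readonly HashSet<string> StopWords = new HashSet<string>
    {
        "the", "and", "of", "a", "og", "i", "at", "en"
    };

    // Expects lowercased text; splits on whitespace, drops stop words,
    // and rejoins the remaining words with single spaces.
    public static string RemoveStopWords(string content)
    {
        IEnumerable<string> kept = content
            .Split(new char[0], StringSplitOptions.RemoveEmptyEntries)
            .Where(word => !StopWords.Contains(word));
        return string.Join(" ", kept);
    }
}
```

With this sketch, `StopWordTextFilterSketch.RemoveStopWords("the cat and the hat")` returns `"cat hat"`. The lookup is case-sensitive on purpose, because the caller in the example lowercases the content before passing it in.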