C# (CSharp) WebCrawler Crawler.FixPath usage examples

Programming language: C# (CSharp)

Namespace/Package: WebCrawler

Class/Type: Crawler

Method/Function: FixPath

Examples on hotexamples.com: 1

1 example of WebCrawler.Crawler.FixPath in C# (CSharp) was found, taken from open source projects.

Example #1

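        // Requires: using System; using System.Text.RegularExpressions;

        /// <summary>
        /// Parses a page's text for links and records each one as an external,
        /// non-page, or good URL.
        /// </summary>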
        /// <param name="page">The page whose text is to be parsed.</param>
        /// <param name="sourceUrl">The source url of the page.</param>
        public void ParseLinks(Page page, string sourceUrl)
        {
            MatchCollection matches = Regex.Matches(page.Text, _LINK_REGEX);

            foreach (Match anchorMatch in matches)
            {

                if (anchorMatch.Value == String.Empty)
                {
                    BadUrls.Add("Blank url value on page " + sourceUrl);
                    continue;
                }

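                // Strip the leading href=" and cut at the closing quote to isolate the URL.
                string foundHref = null;
                try
                {
                    foundHref = anchorMatch.Value.Replace("href=\"", "");
                    foundHref = foundHref.Substring(0, foundHref.IndexOf("\""));
                }
                catch (Exception exc)
                {
                    // IndexOf returns -1 when there is no closing quote, which makes Substring throw.
                    Exceptions.Add("Error parsing matched href: " + exc.Message);
                    continue; // Skip this match: foundHref is not a usable URL.
                }
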
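                // Skip URLs already recorded as good, and the bare "/" link.
                if (!GoodUrls.Contains(foundHref) && foundHref != "/")
                {
                    if (IsExternalUrl(foundHref))
                    {
                        _externalUrls.Add(foundHref);
                    }
                    else if (!IsAWebPage(foundHref))
                    {
                        // Not a web page: FixPath receives the source URL,
                        // presumably to resolve a relative path against it.
                        foundHref = Crawler.FixPath(sourceUrl, foundHref);
                        _otherUrls.Add(foundHref);
                    }
                    else
                    {
                        GoodUrls.Add(foundHref);
                    }
                }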
            }
        }
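
The example calls Crawler.FixPath(sourceUrl, foundHref) but does not show its body. The call suggests it turns an href found on a page into a path relative to that page's URL. The following is a minimal sketch under that assumption, not the project's actual implementation. UrlHelper and ResolveHref are hypothetical names, and the resolution uses the standard System.Uri class.

```csharp
using System;

// Hypothetical helper for illustration; this is NOT the real Crawler.FixPath.
public static class UrlHelper
{
    // Resolves href against the URL of the page it was found on.
    // Returns href unchanged when sourceUrl is not an absolute URL
    // or when the two cannot be combined.
    public static string ResolveHref(string sourceUrl, string href)
    {
        if (!Uri.TryCreate(sourceUrl, UriKind.Absolute, out Uri baseUri))
        {
            return href;
        }

        // Uri.TryCreate(Uri, string, out Uri) resolves a relative reference
        // (such as "../img/logo.png") against the base URI.
        return Uri.TryCreate(baseUri, href, out Uri resolved)
            ? resolved.AbsoluteUri
            : href;
    }
}
```

With this sketch, UrlHelper.ResolveHref("http://example.com/docs/index.html", "../img/logo.png") returns "http://example.com/img/logo.png", and an href that is already absolute is returned as its own absolute URI. Returning the input unchanged on failure keeps the caller's URL lists populated even when a malformed href is encountered; a real crawler might instead record such values in its list of bad URLs.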