C# (CSharp) CrawlNewsComments ArticleParserHelper.GetHtmlDoc Examples

Programming language: C# (CSharp)

Namespace/package name: CrawlNewsComments

Class/type: ArticleParserHelper

Method/function: GetHtmlDoc

Examples on hotexamples.com: 4

C# (CSharp) CrawlNewsComments ArticleParserHelper.GetHtmlDoc - 4 examples found. These are real-world C# (CSharp) examples of CrawlNewsComments.ArticleParserHelper.GetHtmlDoc, extracted from open source projects.

Frequently used methods

GetHtmlDoc(4)

GetHtmlStr(2)

CopyFileAndUpdateLatestFile(1)

SaveToLocalFile(1)

Example #1

File: Crawler_QQ.cs Project: nirofang/DAASCrawl

        public override IList<NewsItem> GetNewsList()
        {
            IList<NewsItem> newsList = new List<NewsItem>();
            HtmlDocument doc = ArticleParserHelper.GetHtmlDoc(siteUrl);

            // Parse only when a document was returned.
            if (null != doc)
            {
                HtmlNode rootNode = doc.DocumentNode;

                // Get the first heading list (disabled)
                //this.GetHeadingNewsList(rootNode, newsList);

                // Get the second heading list (disabled)
                //this.GetHeadingNewsList(rootNode, newsList, "//div[@id='headingNews']/div[@class='hdNews hasPic cf']");

                // Get top news items on the main page that have a single picture
                this.GetHotNewsList(rootNode, newsList,
                                    "//div[@class='item major']/div[@class='Q-tpList']");

                // Get top news items on the main page that have multiple pictures
                this.GetHotNewsList(rootNode, newsList,
                                    "//div[@class='item major']/div[@class='Q-pList']");
            }

            // Remove duplicates: keep the first item for each BaseUrl.
            return newsList.GroupBy(n => n.BaseUrl).Select(g => g.First()).ToList();
        }
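
The de-duplication step at the end of GetNewsList can be tried in isolation. The following is a minimal sketch, not the project's code: the `NewsItem` class here is a hypothetical stand-in that only mirrors the `BaseUrl` property used above, and `Title` and `NewsListHelper.DistinctByBaseUrl` are names invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the project's NewsItem type.
// Only BaseUrl appears in the example above; Title is assumed for illustration.
public class NewsItem
{
    public string BaseUrl { get; set; }
    public string Title { get; set; }
}

// Hypothetical helper, not part of the original project.
public static class NewsListHelper
{
    // Keeps the first item seen for each BaseUrl.
    // LINQ to Objects yields groups in the order their keys first appear,
    // so the relative order of the surviving items is preserved.
    public static IList<NewsItem> DistinctByBaseUrl(IEnumerable<NewsItem> items)
    {
        return items
            .GroupBy(n => n.BaseUrl)
            .Select(g => g.First())
            .ToList();
    }
}
```

`ToList()` returns a `List<NewsItem>`, which already implements `IList<NewsItem>`, so the result can be returned without a cast.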

Example #2

        private string[] GetNewsBodyText(string url)
        {
            string firstPara = string.Empty;
            var content = new System.Text.StringBuilder();
            HtmlDocument doc = ArticleParserHelper.GetHtmlDoc(url);

            if (null != doc)
            {
                HtmlNode rootNode = doc.DocumentNode;
                string paragraphXPath = "//div[@class='article-content']//p";

                // SelectNodes returns null (not an empty collection) when nothing matches.
                HtmlNodeCollection paragraphs = rootNode.SelectNodes(paragraphXPath);
                if (null != paragraphs)
                {
                    foreach (HtmlNode paragraph in paragraphs)
                    {
                        string text = paragraph.InnerText.Trim();

                        // The first non-empty paragraph becomes the lead paragraph.
                        if (firstPara.Length == 0 && text.Length > 0)
                        {
                            firstPara = text;
                        }

                        // Paragraphs are concatenated without a separator.
                        content.Append(text);
                    }
                }
            }

            // Index 0: first non-empty paragraph; index 1: full body text.
            return new string[] { firstPara, content.ToString() };
        }

Example #3

        public override IList<Comment> GetComments(NewsItem state)
        {
            List<Comment> comments = new List<Comment>();

            HtmlDocument doc = ArticleParserHelper.GetHtmlDoc(state.CommentUrl);

            if (null != doc)
            {
                HtmlNode rootNode = doc.DocumentNode;
                string commentXPath = "//div[@class='comment-content']";

                HtmlNodeCollection commentNodes = rootNode.SelectNodes(commentXPath);
                if (null == commentNodes)
                {
                    return comments;
                }

                foreach (HtmlNode wrapperNode in commentNodes)
                {
                    HtmlNode contentNode = wrapperNode.SelectSingleNode("./div[@class='content']");
                    HtmlNode voteNode = wrapperNode.SelectSingleNode("./div[@class='comment_actions clearfix']/span[@class='action']/a[@class='comment_digg ']");

                    // SelectSingleNode returns null when the markup does not match; skip such comments.
                    if (null == contentNode || null == voteNode)
                    {
                        continue;
                    }

                    // TryParse avoids a FormatException when the vote text is not a number; vote stays 0 then.
                    int vote;
                    int.TryParse(voteNode.InnerText.Trim(), out vote);

                    Comment c = new Comment();
                    c.Cotent = contentNode.InnerText.Trim();  // "Cotent" is the property name as spelled in the project
                    c.Vote = vote;

                    comments.Add(c);
                }
            }

            return comments;
        }

Example #4

File: Crawler_QQ.cs Project: nirofang/DAASCrawl

        /// <summary>
        /// Crawls the keywords, first paragraph and body text of a news item.
        /// </summary>
        /// <param name="news">The news item to fill in with the crawled details.</param>
        private void GetMoreNewsInfo(NewsItem news)
        {
            // A null item has no URL to crawl, and a locally created replacement
            // would never reach the caller, so there is nothing to do.
            if (null == news)
            {
                return;
            }

            System.Threading.Thread.Sleep(1000);  // Throttle requests so the server does not block the crawler
            Console.WriteLine("Crawling url: {0}", news.BaseUrl);
            HtmlDocument doc = ArticleParserHelper.GetHtmlDoc(news.BaseUrl);

            if (null == doc)
            {
                return;
            }

            HtmlNode rootNode = doc.DocumentNode;

            news.Keywords = GetNewsKeywords(rootNode, " ");

            string[] content = GetNewsContent(rootNode);
            news.FirstPara = content[0];
            news.BodyText  = content[1];
        }