C# (CSharp) ISpider.HtmlToTextAsync Examples

Programming language: C# (CSharp)

Class/Type: ISpider

Method/Function: HtmlToTextAsync

Examples at hotexamples.com: 1

C# (CSharp) ISpider.HtmlToTextAsync - 1 example found. This is the top-rated real-world example of ISpider.HtmlToTextAsync extracted from open source projects. You can rate examples to help us improve their quality.

Frequently used methods

Exit(10)

Pause(7)

Log(7)

Contiune(6)

TurnRight(3)

TurnLeft(3)

MoveFront(3)

GetPosition(3)

RunAsync(2)

GetOrientation(2)

HasCompleted(1)

Start(1)

RemoveNodesFromDocument(1)

LoadPage(1)

HtmlToTextAsync(1)

AddCookie(1)

HandleTags(1)

HandleMedia(1)

Continue(1)

Grab(1)

GetType(1)

GetMedia(1)

GetHeadersOfSize(1)

Extract(1)

DownloadArticleByHeader(1)

Dispose(1)

Crawl(1)

HandleLinks(1)

Example #1

File: Extractor.cs Project: mimustafa/MediaSpin

public string ExtractBodyTextFromArticleDocument(HtmlDocument articleHtmlDocument)
{
    // Remove headers, links, unordered lists, and scripts before extracting text.
    RemoveHeadersFromDocument(articleHtmlDocument);
    RemoveLinksFromDocument(articleHtmlDocument);
    RemoveUnorderedListsFromDocument(articleHtmlDocument);
    RemoveScriptsFromDocument(articleHtmlDocument);

    // Nothing to convert if the document or its root node is missing.
    if (articleHtmlDocument?.DocumentNode?.OuterHtml == null)
    {
        return(String.Empty);
    }

    var cleanedHtml = articleHtmlDocument.DocumentNode.OuterHtml;

    // Start the conversion and block the calling thread until it finishes.
    var htmlToTextConversion = _spider.HtmlToTextAsync(cleanedHtml);
    Task.WaitAll(htmlToTextConversion);

    if (htmlToTextConversion.IsCompletedSuccessfully)
    {
        // Flatten line breaks, then drop sentences that are not body text.
        var articleText = htmlToTextConversion.Result.Replace("\n", " ");
        var finalArticleText = RemoveNonBodyTextSentences(articleText);
        return(finalArticleText);
    }
    else
    {
        throw new Exception($"could not convert the following html to text {cleanedHtml}");
    }
}
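The example above blocks on the conversion with Task.WaitAll, which rethrows a faulted task's error wrapped in an AggregateException. As a rough sketch only, here is how the same call might look with await instead. It assumes HtmlToTextAsync takes a string and returns Task<string>, which is inferred from the call site and not confirmed by the source. The ISpider declaration, FakeSpider, and Demo below are hypothetical stand-ins for illustration, not the MediaSpin types.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical shape of the interface, inferred from the call site in Example #1.
public interface ISpider
{
    Task<string> HtmlToTextAsync(string html);
}

// Illustrative stand-in: ignores its input and returns fixed two-line text.
public sealed class FakeSpider : ISpider
{
    public Task<string> HtmlToTextAsync(string html) =>
        Task.FromResult("line one\nline two");
}

public static class Demo
{
    // Awaits the conversion instead of blocking; a failure surfaces as the
    // original exception rather than an AggregateException.
    public static async Task<string> ConvertAsync(ISpider spider, string cleanedHtml)
    {
        var text = await spider.HtmlToTextAsync(cleanedHtml);
        return text.Replace("\n", " ");
    }

    public static async Task Main()
    {
        var result = await ConvertAsync(new FakeSpider(), "<p>line one</p><p>line two</p>");
        Console.WriteLine(result); // prints "line one line two"
    }
}
```

With this shape the caller has to be async as well, so it only fits where the surrounding code can await the result.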