bogdan-litescu/SharpCover.NetCrawler

SharpCover.NetCrawler

Currently under development!

Released under a dual license: Affero GPL and a commercial license. Contact us for pricing, customizations, and support.

Usage example:

The model:

[CrawlWithXPath("//*[@id='productInfo']")]
class TestAppStoreProduct
{
    [CrawlWithXPath("./div[@class='title']/h1/text()")]
    public string Title { get; set; }

    [CrawlWithXPath("./div[@class='description']/text()")]
    public string Description { get; set; }

    [CrawlWithRegex("class\\s*=\\s*['\"]gallery['\"].+?src\\s*=\\s*['\"]([^'\"]+)", MatchGroup=1)]
    public string IconUrl { get; set; }
}

Loading an XHTML document and parsing its data into the model:

  var content = new XHtmlContent();
  content.LoadFromFile("product-info.html");

  var crawler = new NetCrawler(content);
  var quickTimeProduct = crawler.Crawl<TestAppStoreProduct>();
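
To illustrate the general idea of attribute-driven extraction, here is a minimal sketch built only on `System.Reflection` and `System.Xml`. It is not the library's implementation: `XPathSketchAttribute`, `SketchMapper`, and `SketchProduct` are hypothetical names introduced for this example, and the sketch assumes well-formed XML input and string properties only.

```csharp
using System;
using System.Reflection;
using System.Xml;

// Hypothetical attribute for illustration; not the library's CrawlWithXPath.
[AttributeUsage(AttributeTargets.Class | AttributeTargets.Property)]
sealed class XPathSketchAttribute : Attribute
{
    public string Expression { get; }
    public XPathSketchAttribute(string expression) { Expression = expression; }
}

// Hypothetical sample model, mirroring the shape of the model above.
[XPathSketch("//div[@id='productInfo']")]
class SketchProduct
{
    [XPathSketch("./h1/text()")]
    public string Title { get; set; }
}

static class SketchMapper
{
    public static T Fill<T>(XmlDocument doc) where T : new()
    {
        // The class-level attribute narrows the document to one section.
        var classAttr = typeof(T).GetCustomAttribute<XPathSketchAttribute>();
        XmlNode scope = classAttr == null ? doc : doc.SelectSingleNode(classAttr.Expression);

        var model = new T();
        if (scope == null) return model; // section not found: leave properties unset

        // Each annotated string property is filled from its own XPath,
        // evaluated relative to the section selected above.
        foreach (PropertyInfo prop in typeof(T).GetProperties())
        {
            var attr = prop.GetCustomAttribute<XPathSketchAttribute>();
            if (attr == null || prop.PropertyType != typeof(string)) continue;

            XmlNode hit = scope.SelectSingleNode(attr.Expression);
            if (hit != null) prop.SetValue(model, hit.Value ?? hit.InnerText);
        }
        return model;
    }
}
```

With a document loaded through `XmlDocument.LoadXml`, calling `SketchMapper.Fill<SketchProduct>(doc)` returns a `SketchProduct` whose `Title` holds the text of the first `h1` inside the `productInfo` section.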

Features:

  • Parse data directly into a model, driven by class and property attributes
  • Class-level attribute that narrows the content down to a section before values are extracted into the class properties
  • Stackable crawlers, so a property can be extracted by an XPath operation followed by a Regex
  • XPath crawler that works with XHTML (via Html Agility Pack) and XML
  • Regex crawler
  • Default value when content is not matched, or optionally throw an exception
  • Parse a list of objects (for example, a product list)
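
The stacking and default-value behaviour listed above can be sketched with standard .NET classes. This is an illustration of the technique only, not the library's API: `StackedExtractSketch` and `XPathThenRegex` are hypothetical names, and the sketch assumes the input is well-formed XML.

```csharp
using System.Text.RegularExpressions;
using System.Xml;

static class StackedExtractSketch
{
    // Step 1 narrows the document with XPath; step 2 runs a regex over
    // the serialized markup of the selected node. If either step fails
    // to match, the caller-supplied default value is returned.
    public static string XPathThenRegex(XmlNode root, string xpath, string pattern,
                                        int group, string defaultValue)
    {
        XmlNode node = root.SelectSingleNode(xpath);
        if (node == null) return defaultValue;

        Match m = Regex.Match(node.OuterXml, pattern, RegexOptions.Singleline);
        return m.Success && m.Groups[group].Success
            ? m.Groups[group].Value
            : defaultValue;
    }
}
```

Returning a default keeps a single missing field from aborting the whole extraction; a caller that prefers strict behaviour could throw an exception instead when the returned value equals the default.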

TODO:

  • Custom type crawler
  • StripHtml crawler
  • Regex.Replace crawler
  • Follow nested models (members of the "root" model) to extract further data
  • Handle filling model properties of types other than string
  • Add logging
