perusworld/WebScraper.NET

A .NET-based web scraper that uses the WebBrowser control

Usage

Install

Install the library using the NuGet package manager (search for WebScraper.NET) or the Package Manager Console:

Install-Package WebScraper.NET -Version 3.0.0

Or install the library using the .NET CLI:

dotnet add package WebScraper.NET --version 3.0.0
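
As a rough equivalent of the two commands above, the package can also be referenced directly in an SDK-style project file. This is a minimal sketch, assuming the project uses the `PackageReference` format; the surrounding project settings are omitted, and only the package name and version come from this README.

```xml
<!-- Fragment of a .csproj file; assumes an SDK-style project using PackageReference -->
<ItemGroup>
  <PackageReference Include="WebScraper.NET" Version="3.0.0" />
</ItemGroup>
```

After editing the project file, a package restore (for example via `dotnet restore`) should fetch the library.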

Sample Code

  • Create a form with a WebBrowser control

  • Import the namespaces
using System;
using System.Collections.Generic;
using WebScraper.Web;

  • Implement the AgentCallback interface

  • Build the list of actions that you want to perform, for example
List<WebAction> actions = new List<WebAction>();
//go to the home url (https://www.bing.com/) and check that the "q" search box is present
actions.Add(new SimpleWebAction(new UrlWebStep("open search", "https://www.bing.com/"),
    new LocatorCheckValidator(new SimpleHtmlElementLocator("q search box",
    new AttributeHtmlElementMatcher("q search box", "name", "q"))), waitForEvent: true));
//fill in the search form and submit it
actions.Add(new SimpleWebAction(new FormWebStep("submit search", new IdElementLocator("locate form to submit", "sb_form"), new Dictionary<String, String>
{
    {"sb_form_q", "WebScraper.NET github"}
}
), waitForEvent: true));
//extract the href of the first result link and keep it under the key "firstNavLink"
actions.Add(new ExtractWebAction<String>(new StringHtmlElementDataExtractor("href"), "firstNavLink",
    new TagElementLocator("match results", "ol", false, "firstResultLink",
    new SimpleChildHtmlElementLocator("find first link",
    filter: new SimpleChildHtmlElementLocator("get first a", new TagHtmlElementMatcher("match first a", "a"))),
    new IdHtmlElementMatcher("match results ol", "b_results"))));
//go to the first result (the url extracted as "firstNavLink") and validate the page title
actions.Add(new SimpleWebAction(new UrlWebStep("open search", "firstNavLink"), new TitleWebValidator("GitHub - perusworld/WebScraper.NET: A .Net based Web Scraper using the WebBrowser control"), waitForEvent: true));
  • Initialize and call the agent
SimpleAgent bingSearchAgent = new SimpleAgent(webBrowser, actions);
bingSearchAgent.AgentCallback = this;
bingSearchAgent.init();
bingSearchAgent.startAgent();
  • Once the actions are done, the onCompleted method of AgentCallback is called
public void onCompleted(Agent agent)
{
    //all actions have completed
}