Query language for web scraping
This project is built on the packages "HTMLAgilityPack" and "Csharpmonad".
In its syntax, ScrapeQL closely resembles its well-known model, SQL.
There are three kinds of queries:
- Load-Query: Loads a website or HTML file into a virtual workspace for further selection, where it is later accessible by its identifier.
LOAD "filename.fileExtension/http://websiteName.domain" AS Identifier
- Write-Query: Writes the finished selection to "filename.fileextension".
WRITE identifier TO "filename.fileextension"
- Select-Query: Selects from identifier using the given selector, optionally filtered by a WHERE clause.
SELECT "selector" FROM identifier <WHERE attribute=value|identifier CONTAINS attribute>
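To make the three query forms concrete, here is a minimal Python sketch that recognizes them with regular expressions. It is not the project's actual parser (which is built on the packages named above). The function name parse_query, the identifier character set, and the choice to keep the WHERE clause as an unparsed string are all assumptions made for illustration.

```python
import re

# Hypothetical patterns derived from the syntax lines above. Keywords are
# matched case-sensitively, since the language is case sensitive.
# Assumption: identifiers look like typical programming-language names.
IDENT = r"[A-Za-z_]\w*"
LOAD_RE = re.compile(rf'^LOAD\s+"([^"]+)"\s+AS\s+({IDENT})$')
WRITE_RE = re.compile(rf'^WRITE\s+({IDENT})\s+TO\s+"([^"]+)"$')
SELECT_RE = re.compile(
    rf'^SELECT\s+"([^"]+)"\s+FROM\s+({IDENT})'
    rf'(?:\s+WHERE\s+(.+))?$'
)


def parse_query(line):
    """Return a dict describing the query, or None if no form matches."""
    text = line.strip()
    m = LOAD_RE.match(text)
    if m:
        return {"kind": "load", "source": m.group(1), "identifier": m.group(2)}
    m = WRITE_RE.match(text)
    if m:
        return {"kind": "write", "identifier": m.group(1), "target": m.group(2)}
    m = SELECT_RE.match(text)
    if m:
        # The WHERE clause is kept as raw text; its grammar is not modeled here.
        return {"kind": "select", "selector": m.group(1),
                "identifier": m.group(2), "where": m.group(3)}
    return None
```

For instance, parse_query('LOAD "http://example.com" AS page') yields a dict of kind "load" with the identifier "page", while a lowercase keyword such as load is rejected.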
Generally, console commands begin with a colon (:).
- Load File Command: Loads a ".scrapeql" file and executes it. The file can contain both console commands and queries. (Not to be confused with the Load-Query.)
:load file.scrapeql
- Print Variable Command: Prints the designated variable.
:printvar identifier
- Print Scope Command: Prints the names of all loaded objects.
:printscope
- Setprompt Command: Sets the prompt. Default is "ScrapeQL>".
:setprompt string
- Clear Command: Clears the command line.
:clear
(Note: Parts surrounded by <> are optional. Where alternatives are separated by |, choose one of them. The language is case sensitive.)
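Because console commands start with a colon and queries do not, a console can tell the two apart by looking at the first character. The following Python sketch illustrates that idea. The function name classify_input and the tuple layout it returns are hypothetical; only the command names come from the list above.

```python
# Command names taken from the list above; everything else is an assumption.
KNOWN_COMMANDS = {"load", "printvar", "printscope", "setprompt", "clear"}


def classify_input(line):
    """Classify one line of console input.

    Returns ("query", text) for anything not starting with a colon,
    ("command", name, argument) for a known console command, and
    ("unknown", name, argument) for an unrecognized one.
    """
    text = line.strip()
    if not text.startswith(":"):
        return ("query", text)
    # Split ":name argument" into the command name and the rest of the line.
    name, _, argument = text[1:].partition(" ")
    if name not in KNOWN_COMMANDS:
        return ("unknown", name, argument.strip())
    return ("command", name, argument.strip())
```

With this split, :load file.scrapeql is routed to the Load File Command, whereas LOAD "a.html" AS a is passed on as a query, which keeps the two similarly named features apart.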