- This solution is not complete. It was meant as a fun exercise after I stumbled upon the Eaze interview project.
- The solution currently has a threading issue with deterministic crawl jobs, which I still need to fix. The code is thread-safe, but when parallelism is enabled it does not auto-detect when a crawl job is complete. In a single-threaded configuration it works 100%.
- The crawler engine is quite extensible for many job types.
- The crawler's design does not currently work for single-page apps.
- I am using Microsoft Orleans because I wanted to learn it after doing other projects in Akka.NET. It's very nice!
- Hangfire? Not so nice. Probably would not use this lib again, although it has its benefits.
- I haven't looked at this code in several months. Just didn't want to lose it.
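
One common way to close the completion-detection gap mentioned in the notes above is to count in-flight pages: increment a counter before a page is queued, decrement it when the page finishes, and signal completion when the counter reaches zero. The block below is only a minimal sketch of that idea, written in Python for brevity. It is not this repo's code and does not use Orleans. `CrawlJob` and `fetch_links` are hypothetical names introduced for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class CrawlJob:
    """Tracks outstanding pages so completion can be detected under parallelism."""

    def __init__(self, fetch_links, max_workers=4):
        # fetch_links is a hypothetical callable: url -> iterable of linked urls.
        self._fetch_links = fetch_links
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()
        self._seen = set()
        self._pending = 0  # pages scheduled but not yet finished
        self._done = threading.Event()

    def _schedule(self, url):
        # Count the page before handing it to the pool, so the counter can
        # never reach zero while work is still queued.
        with self._lock:
            if url in self._seen:
                return
            self._seen.add(url)
            self._pending += 1
        self._pool.submit(self._visit, url)

    def _visit(self, url):
        try:
            for link in self._fetch_links(url):
                self._schedule(link)
        finally:
            # Children were counted above, before this page is released.
            with self._lock:
                self._pending -= 1
                finished = self._pending == 0
            if finished:
                self._done.set()

    def run(self, start_url):
        """Crawl from start_url and block until every reachable page is done."""
        self._schedule(start_url)
        self._done.wait()
        self._pool.shutdown(wait=True)
        return set(self._seen)
```

Calling `CrawlJob(fetch_links).run(start_url)` returns the set of visited URLs once no page is queued or running. The ordering matters: each child page is counted before its parent is released, so the counter only hits zero when the whole job is finished.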
Backend .NET Interview Coding Exercise
Build a .NET 4.6 solution that solves the problem below.
Problem:
- We need an API endpoint that has the ability to take a request to scrape a web page.
- The API must allow for submitting a job, checking the status of a job, and retrieving the results.
- This endpoint will be hit very heavily, so we need to design it to remain available under heavy load and when a scraping job takes an extended time.
Hints:
- Look at using a job scheduler like Quartz
- Be sure to write unit tests for different cases...
- Concurrency with multiple jobs running.
Bonus:
- Solve this issue without using a database.
- Don't use any third party web scraping frameworks.
- Think about how this API will be consumed and what you might suggest to improve it.
- Documentation & Local repo.
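
The submit / status / results contract from the problem statement, combined with the "no database" bonus, can be sketched as an in-memory job table fed by a worker pool. This is only an illustration under stated assumptions, written in Python rather than the .NET the exercise asks for, and it is not the design used in this repo. `ScrapeJobs`, its method names, and the status strings are all hypothetical. The default fetcher uses only the standard library, in keeping with the "no third party scraping frameworks" bonus.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


class ScrapeJobs:
    """In-memory job table: submit returns at once, work runs on a pool."""

    def __init__(self, fetch=None, max_workers=8):
        # fetch is injectable so tests can avoid real network calls.
        self._fetch = fetch or self._default_fetch
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()
        self._jobs = {}  # job_id -> {"status", "result", "error"}

    @staticmethod
    def _default_fetch(url):
        # Plain standard-library download; no scraping framework involved.
        with urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")

    def submit(self, url):
        """Register a job and return its id without waiting for the scrape."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = {"status": "queued", "result": None, "error": None}
        self._pool.submit(self._run, job_id, url)
        return job_id

    def _run(self, job_id, url):
        self._update(job_id, status="running")
        try:
            body = self._fetch(url)
        except Exception as exc:  # record the failure instead of losing it
            self._update(job_id, status="failed", error=str(exc))
        else:
            self._update(job_id, status="done", result=body)

    def _update(self, job_id, **fields):
        with self._lock:
            self._jobs[job_id].update(fields)

    def status(self, job_id):
        """Return the job's status string, or None for an unknown id."""
        with self._lock:
            job = self._jobs.get(job_id)
            return job["status"] if job else None

    def result(self, job_id):
        """Return a copy of the job record, or None for an unknown id."""
        with self._lock:
            job = self._jobs.get(job_id)
            return dict(job) if job else None
```

Because `submit` only records the job and queues it, the caller gets an id back immediately even when a scrape takes a long time, which is the availability property the problem statement asks for. An HTTP layer would map `submit`, `status`, and `result` onto three endpoints. Note that an in-memory table loses all jobs on restart, which is the trade-off of skipping a database.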