This repository has been archived by the owner on Mar 23, 2019. It is now read-only.

tomkerkhove/analyzing-stackexchange-with-azure-data-lake


Analyzing StackExchange with Azure Data Lake

This repository contains all the code & scripts for my 'Analyzing StackExchange data with Azure Data Lake' talk. This talk highlights the power of Azure Data Lake Store & Analytics and how they can be the center of your big data ecosystem.

Data Lake in Ecosystem

During the talk I used a StackExchange data dump to demo loading, storing, processing and visualizing data with Azure Data Lake Store, Data Lake Analytics & Power BI.

Demo Scenario

Getting the data sets

StackExchange

Stack Exchange has made the data from all their websites available under a Creative Commons license. It includes data about users, posts, comments, votes, etc. for every single site.


This data is used as the demo set because it reflects real-world data. Each site has its own folder, with one XML file per kind of data.

Here is an example of how the folder for coffee-stackexchange-com is structured:

+ coffee-stackexchange-com
	- Badges.xml
	- Comments.xml
	- PostHistory.xml
	- PostLinks.xml
	- Posts.xml
	- Tags.xml
	- Users.xml
	- Votes.xml

You can find the coffee-stackexchange-com sample here, download all the data here, or find more information on StackExchange.
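To get a feel for these files before loading them into Data Lake Store, they can be inspected locally. The snippet below is a minimal Python sketch, not part of the talk's U-SQL scripts. It assumes the usual data dump layout in which each file holds a flat list of `<row ... />` elements whose attributes carry the values. The helper name `iter_rows` and the inline sample values are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict, Iterator


def iter_rows(source) -> Iterator[Dict[str, str]]:
    """Yield the attributes of each <row> element as a plain dict.

    `source` may be a file path or a binary file object.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            yield dict(elem.attrib)
            elem.clear()  # release the element; real dump files can be large


# Tiny inline stand-in for Users.xml (hypothetical values, assumed attributes).
SAMPLE_USERS = b"""<?xml version="1.0" encoding="utf-8"?>
<users>
  <row Id="1" DisplayName="Alice" Reputation="101" Location="Ghent, Belgium" />
  <row Id="2" DisplayName="Bob" Reputation="1" />
</users>"""

users = list(iter_rows(io.BytesIO(SAMPLE_USERS)))
print(len(users), "users parsed")
```

To read a real file, pass its path instead of the in-memory sample, for example `iter_rows("coffee-stackexchange-com/Users.xml")`. Attributes that are optional in the dump, such as `Location` here, are simply missing from the dict, so use `.get()` when reading them.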

Reference data

The demo uses a CSV representing all the countries defined by ISO 3166. This can be found at lukes/ISO-3166-Countries-with-Regional-Codes.
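This README does not describe how the country list is combined with the StackExchange data. One plausible use, sketched below in Python purely as an illustration, is mapping a user's free-text location to a country. The column names `name` and `alpha-2` are assumed to match the CSV header, the sample rows are a trimmed excerpt, and both helper functions are hypothetical.

```python
import csv
import io
from typing import Dict, Optional

# Trimmed stand-in for the country CSV; the column names are an assumption.
SAMPLE_CSV = """name,alpha-2,alpha-3,country-code,region
Belgium,BE,BEL,056,Europe
Canada,CA,CAN,124,Americas
"""


def load_countries(handle) -> Dict[str, Dict[str, str]]:
    """Index the country rows by lower-cased country name."""
    return {row["name"].lower(): row for row in csv.DictReader(handle)}


def country_for_location(
    location: Optional[str], countries: Dict[str, Dict[str, str]]
) -> Optional[str]:
    """Guess the alpha-2 code from free text such as 'Ghent, Belgium'.

    Only the part after the last comma is compared, so this is a rough
    heuristic rather than real geocoding.
    """
    if not location:
        return None
    last_part = location.split(",")[-1].strip().lower()
    match = countries.get(last_part)
    return match["alpha-2"] if match else None


countries = load_countries(io.StringIO(SAMPLE_CSV))
print(country_for_location("Ghent, Belgium", countries))
```

Locations that do not end in a known country name return `None`, which keeps unmatched users visible instead of silently assigning them to a wrong country.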

Alternatives

Not a fan of this data set? caesar0301/awesome-public-datasets contains a ton of alternatives.

Learn more about Data Lakes & Azure Data Lake

  • Azure Data Lake GitHub repository (link)
  • U-SQL Documentation (link)
  • "Introducing Azure Data Lake" Microsoft Virtual Academy (link)
  • U-SQL Tutorials (link)
  • Comparison between Azure Blob Storage & Azure Data Lake Store (link)
  • Martin Fowler on Data Lakes (link)
  • "Mastering Azure Analytics" by Zoiner Tejada (link)


License

Licensed under the terms of the MIT license.
