jenningsanderson/GIS3-Sandy-Project

Hurricane Sandy Geo-Coded Tweets Project

Project Lead: Jennings Anderson

Members: Andrew Hardin, Ellie Falletta

Project EPIC, Geography 5303

This project looks at all of the geo-coded tweets from Hurricane Sandy.

##About

Somewhere on the order of 1% of tweets are geo-tagged. What can these tweets reveal about a Twitterer's movement behavior during Hurricane Sandy?

##Dependencies

Ruby Requirements

gem install georuby
gem install rgeo-shapefile
gem install bson
gem install mongo
gem install bson_ext

####Mongo Connection

The data for this project is held on Project EPIC's local analytics server on the CU campus. Multiple collections have been created under the sandygeo database:

  • edited_tweets: The main collection of tweets cut to the study timeframe: October 20 to November 7, 2012. Each document is a full tweet, as retrieved from the Twitter API.

  • coastal_users: The final collection of 17,627 users that were identified as having a tweet within the highly affected eastern seaboard area as defined by FEMA.

  • before_sandy: Tweets between October 1, 2012 and October 22, 2012 that were excluded from the project analysis.

  • after_sandy: Tweets between November 7, 2012 and December 1, 2012 that were excluded from the project analysis.

  • tweets: The original 260,859 geo-coded tweets extracted from the 22-million-tweet keyword collection. Used to identify geo-coding Twitterers for contextual stream fetching.

  • userpaths: Distinct paths for 32,842 users. Each document contains an array of tweets, where each tweet has date, text, entities, and place information, along with a GeoJSON LineString object that traces the user's path.

  • user_indiv_tweets: Similar to userpaths, but instead of a LineString, each individual tweet is stored with place, text, and timestamp as properties.

  • most_impacted_users:
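As a rough illustration of the userpaths description above, the sketch below builds one such document from a user's tweets using only the Ruby standard library. The output field names (`user_id`, `tweets`, `path`) and the `build_userpath` helper are assumptions made for this example, not the project's actual schema; the input is assumed to follow the Twitter API layout, where `coordinates` is a GeoJSON Point in `[longitude, latitude]` order.

```ruby
require 'time'

# Format of the Twitter API's created_at string,
# e.g. "Mon Oct 29 20:15:00 +0000 2012".
CREATED_AT_FORMAT = '%a %b %d %H:%M:%S %z %Y'

# Build a userpaths-style document for one user (hypothetical field names).
# tweets: array of hashes shaped like Twitter API tweets with coordinates.
def build_userpath(user_id, tweets)
  # Order tweets chronologically so the line follows the user's movement.
  ordered = tweets.sort_by { |t| Time.strptime(t['created_at'], CREATED_AT_FORMAT) }
  {
    'user_id' => user_id,
    # Keep only the per-tweet fields the README lists for userpaths.
    'tweets'  => ordered.map { |t| t.slice('created_at', 'text', 'entities', 'place') },
    'path'    => {
      'type'        => 'LineString',
      # GeoJSON positions are [longitude, latitude].
      'coordinates' => ordered.map { |t| t['coordinates']['coordinates'] }
    }
  }
end
```

A valid GeoJSON LineString needs at least two positions, so users with a single geo-coded tweet would have to be skipped or stored as points instead.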

##Project Conventions

  • Users are referenced by their id (a long number). If a user appears under multiple screen names once their tweets are aggregated, the handle written for that user is a string of the unique usernames separated by commas.
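The handle convention can be sketched as follows. This is a minimal illustration, assuming tweets carry the Twitter API's nested `user` object with `id` and `screen_name`; the `handles_by_user` helper and the exact separator (a bare comma with no space) are assumptions, not taken from the project's code.

```ruby
# Map each user id to a single handle string made of the user's
# unique screen names, in order of first appearance.
def handles_by_user(tweets)
  tweets
    .group_by { |t| t['user']['id'] }
    .transform_values do |user_tweets|
      user_tweets.map { |t| t['user']['screen_name'] }.uniq.join(',')
    end
end
```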

##Project Directories

###fileio/

#####kml_output.rb

Writes

#####identified_users.kml

#####tweet_io.rb

Includes the two classes for interacting with Mongo. SandyMongoClient creates an object for querying the database and returning tweets based on various parameters. Tweet_JSON_Reader reads a text file of valid JSON tweets, separated by newlines, and imports it into Mongo. The main runtime for this script runs an import; however, other scripts use the SandyMongoClient class for interacting with the database.

#####write_geojson.rb

Uses the json library to write valid GeoJSON from a variety of inputs. The main runtime of this script writes both a tweet and a userpath GeoJSON file from a users collection.

#####write_user_tweets_geojson.rb

Requires the write_geojson.rb script to generate a folder containing valid GeoJSON objects for each user's tweets.

#####tweet_shape.rb

Uses the georuby library to create shapefiles from tweets. This functionality is deprecated because creating shapefiles for viewing the data is less convenient than KML or GeoJSON files.

###mongo/

#####linestring_reduce.js

Map-reduce function that generates the usertracks collection from the edited_tweets collection.

###extract_scripts/

#####tracks.rb

Writes two shapefiles from the collection: one of linestrings representing each user's path, and one of just the tweets as points.

#####geo_bounded_tracks.rb

Performs the same task as tracks.rb, but allows for geo-sensitive queries.

#####mongo_extractor.rb

A very simple Mongo --> Shapefile script for quick visualizations of data.

#####find_users_within_area.rb

A cleaner, more robust version of geo_bounded_tracks.rb, built to accept a bounding polygon of any shape rather than only a box.
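To show what testing a tweet against a polygon of any shape involves, here is a standalone ray-casting sketch. It is not the script's implementation (which may well delegate the test to a Mongo geospatial query); `point_in_polygon?` and `users_within` are hypothetical helpers, and planar geometry is assumed, which is a reasonable approximation only over small areas.

```ruby
require 'set'

# Ray-casting test: count how many polygon edges a horizontal ray
# from the point crosses; an odd count means the point is inside.
# point: [lon, lat]; polygon: array of [lon, lat] vertices (open ring).
def point_in_polygon?(point, polygon)
  x, y = point
  inside = false
  polygon.each_with_index do |(xi, yi), i|
    xj, yj = polygon[i - 1] # previous vertex; index -1 wraps to the last one
    crosses = (yi > y) != (yj > y) &&
              x < (xj - xi) * (y - yi) / (yj - yi).to_f + xi
    inside = !inside if crosses
  end
  inside
end

# Collect the ids of users with at least one tweet inside the polygon.
# Assumes Twitter API-style tweets with a GeoJSON Point in 'coordinates'.
def users_within(tweets, polygon)
  tweets
    .select { |t| point_in_polygon?(t['coordinates']['coordinates'], polygon) }
    .map { |t| t['user']['id'] }
    .to_set
end
```

Points that fall exactly on an edge may be classified either way, which rarely matters for tweet coordinates.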

###parsers/

#####extract_geo_json.rb

Line-by-line parsing of a text file of JSON tweets delimited by newlines. Identifies tweets that are geotagged and writes them to a separate text file of the same format (JSON tweets separated by the \n character).
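The filtering step described above might look roughly like this. It is a sketch under two assumptions: a tweet counts as geotagged when its `coordinates` field is non-null (the script may instead check `geo` or `place`), and malformed lines are skipped silently. The `extract_geotagged` name is hypothetical.

```ruby
require 'json'

# Copy geotagged tweets from input to output, one JSON tweet per line.
# input and output are any IO-like objects (File, StringIO, ...).
# Returns the number of tweets written.
def extract_geotagged(input, output)
  kept = 0
  input.each_line do |line|
    line = line.strip
    next if line.empty?
    begin
      tweet = JSON.parse(line)
    rescue JSON::ParserError
      next # skip lines that are not valid JSON
    end
    # Assumption: a non-null 'coordinates' field marks a geotagged tweet.
    next unless tweet.is_a?(Hash) && tweet['coordinates']
    output.puts(JSON.generate(tweet))
    kept += 1
  end
  kept
end
```

With real files this would be called inside `File.open` blocks for the source and destination; reading line by line keeps memory use flat even for very large tweet dumps.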

#####get_geo_contextual.rb

Parses a contextual stream text file, extracting geo-tagged tweets and inserting them into a Mongo collection. The filepaths to the contextual streams are built dynamically based on the username that the script collects.

#####reformat_date.rb

A small helper function to reformat the string date to an ISODate so that Mongo recognizes it. This should be built into the import; otherwise, it's deprecated. A JavaScript loop in the Mongo shell is more convenient.
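For reference, the conversion can be sketched in a few lines of standard-library Ruby. The input format shown is the Twitter API's `created_at` string; the `parse_twitter_date` helper is a hypothetical name, and how the resulting value gets written back to Mongo is left out.

```ruby
require 'time'

# Format of the Twitter API's created_at field,
# e.g. "Mon Oct 29 20:15:00 +0000 2012".
TWITTER_DATE_FORMAT = '%a %b %d %H:%M:%S %z %Y'

# Parse a created_at string into a UTC Time object.
def parse_twitter_date(created_at)
  Time.strptime(created_at, TWITTER_DATE_FORMAT).utc
end

puts parse_twitter_date('Mon Oct 29 20:15:00 +0000 2012').iso8601
# prints 2012-10-29T20:15:00Z
```

Storing a `Time` value rather than the raw string is what lets date-range queries work on the collection.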

#####user_indiv_path.rb

Creates a new collection where each document represents a single user. Their tweet coordinates are stored as line strings to show their movement path.

#####user_indiv_tweet.rb

Creates a new collection where each document represents a single user. Their tweet coordinates are stored as points.

#####user_node_collection.rb

(Unfinished) Stores a user's tweets in three time bins: before, during, and after.
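Since this script is unfinished, the following is only a sketch of how the three-bin split could work. The `bin_tweets` helper, the `:time` key, and the idea of passing the storm window in as two parameters are all assumptions; the README does not say where the project draws the before/during/after boundaries.

```ruby
# Split tweets into before / during / after bins around a storm window.
# tweets: array of hashes with a :time key holding a Time object.
# storm_start, storm_end: Time objects bounding the "during" bin (inclusive).
def bin_tweets(tweets, storm_start, storm_end)
  bins = { before: [], during: [], after: [] }
  tweets.each do |tweet|
    at = tweet[:time]
    key = if at < storm_start
            :before
          elsif at > storm_end
            :after
          else
            :during
          end
    bins[key] << tweet
  end
  bins
end
```

Pre-seeding the hash guarantees that all three bins exist even when a user has no tweets in one of them, which keeps downstream comparisons simple.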

#####user_track_parser.rb

(Deprecated) Performs a similar function to user_indiv_path.rb.

###analysis/

#####Twitter_In_Evac.py

A Python script that uses ArcPy to parse a CSV of before, during, and after locations for a particular user and compare these locations to known evacuation zones. Performs set intersect operations on lists of users to determine who sheltered in place in an evacuation zone.
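The set-intersection step can be illustrated independently of ArcPy. The sketch below is written in Ruby to match the rest of the repository's scripts rather than the Python of the original, and it assumes one plausible reading of "sheltered in place": a user located inside an evacuation zone in all three periods. The `sheltered_in_place` helper and that definition are assumptions, not the script's documented logic.

```ruby
require 'set'

# Given the ids of users found inside an evacuation zone in each period,
# return those present in all three (assumed meaning of "sheltered in place").
def sheltered_in_place(in_zone_before, in_zone_during, in_zone_after)
  Set.new(in_zone_before) & Set.new(in_zone_during) & Set.new(in_zone_after)
end
```

Users who appear in the "before" set but not the "during" set would be the natural candidates for having evacuated, which is the complementary set difference.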

###userAnalysis/

The Visual Studio project that takes the extracted users and outputs diagnostic files, such as a KML, a CSV of perimeters, and each user's median points.

About

Public repository for Hurricane Sandy Project for GIS3
