Recently a massive dataset of NYC Taxi Data was made public. Torrents are available, but at 19 GB the data can be unwieldy to manage on a home machine. Members of /r/BigQuery have uploaded the dataset to Google's BigQuery service.
BigQuery (BQ) provides a simple way to get insights out of this dataset without eating through your internet data allowance or waiting for your home machine to query 173 million records. For example, users on Reddit have already discovered some anonymization issues in the data.
I've taken some of the popular queries and charted them.
Histogram of tips as a percentage of fare.
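To show what this chart is computing, here is a minimal Python sketch of the same aggregation run over a handful of in-memory records instead of in BigQuery. It assumes each trip is a dict with the hypothetical keys `fare_amount` and `tip_amount` (in dollars) and a 5-point bin width. It divides tip by fare, skips non-positive fares, and counts trips per bin.

```python
from collections import Counter


def tip_pct_histogram(trips, bin_width=5):
    """Count trips per tip-percentage bin.

    Each trip is a dict with the hypothetical keys 'fare_amount' and
    'tip_amount' (both in dollars). Bins are labelled by their lower edge.
    """
    counts = Counter()
    for trip in trips:
        fare = trip["fare_amount"]
        if fare <= 0:
            continue  # skip zero or negative fares to avoid dividing by zero
        pct = 100.0 * trip["tip_amount"] / fare
        # Floor to the lower edge of the bin, e.g. 17.3 -> 15 with 5-point bins
        counts[int(pct // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    sample = [
        {"fare_amount": 10.0, "tip_amount": 2.0},  # 20% tip
        {"fare_amount": 20.0, "tip_amount": 3.0},  # 15% tip
        {"fare_amount": 8.0, "tip_amount": 0.0},   # no tip
        {"fare_amount": 0.0, "tip_amount": 1.0},   # skipped: zero fare
    ]
    print(tip_pct_histogram(sample))  # {0: 1, 15: 1, 20: 1}
```

Labelling bins by their lower edge keeps the keys as plain integers, which sort naturally when plotted along the x-axis.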
Average speed by hour of day.
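Average speed per hour comes down to distance divided by trip duration, grouped by the hour of pickup. The sketch below does this locally and assumes hypothetical trip keys `pickup_datetime` and `dropoff_datetime` (Python `datetime` objects) and `trip_distance` in miles. It is an illustration of the calculation, not the BigQuery query behind the chart.

```python
from collections import defaultdict
from datetime import datetime


def avg_speed_by_hour(trips):
    """Mean per-trip speed in mph for each pickup hour (0-23).

    Assumes hypothetical keys 'pickup_datetime' and 'dropoff_datetime'
    (datetime objects) and 'trip_distance' (miles).
    """
    totals = defaultdict(lambda: [0.0, 0])  # hour -> [sum of speeds, trip count]
    for trip in trips:
        duration = trip["dropoff_datetime"] - trip["pickup_datetime"]
        hours = duration.total_seconds() / 3600
        if hours <= 0:
            continue  # skip zero-length or negative-duration records
        bucket = totals[trip["pickup_datetime"].hour]
        bucket[0] += trip["trip_distance"] / hours
        bucket[1] += 1
    return {hour: s / n for hour, (s, n) in sorted(totals.items())}


if __name__ == "__main__":
    sample = [
        # 6 miles in 30 minutes -> 12 mph
        {"pickup_datetime": datetime(2013, 1, 1, 8, 0),
         "dropoff_datetime": datetime(2013, 1, 1, 8, 30), "trip_distance": 6.0},
        # 3 miles in 30 minutes -> 6 mph
        {"pickup_datetime": datetime(2013, 1, 1, 8, 10),
         "dropoff_datetime": datetime(2013, 1, 1, 8, 40), "trip_distance": 3.0},
        # 5 miles in 15 minutes -> 20 mph
        {"pickup_datetime": datetime(2013, 1, 1, 23, 0),
         "dropoff_datetime": datetime(2013, 1, 1, 23, 15), "trip_distance": 5.0},
    ]
    print(avg_speed_by_hour(sample))  # {8: 9.0, 23: 20.0}
```

Averaging per-trip speeds weights every trip equally. Dividing total distance by total time for each hour would instead weight long trips more. Both are reasonable readings of "average speed", and this sketch picks the first. On real data you would also want to filter out implausible distances and durations before averaging.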
Average tip by month.
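Averaging tips per month is a group-by on the pickup month followed by a mean. Here is a minimal local sketch, assuming hypothetical trip keys `pickup_datetime` (a `datetime`) and `tip_amount` in dollars. Whether the chart shows the dollar amount or a percentage is not stated, so the dollar amount is an assumption here.

```python
from collections import defaultdict
from datetime import datetime


def avg_tip_by_month(trips):
    """Mean tip in dollars per pickup month, keyed as 'YYYY-MM'.

    Assumes hypothetical keys 'pickup_datetime' (datetime) and
    'tip_amount' (dollars).
    """
    totals = defaultdict(lambda: [0.0, 0])  # month -> [sum of tips, trip count]
    for trip in trips:
        month = trip["pickup_datetime"].strftime("%Y-%m")
        totals[month][0] += trip["tip_amount"]
        totals[month][1] += 1
    return {month: s / n for month, (s, n) in sorted(totals.items())}


if __name__ == "__main__":
    sample = [
        {"pickup_datetime": datetime(2013, 1, 5, 12, 0), "tip_amount": 2.0},
        {"pickup_datetime": datetime(2013, 1, 20, 18, 0), "tip_amount": 4.0},
        {"pickup_datetime": datetime(2013, 2, 3, 9, 0), "tip_amount": 1.5},
    ]
    print(avg_tip_by_month(sample))  # {'2013-01': 3.0, '2013-02': 1.5}
```

Keying on the full "YYYY-MM" string rather than the month number alone keeps the same month in different years from being merged.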