Trying to develop a combination map + 24-hr line graph using plotly's dash . The map will show route segments from the City's Bluetooth sensors (from the Open Data and when the user clicks on a given segment, a plot of the average travel times over 24 hours will be displayed for that segment.
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.graph_objs as go
init_notebook_mode(connected=True)
I downloaded the 2017 data and the WGS84 shapefile for the routes. Since I'm remote I'm going to reimport the data into a PostgreSQL database on my local computer. After unzipping the shapefile, I imported it into the database using the following command in bash:
shp2pgsql -S -s 4326 -D bluetooth_routes_wgs84.shp bluetooth.routes | psql -d bigdata
The sql we used to create the data exports can be found on the City of Toronto's Github. Note that, in order to make the exported data match as close as possible the data from the live feed, a little switcheroo was performed:
REPLACE(to_char( (datetime_bin AT TIME ZONE 'America/Toronto' + interval '5 minutes'), 'YYYY-MM-DD HH24:MI:SSOF'), ' ', 'T')
So this will have to be switched back in order to import that data into the database.
from psycopg2 import connect
import psycopg2.sql as pgsql
con = connect(database='bigdata')
import pandas
import pandas.io.sql as pandasql
import sqlalchemy
bt_2017 = pandas.read_csv('bt_2017.csv')
bt_2017
bt_2017['observed_time'] = bt_2017['updated'].str.replace('T', ' ')
del bt_2017['updated']
bt_2017.to_csv('bt_2017_fixed.csv', index=False)
I imported the fixed csv into PostgreSQL and checked the span of the data (January 2017). So we will create an average 24hr weekday plot for a segment on Richmond, BR2_BR3
.
weekday_avg_sql = pgsql.SQL('''
SELECT updated::TIME AS "Time", AVG(timeinseconds)
FROM bluetooth.bt_2017
WHERE resultid = {resultid} and EXTRACT('isodow' FROM updated) <6
GROUP BY "Time"
ORDER BY "Time" ''')
richmond_jan = pandasql.read_sql(weekday_avg_sql.format(resultid=pgsql.Literal('BR2_BR3')), con)
trace = go.Scatter(x=richmond_jan['Time'],
y=richmond_jan['avg'],
mode='lines')
layout = dict(title = "Average Weekday Travel Times",
xaxis = dict(title="Time of Day"),
yaxis = dict(title="Travel Time (s)"))
iplot(dict(data=[trace], layout=layout), filename='graph1.html')
plot(dict(data=[trace], layout=layout), filename='graph1.html', auto_open=False)
Great! Now testing to run this in Dash I saved this notebook as a python file and then added Dash imports:
import dash
from dash.dependencies import Input, Output
import dash_core_components as dcc
import dash_html_components as html
app = dash.Dash()
And then adding to the bottom:
app.layout = html.Div(children=[dcc.Graph(id='travel-time-graph',
figure=dict(data=[trace], layout=layout))])
if __name__ == '__main__':
app.run_server(debug=True)
Finally running the Dash app from the command-line with python graph_map.py
and opening up a browser... surprise! it works!
(yes I know there's an issue with missing data at 2:30 AM, I'll deal with that after)
Now that we've created the simple line graph, let's see if we can display the segments on a plotly map. We're going to try to read the geometry from the database. From the plotly mapping documentation latitude and longitude have to be passed separately as:
lon (list, numpy array, or Pandas series of numbers, strings, or datetimes.)
This isn't really how most geographic libraries operate, instead requiring arrays of coordinates, so this approach might be a little... hacky.
geometry_sql = pgsql.SQL('''SELECT resultid, ST_ASGeoJSON(geom) geojson
FROM bluetooth.routes
WHERE resultid = {resultid} ''')
resultid = pgsql.Literal('BR2_BR3')
richmond_df = pandasql.read_sql(geometry_sql.format(resultid=resultid), con)
I'll be basing myself on this tutorial which, honestly, is pretty hacky.
import json
row = richmond_df.iloc[[0]]
# row['geojson'][0]
geojson = json.loads(row.geojson[0])
geojson['coordinates'][0]
def get_lat_lon(geojson):
lons, lats = [], []
for coord in geojson['coordinates']:
lons.append(coord[0])
lats.append(coord[1])
return lats, lons
segments = []
for row in richmond_df.itertuples():
geojson = json.loads(row.geojson)
lats, lons = get_lat_lon(geojson)
segments.append( dict(type = 'scattergeo',
lon = lons, lat = lats,
mode = 'lines',
line = dict(width = 2)
))
layout = dict(
title = 'Bluetooth segments',
showlegend = False,
geo = dict(
resolution=50,
showland = True,
showlakes = True,
showcountries = True,
scope = "North America",
projection = dict(type = "mercator"),
lonaxis = dict( range = [-79.39, -79.36]),
lataxis = dict( range = [43.64, 43.66])
)
)
iplot(dict(data=segments,layout=layout), filename='map1.html')
plot(dict(data=segments,layout=layout), filename='map1.html')
mapbox_token = 'pk.eyJ1IjoicmVtb3RlZ2VudHJpZnkiLCJhIjoiY2lnanJzMjJpMDA1dnYxbHo5MTZtdGZsYSJ9.gLE8d40zmDAtMSSZyd2h1Q'
data = go.Data([
go.Scattermapbox(
lat=lats,
lon=lons,
mode='lines',
hoverinfo='text',
showlegend=False
)])
layout = go.Layout(
title='Bluetooth Segments',
autosize=True,
hovermode='closest',
mapbox=dict(
accesstoken=mapbox_token,
bearing=0,
center=dict(
lat=43.65,
lon=-79.37
),
pitch=0,
zoom=12,
style='light'
)
)
iplot(dict(data=data,layout=layout), filename='jupyter/map_better')
plot(dict(data=data,layout=layout), filename='map_better.html')