OPS CodeDay: Dash Plotly Map + Graph
On August 10th, 2017, the Ontario Public Service hosted a Code Day, where public servants across ministries and levels of government hacked on work projects in little clusters, helping each other out and learning in the process.
My goal was to explore the dashboarding potential of plotly’s dash , a web-server framework combining Flask and plotly’s python library. I wanted to learn more about plotly and dash while exploring the potential for linking an interactive map with graphs, in this case 24-hr timeseries graph of the average travel times on the selected segment based on the City’s Bluetooth sensors (from the Open Data ). This post details what I accomplished during the day. It’s derived from a Jupyter Notebook which you can see in its full interactive glory here and the actual Notebook on Github here. Note, none of the interactive plots are in this post at the moment… (but they will be in iframes soon)
24-hr Plot
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.graph_objs as go
init_notebook_mode(connected=True)
I downloaded the 2017 data and the WGS84 shapefile for the routes. Since I’m remote I’m going to reimport the data into a PostgreSQL database on my local computer. After unzipping the shapefile, I imported it into the database using the following command in bash:
shp2pgsql -S -s 4326 -D bluetooth_routes_wgs84.shp bluetooth.routes | psql -d bigdata
The sql we used to create the data exports can be found on the City of Toronto’s Github. Note that, in order to make the exported data match as close as possible the data from the live feed, a little switcheroo was performed:
REPLACE(to_char( (datetime_bin AT TIME ZONE 'America/Toronto' + interval '5 minutes'), 'YYYY-MM-DD HH24:MI:SSOF'), ' ', 'T')
So this will have to be switched back in order to import that data into the database.
from psycopg2 import connect
import psycopg2.sql as pgsql
con = connect(database='bigdata')
import pandas
import pandas.io.sql as pandasql
import sqlalchemy
bt_2017 = pandas.read_csv('bt_2017.csv')
bt_2017
resultId | timeInSeconds | count | updated | |
---|---|---|---|---|
0 | J_I | 56 | 16 | 2017-01-01T00:05:00-05 |
1 | J_I | 60 | 9 | 2017-01-01T00:10:00-05 |
2 | J_I | 60 | 8 | 2017-01-01T00:15:00-05 |
3 | J_I | 60 | 8 | 2017-01-01T00:20:00-05 |
4 | J_I | 55 | 17 | 2017-01-01T00:25:00-05 |
5 | J_I | 53 | 15 | 2017-01-01T00:30:00-05 |
6 | J_I | 58 | 13 | 2017-01-01T00:35:00-05 |
7 | J_I | 55 | 18 | 2017-01-01T00:40:00-05 |
8 | J_I | 58 | 22 | 2017-01-01T00:45:00-05 |
9 | J_I | 54 | 19 | 2017-01-01T00:50:00-05 |
10 | J_I | 59 | 19 | 2017-01-01T00:55:00-05 |
11 | J_I | 57 | 11 | 2017-01-01T01:00:00-05 |
12 | J_I | 54 | 11 | 2017-01-01T01:05:00-05 |
13 | J_I | 59 | 15 | 2017-01-01T01:10:00-05 |
14 | J_I | 54 | 12 | 2017-01-01T01:15:00-05 |
15 | J_I | 57 | 23 | 2017-01-01T01:20:00-05 |
16 | J_I | 58 | 11 | 2017-01-01T01:25:00-05 |
17 | J_I | 51 | 15 | 2017-01-01T01:30:00-05 |
18 | J_I | 55 | 17 | 2017-01-01T01:35:00-05 |
19 | J_I | 53 | 8 | 2017-01-01T01:40:00-05 |
20 | J_I | 58 | 23 | 2017-01-01T01:45:00-05 |
21 | J_I | 52 | 21 | 2017-01-01T01:50:00-05 |
22 | J_I | 60 | 10 | 2017-01-01T01:55:00-05 |
23 | J_I | 57 | 10 | 2017-01-01T02:00:00-05 |
24 | J_I | 49 | 11 | 2017-01-01T02:05:00-05 |
25 | J_I | 55 | 13 | 2017-01-01T02:10:00-05 |
26 | J_I | 50 | 13 | 2017-01-01T02:15:00-05 |
27 | J_I | 52 | 19 | 2017-01-01T02:20:00-05 |
28 | J_I | 56 | 11 | 2017-01-01T02:25:00-05 |
29 | J_I | 52 | 10 | 2017-01-01T02:30:00-05 |
... | ... | ... | ... | ... |
368339 | BR2_BR3 | 354 | 1 | 2017-01-31T19:05:00-05 |
368340 | BR2_BR3 | 484 | 1 | 2017-01-31T19:10:00-05 |
368341 | BR2_BR3 | 390 | 1 | 2017-01-31T19:15:00-05 |
368342 | BR2_BR3 | 281 | 2 | 2017-01-31T19:25:00-05 |
368343 | BR2_BR3 | 300 | 1 | 2017-01-31T19:30:00-05 |
368344 | BR2_BR3 | 300 | 1 | 2017-01-31T19:35:00-05 |
368345 | BR2_BR3 | 249 | 3 | 2017-01-31T19:50:00-05 |
368346 | BR2_BR3 | 275 | 1 | 2017-01-31T20:10:00-05 |
368347 | BR2_BR3 | 274 | 1 | 2017-01-31T20:15:00-05 |
368348 | BR2_BR3 | 252 | 1 | 2017-01-31T20:20:00-05 |
368349 | BR2_BR3 | 256 | 2 | 2017-01-31T20:25:00-05 |
368350 | BR2_BR3 | 243 | 4 | 2017-01-31T20:30:00-05 |
368351 | BR2_BR3 | 183 | 1 | 2017-01-31T20:40:00-05 |
368352 | BR2_BR3 | 266 | 1 | 2017-01-31T20:45:00-05 |
368353 | BR2_BR3 | 210 | 1 | 2017-01-31T20:50:00-05 |
368354 | BR2_BR3 | 226 | 1 | 2017-01-31T21:00:00-05 |
368355 | BR2_BR3 | 153 | 1 | 2017-01-31T21:10:00-05 |
368356 | BR2_BR3 | 229 | 3 | 2017-01-31T21:25:00-05 |
368357 | BR2_BR3 | 202 | 1 | 2017-01-31T21:30:00-05 |
368358 | BR2_BR3 | 259 | 3 | 2017-01-31T21:45:00-05 |
368359 | BR2_BR3 | 240 | 1 | 2017-01-31T21:50:00-05 |
368360 | BR2_BR3 | 216 | 1 | 2017-01-31T22:10:00-05 |
368361 | BR2_BR3 | 255 | 1 | 2017-01-31T22:15:00-05 |
368362 | BR2_BR3 | 241 | 1 | 2017-01-31T22:20:00-05 |
368363 | BR2_BR3 | 249 | 4 | 2017-01-31T22:30:00-05 |
368364 | BR2_BR3 | 248 | 2 | 2017-01-31T22:45:00-05 |
368365 | BR2_BR3 | 236 | 2 | 2017-01-31T22:55:00-05 |
368366 | BR2_BR3 | 328 | 2 | 2017-01-31T23:05:00-05 |
368367 | BR2_BR3 | 644 | 1 | 2017-01-31T23:35:00-05 |
368368 | BR2_BR3 | 202 | 1 | 2017-01-31T23:55:00-05 |
368369 rows × 4 columns
bt_2017['observed_time'] = bt_2017['updated'].str.replace('T', ' ')
del bt_2017['updated']
bt_2017.to_csv('bt_2017_fixed.csv', index=False)
I imported the fixed csv into PostgreSQL and checked the span of the data (January 2017). So we will create an average 24hr weekday plot for a segment on Richmond, BR2_BR3
.
weekday_avg_sql = pgsql.SQL('''
SELECT updated::TIME AS "Time", AVG(timeinseconds)
FROM bluetooth.bt_2017
WHERE resultid = {resultid} and EXTRACT('isodow' FROM updated) <6
GROUP BY "Time"
ORDER BY "Time" ''')
richmond_jan = pandasql.read_sql(weekday_avg_sql.format(resultid=pgsql.Literal('BR2_BR3')), con)
trace = go.Scatter(x=richmond_jan['Time'],
y=richmond_jan['avg'],
mode='lines')
layout = dict(title = "Average Weekday Travel Times",
xaxis = dict(title="Time of Day"),
yaxis = dict(title="Travel Time (s)"))
iplot(dict(data=[trace], layout=layout))
Great! Now testing to run this in Dash I saved this notebook as a python file and then added Dash imports:
import dash
from dash.dependencies import Input, Output
import dash_core_components as dcc
import dash_html_components as html
app = dash.Dash()
And then adding to the bottom:
app.layout = html.Div(children=[dcc.Graph(id='travel-time-graph',
figure=dict(data=[trace], layout=layout))])
if __name__ == '__main__':
app.run_server(debug=True)
Finally running the Dash app from the command-line with python graph_map.py
and opening up a browser… surprise! it works!
(yes I know there’s an issue with missing data at 2:30 AM, I’ll deal with that after)
The map
Now that we’ve created the simple line graph, let’s see if we can display the segments on a plotly map. We’re going to try to read the geometry from the database. From the plotly mapping documentation latitude and longitude have to be passed separately as:
lon (list, numpy array, or Pandas series of numbers, strings, or datetimes.)
This isn’t really how most geographic libraries operate, instead requiring arrays of coordinates, so this approach might be a little… hacky.
geometry_sql = pgsql.SQL('''SELECT resultid, ST_ASGeoJSON(geom) geojson
FROM bluetooth.routes
WHERE resultid = {resultid} ''')
resultid = pgsql.Literal('BR2_BR3')
richmond_df = pandasql.read_sql(geometry_sql.format(resultid=resultid), con)
I’ll be basing myself on this tutorial which, honestly, is pretty hacky.
import json
row = richmond_df.iloc[[0]]
# row['geojson'][0]
geojson = json.loads(row.geojson)
geojson['coordinates'][0]
[-79.3691483292635, 43.6533779496959]
def get_lat_lon(geojson):
lons, lats = [], []
for coord in geojson['coordinates']:
lons.append(coord[0])
lats.append(coord[1])
return lats, lons
segments = []
for row in richmond_df.itertuples():
geojson = json.loads(row.geojson)
lats, lons = get_lat_lon(geojson)
segments.append( dict(type = 'scattergeo',
lon = lons, lat = lats,
mode = 'lines',
line = dict(width = 2)
))
layout = dict(
title = 'Bluetooth segments',
showlegend = False,
geo = dict(
resolution=50,
showland = True,
showlakes = True,
showcountries = True,
scope = "North America",
projection = dict(type = "mercator"),
lonaxis = dict( range = [-79.39, -79.36]),
lataxis = dict( range = [43.64, 43.66])
)
)
iplot(dict(data=segments,layout=layout))
mapbox_token = 'pk.eyJ1IjoicmVtb3RlZ2VudHJpZnkiLCJhIjoiY2lnanJzMjJpMDA1dnYxbHo5MTZtdGZsYSJ9.gLE8d40zmDAtMSSZyd2h1Q'
data = go.Data([
go.Scattermapbox(
lat=lats,
lon=lons,
mode='lines',
hoverinfo='text',
showlegend=False
)])
layout = go.Layout(
title='Bluetooth Segments',
autosize=True,
hovermode='closest',
mapbox=dict(
accesstoken=mapbox_token,
bearing=0,
center=dict(
lat=43.65,
lon=-79.37
),
pitch=0,
zoom=12,
style='light'
)
)
iplot(dict(data=data,layout=layout))
It works!!
Next Steps
I still have to test out the Mapbox map in Dash. And then the final step is to test linking the two together using Dash’s callbacks and then throw more data and more interactivity options.
Let me know what you think of this article on twitter @dumasraphael!