OPS CodeDay: Dash Plotly Map + Graph

by on under tutorial
11 minute read

On August 10th, 2017, the Ontario Public Service hosted a Code Day, where public servants across ministries and levels of government hacked on work projects in little clusters, helping each other out and learning in the process.

My goal was to explore the dashboarding potential of plotly’s dash , a web-server framework combining Flask and plotly’s python library. I wanted to learn more about plotly and dash while exploring the potential for linking an interactive map with graphs, in this case 24-hr timeseries graph of the average travel times on the selected segment based on the City’s Bluetooth sensors (from the Open Data ). This post details what I accomplished during the day. It’s derived from a Jupyter Notebook which you can see in its full interactive glory here and the actual Notebook on Github here. Note, none of the interactive plots are in this post at the moment… (but they will be in iframes soon)

24-hr Plot

from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.graph_objs as go
init_notebook_mode(connected=True)

I downloaded the 2017 data and the WGS84 shapefile for the routes. Since I’m remote I’m going to reimport the data into a PostgreSQL database on my local computer. After unzipping the shapefile, I imported it into the database using the following command in bash:

shp2pgsql -S -s 4326 -D bluetooth_routes_wgs84.shp bluetooth.routes | psql -d bigdata

The sql we used to create the data exports can be found on the City of Toronto’s Github. Note that, in order to make the exported data match as close as possible the data from the live feed, a little switcheroo was performed:

REPLACE(to_char( (datetime_bin AT TIME ZONE 'America/Toronto' + interval '5 minutes'), 'YYYY-MM-DD HH24:MI:SSOF'), ' ', 'T')

So this will have to be switched back in order to import that data into the database.

from psycopg2 import connect
import psycopg2.sql as pgsql
con = connect(database='bigdata')
import pandas
import pandas.io.sql as pandasql
import sqlalchemy
bt_2017 = pandas.read_csv('bt_2017.csv')
bt_2017
resultId timeInSeconds count updated
0 J_I 56 16 2017-01-01T00:05:00-05
1 J_I 60 9 2017-01-01T00:10:00-05
2 J_I 60 8 2017-01-01T00:15:00-05
3 J_I 60 8 2017-01-01T00:20:00-05
4 J_I 55 17 2017-01-01T00:25:00-05
5 J_I 53 15 2017-01-01T00:30:00-05
6 J_I 58 13 2017-01-01T00:35:00-05
7 J_I 55 18 2017-01-01T00:40:00-05
8 J_I 58 22 2017-01-01T00:45:00-05
9 J_I 54 19 2017-01-01T00:50:00-05
10 J_I 59 19 2017-01-01T00:55:00-05
11 J_I 57 11 2017-01-01T01:00:00-05
12 J_I 54 11 2017-01-01T01:05:00-05
13 J_I 59 15 2017-01-01T01:10:00-05
14 J_I 54 12 2017-01-01T01:15:00-05
15 J_I 57 23 2017-01-01T01:20:00-05
16 J_I 58 11 2017-01-01T01:25:00-05
17 J_I 51 15 2017-01-01T01:30:00-05
18 J_I 55 17 2017-01-01T01:35:00-05
19 J_I 53 8 2017-01-01T01:40:00-05
20 J_I 58 23 2017-01-01T01:45:00-05
21 J_I 52 21 2017-01-01T01:50:00-05
22 J_I 60 10 2017-01-01T01:55:00-05
23 J_I 57 10 2017-01-01T02:00:00-05
24 J_I 49 11 2017-01-01T02:05:00-05
25 J_I 55 13 2017-01-01T02:10:00-05
26 J_I 50 13 2017-01-01T02:15:00-05
27 J_I 52 19 2017-01-01T02:20:00-05
28 J_I 56 11 2017-01-01T02:25:00-05
29 J_I 52 10 2017-01-01T02:30:00-05
... ... ... ... ...
368339 BR2_BR3 354 1 2017-01-31T19:05:00-05
368340 BR2_BR3 484 1 2017-01-31T19:10:00-05
368341 BR2_BR3 390 1 2017-01-31T19:15:00-05
368342 BR2_BR3 281 2 2017-01-31T19:25:00-05
368343 BR2_BR3 300 1 2017-01-31T19:30:00-05
368344 BR2_BR3 300 1 2017-01-31T19:35:00-05
368345 BR2_BR3 249 3 2017-01-31T19:50:00-05
368346 BR2_BR3 275 1 2017-01-31T20:10:00-05
368347 BR2_BR3 274 1 2017-01-31T20:15:00-05
368348 BR2_BR3 252 1 2017-01-31T20:20:00-05
368349 BR2_BR3 256 2 2017-01-31T20:25:00-05
368350 BR2_BR3 243 4 2017-01-31T20:30:00-05
368351 BR2_BR3 183 1 2017-01-31T20:40:00-05
368352 BR2_BR3 266 1 2017-01-31T20:45:00-05
368353 BR2_BR3 210 1 2017-01-31T20:50:00-05
368354 BR2_BR3 226 1 2017-01-31T21:00:00-05
368355 BR2_BR3 153 1 2017-01-31T21:10:00-05
368356 BR2_BR3 229 3 2017-01-31T21:25:00-05
368357 BR2_BR3 202 1 2017-01-31T21:30:00-05
368358 BR2_BR3 259 3 2017-01-31T21:45:00-05
368359 BR2_BR3 240 1 2017-01-31T21:50:00-05
368360 BR2_BR3 216 1 2017-01-31T22:10:00-05
368361 BR2_BR3 255 1 2017-01-31T22:15:00-05
368362 BR2_BR3 241 1 2017-01-31T22:20:00-05
368363 BR2_BR3 249 4 2017-01-31T22:30:00-05
368364 BR2_BR3 248 2 2017-01-31T22:45:00-05
368365 BR2_BR3 236 2 2017-01-31T22:55:00-05
368366 BR2_BR3 328 2 2017-01-31T23:05:00-05
368367 BR2_BR3 644 1 2017-01-31T23:35:00-05
368368 BR2_BR3 202 1 2017-01-31T23:55:00-05

368369 rows × 4 columns

bt_2017['observed_time'] = bt_2017['updated'].str.replace('T', ' ')

del bt_2017['updated']

bt_2017.to_csv('bt_2017_fixed.csv', index=False)


I imported the fixed csv into PostgreSQL and checked the span of the data (January 2017). So we will create an average 24hr weekday plot for a segment on Richmond, BR2_BR3.

weekday_avg_sql = pgsql.SQL('''
SELECT updated::TIME AS "Time", AVG(timeinseconds)
FROM bluetooth.bt_2017
WHERE resultid = {resultid} and EXTRACT('isodow' FROM updated) <6
GROUP BY "Time" 
ORDER BY "Time" ''')
richmond_jan = pandasql.read_sql(weekday_avg_sql.format(resultid=pgsql.Literal('BR2_BR3')), con)
trace = go.Scatter(x=richmond_jan['Time'],
                   y=richmond_jan['avg'],
                   mode='lines')
layout = dict(title = "Average Weekday Travel Times",
              xaxis = dict(title="Time of Day"),
              yaxis = dict(title="Travel Time (s)"))
iplot(dict(data=[trace], layout=layout))

Great! Now testing to run this in Dash I saved this notebook as a python file and then added Dash imports:

import dash
from dash.dependencies import Input, Output
import dash_core_components as dcc
import dash_html_components as html

app = dash.Dash()

And then adding to the bottom:

app.layout = html.Div(children=[dcc.Graph(id='travel-time-graph',
                                          figure=dict(data=[trace], layout=layout))])

if __name__ == '__main__':
    app.run_server(debug=True)

Finally running the Dash app from the command-line with python graph_map.py and opening up a browser… surprise! it works!

(yes I know there’s an issue with missing data at 2:30 AM, I’ll deal with that after)

The map

Now that we’ve created the simple line graph, let’s see if we can display the segments on a plotly map. We’re going to try to read the geometry from the database. From the plotly mapping documentation latitude and longitude have to be passed separately as:

lon (list, numpy array, or Pandas series of numbers, strings, or datetimes.)

This isn’t really how most geographic libraries operate, instead requiring arrays of coordinates, so this approach might be a little… hacky.

geometry_sql = pgsql.SQL('''SELECT resultid, ST_ASGeoJSON(geom) geojson
FROM bluetooth.routes
WHERE resultid = {resultid} ''')
resultid = pgsql.Literal('BR2_BR3')
richmond_df = pandasql.read_sql(geometry_sql.format(resultid=resultid), con)

I’ll be basing myself on this tutorial which, honestly, is pretty hacky.

import json
row = richmond_df.iloc[[0]]
# row['geojson'][0]
geojson = json.loads(row.geojson)
geojson['coordinates'][0]
[-79.3691483292635, 43.6533779496959]
def get_lat_lon(geojson):
    lons, lats = [], []
    for coord in geojson['coordinates']:
        lons.append(coord[0])
        lats.append(coord[1])
    return lats, lons
segments = []

for row in richmond_df.itertuples():
    geojson = json.loads(row.geojson)
    lats, lons = get_lat_lon(geojson)
    segments.append( dict(type = 'scattergeo',
                          lon = lons, lat = lats,
                          mode = 'lines',
                          line = dict(width = 2)
                         ))
layout = dict(
        title = 'Bluetooth segments',
        showlegend = False,
        geo = dict(
            resolution=50,
            showland = True,
            showlakes = True,
            showcountries = True,
            scope = "North America",
            projection = dict(type = "mercator"),
            lonaxis = dict( range = [-79.39, -79.36]),
            lataxis = dict( range = [43.64, 43.66])
        )
    )
iplot(dict(data=segments,layout=layout))
mapbox_token = 'pk.eyJ1IjoicmVtb3RlZ2VudHJpZnkiLCJhIjoiY2lnanJzMjJpMDA1dnYxbHo5MTZtdGZsYSJ9.gLE8d40zmDAtMSSZyd2h1Q'


data = go.Data([
    go.Scattermapbox(
        lat=lats,
        lon=lons,
        mode='lines',
        hoverinfo='text',
        showlegend=False
    )])
    
layout = go.Layout(
    title='Bluetooth Segments',
    autosize=True,
    hovermode='closest',
    mapbox=dict(
        accesstoken=mapbox_token,
        bearing=0,
        center=dict(
            lat=43.65,
            lon=-79.37
        ),
        pitch=0,
        zoom=12,
        style='light'
    )
)    

iplot(dict(data=data,layout=layout))

It works!!

Next Steps

I still have to test out the Mapbox map in Dash. And then the final step is to test linking the two together using Dash’s callbacks and then throw more data and more interactivity options.

dash, plotly, python, visualization