OPS CodeDay: Dash Plotly Map + Graph

by Raphael Dumas on August 10, 2017 under tutorial

11 minute read

On August 10th, 2017, the Ontario Public Service hosted a Code Day, where public servants across ministries and levels of government hacked on work projects in little clusters, helping each other out and learning in the process.

My goal was to explore the dashboarding potential of plotly’s dash , a web-server framework combining Flask and plotly’s python library. I wanted to learn more about plotly and dash while exploring the potential for linking an interactive map with graphs, in this case 24-hr timeseries graph of the average travel times on the selected segment based on the City’s Bluetooth sensors (from the Open Data ). This post details what I accomplished during the day. It’s derived from a Jupyter Notebook which you can see in its full interactive glory here and the actual Notebook on Github here. Note, none of the interactive plots are in this post at the moment… (but they will be in iframes soon)

24-hr Plot

from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.graph_objs as go
init_notebook_mode(connected=True)

I downloaded the 2017 data and the WGS84 shapefile for the routes. Since I’m remote I’m going to reimport the data into a PostgreSQL database on my local computer. After unzipping the shapefile, I imported it into the database using the following command in bash:

shp2pgsql -S -s 4326 -D bluetooth_routes_wgs84.shp bluetooth.routes | psql -d bigdata

The sql we used to create the data exports can be found on the City of Toronto’s Github. Note that, in order to make the exported data match as close as possible the data from the live feed, a little switcheroo was performed:

REPLACE(to_char( (datetime_bin AT TIME ZONE 'America/Toronto' + interval '5 minutes'), 'YYYY-MM-DD HH24:MI:SSOF'), ' ', 'T')

So this will have to be switched back in order to import that data into the database.

from psycopg2 import connect
import psycopg2.sql as pgsql
con = connect(database='bigdata')

import pandas
import pandas.io.sql as pandasql
import sqlalchemy

bt_2017 = pandas.read_csv('bt_2017.csv')
bt_2017

	resultId	timeInSeconds	count	updated
0	J_I	56	16	2017-01-01T00:05:00-05
1	J_I	60	9	2017-01-01T00:10:00-05
2	J_I	60	8	2017-01-01T00:15:00-05
3	J_I	60	8	2017-01-01T00:20:00-05
4	J_I	55	17	2017-01-01T00:25:00-05
5	J_I	53	15	2017-01-01T00:30:00-05
6	J_I	58	13	2017-01-01T00:35:00-05
7	J_I	55	18	2017-01-01T00:40:00-05
8	J_I	58	22	2017-01-01T00:45:00-05
9	J_I	54	19	2017-01-01T00:50:00-05
10	J_I	59	19	2017-01-01T00:55:00-05
11	J_I	57	11	2017-01-01T01:00:00-05
12	J_I	54	11	2017-01-01T01:05:00-05
13	J_I	59	15	2017-01-01T01:10:00-05
14	J_I	54	12	2017-01-01T01:15:00-05
15	J_I	57	23	2017-01-01T01:20:00-05
16	J_I	58	11	2017-01-01T01:25:00-05
17	J_I	51	15	2017-01-01T01:30:00-05
18	J_I	55	17	2017-01-01T01:35:00-05
19	J_I	53	8	2017-01-01T01:40:00-05
20	J_I	58	23	2017-01-01T01:45:00-05
21	J_I	52	21	2017-01-01T01:50:00-05
22	J_I	60	10	2017-01-01T01:55:00-05
23	J_I	57	10	2017-01-01T02:00:00-05
24	J_I	49	11	2017-01-01T02:05:00-05
25	J_I	55	13	2017-01-01T02:10:00-05
26	J_I	50	13	2017-01-01T02:15:00-05
27	J_I	52	19	2017-01-01T02:20:00-05
28	J_I	56	11	2017-01-01T02:25:00-05
29	J_I	52	10	2017-01-01T02:30:00-05
...	...	...	...	...
368339	BR2_BR3	354	1	2017-01-31T19:05:00-05
368340	BR2_BR3	484	1	2017-01-31T19:10:00-05
368341	BR2_BR3	390	1	2017-01-31T19:15:00-05
368342	BR2_BR3	281	2	2017-01-31T19:25:00-05
368343	BR2_BR3	300	1	2017-01-31T19:30:00-05
368344	BR2_BR3	300	1	2017-01-31T19:35:00-05
368345	BR2_BR3	249	3	2017-01-31T19:50:00-05
368346	BR2_BR3	275	1	2017-01-31T20:10:00-05
368347	BR2_BR3	274	1	2017-01-31T20:15:00-05
368348	BR2_BR3	252	1	2017-01-31T20:20:00-05
368349	BR2_BR3	256	2	2017-01-31T20:25:00-05
368350	BR2_BR3	243	4	2017-01-31T20:30:00-05
368351	BR2_BR3	183	1	2017-01-31T20:40:00-05
368352	BR2_BR3	266	1	2017-01-31T20:45:00-05
368353	BR2_BR3	210	1	2017-01-31T20:50:00-05
368354	BR2_BR3	226	1	2017-01-31T21:00:00-05
368355	BR2_BR3	153	1	2017-01-31T21:10:00-05
368356	BR2_BR3	229	3	2017-01-31T21:25:00-05
368357	BR2_BR3	202	1	2017-01-31T21:30:00-05
368358	BR2_BR3	259	3	2017-01-31T21:45:00-05
368359	BR2_BR3	240	1	2017-01-31T21:50:00-05
368360	BR2_BR3	216	1	2017-01-31T22:10:00-05
368361	BR2_BR3	255	1	2017-01-31T22:15:00-05
368362	BR2_BR3	241	1	2017-01-31T22:20:00-05
368363	BR2_BR3	249	4	2017-01-31T22:30:00-05
368364	BR2_BR3	248	2	2017-01-31T22:45:00-05
368365	BR2_BR3	236	2	2017-01-31T22:55:00-05
368366	BR2_BR3	328	2	2017-01-31T23:05:00-05
368367	BR2_BR3	644	1	2017-01-31T23:35:00-05
368368	BR2_BR3	202	1	2017-01-31T23:55:00-05

368369 rows × 4 columns

bt_2017['observed_time'] = bt_2017['updated'].str.replace('T', ' ')

del bt_2017['updated']

bt_2017.to_csv('bt_2017_fixed.csv', index=False)

I imported the fixed csv into PostgreSQL and checked the span of the data (January 2017). So we will create an average 24hr weekday plot for a segment on Richmond, BR2_BR3.

weekday_avg_sql = pgsql.SQL('''
SELECT updated::TIME AS "Time", AVG(timeinseconds)
FROM bluetooth.bt_2017
WHERE resultid = {resultid} and EXTRACT('isodow' FROM updated) <6
GROUP BY "Time" 
ORDER BY "Time" ''')

richmond_jan = pandasql.read_sql(weekday_avg_sql.format(resultid=pgsql.Literal('BR2_BR3')), con)

trace = go.Scatter(x=richmond_jan['Time'],
                   y=richmond_jan['avg'],
                   mode='lines')
layout = dict(title = "Average Weekday Travel Times",
              xaxis = dict(title="Time of Day"),
              yaxis = dict(title="Travel Time (s)"))

iplot(dict(data=[trace], layout=layout))

Great! Now testing to run this in Dash I saved this notebook as a python file and then added Dash imports:

import dash
from dash.dependencies import Input, Output
import dash_core_components as dcc
import dash_html_components as html

app = dash.Dash()

And then adding to the bottom:

app.layout = html.Div(children=[dcc.Graph(id='travel-time-graph',
                                          figure=dict(data=[trace], layout=layout))])

if __name__ == '__main__':
    app.run_server(debug=True)

Finally running the Dash app from the command-line with python graph_map.py and opening up a browser… surprise! it works!

(yes I know there’s an issue with missing data at 2:30 AM, I’ll deal with that after)

The map

Now that we’ve created the simple line graph, let’s see if we can display the segments on a plotly map. We’re going to try to read the geometry from the database. From the plotly mapping documentation latitude and longitude have to be passed separately as:

lon (list, numpy array, or Pandas series of numbers, strings, or datetimes.)

This isn’t really how most geographic libraries operate, instead requiring arrays of coordinates, so this approach might be a little… hacky.

geometry_sql = pgsql.SQL('''SELECT resultid, ST_ASGeoJSON(geom) geojson
FROM bluetooth.routes
WHERE resultid = {resultid} ''')

resultid = pgsql.Literal('BR2_BR3')

richmond_df = pandasql.read_sql(geometry_sql.format(resultid=resultid), con)

I’ll be basing myself on this tutorial which, honestly, is pretty hacky.

import json

row = richmond_df.iloc[[0]]

# row['geojson'][0]
geojson = json.loads(row.geojson)
geojson['coordinates'][0]

[-79.3691483292635, 43.6533779496959]

def get_lat_lon(geojson):
    lons, lats = [], []
    for coord in geojson['coordinates']:
        lons.append(coord[0])
        lats.append(coord[1])
    return lats, lons

segments = []

for row in richmond_df.itertuples():
    geojson = json.loads(row.geojson)
    lats, lons = get_lat_lon(geojson)
    segments.append( dict(type = 'scattergeo',
                          lon = lons, lat = lats,
                          mode = 'lines',
                          line = dict(width = 2)
                         ))

layout = dict(
        title = 'Bluetooth segments',
        showlegend = False,
        geo = dict(
            resolution=50,
            showland = True,
            showlakes = True,
            showcountries = True,
            scope = "North America",
            projection = dict(type = "mercator"),
            lonaxis = dict( range = [-79.39, -79.36]),
            lataxis = dict( range = [43.64, 43.66])
        )
    )
iplot(dict(data=segments,layout=layout))

mapbox_token = 'pk.eyJ1IjoicmVtb3RlZ2VudHJpZnkiLCJhIjoiY2lnanJzMjJpMDA1dnYxbHo5MTZtdGZsYSJ9.gLE8d40zmDAtMSSZyd2h1Q'


data = go.Data([
    go.Scattermapbox(
        lat=lats,
        lon=lons,
        mode='lines',
        hoverinfo='text',
        showlegend=False
    )])
    
layout = go.Layout(
    title='Bluetooth Segments',
    autosize=True,
    hovermode='closest',
    mapbox=dict(
        accesstoken=mapbox_token,
        bearing=0,
        center=dict(
            lat=43.65,
            lon=-79.37
        ),
        pitch=0,
        zoom=12,
        style='light'
    )
)    

iplot(dict(data=data,layout=layout))

It works!!

Next Steps

I still have to test out the Mapbox map in Dash. And then the final step is to test linking the two together using Dash’s callbacks and then throw more data and more interactivity options.

dash, plotly, python, visualization

I feedback.
Let me know what you think of this article on twitter @dumasraphael!