
Python's Plotly vs. Unemployment Data
After reading several articles and some commentary on May's jobs report, it was clear that the authors I read thought the numbers were “off”. So I decided to dig a little deeper and go to the U.S. Bureau of Labor Statistics (BLS) website to read the report for myself. I was surprised by the report, but even more so by the explanation at the bottom of it. I have to be honest: I went into this with the somewhat incredulous attitude that there was no way the May 2020 unemployment rate had fallen to 13.3%, especially during an economic shutdown of COVID-19 magnitude.
Here's the rub: per the "explanation" section at the bottom of the report, some workers had been misclassified: “If the workers who were recorded as employed but absent from work due to "other reasons" (over and above the number absent for other reasons in a typical May) had been classified as unemployed on temporary layoff, the overall unemployment rate would have been about 3 percentage points higher than reported…”
3 percentage points!?!!
Not insignificant.
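To make the arithmetic concrete, here is a minimal sketch using made-up round numbers (a hypothetical labor force of 1,000, not BLS figures). The function name is my own. The point is only that workers counted as "employed but absent" are in the labor force either way, so reclassifying them moves the numerator and leaves the denominator alone.

```python
def unemployment_rate(unemployed, labor_force):
    """Unemployment rate as a percentage of the labor force."""
    return 100 * unemployed / labor_force


# Hypothetical round numbers chosen to mirror the reported 13.3% rate.
labor_force = 1000
unemployed = 133
misclassified = 30  # assumed: counted as employed but absent for "other reasons"

reported = unemployment_rate(unemployed, labor_force)
# Reclassifying moves workers from employed to unemployed;
# the labor force itself does not change.
adjusted = unemployment_rate(unemployed + misclassified, labor_force)

print(round(reported, 1), round(adjusted, 1))
```

With these placeholder inputs the gap between the two rates is 3 percentage points, which is the size of adjustment the BLS note describes.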
I'm not here to find fault with the BLS. Those folks do a fine job, and the fact that they explained their process, apparently as a routine practice, is helpful to anyone who wants to make sense of the data. I think the problem in all this lies with the vested interests that use slices of the data to their benefit and fail to communicate the caveats that come with the numbers. Politics!
But rather than chase my tail on this, I thought I’d take the numbers and a weekend and put some visuals together.
Python, Pandas and Plotly
“The plotly Python library is an interactive, open-source plotting library that supports over 40 unique chart types covering a wide range of statistical, financial, geographic, scientific, and 3-dimensional use-cases.”
Plotly is very user friendly and has a high-level API that really simplifies things: “Plotly Express is a terse, consistent, high-level API for creating figures.”
Added bonus: Plotly’s API makes it easy to work with pandas dataframes.
Add some Django
I enjoy working with Django because, as their website states, “Django makes it easier to build better Web apps more quickly and with less code”.
An added benefit with Django is the rich community support and the third-party apps that can easily be plugged in to an existing app, which in turn saves considerable time. I’m all about that given my time demands! For this project, we’ll be using the django-plotly-dash app, which helps Django and Plotly play well together.
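Wiring it in is mostly configuration. The fragment below is a sketch based on my reading of the django-plotly-dash documentation, assuming an existing Django project. The other entry in INSTALLED_APPS is a placeholder, and the app name `LastYear` is the one registered in the script in this post. Check the package docs for the version you install.

```python
# settings.py (fragment): register the app and allow Dash apps to render in iframes
INSTALLED_APPS = [
    'django.contrib.staticfiles',                      # placeholder for your existing apps
    'django_plotly_dash.apps.DjangoPlotlyDashConfig',  # django-plotly-dash itself
]
X_FRAME_OPTIONS = 'SAMEORIGIN'

# urls.py (fragment, shown as comments because it needs a Django project to run):
#   from django.urls import include, path
#   urlpatterns = [
#       path('django_plotly_dash/', include('django_plotly_dash.urls')),
#   ]
#
# In a template, a registered DjangoDash app is embedded by name:
#   {% load plotly_dash %}
#   {% plotly_app name="LastYear" %}
```

The module that creates the `DjangoDash(...)` instances also has to be imported somewhere Django loads at startup (for example from urls.py), otherwise the template tag has nothing to look up.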
Data and Wrangling
For data, we’re going to use BLS unemployment figures by state for the past 10 years. This data is readily available; simply google “unemployment by state by year” and you’ll be set.
To get started, import the data, do a little transformation (I love pandas' melt) and export the result to a CSV file for use with Plotly.
# import dependencies
import pandas as pd

# load the wide-format unemployment data (one row per state, one column per month)
df_bls = pd.read_csv('state_unemp.csv')
# reshape from wide to long using melt
df_bls = df_bls.melt(id_vars=['State'], var_name='Date', value_name='Rate')
# parse the Date labels (e.g. 'May 2010') into a datetime column named Time
df_bls['Time'] = pd.to_datetime(df_bls['Date'])
# sort the dataframe by State then Time
df_bls = df_bls.sort_values(by=['State', 'Time'])
# export to csv
df_bls.to_csv("state_unemployment.csv", index=False)
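To see what melt does to the shape of the data, here is a toy version of the same steps on a two-state, two-month frame. The AK rates come from the sample rows in this post; the AL rates are placeholders. The explicit `format='%b %Y'` is my assumption about how the month labels are written, and it only works if every column header follows that pattern.

```python
import pandas as pd

# Wide format: one row per state, one column per month (AL values are placeholders).
wide = pd.DataFrame({
    'State': ['AL', 'AK'],
    'May 2010': [10.0, 7.9],
    'Jun 2010': [9.9, 7.8],
})

# Long format: one row per (state, month) pair.
long = wide.melt(id_vars=['State'], var_name='Date', value_name='Rate')

# Parse labels such as 'May 2010' into real timestamps so sorting is chronological.
long['Time'] = pd.to_datetime(long['Date'], format='%b %Y')
long = long.sort_values(by=['State', 'Time'])

print(long)
```

Sorting on the parsed `Time` column matters: sorting on the text `Date` column would put 'Jun 2010' ahead of 'May 2010' alphabetically.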
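In the end, the dataframe has four columns (State, Date, Rate and Time), sorted by State and then Time; a sample of its rows appears in the table near the end of this post. The Plotly script that consumes this data follows.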
import dash
# On Dash 2.0+, these two are also available as `from dash import dcc, html`
import dash_core_components as dcc
import dash_html_components as html
import plotly.graph_objects as go
import plotly.express as px
import pandas as pd
from django_plotly_dash import DjangoDash

# shared colour palette for every figure
colors = {
    'background': '#26293b',
    'text': '#7FDBFF'
}
#### state unemployment last year ####
app = DjangoDash('LastYear')
df = pd.read_csv('static/appdata/state_unemp.csv')
fig = go.Figure(data=go.Choropleth(
    locations=df['State'],
    z=df['May 2019'].astype(float),
    locationmode='USA-states',
    colorscale='Blues',
    # the plotted values are unemployment rates, not dollar amounts
    colorbar_title="Unemployment rate (%)",
)).update_geos(
    bgcolor=colors['background'],
    showlakes=True,
    lakecolor=colors['text'],
).update_layout(
    title_text='US Unemployment May 2019',
    geo_scope='usa',
    plot_bgcolor=colors['background'],
    paper_bgcolor=colors['background'],
    font={'color': colors['text']},
)
app.layout = html.Div([
    dcc.Graph(figure=fig),
])
#### state unemployment this year ####
app = DjangoDash('ThisYear')
df = pd.read_csv('static/appdata/state_unemp.csv')
fig = go.Figure(data=go.Choropleth(
    locations=df['State'],
    z=df['May 2020'].astype(float),
    locationmode='USA-states',
    colorscale='Blues',
    # the plotted values are unemployment rates, not dollar amounts
    colorbar_title="Unemployment rate (%)",
)).update_geos(
    bgcolor=colors['background'],
    showlakes=True,
    lakecolor=colors['text'],
).update_layout(
    title_text='US Unemployment May 2020',
    geo_scope='usa',
    plot_bgcolor=colors['background'],
    paper_bgcolor=colors['background'],
    font={'color': colors['text']},
)
app.layout = html.Div([
    dcc.Graph(figure=fig),
])
#### state unemployment animation ####
app = DjangoDash('StateUnempExpress')
df = pd.read_csv('static/appdata/state_unemployment.csv')
fig = px.choropleth(
    df,
    locations='State',
    locationmode="USA-states",
    color='Rate',
    hover_name="State",
    animation_frame='Date',
    # animation_group='Year',
    projection="albers usa",
    title='US Unemployment May 2010 to May 2020',
).update_layout({
    'plot_bgcolor': colors['background'],
    'paper_bgcolor': colors['background'],
    'font': {'color': colors['text']},
})
fig.update_geos(
    bgcolor=colors['background'],
    showlakes=True,
    lakecolor=colors['text'],
)
app.layout = html.Div([
    dcc.Graph(figure=fig),
])
#### state unemployment data table ####
app = DjangoDash('DataTable')
df = pd.read_csv('static/appdata/state_unemployment.csv')
fig = go.Figure(data=[go.Table(
    header=dict(
        values=["State", "Date", "Rate"],
        fill_color=colors['background'],
        align='left',
        line_color='darkslategray',
        font=dict(color=colors['text'], size=13),
    ),
    cells=dict(
        values=[df.State, df.Date, df.Rate],
        fill_color=colors['background'],
        align='left',
        line_color='darkslategray',
        font=dict(color=colors['text'], size=12),
    ),
)]).update_layout({
    'plot_bgcolor': colors['background'],
    'paper_bgcolor': colors['background'],
    'font': {'color': colors['text']},
})
fig.update_layout(title="US Unemployment Dataset May 2010 to May 2020")
app.layout = html.Div([
    dcc.Graph(figure=fig),
])
Heroku Deployment
I’ve deployed on Amazon, Google and even Netlify but chose Heroku because it best suited this project. To view this application: BLS Unemployment
*UPDATE*
I disconnected the backend due to the constant maintenance requirements and so have instead included screenshots from the application: HOMEPAGE, 2019/2020 CHART, ANIMATED CHART.
It'll be interesting to see the July numbers and the coverage and slant that they'll undoubtedly receive.
Hold the line!
Sample rows of the reshaped dataframe from the Data and Wrangling step:
| | State | Date | Rate | Time |
|---|---|---|---|---|
| 1 | AK | May 2010 | 7.9 | 2010-05-01 |
| 52 | AK | Jun 2010 | 7.8 | 2010-06-01 |
| 103 | AK | Jul 2010 | 7.8 | 2010-07-01 |
| 154 | AK | Aug 2010 | 7.8 | 2010-08-01 |
| 205 | AK | Sep 2010 | 7.8 | 2010-09-01 |
| ... | ... | ... | ... | ... |
Plotly
As I stated earlier, this API makes graphing in Python a breeze. It is well documented, with plenty of example scripts, and the community is vibrant and engaged. No, I don’t work for Plotly; I just appreciate the level of talent it took to craft this codebase.
Case in point: with just a few lines of code (the django-plotly-dash script shown earlier in this post), I was able to generate 3 graphs, one of them with an animation component, that otherwise would have taken me a considerable amount of time to construct.