Python's Plotly vs. Unemployment Data

Having read several articles and some commentary on May's jobs report, I found that most of the writers thought the numbers were “off”. So I decided to dig a little deeper and went to the U.S. Bureau of Labor Statistics (BLS) website to read the report for myself. The report surprised me, but the explanation at the bottom of it surprised me even more. I have to be honest: I went into this with the somewhat incredulous attitude that there was no way the May 2020 unemployment rate had fallen to 13.3%, especially during an economic shutdown of COVID-19 magnitude.

Here's the rub: per the "explanation" section at the bottom of the report, some workers had been misclassified: “If the workers who were recorded as employed but absent from work due to "other reasons" (over and above the number absent for other reasons in a typical May) had been classified as unemployed on temporary layoff, the overall unemployment rate would have been about 3 percentage points higher than reported…”

3 percentage points!?!!

Not insignificant.
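
To put the adjustment in numbers, here is a quick back-of-the-envelope sketch in Python. It assumes the correction is exactly 3 percentage points (the report only says "about 3"), and the variable names are mine, not the BLS's.

```python
# Back-of-the-envelope adjustment; assumes exactly 3 points, the report says "about 3".
reported_rate = 13.3        # May 2020 headline unemployment rate, in percent
misclassification = 3.0     # assumed adjustment, in percentage points

# Percentage points add directly to the headline rate.
adjusted_rate = reported_rate + misclassification

# The same adjustment expressed relative to the headline rate.
relative_increase = misclassification / reported_rate * 100

print(f"Adjusted rate: about {adjusted_rate:.1f}%")
print(f"Relative increase over the headline: about {relative_increase:.0f}%")
```

In other words, a 3-point correction is not a 3% change: under this assumption it makes the headline number roughly 23% larger.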

I'm not here to find fault with the BLS. Those folks do a fine job, and the fact that they explained their process, apparently as a routine practice, helps those who want to make sense of the data. I think the problem lies with the vested interests that use slices of the data to their benefit and fail to communicate the caveats that come with the numbers. Politics!

But rather than chase my tail on this, I thought I’d take the numbers and a weekend and put some visuals together.

Python, Pandas and Plotly

“The plotly Python library is an interactive, open-source plotting library that supports over 40 unique chart types covering a wide range of statistical, financial, geographic, scientific, and 3-dimensional use-cases.”

Plotly is very user friendly and has a high-level API that really simplifies things: “Plotly Express is a terse, consistent, high-level API for creating figures.”

Added bonus: Plotly’s API makes it easy to work with Pandas dataframes.

Add some Django

I enjoy working with Django because, as their website states, “Django makes it easier to build better Web apps more quickly and with less code”.

An added benefit of Django is the rich community support and the third-party apps that plug easily into an existing project, which saves considerable time - I’m all about that given my time demands! For this project, we’ll be using the django-plotly-dash app, which helps Django and Plotly play well together.
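
For reference, the wiring that django-plotly-dash needs is small. The sketch below follows the package's documented setup as I understand it, so check it against the documentation for your installed version. The other entry in the app list is a placeholder for whatever your project already has.

```python
# settings.py (fragment); the contrib app listed here is a placeholder
INSTALLED_APPS = [
    "django.contrib.staticfiles",
    "django_plotly_dash.apps.DjangoPlotlyDashConfig",
]

# Dash apps are rendered inside an iframe, so same-origin framing must be allowed.
X_FRAME_OPTIONS = "SAMEORIGIN"

# urls.py then needs the package's routes, for example:
#     path("django_plotly_dash/", include("django_plotly_dash.urls")),
#
# and a template embeds an app by the name passed to DjangoDash(...):
#     {% load plotly_dash %}
#     {% plotly_app name="LastYear" %}
```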

Data and Wrangling

For data, we’re going to use BLS unemployment figures by state for the past 10 years. This data is readily available: simply google “unemployment by state by year” and you’ll be set.

To get started, import the data, do a little transformation (love pandas melt) and export the result to a CSV file for use with Plotly.

# import dependencies
import pandas as pd

# load the wide-format unemployment data (one row per state, one column per month)
df_bls = pd.read_csv('state_unemp.csv')

# reshape from wide to long using melt: one row per state per month
df_bls = df_bls.melt(id_vars=['State'], var_name='Date', value_name='Rate')

# parse the Date strings into a datetime column named Time
df_bls['Time'] = pd.to_datetime(df_bls['Date'])

# sort the dataframe by State, then chronologically by Time
df_bls = df_bls.sort_values(by=['State', 'Time'])

# export to csv
df_bls.to_csv("state_unemployment.csv", index=False)
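
If melt is new to you, here is a toy version of the same reshape. The two-state, two-month frame is made up for illustration (the real file has one column per month), and only the column names match the script above.

```python
import pandas as pd

# Toy wide-format frame; the values are placeholders, not BLS figures.
wide = pd.DataFrame({
    "State": ["AK", "AL"],
    "May 2010": [7.9, 10.0],
    "Jun 2010": [7.8, 9.9],
})

# melt keeps State as the identifier and stacks the month columns into rows.
long = wide.melt(id_vars=["State"], var_name="Date", value_name="Rate")

print(long.shape)          # (4, 3): 2 states x 2 months, 3 columns
print(list(long.columns))  # ['State', 'Date', 'Rate']
```

The long format is what the animated map needs: a single Rate column to color by and a single Date column to step through.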

In the end, the dataframe will look like so:

# On Dash 2.0 and later, dcc and html are imported with: from dash import dcc, html
import dash_core_components as dcc
import dash_html_components as html
import plotly.graph_objects as go
import plotly.express as px
import pandas as pd
from django_plotly_dash import DjangoDash

# shared color palette for all figures
colors = {
    'background': '#26293b',
    'text': '#7FDBFF',
}
#### state unemployment last year ####
app = DjangoDash('LastYear')
df = pd.read_csv('static/appdata/state_unemp.csv')
fig = go.Figure(data=go.Choropleth(
    locations=df['State'],  # two-letter state abbreviations
    z=df['May 2019'].astype(float),  # unemployment rate, in percent
    locationmode='USA-states',
    colorscale='Blues',
    colorbar_title="Unemployment Rate (%)",
)).update_geos(
    bgcolor=colors['background'],
    showlakes=True,
    lakecolor=colors['text'],
).update_layout(
    title_text='US Unemployment May 2019',
    geo_scope='usa',
    plot_bgcolor=colors['background'],
    paper_bgcolor=colors['background'],
    font={'color': colors['text']},
)
app.layout = html.Div([
    dcc.Graph(figure=fig),
])
#### state unemployment this year ####
app = DjangoDash('ThisYear')
df = pd.read_csv('static/appdata/state_unemp.csv')
fig = go.Figure(data=go.Choropleth(
    locations=df['State'],  # two-letter state abbreviations
    z=df['May 2020'].astype(float),  # unemployment rate, in percent
    locationmode='USA-states',
    colorscale='Blues',
    colorbar_title="Unemployment Rate (%)",
)).update_geos(
    bgcolor=colors['background'],
    showlakes=True,
    lakecolor=colors['text'],
).update_layout(
    title_text='US Unemployment May 2020',
    geo_scope='usa',
    plot_bgcolor=colors['background'],
    paper_bgcolor=colors['background'],
    font={'color': colors['text']},
)
app.layout = html.Div([
    dcc.Graph(figure=fig),
])
#### state unemployment animation ####
app = DjangoDash('StateUnempExpress')
df = pd.read_csv('static/appdata/state_unemployment.csv')
fig = px.choropleth(
    df,
    locations='State',
    locationmode="USA-states",
    color='Rate',
    hover_name="State",
    animation_frame='Date',  # one animation frame per month
    projection="albers usa",
    title='US Unemployment May 2010 to May 2020',
).update_layout(
    plot_bgcolor=colors['background'],
    paper_bgcolor=colors['background'],
    font={'color': colors['text']},
)
fig.update_geos(
    bgcolor=colors['background'],
    showlakes=True,
    lakecolor=colors['text'],
)
app.layout = html.Div([
    dcc.Graph(figure=fig),
])
#### state unemployment data table ####
app = DjangoDash('DataTable')
df = pd.read_csv('static/appdata/state_unemployment.csv')
fig = go.Figure(data=[go.Table(
    header=dict(
        values=["State", "Date", "Rate"],
        fill_color=colors['background'],
        align='left',
        line_color='darkslategray',
        font=dict(color=colors['text'], size=13),
    ),
    cells=dict(
        values=[df.State, df.Date, df.Rate],
        fill_color=colors['background'],
        align='left',
        line_color='darkslategray',
        font=dict(color=colors['text'], size=12),
    ),
)]).update_layout(
    title="US Unemployment Dataset May 2010 to May 2020",
    plot_bgcolor=colors['background'],
    paper_bgcolor=colors['background'],
    font={'color': colors['text']},
)
app.layout = html.Div([
    dcc.Graph(figure=fig),
])

Heroku Deployment

I’ve deployed on Amazon, Google and even Netlify but chose Heroku because it best suited this project. To view this application: BLS Unemployment

*UPDATE*

I disconnected the backend due to the constant maintenance requirements and so have instead included screenshots from the application: HOMEPAGE, 2019/2020 CHART, ANIMATED CHART.

It'll be interesting to see the July numbers and the coverage and slant that they'll undoubtedly receive.

Hold the line!

     State  Date      Rate  Time
1    AK     May 2010  7.9   2010-05-01
52   AK     Jun 2010  7.8   2010-06-01
103  AK     Jul 2010  7.8   2010-07-01
154  AK     Aug 2010  7.8   2010-08-01
205  AK     Sep 2010  7.8   2010-09-01
...  ...    ...       ...   ...
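
The extra Time column is what makes a chronological sort possible, since sorting on the Date strings would order the months alphabetically. Here is a small sketch using the five AK rows above. The explicit format string is my addition (the script lets pandas infer the format), and it assumes every date looks like "May 2010".

```python
import pandas as pd

# The five AK rows shown above.
df = pd.DataFrame({
    "State": ["AK"] * 5,
    "Date": ["May 2010", "Jun 2010", "Jul 2010", "Aug 2010", "Sep 2010"],
    "Rate": [7.9, 7.8, 7.8, 7.8, 7.8],
})

# Assumption: dates look like "May 2010", so "%b %Y" parses them explicitly.
df["Time"] = pd.to_datetime(df["Date"], format="%b %Y")

by_string = df.sort_values("Date")["Date"].tolist()
by_time = df.sort_values("Time")["Date"].tolist()

print(by_string)  # alphabetical: Aug, Jul, Jun, May, Sep
print(by_time)    # chronological: May, Jun, Jul, Aug, Sep
```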

Plotly

As I stated earlier, this API makes graphing in Python a breeze. It is well documented with plenty of example scripts, and the community is vibrant and engaged. No, I don’t work for Plotly; I just appreciate the level of talent it took to craft this codebase.

Case in point: with just a few lines of code, I was able to generate three graphs, one of them with an animation component, that otherwise would have taken me a considerable amount of time to construct.

