Migrant Flows - Sankey Diagram Demo

A demonstration of how to use the ipysankeywidget package to generate a Sankey diagram from a pandas dataframe.

This notebook also demonstrates how widget libraries can be thought of as code generators. The code they generate can be reused directly elsewhere, or treated as an automatically generated "first draft" of an interactive chart that can then be edited by hand into more polished, production-quality output.

Originally motivated by Oli Hawkins' "Internal migration flows in the UK".

In [1]:
#!pip3 install ipysankeywidget
#!jupyter nbextension enable --py --sys-prefix ipysankeywidget
In [2]:
import pandas as pd

#Data from ONS: https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/migrationwithintheuk/datasets/matricesofinternalmigrationmovesbetweenlocalauthoritiesandregionsincludingthecountriesofwalesscotlandandnorthernireland

#Read in the CSV file
#If we specify the missing-value marker ('-') and the thousands separator (','),
#the flows should be read in as numerics rather than strings
df=pd.read_csv("../data/laandregionsquarematrices2015/regionsquarematrix2015.csv",
               skiprows=8, thousands=',', na_values='-')
df.head()
Out[2]:
                DESTINATION     Region  E12000001  E12000002  E12000003  E12000004  E12000005  E12000006  E12000007  E12000008  E12000009  W92000004  S92000003  N92000002
0                North East  E12000001        NaN     6870.0    10820.0     3580.0     2360.0     3560.0     4400.0     4580.0     2250.0     1010.0     3350.0      630.0
1                North West  E12000002     6670.0        NaN    22930.0    11130.0    15000.0     8020.0    14870.0    12240.0     7570.0    10190.0     6000.0     2150.0
2  Yorkshire and The Humber  E12000003    10830.0    22050.0        NaN    19280.0     8470.0     9530.0    11230.0    10680.0     5710.0     2910.0     3690.0      620.0
3             East Midlands  E12000004     3030.0    10300.0    19520.0        NaN    19180.0    20820.0    16010.0    19050.0     6980.0     3140.0     2310.0      540.0
4             West Midlands  E12000005     2260.0    13440.0     8220.0    17110.0        NaN     9390.0    17760.0    16540.0    13250.0     8260.0     2230.0      540.0
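To see why the `thousands` and `na_values` arguments matter, here is a minimal sketch that parses a small made-up extract laid out like the ONS matrix. The two rows and the quoting are invented for illustration, and the metadata lines that `skiprows=8` skips in the real file are left out. Without the extra arguments, the comma-grouped counts and the `-` placeholders leave the flow columns non-numeric; with them, the flows become floats and `-` becomes `NaN`.

```python
import io

import pandas as pd

# Made-up two-row extract in the same wide layout as the ONS matrix:
# counts use ',' as a thousands separator and '-' marks the empty diagonal
raw = io.StringIO(
    "DESTINATION,Region,E12000001,E12000002\n"
    'North East,E12000001,-,"6,870"\n'
    'North West,E12000002,"6,670",-\n'
)

# Default parsing: "6,870" and "-" are kept as text, so the column is not numeric
plain = pd.read_csv(raw)
print(pd.api.types.is_numeric_dtype(plain["E12000002"]))  # False

# Rewind the buffer and parse again with the separator and NA marker specified
raw.seek(0)
parsed = pd.read_csv(raw, thousands=",", na_values="-")
print(pd.api.types.is_numeric_dtype(parsed["E12000002"]))  # True
print(parsed.loc[0, "E12000002"])  # 6870.0
```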
In [3]:
from ipysankeywidget import SankeyWidget
In [4]:
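#The widget requires an edgelist with source, target and value columns
#Rows are destinations and columns are origins, so melting the origin columns gives one row per (origin, destination) pair
dfm=pd.melt(df,id_vars=['DESTINATION','Region'], var_name='source', value_name='value')
dfm=dfm.rename(columns={'Region':'target'})
#Suffix the destination codes so each region appears as two separate nodes: once as an origin, once as a destination
dfm['target']=dfm['target']+'_'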
dfm.head()
Out[4]:
                DESTINATION      target     source    value
0                North East  E12000001_  E12000001      NaN
1                North West  E12000002_  E12000001   6670.0
2  Yorkshire and The Humber  E12000003_  E12000001  10830.0
3             East Midlands  E12000004_  E12000001   3030.0
4             West Midlands  E12000005_  E12000001   2260.0
In [5]:
#The widget expects its links as a list of dicts, each dict specifying one edge
#Drop rows where the flow value is NA (e.g. the matrix diagonal, where origin and destination coincide)
links=dfm.dropna(subset=['value'])[['source','target','value']].to_dict(orient='records')
links[:3]
Out[5]:
[{'source': 'E12000001', 'target': 'E12000002_', 'value': 6670.0},
 {'source': 'E12000001', 'target': 'E12000003_', 'value': 10830.0},
 {'source': 'E12000001', 'target': 'E12000004_', 'value': 3030.0}]
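As a minimal sketch of the last conversion step, assuming a small hypothetical edge table shaped like `dfm`: `dropna(subset=['value'])` removes only rows whose flow is missing, and `to_dict(orient='records')` turns each remaining row into one `{'source': ..., 'target': ..., 'value': ...}` dict.

```python
import pandas as pd

# Hypothetical long-format edge table, as produced by the melt step
edges = pd.DataFrame({
    "source": ["E12000001", "E12000001", "E12000002"],
    "target": ["E12000001_", "E12000002_", "E12000001_"],
    "value": [float("nan"), 6670.0, 6870.0],
})

# Drop the NaN self-flow, keep only the edge columns, then emit one dict per row
links = (edges.dropna(subset=["value"])
              [["source", "target", "value"]]
              .to_dict(orient="records"))
print(len(links))  # 2
print(links[0])
```

Restricting `dropna` to the `value` column means a missing value in some unrelated column (for example a blank region name) would not silently drop a real flow.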
In [6]:
#Generate and display a Sankey diagram with the default styling
SankeyWidget(value={'links': links},
             width=800, height=800,margins=dict(top=0, bottom=0))
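One way to confirm that the diagram accounts for every migrant counted in the matrix is to check that the link values add up to the matrix total. The helper below, `flows_preserved`, is a hypothetical sketch (not part of ipysankeywidget or pandas); it assumes the identifier columns are `DESTINATION` and `Region`, as in `df`, and that every other column holds a flow count.

```python
import pandas as pd


def flows_preserved(matrix, links, id_cols=("DESTINATION", "Region"), tol=1e-6):
    """Hypothetical check: do the link values sum to the matrix total?"""
    # Sum every non-missing flow cell (DataFrame.sum skips NaN by default)
    matrix_total = matrix.drop(columns=list(id_cols)).sum().sum()
    links_total = sum(link["value"] for link in links)
    return abs(matrix_total - links_total) < tol


# Usage on a tiny hypothetical matrix and its matching link list
toy = pd.DataFrame({
    "DESTINATION": ["North East", "North West"],
    "Region": ["E12000001", "E12000002"],
    "E12000001": [None, 6670.0],
    "E12000002": [6870.0, None],
})
toy_links = [
    {"source": "E12000001", "target": "E12000002_", "value": 6670.0},
    {"source": "E12000002", "target": "E12000001_", "value": 6870.0},
]
print(flows_preserved(toy, toy_links))  # True
```

Applied to the notebook's own objects, the equivalent call would be `flows_preserved(df, links)`, provided the CSV contains no extra total rows or columns beyond those shown in `df.head()`.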