As well as supporting the generation of parameterised reports, reproducible workflows also support the automated generation of (templated) code that implements interactive charts.
For example, inspired by Oli Hawkins (Visualising migration between the countries of the UK [demo]), we can generate interactive Sankey plots using the googleVis or rCharts packages.
Note that the packages generate standalone HTML/JavaScript code to implement the charts; this code can be used elsewhere or embedded in other HTML pages.
#The rCharts package throws a wobbly if we don't load knitr in explicitly
library(knitr)
library(readr)
#Data from ONS: https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/migrationwithintheuk/datasets/matricesofinternalmigrationmovesbetweenlocalauthoritiesandregionsincludingthecountriesofwalesscotlandandnorthernireland
regionsquarematrix2015 = read_csv("../data/laandregionsquarematrices2015/regionsquarematrix2015.csv", skip = 8)
#The data has thousand separator commas - so remove them and convert to numeric
#There is probably a more idiomatic way of doing this using tidyr...
regionsquarematrix2015 = cbind(regionsquarematrix2015[1:2],
sapply(regionsquarematrix2015[3:ncol(regionsquarematrix2015)],
function(x) as.numeric(gsub(",", "", x)) ) )
head(regionsquarematrix2015)
## DESTINATION Region E12000001 E12000002 E12000003
## 1 North East E12000001 NA 6870 10820
## 2 North West E12000002 6670 NA 22930
## 3 Yorkshire and The Humber E12000003 10830 22050 NA
## 4 East Midlands E12000004 3030 10300 19520
## 5 West Midlands E12000005 2260 13440 8220
## 6 East E12000006 2850 7120 7600
## E12000004 E12000005 E12000006 E12000007 E12000008 E12000009 W92000004
## 1 3580 2360 3560 4400 4580 2250 1010
## 2 11130 15000 8020 14870 12240 7570 10190
## 3 19280 8470 9530 11230 10680 5710 2910
## 4 NA 19180 20820 16010 19050 6980 3140
## 5 17110 NA 9390 17760 16540 13250 8260
## 6 15500 7630 NA 72460 28570 9500 3050
## S92000003 N92000002
## 1 3350 630
## 2 6000 2150
## 3 3690 620
## 4 2310 540
## 5 2230 540
## 6 3030 700
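On the aside about a more idiomatic way of cleaning the numbers: one option may be `parse_number()` from readr, which drops grouping marks as it parses. A minimal sketch, using a made-up two-region stand-in (`toy`) rather than the real ONS file:

```r
library(readr)

# Toy stand-in for the ONS matrix (illustrative values only):
# two label columns, then counts stored as text with thousand separators
toy <- data.frame(
  DESTINATION = c("North East", "North West"),
  Region = c("E12000001", "E12000002"),
  E12000001 = c(NA, "6,670"),
  E12000002 = c("6,870", NA),
  stringsAsFactors = FALSE
)

# parse_number() ignores the grouping mark (a comma in the default locale)
count_cols <- 3:ncol(toy)
toy[count_cols] <- lapply(toy[count_cols], parse_number)
str(toy)
```

The label columns are left untouched because only columns 3 onwards are parsed.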
The Sankey diagram generators seem to expect the data to be provided as edge lists (from, to, value).
library(tidyr)
#Melt the data (wide to long) so we have from/to/value flows
rr=regionsquarematrix2015 %>% gather(source, value, 3:ncol(.))
#Merge in names for the source areas
rr=merge(rr, unique(data.frame(SOURCE=rr$DESTINATION,
source=rr$Region)),
by='source')
#The Sankey diagram generators dislike cycles - so set unique labels for from/to
rr$source=paste0(rr$source,'_')
rr$SOURCE=paste0(rr$SOURCE,' ')
#Drop rows that have no flow associated with them
rr=rr[!is.na(rr$value),]
colnames(rr) = c("source","targetName","target","value","sourceName")
rr = rr[,c("sourceName","targetName","source","target","value")]
head(rr)
## sourceName targetName source target value
## 2 North East North West E12000001_ E12000002 6670
## 3 North East Yorkshire and The Humber E12000001_ E12000003 10830
## 4 North East East Midlands E12000001_ E12000004 3030
## 5 North East West Midlands E12000001_ E12000005 2260
## 6 North East East E12000001_ E12000006 2850
## 7 North East London E12000001_ E12000007 6110
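The wide-to-long step can also be sketched with `pivot_longer()`, which the tidyr documentation recommends over `gather()` for new code. The toy data frame below is made up for illustration and only mimics the shape of the ONS matrix; `values_drop_na = TRUE` drops the empty diagonal in the same step:

```r
library(tidyr)

# Toy square matrix: rows are destinations, code-named columns are sources
toy <- data.frame(
  DESTINATION = c("North East", "North West"),
  Region = c("E12000001", "E12000002"),
  E12000001 = c(NA, 6670),
  E12000002 = c(6870, NA),
  stringsAsFactors = FALSE
)

# Reshape to one row per flow, dropping cells that have no flow
edges <- pivot_longer(
  toy,
  cols = -c(DESTINATION, Region),
  names_to = "source",
  values_to = "value",
  values_drop_na = TRUE
)
edges
```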
googleVis
googleVis is an R package that provides an R wrapper around/interface to Google Chart tools.
We can generate a Sankey diagram using googleVis from a data frame representing an edge list in the following way:
#For use in Rmd/knitr, set the block parameter: results='asis'
library(googleVis)
options(gvis.plot.tag='chart')
#Generate the Sankey diagram HTML
s=gvisSankey(rr[,c('source','target','value')])
#And render it
plot(s)
Notwithstanding the availability of from, to and weight parameters for specifying column names, the function appears to want the columns of the data frame passed in a particular order, specifically from, to, weight.
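One way to guard against column-order surprises is to arrange the edge list explicitly before handing it to the chart function. The helper below, `as_sankey_edges()`, is a hypothetical convenience function (not part of googleVis), sketched in base R on made-up data:

```r
# Hypothetical helper: keep only the three edge-list columns, in from/to/weight order
as_sankey_edges <- function(df, from = "source", to = "target", weight = "value") {
  cols <- c(from, to, weight)
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # The weight column has to be numeric to be usable as a flow size
  stopifnot(is.numeric(df[[weight]]))
  df[, cols, drop = FALSE]
}

# Toy edge list with the columns deliberately out of order
toy_edges <- data.frame(
  value = c(6670, 10830),
  target = c("E12000002", "E12000003"),
  source = c("E12000001_", "E12000001_"),
  stringsAsFactors = FALSE
)
ordered_edges <- as_sankey_edges(toy_edges)
names(ordered_edges)
```

The result could then be passed to the chart function in place of the `rr[,c('source','target','value')]` selection.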
According to the Google Sankey diagram documentation, node labels, as well as the color of nodes and edges, can be controlled; see Sankey diagrams with googleVis or this StackOverflow answer for an example of how to pass the parameters in.
To color the nodes, we need to provide node colors in the order in which the nodes are added to the chart. We can find the node order by interleaving the source and target columns:
nodeOrder=unique(c(rbind(rr$source, rr$target)))
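The interleaving idiom works because `rbind()` stacks the two vectors as rows of a matrix and `c()` then reads that matrix column by column. A small sketch on made-up node labels:

```r
# Toy edge list: three edges, with the '_' suffix marking "from" nodes
from <- c("A_", "A_", "B_")
to <- c("B", "C", "A")

# rbind() gives a 2 x 3 matrix; c() flattens it column by column
interleaved <- c(rbind(from, to))
interleaved
# "A_" "B" "A_" "C" "B_" "A"

# unique() keeps the first appearance of each node, i.e. the order of introduction
node_order <- unique(interleaved)
node_order
# "A_" "B" "C" "B_" "A"
```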
Add some node colors:
colormapl=c(E='#ffcc00',N='green',S='blue',W='red')
#Now we need to get the color for the node order.
nodeColor=unname(colormapl[substring(nodeOrder, 1, 1)])
#http://stackoverflow.com/a/32111596/454773
colors_node_array = paste0("[", paste0("'", nodeColor,"'", collapse = ','), "]")
opts = paste0("{ node: { colors: ", colors_node_array ," } }" )
s=gvisSankey(rr[,c('source','target','value')], options=list( sankey=opts))
plot(s)
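To see what is actually handed to the chart, it can help to print the options string. The sketch below repeats the colour lookup and string building on a made-up node order, so the intermediate values are easy to inspect:

```r
# Colour per country, keyed on the first letter of the area code
colormapl <- c(E = "#ffcc00", N = "green", S = "blue", W = "red")

# Toy node order (illustrative; the real one comes from the edge list)
node_order <- c("E12000001_", "S92000003", "W92000004")

# Look up a colour for each node from the first character of its code
node_color <- unname(colormapl[substring(node_order, 1, 1)])

# Build the JavaScript-style array and wrap it in the sankey options object
colors_node_array <- paste0("[", paste0("'", node_color, "'", collapse = ","), "]")
opts <- paste0("{ node: { colors: ", colors_node_array, " } }")
cat(opts, "\n")
# { node: { colors: ['#ffcc00','blue','red'] } }
```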
Add some edge colors, again in the order of edges supplied.
#Use the originating node colour for the edge
opts = paste0("{ link: { colorMode: 'source' },node: { colors: ", colors_node_array ," } }" )
s=gvisSankey(rr[,c('source','target','value')], options=list(sankey=opts))
plot(s)
The labels shown are the node identifiers. Mapping the identifiers to (distinct) names makes the chart more informative.
s=gvisSankey(rr[,c('sourceName','targetName','value')], options=list(sankey=opts))
plot(s)
Generate a view of the chart that omits flows within the same country.
#Limit the rows
rr2=rr[substring(rr$target, 1, 1)!=substring(rr$source, 1, 1),]
#Abstract out the code that allows us to generate a new color array
setNodeColors=function(df,source='source',target='target'){
#Interleave the nodes from the edgelist in the order they are introduced
nodeOrder=unique(c(rbind(df[[source]], df[[target]])))
#Generate a color mapping from the country indicator at the start of the country/region code
nodeColor=unname(colormapl[substring(nodeOrder, 1, 1)])
#Get the data in the form that the Sankey widget wants it...
colors_node_array = paste0("[", paste0("'", nodeColor,"'", collapse = ','), "]")
colors_node_array
}
colors_node_array=setNodeColors(rr2)
opts = paste0("{ link: { colorMode: 'source' }, node: { colors: ", colors_node_array ," } }" )
s=gvisSankey(rr2[,c('sourceName','targetName','value')], options=list(sankey=opts))
plot(s)
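The row filter relies on the first character of each area code identifying the country. A minimal check of that logic on a made-up edge list (values are illustrative only):

```r
# Toy edge list: one England-to-England flow and two cross-country flows
toy <- data.frame(
  source = c("E12000001_", "E12000001_", "W92000004_"),
  target = c("E12000002", "S92000003", "E12000001"),
  value = c(6670, 3350, 1010),
  stringsAsFactors = FALSE
)

# Keep only rows whose source and target country letters differ
cross <- toy[substring(toy$target, 1, 1) != substring(toy$source, 1, 1), ]
cross
```

Only the England-to-Scotland and Wales-to-England rows survive; the flow between two English regions is dropped.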
Finally, how about we group the (English) regional flows into a single English flow?
library(dplyr)
countrymap=c(E='England',N='Northern Ireland',S='Scotland',W='Wales')
rr2$countrysource=countrymap[substring(rr2$source, 1, 1)]
rr2$countrytarget=paste0(countrymap[substring(rr2$target, 1, 1)],' ')
rrg = rr2 %>%
group_by(countrysource,countrytarget) %>%
summarise(value = sum(value))
#Generate new color array
colors_node_array=setNodeColors(rrg,'countrysource','countrytarget')
opts = paste0("{ link: { colorMode: 'source' }, node: { colors: ", colors_node_array ," } }" )
s=gvisSankey(rrg,options=list(sankey=opts))
plot(s)
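For comparison, the same grouping can be sketched in base R using `aggregate()`. The data below is made up for illustration; the trailing space on the target label plays the same role as before, keeping the from and to node names distinct:

```r
countrymap <- c(E = "England", N = "Northern Ireland", S = "Scotland", W = "Wales")

# Toy cross-country flows (illustrative values only)
toy <- data.frame(
  source = c("E12000001_", "E12000002_", "S92000003_"),
  target = c("S92000003", "S92000003", "E12000001"),
  value = c(3350, 6000, 4000),
  stringsAsFactors = FALSE
)

# Map each code to its country via the first letter
toy$countrysource <- unname(countrymap[substring(toy$source, 1, 1)])
toy$countrytarget <- paste0(unname(countrymap[substring(toy$target, 1, 1)]), " ")

# Sum the flows for each country-to-country pair
grouped <- aggregate(value ~ countrysource + countrytarget, data = toy, FUN = sum)
grouped
```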
We can also change the edge color to a gradient between the source and target color values, but this just looks a horrible mess to me!
opts = paste0("{ link: { colorMode: 'gradient' }, node: { colors: ", colors_node_array ," } }" )
s=gvisSankey(rrg,options=list(sankey=opts))
plot(s)
rCharts
Generate a Sankey diagram using rCharts:
#Based on http://bl.ocks.org/timelyportfolio/6085852
#There is also a particle flow enhancement demoed at https://bl.ocks.org/micahstubbs/6a366e759f029599678e293521d7e26c
library(rCharts)
sankeyPlot2 <- rCharts$new()
sankeyPlot2$setLib('http://timelyportfolio.github.io/rCharts_d3_sankey/')
sankeyPlot2$set(
data = rr[,c('source','target','value')],
nodeWidth = 15,
nodePadding = 10,
layout = 32,
width = 750,
height = 500
)
sankeyPlot2$show('iframesrc', cdn = TRUE)
#Note that at the time of writing, rCharts_d3_sankey bakes in the http protocol for loading three
#resources, which breaks if the output HTML page is served over https.
Some control over colouring can be introduced by extending the template, as demonstrated in this Stack Overflow answer (following this original explanation from @timelyportfolio, the author of the rCharts Sankey package).
A wide range of interactive chart types can be generated in this way. The htmlwidgets for R project represents the latest iteration in the production of interactive JavaScript widgets for use in RMarkdown documents and Shiny applications.
sankeyD3
This seems to be the most recent attempt at an R/Sankey diagram library, again using D3.js.
The library appears to require that nodes be identified by consecutive integers, starting at 0.
#devtools::install_github("fbreitwieser/sankeyD3")
library(sankeyD3)
library(plyr)
#Get a mapping from codes to numeric node IDs
#Need to interleave the nodes appropriately
rrd=data.frame(rid= c(rbind(regionsquarematrix2015$Region,
paste0(regionsquarematrix2015$Region,'_'))) )
rrd['num']=0:(nrow(rrd)-1)
#Interleave the names in the same order as the ids
rrd['name']=c(rbind(regionsquarematrix2015$DESTINATION,
paste0(regionsquarematrix2015$DESTINATION,'_')))
#Map the edges
rr$source=as.integer(mapvalues(unlist(rr$source),from=unlist(rrd['rid']),to=unlist(rrd['num'])))
rr$target=as.integer(mapvalues(unlist(rr$target),from=unlist(rrd['rid']),to=unlist(rrd['num'])))
sankeyNetwork(Links = rr, Nodes = rrd, Source = "source", title="Migration Flows",
Target = "target", Value = "value", NodeID = "name",
fontSize = 12, nodeWidth = 30,showNodeValues = FALSE)
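The zero-based node numbering can also be sketched without plyr, using base R's `match()`. The edge list and the column names `source_id` and `target_id` below are made up for illustration:

```r
# Toy edge list with character node labels
edges <- data.frame(
  source = c("A_", "A_", "B_"),
  target = c("B", "C", "A"),
  value = c(5, 3, 2),
  stringsAsFactors = FALSE
)

# Node table: one row per node, numbered consecutively from 0
node_ids <- unique(c(rbind(edges$source, edges$target)))
nodes <- data.frame(
  name = node_ids,
  num = seq_along(node_ids) - 1L,
  stringsAsFactors = FALSE
)

# match() returns 1-based positions, so subtract 1 to get zero-based ids
edges$source_id <- match(edges$source, nodes$name) - 1L
edges$target_id <- match(edges$target, nodes$name) - 1L
edges
```

Because the ids are derived from row positions in the node table, the node names and numbers cannot drift out of step.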