“Football field” (variable-width bar) charts in R / Power BI

An interesting way to present long-term contracts and an opportunity to try out R visuals in Power BI

On a recent project, our team was asked whether we could visualize long-term real estate contracts on a “football field” chart. Now, I’ve heard of snowball charts and waterfall charts, but a football field chart was something new. A quick search, however, showed that I was missing out.

Yup, it’s a thing.

What exactly is a football field chart? It is effectively a column/bar chart where the individual bars indicate a range (and thus can start and end in arbitrary places). Most examples on the internet relate to valuation ranges. In our case, the idea was to display contracts over time as discrete bars, varying the bar width as the contract value changes.

Something along these lines

We were planning to deliver our insights via Power BI reports, so my immediate reaction was to check whether we could create this sort of chart using Power BI’s R visuals and ggplot2. It turns out ggplot2 includes exactly what we needed in this scenario: its geom_tile() and geom_rect() geoms draw arbitrary rectangles. If you prefer Python, the ggplot2-inspired plotnine package supports the same geoms and, admittedly, has way more impressive examples.

This is an example of what you can achieve with the ggplot2 / plotnine packages. Technically, this is a bar chart!
Credit: Plotnine geom_tile() documentation
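
To make the two geoms concrete, here is a minimal sketch on made-up data (the `tiles` data frame and its column names are hypothetical, not part of the project). It assumes ggplot2 is installed and shows that geom_tile() takes a centre plus width/height, while geom_rect() takes the four corners of the same rectangle.

```r
library(ggplot2)

# Hypothetical toy data: two rectangles described by centre and size
tiles <- data.frame(
  x      = c(2, 6),      # horizontal centre
  y      = c(1, 2),      # vertical centre
  width  = c(2, 4),      # full extent along x
  height = c(0.5, 1.0),  # full extent along y
  kind   = c("a", "b")
)

# geom_tile(): centre + width/height
p_tile <- ggplot(tiles) +
  geom_tile(aes(x = x, y = y, width = width, height = height, fill = kind))

# geom_rect(): the same rectangles described by their corners
p_rect <- ggplot(tiles) +
  geom_rect(aes(xmin = x - width / 2, xmax = x + width / 2,
                ymin = y - height / 2, ymax = y + height / 2, fill = kind))

# Inspect the corner coordinates each layer ends up drawing
tile_corners <- layer_data(p_tile)[, c("xmin", "xmax", "ymin", "ymax")]
rect_corners <- layer_data(p_rect)[, c("xmin", "xmax", "ymin", "ymax")]
```

Which geom to pick is mostly a matter of what the data already contains: with start/end dates, geom_rect() avoids computing a mid-point, while geom_tile() fits naturally when the y-axis is discrete.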

Getting to the right visual in R

For the purpose of this blog post, I generated some synthetic data representing a real estate portfolio in Germany. In case you are interested in how it was created, check out the Python script with details (it was a lot of fun to play around with various probability distributions; I may write a separate blog post about that one day).

We effectively had three things: 1) a list of contracts, 2) a list of terms associated with each contract, and 3) a monthly rental portfolio data set derived from the contract/term information.

Getting to the desired visualization was surprisingly easy. First, let’s select the top 10 contracts by their future value (you don’t really want to visualize all contracts at once unless you intend to print the chart on A0 paper).


#load the packages used below: fread(), year() and month() come from data.table, the pipe and verbs from dplyr
library(data.table)
library(dplyr)
library(ggplot2)

#read in the table that contains individual contract terms with associated start/end dates
terms = fread("~/coding/real-estate/terms.csv")

#read in the table that contains monthly information for each term (SQM / rent income in a month)
#add proper year/month columns to the dataframe
PL = fread("~/coding/real-estate/monthly_data.csv") %>% 
  mutate(date = as.Date(strptime(paste0(month,"-01"), format="%Y-%m-%d"))) %>%
  mutate(year = year(date), month = month(date))

#calculate future value per contract (assuming we're at the end of 2020)
# filter it down to top10 contracts only
#note: `contracts` (the list of contracts) is assumed to be loaded already
top_10_contracts = PL %>% filter(date > as.Date("2020-12-31")) %>% 
  inner_join(terms, by='term_id') %>%
  inner_join(contracts, by="contract_id") %>% 
  group_by(contract_id) %>% summarize(future_value = sum(rent), .groups='drop_last') %>% 
  arrange(desc(future_value)) %>% head(10)

Then, we can proceed to the first iteration of the chart. geom_tile() requires the “central” position and the width/height of each rectangle, so let’s calculate the mid-point of each term to use as the x-axis position; the width will equal the number of days in the term, and the height the monthly rent. Let’s try with one contract.

chart_df = 
  #filter to 1 contract
  top_10_contracts[1,] %>% 
  #get terms of contract
  inner_join(terms, by="contract_id") %>% 
  #calculate # of days of each term
  mutate(width = as.numeric(difftime(end, start, units="days"))) %>% 
  #calculate the mid-point of each term
  mutate(mid_point = start + width/2) %>% 
  #extensions are numeric, make sure they are treated as discrete variables
  mutate(extension = as.factor(extension)) 

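ggplot(chart_df) + 
  #geom_tile() call reconstructed from the description above; the fill mapping is an assumption
  geom_tile(aes(
    x=mid_point, y=contract_id, width=width,
    height=monthly_rent, fill=extension
  ))

Not bad for an initial attempt.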

Seems like we’re on the right track, even if not beautiful. We can fix that later. Let’s try with 10 contracts.

Accidental art?

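Right. That’s not that great. Initially I thought that R, for some reason, was placing all the series on top of each other (i.e. ignoring the Y-axis parameter), but I later realized the real issue was the bar heights. geom_tile() interprets height in y-axis units, and on a discrete axis neighbouring contracts sit just one unit apart, so a bar whose height is a monthly rent in the thousands covers the whole chart. The heights needed to be re-scaled. Here’s a fixed version where each bar height is scaled to a maximum of 1 (plus some formatting tweaks).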

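ggplot(chart_df) + 
  #geom_tile() aesthetics other than height are reconstructed; the fill mapping is an assumption
  geom_tile(aes(
    x=mid_point, y=contract_id, width=width, fill=extension,
    height=monthly_rent/max(monthly_rent) #that's the fix
  )) +
  ylab("") + xlab("Year") + theme_bw() + 
  theme(panel.grid.minor.y = element_blank(), panel.grid.major.y = element_blank())

Much better.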
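
The rescaling itself is plain arithmetic and can be checked without any plotting. A minimal sketch on made-up numbers (the `terms_toy` data frame is hypothetical):

```r
# Hypothetical monthly rents for terms belonging to three contracts
terms_toy <- data.frame(
  contract_id  = c("A", "A", "B", "C"),
  monthly_rent = c(40000, 50000, 20000, 10000)
)

# On a discrete y-axis the contracts sit at positions 1, 2, 3, ...,
# so a tile of height 50000 would span every row of the chart.
# Dividing by the overall maximum caps each tile at one row's worth of space.
terms_toy$height <- terms_toy$monthly_rent / max(terms_toy$monthly_rent)

print(terms_toy)
```

Because the divisor is the maximum across all contracts, bar heights stay comparable between contracts; scaling within each contract instead would make every contract's largest term fill its row.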

Almost perfect. Just need to add labels that indicate total value and sort the contracts from the highest value to the lowest. Here’s the final code that achieves all that.

chart_df = 
  top_10_contracts %>%
  #get terms of top 10 contracts
  inner_join(terms, by="contract_id") %>% 
  #calculate # of days of each term
  mutate(width = as.numeric(difftime(end, start, units="days"))) %>% 
  #calculate the mid-point of each term
  mutate(mid_point = start + width/2) %>% 
  #extensions are numeric, make sure they are treated as discrete variables
  mutate(extension = as.factor(extension)) %>% 
  #reorder contracts
  mutate(contract_id = reorder(as.character(contract_id), future_value, mean)) 

labels = 
  #group by contract
  chart_df %>% group_by(contract_id) %>% 
  #calculate total value and get end date
  summarize(end_date = max(end), value = max(future_value), .groups="drop_last") %>% 
  #format total value
  mutate(label = paste0(round(value / 1000000, 1), "M")) 

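ggplot(chart_df) + 
  #create rectangle geoms
  #(aesthetics reconstructed from the steps above; the fill mapping is an assumption)
  geom_tile(aes(
    x=mid_point, y=contract_id, width=width, fill=extension,
    height=monthly_rent/max(monthly_rent)
  )) + 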
  #deal with axis labels
  ylab("") + xlab("Year") + theme_bw() + 
  #remove gridlines on Y-axis
  theme(panel.grid.minor.y = element_blank(), panel.grid.major.y = element_blank()) +
  #add labels at the end of each contract bar
  geom_label(data = labels, mapping=aes(end_date + 400, contract_id, label=label)) +
  #apply a better color palette
  scale_fill_brewer(palette = "RdYlGn")

Here we are.

Edit: I received a comment (thanks Q!) that the chart may look more intuitive if there is a shadow behind each bar indicating the maximum monthly value on the chart for reference. Here is what that could look like (code-wise, that is just another geom_tile() layer).
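
A minimal sketch of the shadow idea, on made-up data rather than the real portfolio (the `toy` data frame, its column names and the grey colour are assumptions, not the code behind the chart): a full-height grey tile is drawn first, and the rescaled tiles go on top of it.

```r
library(ggplot2)

# Hypothetical toy data: terms already carry a mid-point, a width in days
# and a height rescaled to a maximum of 1
toy <- data.frame(
  contract_id = c("A", "A", "B"),
  mid_point   = as.Date(c("2021-07-01", "2022-07-01", "2021-07-01")),
  width       = c(365, 365, 365),
  height      = c(0.6, 1.0, 0.4),
  extension   = factor(c(0, 1, 0))
)

p <- ggplot(toy, aes(x = mid_point, y = contract_id, width = width)) +
  # shadow layer: a full-height grey tile behind every term
  geom_tile(aes(height = 1), fill = "grey90") +
  # actual values drawn on top of the shadow
  geom_tile(aes(height = height, fill = extension)) +
  ylab("") + xlab("Year") + theme_bw()

shadow_layer <- layer_data(p, 1)
value_layer  <- layer_data(p, 2)
```

Layer order matters here: ggplot2 draws layers in the order they are added, so the shadow has to come before the coloured tiles.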

Putting it all together in Power BI

Power BI’s R visual integration is incredibly simple. All you need to do is add the built-in R visual, drop in the fields you want to have available in the R dataframe (Power BI does all the aggregation for you; it effectively passes the same dataset a matrix visual would receive), and then adjust the R code accordingly (some of the data transforms were no longer required).
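
As a rough sketch of what an R visual script can look like: Power BI hands the selected fields to the script as a data frame named `dataset`. The column names and the fallback toy data below are assumptions for illustration (they let the script run outside Power BI too), not the fields used in the actual report.

```r
library(ggplot2)

# Power BI exposes the fields dropped onto the R visual as `dataset`.
# The fallback below is hypothetical toy data for running the script locally.
if (!exists("dataset")) {
  dataset <- data.frame(
    contract_id  = c("A", "A", "B"),
    start        = c("2021-01-01", "2022-01-01", "2021-01-01"),
    end          = c("2021-12-31", "2022-12-31", "2023-12-31"),
    monthly_rent = c(30000, 45000, 20000)
  )
}

# Column names are assumptions; use whatever fields the visual receives
chart_df <- transform(dataset, start = as.Date(start), end = as.Date(end))
chart_df$width     <- as.numeric(chart_df$end - chart_df$start)  # days per term
chart_df$mid_point <- chart_df$start + chart_df$width / 2        # tile centre

p <- ggplot(chart_df) +
  geom_tile(aes(x = mid_point, y = contract_id, width = width,
                height = monthly_rent / max(monthly_rent))) +
  xlab("Year") + ylab("") + theme_bw()

# The R visual shows whatever is drawn to the default device
print(p)
```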

While not applicable in this case, a couple of caveats about R Visuals in Power BI:

  • R visuals can be cross-filtered, but they are not interactive themselves and do not cross-filter other visuals in the report;
  • When published to Power BI Service, they work only as long as you stick to the (long) list of supported packages;
  • There are limits on the number of data points, processing time, etc.

Combine it with a dynamic TopN selection, and voilà. Click to see it in action.

A quick Power BI mock-up with dynamic interactions

All code used in this blog post is available at https://github.com/kamicollo/football-field-charts.

Hi! 👋 I am Aurimas Račas. I love all things data. My code lives on GitHub, opinions on Twitter / Mastodon, and you can learn more about me on LinkedIn.