An interesting way to present long-term contracts and an opportunity to try out R visuals in Power BI
On a recent project, our team was asked whether we could visualize long-term real estate contracts on a “football field” chart. Now, I’ve heard of snowball charts and waterfall charts, but a football field chart was something new. A quick search, however, showed that I was missing out.
What exactly is a football field chart? It is effectively a column/bar chart where the individual bars indicate ranges (and thus can start and end in arbitrary places). Most examples on the internet relate to valuation ranges. In our case, the idea was to display contracts over time as discrete bars, varying the thickness of each bar as the contract value changes.
We were planning to deliver our insights via Power BI reports, so my immediate reaction was to check whether we could create this sort of chart using Power BI’s R visuals and ggplot2. Turns out, ggplot2 includes exactly what we needed in this scenario: its geom_tile() and geom_rect() geoms draw arbitrary rectangles. If you prefer Python, the ggplot2-inspired plotnine package supports the same geoms and, admittedly, has way more impressive examples.
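To make the difference between the two geoms concrete, here is a minimal sketch of a classic valuation-style football field chart drawn both ways. The `ranges` data frame and all of its values are invented for illustration. geom_rect() takes the four edges of each rectangle, while geom_tile() takes the centre plus a width and a height.

```r
library(ggplot2)

# Hypothetical valuation ranges (values invented for illustration)
ranges <- data.frame(
  method = c("DCF", "Comparables", "Precedent deals"),
  low    = c(80, 95, 90),
  high   = c(120, 130, 140)
)
ranges$row <- seq_len(nrow(ranges))  # numeric row position of each bar

# geom_rect(): specify the four edges directly
p_rect <- ggplot(ranges) +
  geom_rect(aes(xmin = low, xmax = high, ymin = row - 0.4, ymax = row + 0.4)) +
  scale_y_continuous(breaks = ranges$row, labels = ranges$method)

# geom_tile(): specify the centre plus width and height
p_tile <- ggplot(ranges) +
  geom_tile(aes(x = (low + high) / 2, y = row, width = high - low, height = 0.8)) +
  scale_y_continuous(breaks = ranges$row, labels = ranges$method)
```

Both plots should draw the same rectangles, so which geom to use mostly depends on whether your data holds start/end values or mid-points and sizes.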
Getting to the right visual in R
For the purpose of this blog post, I generated some synthetic data representing a real estate portfolio in Germany. In case you are interested in how it was created, check out the Python script with details. It was a lot of fun to play around with various probability distributions, and I may write up a separate blog post about that one day.
We effectively had three things: 1) a list of contracts, 2) a list of terms associated with each contract, and 3) a monthly rental portfolio data set derived from the contract/term information.
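If you want to follow along without the real files, a toy version of these three tables could look roughly like the sketch below. The column names are inferred from the code in this post, all values are invented, and the meaning of `extension` (0 for the original term, 1 for the first extension) is my assumption here.

```r
# Toy versions of the three tables; all values are invented
contracts <- data.frame(contract_id = c(1, 2))

terms <- data.frame(
  term_id      = c(11, 12, 21),
  contract_id  = c(1, 1, 2),
  start        = as.Date(c("2019-01-01", "2022-01-01", "2020-07-01")),
  end          = as.Date(c("2021-12-31", "2024-12-31", "2023-06-30")),
  extension    = c(0, 1, 0),  # assumption: 0 = original term, 1 = first extension
  monthly_rent = c(10000, 12000, 8000)
)

# One row per term per month, derived from the term dates
monthly <- do.call(rbind, lapply(seq_len(nrow(terms)), function(i) {
  data.frame(
    term_id = terms$term_id[i],
    month   = format(seq(terms$start[i], terms$end[i], by = "month"), "%Y-%m"),
    rent    = terms$monthly_rent[i]
  )
}))

head(monthly)
```

Each of the three toy terms runs for 36 months, so `monthly` ends up with 108 rows.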
Getting to the desired visualization was surprisingly easy. First, let’s select the top 10 contracts by their future value (you don’t really want to visualize all contracts at once unless you intend to print the chart on A0 paper).
```r
library(tidyverse)
library(data.table)
library(RColorBrewer)

# read in the table that contains individual contract terms with associated start/end dates
terms = fread("~/coding/real-estate/terms.csv")

# read in the list of contracts
# (assumption: the original snippet never loads `contracts`; this file name is a guess)
contracts = fread("~/coding/real-estate/contracts.csv")

# read in the table that contains monthly information for each term (SQM / rent income in a month)
# add proper year/month columns to the dataframe
PL = fread("~/coding/real-estate/monthly_data.csv") %>%
  mutate(date = as.Date(strptime(paste0(month, "-01"), format = "%Y-%m-%d"))) %>%
  mutate(year = year(date), month = month(date))

# calculate future value per contract (assuming we're at the end of 2020)
# filter it down to top10 contracts only
top_10_contracts = PL %>%
  filter(date > as.Date("2020-12-31")) %>%
  inner_join(terms, by = "term_id") %>%
  inner_join(contracts, by = "contract_id") %>%
  group_by(contract_id) %>%
  summarize(future_value = sum(rent), .groups = "drop_last") %>%
  arrange(desc(future_value)) %>%
  head(10)
```
Then, we can proceed to the first iteration of the chart. As geom_tile() requires the “central” position plus the width and height of each rectangle, let’s calculate the mid-point of each term to use as the x-axis position. The width will be equal to the number of days in the term, and the height to the monthly rent. Let’s try with one contract.
```r
chart_df =
  # filter to 1 contract
  top_10_contracts[1, ] %>%
  # get terms of contract
  inner_join(terms, by = "contract_id") %>%
  # calculate # of days of each term
  mutate(width = as.numeric(difftime(end, start, units = "days"))) %>%
  # calculate the mid-point of each term
  mutate(mid_point = start + width / 2) %>%
  # extensions are numeric, make sure they are treated as discrete variables
  mutate(extension = as.factor(extension))

ggplot(chart_df) +
  geom_tile(aes(
    x = mid_point,
    y = contract_id,
    width = width,
    height = monthly_rent,
    fill = extension
  ))
```
Seems like we’re on the right track, even if not beautiful. We can fix that later. Let’s try with 10 contracts.
Right. That’s not that great. Initially I thought that R, for some reason, placed all the series on top of each other (i.e. ignored the y-axis parameter), but I later realized the real issue was the bar heights. They were raw rent amounts, far larger than the spacing between contracts on the y-axis, so they needed to be re-scaled. Here’s a fixed version where bar heights are scaled to a maximum of 1 (plus some formatting tweaks).
```r
ggplot(chart_df) +
  geom_tile(aes(
    x = mid_point,
    y = contract_id,
    width = width,
    height = monthly_rent / max(monthly_rent),  # that's the fix
    fill = extension
  )) +
  ylab("") +
  xlab("Year") +
  theme_bw() +
  theme(panel.grid.minor.y = element_blank(), panel.grid.major.y = element_blank())
```
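To see why the rescaling matters, here is a small back-of-the-envelope sketch in base R. It assumes the contracts sit one unit apart on the y-axis (as they do on a discrete ggplot2 scale) and uses invented rent values. The helpers `extent()` and `overlaps()` are hypothetical names made up for this illustration.

```r
# Three contracts at y = 1, 2, 3 with invented monthly rents
tiles <- data.frame(y = 1:3, monthly_rent = c(10000, 25000, 40000))

# Vertical extent of a tile centred at y with the given height
extent <- function(y, height) {
  data.frame(ymin = y - height / 2, ymax = y + height / 2)
}

# TRUE if any tile reaches above the bottom of the tile in the next row
overlaps <- function(e) any(head(e$ymax, -1) > tail(e$ymin, -1))

raw    <- extent(tiles$y, tiles$monthly_rent)
scaled <- extent(tiles$y, tiles$monthly_rent / max(tiles$monthly_rent))

overlaps(raw)     # TRUE: tiles thousands of units tall swamp rows 1 unit apart
overlaps(scaled)  # FALSE: heights of at most 1 fit within each row
```

With heights capped at 1, the tallest tile exactly fills its row and never spills into its neighbours.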
Almost perfect. Just need to add labels that indicate total value and sort the contracts from the highest value to the lowest. Here’s the final code that achieves all that.
```r
chart_df = top_10_contracts %>%
  # get terms of top 10 contracts
  inner_join(terms, by = "contract_id") %>%
  # calculate # of days of each term
  mutate(width = as.numeric(difftime(end, start, units = "days"))) %>%
  # calculate the mid-point of each term
  mutate(mid_point = start + width / 2) %>%
  # extensions are numeric, make sure they are treated as discrete variables
  mutate(extension = as.factor(extension)) %>%
  # reorder contracts
  mutate(contract_id = reorder(as.character(contract_id), future_value, mean))

# group by contract
labels = chart_df %>%
  group_by(contract_id) %>%
  # calculate total value and get end date
  summarize(end_date = max(end), value = max(future_value), .groups = "drop_last") %>%
  # format total value
  mutate(label = paste0(round(value / 1000000, 1), "M"))

ggplot(chart_df) +
  # create rectangle geoms
  geom_tile(aes(
    x = mid_point,
    y = contract_id,
    width = width,
    height = monthly_rent / max(monthly_rent),
    fill = extension
  )) +
  # deal with axis labels
  ylab("") +
  xlab("Year") +
  theme_bw() +
  # remove gridlines on Y-axis
  theme(panel.grid.minor.y = element_blank(), panel.grid.major.y = element_blank()) +
  # add labels at the end of each contract bar
  geom_label(data = labels, mapping = aes(end_date + 400, contract_id, label = label)) +
  # apply a better color palette
  scale_fill_brewer(palette = "RdYlGn")
```
Edit: I received a comment (thanks Q!) that the chart may look more intuitive if there is a shadow behind each bar indicating the maximum monthly value on the chart for reference. Here is what that could look like (code-wise, it is just another geom_tile() layer).
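A minimal sketch of that extra layer, assuming the shadow is a full-height grey tile drawn behind each term. The toy `chart_df` below is invented so that the snippet runs on its own, and the grey fill is an arbitrary choice.

```r
library(ggplot2)

# Toy stand-in for chart_df (values invented; columns follow the code above)
chart_df <- data.frame(
  contract_id  = c("A", "A", "B"),
  start        = as.Date(c("2021-01-01", "2024-01-01", "2021-07-01")),
  end          = as.Date(c("2023-12-31", "2026-12-31", "2025-06-30")),
  monthly_rent = c(10000, 14000, 20000),
  extension    = factor(c(0, 1, 0))
)
chart_df$width     <- as.numeric(difftime(chart_df$end, chart_df$start, units = "days"))
chart_df$mid_point <- chart_df$start + chart_df$width / 2

p <- ggplot(chart_df) +
  # shadow layer: height 1 corresponds to the maximum monthly rent on the chart
  geom_tile(aes(x = mid_point, y = contract_id, width = width, height = 1),
            fill = "grey90") +
  # actual bars, drawn on top of the shadow
  geom_tile(aes(x = mid_point, y = contract_id, width = width,
                height = monthly_rent / max(monthly_rent), fill = extension)) +
  theme_bw()
```

Layer order matters here: ggplot2 draws layers in the order they are added, so the shadow has to come first.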
Putting it all together in Power BI
Power BI’s R visual integration is incredibly simple. Effectively, all you need to do is add the built-in R visual, drop in the fields that you want to have available in the R dataframe, and adjust the R code accordingly (some of the data transforms were no longer required). Power BI does all the aggregation for you; it effectively passes the same dataset it would pass to a matrix visual.
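For orientation, a script inside the R visual could look roughly like the sketch below. Power BI hands the selected fields to the script as a data frame named `dataset`. The stand-in data frame, its column names and the defensive date parsing are my assumptions, added so the snippet also runs outside Power BI.

```r
library(ggplot2)

# Inside Power BI, the fields dropped onto the R visual arrive as a data frame
# named `dataset`. The stand-in below exists only so this sketch also runs
# outside Power BI; its column names and values are assumptions.
if (!exists("dataset")) {
  dataset <- data.frame(
    contract_id  = c("A", "A", "B"),
    start        = c("2021-01-01", "2024-01-01", "2021-07-01"),
    end          = c("2023-12-31", "2026-12-31", "2025-06-30"),
    monthly_rent = c(10000, 14000, 20000),
    extension    = c(0, 1, 0)
  )
}

# Defensive date parsing: keep the first 10 characters (YYYY-MM-DD) in case
# the dates arrive as text
dataset$start <- as.Date(substr(as.character(dataset$start), 1, 10))
dataset$end   <- as.Date(substr(as.character(dataset$end), 1, 10))

dataset$width     <- as.numeric(difftime(dataset$end, dataset$start, units = "days"))
dataset$mid_point <- dataset$start + dataset$width / 2
dataset$extension <- as.factor(dataset$extension)

p <- ggplot(dataset) +
  geom_tile(aes(x = mid_point, y = contract_id, width = width,
                height = monthly_rent / max(monthly_rent), fill = extension)) +
  theme_bw()

# The visual shows whatever is drawn on the default graphics device
print(p)
```

Note that Power BI removes duplicate rows from `dataset` before the script runs, so it can help to include a field that uniquely identifies each row (a term ID, for example).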
While not applicable in this case, a couple of caveats about R Visuals in Power BI:
- R visuals can be cross-filtered but are not interactive themselves and do not cross-filter other visuals in the report;
- When published to Power BI Service, they will work as long as you only use packages from the (long) list of supported ones;
- There are certain limitations on the number of data points, processing time, etc.
Combine it with a dynamic TopN selection, and voilà. Click to see it in action.
All code used in this blog post is available at https://github.com/kamicollo/football-field-charts.