From an Excel schedule to Outlook Calendar with PowerShell

(it was actually quite pleasant!)

I am starting studies at Georgia Tech (yay!) and while the semester didn’t even start yet, it already feels a bit overwhelming. Tons of emails, activities, and lots of calculus. More than I was expecting, actually!

Now, one of the ways I deal with such stress is by getting organized. Sometimes I wonder if that’s just a form of procrastination (organizing stuff vs. actually dealing with it), but it helps. When my email inbox is under control, my calendar is up to date, I just feel better. period.

So it probably won’t be surprising that when an email arrived announcing some additional “TA sessions”, with 3 sessions every week until the end of the semester, I had an immediate urge to *organize it*. Except that the schedule came in a form of an Excel spreadsheet…

Not exactly “Outlook-ready”…

Now, I’ve spent the past few days refreshing Python skills, and setting up a new laptop (WSL2 is great, it looks like I won’t be dual-booting Windows/Ubuntu anymore.. feels slightly morally wrong, though). In all that process, I had a chance to play around with PowerShell, and I was wondering how hard would it be to create .ics files/events in Outlook calendar for all these sessions. For someone who wasn’t really familiar with PowerShell control structures, equality operators (“-ne”, that’s basically Lithuanian..), the answer is ~1 hour hard. That was surprisingly fast.

Most of the credit goes to the author of the New-IcsEvent PowerShell function which generates .ics files based on supplied parameters. In my case, I did not bother with reading the Excel file either – turned out that working with lists/strings in PowerShell is very intuitive.

$weeks = "08-23-21
08-30-21
09-06-21
09-13-21
09-20-21
09-27-21
10-04-21
10-11-21
10-18-21
10-25-21
11-01-21
11-08-21
11-15-21
11-22-21
11-29-21
12-06-21".split("`n") 
#weeks is now an array - just like in Python 
#(except, of course, unlike the rest of the world, we don't use \n for line breaks..)

After some fiddling (and extracting locations and session titles to similar arrays as above), all my code had to do was pretty much this:

for ($i = 0; $i -lt $weeks.Count; $i++) {
    $start_week_date = [datetime]::parseexact($weeks[$i], 'MM-dd-yy', $null)
    if ($tuesdays[$i] -ne "") {
        $day = $start_week_date.AddDays(1).ToString("yyyy-MM-dd") + " 09:15"
        $topic = "MSA TA: " + $tuesdays[$i]
        Write-Output $day
        New-IcsEvent -Path C:\users\aurim\Desktop -Subject $topic -Location 'location' -StartDate $day -ShowAs Free -Duration '01:00:00'
    }
    if ($thursdays[$i] -ne "") {
        $day = $start_week_date.AddDays(3).ToString("yyyy-MM-dd") + " 09:15"
        $topic = "MSA TA: " + $thursdays[$i]
        Write-Output $day
        New-IcsEvent -Path C:\users\aurim\Desktop -Subject $topic -Location 'location' -StartDate $day -ShowAs Free -Duration '01:00:00'
    }
    if ($fridays[$i] -ne "") {
        $day = $start_week_date.AddDays(4).ToString("yyyy-MM-dd") + " 13:00"
        $topic = "MSA TA: " + $fridays[$i]
        Write-Output $day
        New-IcsEvent -Path C:\users\aurim\Desktop -Subject $topic -Location 'location' -StartDate $day -ShowAs Free -Duration '01:00:00'
    }    
}

Now, this created a bunch of .ics files on my desktop. 44 to be precise. Turns out that .ics can contain multiple events, however. So all I needed to do was to concatenate all the files together, skipping the first 4 and the last line. That would be quite simple on Unix: ls *.ics " | tail -n +2 | head -n -1 | cat > my_master_schedule.ics. I have a feeling that with some fiddling with awk/sed, it’s very likely possible to even ensure that the first 4 lines of the first file and the last line of the very last file are not skipped, giving us a fully valid .ics file in one go.

BEGIN:VCALENDAR
VERSION:2.0
METHOD:PUBLISH
PRODID:-//JHP//We love PowerShell!//EN
BEGIN:VEVENT
UID: ab4d0283-885b-433f-b269-7ce6b841df46
CREATED: 20210820T005313Z
DTSTAMP: 20210820T005313Z
LAST-MODIFIED: 20210820T005313Z
CLASS:Public
CATEGORIES:
SEQUENCE:0
DTSTART: 20210826T131500Z
DTEND: 20210826T141500Z
DESCRIPTION: 
SUMMARY: MSA TA: R Intro part I
LOCATION: IC215
TRANSP:TRANSPARENT
END:VEVENT
END:VCALENDAR

Nevertheless, I was curious how difficult would it be to do the same in PowerShell. It took some searching, but it turned out it is pretty straightforward, too. By the way, that’s one thing I did not like about PowerShell – there are tons of answers on the web, but quite a few of them are clearly low quality, even on StackOverflow. Reminds me a bit of looking for good VBA guidance. Curse of the masses, I suppose?

ls *.ics | get-content | where {$_.readcount -gt 4 -and $_ -ne "END:VCALENDAR"} > MSA_TA_SESSIONS.icsx                   

As you can see, I cheated a bit as I think get-content reads line-by-line, and so I could not refer to total line count (there definitely are ways to do that, though, it just seemed to require more steps). Either way, I had all I needed.

Now it was time to finish up the exercise by opening Notepad and adding those missing first 4 and the very last line manually. No point in writing code for that.

“Football field” (variable-width bar) charts in R / Power BI

An interesting way to present long-term contracts and an opportunity to try our R visuals in Power BI

On a recent project, our team was asked if we can visualize long term real estate contracts on a “football field” chart. Now, I’ve heard of snowball charts, waterfall charts, but a football field chart was something new. A quick search, however, showed that I was missing out.

Yup, it’s a thing.

What exactly is a football field chart? It effectively is a column/bar chart where the individual bars are used to indicate range (and thus start/end in arbitrary places). Most examples on the internet relate to valuation ranges. In our case, the idea was to display contracts over time as discreet bars, varying the bar width with contract value changes.

Something along these lines

We were planning to deliver our insights via Power BI reports, so my immediate reaction was to look whether we could create these sort of charts using Power BI’s R visuals and ggplot2. Turns out, ggplot2 includes exactly what we needed in this scenario. Its geom_tile() / geom_rect() allow drawing arbitrary rectangles. If you prefer Python, the ggplot2-inspired plotnine package supports the same geoms, and, admittedly has way more impressive examples.

This is an example of what you can achieve with the ggplot2 / plotnine packages. Technically, this is a bar chart!
Credit: Plotnine geom_tile() documentation

Getting to the right visual in R

For the purpose of this blog post, I generated some synthetic data representing a real estate portfolio in Germany. In case you are interested in how it was created, check out the python script with details (it was a lot of fun to play around with various probability distributions, I may write up a blog post about that separately one day).

We effectively had: 1) a list of contracts 2) a list of terms associated with each contract and 3) a monthly rental portfolio data set derived from the contract/term information.

Getting to the desired visualization was surprisingly easy. First, let’s select top 10 contracts by their future value (you don’t really want to visualize all contracts at once unless you intend to print it on A0 size paper..)

library(tidyverse)
library(data.table)
library(RColorBrewer)

#read in the table that contains individual contract terms with associated start/end dates
terms = fread("~/coding/real-estate/terms.csv")

#read in the table that contains monthly information for each term (SQM / rent income in a month)
#add proper year/month columns to the dataframe
PL = fread("~/coding/real-estate/monthly_data.csv") %>% 
  mutate(date = as.Date(strptime(paste0(month,"-01"), format="%Y-%m-%d"))) %>%
  mutate(year = year(date), month = month(date))

#calculate future value per contract (assuming we're at the end of 2020)
# filter it down to top10 contracts only
top_10_contracts = PL %>% filter(date > "2020-12-31") %>% 
  inner_join(terms, by='term_id') %>%
  inner_join(contracts, by="contract_id") %>% 
  group_by(contract_id) %>% summarize(future_value = sum(rent), .groups='drop_last') %>% 
  arrange(desc(future_value)) %>% head(10)

Then, we can proceed to get the first iteration of the chart. As geom_tile() requires “central” positions and width/height of the rectangles, let’s calculate the mid-period of each term to be used as the x-axis position, width will be equal to number of days, and height to monthly rent. Let’s try with one contract.

chart_df = 
  #filter to 1 contract
  top_10_contracts[1,] %>% 
  #get terms of contract
  inner_join(terms, by="contract_id") %>% 
  #calculate # of days of each term
  mutate(width = as.numeric(difftime(end, start, units="days"))) %>% 
  #calculate the mid-point of each term
  mutate(mid_point = start + width/2) %>% 
  #extensions are numeric, make sure they are treated as discrete variables
  mutate(extension = as.factor(extension)) 

ggplot(chart_df) + 
  geom_tile(aes(
    x=mid_point, 
    y=contract_id, 
    width=width, 
    height=monthly_rent, 
    fill=extension
  ))
Not bad for an initial attempt.

Seems like we’re on the right track, even if not beautiful. We can fix that later. Let’s try with 10 contracts.

Accidental art?

Right. That’s not that great. While initially I thought that R, for some reason, places each series on top of each other (i.e. ignores Y-axis parameter), I later realized the real issue was bar heights. They needed to be re-scaled. Here’s a fixed version where each bar height is scaled to maximum of 1 (and some formatting tweaks).

ggplot(chart_df) + 
  geom_tile(aes(
    x=mid_point, 
    y=contract_id, 
    width=width, 
    height=monthly_rent/max(monthly_rent), #that's the fix
    fill=extension)
  ) +
  ylab("") + xlab("Year") + theme_bw() + 
  theme(panel.grid.minor.y = element_blank(), panel.grid.major.y = element_blank())
Much better.

Almost perfect. Just need to add labels that indicate total value and sort the contracts from the highest value to the lowest. Here’s the final code that achieves all that.

chart_df = 
  top_10_contracts %>%
  #get terms of top 10 contracts
  inner_join(terms, by="contract_id") %>% 
  #calculate # of days of each term
  mutate(width = as.numeric(difftime(end, start, units="days"))) %>% 
  #calculate the mid-point of each term
  mutate(mid_point = start + width/2) %>% 
  #extensions are numeric, make sure they are treated as discrete variables
  mutate(extension = as.factor(extension)) %>% 
  #reorder contracts
  mutate(contract_id = reorder(as.character(contract_id), future_value, mean)) 

labels = 
  #group by contract
  chart_df %>% group_by(contract_id) %>% 
  #calculate total value and get end date
  summarize(end_date = max(end), value = max(future_value), .groups="drop_last") %>% 
  #format total value
  mutate(label = paste0(round(value / 1000000, 1), "M")) 

ggplot(chart_df) + 
  #create rectangle geoms
  geom_tile(aes(
    x=mid_point, 
    y=contract_id, 
    width=width, 
    height=monthly_rent/max(monthly_rent), 
    fill=extension
  )) + 
  #deal with axis labels
  ylab("") + xlab("Year") + theme_bw() + 
  #remove gridlines on Y-axis
  theme(panel.grid.minor.y = element_blank(), panel.grid.major.y = element_blank()) +
  #add labels at the end of each contract bar
  geom_label(data = labels, mapping=aes(end_date + 400, contract_id, label=label)) +
  #apply a better color palette
  scale_fill_brewer(palette = "RdYlGn")
Here we are.

Edit: I received a comment (thanks Q!) that the chart may look more intuitive if there is a shadow behind each bar indicating the maximum monthly value on the chart for reference. Here is how that could look like (code-wise, that is just another geom_tile() layer).

Putting it all together in Power BI

Power BI’s R visual integration is incredibly simple. Effectively, all you need to do is to use their built-in R visual, drop in the fields that you want to have available as part of the dataframe in R (Power BI does all the aggregation for you, it’s effectively passing the same dataset as if it was a matrix visual) and then adjust R code accordingly (some of the data transforms were no longer required).

While not applicable in this case, a couple of caveats about R Visuals in Power BI:

  • R visuals can be cross-filtered but are not interactive themselves and do not cross-filter other visuals in the report;
  • When published to Power BI Service, they will work as long as you use (a long list) of supported packages;
  • There are certain limitations to number of data points, processing time and etc.

Combine it with a dynamic TopN selection, and voilĂ . Click to see it in action.

A quick Power BI mock-up with dynamic interactions

All code used in this blog post is available at https://github.com/kamicollo/football-field-charts.