Making pretty Excel files in R

Gus Lipkin
5 min readFeb 24, 2022

--

It’s a tale as old as time. Your boss gave you a bunch of Excel files and you painstakingly made a bot that will import and display them in a Shiny dashboard. Proud of your work, you take it to your boss and they say “I don’t know what a ‘Shiny’ is, can’t you just give me one of those Excels back?” openxlsx makes this easy.

Today we’re going to recreate an existing workbook with openxlsx. To get started, I made a small .xlsx file for us to work with that can be downloaded here. It’s an attendance list and some notes on supplies you’ll need for a company barbecue. Feel free to get familiar with it then come back here when you’re ready.

A preview of the Attendance and Supplies sheets

As usual, the first thing to do is load our library with library(openxlsx). We can try and load the workbook, the Excel file, with read.xlsx, but for now we’re going to use loadWorkbook. This loads a workbook object and exposes some workbook properties to us rather than just the raw data like with read.xlsx.

Loading the xlsx file as a workbook object
The attendance data.frame
The supplies data.frame

Because we loaded the data as a workbook object, we can use getStyles to load the styles and preview them. Unfortunately, the styles can’t pull conditional formatting and don’t keep track of which cells use which styles. By cross referencing the list of styles and bbq.xlsx, we can identify some styles to use and them assign them each to a variable.

Getting a list of the workbook styles

I’ve chosen to only show the first two items from the styles list here.

[[1]]
A custom cell style.

Cell formatting: GENERAL
Font name: Calibri
Font size: 14
Font colour: 1
Font decoration: BOLD



[[2]]
A custom cell style.

Cell formatting: GENERAL
Cell horz. align: center

First thing we have to do is create a workbook object that we’ll call wb. We can quickly preview it by just typing wb into our chunk or the console.

Creating a workbook object
A Workbook object.

Worksheets:
No worksheets attached

We’re going to make the Supplies sheet first as it’s a little bit easier. We first add a new worksheet named Supplies to the workbook, then we can write the relevant data to that sheet along with styling our column headers using headerStyle from earlier.

Creating the supplies worksheet

All of the numbers in the Supplies sheet are centered both horizontally and vertically. We can achieve this by adding the numberStyle from before to those cells. rows and cols both start at 2 because row one is the header row and column one is the supply type. rows goes to 5 because there are 4 rows of numbers (remember the header row) and cols goes to 5 because there are 4 columns of numbers. gridExpand = TRUE makes sure that all cell reference combinations possible with rows and cols are used, rather than doing an entire row or column.

Adding number formatting to the numbers

If we go back to our data in R, the Quantity for Hotdog buns and Hamburger buns is empty. A bit further back, we see that that’s because those cells were merged with the Quantity of Hotdogs and Hamburgers respectively. We can replicate this using mergeCells. Unlike addStyle, we don’t need to use gridExpand to merge all the cells as it is implied. cols will be 2 for both and we want rows 2 and 3 for Hotdogs and rows 4 and 5 for Hamburgers.

Merging the hotdog and hamburger quantity cells

Our last step on this worksheet is to set the column widths to auto. We again need to specify cols and can just do 1:5 so that all columns are affected.

Setting the column widths

Our Supplies sheet is now complete, but we’re not going to write it just yet because we still need to do the Attendance sheet.

Like before, we first add the Attendance worksheet and write the data to it.

Creating the attendance worksheet

The RSVPed and Status columns each have some conditional formatting. We can reuse the color formatting from the RSVPed column on the Status column, so we’re going to separate the color style from the alignment style. It was easy to see how many rows we have in the Supplies sheet but not here so we’re going to create a new variable that has the rowNumbers. Again, we start on row 2 because of the header row and will end at our last data row plus 1.

Creating styles for conditional formatting

Our first step is to center the values in the RSVPed column with an addStyle. Next, while we could manually color each cell using a for loop, it’s more efficient to use conditionalFormatting. This also has the added bonus of showing in Excel and responding to any changes. The rule argument may be a little strange, and that’s because it must match how the same formatting rule would be written in Excel. In this case, we use our top-leftmost cell as the reference cell in the rule, B2, then we check if it is TRUE or FALSE. When the rule is applied down the cells in column B, the row number will change to match the current row.

Styling the RSVPed column

The Status column is very similar to the RSVPed column, but we add a style for Tentative responses and then need a third conditionalFormatting. The newest part here is in the rule argument. We need to put quotes around Yes so that Excel knows that it is a string and need to use \" so that R knows that the quote is part of the string. Once our formatting has been applied, we set the column widths as we did before.

More conditional formatting for the Status column

The last thing we need to do is reorder the worksheets so that Attendance is first because when we created the workbook in R, we created the Supplies worksheet first. Unfortunately, worksheetOrder only supports integer vectors. We can check our worksheet numbers by calling the object again through either the chunk or console. Then we set the worksheet order and save the workbook to an .xlsx file.

Setting the worksheet order and saving the workbook

If we open the new file, we can see that they are nearly identical. The biggest difference between the two is that the original used formulas to calculate the values in Supplies.

If you did want to take it a step further and use those instead, the writeFormula function is your friend. In any case, I highly encourage everyone to at least skim through the openxlsx documentation here because it has so much to offer to help streamline Excel file generation.

The R logo recreated in Excel
My header image. The R logo painstakingly recreated in Excel

All the code for this article is available here. If you want to see more from me, check out my GitHub or guslipkin.github.io. If you want to hear from me, I’m also on Twitter at guslipkin.

Gus Lipkin is a Data Scientist, Business Analyst, and occasional bike mechanic

--

--

Gus Lipkin

My roommate said you can always rely on me to burn my food. At least I’m reliable | data scientist and occasional 🚲 mechanic | he/him | guslipkin.me