Making pretty Excel files in R
It’s a tale as old as time. Your boss gave you a bunch of Excel files and you painstakingly made a bot that will import and display them in a Shiny dashboard. Proud of your work, you take it to your boss and they say “I don’t know what a ‘Shiny’ is, can’t you just give me one of those Excels back?” openxlsx
makes this easy.
Today we’re going to recreate an existing workbook with openxlsx
. To get started, I made a small .xlsx
file for us to work with that can be downloaded here. It’s an attendance list and some notes on supplies you’ll need for a company barbecue. Feel free to get familiar with it then come back here when you’re ready.
As usual, the first thing to do is load our library with library(openxlsx)
. We can try and load the workbook, the Excel file, with read.xlsx
, but for now we’re going to use loadWorkbook
. This loads a workbook object and exposes some workbook properties to us rather than just the raw data like with read.xlsx
.
Because we loaded the data as a workbook object, we can use getStyles
to load the styles and preview them. Unfortunately, the styles can’t pull conditional formatting and don’t keep track of which cells use which styles. By cross referencing the list of styles and bbq.xlsx
, we can identify some styles to use and them assign them each to a variable.
I’ve chosen to only show the first two items from the styles
list here.
[[1]]
A custom cell style.
Cell formatting: GENERAL
Font name: Calibri
Font size: 14
Font colour: 1
Font decoration: BOLD
[[2]]
A custom cell style.
Cell formatting: GENERAL
Cell horz. align: center
First thing we have to do is create a workbook object that we’ll call wb
. We can quickly preview it by just typing wb
into our chunk or the console.
A Workbook object.
Worksheets:
No worksheets attached
We’re going to make the Supplies
sheet first as it’s a little bit easier. We first add a new worksheet named Supplies
to the workbook, then we can write the relevant data to that sheet along with styling our column headers using headerStyle
from earlier.
All of the numbers in the Supplies
sheet are centered both horizontally and vertically. We can achieve this by adding the numberStyle
from before to those cells. rows
and cols
both start at 2 because row one is the header row and column one is the supply type. rows
goes to 5 because there are 4 rows of numbers (remember the header row) and cols
goes to 5 because there are 4 columns of numbers. gridExpand = TRUE
makes sure that all cell reference combinations possible with rows
and cols
are used, rather than doing an entire row or column.
If we go back to our data in R, the Quantity
for Hotdog buns and Hamburger buns is empty. A bit further back, we see that that’s because those cells were merged with the Quantity
of Hotdogs and Hamburgers respectively. We can replicate this using mergeCells
. Unlike addStyle
, we don’t need to use gridExpand
to merge all the cells as it is implied. cols
will be 2 for both and we want rows
2 and 3 for Hotdogs and rows
4 and 5 for Hamburgers.
Our last step on this worksheet is to set the column widths to auto. We again need to specify cols
and can just do 1:5
so that all columns are affected.
Our Supplies
sheet is now complete, but we’re not going to write it just yet because we still need to do the Attendance
sheet.
Like before, we first add the Attendance
worksheet and write the data to it.
The RSVPed
and Status
columns each have some conditional formatting. We can reuse the color formatting from the RSVPed
column on the Status
column, so we’re going to separate the color style from the alignment style. It was easy to see how many rows we have in the Supplies
sheet but not here so we’re going to create a new variable that has the rowNumbers
. Again, we start on row 2 because of the header row and will end at our last data row plus 1.
Our first step is to center the values in the RSVPed
column with an addStyle
. Next, while we could manually color each cell using a for
loop, it’s more efficient to use conditionalFormatting
. This also has the added bonus of showing in Excel and responding to any changes. The rule
argument may be a little strange, and that’s because it must match how the same formatting rule would be written in Excel. In this case, we use our top-leftmost cell as the reference cell in the rule, B2
, then we check if it is TRUE
or FALSE
. When the rule is applied down the cells in column B
, the row number will change to match the current row.
The Status
column is very similar to the RSVPed
column, but we add a style for Tentative
responses and then need a third conditionalFormatting
. The newest part here is in the rule
argument. We need to put quotes around Yes
so that Excel knows that it is a string and need to use \"
so that R knows that the quote is part of the string. Once our formatting has been applied, we set the column widths as we did before.
The last thing we need to do is reorder the worksheets so that Attendance
is first because when we created the workbook in R, we created the Supplies
worksheet first. Unfortunately, worksheetOrder
only supports integer vectors. We can check our worksheet numbers by calling the object again through either the chunk or console. Then we set the worksheet order and save the workbook to an .xlsx
file.
If we open the new file, we can see that they are nearly identical. The biggest difference between the two is that the original used formulas to calculate the values in Supplies
.
If you did want to take it a step further and use those instead, the writeFormula
function is your friend. In any case, I highly encourage everyone to at least skim through the openxlsx
documentation here because it has so much to offer to help streamline Excel file generation.
All the code for this article is available here. If you want to see more from me, check out my GitHub or guslipkin.github.io. If you want to hear from me, I’m also on Twitter at guslipkin.
Gus Lipkin is a Data Scientist, Business Analyst, and occasional bike mechanic