How to read merged excel cells with R
If a VBA/R hybrid suits your purposes, here is a VBA macro which will unmerge all cells in a worksheet, while simultaneously filling all cells in the unmerged region with the corresponding value:
Sub UnMerge(ws As Worksheet)
Dim R As Range, c As Range
Dim v As Variant
For Each c In ws.UsedRange
If c.MergeCells Then
v = c.Value
Set R = c.MergeArea
R.UnMerge
R.Value = v
End If
Next c
End Sub
A simple test to show how it is called:
Sub test()
UnMerge Sheets(1)
End Sub
The sub UnMerged
can be used as part of a larger program that e.g. iterates over all .xlsx
files in a folder and all data-containing sheets in the files, unmerging them all and saving them as .csv files.
On Edit. Native VBA file handling is somewhat annoying. I tend to use the related scripting language VBScript if I need to iterate over multiple files. I'm not sure if your virtual Windows can handle VBScript. I would assume so since VBScript is a standard part of the Windows OS. If this is the case, see if the following works (after backing up the files just to be safe). Save the code as a simple text file with a .vbs
extension in the folder that contains the Excel files that you want to modify. Then, simply click its icon. It will iterate over all .xlx
and .xlsx
files in the directory that contains the script and unmerge sheet 1 in each such file. I didn't test it extensively and it contains no error-handling, but I did test it on a folder with three Excel files which each contained multiple merged regions and it ran as expected on my Windows machine. I don't know if it will work on your Mac:
Option Explicit
Dim fso,fol,f,xl, wb, ws,ext,v,r,c
Set fso = WScript.CreateObject("Scripting.FileSystemObject")
Set xl = CreateObject("Excel.Application")
xl.DisplayAlerts = False
xl.ScreenUpdating = False
set fol = fso.GetFolder(fso.GetParentFolderName(WScript.ScriptFullName))
For Each f In fol.Files
ext = LCase(fso.GetExtensionName(f.Name))
If ext = "xls" Or ext = "xlsx" Then
Set wb = xl.Workbooks.Open(f.Path)
Set ws = wb.Sheets(1)
For Each c In ws.UsedRange
If c.MergeCells Then
v = c.Value
Set R = c.MergeArea
R.UnMerge
R.Value = v
End If
Next
wb.Save
wb.Close
End If
Next
This solution assumes that there is only one merged area on a spreadsheet, and that the only missing values (NAs) are due to the merged cells.
Code:
library("openxlsx")
data = read.xlsx(xlsxFile = "Book1.xlsx", colNames = F)
cl = min(ceiling(which(is.na(data))/dim(data)[1]))
rw = min(which(is.na(data))%%dim(data)[1])
data[is.na(data)] = data[rw,cl]
Example:
Data read from an excel with merged cells:
X1 X2 X3 X4 X5
1 1 a q a 11
2 2 b w s 22
3 3 c e d 33
4 4 d <NA> <NA> 44
5 5 <NA> <NA> <NA> 55
6 6 <NA> <NA> <NA> 66
7 7 g u j 77
8 8 h i k 88
9 9 i o l 99
10 10 j p m 110
As you see, "d" was merged in rows 4 to 6 and columns 2 to 4. The only NAs are due to the merged cells.
From the proposed code, cl
and rw
find the column and row of the merged value "d".
The last line finds all NAs and replaces them with "d".
Result:
X1 X2 X3 X4 X5
1 1 a q a 11
2 2 b w s 22
3 3 c e d 33
4 4 d d d 44
5 5 d d d 55
6 6 d d d 66
7 7 g u j 77
8 8 h i k 88
9 9 i o l 99
10 10 j p m 110
Note:
The colnames = F
should be removed if your Excel data has column names.
library(openxlsx)
data <- read.xlsx(xlsxFile = "Your path", fillMergedCells = TRUE, colNames = FALSE)
fillMergedCells = TRUE
Try this!