Read merged cells in Excel with Python
You can also try using fillna method available in pandas https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html
df = pd.read_excel(dir+filename,header=1)
df[ColName] = df[ColName].fillna(method='ffill')
This should replace the cell's value with the previous value
I just tried this and it seems to work for your sample data:
all_data = []
excel = xlrd.open_workbook(excel_dir+ excel_file)
sheet_0 = excel.sheet_by_index(0) # Open the first tab
prev_row = [None for i in range(sheet_0.ncols)]
for row_index in range(sheet_0.nrows):
row= []
for col_index in range(sheet_0.ncols):
value = sheet_0.cell(rowx=row_index,colx=col_index).value
if len(value) == 0:
value = prev_row[col_index]
row.append(value)
prev_row = row
all_data.append(row)
returning
[['2', '0', '30'], ['2', '1', '20'], ['2', '5', '52']]
It keeps track of the values from the previous row and uses them if the corresponding value from the current row is empty.
Note that the above code does not check if a given cell is actually part of a merged set of cells, so it could possibly duplicate previous values in cases where the cell should really be empty. Still, it might be of some help.
Additional information:
I subsequently found a documentation page that talks about a merged_cells
attribute that one can use to determine the cells that are included in various ranges of merged cells. The documentation says that it is "New in version 0.6.1", but when i tried to use it with xlrd-0.9.3 as installed by pip
I got the error
NotImplementedError: formatting_info=True not yet implemented
I'm not particularly inclined to start chasing down different versions of xlrd to test the merged_cells
feature, but perhaps you might be interested in doing so if the above code is insufficient for your needs and you encounter the same error that I did with formatting_info=True
.