Get Excel cell background color in pandas read_excel?
The Solution suggested above works only for xls
file, not for xlsx
file. This raises a NotImplementedError: formatting_info=True not yet implemented
. Xlrd
library is still not updated to work for xlsx
files. So you have to Save As
and change the format every time which may not work for you.
Here is a solution for xlsx
files using openpyxl
library. A2
is the cell whose color code we need to find out.
import openpyxl
from openpyxl import load_workbook
excel_file = 'color_codes.xlsx'
wb = load_workbook(excel_file, data_only = True)
sh = wb['Sheet1']
color_in_hex = sh['A2'].fill.start_color.index # this gives you Hexadecimal value of the color
print ('HEX =',color_in_hex)
print('RGB =', tuple(int(color_in_hex[i:i+2], 16) for i in (0, 2, 4))) # Color in RGB
Brute-forced it through xlrd
, as per Mark's suggestion:
from xlrd import open_workbook
wb = open_workbook('wb.xls', formatting_info=True)
sheet = wb.sheet_by_name("mysheet")
#create empy colormask matrix
bgcol=np.zeros([sheet.nrows,sheet.ncols])
#cycle through all cells to get colors
for row in range(sheet.nrows):
for column in range(sheet.ncols):
cell = sheet.cell(row, column)
fmt = wb.xf_list[cell.xf_index]
bgcol[row,column]=fmt.background.background_colour_index
#return pandas mask of colors
colormask=pd.DataFrame(bgcol)
Yet, there must be a better way thorugh pandas directly...
I edited the code snippet from @csaladenes's response above based on this link, and it works for my xls file (the original resulted in all cells showing the same color index, though they have different background colors):
import xlrd
import numpy as np
wb = xlrd.open_workbook(file, formatting_info=True)
sheet = wb.sheet_by_name("mysheet")
bgcol=np.zeros([sheet.nrows,sheet.ncols])
for row in range(sheet.nrows):
for col in range(sheet.ncols):
c = sheet.cell(row, col)
cif = sheet.cell_xf_index(row, col)
iif = wb.xf_list[cif]
cbg = iif.background.pattern_colour_index
bgcol[row,col] = cbg