Pandas dataframe from excel named range

Maybe someday pandas will support this natively. Until then, I use a helper function:

import pandas as pd
import openpyxl

def data_frame_from_xlsx(xlsx_file, range_name):
    """ Get a single rectangular region from the specified file.
    range_name can be a standard Excel reference ('Sheet1!A2:B7') or 
    refer to a named region ('my_cells')."""
    wb = openpyxl.load_workbook(xlsx_file, data_only=True, read_only=True)
    if '!' in range_name:
        # passed a worksheet!cell reference
        ws_name, reg = range_name.split('!')
        if ws_name.startswith("'") and ws_name.endswith("'"):
            # optionally strip single quotes around sheet name
            ws_name = ws_name[1:-1]
        region = wb[ws_name][reg]
    else:
        # passed a named range; find the cells in the workbook
        full_range = wb.get_named_range(range_name)
        if full_range is None:
            raise ValueError(
                'Range "{}" not found in workbook "{}".'.format(range_name, xlsx_file)
            )
        # convert to list (openpyxl 2.3 returns a list but 2.4+ returns a generator)
        destinations = list(full_range.destinations) 
        if len(destinations) > 1:
            raise ValueError(
                'Range "{}" in workbook "{}" contains more than one region.'
                .format(range_name, xlsx_file)
            )
        ws, reg = destinations[0]
        # convert to worksheet object (openpyxl 2.3 returns a worksheet object 
        # but 2.4+ returns the name of a worksheet)
        if isinstance(ws, str):
            ws = wb[ws]
        region = ws[reg]
    # an anonymous user suggested this to catch a single-cell range (untested):
    # if not isinstance(region, 'tuple'): df = pd.DataFrame(region.value)
    df = pd.DataFrame([cell.value for cell in row] for row in region)
    return df

To quote the Microsoft Office help pages!:

A [named range] is a meaningful shorthand that makes it easier to understand the purpose of a cell reference, constant, formula, or table, each of which may be difficult to comprehend at first glance."

Named ranges are furthermore frequently used in spreadsheets to easier access data through ODBC and are particularly useful when there are several data ranges within the same worksheet. To connect via ODBC to Excel, simply choose the appropriate Excel driver and send an SQL statement such as e.g.:

SELECT * 
FROM namedRange

The useful command in Pandas would probably be read_sql.

In Windows, this solution requires however that you align/streamline the installed software versions (32-bit or 64-bit) of Excel, the ODBC driver and the software package from which you open the ODBC connection. As an example, an installed Excel 32-bit version will require a 32-bit ODBC driver and normally a 32-bit installation of Python. Note: this latter point remains to be confirmed for the Python case (I'm a beginner to Python), but I can definitely confirm this point for ODBC connections launched from SAS, SPSS or Stata.

The previous requirement is a very significant drawback and actually speaks in favor of any solution which does not involve ODBC at all. That said, it would be nice if read_Excel provided such a facility. In this context, it is interesting to note that SAS, SPSS and Stata currently do not allow direct access to named ranges in their respective Excel filters - so maybe there is an objective reason for this lacking feature...


Here is the way I use openpyxl to copy a range in a [[]] :

wb = load_workbook(filename=xlPath)
ws, range= next(wb.defined_names["rangename"].destinations)
materials = [[cell.value for cell in row] for row in wb[ws][range]]