Renaming columns when querying with SQLAlchemy into Pandas DataFrame
I am not a SQLAlchemy expert by any means, but I have come up with a more generalized solution (or at least a start).
Caveats
- Will not handle mapped columns with the same name across different Models. You should deal with this by adding suffix, or you could modify my answer below to create pandas columns as
<tablename/model name>.<mapper column name>
.
It involves four key steps:
- Qualify your query statement with labels, which will result in column names in pandas of
<table name>_<column name>
:
df = pd.read_sql(query.statement, query.session.bind).with_labels()
- Separate table name from (actual) column name
table_name, col = col_name.split('_', 1)
- Get the Model based on tablename (from this question's answers)
for c in Base._decl_class_registry.values():
if hasattr(c, '__tablename__') and c.__tablename__ == tname:
return c
- Find the correct mapped name
for k, v in sa_class.__mapper__.columns.items():
if v.name == col:
return k
Bringing it all together, this is the solution I have come up with, with the main caveat being it will result in duplicate column names in your dataframe if you (likely) have duplicate mapped names across classes.
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class School(Base):
__tablename__ = 'DimSchool'
id = Column('SchoolKey', Integer, primary_key=True)
name = Column('SchoolName', String)
district = Column('SchoolDistrict', String)
class StudentScore(Base):
__tablename__ = 'FactStudentScore'
SchoolKey = Column('SchoolKey', Integer, ForeignKey('DimSchool.SchoolKey'), primary_key = True)
PointsPossible = Column('PointsPossible', Integer)
PointsReceived = Column('PointsReceived', Integer)
school = relationship("School", backref='studentscore')
def mapped_col_name(col_name):
''' Retrieves mapped Model based on
actual table name (as given in pandas.read_sql)
'''
def sa_class(table_name):
for c in Base._decl_class_registry.values():
if hasattr(c, '__tablename__') and c.__tablename__ == tname:
return c
table_name, col = col_name.split('_', 1)
sa_class = sa_class(table_name)
for k, v in sa_class.__mapper__.columns.items():
if v.name == col:
return k
query = session.query(StudentScore, School).join(School)
df = pd.read_sql(query.statement, query.session.bind).with_labels()
df.columns = map(mapped_col_name, df.columns)