UndefinedVariableError when querying pandas DataFrame
data1 = [np.array(df.query('type == @i')['continuous'])
         for i in ('Type1', 'Type2', 'Type3', 'Type4')]
Use '@' to refer to variables.
Please refer to the documentation, which says:
You can refer to variables in the environment by prefixing them with an ‘@’ character like @a + b.
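As a minimal sketch of the '@' approach: the DataFrame below is made-up sample data (only the column names type and continuous come from the question), and a plain for loop is used so the lookup of i does not depend on how comprehensions are scoped.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "type": ["Type1", "Type2", "Type1", "Type3", "Type4"],
    "continuous": [0.5, 1.5, 2.5, 3.5, 4.5],
})

data1 = []
for i in ("Type1", "Type2", "Type3", "Type4"):
    # '@i' makes query() read the Python variable i instead of
    # looking for a column called 'i'.
    subset = df.query("type == @i")["continuous"]
    data1.append(subset.to_numpy())
```

Each element of data1 is then a NumPy array holding the continuous values for one type.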
I know this is late, but maybe it helps somebody: put the value of i inside double quotes in the query string.
data1 = [np.array(df.query('type == "%s"' % i)['continuous'])
         for i in ('Type1', 'Type2', 'Type3', 'Type4')]
The i in your query expression
df.query('type == i')
is literally just the string 'i'. Since there are no extra enclosing quotes around it, pandas interprets it as the name of another column in your DataFrame, i.e. it looks for cases where
df['type'] == df['i']
Since there is no i column, you get an UndefinedVariableError.
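The behaviour described above can be reproduced with a small sketch. The data is invented for illustration, and the code catches NameError because UndefinedVariableError is a subclass of it (its import path has moved between pandas versions, so it is not imported directly here).

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"type": ["Type1", "Type2"], "continuous": [1.0, 2.0]})
i = "Type1"  # a Python variable, NOT a column

try:
    # Bare 'i' is looked up among the columns, not in Python scope.
    df.query("type == i")
    raised = False
except NameError:  # UndefinedVariableError derives from NameError
    raised = True

# Once a column named 'i' exists, the same expression compares
# the two columns row by row.
df2 = df.assign(i=["Type1", "Type9"])
matches = df2.query("type == i")
```

Here raised ends up True for the first query, while the second query returns only the rows where the type and i columns agree.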
It looks like you intended to query where the values in the type column are equal to the string variable named i, i.e. where
df['type'] == 'Type1'
df['type'] == 'Type2' # etc.
In this case you need to actually insert the value of i into the query expression:
df.query('type == "%s"' % i)
The extra set of quotes is necessary if 'Type1', 'Type2' etc. are values within the type column, but not if they are the names of other columns in the dataframe.
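To illustrate the difference the quotes make, here is a small sketch on invented data. The column named other and its values are hypothetical, added only so that the unquoted form has a column to compare against.

```python
import pandas as pd

# Hypothetical data; 'other' is an extra column added for the demo.
df = pd.DataFrame({
    "type": ["Type1", "Type2", "other"],
    "other": ["x", "Type2", "other"],
    "continuous": [1.0, 2.0, 3.0],
})

# Quoted: the interpolated text is a string literal.
i = "Type1"
expr_value = 'type == "%s"' % i    # builds: type == "Type1"
by_value = df.query(expr_value)

# Unquoted: the interpolated text is read as a column name.
col = "other"
expr_column = "type == %s" % col   # builds: type == other
by_column = df.query(expr_column)
```

String interpolation like this breaks if the value itself contains a double quote, which is one reason the '@' prefix is often the more robust choice.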