Adding calculated column(s) to a dataframe in pandas

The exact code will vary for each of the columns you want to do, but it's likely you'll want to use the map and apply functions. In some cases you can just compute using the existing columns directly, since the columns are Pandas Series objects, which also work as Numpy arrays, which automatically work element-wise for usual mathematical operations.

>>> d
    A   B  C
0  11  13  5
1   6   7  4
2   8   3  6
3   4   8  7
4   0   1  7
>>> (d.A + d.B) / d.C
0    4.800000
1    3.250000
2    1.833333
3    1.714286
4    0.142857
>>> d.A > d.C
0     True
1     True
2     True
3    False
4    False

If you need to use operations like max and min within a row, you can use apply with axis=1 to apply any function you like to each row. Here's an example that computes min(A, B)-C, which seems to be like your "lower wick":

>>> d.apply(lambda row: min([row['A'], row['B']])-row['C'], axis=1)
0    6
1    2
2   -3
3   -3
4   -7

Hopefully that gives you some idea of how to proceed.

Edit: to compare rows against neighboring rows, the simplest approach is to slice the columns you want to compare, leaving off the beginning/end, and then compare the resulting slices. For instance, this will tell you for which rows the element in column A is less than the next row's element in column C:

d['A'][:-1] < d['C'][1:]

and this does it the other way, telling you which rows have A less than the preceding row's C:

d['A'][1:] < d['C'][:-1]

Doing ['A"][:-1] slices off the last element of column A, and doing ['C'][1:] slices off the first element of column C, so when you line these two up and compare them, you're comparing each element in A with the C from the following row.

You could have is_hammer in terms of row["Open"] etc. as follows

def is_hammer(rOpen,rLow,rClose,rHigh):
    return lower_wick_at_least_twice_real_body(rOpen,rLow,rClose) \
       and closed_in_top_half_of_range(rHigh,rLow,rClose)

Then you can use map:

df["isHammer"] = map(is_hammer, df["Open"], df["Low"], df["Close"], df["High"])

For the second part of your question, you can also use shift, for example:

df['t-1'] = df['t'].shift(1)

t-1 would then contain the values from t one row above.

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.shift.html

Adding calculated column(s) to a dataframe in pandas

Tags:

Python

Pandas

Related

Recent Posts