Pandas - get first n-rows based on percentage
I want to pop the first 5% of the records.
There is no built-in method for this, but you can multiply the total number of rows by your percentage and pass the result as the parameter to the head method.
n = 5
df.head(int(len(df)*(n/100)))
So if your dataframe contains 1000 rows and n = 5 (i.e. 5%), you will get the first 50 rows.
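For reference, here is a minimal sketch of the same idea wrapped in a helper. The function name head_percent and the sample data are made up for illustration, and it assumes the percentage is a whole number. It uses integer arithmetic instead of n/100, because float rounding can occasionally drop a row (for example, int(100 * 0.29) gives 28).

```python
import pandas as pd


def head_percent(df: pd.DataFrame, percent: int) -> pd.DataFrame:
    """Return the first `percent` percent of rows, rounded down.

    Hypothetical helper: `percent` is assumed to be an integer in 0..100.
    """
    # Integer arithmetic avoids float truncation surprises.
    n_rows = len(df) * percent // 100
    return df.iloc[:n_rows]


# Illustrative data: 1000 rows with a single column.
df = pd.DataFrame({"value": range(1000)})
first = head_percent(df, 5)
print(len(first))  # 50
```

If you also want to remove those rows from the original (the "pop" part), df.iloc[len(first):] gives you the remainder.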
I've extended Mihai's answer for my own use, and it may be useful to others. The purpose is to automate selecting the first n records when splitting a time series, so you can be sure you're taking old records for training and recent records for testing. This assumes the DataFrame is already sorted from oldest to newest.
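# having
# import pandas as pd
# df = pd.DataFrame...
def sample_first_prows(data, perc=0.7):
    # Return the first `perc` fraction of rows (rounded down)
    return data.head(int(len(data) * perc))

train = sample_first_prows(df)
# Slice by position so the test set starts right after the last training row;
# iloc expects positions, not index labels
test = df.iloc[len(train):]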
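A variant of the split above, as a rough sketch: it slices both parts by position with iloc, so it doesn't depend on what the index labels look like (for example a DatetimeIndex). The name split_by_position and the sample data are made up for illustration, and it assumes the rows are already in chronological order.

```python
import pandas as pd


def split_by_position(data: pd.DataFrame, perc: float = 0.7):
    """Split into (first `perc` fraction of rows, remaining rows).

    Hypothetical helper; assumes `data` is sorted oldest to newest.
    """
    cut = int(len(data) * perc)  # number of rows in the first part
    # Both slices are positional, so they never overlap.
    return data.iloc[:cut], data.iloc[cut:]


# Illustrative daily time series with a DatetimeIndex.
dates = pd.date_range("2020-01-01", periods=20, freq="D")
ts = pd.DataFrame({"y": range(20)}, index=dates)

train, test = split_by_position(ts, perc=0.75)
print(len(train), len(test))  # 15 5
```

Since the cut point is a position, the last training row is always strictly older than the first test row, as long as the input is sorted.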