Split pandas dataframe in two if it has more than 10 rows
There is no specific convenience function.
You'd have to do something like:
first_ten = pd.DataFrame()
rest = pd.DataFrame()
if df.shape[0] > 10:  # len(df) > 10 would also work
    first_ten = df[:10]
    rest = df[10:]
I used a list comprehension to cut a huge DataFrame into blocks of 100,000 rows:
size = 100000
# .iloc slices by position, so this works whatever the index labels are
list_of_dfs = [df.iloc[i:i+size, :] for i in range(0, len(df), size)]
or as a generator:
gen_of_dfs = (df.iloc[i:i+size, :] for i in range(0, len(df), size))
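As a quick check of the chunking idea, here is a minimal sketch on a small made-up frame. The data, the chunk size of 4 and the helper name chunk_dataframe are illustrative assumptions, not part of the answer above. It slices by position with .iloc, so it does not depend on the index labels:

```python
import pandas as pd


def chunk_dataframe(df, size):
    """Yield consecutive row blocks of at most `size` rows (hypothetical helper)."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(df), size):
        # .iloc slices by position; the end is exclusive and may run past len(df)
        yield df.iloc[start:start + size]


# Made-up example data with a non-default (string) index
example = pd.DataFrame({"value": range(10)}, index=list("abcdefghij"))
chunks = list(chunk_dataframe(example, 4))
chunk_lengths = [len(chunk) for chunk in chunks]  # the last block is shorter
```

Because the helper is a generator, only one block needs to be materialized at a time, which is the point of the generator variant for a huge DataFrame.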
This returns the two pieces if the DataFrame has more than 10 rows; otherwise it returns the original DataFrame and None
(which you then need to handle separately). Note that this assumes the split only has to happen once per df,
and that it is OK for the second piece to be longer than 10 rows (which happens whenever the original has more than 20 rows).
df_new1, df_new2 = (df.iloc[:10, :], df.iloc[10:, :]) if len(df) > 10 else (df, None)
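If you need this in more than one place, the same logic can be wrapped in a small function. This is only a sketch: the name split_if_longer, the parameter n and the example data are made up for illustration.

```python
import pandas as pd


def split_if_longer(df, n=10):
    """Return (first n rows, remaining rows) if df has more than n rows,
    otherwise (df, None). Hypothetical helper for illustration."""
    if len(df) > n:
        return df.iloc[:n], df.iloc[n:]
    return df, None


long_df = pd.DataFrame({"x": range(25)})   # made-up data, 25 rows
short_df = pd.DataFrame({"x": range(3)})   # made-up data, 3 rows

head_part, tail_part = split_if_longer(long_df)   # split into 10 and 15 rows
same_df, nothing = split_if_longer(short_df)      # returned unchanged, with None
```

The caller still has to check the second value for None before using it.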
Note that you can also use df.head(10) and df.tail(len(df) - 10) to get the front and back according to your needs. You can also use various indexing approaches: you can provide only the first dimension's index if you want, such as df[:10] or df.iloc[:10] instead of df.iloc[:10, :] (though I like to be explicit about the dimensions I am taking). Note that df[:10, :] without .iloc does not work on a DataFrame. Older pandas versions also offered df.ix for indexing in similar ways, but it was deprecated in pandas 0.20 and removed in 1.0, so use df.iloc instead.
Be careful about using df.loc, however: it is label-based, so the input is never interpreted as an integer position. .loc only works "accidentally" when your index labels happen to be integers starting at 0 with no gaps, and even then a .loc slice includes its end label, whereas a positional slice excludes its end.
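The difference is easy to see on a frame whose integer labels do not start at 0. The data below is made up purely for illustration:

```python
import pandas as pd

# Made-up frame whose integer labels do not start at 0
df_demo = pd.DataFrame({"x": range(5)}, index=[10, 11, 12, 13, 14])

by_position = df_demo.iloc[:2]   # first two rows, whatever the labels are
by_label = df_demo.loc[:2]       # rows with label <= 2: there are none here

# With a default RangeIndex, a .loc slice includes the end label
df_default = pd.DataFrame({"x": range(5)})
inclusive = df_default.loc[:2]   # labels 0, 1, 2 -> three rows
exclusive = df_default.iloc[:2]  # positions 0, 1 -> two rows
```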
You should also consider the options pandas provides for exporting the contents of the DataFrame to HTML (df.to_html) and possibly also LaTeX (df.to_latex), which give better-designed tables for the presentation than copying and pasting. Searching for how to convert a DataFrame to these formats turns up plenty of tutorials and advice for exactly this application.
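As a rough sketch of the HTML route, assuming a small made-up table and that only the first ten rows are wanted on the slide:

```python
import pandas as pd

# Made-up data standing in for the real DataFrame
table = pd.DataFrame({"name": ["a", "b", "c"], "score": [1, 2, 3]})

# Render at most the first ten rows as an HTML table, without the index column
html = table.head(10).to_html(index=False)
```

The resulting string can be pasted into an HTML slide or written to a file. df.to_latex() works along the same lines, though recent pandas versions may need an optional dependency installed for it.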