How to filter a set of rows according to an indexed position?
After sorting the dataframe, you can use str.split to split the strings in the user column and create a grouping key. Then group the dataframe on this key and, inside a dict comprehension, build a mapping of user -> dataframe with one subgroup per user:
key = df1['user'].str.split().str[0]
dct = {user:grp.reset_index(drop=True) for user, grp in df1.groupby(key)}
Now, to access the dataframe corresponding to a user, we can simply look it up in the dictionary:
>>> dct['John']
user value
0 John (2) 6
1 John (3) 3
2 John (1) 1
>>> dct['Peter']
user value
0 Peter (2) 3
1 Peter (3) 3
2 Peter (1) 1
>>> dct['Johnny']
user value
0 Johnny (1) 4
1 Johnny (2) 1
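The dictionary approach above can be put together as a self-contained sketch. It assumes the sample df1 that appears elsewhere in this thread, sorted by value in descending order; users_in_order is a hypothetical helper name, added here only to show how to recover the order in which users first appear after sorting.

```python
import pandas as pd

# Sample data as used elsewhere in this thread
df1 = pd.DataFrame({
    "user": ["Peter (1)", "Peter (2)", "Peter (3)",
             "John (1)", "John (2)", "John (3)",
             "Johnny (1)", "Johnny (2)"],
    "value": [1, 3, 3, 1, 6, 3, 4, 1],
})
df1 = df1.sort_values(by="value", ascending=False)

# Grouping key: first whitespace-separated token, e.g. "John (2)" -> "John"
key = df1["user"].str.split().str[0]

# One dataframe per user, each with a fresh 0..n-1 index
dct = {user: grp.reset_index(drop=True) for user, grp in df1.groupby(key)}

# Hypothetical helper: users in order of first appearance in the sorted frame
users_in_order = key.drop_duplicates().tolist()
```

Since groupby sorts its keys, the dictionary itself does not record which user came first in the sorted frame; a list like users_in_order lets you pick, say, dct[users_in_order[0]] for the user at the first indexed position.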
import pandas as pd

df1 = pd.DataFrame({
    "user": ["Peter (1)", "Peter (2)", "Peter (3)", "John (1)", "John (2)", "John (3)", "Johnny (1)", "Johnny (2)"],
    "value": [1, 3, 3, 1, 6, 3, 4, 1],
})
df1 = df1.sort_values(by='value', ascending=False)
cols = df1.columns.tolist()
df1['name'] = df1['user'].replace(r'\s\(\d\)','',regex=True)
grp = df1.groupby(by=['name'])
dataframes = [grp.get_group(x)[cols] for x in grp.groups]
# groupby sorts the group keys, so the first two groups here are John and Johnny
df2, df3 = dataframes[:2]  # as mentioned, we are interested in just the first two users
df2:
user value
3 John (1) 1
4 John (2) 6
5 John (3) 3
df3:
user value
6 Johnny (1) 4
7 Johnny (2) 1
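One caveat about the pattern used in this answer: \d matches a single digit, so a suffix such as "(12)" would be left in place. Below is a minimal sketch of a slightly more permissive pattern, assuming the counter always sits at the end of the string; the "Peter (12)" entry is made up for illustration and is not part of the original data.

```python
import pandas as pd

# Hypothetical sample including a two-digit counter
users = pd.Series(["Peter (1)", "Peter (12)", "John (3)"])

# Strip optional whitespace plus a trailing "(<digits>)" from each entry
names = users.str.replace(r"\s*\(\d+\)$", "", regex=True)
```

The resulting names series can then be used as the groupby key in the same way as the name column above.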
You can take the user value in the first row, split it on '(' and drop the last piece (this still works if the user name itself contains parentheses), and then search the whole column for that name. For example:
firstIndexUser = df1['user'].str.split('(').str[:-1].str.join('(').iloc[0]
firstIndexUser now holds 'John ' (with a trailing space, because the split happens at the parenthesis). Compare it against the same expression applied to the entire dataframe to get df2:
df2 = df1[df1['user'].str.split('(').str[:-1].str.join('(')==firstIndexUser]
The output looks like this:
>>df2
user value
0 John (2) 6
4 John (3) 3
6 John (1) 1
If you want, you can reset the index of df2:
>>df2.reset_index(drop=True, inplace=True)
>>df2
user value
0 John (2) 6
1 John (3) 3
2 John (1) 1
You can follow a similar approach for df3.
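As a rough sketch of what that might look like for df3, the snippet below takes the first user among the rows that remain after the first user's rows are excluded. It assumes the sample df1 from this thread sorted by value in descending order; the names name, first_user, rest and second_user are illustrative, and str.rsplit with strip is swapped in for the split/join expression above so that the extracted name has no trailing space.

```python
import pandas as pd

df1 = pd.DataFrame({
    "user": ["Peter (1)", "Peter (2)", "Peter (3)",
             "John (1)", "John (2)", "John (3)",
             "Johnny (1)", "Johnny (2)"],
    "value": [1, 3, 3, 1, 6, 3, 4, 1],
})
df1 = df1.sort_values(by="value", ascending=False)

# Everything before the last "(", stripped of surrounding whitespace
name = df1["user"].str.rsplit("(", n=1).str[0].str.strip()

# User at the first indexed position, and all of that user's rows
first_user = name.iloc[0]
df2 = df1[name == first_user].reset_index(drop=True)

# First user among the remaining rows, and all of that user's rows
rest = name[name != first_user]
second_user = rest.iloc[0]
df3 = df1[name == second_user].reset_index(drop=True)
```

Because the comparison is an exact match on the extracted name, 'John' does not pick up the 'Johnny' rows.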