Using str in split in pandas
It is explained in the documentation under Indexing using str
The .str[index] notation indexes each string by position, whereas [index] selects from the series by its index label.
Using the example
s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan,'CABA', 'dog', 'cat'])
s.str[3]
returns the character at position 3 of each string, or NaN where the string is shorter than four characters or the value is missing
0 NaN
1 NaN
2 NaN
3 a
4 a
5 NaN
6 A
7 NaN
8 NaN
Whereas
s[3]
returns
'Aaba'
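For anyone who wants to run the comparison, here is a minimal, self-contained sketch of the two lookups side by side. The variable names chars and row are only illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

# .str[3] looks inside every string and takes the character at position 3.
# Strings shorter than four characters, and missing values, give NaN.
chars = s.str[3]

# s[3] does not look inside the strings at all: it fetches the row
# whose index label is 3 from the series.
row = s[3]

print(chars)
print(row)  # 'Aaba'
```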
chess_data is a dataframe.
chess_data.winner is a series.
chess_data.winner.str is an accessor to methods that are string specific and optimized (to a degree).
chess_data.winner.str.split is one such method.
chess_data.winner.map is a different method that takes a dictionary or a callable object and either calls that callable with each element in the series or calls the dictionary's get method on each element of the series.
When you use chess_data.winner.str.split, pandas still loops over the elements and applies a kind of str.split to each one. map is a cruder way of doing the same thing.
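As a rough sketch of the difference, the snippet below runs map with a callable, map with a dictionary, and the .str accessor on the same column. The chess_data frame is rebuilt from the values shown in this thread, and the lookup dictionary is a made-up example, not something from the question.

```python
import pandas as pd

# Rebuilt from the values shown in this thread.
chess_data = pd.DataFrame({'winner': ['A:1', 'A:2', 'A:3', 'A:4', 'B:1', 'B:2']})

# map with a callable: the function is called once per element.
via_callable = chess_data.winner.map(lambda x: x.split(':')[0])

# map with a dictionary: each element is looked up as a key,
# and elements with no matching key become NaN.
lookup = {'A:1': 'first', 'B:1': 'second'}  # hypothetical mapping
via_dict = chess_data.winner.map(lookup)

# The string accessor: split every element, then take item 0 of each list.
via_accessor = chess_data.winner.str.split(':').str[0]

print(via_callable.tolist())
print(via_dict.tolist())
print(via_accessor.tolist())
```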
With your data:
chess_data.winner.str.split(':')
0 [A, 1]
1 [A, 2]
2 [A, 3]
3 [A, 4]
4 [B, 1]
5 [B, 2]
Name: winner, dtype: object
To get the first element of each list, use the string accessor again:
chess_data.winner.str.split(':').str[0]
0 A
1 A
2 A
3 A
4 B
5 B
Name: winner, dtype: object
This is equivalent to what you did with map:
chess_data.winner.map(lambda x: x.split(':')[0])
You could have also used a comprehension
chess_data.assign(new_col=[x.split(':')[0] for x in chess_data.winner])
winner new_col
0 A:1 A
1 A:2 A
2 A:3 A
3 A:4 A
4 B:1 B
5 B:2 B
Your code,
chess_data['winner'].str.split(':')[0]
['A', '1']
Is the same as,
chess_data['winner'].str.split(':').loc[0]
['A', '1']
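To make the distinction concrete, here is a small sketch that puts [0], .loc[0] and .str[0] next to each other on the split result. It assumes the default integer index and rebuilds chess_data from the values shown in this thread; the variable names are only illustrative.

```python
import pandas as pd

# Rebuilt from the values shown in this thread, with the default 0..5 index.
chess_data = pd.DataFrame({'winner': ['A:1', 'A:2', 'A:3', 'A:4', 'B:1', 'B:2']})

split = chess_data['winner'].str.split(':')

# [0] and .loc[0] both fetch the single row labelled 0,
# which is the whole list for that row.
first_row = split[0]
same_row = split.loc[0]

# .str[0] goes through every row and takes item 0 of each list.
first_parts = split.str[0]

print(first_row)             # ['A', '1']
print(first_parts.tolist())  # ['A', 'A', 'A', 'A', 'B', 'B']
```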
And,
chess_data['winner'].map(lambda n: n.split(':')[0])
0 A
1 A
2 A
3 A
4 B
5 B
Name: winner, dtype: object
Is the same as,
chess_data.winner.str.split(':').str[0]
0 A
1 A
2 A
3 A
4 B
5 B
Name: winner, dtype: object
Which is also the same as,
pd.Series([x.split(':')[0] for x in chess_data['winner']], name='winner')
0 A
1 A
2 A
3 A
4 B
5 B
Name: winner, dtype: object