Pandas read_csv not reading a file properly: not splitting into the proper columns
The safest way I can think of is to read the CSV twice, once for the data rows (skipping the header line) and once for just the header:
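import pandas as pd

# First read: data rows only. Skip the header line and let pandas number the columns.
rows = pd.read_csv('path/to/atp_matches_2016.csv', skiprows=[0], header=None)
# Drop columns that contain only NaNs.
rows = rows.dropna(axis=1, how='all')
# Second read: header line only (nrows=0), used to name the remaining columns.
rows.columns = pd.read_csv('path/to/atp_matches_2016.csv', nrows=0).columns
print(rows.head(5))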
Output:
   tourney_id  tourney_name  surface  draw_size  tourney_level  tourney_date  \
0   2016-M020      Brisbane     Hard       32.0              A    20160104.0
1   2016-M020      Brisbane     Hard       32.0              A    20160104.0
2   2016-M020      Brisbane     Hard       32.0              A    20160104.0
3   2016-M020      Brisbane     Hard       32.0              A    20160104.0
4   2016-M020      Brisbane     Hard       32.0              A    20160104.0

   match_num  winner_id  winner_seed  winner_entry  ...  w_bpFaced  l_ace  l_df  \
0      300.0   105683.0          4.0           NaN  ...        1.0    7.0   3.0
1      299.0   103819.0          1.0           NaN  ...        1.0    2.0   4.0
2      298.0   105683.0          4.0           NaN  ...        4.0   10.0   3.0
3      297.0   103819.0          1.0           NaN  ...        1.0    8.0   2.0
4      296.0   106233.0          8.0           NaN  ...        2.0   11.0   2.0

   l_svpt  l_1stIn  l_1stWon  l_2ndWon  l_SvGms  l_bpSaved  l_bpFaced
0    61.0     34.0      25.0      14.0     10.0        3.0        5.0
1    55.0     31.0      18.0       9.0      8.0        2.0        6.0
2    84.0     54.0      41.0      16.0     12.0        2.0        2.0
3   104.0     62.0      46.0      21.0     16.0        8.0       11.0
4    98.0     52.0      41.0      27.0     15.0        7.0        8.0
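
If you want to try the idea on something small first, here is a minimal sketch. It assumes the problem is that every data row ends with an extra delimiter, so the rows have one more field than the header line; I have not confirmed that this is what is wrong with your file. The sample text and the helper name read_with_header_fix are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample: each data row ends with an extra comma, so it has
# one more field than the header line (an assumption about the real file).
SAMPLE = "a,b,c\n1,2,3,\n4,5,6,\n"


def read_with_header_fix(text):
    """Read the data and the header separately, then put them back together."""
    # Data rows only: skip the header line and let pandas number the columns.
    rows = pd.read_csv(io.StringIO(text), skiprows=[0], header=None)
    # The extra trailing field shows up as a column of NaNs; drop it.
    rows = rows.dropna(axis=1, how="all")
    # Header only: nrows=0 gives an empty frame that still has the column names.
    rows.columns = pd.read_csv(io.StringIO(text), nrows=0).columns
    return rows


# A plain read treats the first field of each row as the index, so the
# values end up one column to the left of their names.
naive = pd.read_csv(io.StringIO(SAMPLE))
fixed = read_with_header_fix(SAMPLE)

print(naive)
print(fixed)
```

One caveat: dropna(axis=1, how='all') also drops a real column if it happens to be empty in every row. In that case the number of remaining columns no longer matches the header, and assigning rows.columns raises an error instead of silently mislabelling the data.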