How to correctly read csv in Pandas while changing the names of the columns
According to the documentation, your usecols list should be a subset of the new names list:
usecols : list-like or callable, default None
Return a subset of the columns. If list-like, all elements must either
be positional (i.e. integer indices into the document columns) or strings
that correspond to column names provided either by the user in `names` or
inferred from the document header row(s).
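As a small illustration of the three forms the quoted passage describes (names, positions, or a callable), here is a sketch that reads an in-memory buffer instead of a file. The `data` string is a made-up stand-in, not the asker's file.

```python
from io import StringIO

import pandas as pd

# Hypothetical in-memory stand-in for a CSV file with a header row.
data = "OLD1,OLD2,OLD3\n1,2,3\n4,5,6\n"

# Select columns by the names found in the header row.
by_name = pd.read_csv(StringIO(data), usecols=["OLD2", "OLD3"])

# Select the same columns by zero-based position.
by_position = pd.read_csv(StringIO(data), usecols=[1, 2])

# Select with a callable that is evaluated against each column name.
by_callable = pd.read_csv(StringIO(data), usecols=lambda name: name != "OLD1")

print(by_name.equals(by_position) and by_name.equals(by_callable))
```

All three calls should select the same two columns, OLD2 and OLD3.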
Example of csv
"OLD1", "OLD2", "OLD3"
1,2,3
4,5,6
Code for renaming OLDX -> NEWX and using only NEW2 + NEW3
import pandas as pd
d = pd.read_csv('test.csv', header=0, names=['NEW1', 'NEW2', 'NEW3'], usecols=['NEW2', 'NEW3'])
Output
   NEW2  NEW3
0     2     3
1     5     6
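To try this without creating test.csv first, the same call can be pointed at an in-memory buffer. This is only a sketch: `csv_text` is an assumed stand-in for the file's contents, and it relies on the default (C) engine.

```python
from io import StringIO

import pandas as pd

# Assumed contents of test.csv: one header row followed by two data rows.
csv_text = "OLD1,OLD2,OLD3\n1,2,3\n4,5,6\n"

# header=0 marks the first row as a header to be replaced by `names`;
# usecols then refers to the new names, not the old ones.
d = pd.read_csv(
    StringIO(csv_text),
    header=0,
    names=["NEW1", "NEW2", "NEW3"],
    usecols=["NEW2", "NEW3"],
)
print(d)
```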
NOTE: Although the above works as expected, the same call fails when switching to engine='python':
d = pd.read_csv('test.csv', header=0, engine='python',
names=['NEW1', 'NEW2', 'NEW3'], usecols=['NEW2', 'NEW3'])
ValueError: Number of passed names did not match number of header fields in the file
The workaround is to set header=None and skiprows=[0,]:
d = pd.read_csv('test.csv', header=None, skiprows=[0,], engine='python', names=['NEW1', 'NEW2', 'NEW3'], usecols=['NEW2', 'NEW3'])
Output
   NEW2  NEW3
0     2     3
1     5     6
Pandas version: 0.23.4
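If you want one call that does not depend on which engine is selected, the header=None plus skiprows workaround can be wrapped in a small helper. This is a sketch, not something from the pandas API: `read_with_new_names` and the sample `csv_text` are hypothetical names introduced here for illustration.

```python
from io import StringIO

import pandas as pd


def read_with_new_names(source, names, usecols, engine="c"):
    """Skip the original header row and apply `names` to every column.

    Hypothetical helper: it treats the file as header-less (header=None)
    and drops the first row explicitly (skiprows=[0]), so `usecols` can
    refer to the new names.
    """
    return pd.read_csv(
        source,
        header=None,
        skiprows=[0],
        engine=engine,
        names=names,
        usecols=usecols,
    )


# Assumed sample data standing in for test.csv.
csv_text = "OLD1,OLD2,OLD3\n1,2,3\n4,5,6\n"

# Run the same call with both parser engines and keep the results.
frames = {
    engine: read_with_new_names(
        StringIO(csv_text),
        names=["NEW1", "NEW2", "NEW3"],
        usecols=["NEW2", "NEW3"],
        engine=engine,
    )
    for engine in ("c", "python")
}
print(frames["python"])
```

The helper assumes the file has exactly one header row. With a multi-line header, skiprows would need to list every header line.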
You are right, something is odd with the names argument. It seems to me that you cannot use both at the same time: either you set a name for every column of the CSV file or you don't set names at all. So it seems that you can't set names when you are not taking all the columns (usecols).
names : array-like
List of column names to use. If file contains no header row, then you should explicitly pass header=None
You might already know this, but you can also rename the columns afterwards.
import pandas as pd
from io import StringIO
csv = r"""Date,Open Price,High Price,Low Price,Close Price,WAP,No.of Shares,No. of Trades,Total Turnover (Rs.),Deliverable Quantity,% Deli. Qty to Traded Qty,Spread High-Low,Spread Close-Open
28-February-2015,2270.00,2310.00,2258.00,2294.85,2279.192067772602217319,73422,8043,167342840.00,11556,15.74,52.00,24.85
27-February-2015,2267.25,2280.85,2258.00,2266.35,2269.239841485775122730,50721,4938,115098114.00,12297,24.24,22.85,-0.90
26-February-2015,2314.90,2314.90,2250.00,2259.50,2277.198324862194860047,69845,8403,159050917.00,22046,31.56,64.90,-55.40
25-February-2015,2290.00,2332.00,2278.35,2318.05,2315.100614216488163214,161995,10174,375034724.00,102972,63.56,53.65,28.05
24-February-2015,2276.05,2295.00,2258.00,2278.15,2281.058946240263344242,52251,7726,119187611.00,13292,25.44,37.00,2.10
23-February-2015,2303.95,2311.00,2253.25,2270.70,2281.912259219760108491,75951,7344,173313518.00,24969,32.88,57.75,-33.25
20-February-2015,2324.00,2335.20,2277.00,2284.30,2301.631421152326354478,79717,10233,183479152.00,23045,28.91,58.20,-39.70
19-February-2015,2304.00,2333.90,2292.00,2326.60,2321.485466301625211160,85835,8847,199264705.00,29728,34.63,41.90,22.60
18-February-2015,2284.00,2305.00,2261.10,2295.75,2282.060986778089405300,69884,6639,159479550.00,26665,38.16,43.90,11.75
16-February-2015,2281.00,2305.85,2266.00,2278.50,2284.961866239581019628,85541,10149,195457923.00,22164,25.91,39.85,-2.50
13-February-2015,2311.00,2324.90,2286.95,2296.40,2311.371235111317676864,109731,5570,253629077.00,69039,62.92,37.95,-14.60
12-February-2015,2280.00,2322.85,2275.00,2315.45,2301.372038211769425569,79766,9095,183571242.00,33981,42.60,47.85,35.45
11-February-2015,2275.00,2295.00,2258.25,2287.20,2279.587966250020639664,60563,7467,138058686.00,20058,33.12,36.75,12.20
10-February-2015,2244.90,2297.40,2225.00,2280.30,2269.562228214830293104,141656,13026,321497107.00,55577,39.23,72.40,35.40"""
df = pd.read_csv(StringIO(csv),
usecols=["Date", "Open Price", "Close Price"],
header=0)
df.columns = ['Date', 'O', 'C']
df
output:
                Date        O        C
0   28-February-2015  2270.00  2294.85
1   27-February-2015  2267.25  2266.35
2   26-February-2015  2314.90  2259.50
3   25-February-2015  2290.00  2318.05
4   24-February-2015  2276.05  2278.15
5   23-February-2015  2303.95  2270.70
6   20-February-2015  2324.00  2284.30
7   19-February-2015  2304.00  2326.60
8   18-February-2015  2284.00  2295.75
9   16-February-2015  2281.00  2278.50
10  13-February-2015  2311.00  2296.40
11  12-February-2015  2280.00  2315.45
12  11-February-2015  2275.00  2287.20
13  10-February-2015  2244.90  2280.30
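Assigning to df.columns depends on the columns coming back in the order you expect. A variant that may be a little safer is to rename through an explicit old-to-new mapping with DataFrame.rename. The sketch below uses a trimmed excerpt of the data above (first five columns, first two rows). The `short_names` dict is just an illustrative name.

```python
from io import StringIO

import pandas as pd

# Trimmed excerpt of the CSV above: first five columns, first two rows.
csv = """Date,Open Price,High Price,Low Price,Close Price
28-February-2015,2270.00,2310.00,2258.00,2294.85
27-February-2015,2267.25,2280.85,2258.00,2266.35"""

# Map each original column name to its short replacement.
short_names = {"Open Price": "O", "Close Price": "C"}

df = pd.read_csv(
    StringIO(csv),
    usecols=["Date", "Open Price", "Close Price"],
    header=0,
).rename(columns=short_names)
print(df)
```

Because the mapping is keyed by the original names, each column gets the intended short name regardless of the order in which the columns appear.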