shape vs len for numpy array
From the source code, it looks like shape basically uses len():
https://github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py
@property
def shape(self) -> Tuple[int, int]:
    return len(self.index), len(self.columns)

def __len__(self) -> int:
    return len(self.index)
Accessing shape computes both dimensions every time. So df.shape[0] + df.shape[1], which evaluates the property twice, may be slightly slower than len(df.index) + len(df.columns). Still, performance-wise, the difference should be negligible except perhaps for a very large 2D dataframe.
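If you want to check this on your own machine, a rough timing sketch could look like the following. The dataframe size and the repeat count are arbitrary assumptions chosen for illustration, and the absolute numbers will vary with your pandas version and hardware.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test frame; the 1000 x 50 size is an arbitrary choice.
df = pd.DataFrame(np.zeros((1000, 50)))

# Both expressions give the same number: rows + columns.
via_shape = df.shape[0] + df.shape[1]
via_len = len(df.index) + len(df.columns)

# Time each expression; df.shape is evaluated twice in the first one.
t_shape = timeit.timeit(lambda: df.shape[0] + df.shape[1], number=10_000)
t_len = timeit.timeit(lambda: len(df.index) + len(df.columns), number=10_000)

print(f"shape: {t_shape:.4f}s  len: {t_len:.4f}s")
```

Either way, both calls typically take on the order of microseconds or less, so readability is the better reason to pick one over the other.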
So in line with the previous answers, df.shape is good if you need both dimensions; for a single dimension, len() seems more appropriate conceptually.
Looking at the property vs. method answers, it all comes down to usability and readability of the code. So again, in your case, I would say: if you want information about the whole dataframe, either just to check it or, for example, to pass the shape tuple to a function, use shape. For a single column, or the index (i.e. the rows of a df), use len().
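As a small illustration of that split, here is a sketch where the shape tuple is handed to a function and len() is used for the row count. The dataframe contents and the variable names (mask, n_rows, n_in_column) are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# Whole-frame information: pass the shape tuple straight to a function.
mask = np.zeros(df.shape, dtype=bool)

# Single dimension: len() gives the number of rows,
# whether it is called on the frame or on one column.
n_rows = len(df)
n_in_column = len(df["a"])

print(mask.shape, n_rows, n_in_column)
```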
I wouldn't worry about performance here - any differences should only be very marginal.
I'd say the more Pythonic alternative is probably the one that matches your needs more closely: a.shape may contain more information than len(a), since it contains the size along all axes, whereas len only returns the size along the first axis:
>>> a = np.array([[1,2,3,4], [1,2,3,4]])
>>> len(a)
2
>>> a.shape
(2, 4)
If you actually happen to work with one-dimensional arrays only, then I'd personally favour using len(a) in case you explicitly need the array's size.
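To make the one-dimensional case concrete, here is a short sketch comparing len(), shape and size. The array names v, m and s are just placeholders; the zero-dimensional case is included because len() raises a TypeError there, while shape still works.

```python
import numpy as np

v = np.array([1, 2, 3, 4])                  # 1-D
m = np.array([[1, 2, 3, 4], [1, 2, 3, 4]])  # 2-D
s = np.array(5)                             # 0-D (no axes at all)

# For a 1-D array, len(v), v.shape[0] and v.size all agree.
print(len(v), v.shape, v.size)

# For a 2-D array, len(m) is only the first axis; m.size counts all elements.
print(len(m), m.shape, m.size)

# A 0-D array has an empty shape tuple, and len() is not defined for it.
try:
    len(s)
    scalar_len_ok = True
except TypeError:
    scalar_len_ok = False
print(s.shape, scalar_len_ok)
```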