What is the difference between a pandas Series and a single-column DataFrame?
from the pandas doc http://pandas.pydata.org/pandas-docs/stable/dsintro.html Series is a one-dimensional labeled array capable of holding any data type. To read data in form of panda Series:
import pandas as pd
ds = pd.Series(data, index=index)
DataFrame is a 2-dimensional labeled data structure with columns of potentially different types.
import pandas as pd
df = pd.DataFrame(data, index=index)
In both of the above index is list
for example: I have a csv file with following data:
,country,popuplation,area,capital
BR,Brazil,10210,12015,Brasile
RU,Russia,1025,457,Moscow
IN,India,10458,457787,New Delhi
To read above data as series and data frame:
import pandas as pd
file_data = pd.read_csv("file_path", index_col=0)
d = pd.Series(file_data.country, index=['BR','RU','IN'] or index = file_data.index)
output:
>>> d
BR Brazil
RU Russia
IN India
df = pd.DataFrame(file_data.area, index=['BR','RU','IN'] or index = file_data.index )
output:
>>> df
area
BR 12015
RU 457
IN 457787
Quoting the Pandas docs
pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.
So, the Series is the data structure for a single column of a DataFrame
, not only conceptually, but literally, i.e. the data in a DataFrame
is actually stored in memory as a collection of Series
.
Analogously: We need both lists and matrices, because matrices are built with lists. Single row matricies, while equivalent to lists in functionality still cannot exist without the list(s) they're composed of.
They both have extremely similar APIs, but you'll find that DataFrame
methods always cater to the possibility that you have more than one column. And, of course, you can always add another Series
(or equivalent object) to a DataFrame
, while adding a Series
to another Series
involves creating a DataFrame
.
Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is to call:
s = pd.Series(data, index=index)
DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects.
d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)