python pandas merge multiple csv files
dataset_1 = pd.read_csv('csv path')
dataset_2 = pd.read_csv('csv path')
new_dataset = pd.merge(dataset_1, dataset_2, left_on='same column name',
                       right_on='same column name', how='how to join, e.g. left')
You're trying to build one large dataframe out of the rows of many dataframes that all have the same column names. axis should be 0 (the default), not 1. You also don't need to specify a type of join; it will have no effect since the column names are the same for each dataframe.
df = pd.concat([df1, df2, df3])
should be enough to concatenate the datasets.
(see https://pandas.pydata.org/pandas-docs/stable/merging.html)
Your call to set_index to define an index using the values in the DateTime column should then work.
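A minimal sketch of that two-step approach (the column name 'DateTime' and the frame names are assumptions; adjust them to your data):

df = pd.concat([df1, df2, df3])    # stack rows; axis=0 is the default
df = df.set_index('DateTime')      # assumed name of the column holding the dates
df = df.sort_index()               # optional: put the rows in chronological order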
Consider using read_csv()'s index_col and parse_dates arguments to create the index during import and parse it as datetime. Then run your needed horizontal merge, and at the end use sort_index() on the final dataframe to sort the datetimes. The code below assumes the date is in the first column of each csv.
df1 = pd.read_csv(r"E:\Business\Economic Indicators\Consumer Price Index - Core (YoY) - European Monetary Union.csv",
index_col=[0], parse_dates=[0])
df2 = pd.read_csv(r"E:\Business\Economic Indicators\Private loans (YoY) - European Monetary Union.csv",
index_col=[0], parse_dates=[0])
df3 = pd.read_csv(r"E:\Business\Economic Indicators\Current Account s.a - European Monetary Union.csv",
index_col=[0], parse_dates=[0])
finaldf = pd.concat([df1, df2, df3], axis=1, join='inner').sort_index()
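Note that join='inner' keeps only the dates present in every file. If you would rather keep all dates and get NaN where a file has no value for that date, an outer join (the concat default) should work instead, for example:

finaldf = pd.concat([df1, df2, df3], axis=1, join='outer').sort_index()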
And for a DRYer approach, especially across hundreds of csv files, use a list comprehension:
import os
...
os.chdir('E:\\Business\\Economic Indicators')
# read every csv in the folder into its own dataframe, using the first column as a datetime index
dfs = [pd.read_csv(f, index_col=[0], parse_dates=[0])
       for f in os.listdir(os.getcwd()) if f.endswith('.csv')]
finaldf = pd.concat(dfs, axis=1, join='inner').sort_index()
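If you prefer not to change the working directory, a glob-based sketch along these lines should also work (the folder and pattern are assumptions; pandas is imported as pd as above):

import glob

files = glob.glob(r'E:\Business\Economic Indicators\*.csv')   # assumed folder and pattern
dfs = [pd.read_csv(f, index_col=[0], parse_dates=[0]) for f in files]
finaldf = pd.concat(dfs, axis=1, join='inner').sort_index()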