How to read a CSV file from GitHub using pandas
You can copy the file's GitHub URL and change two things:
- Remove the "/blob" path segment
- Replace github.com with raw.githubusercontent.com
For instance, this link:
https://github.com/mwaskom/seaborn-data/blob/master/iris.csv
can be read this way:
import pandas as pd
pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
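The two changes above can be sketched as a small helper. This is only an illustration: the function name github_blob_to_raw is hypothetical, and it assumes the usual github.com/<owner>/<repo>/blob/<ref>/<path> URL layout.

```python
from urllib.parse import urlsplit, urlunsplit


def github_blob_to_raw(url):
    """Turn a github.com "blob" URL into a raw.githubusercontent.com URL.

    Hypothetical helper; assumes the path looks like
    /<owner>/<repo>/blob/<ref>/<path to file>.
    """
    parts = urlsplit(url)
    if parts.netloc != "github.com":
        raise ValueError("expected a github.com URL")

    segments = parts.path.strip("/").split("/")
    if len(segments) < 5 or segments[2] != "blob":
        raise ValueError("expected /<owner>/<repo>/blob/<ref>/<path>")

    # Drop the "blob" segment and switch to the raw-content host.
    raw_path = "/" + "/".join(segments[:2] + segments[3:])
    return urlunsplit(("https", "raw.githubusercontent.com", raw_path, "", ""))


print(github_blob_to_raw(
    "https://github.com/mwaskom/seaborn-data/blob/master/iris.csv"
))
# prints https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv
```

The returned string can then be passed to pd.read_csv in the same way as the hand-edited URL.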
Append ?raw=true to the end of the GitHub URL to get the raw file instead of the HTML page.
In your case:
import pandas as pd
url = 'https://github.com/lukes/ISO-3166-Countries-with-Regional-Codes/blob/master/all/all.csv?raw=true'
df = pd.read_csv(url, index_col=0)  # use the first column ("name") as the index
print(df.head(5))
Output:
                alpha-2  alpha-3  country-code     iso_3166-2   region  \
name
Afghanistan          AF      AFG             4  ISO 3166-2:AF     Asia
Åland Islands        AX      ALA           248  ISO 3166-2:AX   Europe
Albania              AL      ALB             8  ISO 3166-2:AL   Europe
Algeria              DZ      DZA            12  ISO 3166-2:DZ   Africa
American Samoa       AS      ASM            16  ISO 3166-2:AS  Oceania

                     sub-region  intermediate-region  region-code  \
name
Afghanistan       Southern Asia                  NaN        142.0
Åland Islands   Northern Europe                  NaN        150.0
Albania         Southern Europe                  NaN        150.0
Algeria         Northern Africa                  NaN          2.0
American Samoa        Polynesia                  NaN          9.0

                sub-region-code  intermediate-region-code
name
Afghanistan                34.0                       NaN
Åland Islands             154.0                       NaN
Albania                    39.0                       NaN
Algeria                    15.0                       NaN
American Samoa             61.0                       NaN
Note: The ?raw=true parameter works only with GitHub links, not with GitLab or Bitbucket links.
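As a rough sketch of the ?raw=true approach, the parameter can be added programmatically rather than by hand. The helper name add_raw_param is hypothetical; it only rewrites the URL string and does not check that the link actually points to GitHub.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_raw_param(url):
    """Return the URL with raw=true set in its query string.

    Hypothetical helper; existing query parameters are kept, and an
    existing "raw" parameter is overwritten rather than duplicated.
    """
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["raw"] = "true"
    return urlunsplit(parts._replace(query=urlencode(query)))


url = add_raw_param(
    "https://github.com/lukes/ISO-3166-Countries-with-Regional-Codes"
    "/blob/master/all/all.csv"
)
print(url)
# prints https://github.com/lukes/ISO-3166-Countries-with-Regional-Codes/blob/master/all/all.csv?raw=true
```

Going through urllib.parse instead of plain string concatenation avoids producing a second "?" when the link already carries a query string.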
You should provide the URL of the raw content. Try this:
import pandas as pd
url = 'https://raw.githubusercontent.com/lukes/ISO-3166-Countries-with-Regional-Codes/master/all/all.csv'
df = pd.read_csv(url, index_col=0)
print(df.head(5))
Output:
                alpha-2  ...  intermediate-region-code
name                     ...
Afghanistan          AF  ...                       NaN
Åland Islands        AX  ...                       NaN
Albania              AL  ...                       NaN
Algeria              DZ  ...                       NaN
American Samoa       AS  ...                       NaN