How to update a pandas dataframe from multiple API calls?
Code Explanation
- Create the dataframe, `df`, with `pd.read_csv`.
- It is expected that all of the values in `'person_id'` are unique.
- Use `.apply` on `'person_id'` to call `prepare_data` once per row. `prepare_data` expects `'person_id'` to be a `str` or an `int`, as indicated by the type annotation `Union[int, str]`.
- Inside `prepare_data`, call the API with `call_api`, which returns a `dict`.
- Convert the list under the `'rents'` key of that `dict` into a dataframe, `data`, with `pd.json_normalize`.
- Use `.apply` on `'carId'` to call the API once per car and extract the `'mileage'`, which is added to `data` as a column.
- Add `'person_id'` to `data`, so it can be used to merge `df` with `s`, the combined result of all the `prepare_data` calls.
- `s` is a `pd.Series` of dataframes; combine it into a single dataframe with `pd.concat`, and then merge `df` and `s` on `'person_id'`.
- Save to a csv with `df.to_csv`, in the desired form.
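The uniqueness assumption matters because `prepare_data` runs once per row of `df`: a repeated `person_id` would be fetched twice and appear twice in `s`, so the merge would pair each of that person's rows with two copies of their rents. If uniqueness can't be guaranteed, one option is to call the API once per distinct id. A minimal sketch that runs offline; `fake_prepare_data` is a hypothetical stand-in for `prepare_data` that returns fixed data instead of calling the API:

```python
import pandas as pd


def fake_prepare_data(uid):
    # Hypothetical stand-in for prepare_data: two rents per person, no network calls
    data = pd.DataFrame({'carId': [6638, 5566], 'price': [1000, 2000]})
    data['person_id'] = uid
    return data


# person 1000 appears twice, which breaks the "unique person_id" assumption
df = pd.DataFrame({'person_id': [1000, 400, 1000],
                   'name': ['Joseph', 'Sam', 'Joseph']})
print(df.person_id.is_unique)  # False

# Call the (fake) API once per distinct id instead of once per row
ids = pd.Series(df.person_id.unique())
s = pd.concat(ids.apply(fake_prepare_data).tolist(), ignore_index=True)

# Each row of df is now paired with exactly one copy of its person's rents
result = df.merge(s, on='person_id')
print(len(result))  # 6
```

With the per-row `apply` instead, the same `df` would produce 10 rows rather than 6 (8 for person 1000, 2 for person 400).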
Potential Issues
- If there's an issue, it's most likely to occur in the `call_api` function.
- As long as `call_api` returns a `dict` like the response shown in the question, the rest of the code will produce the desired output.
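One way to make a malformed response easier to diagnose is to validate it before `prepare_data` indexes into it, so the failure names the problem instead of surfacing as a bare `KeyError`. A minimal sketch; `check_rents_response` is a hypothetical helper, and the expected shape (a `'rents'` list of dicts that each carry a `'carId'`) is an assumption inferred from the example output:

```python
from typing import Any


def check_rents_response(resp: Any) -> list:
    """Return resp['rents'] if it has the shape prepare_data relies on, else raise ValueError.

    Assumed shape (hypothetical, inferred from the example): {'rents': [{'carId': ..., ...}, ...]}
    """
    if not isinstance(resp, dict):
        raise ValueError(f'expected a dict, got {type(resp).__name__}')
    if 'rents' not in resp:
        raise ValueError(f"missing 'rents' key; keys present: {list(resp)}")
    rents = resp['rents']
    if not isinstance(rents, list):
        raise ValueError(f"'rents' should be a list, got {type(rents).__name__}")
    for i, rent in enumerate(rents):
        # every rent needs a carId, because prepare_data looks up mileage by it
        if not isinstance(rent, dict) or 'carId' not in rent:
            raise ValueError(f"rent #{i} has no 'carId': {rent!r}")
    return rents
```

In `prepare_data` it would replace the direct lookup: `rents = check_rents_response(call_api(d_url))`. An empty `'rents'` list passes this check but still fails later, because `pd.json_normalize([])` returns a dataframe with no columns, so `data.carId` raises an `AttributeError`.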
```python
import pandas as pd
import requests
from typing import Union


def call_api(url: str) -> dict:
    # timeout (in seconds) is an arbitrary choice; without one, requests can wait indefinitely
    r = requests.get(url, timeout=30)
    # raise an HTTPError for 4xx/5xx responses instead of failing later on a missing key
    r.raise_for_status()
    return r.json()


def prepare_data(uid: Union[int, str]) -> pd.DataFrame:
    d_url = f'http://api.myendpoint.intranet/get-data/{uid}'
    m_url = 'http://api.myendpoint.intranet/get-mileage/'
    # get the rent data from the api call
    rents = call_api(d_url)['rents']
    # normalize rents into a dataframe, one row per rent
    data = pd.json_normalize(rents)
    # get the mileage for each car from the api and add it to data as a column
    data['mileage'] = data.carId.apply(lambda cid: call_api(f'{m_url}{cid}')['mileage'])
    # add person_id as a column to data, which will be used to merge data to df
    data['person_id'] = uid
    return data


# read data from file
df = pd.read_csv('file.csv', sep=';')

# call prepare_data once per person_id
s = df.person_id.apply(prepare_data)

# s is a Series of DataFrames, which can be combined into one DataFrame with pd.concat
s = pd.concat(s.tolist(), ignore_index=True)

# join df with s, on person_id
df = df.merge(s, on='person_id')

# save to csv
df.to_csv('output.csv', sep=';', index=False)
```
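In the example below, both people rent the same two cars, so the code above requests each car's mileage once per person. If mileage can be assumed not to change during a single run, the lookups can be memoized with `functools.lru_cache`. A minimal sketch; `get_mileage` and the call-counting `fake_call_api` are hypothetical stand-ins so it runs without the intranet API:

```python
from functools import lru_cache

calls = []  # records every URL the fake API receives


def fake_call_api(url: str) -> dict:
    # Hypothetical stand-in for call_api; the real one would hit the mileage endpoint
    calls.append(url)
    return {'mileage': 1000.0}


M_URL = 'http://api.myendpoint.intranet/get-mileage/'


@lru_cache(maxsize=None)
def get_mileage(car_id: int) -> float:
    # Only the first request per car_id reaches the API; repeats are served from the cache
    return fake_call_api(f'{M_URL}{car_id}')['mileage']


for cid in [6638, 5566, 6638, 5566]:  # the same two cars, rented by two people
    get_mileage(cid)

print(len(calls))  # 2
```

In `prepare_data`, the mileage line would then become `data['mileage'] = data.carId.apply(get_mileage)`, with `get_mileage` calling the real `call_api`. Because the cache lives at module level, it is shared across all `prepare_data` calls.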
- If there are any errors when running this code:
  - Leave a comment to let me know.
  - Edit your question and paste the entire traceback, as text, into a code block.
Example
```
# given the following start dataframe
   person_id    name  flag
0       1000  Joseph     1
1        400     Sam     1

# resulting dataframe using the same data for both id 1000 and 400
   person_id    name  flag  carId  price rentStatus  mileage
0       1000  Joseph     1   6638   1000     active   1000.0
1       1000  Joseph     1   5566   2000     active   1000.0
2        400     Sam     1   6638   1000     active   1000.0
3        400     Sam     1   5566   2000     active   1000.0
```
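To try the whole pipeline without access to the intranet API, `call_api` can be swapped for a stub. A sketch, assuming response shapes inferred from the example output (`{'rents': [{'carId': ..., 'price': ..., 'rentStatus': ...}]}` for the data endpoint and `{'mileage': ...}` for the mileage endpoint); the `RENTS` payload and URL routing are hypothetical, and `io.StringIO` stands in for `file.csv`:

```python
import io

import pandas as pd

D_URL = 'http://api.myendpoint.intranet/get-data/'
M_URL = 'http://api.myendpoint.intranet/get-mileage/'

# Hypothetical payload; both people get the same rents, as in the example
RENTS = {'rents': [
    {'carId': 6638, 'price': 1000, 'rentStatus': 'active'},
    {'carId': 5566, 'price': 2000, 'rentStatus': 'active'},
]}


def call_api(url: str) -> dict:
    # Offline stub: route on the URL prefix instead of making an HTTP request
    if url.startswith(D_URL):
        return RENTS
    if url.startswith(M_URL):
        return {'mileage': 1000.0}
    raise ValueError(f'unexpected url: {url}')


def prepare_data(uid) -> pd.DataFrame:
    # same steps as the answer's prepare_data, using the stubbed call_api
    data = pd.json_normalize(call_api(f'{D_URL}{uid}')['rents'])
    data['mileage'] = data.carId.apply(lambda cid: call_api(f'{M_URL}{cid}')['mileage'])
    data['person_id'] = uid
    return data


csv_text = 'person_id;name;flag\n1000;Joseph;1\n400;Sam;1\n'
df = pd.read_csv(io.StringIO(csv_text), sep=';')
s = pd.concat(df.person_id.apply(prepare_data).tolist(), ignore_index=True)
df = df.merge(s, on='person_id')
print(df)
```

This should print the same rows and columns as the resulting dataframe above, which makes it a convenient baseline before pointing `call_api` back at the real endpoints.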