Dask read_csv-- Mismatched dtypes found in `pd.read_csv`/`pd.read_table`

You can pass the sample parameter to read_csv, giving it an integer number of bytes to use when determining dtypes. For example, I had to set it to 25000000 for the types to be inferred correctly on my data, which has shape (171907, 161).

df = dd.read_csv("game_logs.csv", sample=25000000)

https://docs.dask.org/en/latest/dataframe-api.html#dask.dataframe.read_csv
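
To see why a larger sample can help, here is a minimal pure-Python sketch of sample-based type inference. It is not Dask's actual implementation: infer_column_types and the toy data are made up for illustration. It only shows that a value near the end of the file is invisible to a small sample taken from the start.

```python
import csv
import io


def infer_column_types(csv_text, sample_bytes):
    """Guess 'int64' or 'object' per column from the first sample_bytes only.

    Hypothetical helper for illustration; assumes the sample contains at
    least the header line.
    """
    sample = csv_text.encode("utf-8")[:sample_bytes].decode("utf-8", errors="ignore")
    # Drop a possibly truncated last line
    complete = sample[: sample.rfind("\n") + 1]
    rows = list(csv.reader(io.StringIO(complete)))
    header, body = rows[0], rows[1:]
    types = {}
    for i, name in enumerate(header):
        values = [row[i] for row in body]
        all_ints = all(v.lstrip("-").isdigit() for v in values)
        types[name] = "int64" if all_ints else "object"
    return types


# 1000 numeric IDs followed by one alphanumeric ID at the end of the file
lines = ["ARTICLE_ID,views"] + [f"{i},{i * 2}" for i in range(1000)] + ["A1000,5"]
text = "\n".join(lines) + "\n"

small = infer_column_types(text, sample_bytes=200)
large = infer_column_types(text, sample_bytes=len(text.encode("utf-8")))
print(small)  # the small sample never sees 'A1000'
print(large)  # the full-size sample does
```

With the small sample ARTICLE_ID is guessed as an integer column, which is the kind of guess that later fails when the non-numeric row is actually parsed.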

The message is suggesting that you change your call from

df = dd.read_csv('mylocation.csv', ...)

to

df = dd.read_csv('mylocation.csv', ..., dtype={'ARTICLE_ID': 'object'})

where you should change the file location and any other arguments to what you were using before. If this still doesn't work, then please update your question.
