Reading CSV files in chunks with `readr::read_csv_chunked()`
I figured out that the function passed to `DataFrameCallback$new()` always needs one additional argument (`pos` in the example from the documentation). This argument does not have to be used, so I do not really understand its purpose. But at least it works this way.
Does anyone know more details about this second argument?
`pos` means position: it is the index of the first line in each chunk. The callback function receives the chunk itself as its first argument, so you can process every line in the chunk and use `pos` to tell where that chunk sits in the file.
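As a minimal sketch of what `pos` can be used for, the callback below tags each row with its row number in the whole file before filtering. The names `add_row_number`, `file_row` and `result` are made up for illustration, and the sketch assumes `pos` counts data rows starting at 1, which matches what the documentation example prints.

```r
library(readr)

# Hypothetical callback: record where each row sits in the original file.
# pos is the index of the chunk's first row, so the rows of the chunk
# are numbered pos, pos + 1, ..., pos + nrow(x) - 1.
add_row_number <- function(x, pos) {
  x$file_row <- seq(from = pos, length.out = nrow(x))
  # Keep only the rows of interest, together with their original position
  x[x$hp > 100, c("file_row", "mpg", "hp")]
}

result <- read_csv_chunked(
  readr_example("mtcars.csv"),
  DataFrameCallback$new(add_row_number),
  chunk_size = 5
)
```

Without `pos`, the filtered rows could not be traced back to their location in the file, because each chunk is passed to the callback on its own.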
Below is the official example from https://readr.tidyverse.org/reference/callback.html
`ChunkCallback`: Callback interface definition; all callback functions should inherit from this class.
`SideEffectChunkCallback`: Callback function that is used only for side effects; no results are returned.
`DataFrameCallback`: Callback function that combines each result together at the end.
`AccumulateCallback`: Callback function that accumulates a single result. Requires the parameter `acc` to specify the initial value of the accumulator. The parameter `acc` is `NULL` by default.
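The accumulating variant is the one whose callback takes a third argument. A short sketch, assuming a readr version recent enough to export `AccumulateCallback`; the names `add_mpg` and `total_mpg` are chosen here for illustration:

```r
library(readr)

# The callback receives the chunk, its starting position, and the value
# accumulated so far; whatever it returns becomes the new accumulator.
add_mpg <- function(x, pos, acc) sum(x$mpg) + acc

total_mpg <- read_csv_chunked(
  readr_example("mtcars.csv"),
  AccumulateCallback$new(add_mpg, acc = 0),  # start the running sum at 0
  chunk_size = 5
)
```

Passing `acc = 0` explicitly matters here, because adding a number to the default `NULL` would not produce a usable sum.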
library(readr)

# Print starting line of each chunk
f <- function(x, pos) print(pos)
read_lines_chunked(readr_example("mtcars.csv"), SideEffectChunkCallback$new(f), chunk_size = 5)
# The ListCallback can be used for more flexible output
f <- function(x, pos) x$mpg[x$hp > 100]
read_csv_chunked(readr_example("mtcars.csv"), ListCallback$new(f), chunk_size = 5)
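If the position really is not needed, one option is to wrap an ordinary one-argument function so that it still has the two-argument form the callback classes expect. This is only a sketch: `ignore_pos`, `heavy_cars` and `heavy` are hypothetical names, not part of readr.

```r
library(readr)

# Hypothetical helper: adapt a one-argument function to the (x, pos)
# signature used by the chunk callbacks, discarding pos.
ignore_pos <- function(fun) {
  function(x, pos) fun(x)
}

# An ordinary function that knows nothing about chunk positions
heavy_cars <- function(x) x[x$wt > 4, ]

heavy <- read_csv_chunked(
  readr_example("mtcars.csv"),
  DataFrameCallback$new(ignore_pos(heavy_cars)),
  chunk_size = 5
)
```

This keeps existing data frame functions reusable for chunked reading without adding an unused `pos` argument to each of them.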