Apache Spark reads for S3: can't pickle thread.lock objects
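Your s3_client isn't serialisable. It holds thread locks internally, so when Spark tries to pickle the function you pass to flatMap (along with the s3_client it references) to ship it to the workers, pickling fails with this error.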
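Use mapPartitions instead of flatMap, and initialise s3_client inside the function you pass to it. That will:
- create s3_client on each worker, so it never has to be pickled
- reduce initialisation overhead, since the client is built once per partition rather than once per record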
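
A minimal sketch of what that could look like, assuming the RDD holds S3 object keys and each object is newline-delimited UTF-8 text. The names `fetch_partition`, `client_factory` and the bucket name are hypothetical, not taken from your code; `client_factory` is only there so the function can be tried without real S3 access.

```python
def fetch_partition(keys, bucket, client_factory=None):
    """Yield the lines of every S3 object named in one partition."""
    if client_factory is None:
        # Imported and built here, on the worker, so the driver never
        # has to pickle a client (or the thread locks it holds).
        import boto3
        client_factory = lambda: boto3.client("s3")

    s3_client = client_factory()  # created once per partition, not per key

    for key in keys:
        body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
        # Assumption: objects are UTF-8 text; adjust for other formats.
        for line in body.decode("utf-8").splitlines():
            yield line


# Hypothetical usage on the driver ("my-bucket" and keys_rdd are placeholders):
# lines = keys_rdd.mapPartitions(lambda part: fetch_partition(part, "my-bucket"))
```

Because the function yields several lines per key, the result has the same shape as what flatMap would have given you; only plain strings (the keys and the bucket name) are captured in the closure, and those pickle fine.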