Spark df.write.partitionBy runs very slowly
Try adding a repartition("day") before the write, like this:
spark
.sql(sql)
.repartition("day")
.write
.partitionBy("day")
.json(output_path)
This should speed up the write: the shuffle moves all rows for a given day into the same task, so each day=... output directory receives one file instead of one small file from every upstream task.
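If you want to check what the repartition actually does before writing, here is a minimal sketch; df stands in for the result of spark.sql(sql), and the column name "day" is taken from the question:

df = spark.sql(sql)
# Partition count before the shuffle
print(df.rdd.getNumPartitions())
# repartition("day") hash-partitions on the column; the resulting count
# defaults to spark.sql.shuffle.partitions (200 unless configured otherwise)
print(df.repartition("day").rdd.getNumPartitions())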
Alternatively, try repartition(n) with some starting number of partitions, then increase or decrease the number depending on how long the write takes:
spark
.sql(sql)
.repartition(100)  # starting point, not a recommendation; tune up or down based on write time
.write
.partitionBy("day")
.json(output_path)