Number of reduce tasks in Spark

Yes, @svgd, that is the correct parameter. Here is how you set it in Scala:

// Set number of shuffle partitions to 3
sqlContext.setConf("spark.sql.shuffle.partitions", "3")
// Verify the setting 
sqlContext.getConf("spark.sql.shuffle.partitions")

It's spark.sql.shuffle.partitions that you're after. According to the Spark SQL performance tuning guide:

| Property Name                | Default | Meaning                                                                                   |
|------------------------------|---------|-------------------------------------------------------------------------------------------|
| spark.sql.shuffle.partitions | 200     | Configures the number of partitions to use when shuffling data for joins or aggregations. |
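To see the setting take effect, you can count the partitions of a shuffled result. This is a minimal sketch assuming the same `sqlContext` as above plus a running `SparkContext` named `sc` (Spark 1.x style, matching the `sqlContext` API in this answer):

```scala
// Sketch: confirm the configured shuffle partition count is used.
// Assumes an existing SparkContext `sc` and SQLContext `sqlContext`.
import sqlContext.implicits._

sqlContext.setConf("spark.sql.shuffle.partitions", "3")

val df = sc.parallelize(1 to 100).toDF("n")

// groupBy triggers a shuffle, so the aggregated result should land
// in spark.sql.shuffle.partitions = 3 partitions.
val grouped = df.groupBy($"n" % 10).count()
println(grouped.rdd.partitions.length)  // 3
```

Note that this applies to every shuffle in the query, so a single global value is a trade-off between small and large stages.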

A related option is spark.default.parallelism, which determines the 'default number of partitions in RDDs returned by transformations like join, reduceByKey, and parallelize when not set by user'. However, Spark SQL appears to ignore it; it is only relevant when working with plain RDDs.
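To illustrate the RDD-only behaviour, here is a sketch of a standalone app (hypothetical names, assumes Spark on the classpath) where spark.default.parallelism drives the partition count of a reduceByKey with no explicit numPartitions argument:

```scala
// Sketch: spark.default.parallelism applies to plain RDD shuffles.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local[*]")
  .setAppName("parallelism-demo")          // hypothetical app name
  .set("spark.default.parallelism", "4")
val sc = new SparkContext(conf)

val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

// reduceByKey shuffles; with no numPartitions argument it falls back
// to spark.default.parallelism, i.e. 4 partitions here.
val reduced = pairs.reduceByKey(_ + _)
println(reduced.partitions.length)  // 4

sc.stop()
```

If you pass an explicit partition count, e.g. reduceByKey(_ + _, 8), it overrides the default for that one shuffle.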