How to let pyspark display the whole query plan instead of ... if there are many fields?
I am afraid there is no easy way.
https://github.com/apache/spark/blob/v2.4.2/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L57
Each metadata value is hard-coded to be abbreviated to no more than 100 characters:
override def simpleString: String = {
  val metadataEntries = metadata.toSeq.sorted.map {
    case (key, value) =>
      key + ": " + StringUtils.abbreviate(redact(value), 100)
  }
  // ...
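The abbreviation comes from Apache Commons Lang's StringUtils.abbreviate, which keeps the first maxWidth - 3 characters of a long value and appends "...". A minimal sketch of the effect (the sample path is made up; commons-lang3 is already on Spark's classpath):

```scala
import org.apache.commons.lang3.StringUtils

// A metadata value like the ones in a FileSourceScanExec plan node
// (hypothetical path, just long enough to trigger truncation).
val longValue = "InMemoryFileIndex[hdfs://namenode/warehouse/" + "a" * 100 + "]"

// Anything over 100 chars is cut to 97 chars plus "...",
// which is why long file listings show up as "..." in the plan.
println(StringUtils.abbreviate(longValue, 100).length)         // 100
println(StringUtils.abbreviate(longValue, 100).endsWith("...")) // true
```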
In the end I have been using the following workaround:
def full_file_meta(f: FileSourceScanExec) = {
  // Re-render the interesting metadata entries without abbreviation
  val metadataEntries = f.metadata.toSeq.sorted.flatMap {
    case (key, value) if Set(
      "Location", "PartitionCount",
      "PartitionFilters", "PushedFilters"
    ).contains(key) =>
      Some(key + ": " + value.toString)
    case _ => None
  }

  val metadataStr = metadataEntries.mkString("[\n  ", ",\n  ", "\n]")
  s"${f.nodeNamePrefix}${f.nodeName}$metadataStr"
}
val ep = data.queryExecution.executedPlan

// Collect the full metadata for every file scan node in the plan
print(ep.flatMap {
  case f: FileSourceScanExec => full_file_meta(f) :: Nil
  case _ => Nil
}.mkString(",\n"))
It is a hack, but better than nothing.
Spark 3.0 introduced explain('formatted'), which lays out the information differently and applies no truncation.
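On Spark 3.0+ the workaround above is therefore unnecessary; assuming data is the same DataFrame as before, it is just:

```scala
// Spark 3.0+: prints each scan node's metadata (Location,
// PushedFilters, ...) in its own section, without truncation.
data.explain("formatted")
```

The pyspark equivalent is data.explain(mode="formatted").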