Spark Parquet Statistics (min/max) integration

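PARQUET-686 made the reader intentionally ignore min/max statistics on binary fields, because those statistics were computed with a signed byte comparison and can be wrong for strings containing non-ASCII characters. You can override this behavior by setting parquet.strings.signed-min-max.enabled to true.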
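To see why signed statistics on strings can be misleading, here is a small plain-Python sketch (not Parquet code; the helper names and sample values are made up for illustration). It compares the min/max you get when UTF-8 bytes are ordered as signed bytes, as a Java byte would be, against the unsigned order:

```python
def to_signed(b: int) -> int:
    """Reinterpret a byte value 0..255 as a signed byte (-128..127), like a Java byte."""
    return b - 256 if b > 127 else b

def signed_key(s: str):
    # Sort key: UTF-8 bytes compared as signed values (hypothetical helper).
    return [to_signed(b) for b in s.encode("utf-8")]

def unsigned_key(s: str):
    # Sort key: UTF-8 bytes compared as unsigned values (hypothetical helper).
    return list(s.encode("utf-8"))

# Sample column values; "\u00e9mile" starts with a non-ASCII character (byte 0xC3).
values = ["apple", "zebra", "\u00e9mile"]

signed_min = min(values, key=signed_key)
signed_max = max(values, key=signed_key)
unsigned_min = min(values, key=unsigned_key)
unsigned_max = max(values, key=unsigned_key)

print("signed  :", signed_min, signed_max)      # min is the non-ASCII string, max is "zebra"
print("unsigned:", unsigned_min, unsigned_max)  # min is "apple", max is the non-ASCII string
```

With ASCII-only values both orders agree, which is the situation where turning the override on should be harmless.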

After setting that config, parquet-tools shows the min/max values for binary fields.
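For a Spark job, one way to pass the property is through the Hadoop configuration. This is a minimal sketch, assuming the Parquet reader picks the key up from there; the helper name is made up, and it accepts any object with a `set(key, value)` method:

```python
# Property name taken from the text above.
SIGNED_MIN_MAX_KEY = "parquet.strings.signed-min-max.enabled"

def enable_signed_min_max(hadoop_conf) -> None:
    """Set the override on a Hadoop-Configuration-like object (hypothetical helper)."""
    hadoop_conf.set(SIGNED_MIN_MAX_KEY, "true")

# Usage inside PySpark (assumes an existing SparkSession named `spark`):
#   enable_signed_min_max(spark.sparkContext._jsc.hadoopConfiguration())
```

Only enable this if you are confident the signed ordering is correct for your string data.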

More details are in another Stack Overflow question of mine.


This has been resolved in Spark 2.4.0, which upgraded Parquet from 1.8.2 to 1.10.0.

[SPARK-23972] Update Parquet from 1.8.2 to 1.10.0

With this upgrade, all column types, whether Int, String, or Decimal, contain min/max statistics.