Spark import of Parquet files converts strings to bytearray
I ran into the same problem. Adding
sqlContext.setConf("spark.sql.parquet.binaryAsString","true")
right after creating my SQLContext solved it for me.
For people using SparkSession, it is:
spark = SparkSession.builder.config('spark.sql.parquet.binaryAsString', 'true').getOrCreate().newSession()
For Spark 2.0 or later, set it as a runtime option:
spark.conf.set("spark.sql.parquet.binaryAsString","true")
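To put the runtime option in context: some Parquet writers (Impala, Hive, and older Spark SQL versions) store strings as plain binary without marking them as text, and this flag tells Spark SQL to read such columns as strings. Below is a minimal PySpark sketch of setting the option and then reading a file. The helper name read_parquet_strings and the file path are made up for illustration, and spark is assumed to be an already created SparkSession.

```python
def read_parquet_strings(spark, path):
    """Read a Parquet file so that un-annotated binary columns load as strings.

    `spark` is assumed to be an existing SparkSession (Spark 2.0 or later);
    `path` is whatever Parquet file or directory you want to load.
    This helper is hypothetical, not part of the Spark API.
    """
    # Set the option before the read, since it affects how Spark
    # interprets the Parquet schema when the DataFrame is created.
    spark.conf.set("spark.sql.parquet.binaryAsString", "true")
    return spark.read.parquet(path)


# Example usage (placeholder path):
# df = read_parquet_strings(spark, "/path/to/data.parquet")
# df.printSchema()  # check whether the affected columns now show as string
```

If the columns still come back as binary, it is worth checking that the option was set on the same session that performs the read.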