Create Spark DataFrame schema from JSON schema representation
I am posting a PySpark version of the answer given by Assaf:
import json
from pyspark.sql.types import StructType
# Save the schema of the original DataFrame as a JSON string:
schema_json = df.schema.json()
# Restore the schema from the JSON string:
new_schema = StructType.fromJson(json.loads(schema_json))
There are two steps for this: creating the JSON string from an existing DataFrame, and creating the schema from the previously saved JSON string.
Creating the JSON string from an existing DataFrame:
val schema = df.schema
val jsonString = schema.json
Creating a schema from the JSON string:
import org.apache.spark.sql.types.{DataType, StructType}
val newSchema = DataType.fromJson(jsonString).asInstanceOf[StructType]