Syntax while setting schema for Pyspark.sql using StructType

It indicates whether the column allows null values: True for nullable, False for not nullable.

StructField(name, dataType, nullable): Represents a field in a StructType. The name of a field is indicated by name. The data type of a field is indicated by dataType. nullable is used to indicate if values of this field can have null values.

Refer to the Spark SQL and DataFrame Guide for more information.


You can also use a datatype string:

schema = 'Name STRING, DateTime TIMESTAMP, Age INTEGER'

Datatype strings are only lightly documented, but the docs do mention them. They're much more compact and readable than StructTypes.