Passing nullable columns as parameter to Spark SQL UDF
The issue is that null is not a valid value for Scala's `Int` (the backing type), while it is a valid value for `String`. `Int` is equivalent to Java's `int` primitive and must always hold a value. As a result, Spark never invokes the UDF for rows where the input is null, and the output simply stays null.
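Here is a minimal sketch of that behavior, assuming a toy DataFrame with a hypothetical `int_col` column:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Option[Int] produces a nullable integer column.
val df = Seq(Some(1), None, Some(3)).toDF("int_col")

// The parameter is a Scala Int (a primitive), which cannot hold null.
val incUdf = udf((x: Int) => x + 1)

// For the null row, Spark never calls the UDF; the result stays null.
df.withColumn("inc", incUdf($"int_col")).show()
// +-------+----+
// |int_col| inc|
// +-------+----+
// |      1|   2|
// |   null|null|
// |      3|   4|
// +-------+----+
```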
There are two ways to solve this:
- Change the function to accept `java.lang.Integer` (an object, which can be null); see the first sketch below.
- If you can't change the function, use `when`/`otherwise` to do something special in the null case, e.g. `when(col("int col").isNull, someValue).otherwise(<the original UDF call>)`; see the second sketch below.
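For the first option, a sketch reusing `df` from above (the `Option(...)` wrapping is one way to do it; any null-aware body works):

```scala
import org.apache.spark.sql.functions.{col, udf}

// Boxed java.lang.Integer can hold null, so the UDF is now called for null rows.
// Option(null) becomes None, which Spark writes back as a SQL null.
val incBoxed = udf((x: java.lang.Integer) => Option(x).map(_ + 1))

df.withColumn("inc", incBoxed(col("int_col"))).show()
```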
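For the second option, a sketch that keeps the original `Int`-based `incUdf` from the first example and handles nulls at the call site instead. The fallback value here (`-1`) stands in for `someValue` and is purely illustrative:

```scala
import org.apache.spark.sql.functions.{col, lit, when}

// Guard the call site: supply a fallback for null rows and only let
// the original UDF run on non-null values.
val result = df.withColumn(
  "inc",
  when(col("int_col").isNull, lit(-1))
    .otherwise(incUdf(col("int_col")))
)
result.show()
```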