Spark DataFrame Nested Case When Statement
For more complex logic, I prefer to use UDFs for better readability:
import org.apache.spark.sql.functions.{col, udf}

// UDF returning "Y", "N", or null depending on tc and amt
val selectCase = udf((tc: String, amt: String) =>
  if (Seq("a", "b").contains(tc)) "Y"
  else if (tc == "a" && amt.toInt <= 0) "N"
  else null
)

dataset1.withColumn("REASON", selectCase(col("tc"), col("amt")))
  .show
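For reference, here is a minimal self-contained sketch of the UDF approach; the SparkSession setup and the sample rows in dataset1 are assumptions for illustration, not from the question:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical session and sample data, assumed purely to make the example runnable
val spark = SparkSession.builder().master("local[*]").appName("nested-case-when").getOrCreate()
import spark.implicits._

val dataset1 = Seq(("a", "5"), ("b", "-2"), ("c", "0")).toDF("tc", "amt")

val selectCase = udf((tc: String, amt: String) =>
  if (Seq("a", "b").contains(tc)) "Y"
  else if (tc == "a" && amt.toInt <= 0) "N" // never reached: tc == "a" already matched above
  else null
)

dataset1.withColumn("REASON", selectCase(col("tc"), col("amt"))).show()
// +---+---+------+
// | tc|amt|REASON|
// +---+---+------+
// |  a|  5|     Y|
// |  b| -2|     Y|
// |  c|  0|  null|
// +---+---+------+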
There is no nesting here, therefore there is no need for otherwise. All you need is chained when:
import spark.implicits._
import org.apache.spark.sql.functions.when

when($"tc" isin ("a", "b"), "Y")
  .when($"tc" === "a" && $"amt" >= 0, "N")
ELSE NULL is implicit so you can omit it completely.
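Applied to a DataFrame it might look like the sketch below (dataset1 again being the hypothetical frame from the UDF example); rows matching neither predicate come out as NULL without an explicit otherwise:

dataset1
  .withColumn("REASON",
    when($"tc" isin ("a", "b"), "Y")
      .when($"tc" === "a" && $"amt" >= 0, "N"))
  .show()
// +---+---+------+
// | tc|amt|REASON|
// +---+---+------+
// |  a|  5|     Y|
// |  b| -2|     Y|
// |  c|  0|  null|
// +---+---+------+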
The pattern you use is more applicable for folding over a data structure:
val cases = Seq(
  ($"tc" isin ("a", "b"), "Y"),
  ($"tc" === "a" && $"amt" >= 0, "N")
)
where when-otherwise naturally follows the recursion pattern and null provides the base case.
import org.apache.spark.sql.functions.lit

cases.foldLeft(lit(null)) {
  case (acc, (expr, value)) => when(expr, value).otherwise(acc)
}
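One subtlety worth flagging, sketched below: each step of the fold wraps the previous accumulator in otherwise, so the last entry of cases becomes the outermost, first-evaluated branch. Assigning the result to a column might look like this (dataset1 again being the hypothetical frame from above):

// The fold expands, innermost first, to the equivalent of:
//   when($"tc" === "a" && $"amt" >= 0, "N")
//     .otherwise(when($"tc" isin ("a", "b"), "Y")
//       .otherwise(lit(null)))
// so here, unlike in the chained version above, the "N" case is checked first.
val reason = cases.foldLeft(lit(null)) {
  case (acc, (expr, value)) => when(expr, value).otherwise(acc)
}

dataset1.withColumn("REASON", reason).show()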
Please note that it is impossible to reach the "N" outcome with this chain of conditions. If tc is equal to "a", it will be captured by the first clause; if it is not, it will fail to satisfy both predicates and default to NULL. You should rather:
when($"tc" === "a" && $"amt" >= 0, "N")
.when($"tc" isin ("a", "b"), "Y")