Date and Interval Addition in SparkSQL
This is probably a matter of conversions to local time. INTERVAL casts data to TIMESTAMP and then back to DATE:
scala> spark.sql("SELECT CAST('1997-09-19' AS DATE) + INTERVAL 1 weeks").explain
== Physical Plan ==
*Project [10130 AS CAST(CAST(CAST(1997-09-19 AS DATE) AS TIMESTAMP) + interval 1 weeks AS DATE)#19]
+- Scan OneRowRelation[]
(note the second and third CASTs) and Spark is known to be inconsistent when handling timestamps.
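A quick way to check whether the session time zone is involved is to re-run the query under different time zones. A minimal sketch, assuming a Spark 2.x shell where the spark.sql.session.timeZone setting is available:
scala> spark.conf.set("spark.sql.session.timeZone", "UTC")
scala> spark.sql("SELECT CAST('1997-09-19' AS DATE) + INTERVAL 1 weeks").show()
scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
scala> spark.sql("SELECT CAST('1997-09-19' AS DATE) + INTERVAL 1 weeks").show()
If the two results differ, the intermediate TIMESTAMP conversion is what shifts the date.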
DATE_ADD should exhibit more stable behavior:
scala> spark.sql("SELECT DATE_ADD(CAST('1997-09-19' AS DATE), 7)").explain
== Physical Plan ==
*Project [10130 AS date_add(CAST(1997-09-19 AS DATE), 7)#27]
+- Scan OneRowRelation[]
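Running the same expression with show() instead of explain should return the expected date, 1997-09-26:
scala> spark.sql("SELECT DATE_ADD(CAST('1997-09-19' AS DATE), 7)").show()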
As of Spark 3, this bug has been fixed. Let's create a DataFrame with the dates you mentioned and add a week interval. First, create the DataFrame:
import java.sql.Date
import org.apache.spark.sql.functions.expr
import spark.implicits._

val df = Seq(
  Date.valueOf("1999-09-19"),
  Date.valueOf("1997-09-19")
).toDF("some_date")
Add a week interval:
df
.withColumn("plus_one_week", expr("some_date + INTERVAL 1 week"))
.show()
+----------+-------------+
| some_date|plus_one_week|
+----------+-------------+
|1999-09-19| 1999-09-26|
|1997-09-19| 1997-09-26|
+----------+-------------+
You can also get this same result with the make_interval() SQL function:
df
.withColumn("plus_one_week", expr("some_date + make_interval(0, 0, 1, 0, 0, 0, 0)"))
.show()
We're working on getting make_interval() exposed as Scala/PySpark functions, so it won't be necessary to use expr to access the function.
date_add only works for adding days, so it's limited. make_interval() is a lot more powerful because it lets you add any combination of years, months, weeks, days, hours, minutes, and seconds.
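For example, here is a hedged sketch contrasting the two on the df defined above (the make_interval argument order of years, months, weeks, days, hours, mins, secs is inferred from the one-week example earlier):
import org.apache.spark.sql.functions.{col, date_add}

df
  // date_add can only shift by whole days, so one week is spelled as 7
  .withColumn("plus_one_week", date_add(col("some_date"), 7))
  // make_interval mixes units: here 1 year, 2 months, and 3 days
  .withColumn("plus_mixed", expr("some_date + make_interval(1, 2, 0, 3, 0, 0, 0)"))
  .show()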