Get Last Monday in Spark
You can determine the next Monday with next_day
and then subtract a week. The required functions can be imported as follows:
from pyspark.sql.functions import next_day, date_sub
and a helper function defined as:

def previous_day(date, dayOfWeek):
    # next_day returns a date strictly after `date`, so when `date` already
    # falls on dayOfWeek, subtracting 7 days gives back `date` itself
    return date_sub(next_day(date, dayOfWeek), 7)
Finally, an example:
from pyspark.sql.functions import to_date
df = sc.parallelize([
    ("2016-10-26", )
]).toDF(["date"]).withColumn("date", to_date("date"))

df.withColumn("last_monday", previous_day("date", "monday")).show()
With the result:
+----------+-----------+
| date|last_monday|
+----------+-----------+
|2016-10-26| 2016-10-24|
+----------+-----------+
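
Because the helper takes the day name as a parameter, the same pattern gives the most recent occurrence of any weekday. A minimal sketch reusing the df defined above (the "last_friday" column name is just illustrative):

# 2016-10-26 is a Wednesday, so the most recent Friday is 2016-10-21
df.withColumn("last_friday", previous_day("date", "friday")).show()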
I found that PySpark's trunc function also works: truncating a date to "week" returns the Monday of that week.
import datetime

import pyspark.sql.functions as f

df = spark.createDataFrame([
    (datetime.date(2020, 10, 27), ),
    (datetime.date(2020, 12, 21), ),
    (datetime.date(2020, 10, 13), ),
    (datetime.date(2020, 11, 11), ),
], ["date_col"])
df = df.withColumn("first_day_of_week", f.trunc("date_col", "week"))
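
For reference, truncating each of the sample dates above to "week" should give the Monday of its week; the values below are what I would expect (exact formatting may vary by Spark version):

df.show()

+----------+-----------------+
|  date_col|first_day_of_week|
+----------+-----------------+
|2020-10-27|       2020-10-26|
|2020-12-21|       2020-12-21|
|2020-10-13|       2020-10-12|
|2020-11-11|       2020-11-09|
+----------+-----------------+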