How to subtract a column of days from a column of dates in Pyspark?

I was able to solve this using selectExpr.

df.selectExpr('date_sub(date_col, day_col) as subtracted_dates')

If you want to append the new column to the original DataFrame, just add '*' to the expression:

df.selectExpr('*', 'date_sub(date_col, day_col) as subtracted_dates')
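For reference, here is a minimal end-to-end sketch of the selectExpr approach; the toy DataFrame and the column names date_col/day_col are assumptions for illustration, not from the question:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Toy data: a date column plus a column with the number of days to subtract
df = spark.createDataFrame(
    [('2015-04-08', 1), ('2015-05-10', 3)],
    ['date_col', 'day_col'],
).selectExpr('cast(date_col as date) as date_col', 'day_col')

df.selectExpr('*', 'date_sub(date_col, day_col) as subtracted_dates').show()
# Expected results:
# 2015-04-08 - 1 day  -> 2015-04-07
# 2015-05-10 - 3 days -> 2015-05-07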

Use the expr function (if the number of days to subtract comes from another column):

>>> from pyspark.sql.functions import expr, date_sub
>>> df.withColumn('subtracted_dates', expr('date_sub(date_col, days_col)'))

Use the withColumn function with date_sub (if the number of days to subtract is a literal integer):

>>> df.withColumn('subtracted_dates', date_sub('date_col', <int_literal_value>))
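Put together, a small REPL sketch of both variants (this assumes an active SparkSession named spark and a toy DataFrame like the one above, with the days column named days_col to match the expression):

>>> from pyspark.sql.functions import expr, date_sub
>>> df = spark.createDataFrame(
...     [('2015-04-08', 2)], ['date_col', 'days_col']
... ).withColumn('date_col', expr('cast(date_col as date)'))
>>> # Per-row day counts: date_sub evaluated inside a SQL expression
>>> df.withColumn('subtracted_dates', expr('date_sub(date_col, days_col)')).show()
>>> # Fixed day count: the Python date_sub function with an int literal
>>> df.withColumn('subtracted_dates', date_sub('date_col', 5)).show()

The first call subtracts the per-row value in days_col (2 days here); the second subtracts a fixed 5 days from every row.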

Not the most elegant solution ever, but if you don't want to hack SQL expressions in Scala (not that it should be hard, but these are private to sql), something like this should do the trick:

from pyspark.sql import Column

def date_sub_(c1: Column, c2: Column) -> Column:
    # Cast the date to epoch seconds, subtract c2 days' worth of seconds,
    # then cast back through timestamp to get a date again.
    return ((c1.cast("timestamp").cast("long") - 60 * 60 * 24 * c2)
            .cast("timestamp").cast("date"))

For Python 2.x, just drop the type annotations.
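A quick usage sketch of date_sub_ (again assuming the SparkSession and toy column names from the earlier examples):

from pyspark.sql.functions import col

df = spark.createDataFrame(
    [('2015-04-08', 1)], ['date_col', 'day_col']
).withColumn('date_col', col('date_col').cast('date'))

df.withColumn('subtracted_dates', date_sub_(col('date_col'), col('day_col'))).show()

One caveat with this approach: it treats a day as exactly 60 * 60 * 24 seconds, and the casts go through the session time zone, so the result may be off by a day when the subtracted span crosses a DST transition; the built-in date_sub does calendar arithmetic and avoids this.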