PySpark: Creating a timestamp column

I am not sure about 2.1.0, but on 2.2.1 at least you can simply do:

from pyspark.sql import functions as F
# withColumn returns a new DataFrame, so keep the result
df = df.withColumn('Age', F.current_timestamp())

Hope it helps!

Assuming you have the DataFrame from your code snippet and want the same timestamp in every row.

Let me create a dummy DataFrame first.

>>> rows = [{'name': 'Alice', 'age': 1},{'name': 'Again', 'age': 2}]
>>> df = spark.createDataFrame(rows)

>>> import datetime
>>> timestamp = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
>>> type(timestamp)
<class 'str'>

>>> from pyspark.sql.functions import lit,unix_timestamp
>>> timestamp
'2017-08-02 16:16:14'
>>> new_df = df.withColumn('time',unix_timestamp(lit(timestamp),'yyyy-MM-dd HH:mm:ss').cast("timestamp"))
>>> new_df.show(truncate = False)
+---+-----+---------------------+
|age|name |time                 |
+---+-----+---------------------+
|1  |Alice|2017-08-02 16:16:14.0|
|2  |Again|2017-08-02 16:16:14.0|
+---+-----+---------------------+

>>> new_df.printSchema()
root
 |-- age: long (nullable = true)
 |-- name: string (nullable = true)
 |-- time: timestamp (nullable = true)
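
The driver-side half of this approach can be checked without a Spark session. Below is a minimal sketch using only the standard library; the helper name `snapshot_timestamp` and the constant `PY_FORMAT` are hypothetical and not part of PySpark. It formats one instant with the `strftime` layout that corresponds to the `'yyyy-MM-dd HH:mm:ss'` pattern passed to `unix_timestamp`, then parses the string back to confirm that only the sub-second part is lost.

```python
import datetime

# strftime layout corresponding to the 'yyyy-MM-dd HH:mm:ss' pattern given to unix_timestamp
PY_FORMAT = '%Y-%m-%d %H:%M:%S'


def snapshot_timestamp(now=None):
    """Return one timestamp string to reuse for every row (hypothetical helper)."""
    if now is None:
        now = datetime.datetime.now()
    return now.strftime(PY_FORMAT)


# A fixed instant, with microseconds, to make the round trip reproducible
fixed = datetime.datetime(2017, 8, 2, 16, 16, 14, 123456)
text = snapshot_timestamp(fixed)
parsed = datetime.datetime.strptime(text, PY_FORMAT)

print(text)
# The string carries whole seconds only, so microseconds are dropped
print(parsed == fixed.replace(microsecond=0))
```

Because the string is built once on the driver and wrapped in `lit`, every row receives the same value, which is the point of this answer.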

Adding on to balalaika's answer: if, like me, you only want to add the date without the time, you can use the code below.

from pyspark.sql import functions as F
# withColumn returns a new DataFrame, so keep the result
df = df.withColumn('Age', F.current_date())

Hope this helps
