How to find maximum value of a column in python dataframe
I'm coming from scala, but I do believe that this is also applicable on python.
val max = df.select(max("id")).first()
but you have first import the following :
from pyspark.sql.functions import max
The following can be used in pyspark:
df.select(max("id")).show()
You can use the aggregate max as also mentioned in the pyspark documentation link below:
Link : https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=agg
Code:
row1 = df1.agg({"id": "max"}).collect()[0]
if you are using pandas .max()
will work :
>>> df2=pd.DataFrame({'A':[1,5,0], 'B':[3, 5, 6]})
>>> df2['A'].max()
5
Else if it's a spark
dataframe:
Best way to get the max value in a Spark dataframe column