Aggregate one variable on a subset of the DataFrame and the other on the total DataFrame: pandas code example
Example 1: Groups the DataFrame using the specified columns
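# These calls use the PySpark DataFrame API (groupBy, collect), not pandas.
# Assumes `df` is a PySpark DataFrame with `name` and `age` columns.
df.groupBy().avg().collect()  # no grouping columns: average of every numeric column
sorted(df.groupBy('name').agg({'age': 'mean'}).collect())  # mean age per name
sorted(df.groupBy(df.name).avg().collect())  # same grouping, passed as a Column object
sorted(df.groupBy(['name', df.age]).count().collect())  # row count per (name, age) pair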
Example 2: Aggregate on the entire DataFrame without group
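# agg() called on the DataFrame itself is shorthand for df.groupBy().agg().
df.agg({"age": "max"}).collect()  # maximum age over all rows
from pyspark.sql import functions as F
df.agg(F.min(df.age)).collect()  # minimum age, using a column function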
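
The title asks about aggregating one column over a subset of rows and another over all rows, which the snippets above do not show directly. The following is a minimal pandas sketch of one way to do it, not the only one. The DataFrame, its `height` column, the `age > 3` condition, and the output column names are all hypothetical and chosen only for illustration. The idea is to blank out the values outside the subset with `Series.where`, so that `mean` skips them, while the second column is aggregated over every row.

```python
import pandas as pd

# Hypothetical data: the columns and values are illustrative assumptions.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Alice", "Bob"],
    "age": [2, 5, 4, 7],
    "height": [80, 85, 90, 95],
})

# Boolean mask selecting the subset of rows (assumed condition: age > 3).
subset = df["age"] > 3

result = (
    # where() keeps age inside the subset and sets it to NaN elsewhere;
    # mean() ignores NaN, so only subset rows contribute to it.
    df.assign(age_subset=df["age"].where(subset))
      .groupby("name")
      .agg(
          mean_age_subset=("age_subset", "mean"),  # subset rows only
          max_height_total=("height", "max"),      # all rows
      )
)
print(result)
```

In PySpark, the same pattern is usually written by wrapping the column in `F.when(condition, column)` inside the aggregate function, since `when` without `otherwise` yields null for unmatched rows and aggregates skip nulls.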