pyspark filter binary hex value code example

Example 1: pyspark filter

>>> df.filter(df.age > 3).collect()
[Row(age=5, name='Bob')]
>>> df.where(df.age == 2).collect()
[Row(age=2, name='Alice')]

Example 2: A distributed collection of data grouped into named columns

# A distributed collection of data grouped into named columns

people = sqlContext.read.parquet("...")

ageCol = people.age

# To create DataFrame using SQLContext
people = sqlContext.read.parquet("...")
department = sqlContext.read.parquet("...")

people.filter(people.age > 30).join(
  department, people.deptId == department.id).groupBy(
  department.name, "gender").agg({"salary": "avg", "age": "max"})