PySpark DataFrame - Join on multiple columns dynamically
Why not use a simple comprehension:
from pyspark.sql.functions import col

firstdf.join(
    seconddf,
    # one equality condition per pair of column names
    [col(f) == col(s) for (f, s) in zip(columnsFirstDf, columnsSecondDf)],
    "inner"
)
Since the conditions in the list are combined with a logical AND, it is enough to pass a list of equality conditions without chaining them with the & operator.
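For context, here is a minimal self-contained sketch of that pattern. The dataframe contents and the columnsFirstDf/columnsSecondDf lists are made up for illustration; the real ones come from the question.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data with differently named join columns.
firstdf = spark.createDataFrame([(1, "a"), (2, "b")], ["id_a", "val_a"])
seconddf = spark.createDataFrame([(1, "a"), (3, "c")], ["id_b", "val_b"])

columnsFirstDf = ["id_a", "val_a"]
columnsSecondDf = ["id_b", "val_b"]

# Each list element is one equality condition; Spark ANDs them together.
joined = firstdf.join(
    seconddf,
    [col(f) == col(s) for (f, s) in zip(columnsFirstDf, columnsSecondDf)],
    "inner",
)
joined.show()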
@Mohan, sorry, I don't have the reputation to add a comment. When the join columns have the same names in both dataframes, create a list of those column names and pass it to the join:
col_list = ["id", "column1", "column2"]
firstdf.join(seconddf, col_list, "inner")
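A minimal sketch of that variant, with made-up data. Note that joining on a list of column names (rather than expressions) ANDs the equalities and keeps only one copy of each join column in the result.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical dataframes that share the join column names.
firstdf = spark.createDataFrame(
    [(1, "x", 10), (2, "y", 20)], ["id", "column1", "column2"])
seconddf = spark.createDataFrame(
    [(1, "x", 10), (3, "z", 30)], ["id", "column1", "column2"])

col_list = ["id", "column1", "column2"]

# The duplicate join columns are dropped automatically.
firstdf.join(seconddf, col_list, "inner").show()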