How to select all columns of a DataFrame in a join (Spark Scala)
With an alias, you can select every column of the left DataFrame after the join:
first_df.alias("fst").join(second_df, Seq("id"), "left_outer").select("fst.*")
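For context, here is a minimal runnable sketch of the alias approach; the data, column names, and master setting are illustrative, not from the original question:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Illustrative data: first_df is the DataFrame whose columns we keep in full.
val first_df  = Seq((1, "a"), (2, "b")).toDF("id", "value")
val second_df = Seq((1, "x")).toDF("id", "extra")

// Aliasing first_df as "fst" lets "fst.*" expand to all of its columns
// after the join, without listing them by name.
first_df.alias("fst")
  .join(second_df, Seq("id"), "left_outer")
  .select("fst.*")
  .show()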
Suppose you:
- Want to use the DataFrame syntax.
- Want to select all columns from df1 but only a couple from df2.
- Find it cumbersome to list df1's columns explicitly because there are so many of them.
Then you might do the following:
// Build the projection: every column of df1 plus two named columns of df2.
val selectColumns = df1.columns.map(df1(_)) ++ Array(df2("field1"), df2("field2"))
df1.join(df2, df1("key") === df2("key")).select(selectColumns: _*)
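As an alternative, recent Spark versions also resolve a star column through Dataset.col, which avoids building the projection array by hand. A minimal sketch, with illustrative data (assumes the SparkSession and implicits from the sketch above):

// Illustrative schemas: several columns on df1, two of interest on df2.
val df1 = Seq((1, "a", true)).toDF("key", "colA", "colB")
val df2 = Seq((1, "f1", "f2", "f3")).toDF("key", "field1", "field2", "field3")

// df1("*") expands to every column of df1, so no manual array is needed.
df1.join(df2, df1("key") === df2("key"))
  .select(df1("*"), df2("field1"), df2("field2"))
  .show()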
We can also use a leftsemi join. A left semi join returns only the columns of the left DataFrame, keeping just the rows that have a match in the right DataFrame, so no columns from df2 appear in the result.
Here we join the two DataFrames df1 and df2 on column col1:
df1.join(df2, df1.col("col1").equalTo(df2.col("col1")), "leftsemi")
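To make the semantics concrete, here is a small demonstration with made-up data (assumes a SparkSession with spark.implicits._ in scope, as above); note that no column from df2 survives:

// Illustrative data for the left semi join.
val df1 = Seq((1, "a"), (2, "b"), (3, "c")).toDF("col1", "left_val")
val df2 = Seq((1, "x"), (3, "y")).toDF("col1", "right_val")

df1.join(df2, df1.col("col1").equalTo(df2.col("col1")), "leftsemi").show()
// Result (row order may vary):
// +----+--------+
// |col1|left_val|
// +----+--------+
// |   1|       a|
// |   3|       c|
// +----+--------+
// right_val never appears: a left semi join keeps only the left DataFrame's columns.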