Left Anti join in Spark?
You can use the "left anti" join type - either with DataFrame API or with SQL (DataFrame API supports everything that SQL supports, including any join condition you need):
DataFrame API:
df.as("table1").join(
df2.as("table2"),
$"table1.name" === $"table2.name" && $"table1.age" === $"table2.howold",
"leftanti"
)
SQL:
sqlContext.sql(
"""SELECT table1.* FROM table1
| LEFT ANTI JOIN table2
| ON table1.name = table2.name AND table1.age = table2.howold
""".stripMargin)
NOTE: it's also worth noting that there's a shorter, more concise way of creating the sample data without specifying the schema separately, using tuples and the implicit toDF
method, and then "fixing" the automatically-inferred schema where needed:
import spark.implicits._
val df = List(
("mike", 26, true),
("susan", 26, false),
("john", 33, true)
).toDF("name", "age", "isBoy")
val df2 = List(
("mike", "grade1", 45, "baseball", new java.sql.Date(format.parse("1957-12-10").getTime)),
("john", "grade2", 33, "soccer", new java.sql.Date(format.parse("1978-06-07").getTime)),
("john", "grade2", 32, "golf", new java.sql.Date(format.parse("1978-06-07").getTime)),
("mike", "grade2", 26, "basketball", new java.sql.Date(format.parse("1978-06-07").getTime)),
("lena", "grade2", 23, "baseball", new java.sql.Date(format.parse("1978-06-07").getTime))
).toDF("name", "grade", "howold", "hobby", "birthday").withColumn("birthday", $"birthday".cast(DateType))
You can do it with the built in function except
(I would have used the code you provided, but you didn't include the imports, so I couldn't just c/p it :( )
val a = sc.parallelize(Seq((1,"a",123),(2,"b",456))).toDF("col1","col2","col3")
val b= sc.parallelize(Seq((4,"a",432),(2,"t",431),(2,"b",456))).toDF("col1","col2","col3")
scala> a.show()
+----+----+----+
|col1|col2|col3|
+----+----+----+
| 1| a| 123|
| 2| b| 456|
+----+----+----+
scala> b.show()
+----+----+----+
|col1|col2|col3|
+----+----+----+
| 4| a| 432|
| 2| t| 431|
| 2| b| 456|
+----+----+----+
scala> a.except(b).show()
+----+----+----+
|col1|col2|col3|
+----+----+----+
| 1| a| 123|
+----+----+----+