Spark Structured Streaming app reading from multiple Kafka topics
From a resource (memory and cores) point of view, there is a difference if you run this as multiple streams (multiple driver-executor sets) on the cluster.
For the first case you mentioned:
df = spark.readStream.format("kafka").option("subscribe", "t1,t2,t3")...
t1df = df.select(...).where("topic = 't1'")...
t2df = df.select(...).where("topic = 't2'")...
This will run with a single driver and the 2 executors you have provided.
In the second case:
t1df = spark.readStream.format("kafka").option("subscribe", "t1")
t2df = spark.readStream.format("kafka").option("subscribe", "t2")
You can run these as separate streams: 2 drivers and 2 executors (1 executor each). The second case therefore needs more memory and cores, because of the extra driver.
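If it helps to picture the second case, below is a rough sketch of one such per-topic application, assuming an existing SparkSession named spark; the broker address, sink format, output path, and checkpoint location are placeholders, not prescriptions. Each topic gets its own self-contained application, which is where the extra driver comes from; a second copy of this, subscribed to t2, would be submitted separately.
val t1df = spark.readStream.format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")   // placeholder broker
  .option("subscribe", "t1")
  .load()

t1df.writeStream
  .format("parquet")                                   // placeholder sink format
  .option("path", "/data/t1")                          // placeholder output path
  .option("checkpointLocation", "/checkpoints/t1")     // each application keeps its own checkpoint
  .start()
  .awaitTermination()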
Each action requires a full lineage execution. You're better off separating this into three separate Kafka reads. Otherwise you'll read each topic N times, where N is the number of writes.
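To make that cost concrete, here is a sketch of the shared-read pattern, building on the t1df/t2df frames from the first snippet above; the sink format, paths, and checkpoint locations are placeholders. Each call to start() creates an independent query that re-executes the whole lineage, including the Kafka read of all three topics.
// Each started query runs its own copy of the lineage, so each one
// reads t1, t2 and t3 from Kafka even though it keeps only one topic.
val q1 = t1df.writeStream.format("parquet")
  .option("path", "/data/t1")                          // placeholder sink path
  .option("checkpointLocation", "/chk/t1")             // placeholder checkpoint
  .start()
val q2 = t2df.writeStream.format("parquet")
  .option("path", "/data/t2")
  .option("checkpointLocation", "/chk/t2")
  .start()
spark.streams.awaitAnyTermination()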
I'd really recommend against this, but if you want to put all the topics into the same read, then do this:
streamingDF.writeStream.foreachBatch { (batchDF: DataFrame, batchId: Long) =>
  // cache the micro-batch so the single Kafka read is reused for both writes
  batchDF.persist()
  batchDF.filter("topic = 't1'").write.format(...).save(...)  // location 1
  batchDF.filter("topic = 't2'").write.format(...).save(...)  // location 2
  batchDF.unpersist()
}
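For reference, here is one way the whole single-read pattern might be wired up end to end, assuming an existing SparkSession named spark; the broker address, parquet sink, output paths, and checkpoint location are placeholders. foreachBatch only runs once the query is started, and persist() means the three-topic micro-batch is read from Kafka once per trigger no matter how many writes follow.
import org.apache.spark.sql.DataFrame

val streamingDF = spark.readStream.format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")    // placeholder broker
  .option("subscribe", "t1,t2,t3")
  .load()

val query = streamingDF.writeStream
  .foreachBatch { (batchDF: DataFrame, batchId: Long) =>
    batchDF.persist()                                   // one Kafka read reused for both writes
    batchDF.filter("topic = 't1'").write.mode("append").format("parquet").save("/data/t1")  // placeholder sink
    batchDF.filter("topic = 't2'").write.mode("append").format("parquet").save("/data/t2")  // placeholder sink
    batchDF.unpersist()
  }
  .option("checkpointLocation", "/checkpoints/multi-topic")  // placeholder checkpoint path
  .start()

query.awaitTermination()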