Kafka Structured Streaming KafkaSourceProvider could not be instantiated
I managed to solve this by ensuring that the spark-sql-kafka package's version matches the Spark version.
In my case, I am now using --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.1 for my Spark version 2.4.1. After that, the .format("kafka") part of the code can be resolved.
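As a rough illustration of how the package coordinate lines up with the Spark version, here is a minimal Python sketch. The helper name kafka_package and its default Scala suffix of 2.11 are my own assumptions for this example, not part of Spark.

```python
def kafka_package(spark_version: str, scala_binary: str = "2.11") -> str:
    """Build the Maven coordinate for the spark-sql-kafka-0-10 connector.

    Hypothetical helper: the artifact suffix is the Scala binary version
    and the last field is the Spark version, so both must match your install.
    """
    return f"org.apache.spark:spark-sql-kafka-0-10_{scala_binary}:{spark_version}"


# Coordinate for Spark 2.4.1 on the default Scala 2.11 build
print(kafka_package("2.4.1"))
# prints org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.1
```

The resulting string is what you pass to --packages (or set as spark.jars.packages).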
Also, the Scala 2.12 build of the package (i.e., org.apache.spark:spark-sql-kafka-0-10_2.12:2.4.1) does not seem stable at the time of writing, and using it will also cause the above error.
EDIT: The Scala 2.12 builds of the spark-sql-kafka package seem to only work with Spark built with Scala 2.12. Hence, for Spark 2.x (pre-built with Scala 2.11 by default), you need to use Spark binaries built with Scala 2.12 (e.g. spark-2.4.1-bin-without-hadoop-scala-2.12.tgz) if you really want to use the Scala 2.12 build of the spark-sql-kafka package. Spark 3.x is pre-built with Scala 2.12 by default, hence you'll only see/use the Scala 2.12 build of the package.
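To make the two matching rules concrete, here is a small Python sketch that checks a package coordinate against the Spark and Scala versions of an install (both are shown, as far as I know, by spark-submit --version). The function name check_kafka_package and the message wording are made up for this example.

```python
def check_kafka_package(coordinate: str, spark_version: str, scala_version: str) -> list:
    """Return a list of mismatches between a connector coordinate and a Spark install.

    Hypothetical helper, assuming a coordinate of the form
    group:artifact_<scala binary version>:<spark version>.
    """
    _group, artifact, package_version = coordinate.split(":")
    # "2.11.12" -> "2.11": only major.minor appears in the artifact name
    scala_binary = ".".join(scala_version.split(".")[:2])
    artifact_scala = artifact.rsplit("_", 1)[1]

    problems = []
    if artifact_scala != scala_binary:
        problems.append(
            f"package built for Scala {artifact_scala}, Spark built with Scala {scala_binary}"
        )
    if package_version != spark_version:
        problems.append(
            f"package version {package_version} differs from Spark version {spark_version}"
        )
    return problems


# Scala 2.12 package on a Spark 2.4.1 install built with Scala 2.11
for problem in check_kafka_package(
    "org.apache.spark:spark-sql-kafka-0-10_2.12:2.4.1", "2.4.1", "2.11.12"
):
    print(problem)
```

An empty list means both the Scala suffix and the version line up with the install.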