Read from BigQuery into Spark in an efficient way?
Maybe a Googler will correct me, but AFAIK that's the only way. This is because under the hood it also uses the BigQuery Connector for Hadoop, which, according to the docs:
The BigQuery connector for Hadoop downloads data into your Google Cloud Storage bucket before running a Hadoop job.
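For reference, here is a minimal PySpark sketch of that Hadoop-connector path (project, bucket, and staging path below are placeholders; the Hadoop BigQuery connector jar must be on the classpath, as it is by default on Dataproc). Note how a GCS staging location has to be configured, because the table is exported there before the RDD is materialized:

```python
import json
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bq-hadoop-connector").getOrCreate()
sc = spark.sparkContext

conf = {
    # Project running the job and the GCS bucket/path used for the intermediate export
    # (all placeholders).
    "mapred.bq.project.id": "my-project",
    "mapred.bq.gcs.bucket": "my-staging-bucket",
    "mapred.bq.temp.gcs.path": "gs://my-staging-bucket/tmp/bq_export",
    # Table to read (a public sample table).
    "mapred.bq.input.project.id": "bigquery-public-data",
    "mapred.bq.input.dataset.id": "samples",
    "mapred.bq.input.table.id": "shakespeare",
}

# The connector first exports the table to the GCS path above; each record then
# comes back as (row id, JSON string of the row).
table_rdd = sc.newAPIHadoopRDD(
    "com.google.cloud.hadoop.io.bigquery.JsonTextBigQueryInputFormat",
    "org.apache.hadoop.io.LongWritable",
    "com.google.gson.JsonObject",
    conf=conf,
)

rows = table_rdd.map(lambda kv: json.loads(kv[1]))
print(rows.take(3))
```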
As a side note, this is also true when using Dataflow - it too first exports the BigQuery table(s) to GCS and then reads them in parallel.
As for whether the copy stage (which is essentially a BigQuery export job) is influenced by your Spark cluster size, or whether it takes a fixed amount of time: neither. Export job duration is nondeterministic, and BigQuery uses its own resources for the export to GCS, not your Spark cluster.
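To illustrate that the copy stage is just an ordinary BigQuery export job running on BigQuery's side, here is roughly the equivalent standalone job via the google-cloud-bigquery Python client (project, table, and bucket names are placeholders):

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project

# The export runs entirely on BigQuery's servers; the client (like the Spark
# connector) just submits the job and waits for it to finish.
job_config = bigquery.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON
)
extract_job = client.extract_table(
    "bigquery-public-data.samples.shakespeare",       # source table
    "gs://my-staging-bucket/tmp/shakespeare-*.json",  # placeholder GCS path
    job_config=job_config,
)
extract_job.result()  # blocks until the export finishes; duration varies run to run
```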
Alternatively, the spark-bigquery-connector uses the BigQuery Storage API, which reads data in parallel directly from BigQuery with no intermediate GCS export, and is super fast.
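A minimal sketch of that approach (the connector version and table name are only examples; the jar is usually supplied via --packages or Dataproc's connector options):

```python
from pyspark.sql import SparkSession

# Launched with e.g.:
#   spark-submit --packages com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.36.1 app.py
# (artifact coordinates/version are illustrative -- check the connector's releases page)
spark = SparkSession.builder.appName("bq-storage-api").getOrCreate()

# Reads via the BigQuery Storage API: parallel streams straight out of BigQuery,
# no intermediate export to GCS.
df = spark.read.format("bigquery").load("bigquery-public-data.samples.shakespeare")

df.printSchema()
df.show(5)
```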