findspark.init() IndexError: list index out of range error
This is most likely due to the SPARK_HOME environment variable not being set correctly on your system. Alternatively, you can just specify it when you're initialising findspark, like so:
import findspark
findspark.init('/path/to/spark/home')
After that, it should all work!
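If you are not sure whether a given directory is the right one, a quick check can help. As far as I can tell, findspark looks for a py4j-*.zip archive under python/lib inside the Spark directory, and older versions raise the IndexError when that lookup comes back empty. Treat that as an assumption about findspark's internals rather than documented behaviour. The helper names below (find_py4j_zips, check_spark_home) are made up for this sketch and are not part of findspark.

```python
import glob
import os


def find_py4j_zips(spark_home):
    """Return the py4j archives found under <spark_home>/python/lib."""
    # Assumption: this mirrors the location findspark searches.
    pattern = os.path.join(spark_home, "python", "lib", "py4j-*.zip")
    return sorted(glob.glob(pattern))


def check_spark_home(spark_home):
    """Return (ok, message) describing whether spark_home looks usable."""
    if not spark_home:
        return False, "SPARK_HOME is empty or not set"
    if not os.path.isdir(spark_home):
        return False, "not a directory: " + spark_home
    if not find_py4j_zips(spark_home):
        lib_dir = os.path.join(spark_home, "python", "lib")
        return False, "no py4j-*.zip under " + lib_dir
    return True, "looks like a Spark installation: " + spark_home


if __name__ == "__main__":
    # Check whatever SPARK_HOME currently points to, if anything.
    ok, message = check_spark_home(os.environ.get("SPARK_HOME", ""))
    print(message)
```

If the check fails, point it at the directory where Spark was extracted (the one that contains bin and python) and try again before calling findspark.init().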
I was getting the same error and was able to make it work by entering the exact installation directory:
import findspark
# Use this (raw string, so the backslashes in the Windows path are not treated as escapes)
findspark.init(r"C:\Users\PolestarEmployee\spark-1.6.3-bin-hadoop2.6")
# Test
from pyspark import SparkContext, SparkConf
Basically, it is the directory where Spark was extracted. In future, wherever you see spark_home, enter the same installation directory. I also tried using Toree to create a kernel instead, but it is failing somehow. A kernel would be a cleaner solution.
You need to update the SPARK_HOME variable in your ~/.bash_profile.
For me, the following command worked (in the terminal):
export SPARK_HOME="/usr/local/Cellar/apache-spark/2.2.0/libexec/"
After this, you can use these commands:
import findspark
findspark.init('/usr/local/Cellar/apache-spark/2.2.0/libexec')
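To avoid hard-coding the same path in two places, one option is to read SPARK_HOME from the environment and only fall back to an explicit path when you pass one. This is a minimal sketch, not something findspark provides: resolve_spark_home is a hypothetical helper name, and the commented usage assumes findspark is installed.

```python
import os


def resolve_spark_home(explicit=None, environ=None):
    """Pick a Spark directory: explicit argument first, then SPARK_HOME."""
    env = os.environ if environ is None else environ
    candidate = explicit or env.get("SPARK_HOME")
    if not candidate:
        raise RuntimeError("SPARK_HOME is not set and no path was given")
    # Expand "~" and drop any trailing slash so both spellings compare equal.
    return os.path.normpath(os.path.expanduser(candidate))


# Usage, assuming findspark is installed:
#   import findspark
#   findspark.init(resolve_spark_home())
```

Keep in mind that an export typed into a terminal only lasts for that shell session. A notebook server started from somewhere else will not see it unless the line is also in your shell profile.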