Importing PySpark in the Python shell
It turns out that the pyspark bin script loads Python and automatically sets up the correct library paths. Check out $SPARK_HOME/bin/pyspark:
export SPARK_HOME=/some/path/to/apache-spark
# Add the PySpark classes to the Python path:
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
I added these lines to my .bashrc file and the modules are now found correctly!
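To confirm the path is actually being picked up, a quick sanity check from a plain Python shell helps; the printed path is only illustrative and depends on where Spark lives on your machine:

import pyspark
# Should print a path under $SPARK_HOME/python/pyspark/ rather than raise ImportError
print(pyspark.__file__)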
If you get an error like this:
ImportError: No module named py4j.java_gateway
Please add $SPARK_HOME/python/build to PYTHONPATH:
export SPARK_HOME=/Users/pzhang/apps/spark-1.1.0-bin-hadoop2.4
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH
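If you prefer not to edit .bashrc, the same paths can also be added from inside Python before importing anything. This is just a sketch: it assumes SPARK_HOME is already set, and the py4j layout (python/build vs. a py4j-*.zip under python/lib) varies between Spark downloads:

import glob
import os
import sys

spark_home = os.environ["SPARK_HOME"]
# Add the PySpark sources themselves
sys.path.insert(0, os.path.join(spark_home, "python"))
# Older releases ship py4j in python/build, newer ones as a zip in python/lib
sys.path.insert(0, os.path.join(spark_home, "python", "build"))
for py4j_zip in glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*.zip")):
    sys.path.insert(0, py4j_zip)

from pyspark import SparkContext  # should import cleanly now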
Assuming one of the following:
- Spark is downloaded on your system and you have an environment variable SPARK_HOME pointing to it
- You have run pip install pyspark
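Either way, a minimal smoke test looks like this (the app name and the local[*] master are just placeholders for your own settings):

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("import-check").setMaster("local[*]")
sc = SparkContext(conf=conf)
print(sc.version)  # prints the Spark version the shell picked up
sc.stop()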
Here is a simple method (if you don't care about how it works):
Use findspark
Install findspark from your system shell:
pip install findspark
Then, in your Python shell:
import findspark
findspark.init()
Import the necessary modules:
from pyspark import SparkContext
from pyspark import SparkConf
Done!!!
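For reference, a full interactive session with this approach might look like the following; the local[*] master and the toy sum are only illustrations, so adapt them to your setup:

import findspark
findspark.init()  # locates Spark via SPARK_HOME (a path can also be passed explicitly)

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("findspark-demo").setMaster("local[*]")
sc = SparkContext(conf=conf)

# Tiny smoke test: distribute a list and sum it locally
print(sc.parallelize(range(10)).sum())  # 45

sc.stop()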