How to run Python 3 on Google's Dataproc PySpark

I found an answer to this here, and my initialization script now looks like this:


# Install tools
apt-get -y install python3 python3-dev build-essential python3-pip
easy_install3 -U pip

# Install requirements
pip3 install --upgrade google-cloud==0.27.0
pip3 install --upgrade google-api-python-client==1.6.2
pip3 install --upgrade pytz==2013.7

# Setup python3 for Dataproc
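# tee -a needs file targets, not directories; spark_config.sh is an assumed file name
echo "export PYSPARK_PYTHON=python3" | tee -a /etc/profile.d/spark_config.sh /etc/*bashrc /usr/lib/spark/conf/spark-env.sh
echo "export PYTHONHASHSEED=0" | tee -a /etc/profile.d/spark_config.sh /etc/*bashrc /usr/lib/spark/conf/spark-env.sh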
echo "spark.executorEnv.PYTHONHASHSEED=0" >> /etc/spark/conf/spark-defaults.conf

The documentation page "Configure the Dataproc cluster's Python environment" explains this in detail. In short, Dataproc image versions before 1.4 need an initialization action to use Python 3, while on 1.4+ the default is already Python 3 from Miniconda3.
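
To check which interpreter a node would hand to PySpark once the script has run, a small shell check along these lines might help. This is only a sketch: the function name pyspark_python_major is made up for illustration, and it assumes the interpreter named by PYSPARK_PYTHON is on the PATH, falling back to plain "python" when the variable is unset.

```shell
#!/usr/bin/env bash
# Print the major version of the interpreter named by PYSPARK_PYTHON.
# pyspark_python_major is a hypothetical helper, not part of Dataproc or Spark.
pyspark_python_major() {
  # Assumption: fall back to plain "python" when PYSPARK_PYTHON is unset or empty.
  local interpreter="${PYSPARK_PYTHON:-python}"
  "$interpreter" -c 'import sys; print(sys.version_info[0])'
}
```

After sourcing the snippet, running pyspark_python_major in a login shell on a cluster node should print 3 if the export from the initialization script took effect. It only inspects the shell environment, so it does not prove what a running Spark job actually uses.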