PySpark config in Colab code example

Example: Install Spark on Google Colab

# Install Java 8 (headless OpenJDK)
!apt-get install openjdk-8-jdk-headless -qq > /dev/null

# Download Spark (if you change the version number, update the file and folder names below to match)
!wget -q https://archive.apache.org/dist/spark/spark-3.0.0/spark-3.0.0-bin-hadoop3.2.tgz

# Extract the Spark archive into the current folder
!tar xf spark-3.0.0-bin-hadoop3.2.tgz

# Point JAVA_HOME and SPARK_HOME at the Java and Spark installations
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-3.0.0-bin-hadoop3.2"

# Install findspark using pip
!pip install -q findspark

# Install PySpark, the Python API for Spark
!pip install pyspark
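
Once the steps above have run, a session still has to be started before Spark can be used. The following is a minimal sketch of one way to do that, not part of the original example. The helper names configure_spark_env and start_spark, the app name "colab-example", and the local[*] master are illustrative assumptions. The paths passed in should match the JAVA_HOME and SPARK_HOME values set earlier.

```python
import os


def configure_spark_env(java_home, spark_home):
    """Set JAVA_HOME and SPARK_HOME; return the paths that are not existing directories.

    Hypothetical helper, shown only to catch a wrong version number or path early.
    """
    os.environ["JAVA_HOME"] = java_home
    os.environ["SPARK_HOME"] = spark_home
    return [path for path in (java_home, spark_home) if not os.path.isdir(path)]


def start_spark(app_name="colab-example"):
    """Start a local SparkSession (assumes findspark and pyspark are installed)."""
    # Imported lazily so that defining this function does not require Spark.
    import findspark

    # With no arguments, findspark.init() locates Spark via the SPARK_HOME
    # environment variable and adds its Python libraries to sys.path.
    findspark.init()

    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master("local[*]")  # assumption: use all cores of the Colab VM
        .appName(app_name)
        .getOrCreate()
    )
```

A possible usage in a later cell: call configure_spark_env("/usr/lib/jvm/java-8-openjdk-amd64", "/content/spark-3.0.0-bin-hadoop3.2") and check that it returns an empty list, then run spark = start_spark() and try something small such as spark.range(5).show().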