NoClassDefFoundError org.apache.hadoop.fs.FSDataInputStream when executing spark-shell

The "without Hadoop" in the Spark's build name is misleading: it means the build is not tied to a specific Hadoop distribution, not that it is meant to run without it: the user should indicate where to find Hadoop (see https://spark.apache.org/docs/latest/hadoop-provided.html)

One clean way to fix this issue is to:

  1. Obtain Hadoop Windows binaries. Ideally build them yourself, but this is painful (for some hints see: Hadoop on Windows Building/Installation Error). Otherwise find prebuilt ones; for instance, 2.6.0 can currently be downloaded from http://www.barik.net/archive/2015/01/19/172716/
  2. Create a spark-env.cmd file looking like this (modify the Hadoop path to match your installation):

     @echo off
     set HADOOP_HOME=D:\Utils\hadoop-2.7.1
     set PATH=%HADOOP_HOME%\bin;%PATH%
     set SPARK_DIST_CLASSPATH=<paste here the output of %HADOOP_HOME%\bin\hadoop classpath>
  3. Put this spark-env.cmd either in a conf folder located at the same level as your Spark base folder (which may look weird), or in a folder indicated by the SPARK_CONF_DIR environment variable. The full sequence is sketched after this list.
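For reference, here is the whole Windows sequence as a sketch; every path below is an example and must be adapted to your own layout:

rem All paths are examples; adjust them to your installation
set HADOOP_HOME=D:\Utils\hadoop-2.7.1
rem Print the classpath and paste its output into spark-env.cmd:
%HADOOP_HOME%\bin\hadoop classpath
rem Tell Spark where spark-env.cmd lives, then start the shell:
set SPARK_CONF_DIR=D:\Utils\spark-conf
D:\Utils\spark\bin\spark-shell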

I had the same problem; in fact, the Spark documentation explains how to handle it:

### in conf/spark-env.sh ###

# If 'hadoop' binary is on your PATH
export SPARK_DIST_CLASSPATH=$(hadoop classpath)

# With explicit path to 'hadoop' binary
export SPARK_DIST_CLASSPATH=$(/path/to/hadoop/bin/hadoop classpath)

# Passing a Hadoop configuration directory
export SPARK_DIST_CLASSPATH=$(hadoop --config /path/to/configs classpath)
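If you just want to verify the fix once without editing spark-env.sh, you can set the variable inline for a single launch (assuming hadoop is on your PATH and you are in the Spark base folder):

# One-off test: the variable only exists for this command
SPARK_DIST_CLASSPATH=$(hadoop classpath) ./bin/spark-shell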

If you want to use your own Hadoop, pick one of the three options above and copy the matching line into your spark-env.sh file:

1- if you have the hadoop binary on your PATH

2- if you want to point to the hadoop binary explicitly

3- if you also want to pass a Hadoop configuration directory

http://spark.apache.org/docs/latest/hadoop-provided.html
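Note that spark-env.sh is sourced by Spark's launch scripts, not by your login shell, so echoing the variable in a fresh terminal will show nothing. To preview the value Spark will actually receive, source the file manually (run from the Spark base folder):

# spark-env.sh is read by Spark's own scripts at launch time; source it
# yourself to inspect the classpath it computes
. conf/spark-env.sh && echo "$SPARK_DIST_CLASSPATH"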


I ran into this too;

export SPARK_DIST_CLASSPATH=`hadoop classpath`

resolved it.
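If you only export it in your current shell, it is gone at next login. To make it permanent for a given Spark installation, put the same line into conf/spark-env.sh, creating it from the bundled template if needed (run from the Spark base folder):

# Single quotes keep $(...) unexpanded, so the classpath is
# recomputed fresh at every Spark launch
[ -f conf/spark-env.sh ] || cp conf/spark-env.sh.template conf/spark-env.sh
echo 'export SPARK_DIST_CLASSPATH=$(hadoop classpath)' >> conf/spark-env.sh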
