Spark on Windows - What exactly is winutils and why do we need it?
I know of at least one usage, it is for running shell commands on Windows OS. You can find it in org.apache.hadoop.util.Shell
, other modules depends on this class and uses it's methods, for example getGetPermissionCommand()
method:
static final String WINUTILS_EXE = "winutils.exe";
...
static {
IOException ioe = null;
String path = null;
File file = null;
// invariant: either there's a valid file and path,
// or there is a cached IO exception.
if (WINDOWS) {
try {
file = getQualifiedBin(WINUTILS_EXE);
path = file.getCanonicalPath();
ioe = null;
} catch (IOException e) {
LOG.warn("Did not find {}: {}", WINUTILS_EXE, e);
// stack trace comes at debug level
LOG.debug("Failed to find " + WINUTILS_EXE, e);
file = null;
path = null;
ioe = e;
}
} else {
// on a non-windows system, the invariant is kept
// by adding an explicit exception.
ioe = new FileNotFoundException(E_NOT_A_WINDOWS_SYSTEM);
}
WINUTILS_PATH = path;
WINUTILS_FILE = file;
WINUTILS = path;
WINUTILS_FAILURE = ioe;
}
...
public static String getWinUtilsPath() {
if (WINUTILS_FAILURE == null) {
return WINUTILS_PATH;
} else {
throw new RuntimeException(WINUTILS_FAILURE.toString(),
WINUTILS_FAILURE);
}
}
...
public static String[] getGetPermissionCommand() {
return (WINDOWS) ? new String[] { getWinUtilsPath(), "ls", "-F" }
: new String[] { "/bin/ls", "-ld" };
}
Though Max's answer covers the actual place where it's being referred. Let me give a brief background on why it needs it on Windows -
From Hadoop's Confluence Page itself -
Hadoop requires native libraries on Windows to work properly -that includes accessing the file:// filesystem, where Hadoop uses some Windows APIs to implement posix-like file access permissions.
This is implemented in HADOOP.DLL and WINUTILS.EXE.
In particular, %HADOOP_HOME%\BIN\WINUTILS.EXE must be locatable
And , I think you should be able to run both Spark and Hadoop on Windows.