sbt assembly task runs slowly after adding some dependencies
So 0__'s comment is right on:
Have you read the Readme. It specifically suggests that you might change the cacheUnzip and cacheOutput settings. I would give it a try.
cacheUnzip is an optimization feature, but cacheOutput isn't. The purpose of cacheOutput is to give you an identical jar when your source has not changed. For some people, it's important that output jars don't change unnecessarily. The caveat is that it checks the SHA-1 hash of all *.class files. So the readme says:
If there are a large number of class files, this could take a long time
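To make that cost concrete, here is a minimal Scala sketch of what "checking the SHA-1 hash of all *.class files" involves: every matching file has to be read in full and digested. This is not sbt-assembly's actual code; the object name ClassHashSketch and its methods are hypothetical, chosen only for illustration.

```scala
import java.nio.file.{Files, Path}
import java.security.MessageDigest

// Hypothetical illustration only; this is not sbt-assembly's implementation.
object ClassHashSketch {
  // Hex-encoded SHA-1 digest of a byte array.
  def sha1Hex(bytes: Array[Byte]): String =
    MessageDigest.getInstance("SHA-1").digest(bytes).map("%02x".format(_)).mkString

  // Walks `root` and hashes the full contents of every *.class file found.
  // The work grows with the total number of bytes across all class files.
  def hashClassFiles(root: Path): Map[String, String] = {
    val stream = Files.walk(root)
    try {
      val hashes = Map.newBuilder[String, String]
      val it = stream.iterator()
      while (it.hasNext) {
        val p = it.next()
        if (Files.isRegularFile(p) && p.toString.endsWith(".class"))
          hashes += root.relativize(p).toString -> sha1Hex(Files.readAllBytes(p))
      }
      hashes.result()
    } finally stream.close()
  }
}
```

With tens of thousands of class files pulled in by heavy dependencies, a pass like this on every run adds up quickly.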
From what I can tell, unzipping and applying the merge strategy together take around a minute or two, but checking the SHA-1 hashes seems to take forever. Here's an assembly.sbt that turns off the output cache:
import AssemblyKeys._ // put this at the top of the file

assemblySettings

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
    case PathList("org", "apache", "commons", xs @ _*) => MergeStrategy.first // commons-beanutils-core-1.8.0.jar vs commons-beanutils-1.7.0.jar
    case PathList("com", "esotericsoftware", "minlog", xs @ _*) => MergeStrategy.first // kryo-2.21.jar vs minlog-1.2.jar
    case "about.html" => MergeStrategy.rename
    case x => old(x)
  }
}

assemblyCacheOutput in assembly := false
The assembly finished in 58 seconds after cleaning, and the second run without cleaning took 15 seconds, although some of the runs took 200+ seconds too.
Looking at the source, I could probably optimize cacheOutput, but for now, turning it off should make assembly much faster.
Edit:
I've added #96 "Performance degradation when adding library dependencies" based on this question, and added some fixes in sbt-assembly 0.10.1 for sbt 0.13.
sbt-assembly 0.10.1 avoids content hashing of the unzipped items of the dependent library jars. It also skips jar caching done by sbt, since sbt-assembly is already caching the output.
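The trade-off can be pictured as follows. A content hash has to read every byte of every unzipped entry, whereas a fingerprint built from the jar file's own metadata reads none. The Scala sketch below illustrates that general technique only. It is an assumption for illustration, not a description of what 0.10.1 actually does, and the names JarFingerprintSketch, fingerprint, fingerprintOf, and isStale are hypothetical.

```scala
import java.io.File

// Hypothetical illustration; these names are invented, not sbt-assembly's API.
object JarFingerprintSketch {
  // Cheap fingerprint from metadata only: no file contents are read.
  def fingerprint(path: String, sizeBytes: Long, lastModifiedMillis: Long): String =
    s"$path:$sizeBytes:$lastModifiedMillis"

  // Fingerprint of a jar on disk, using its path, size, and modification time.
  def fingerprintOf(jar: File): String =
    fingerprint(jar.getAbsolutePath, jar.length(), jar.lastModified())

  // A jar needs reprocessing only when its fingerprint differs from the cached one.
  def isStale(cached: Option[String], jar: File): Boolean =
    !cached.contains(fingerprintOf(jar))
}
```

A metadata fingerprint is weaker than a content hash (a file rewritten with the same size and timestamp would go unnoticed), which is the usual price of skipping the expensive read.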
The changes make the assembly task run more consistently. Using the dependency-heavy Spark as a sample project, the assembly task was run 15 times after a small source change. sbt-assembly 0.10.0 took 19+/-157 seconds (mostly within 20 seconds, but exceeding 150 seconds in 26% of the runs). On the other hand, sbt-assembly 0.10.1 took 16+/-1 seconds.