tm_map has parallel::mclapply error in R 3.0.1 on Mac
I suspect you don't have the SnowballC package installed, which seems to be required. tm_map is supposed to run stemDocument on all the documents using mclapply. Try running the stemDocument function on just one document, so you can see the underlying error:
stemDocument(crude[[1]])
For me, I got an error:
Error in loadNamespace(name) : there is no package called ‘SnowballC’
So I just went ahead and installed SnowballC, and it worked. Clearly, SnowballC should be a dependency.
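To check for the missing package before calling stemDocument, something like the sketch below may help. It only uses base R's requireNamespace; the variable name has_snowball is made up for illustration, and the install step is left as a comment because it needs network access.

```r
# Check whether SnowballC can be loaded, without attaching it.
has_snowball <- requireNamespace("SnowballC", quietly = TRUE)

if (!has_snowball) {
  # Assumption: installing from CRAN is acceptable in your setup.
  # install.packages("SnowballC")
  message("SnowballC is not installed; stemDocument will likely fail.")
} else {
  message("SnowballC is available.")
}
```

If the message says the package is missing, install it and then retry stemDocument(crude[[1]]) before going back to tm_map.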
I just ran into this. It took me a bit of digging but I found out what was happening.
I had a line of code 'rdevel <- tm_map(rdevel, asPlainTextDocument)'
Running this produced the error
In parallel::mclapply(x, FUN, ...) : all scheduled cores encountered errors in user code
- It turns out that 'tm_map' calls some code in 'parallel' which attempts to figure out how many cores you have. To see what it's thinking, type
> getOption("mc.cores", 2L)
[1] 2
- Aha moment! Tell the 'tm_map' call to only use one core!
> rdevel <- tm_map(rdevel, asPlainTextDocument, mc.cores=1)
Error in match.fun(FUN) : object 'asPlainTextDocument' not found
> rdevel <- tm_map(rdevel, asPlainTextDocument, mc.cores=4)
Warning message:
In parallel::mclapply(x, FUN, ...) : all scheduled cores encountered errors in user code
So ... with more than one core, rather than give you the actual error message, 'parallel' just tells you there was an error on each core. Not helpful, parallel! In my case I had forgotten the dot: the function name is supposed to be 'as.PlainTextDocument'!
So, if you get this error, add 'mc.cores=1' to the 'tm_map' call and run it again to see the real error message.
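The masking behaviour can be reproduced with the 'parallel' package alone, without tm. The sketch below is only an illustration: failing_fun is a hypothetical stand-in for a broken transformation, and the multi-core part assumes a Unix-like system (Mac or Linux), since mclapply does not fork on Windows. As far as I can tell, with one core mclapply simply falls back to lapply, which is why the real error surfaces.

```r
library(parallel)

# Hypothetical stand-in for a transformation that fails on every document.
failing_fun <- function(x) stop("object 'asPlainTextDocument' not found")

# One core: the error from the function itself is raised and can be caught.
msg_one_core <- tryCatch(
  mclapply(1:4, failing_fun, mc.cores = 1),
  error = function(e) conditionMessage(e)
)
print(msg_one_core)

# Several cores (Unix-like systems only): mclapply returns instead of
# stopping, and only warns that the cores encountered errors.
if (.Platform$OS.type == "unix") {
  res <- suppressWarnings(mclapply(1:4, failing_fun, mc.cores = 2))

  # Each failed element is a "try-error" object that still carries
  # the original condition, so the hidden message can be recovered.
  failed <- vapply(res, inherits, logical(1), what = "try-error")
  print(failed)
  print(conditionMessage(attr(res[[1]], "condition")))
}
```

So besides rerunning with mc.cores=1, inspecting the returned elements for "try-error" objects is another way to get at the message that the warning hides.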
I found an answer to this that was successful for me in this question:
Charles Copley, in his answer, indicates he thinks the new tm package requires lazy = TRUE to be explicitly defined.
So, your code would look like this:
library(tm)
data('crude')
tm_map(crude, stemDocument, lazy = TRUE)
I also tried it without SnowballC to see if it was a combination of those two answers. It did not appear to affect the result either way.