Generating Random Strings
Your performance problem comes from using the random package in the first place. It's understandable that you might find the random::randomStrings() function in an internet search and assume it is a good way to generate random strings in a program, but the random package is not intended for general-purpose programming. It works by querying the RANDOM.ORG server over the network, which is intrinsically slower than R's built-in pseudo-random number generators.
From one of the vignettes from the random package:
There are a number of situations in which it is desirable to use non-deterministically determined random numbers. Examples include
- to seed distributed computing on different nodes with truly independent seeds;
- to obtain portable initializations for RNGs that do not depend on particular operating system or hardware features;
- to validate simulation results using non-deterministic random numbers;
- to provide indeterministic seeds used for lottery drawings or games ...
Note that most of these examples are about seeding or initializing (these are synonyms) R's built-in pseudo-random number generators, rather than replacing them ...
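To illustrate the seeding point, here is a minimal sketch of how a single externally obtained number can initialize R's built-in generator, after which everything runs locally. The hard-coded `seed` value is an assumption standing in for a number fetched from a service such as RANDOM.ORG; it is not taken from the vignette.

```r
# Assumption: `seed` stands in for an integer obtained from an external source
# (for example, RANDOM.ORG); it is hard-coded here so the sketch runs offline.
seed <- 20231L

# Initialize the built-in pseudo-random number generator with that seed.
set.seed(seed)
first <- sample(LETTERS, 5, replace = TRUE)

# Re-seeding with the same value reproduces the same draws.
set.seed(seed)
second <- sample(LETTERS, 5, replace = TRUE)

identical(first, second)
```

Only the seed comes from outside; the actual sampling is done by R's own generator, which is why it is fast.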
Using "stringi" as suggested by @akrun will be faster, but the following is also very fast and does not require any additional packages:
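myFun <- function(n = 5000) {
  # five random capital letters per string
  a <- do.call(paste0, replicate(5, sample(LETTERS, n, TRUE), FALSE))
  # four zero-padded digits (0000-9999) followed by one more capital letter
  paste0(a, sprintf("%04d", sample(0:9999, n, TRUE)), sample(LETTERS, n, TRUE))
}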
Example output:
myFun(10)
## [1] "BZHOF3737P" "EPOWI0674X" "YYWEB2825M" "HQIXJ5187K" "IYIMB2578R"
## [6] "YSGBG6609I" "OBLBL6409Q" "PUMAL5632D" "ABRAT4481L" "FNVEN7870Q"
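Note that the original randomStrings() call asked for unique = TRUE, whereas sampling with replacement does not guarantee distinct strings. If uniqueness matters, one possible approach is to top up until enough distinct values exist. This is only a sketch: `uniqueStrings` is a hypothetical helper name, and the generator is restated here (drawing digits from 0:9999) so the block runs on its own.

```r
# Generator in the same style as myFun above, restated so the sketch is self-contained.
myFun <- function(n = 5000) {
  a <- do.call(paste0, replicate(5, sample(LETTERS, n, TRUE), FALSE))
  paste0(a, sprintf("%04d", sample(0:9999, n, TRUE)), sample(LETTERS, n, TRUE))
}

# Hypothetical helper (not part of the original answer):
# keep generating until n distinct strings have been collected.
uniqueStrings <- function(n) {
  out <- unique(myFun(n))
  while (length(out) < n) {
    # generate only as many as are still missing, then drop duplicates again
    out <- unique(c(out, myFun(n - length(out))))
  }
  out
}

ids <- uniqueStrings(5000)
anyDuplicated(ids)  # 0 means no duplicates
```

With 26^6 * 10^4 possible strings, collisions are rare for a few thousand draws, so the loop body will almost never run.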
We can use stri_rand_strings from stringi:
library(stringi)
sprintf("%s%s%s", stri_rand_strings(5, 5, '[A-Z]'),
        stri_rand_strings(5, 4, '[0-9]'), stri_rand_strings(5, 1, '[A-Z]'))
Or more compactly:
do.call(paste0, Map(stri_rand_strings, n=5, length=c(5, 4, 1),
                    pattern = c('[A-Z]', '[0-9]', '[A-Z]')))
Benchmarks
system.time({
  do.call(paste0, Map(stri_rand_strings, n=5000, length=c(5, 4, 1),
                      pattern = c('[A-Z]', '[0-9]', '[A-Z]')))
})
# user system elapsed
# 0 0 0
I was able to reproduce the slow timings with the OP's method, even when generating just one part of the expected output:
library(random)  # randomStrings() queries RANDOM.ORG, so this needs network access
system.time(string_5 <- as.vector(randomStrings(n=5000, len=5, digits=FALSE, upperalpha=TRUE,
                                                loweralpha=FALSE, unique=TRUE, check=TRUE)))
# user system elapsed
# 0.86 0.24 5.52
You can do what you want directly:
- sample 5 random capital letters;
- sample 4 digits;
- sample 1 random capital letter.
digits <- 0:9
createRandString <- function() {
  # 5 capital letters, then 4 digits, then 1 capital letter
  v <- c(sample(LETTERS, 5, replace = TRUE),
         sample(digits, 4, replace = TRUE),
         sample(LETTERS, 1, replace = TRUE))
  return(paste0(v, collapse = ""))
}
This is easier to control, and it won't take as long because it uses R's built-in generator instead of a network request.
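Since createRandString() returns a single string, generating many of them needs a loop or an apply-style call. The following is a minimal sketch of one way to do that; `createRandStrings` is a hypothetical wrapper name, and the single-string generator is restated so the block runs on its own.

```r
# Single-string generator, following the answer above.
createRandString <- function() {
  v <- c(sample(LETTERS, 5, replace = TRUE),
         sample(0:9, 4, replace = TRUE),
         sample(LETTERS, 1, replace = TRUE))
  paste0(v, collapse = "")
}

# Hypothetical wrapper (not in the original answer): call the generator n times
# and collect the results in a character vector.
createRandStrings <- function(n) {
  vapply(seq_len(n), function(i) createRandString(), character(1))
}

set.seed(42)  # optional: makes the output reproducible
ids <- createRandStrings(1000)
head(ids)
```

For very large n, the vectorized approaches shown earlier in the thread are likely to be faster, because they call sample() a handful of times rather than three times per string.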