Match substring of two vectors and create a new vector combining them
A possible solution:
a <- setNames(a, substr(a, 2, 3))
b <- setNames(b, substr(b, 1, 2))
df <- merge(stack(a), stack(b), by = 'ind')
paste0(substr(df$values.x, 1, 1), df$values.y)
which gives:
[1] "1234" "1238" "2234" "2238" "4325" "4326" "2342"
A second alternative:
a <- setNames(a, substr(a, 2, 3))
b <- setNames(b, substr(b, 1, 2))
l <- lapply(names(a), function(x) b[x == names(b)])
paste0(substr(rep(a, lengths(l)), 1, 1), unlist(l))
which gives the same result and is considerably faster (see the benchmark).
Probably a little complex but works:
unlist( sapply( a, function(x) {
regex <- paste0( substr(x, 2, 3), '(\\d)')
z <- sub(regex, paste0(x, "\\1"), b)
z[!b %in% z]
} ))
which give: [1] "1234" "1238" "2342" "4325" "4326" "2234" "2238"
The main idea is to create a regex for each entry in a, apply this regex to b and replace the values with the current a value and append only the last digit captured (the (\\d)
part of the regex, then filter the resulting vector to get back only the modified values.
Out of curiosity, I did a small benchmark (adding sub_a and sub_b creation into Sotos and Heikki answers so everyone start on the same initial vectors a of 400 observations and b of 500 observations):
Unit: milliseconds
expr min lq mean median uq max neval
Jaap(a, b) 341.0224 342.6853 345.2182 344.3482 347.3161 350.2840 3
Tensi(a, b) 415.9175 416.2672 421.9148 416.6168 424.9134 433.2100 3
Heikki(a, b) 126.9859 139.6727 149.3252 152.3594 160.4948 168.6302 3
Sotos(a, b) 151.1264 164.9869 172.0310 178.8474 182.4833 186.1191 3
MattWBase(a, b) 286.9651 290.8923 293.3795 294.8195 296.5867 298.3538 3