Function to extract domain name from URL in R
I don't know of a function in a package to do this, and I don't think there's anything in the base install of R. You can use a user-defined function and store it somewhere to source later, or make your own package with it.
x1 <- "http://stackoverflow.com/questions/19020749/function-to-extract-domain-name-from-url-in-r"
x2 <- "http://www.talkstats.com/"
x3 <- "www.google.com"
# strip the protocol and any leading "www.", then keep everything before the first "/"
domain <- function(x) strsplit(gsub("http://|https://|www\\.", "", x), "/")[[c(1, 1)]]
domain(x3)
## [1] "google.com"
sapply(list(x1, x2, x3), domain)
## [1] "stackoverflow.com" "talkstats.com" "google.com"
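Note that `domain()` above only returns the first element, so it has to be wrapped in `sapply()` for vectors. A vectorized variant (a sketch, base R only, using a single `sub()` call with the same protocol/`www.` stripping idea) accepts a character vector directly:

```r
# capture everything after an optional protocol and optional "www.",
# up to (but not including) the first "/"
domain_vec <- function(x) sub("^(https?://)?(www\\.)?([^/]+).*$", "\\3", x)

urls <- c("http://stackoverflow.com/questions/19020749/function-to-extract-domain-name-from-url-in-r",
          "http://www.talkstats.com/",
          "www.google.com")
domain_vec(urls)
## [1] "stackoverflow.com" "talkstats.com"     "google.com"
```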
You can also use the relatively new urltools package:
library(urltools)
URLs <- c("http://stackoverflow.com/questions/19020749/function-to-extract-domain-name-from-url-in-r",
"http://www.talkstats.com/", "www.google.com")
suffix_extract(domain(URLs))
## host subdomain domain suffix
## 1 stackoverflow.com <NA> stackoverflow com
## 2 www.talkstats.com www talkstats com
## 3 www.google.com www google com
It's backed by Rcpp, so it's wicked fast (significantly more so than using built-in R apply functions).
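If what you actually want from `suffix_extract()` is the registered domain as a single string (matching the base-R output above), you can paste its `domain` and `suffix` columns back together. A sketch, assuming urltools is installed:

```r
library(urltools)

URLs <- c("http://stackoverflow.com/questions/19020749/function-to-extract-domain-name-from-url-in-r",
          "http://www.talkstats.com/", "www.google.com")

# suffix_extract() returns a data frame with host/subdomain/domain/suffix
# columns; recombining domain and suffix gives the registered domain
parts <- suffix_extract(domain(URLs))
paste(parts$domain, parts$suffix, sep = ".")
## [1] "stackoverflow.com" "talkstats.com"     "google.com"
```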