How to remove extra white space between words inside a character vector in R?
Another option is the str_squish() function from the stringr package:
library(stringr)
string <- "Hi,   this is a    good time to start working   together."
str_squish(string)
#[1] "Hi, this is a good time to start working together."
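Note that str_squish() does more than collapse internal runs: it also trims whitespace at the start and end of each string, and it works element-wise on a whole character vector. A small sketch, where the vector x is made up for illustration:

```r
library(stringr)

# Hypothetical input: a character vector with leading, trailing and
# repeated internal whitespace (spaces, tabs and a newline)
x <- c("  Hi,   this is  a test.  ", "one\t\ttwo\nthree")

# str_squish() trims both ends and collapses each internal run to one space
squished <- str_squish(x)
print(squished)
#[1] "Hi, this is a test." "one two three"
```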
The package textclean has many useful tools for processing text. replace_white would be useful here:
v <- "Hi,   this is a    good time to start working   together."
textclean::replace_white(v)
# [1] "Hi, this is a good time to start working together."
gsub is your friend:
test <- "Hi,   this is a    good time to start working   together."
gsub("\\s+"," ",test)
#[1] "Hi, this is a good time to start working together."
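\\s+ matches a run of one or more whitespace characters (space, tab, newline, etc.), and gsub replaces each such run with a single space " ". Note that a leading or trailing run is also replaced by a single space rather than removed.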
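Unlike str_squish(), gsub("\\s+", " ", x) keeps one space at each end when the string starts or ends with whitespace. If you also want the ends trimmed in base R, one option is to wrap the call in trimws(). A minimal sketch; the vector x is illustrative:

```r
# Illustrative input with leading/trailing whitespace and mixed separators
x <- c("  Hi,   this is\ta test.  ", "one\n\ntwo")

# Collapse every whitespace run to a single space; the ends keep one space
collapsed <- gsub("\\s+", " ", x)
print(collapsed)
#[1] " Hi, this is a test. " "one two"

# Additionally strip leading/trailing whitespace (base R trimws())
squished <- trimws(collapsed)
print(squished)
#[1] "Hi, this is a test." "one two"
```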
Since the question title asks to "remove the extra whitespace between words", leaving any leading and trailing whitespace untouched, the answer is (assuming "words" are chunks of non-whitespace characters):
## Replace each run of 2+ whitespace chars between words with a single space
gsub("(\\S)\\s{2,}(?=\\S)", "\\1 ", text, perl=TRUE)
stringr::str_replace_all(text, "(\\S)\\s{2,}(?=\\S)", "\\1 ")
## Or, if the whitespace to keep is the last whitespace char in each matched run:
gsub("(\\S)(\\s){2,}(?=\\S)", "\\1\\2", text, perl=TRUE)
stringr::str_replace_all(text, "(\\S)(\\s){2,}(?=\\S)", "\\1\\2")
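To see how the two variants differ, here is a sketch on a made-up string that has leading and trailing spaces plus a run of two tabs between words:

```r
# Made-up input: two leading spaces, 3 spaces and 2 tabs between words,
# and two trailing spaces
text <- "  Hi,   this is\t\ta test.  "

# Variant 1: each run of 2+ whitespace chars between words becomes one space
v1 <- gsub("(\\S)\\s{2,}(?=\\S)", "\\1 ", text, perl = TRUE)

# Variant 2: each such run is replaced by its own last whitespace char
v2 <- gsub("(\\S)(\\s){2,}(?=\\S)", "\\1\\2", text, perl = TRUE)

print(v1)
#[1] "  Hi, this is a test.  "
print(v2)
#[1] "  Hi, this is\ta test.  "
```

In both results the leading and trailing spaces survive, because the pattern requires a non-whitespace char on each side of the run.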
Regex details:
(\S) - Capturing group 1 (\1 refers to this group value from the replacement pattern): a non-whitespace char
\s{2,} - two or more whitespace chars (in regex #2, the \s is wrapped in parentheses to form capturing group 2; since the group is repeated, \2 holds the last whitespace char matched)
(?=\S) - a positive lookahead that requires a non-whitespace char immediately to the right of the current location, without consuming it.
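One way to check what the pattern actually consumes is to list its matches with gregexpr() and regmatches(). Because (?=\S) only looks ahead, the next word's first char is not part of the match and can still serve as (\S) for the following match, which matters for one-letter words. The input below is made up for illustration:

```r
# Made-up input: single-letter "words" separated by runs of spaces
text <- "a  b   c d"

# Collect every substring matched by the first pattern
m <- regmatches(text, gregexpr("(\\S)\\s{2,}(?=\\S)", text, perl = TRUE))[[1]]
# m holds two matches: "a  " and "b   " (the lookahead char is not included)

# Each match becomes its captured char plus one space; the single space
# between "c" and "d" is left alone because of the {2,} quantifier
out <- gsub("(\\S)\\s{2,}(?=\\S)", "\\1 ", text, perl = TRUE)
print(out)
#[1] "a b c d"
```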