A script that deletes extra spaces between letters in text
Use wordsegment, a pure-Python word segmentation NLP package:
$ pip install wordsegment
$ python2.7 -m wordsegment <<<"T h e b o o k a l s o h a s a n a n a l y t i c a l p u r p o s e w h i c h i s m o r e i m p o r t a n t"
the book also has an analytical purpose which is more important
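The following regex removes the first space in any run of consecutive spaces, so the single spaces between letters disappear and the double spaces between words collapse to one. That should do the job.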
s/ ( *)/\1/g
So something like:
perl -i -pe 's/ ( *)/\1/g' infile.txt
...will replace infile.txt with a "fixed" version.
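For anyone who would rather not use perl, here is a minimal Python sketch of the same substitution. The function name `drop_first_space` is made up for illustration, and it assumes, as the regex above does, that words are separated by two spaces while letters are separated by one.

```python
import re


def drop_first_space(text):
    """Remove the first space from every run of one or more spaces.

    Mirrors s/ ( *)/\\1/g: a single space vanishes, a double space
    becomes a single space, and so on.
    """
    # ' ( *)' matches one space followed by any further spaces;
    # only the further spaces (group 1) are kept.
    return re.sub(r" ( *)", r"\1", text)


if __name__ == "__main__":
    print(drop_first_space("T h e  b o o k  a l s o"))
```

Unlike `perl -i`, this sketch does not edit a file in place; it only transforms a string, so reading and writing the file is left to the caller.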
Because the input has double spaces between words, there is a much simpler solution: change each double space to a character that does not occur in the text, remove the remaining spaces, then change that character back to a space:
echo "T h e  b o o k  a l s o  h a s  a n  a n a l y t i c a l  p u r p o s e  w h i c h  i s  m o r e  i m p o r t a n t " | sed 's/  /\-/g;s/ //g;s/\-/ /g'
...outputs:
The book also has an analytical purpose which is more important
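The same placeholder trick can be sketched in Python. This is only an illustration under the same assumption (exactly two spaces between words, one between letters). The function name `join_letters` and the choice of a NUL byte as the default placeholder are my own, not part of the answer above.

```python
def join_letters(text, placeholder="\x00"):
    """Collapse 'T h e  b o o k' into 'The book'.

    Steps: double space -> placeholder, drop remaining spaces,
    placeholder -> single space.
    """
    if placeholder in text:
        # The trick only works if the placeholder is truly unused.
        raise ValueError("placeholder already occurs in the text")
    marked = text.replace("  ", placeholder)   # protect word boundaries
    squeezed = marked.replace(" ", "")         # remove letter spacing
    return squeezed.replace(placeholder, " ")  # restore word boundaries


if __name__ == "__main__":
    print(join_letters("T h e  b o o k  a l s o  h a s "))
```

Checking that the placeholder is absent guards against the one way this approach can silently corrupt text: with the sed version, any hyphen already present in the input would be turned into a space.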