Remove the Salutations

Retina, 72.8% (old test battery, was 68%), 77.5% (new test battery, was 74.8%)

i`^h(a[iy]|eya?|i(h?i|ya|)|ello)[ ,]+

T`l`L`^.

Edit: Gained 4.8% (old test battery) and 2.7% (new test battery) coverage with help from @MartinEnder's tips.

GNU sed, 100% (was 78%)

/^\w*[wd]\b/!s/^[dghs][eruaio]\w*\W\+//i
s/./\U&/

(49 bytes)

The test battery is quite limited. We can count which words appear first on each line:

$ sed -e 's/[ ,].*//' inputs.txt | sort | uniq -ic
 40 aight
 33 alright
 33 dear
 33 g'd
 41 good
 36 greetings
 35 guys
 31 hai
 33 hay
 27 hello
 33 hey
 37 heya
 43 hi
 34 hihi
 29 hii
 35 hiya
 45 hola
 79 how
 37 howdy
 33 kowabunga
 39 salutations
 32 speak
 34 sweet
 40 talk
 36 wassup
 34 what's
 38 yo
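
For readers without a shell handy, the same tally can be sketched in Python. This is a rough equivalent of the pipeline above, not part of the answer; `first_word_counts` is a hypothetical helper name, and lowercasing the key is a simplification of what `uniq -i` does (it ignores case when grouping).

```python
import re
from collections import Counter

def first_word_counts(lines):
    """Tally the first word of each line, ignoring case."""
    counts = Counter()
    for line in lines:
        # Same cut as sed 's/[ ,].*//': drop everything from the
        # first space or comma to the end of the line.
        first = re.sub(r"[ ,].*", "", line.rstrip("\n"), count=1)
        counts[first.lower()] += 1
    return counts

if __name__ == "__main__":
    sample = ["Hi, there", "hi you", "How are you?", "yo"]
    for word, n in sorted(first_word_counts(sample).items()):
        print(f"{n:3d} {word}")
```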

The salutations to be removed begin with d, g, h or s (or uppercase versions thereof); the non-salutations beginning with those letters are:

 33 g'd
 41 good
 79 how
 32 speak
 34 sweet

Ignoring lines where they appear alone, that's roughly 220 false positives (the five counts above sum to 219). As a first pass, let's simply remove any initial word beginning with one of those four letters, and refine afterwards.

When a line starts with a word beginning with any of those letters (^[dghs]\w*), matched case-insensitively (the i flag) and followed by at least one non-word character (\W\+), we replace the match with an empty string. Then we replace the first character with its uppercase equivalent (s/./\U&/).

That gives us:

s/^[dghs]\w*\W\+//i
s/./\U&/
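
To see what this first pass does, and where it goes wrong, here is a small Python port. It is only a sketch: `strip_first_pass` is a hypothetical name, and it assumes Python's `re` treats `\w` and `\W` like GNU sed does on plain ASCII input.

```python
import re

# Port of: s/^[dghs]\w*\W\+//i
FIRST_PASS = re.compile(r"^[dghs]\w*\W+", re.IGNORECASE)

def strip_first_pass(line):
    """Drop a leading d/g/h/s word, then uppercase the first character."""
    line = FIRST_PASS.sub("", line, count=1)
    # Port of: s/./\U&/
    return line[:1].upper() + line[1:]

if __name__ == "__main__":
    print(strip_first_pass("hey, what's up"))   # salutation removed
    print(strip_first_pass("how are you"))      # false positive: "how" is removed too
```

The second call shows the kind of false positive the refinements are meant to remove.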

We can now refine this a bit:

The largest set of false-positives is how, so we make the substitution conditional by prefixing with a negative test:

```
 /^[Hh]ow\b/!
```
We can also filter on the second letter, to eliminate g'd, speak and sweet:

```
s/^[dghs][eruaio]\w*\W\+//i
```
That leaves only good as a false positive. We can adjust the prefix test to eliminate words ending in either w or d:

```
/^\w*[wd]\b/!
```
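
Putting the guard and the second-letter filter together, the final script can be mirrored in Python as follows. This is a sketch under the same assumptions as any port (hypothetical function name, ASCII input); note that the guard stays case-sensitive, because in the sed script the i flag applies only to the s command, not to the address.

```python
import re

# Port of the address /^\w*[wd]\b/ : first word ends in w or d (case-sensitive).
GUARD = re.compile(r"^\w*[wd]\b")
# Port of: s/^[dghs][eruaio]\w*\W\+//i
SALUTATION = re.compile(r"^[dghs][eruaio]\w*\W+", re.IGNORECASE)

def strip_salutation(line):
    """Remove a leading salutation unless the guard matches, then capitalize."""
    if not GUARD.search(line):
        line = SALUTATION.sub("", line, count=1)
    # Port of: s/./\U&/
    return line[:1].upper() + line[1:]

if __name__ == "__main__":
    for text in ["hi, how are you?", "how are you?", "Good morning", "speak up"]:
        print(strip_salutation(text))
```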

Demonstration

$ diff -u <(./123478.sed inputs.txt) replaced.txt | grep ^- | wc -l
0
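
The same check can be approximated without diff. The sketch below counts lines whose transformed output differs from the expected output; `count_mismatches` is a hypothetical helper, and it compares lines strictly by position, which is simpler than diff's alignment but agrees with it when the result is zero.

```python
from typing import Callable, Iterable

def count_mismatches(transform: Callable[[str], str],
                     inputs: Iterable[str],
                     expected: Iterable[str]) -> int:
    """Count positions where transform(input line) != expected line."""
    return sum(1 for src, want in zip(inputs, expected)
               if transform(src) != want)

if __name__ == "__main__":
    # Toy data only; the real inputs.txt and replaced.txt are not reproduced here.
    print(count_mismatches(str.capitalize, ["hi there", "yo"], ["Hi there", "Yo"]))
```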

PHP, 60.6%

50 Bytes

<?=ucfirst(preg_replace("#^[dh]\w+.#i","",$argn));

PHP, 59.4%

49 Bytes

<?=ucfirst(preg_replace("#^h\w+,? #i","",$argn));

PHP, 58.4%

50 Bytes

<?=ucfirst(preg_replace("#^[gh]\w+.#i","",$argn));
