Deleting unwanted elements in lists of words
To make this post complete I have moved my comment here as an answer:
DeleteCases[ l, x_String /; StringContainsQ[x, "'" | "-" | NumberString ] , Infinity]
or
DeleteCases[l, _String?(StringContainsQ[#, "'" | "-" | NumberString ] &), Infinity]
Using RepeatedTiming reveals that the Condition version is marginally faster than the PatternTest version: it takes about 0.00021 seconds on my machine. Note that restricting the pattern to expressions with head String (e.g. _String) speeds up the process.
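The comparison above can be reproduced along the following lines. This is only a sketch: the original list l is not shown in this post, so sample is a made-up stand-in, and the timings will differ by machine and input.

```mathematica
(* sample is a hypothetical stand-in for the list l used in this post *)
sample = {{"alice", "rabbit-hole", "it's", "42", "book"},
          {"chapter1", "down", "the"}};

(* Condition version: name the string x and test it with /; *)
First @ RepeatedTiming[
  DeleteCases[sample,
   x_String /; StringContainsQ[x, "'" | "-" | NumberString], Infinity]]

(* PatternTest version: attach the test to _String with ? *)
First @ RepeatedTiming[
  DeleteCases[sample,
   _String?(StringContainsQ[#, "'" | "-" | NumberString] &), Infinity]]
```

Both calls return only the timing in seconds; drop First to see the filtered list as well.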
Alan's suggestion (Select[LetterQ] /@ l) is very concise and readable, but takes about twice as long (0.00041 seconds) on my machine. Presumably LetterQ performs a few more checks.
Update
kglr nicely shows how fast Pick is here, and I linked an old post for those interested in why that may be. Since LetterQ does more testing than the three Alternatives given above, it may be worth mentioning that DigitCharacter, as used by kglr, is also faster than NumberString:
DeleteCases[l, x_String /; StringContainsQ[x, "'" | "-" | NumberString], Infinity] // RepeatedTiming // First
0.00020
DeleteCases[l, x_String /; StringContainsQ[x, "'" | "-" | DigitCharacter], Infinity] // RepeatedTiming // First
0.00014
Pick[l, StringFreeQ["-" | "'" | DigitCharacter] /@ l]
{{"adventures", "in", "wonderland", "lewis", "carroll", "the", "millennium", "fulcrum", "edition", "chapter", "i", "down", "alice", "was", "beginning", "to", "get", "very", "tired", "of", "sitting", "by", "her", "sister", "on", "bank", "and", "having", "nothing", "do", "once", "or", "twice", "she", "had", "peeped", "into", "book", "reading", "but", "it", "no", "pictures", "conversations", "what", "is", "use", "a", "thought", "without", "conversation", "so", "considering", "own", "mind", "as", "well", "could", "for", "hot", "day", "made", "feel", "sleepy", "stupid", "whether", "pleasure", "making", "would", "be", "worth", "trouble", "getting", "up", "picking", "daisies", "when", "suddenly", "white", "rabbit", "with", "pink", "eyes", "ran", "close"},
{"there", "was", "nothing", "so", "very", "remarkable", "in", "that", "nor", "did", "alice", "think", "it", "much", "out", "of", "the", "way", "to", "hear", "rabbit", "say", "itself", "oh", "dear"}}
This is several times faster than the DeleteCases + StringContainsQ combination on input l:
Pick[l, StringFreeQ["-" | "'" | DigitCharacter] /@ l] // RepeatedTiming // First
0.000042
DeleteCases[l, x_String /; StringContainsQ[x, "'" | "-" | NumberString], Infinity] //
RepeatedTiming // First
0.00025
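Before comparing timings, it may be worth checking that the two approaches return the same result. A minimal sketch, again using a made-up list named sample rather than the original l, and assuming the input is a list of lists of strings:

```mathematica
(* sample is a hypothetical stand-in for the list l used in this post *)
sample = {{"alice", "rabbit-hole", "it's", "42", "book"},
          {"chapter1", "down", "the"}};

(* Pick keeps the strings for which StringFreeQ gives True *)
viaPick = Pick[sample, StringFreeQ["-" | "'" | DigitCharacter] /@ sample];

(* DeleteCases removes the strings that contain any of the patterns *)
viaDeleteCases = DeleteCases[sample,
   x_String /; StringContainsQ[x, "'" | "-" | DigitCharacter], Infinity];

(* True when both approaches agree on this input *)
viaPick === viaDeleteCases
```

Pick relies on the selector list having the same shape as the input, which is why StringFreeQ is mapped over the sublists here; DeleteCases with level Infinity does not depend on the nesting depth.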