Use sed to encapsulate the first word of each paragraph with <i> </i>?

Using sed,

if there's a letter at the beginning of the line, then
capture any number of non-space characters and
replace them with the same characters wrapped in <i> ... </i>.

like this:

sed '/^[a-zA-Z]/ s!\([^ ]*\)!<i>\1</i>!' < file > file.new

On this sample input:

Snapdragon  Plant with a two-lipped flower.

Snap-fastener  = *press-stud.

Snapper  Any of several edible marine fish.

Snappish  1 curt; ill-tempered; sharp. 2 inclined to snap.

The output is:

<i>Snapdragon</i>  Plant with a two-lipped flower.

<i>Snap-fastener</i>  = *press-stud.

<i>Snapper</i>  Any of several edible marine fish.

<i>Snappish</i>  1 curt; ill-tempered; sharp. 2 inclined to snap.

To break down the pieces of the sed command:

/^[a-zA-Z]/ -- this is an address filter; it means to apply the subsequent command only to lines that match this regular expression. The regular expression requires that a letter (either lower-case a-z or upper-case A-Z) must follow the beginning of the line ^.
s!\([^ ]*\)!<i>\1</i>! -- this is the search and replace command. It uses a delimiter between the search and the replacement; the common delimiter is a forward-slash, but since the replacement text contains a forward-slash, I changed the delimiter to an exclamation point !. The search term has two pieces to it: the capturing parentheses \( and \), which have to be escaped, and the regular expression [^ ]*, which says: "match anything-except-a-space, zero or more times *". The replacement text refers back to that captured group with \1 and surrounds it with the <i> HTML tag.
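
To see the address filter and the delimiter choice in action, here is a small sketch. The function names and the second sample line are made up for illustration; the second function is the same substitution written with the usual / delimiter, which forces the slash in the closing tag to be escaped.

```shell
# Hypothetical helper: wrap the first word of lines that start with a letter.
wrap_first_word() {
  sed '/^[a-zA-Z]/ s!\([^ ]*\)!<i>\1</i>!'
}

# Same substitution with the default / delimiter; note the escaped \/ in </i>.
wrap_first_word_slash() {
  sed '/^[a-zA-Z]/ s/\([^ ]*\)/<i>\1<\/i>/'
}

# The first line starts with a letter and is rewritten;
# the second starts with a digit, so the address filter skips it.
printf '%s\n' 'Snapper  Any of several edible marine fish.' '2 inclined to snap.' |
  wrap_first_word
# prints:
# <i>Snapper</i>  Any of several edible marine fish.
# 2 inclined to snap.
```

Both functions should produce identical output; the ! version is just easier to read.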

To additionally wrap each non-empty line with paragraph tags, add another sed expression:

sed -e '/^[a-zA-Z]/ s!\([^ ]*\)!<i>\1</i>!' -e '/./ { s/^/<p>/; s!$!</p>! }' < file

The additional expression says:

/./ -- match lines that have at least one character; this skips blank lines
{ -- group the next two commands together
s/^/<p>/ -- search and replace the beginning of line ^ with an opening paragraph tag
s!$!</p>! -- search and replace the end of line $ with a closing paragraph tag
} -- end the grouping
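
A quick way to check the grouped commands is to feed them a couple of lines with a blank line in between. This is only a sketch: the function name and the shortened sample text are made up, and I added a ; before the closing } on the assumption that some non-GNU seds want the last command in a group terminated.

```shell
# Hypothetical helper: italicize the first word, then wrap non-empty lines in <p>.
wrap_paragraphs() {
  sed -e '/^[a-zA-Z]/ s!\([^ ]*\)!<i>\1</i>!' \
      -e '/./ { s/^/<p>/; s!$!</p>!; }'
}

# The blank line does not match /./, so it passes through untouched.
printf 'Snapper  fish.\n\n2 inclined to snap.\n' | wrap_paragraphs
# prints:
# <p><i>Snapper</i>  fish.</p>
#
# <p>2 inclined to snap.</p>
```

Note that the line starting with a digit still gets paragraph tags, because the two -e expressions have independent addresses.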

You can do this with sed:

$ sed '/^$/n;s#^\([^ ]*\)#<i>\1</i>#' input.txt
<i>Snapdragon</i>  Plant with a two-lipped flower.

<i>Snap-fastener</i>  = *press-stud.

<i>Snapper</i>  Any of several edible marine fish.

<i>Snappish</i>  1 curt; ill-tempered; sharp. 2 inclined to snap.

Explanation

The sed command above includes two blocks. The first block detects blank lines with /^$/ and skips them with n, which prints the current line and loads the next one.

skip any blank lines /^$/n

The second block does all the heavy lifting, s#..#..#, and detects sub-strings that do not include a space, \([^ ]*\). This pattern is 'saved' via the \(..\) that wraps it, so we can reuse it later on via the \1.

match sub-string up to first space \([^ ]*\)
save match, \1, and wrap it with <i>...</i>
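
One caveat with n, as far as I can tell: after printing the blank line it loads the next line, and the s command then runs on whatever was loaded. If two blank lines occur in a row, the second one would come out as <i></i>. A minimal sketch of an alternative, assuming the input may contain consecutive blank lines, is to negate the address with ! so the substitution simply never runs on empty lines. The function name and sample text are made up for illustration.

```shell
# Hypothetical helper: apply the substitution only to lines that are NOT empty.
wrap_skip_blank() {
  sed '/^$/!s#^\([^ ]*\)#<i>\1</i>#'
}

# Two blank lines in a row stay blank.
printf 'Snapper  fish.\n\n\nSnappish  curt.\n' | wrap_skip_blank
# prints:
# <i>Snapper</i>  fish.
#
#
# <i>Snappish</i>  curt.
```

For input where paragraphs are separated by exactly one blank line, like the sample above, both forms should give the same result.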
