Use sed to encapsulate the first word of each paragraph with <i> </i>?
Using sed,
- if there's a letter at the beginning of the line, then
- capture any number of non-space characters and
- replace those captured characters with themselves surrounded by <i>...</i>,
like this:
sed '/^[a-zA-Z]/ s!\([^ ]*\)!<i>\1</i>!' < file > file.new
On this sample input:
Snapdragon Plant with a two-lipped flower.
Snap-fastener = *press-stud.
Snapper Any of several edible marine fish.
Snappish 1 curt; ill-tempered; sharp. 2 inclined to snap.
The output is:
<i>Snapdragon</i> Plant with a two-lipped flower.
<i>Snap-fastener</i> = *press-stud.
<i>Snapper</i> Any of several edible marine fish.
<i>Snappish</i> 1 curt; ill-tempered; sharp. 2 inclined to snap.
To break down the pieces of the sed command:
/^[a-zA-Z]/
-- this is an address filter; it means to apply the subsequent command only to lines that match this regular expression. The regular expression requires that a letter (either lower-case a-z or upper-case A-Z) must follow the beginning of the line ^.
s!\([^ ]*\)!<i>\1</i>!
-- this is the search and replace command. It uses a delimiter between the search and the replacement; the common delimiter is a forward slash, but since the replacement text contains a forward slash, I changed the delimiter to an exclamation point !. The search term has two pieces to it: the capturing parentheses, which have to be escaped, and the regular expression [^ ]*, which says: "match anything except a space, zero or more times *". The replacement text refers back to that captured group with \1 and surrounds it with the HTML tag.
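To see the address filter at work, here is a small sketch that feeds the same command two lines, one starting with a letter and one starting with a digit. The two input lines and the variable name out are made up for illustration; only the first line should come back wrapped.

```shell
# Hypothetical input: the first line starts with a letter, the second with a digit.
out=$(printf '%s\n' 'Snapper Any of several edible marine fish.' '2 inclined to snap.' |
  sed '/^[a-zA-Z]/ s!\([^ ]*\)!<i>\1</i>!')

# The address /^[a-zA-Z]/ lets the s command run on the first line only.
printf '%s\n' "$out"
# <i>Snapper</i> Any of several edible marine fish.
# 2 inclined to snap.
```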
To additionally wrap each non-empty line with paragraph tags, add another sed expression:
sed -e '/^[a-zA-Z]/ s!\([^ ]*\)!<i>\1</i>!' -e '/./ { s/^/<p>/; s!$!</p>! }' < file
The additional expression says:
- match lines that have one (any) character -- this skips blank lines
- { group the next two commands together
- search and replace the beginning of line ^ with an opening paragraph tag
- search and replace the end of line $ with a closing paragraph tag
- } end the grouping
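As a quick check of the grouping, the sketch below runs both expressions on two entries separated by a blank line. The input is a made-up excerpt of the sample, and wrapped is just an illustrative variable name. One small deviation from the command above: a semicolon is placed before the closing brace, on the assumption that some sed implementations are stricter than GNU sed about how a group is terminated.

```shell
# Two sample entries separated by one blank line (illustrative input).
wrapped=$(printf '%s\n' 'Snapper Any of several edible marine fish.' '' 'Snappish 1 curt; ill-tempered; sharp.' |
  sed -e '/^[a-zA-Z]/ s!\([^ ]*\)!<i>\1</i>!' \
      -e '/./ { s/^/<p>/; s!$!</p>!; }')

# The blank line does not match /./, so it passes through without <p> tags.
printf '%s\n' "$wrapped"
# <p><i>Snapper</i> Any of several edible marine fish.</p>
#
# <p><i>Snappish</i> 1 curt; ill-tempered; sharp.</p>
```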
You can do this with sed:
$ sed '/^$/n;s#^\([^ ]*\)#<i>\1</i>#' input.txt
<i>Snapdragon</i> Plant with a two-lipped flower.
<i>Snap-fastener</i> = *press-stud.
<i>Snapper</i> Any of several edible marine fish.
<i>Snappish</i> 1 curt; ill-tempered; sharp. 2 inclined to snap.
Explanation
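The sed above includes 2 blocks. The first block detects any blank lines, /^$/, and passes over them with n, which prints the blank line unchanged and loads the next line of input. The substitution then runs on that next line, so a second consecutive blank line would come out as <i></i>.
- skip any blank lines: /^$/n
The second block, s#..#..#, does all the heavy lifting: anchored at the start of the line, it matches the sub-string of characters that are not a space, \([^ ]*\). This pattern is 'saved' via the \(..\) that wraps it, so we can reuse it later on via \1.
- match sub-string up to first space: \([^ ]*\)
- reuse the saved match, \1, and wrap it with <i>...</i>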
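How n treats runs of blank lines can be checked with a short sketch. The input below is made up and contains two consecutive blank lines. The second command is a possible variant, not part of the answer above, that negates the address with ! so the substitution only ever runs on non-empty lines.

```shell
# Made-up input with two consecutive blank lines between the entries.
input=$(printf '%s\n' 'Snapper fish.' '' '' 'Snappish curt.')

# Original form: n prints the first blank line and loads the next line,
# which is also blank here, so s turns it into an empty <i></i> pair.
with_n=$(printf '%s\n' "$input" | sed '/^$/n;s#^\([^ ]*\)#<i>\1</i>#')

# Variant: run s only on lines that do NOT match /^$/.
negated=$(printf '%s\n' "$input" | sed '/^$/!s#^\([^ ]*\)#<i>\1</i>#')

printf '%s\n---\n%s\n' "$with_n" "$negated"
```

With a single blank line between paragraphs, as in the sample, both forms should give the same result.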