Extract string from each line of a file

What about grep?

grep -oP "(?<=\>).*(?=<)"  file

Output:

Wallmart
tastes

EDIT:

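Following @Toby Speight's comment, and assuming that only word characters appear between > and <, the command should be the following, so that it does not match > and < in other contexts (the greedy .* above would span from the first > to the last < on a line):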

grep -oP "(?<=\>)\w+(?=<)"  file
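For comparison, the same lookbehind/lookahead idea can be sketched with Python's re module. The input file is not shown in this thread, so the sample lines below are an assumption (one > followed by one < per line), and extract_between is a hypothetical helper name.

```python
import re

# Hypothetical sample lines; the real input file is not shown in this thread.
SAMPLE = ["item1>Wallmart<end", "item2>tastes<end"]

# Same pattern as the grep -oP command: a run of word characters
# preceded by '>' and followed by '<' (neither delimiter is part of the match).
PATTERN = re.compile(r"(?<=>)\w+(?=<)")

def extract_between(line):
    """Return every run of word characters that sits directly between '>' and '<'."""
    return PATTERN.findall(line)

for line in SAMPLE:
    print(*extract_between(line), sep="\n")
```

Like grep -o, findall reports each match separately, so a line with several >word< pairs yields several results rather than one long greedy match.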

For awk:

awk -F '[><]' '{print $2}' file

That sets the field separator to either > or < and prints the second field, which is the text between those two characters.
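The field-splitting approach can be sketched in Python as well. This is only an illustration: second_field is a hypothetical name, and the sample line in the usage note is an assumption about what the input looks like.

```python
import re

def second_field(line):
    """Split on '>' or '<' and return the second field, like awk's $2."""
    fields = re.split(r"[><]", line)
    # awk numbers fields from 1, so $2 corresponds to index 1 here.
    # A line without any separator has no second field; return an empty string.
    return fields[1] if len(fields) > 1 else ""
```

With a line such as "item1>Wallmart<end" this returns "Wallmart". If a line starts with < (for example an opening tag), the first field is empty and the second field is whatever follows that <, so the field number would need adjusting for such input.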

For sed:

sed 's|.*>\(.*\)<.*|\1|' file

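The \(...\) group captures the text between a > and the < that follows it. The pattern matches the whole line (anything before the >, and the < with anything after it), and the substitution replaces it with the captured group, \1.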
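The capture-group substitution can be sketched in Python with re.sub. The helper name keep_captured is hypothetical, and the behaviour described assumes each line is processed on its own, as sed does.

```python
import re

# Mirrors sed 's|.*>\(.*\)<.*|\1|': match the whole line and
# replace it with the contents of capture group 1.
PATTERN = re.compile(r".*>(.*)<.*")

def keep_captured(line):
    """Return the text captured between '>' and '<'.

    Lines that do not match are returned unchanged, as sed would print them.
    """
    return PATTERN.sub(r"\1", line, count=1)
```

Because the leading .* is greedy, a line with several > ... < pairs keeps only the last pair's contents: keep_captured("<a>x</a><b>y</b>") gives "y".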

The output

Wallmart
tastes

I tried the command below and it worked fine:

awk -F ">" '{print $2}' filename| sed  "s/<.*//g"

output

Wallmart
tastes

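Python:

#!/usr/bin/env python3
# Print the text between the first '>' and the next '<' on each line.
with open('filename') as f:
    for line in f:
        if '>' not in line:
            continue  # skip lines without a '>' instead of raising IndexError
        print(line.split('>')[1].split('<')[0].strip())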

output

Wallmart
tastes
