How to delete all lines before the first and after the last occurrence of a string?
Could you please try the following? Written and tested with the shown samples in GNU awk.
awk '
/Lecture/{
  found=1
}
found && NF{
  val=(val?val ORS:"")$0
}
END{
  if(val){
    match(val,/.*Lecture [0-9]+/)
    print substr(val,RSTART,RLENGTH)
  }
}' Input_file
Explanation: adding a detailed explanation of the above.
awk '                                  ##Starting awk program from here.
/Lecture/{                             ##Checking if a line has the Lecture keyword, then do the following.
  found=1                              ##Setting found to 1 here.
}
found && NF{                           ##Checking if found is set and the line is not blank, then do the following.
  val=(val?val ORS:"")$0               ##Creating val and appending the current line to it, separated by ORS.
}
END{                                   ##Starting END block of this code here.
  if(val){                             ##Checking if val is set, then do the following.
    match(val,/.*Lecture [0-9]+/)      ##Matching from the start of val up to the last Lecture followed by digits.
    print substr(val,RSTART,RLENGTH)   ##Printing only the matched part of val.
  }
}' Input_file                          ##Mentioning Input_file name here.
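The script above finds its stopping point through the `Lecture [0-9]+` pattern. For an arbitrary string, the same idea can be sketched by recording the line numbers of the first and last hit and printing everything in between. This is a variant, not the script above: `keep_between` and the variable names are made up for illustration, the match is a plain substring test rather than a regex, and blank lines inside the range are kept.

```shell
# keep_between is a hypothetical helper name, not part of the answer above.
# Usage: keep_between STRING [FILE...]   (reads standard input when no file is given)
keep_between() {
  needle=$1
  shift
  # Note: awk -v interprets backslash escapes in the assigned value.
  awk -v s="$needle" '
    index($0, s) {                    # line contains the string (substring test)
      if (!first_nr) first_nr = NR    # remember the first hit only once
      last_nr = NR                    # keep overwriting so the last hit wins
    }
    { line[NR] = $0 }                 # buffer every line under its line number
    END {
      if (first_nr) {                 # print nothing when the string never occurs
        for (i = first_nr; i <= last_nr; i++) print line[i]
      }
    }' "$@"
}

# Made-up sample input: two junk lines around a block of three lines.
printf '%s\n' 'intro' 'A Lecture 1' 'notes' 'B Lecture 2' 'outro' | keep_between 'Lecture'
```

On this sample the output is the three lines from `A Lecture 1` through `B Lecture 2`. Buffering the whole input keeps it to a single pass, at the cost of holding the file in memory.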
Simply using grep 'Lecture' file with the input you have shown in file will work:
$ grep 'Lecture' file
Estimation of Working Capital Lecture 1
Estimation of Working Capital Lecture 2
Estimation of Working Capital Lecture 3
Money Market Lecture 254
Money Market Lecture 255
Money Market Lecture 256
International Trade Lecture 257
International Trade Lecture 258
International Trade Lecture 259
(Note: this simply grabs all the lines containing Lecture. See @RavinderSingh13's answer for handling non-Lecture lines in between.)
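If the non-Lecture lines in between should be kept, one possible sketch is to let `grep -n` report the line numbers of the hits and hand the first and last of them to `sed`. The helper name `between_matches` is hypothetical, the string is matched literally (`-F`), and the file is read twice, so this assumes a regular file rather than a pipe.

```shell
# between_matches is a hypothetical helper name.
# Usage: between_matches STRING FILE
between_matches() {
  # grep -n prefixes every matching line with "N:", so cut keeps only the number
  hits=$(grep -n -F -e "$1" "$2" | cut -d: -f1)
  [ -n "$hits" ] || return 0                      # no match: print nothing
  first_nr=$(printf '%s\n' "$hits" | head -n 1)   # line number of the first hit
  last_nr=$(printf '%s\n' "$hits" | tail -n 1)    # line number of the last hit
  sed -n "${first_nr},${last_nr}p" "$2"           # print that range only
}

# Made-up sample file for illustration.
sample=$(mktemp)
printf '%s\n' 'intro' 'A Lecture 1' 'notes' 'B Lecture 2' 'outro' > "$sample"
between_matches 'Lecture' "$sample"
rm -f "$sample"
```

With this sample the `notes` line survives, while `intro` and `outro` are dropped.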
You could replace matches of the following regular expression (with the multiline flag set) with empty strings using your tool of choice. The regex engine need only support negative lookaheads.
\A(?:^(?!.*\bLecture\b).*\r?\n)*|^\r?\n|^(?!.*\bLecture\b).*\r?\n(?![\s\S]*\bLecture\b)
Start your engine!
The regex engine performs the following operations.
\A : match beginning of string (not line)
(?: : begin a non-capture group
^ : match beginning of line
(?!.*\bLecture\b) : assert the line does not contain 'Lecture'
.*\r?\n : match the line
) : end non-capture group
* : execute the non-capture group 0+ times
| : or
^\r?\n : match an empty line
| : or
^ : match beginning of line
(?!.*\bLecture\b) : assert the line does not contain 'Lecture'
.*\r?\n : match the line
(?! : begin a negative lookahead
[\s\S]* : match 0+ characters, including line terminators
\bLecture\b : match 'Lecture'
) : end negative lookahead