gawk or grep: single line and ungreedy
Using any POSIX awk in any shell on every UNIX box:
$ cat tst.awk
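/[[:space:]]*class[[:space:]]*/ {
    # Start of a class definition: remember the file's base name
    inDef = 1
    fname = FILENAME
    sub(".*/","",fname)
    def = out = ""
}
inDef {
    # Keep the original line, prefixed with file name and line number, for output
    out = out fname ":" FNR ": " $0 ORS

    # Remove comments (not perfect but should work for 99.9% of cases)
    sub("//.*","")
    gsub("/[*]|[*]/","\n")
    gsub(/\n[^\n]*\n/,"")
    def = def $0 ORS

    # The opening brace ends the definition
    if ( /{/ ) {
        # gsub() returns the number of commas it replaced; "&" leaves def unchanged
        if ( gsub(/,/,"&",def) > 2 ) {
            printf "%s", out
        }
        inDef = 0
    }
}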
$ find tmp -type f -name '*.java' -exec awk -f tst.awk {} +
multiple-lines.java:1: class ClazzA<R extends A,
multiple-lines.java:2: S extends B<T>, T extends C<T>,
multiple-lines.java:3: U extends D, W extends E,
multiple-lines.java:4: X extends F, Y extends G, Z extends H>
multiple-lines.java:5: extends OtherClazz<S> implements I<T> {
single-line.java:1: class ClazzB<R extends A, S extends B<T>, T extends C<T>, U extends D, W extends E, X extends F, Y extends G, Z extends H> extends OtherClazz<S> implements I<T> {
The above was run using this input:
$ head tmp/*
==> tmp/X-no-parameter.java <==
class ClazzC /* no type parameter */ extends OtherClazz<S> implements I<T> {
public void method(Type<A, B> x) {
// ... code ...
}
}
==> tmp/X-one-parameter.java <==
class ClazzD<R extends A> // only one type parameter
extends OtherClazz<S> implements I<T> {
public void method(Type<X, Y> x) {
// ... code ...
}
}
==> tmp/X-two-line-parameters.java <==
class ClazzF<R extends A, // only two type parameters
S extends B<T>> // on two lines
extends OtherClazz<S> implements I<T> {
public void method(Type<X, Y> x) {
// ... code ...
}
}
==> tmp/X-two-parameters.java <==
class ClazzE<R extends A, S extends B<T>> // only two type parameters
extends OtherClazz<S> implements I<T> {
public void method(Type<X, Y> x) {
// ... code ...
}
}
==> tmp/multiple-lines.java <==
class ClazzA<R extends A,
S extends B<T>, T extends C<T>,
U extends D, W extends E,
X extends F, Y extends G, Z extends H>
extends OtherClazz<S> implements I<T> {
public void method(Type<Q, R> x) {
// ... code ...
}
}
==> tmp/single-line.java <==
class ClazzB<R extends A, S extends B<T>, T extends C<T>, U extends D, W extends E, X extends F, Y extends G, Z extends H> extends OtherClazz<S> implements I<T> {
public void method(Type<Q, R> x) {
// ... code ...
}
}
The above is just a best effort: it doesn't use a parser for the language, and the OP's posted sample input/output was all there was to go on for what needs to be handled.
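The comment-stripping and comma-counting steps in tst.awk can be tried in isolation. This is a minimal sketch, assuming any POSIX awk; strip_comments and count_commas are hypothetical helper names that are not part of the original script:

```shell
# Hypothetical helper: apply the same three comment-stripping steps to each line.
strip_comments() {
    awk '{
        sub("//.*", "")            # drop a trailing // comment
        gsub("/[*]|[*]/", "\n")    # turn each /* and */ into a newline marker
        gsub(/\n[^\n]*\n/, "")     # delete everything between a pair of markers
        print
    }'
}

# Hypothetical helper: gsub() returns how many replacements it made, and
# replacing "," with "&" (the matched text) leaves the line unchanged.
count_commas() {
    awk '{ print gsub(/,/, "&") }'
}

printf '%s\n' 'class ClazzC /* no type parameter */ extends OtherClazz<S> {' | strip_comments
printf '%s\n' 'class ClazzE<R extends A, S extends B<T>, T extends C<T>>' | count_commas
```

The first command prints the declaration without its /* ... */ comment (a double space is left where the comment was), and the second prints 2. A /* comment that continues onto the next line is not removed by these per-line steps.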
Note: Presence of comments can cause these solutions to fail.
With ripgrep (https://github.com/BurntSushi/ripgrep):
rg -nU --no-heading '(?s)class\s+\w+\s*<[^{]*,[^{]*,[^{]*>[^{]*\{' *.java
- The -n option enables line numbering (this is the default if output is to the terminal).
- The -U option enables multiline matching.
- The --no-heading option: by default, ripgrep displays matching lines grouped under the filename as a header; this option makes ripgrep behave like GNU grep, with a filename prefix on each output line.
- [^{]* is used instead of .* to prevent matching ',' and '>' elsewhere in the file; otherwise lines like public void method(Type<Q, R> x) { will get matched.
- The -m option can be used to limit the number of matches per input file, with the additional benefit of not having to search the entire input file.
If you use the above regexp with GNU grep, note that:
- grep matches only one line at a time. If you use the -z option, grep will treat ASCII NUL as the record separator, which effectively lets you match across multiple lines, assuming the input doesn't contain NUL characters that would prevent such matching. Another effect of -z is that a NUL character is appended to each output result (this can be fixed by piping the results to tr '\0' '\n').
- The -o option is needed to print only the matching portion, which means you won't get a line number prefix.
- For the given task, -P isn't needed. grep -zoE 'class\s+\w+\s*<[^{]*,[^{]*,[^{]*>[^{]*\{' *.java | tr '\0' '\n' will give a result similar to the ripgrep command. However, you won't get a line number prefix, the filename prefix will appear only once per matching portion instead of on each matching line, and you won't get the rest of the line before class and after {.
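The grep pipeline can be tried on a small throwaway input. This is a sketch, assuming GNU grep (for -z, -o, \s and \w); the temporary directory, the file names, and the find_wide_classes function are hypothetical and only for illustration:

```shell
# Hypothetical demo input: one class with three type parameters, one with two.
demo=$(mktemp -d)
cat > "$demo/multi.java" <<'EOF'
class ClazzA<R extends A,
    S extends B<T>, T extends C<T>>
    extends OtherClazz<S> implements I<T> {
  public void method(Type<Q, R> x) {
  }
}
EOF
cat > "$demo/two.java" <<'EOF'
class ClazzE<R extends A, S extends B<T>>
    extends OtherClazz<S> implements I<T> {
  public void method(Type<X, Y> x) {
  }
}
EOF

# Hypothetical wrapper around the grep command from the answer:
# -z reads each file as one NUL-terminated record, -o prints only the match,
# and tr turns the NUL that grep appends to each result into a newline.
find_wide_classes() {
    grep -zoE 'class\s+\w+\s*<[^{]*,[^{]*,[^{]*>[^{]*\{' "$@" | tr '\0' '\n'
}

find_wide_classes "$demo"/*.java
```

Only the declaration in multi.java should be reported, with the filename prefix on its first line only. two.java has a single comma before its first {, so it does not match, and the method signature is skipped because [^{]* cannot run past the class's opening brace.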