Formally constructing Control Flow Graph
If your intention is to simply create something that looks a bit more formal, then you could express these matching operations as inference rules using the standard notation. You should express it in terms of a single step of reduction, rather than recursively, because then it is sufficient to simply keep applying these rules until no more can be applied.
That said, this definition is essentially going to say exactly the same thing as your scala code. If you really want to do anything "formal" the properties you need to prove are:
- Your CFG translation algorithm always terminates
- Whether your CFG is minimal with respect to a given AST input
- Whether there is a unique CFG derivable by your algorithm for a given AST input (i.e. it's not non-deterministic which CFG it produces).
I don't think your basic blocks approach (rather than a per-statement approach) is necessarily a bad idea, either. It seems perfectly reasonable that if you can match a basic block, you can write a rule that makes assertions about set membership based upon the presence of this match. It seems like the inductive definition you started sketching could work just fine.
Something else interesting might be to try to relate (formally) structured operational semantics and your construction of CFGs. There might already be work in this area, but I only did a cursory google search and didn't find any clearly stated relationship between the two, but intuitively it seems like one should exist.
Google's Closure Compiler implements a Control-Flow Analysis which transforms an AST for JavaScript into a Control-Flow Graph. The ideas for this implementation are inspired from the paper: Declarative Intraprocedural Flow Analysis of Java Source Code.