Converting grammar to Chomsky Normal Form?
Alternative answer: The grammar can only produce a finite number of strings, namely 6.
S -> aabbbaab | aabbb | bbaab | bb | abbaab | abb.
You can now condense this back to Chomsky Normal Form by hand.
By substitution, we can find the set of all strings produced. Your initial rules:
S -> AB | aB.
A -> aab | lambda.
B -> bbA.
First split up the S
rule:
S -> AB.
S -> aB.
Now substitute what A and B expand into:
S -> AB
-> (aab | lambda) bbA
-> (aab | lambda) bb (aab | lambda).
S -> aB
-> abbA
-> abb (aab | lambda).
Expand these again to get:
S -> aabbbaab.
S -> aabbb.
S -> bbaab.
S -> bb.
S -> abbaab.
S -> abb.
To change this finite set to Chomsky Normal Form, it suffices to do it by brute force without any intelligent factoring. First we introduce two terminal rules:
X -> a.
Y -> b.
Now for each string, we consume the first letter with a terminal variable and the remaining letters with a new variables. For example, like this:
S -> aabbb. (initial rule, not in Chomsky Normal Form)
S -> XC, where X->a and C->abbb.
C -> XD, where X->a and D->bbb.
D -> YE, where Y->b and E->bb.
E -> YY, where Y->b and Y->b.
We just go through this process for all 6 strings, generating a lot of new intermediate variables.
Wikipedia says:
In computer science, a context-free grammar is said to be in Chomsky normal form if all of its production rules are of the form:
- A -> BC, or
- A -> α, or
- S -> ε
where A, B, C are nonterminal symbols, α is a terminal symbol, S is the start symbol, and ε is the empty string. Also, neither B nor C may be the start symbol.
Continuing your work:
S0 -> S S -> AB | aB | B A -> aab B -> bbA | bb
Instead of using |
to denote different choices, split a rule into multiple rules.
S0 -> S S -> AB S -> aB S -> B A -> aab B -> bbA B -> bb
Create new rules Y -> a
and Z -> b
because we will need them soon.
S0 -> S S -> AB S -> aB S -> B A -> aab B -> bbA B -> bb Y -> a Z -> b
S -> aB
is not of the form S -> BC
because a
is a terminal. So change a
into Y
:
S0 -> S S -> AB S -> YB S -> B A -> aab B -> bbA B -> bb Y -> a Z -> b
Do the same for the B -> bb
rule:
S0 -> S S -> AB S -> YB S -> B A -> aab B -> bbA B -> ZZ Y -> a Z -> b
For A -> aab
, create C -> YY
; for B -> bbA
, create D -> ZZ
:
S0 -> S S -> AB S -> YB S -> B A -> CZ C -> YY B -> DA D -> ZZ B -> ZZ Y -> a Z -> b
For S -> B
, duplicate the one rule where S
occurs on the right hand side and inline the rule:
S0 -> B S0 -> S S -> AB S -> YB A -> CZ C -> YY B -> DA D -> ZZ B -> ZZ Y -> a Z -> b
Deal with the rules S0 -> B
and S0 -> S
by joining the right hand side to the left hand sides of other rules. Also, delete the orphaned rules (where the LHS symbol never gets used on RHS):
S0 -> DA S0 -> ZZ S0 -> AB S0 -> YB A -> CZ C -> YY B -> DA D -> ZZ B -> ZZ Y -> a Z -> b
And we're done. Phew!
Without getting into too much theory and proofs(you could look at this in Wikipedia), there are a few things you must do when converting a Context Free Grammar to Chomsky Normal Form, you generally have to perform four Normal-Form Transformations. First, you need to identify all the variables that can yield the empty string(lambda/epsilon), directly or indirectly - (Lambda-Free form). Second, you need to remove unit productions - (Unit-Free form). Third, you need to find all the variables that are live/useful (Usefulness). Four, you need to find all the reachable symbols (Reachable). At each step you might or might not have a new grammar. So for your problem this is what I came up with...
Context-Free Grammar
G(Variables = { A B S }
Start = S
Alphabet = { a b lamda}
Production Rules = {
S -> | AB | aB |
A -> | aab | lamda |
B -> | bbA | } )
Remove lambda/epsilon
ERRASABLE(G) = { A }
G(Variables = { A S B }
Start = S
Alphabet = { a b }
Production Rules = {
S -> | AB | aB | B |
B -> | bbA | bb | } )
Remove unit produtions
UNIT(A) { A }
UNIT(B) { B }
UNIT(S) { B S }
G (Variables = { A B S }
Start = S
Alphabet = { a b }
Production Rules = {
S -> | AB | aB | bb | bbA |
A -> | aab |
B -> | bbA | bb | })
Determine live symbols
LIVE(G) = { b A B S a }
G(Variables = { A B S }
Start = S
Alphabet = { a b }
Production Rules = {
S -> | AB | aB | bb | bbA |
A -> | aab |
B -> | bbA | bb | })
Remove unreachable
REACHABLE (G) = { b A B S a }
G(Variables = { A B S }
Start = S
Alphabet = { a b }
Production Rules = {
S -> | AB | aB | bb | bbA |
A -> | aab |
B -> | bbA | bb | })
Replace all mixed strings with solid nonterminals
G( Variables = { A S B R I }
Start = S
Alphabet = { a b }
Production Rules = {
S -> | AB | RB | II | IIA |
A -> | RRI |
B -> | IIA | II |
R -> | a |
I -> | b | })
Chomsky Normal Form
G( Variables = { V A B S R L I Z }
Start = S
Alphabet = { a b }
Production Rules = {
S -> | AB | RB | II | IV |
A -> | RL |
B -> | IZ | II |
R -> | a |
I -> | b |
L -> | RI |
Z -> | IA |
V -> | IA | })