How can I simplify token prediction DFA?
Grammars that are very large (many different tokens) have that problem, unfortunately (SQL grammars suffer from this too).
Sometimes this can be fixed by making certain lexer rules fragments
opposed to "full" lexer rules that produce tokens and/or re-arranging the way characters are matched inside the rules, but by looking at the way you already tried yourself, I doubt there can gained much in your case. However, if you're willing to post your lexer grammar here on SO, I, or someone else, might see something that could be changed.
In general, this problem is fixed by splitting the lexer grammar into 2 or more separate lexer grammars and then importing those in one "master" grammar. In ANTLR terms, these are called composite grammars. See this ANTLR Wiki page about them: http://www.antlr.org/wiki/display/ANTLR3/Composite+Grammars
EDIT
As @Gunther rightfully mentioned in the comment beneath the OP, see the Q&A: Why my antlr lexer java class is "code too large"? where a small change (the removal of a certain predicate) caused this "code too large"-error to disappear.
Well, actually it is not always easy to make a composite grammar. In many cases this AntTask helps to fix this problem (it must be run every time after recompiling a grammar, but this process is not so boring).
Unfortunately, even this magic script doesn't help in some complex cases. Compiler can begin to complaining about too large blocks of DFA transitions (static String[] fields).
I found an easy way to solve it, by moving (using IDE refactoring features) such fields to another class with arbitrarily generated name. It always helps when moving just one or more fields in such way.