Can an FPGA design be mostly (or completely) asynchronous?
The short answer is: yes. The longer answer is: it is not worth your time.
An FPGA itself can run a completely asynchronous design without a problem. The problem is the result you get, because timing through an FPGA is not very predictable. The bigger issue is that your timing, and therefore your resulting design, will almost certainly vary between different place-and-route runs. You can put constraints on individual asynchronous paths to make sure they do not take too long, but I'm not sure you can specify a minimum delay.
In the end it means that your design will be unpredictable and potentially change completely with even a slight design modification. Every time you changed anything at all, you would have to look through the entire timing report just to make sure it still works. If the design is synchronous, on the other hand, you just look for a pass or fail at the end of place and route (assuming your constraints are set up properly, which doesn't take long at all).
In practice, people aim for completely synchronous designs, but if you simply need to buffer or invert a signal, you don't need to go through a flip-flop as long as you constrain the path properly.
Hope this clears it up a bit.
"Can one build a complex bunch of logic and have stuff ripple through it as fast as it can?" Yes. Entire CPUs have been built that are completely asynchronous -- at least one of them was the fastest CPU in the world. http://en.wikipedia.org/wiki/Asynchronous_circuit#Asynchronous_CPU
It irks me that people reject asynchronous design techniques, even though they theoretically have several advantages over synchronous design techniques, merely because (as others here have said) asynchronous designs are not as well supported by the available tools.
To me, that's like recommending that all bridges be made out of wood, because more people have woodworking tools than steel-working tools.
Fortunately, some of the advantages of asynchronous design can be gained while still using mostly synchronous design techniques, by adopting a globally asynchronous, locally synchronous (GALS) design.
One factor not yet mentioned is metastability. If a latching circuit is hit with a sequence of input transitions such that the resulting state would depend upon propagation delays or other unpredictable factors, there is no guarantee that the resulting state will be a clean "high" or "low". Consider, for example, an edge-triggered flip-flop which is currently outputting a "low", and whose input changes from low to high at almost the same time as a clock edge arrives. If the clock edge happens long enough before the input change, the output will simply sit low until the next clock edge. If the clock edge happens long enough after the input change, the output will quickly switch once from low to high and stay there until the next clock edge. If neither of those conditions applies, the output can do anything. It might stay low, or quickly switch once and stay high, but it might also stay low for a while and then switch, or switch and then some time later switch back, or switch back and forth a few times, etc.
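The setup/hold behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not a real device characterization; the window widths and the `dff_sample` helper are made up for the example:

```python
SETUP = 0.5   # ns before the clock edge in which data must be stable (invented number)
HOLD  = 0.3   # ns after the clock edge in which data must be stable (invented number)

def dff_sample(q_prev, d_new, t_data_change, t_clock_edge):
    """Sample D at a clock edge. If the data transition lands inside the
    setup/hold window, the output is unresolved ('X')."""
    dt = t_data_change - t_clock_edge
    if dt <= -SETUP:          # data changed well before the edge: new value is captured
        return d_new
    if dt >= HOLD:            # data changed well after the edge: old value is kept
        return q_prev
    return 'X'                # timing violation: output may do anything for a while

print(dff_sample(0, 1, t_data_change=9.0, t_clock_edge=10.0))  # 1: clean capture
print(dff_sample(0, 1, t_data_change=9.9, t_clock_edge=10.0))  # 'X': inside the window
```

The point of the 'X' return value is that within the window there is no deterministic answer at all, which is exactly what makes the situation hard to reason about.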
If a design is fully synchronous, and all the inputs are double-synchronized, it is very unlikely that an input transition would hit the first flip-flop of a synchronizer in such a way as to cause it to switch at the perfect time to confuse the second flip-flop. In general, it is safe to regard such things as "just won't happen". In an asynchronous design, however, it is often much harder to reason about such things. If a timing constraint on a latching circuit (not just flip-flops, but any combination of logic that would act as a latch) is violated, there's no telling what the output will do until the next time a valid input condition forces the latch to a known state. It is entirely possible that delayed outputs will cause the timing constraints of downstream inputs to be violated, leading to unexpected situations, especially if one output is used to compute two or more inputs (some may be computed as though the latch were high, others as though it were low).
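To see why double-synchronizing makes the problem vanishingly unlikely, here is a rough Monte Carlo sketch in Python. The per-period resolution probability is an invented number for illustration, not a real device parameter; real failure rates come from MTBF formulas using measured device constants:

```python
import random

def synchronize(stages=2, p_resolve=0.999):
    """Pass an asynchronous transition through a chain of flip-flops.
    Each extra stage gives metastability one more clock period to resolve;
    p_resolve is the (assumed) chance it settles to a clean 0/1 per period."""
    value = 'X'  # worst case: the first flop samples right on the transition
    for _ in range(stages):
        if value == 'X' and random.random() < p_resolve:
            value = random.choice([0, 1])  # settles to an arbitrary clean level
    return value

random.seed(0)
trials = 100_000
failures = sum(synchronize() == 'X' for _ in range(trials))
print(failures / trials)  # roughly (1 - p_resolve)**stages: tiny with two stages
```

Each added stage multiplies the failure probability by another small factor, which is why two flip-flops are usually enough and three are used where the consequences are severe.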
The safest way to model an asynchronous circuit would be to have almost every output circuit produce an "X" output for a little while whenever it switches between "0" and "1". Unfortunately, this approach often results in nearly all nodes showing "X", even in cases which would in reality have almost certainly resulted in stable behavior. If a system can work when simulated as having all outputs become "X" immediately after an input changes, and remain "X" until the inputs are stable, that's a good sign the circuit will work, but getting asynchronous circuits to work under such constraints is often difficult.
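The pessimistic "X" modeling described above can be sketched with three-valued gates in Python. This is a simplified illustration (the helper names are made up), but it shows how 'X' spreads through logic even when the real circuit would settle:

```python
# Pessimistic three-valued gates: an 'X' input taints the output unless the
# other input already forces it (0 AND anything = 0, 1 OR anything = 1).
def and3(a, b):
    if a == 0 or b == 0:
        return 0
    return 1 if (a == 1 and b == 1) else 'X'

def or3(a, b):
    if a == 1 or b == 1:
        return 1
    return 0 if (a == 0 and b == 0) else 'X'

def not3(a):
    return 'X' if a == 'X' else 1 - a

# A mux built as (sel AND b) OR (NOT sel AND a). With sel = 'X', the model
# reports 'X' even when both data inputs agree -- the pessimism described
# above, since any real circuit would output the common value.
def mux(sel, a, b):
    return or3(and3(sel, b), and3(not3(sel), a))

print(mux('X', 1, 1))  # 'X', although a real circuit would output 1
print(mux(0, 1, 0))    # 1: select input is clean, so the output is clean
```

This is why "all outputs go X until the inputs are stable" is such a strict test: the model cannot see that the two paths through the mux agree.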