How can I implement a very simple asynchronous DRAM controller?
There are complete schematics for the IBM PC/XT in the IBM Personal Computer XT Technical Reference manual (Appendix D), which you may be able to find online.
The problem here is that, given a strobe line which is activated on a memory read or write, you want to generate RAS, CAS, and a control line (call it MUX) for the address multiplexer. For simplicity, I will unrealistically assume that the strobe, RAS, and CAS are all active-high (real DRAMs use active-low /RAS and /CAS).
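To make the role of MUX concrete, here is a minimal Python sketch of the address multiplexer it controls. It assumes a hypothetical 64K x 1 DRAM with 8 multiplexed address pins and uses the low address byte as the row; both choices are illustrative assumptions, and real machines split the address in different ways.

```python
# Minimal model of the DRAM address multiplexer controlled by MUX.
# Assumptions for illustration only: a 64K x 1 DRAM with 8 multiplexed
# address pins, active-high MUX (as in the text), low byte = row address.

def dram_address_pins(cpu_address, mux):
    """Return the 8-bit value presented on the DRAM address pins.

    mux=False: row address, latched when RAS is asserted
    mux=True:  column address, latched when CAS is asserted
    """
    if not 0 <= cpu_address <= 0xFFFF:
        raise ValueError("expected a 16-bit address")
    row = cpu_address & 0xFF            # low byte -> row (assumed split)
    column = (cpu_address >> 8) & 0xFF  # high byte -> column (assumed split)
    return column if mux else row


if __name__ == "__main__":
    addr = 0x12AB
    print(hex(dram_address_pins(addr, mux=False)))  # 0xab (row)
    print(hex(dram_address_pins(addr, mux=True)))   # 0x12 (column)
```

The access sequence is then: present the row with MUX inactive, assert RAS, switch MUX so the column appears on the pins, and finally assert CAS. Generating those three edges in the right order with the right spacing is exactly what the strategies below do.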
Looking at the PC/XT schematic and schematics from some other computers around this time, I see three basic strategies, which are roughly the following:
Use the strobe for RAS. Use a delay line (a part whose output is a time-delayed version of its input) on RAS to generate MUX, and use another delay line to generate a still later version of RAS, which is used for CAS. This strategy is used by the PC/XT and the TRS-80 Model II. An example (modern) delay-line part is the Maxim DS1100.
Use the strobe for RAS and delay it for MUX and CAS, but do this with a high-speed shift register instead of a delay line. This strategy is used by the TRS-80 Model I and the Apple II.
Use custom ICs. This is the strategy of the Commodore 64.
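The first two strategies both boil down to producing delayed copies of the strobe. The Python sketch below models the shift-register version one fast-clock tick at a time. The tap positions MUX_TAP and CAS_TAP are hypothetical; in a real design they would be chosen from the DRAM datasheet's timing requirements (row address hold time, RAS-to-CAS delay). A delay line behaves the same way, except that the delays are fixed analog times rather than multiples of a clock period.

```python
# Tick-by-tick model of the shift-register approach: the strobe is clocked
# into a fast shift register and later taps give delayed copies of it.
# MUX_TAP and CAS_TAP are hypothetical values chosen for illustration.

WIDTH = 8    # e.g. an 8-bit shift register
MUX_TAP = 1  # MUX switches 1 fast-clock tick after RAS (assumed)
CAS_TAP = 2  # CAS asserts 2 fast-clock ticks after RAS (assumed)

def ras_mux_cas(strobe):
    """Return (ras, mux, cas) lists, one level per fast-clock tick."""
    shift = [False] * WIDTH  # shift[k] = strobe sample from k ticks ago
    ras, mux, cas = [], [], []
    for sample in strobe:
        shift = [sample] + shift[:-1]  # shift one place per clock
        ras.append(shift[0])           # RAS follows the strobe directly
        mux.append(shift[MUX_TAP])     # strobe delayed by MUX_TAP ticks
        cas.append(shift[CAS_TAP])     # strobe delayed by CAS_TAP ticks
    return ras, mux, cas

def show(name, wave):
    """Print a waveform as an ASCII trace: '#' = asserted, '_' = idle."""
    print(f"{name:4}" + "".join("#" if level else "_" for level in wave))

if __name__ == "__main__":
    strobe = [False] * 2 + [True] * 6 + [False] * 4
    ras, mux, cas = ras_mux_cas(strobe)
    show("RAS", ras)
    show("MUX", mux)
    show("CAS", cas)
    # Expected output:
    # RAS __######____
    # MUX ___######___
    # CAS ____######__
```

The tap spacing sets the resolution: with a 16 MHz shift clock each tap is 62.5 ns, so these hypothetical taps would place MUX about 62.5 ns and CAS about 125 ns after RAS. As with a pure delay line, MUX and CAS also stay asserted slightly after the strobe ends.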
Your question is complicated enough that I'm not even sure what your actual problem is, but I'll try!
The "cleanest" 6502-based DRAM design I could find is from the Commodore PET 2001-N. It runs its 6502 at 1 MHz, but the DRAM logic is clocked at 16 MHz, presumably so that all the RAS/CAS timings can be derived from the faster clock.
I have not analyzed the details, but the main action seems to happen in a 74191 4-bit counter connected to a 74164 shift register. The shift register's 8 outputs go into a 74157 multiplexer, which is controlled by the R/W line. The multiplexer's outputs go into a 7474 flip-flop and some discrete logic that generate the final RAS/CAS signals. Here is an excerpt which links to the relevant page in the reference schematic.
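As a rough illustration of how a fast counter, a shift register and a 2-to-1 multiplexer can combine, here is a Python model. It is not a reconstruction of the PET schematic: how the counter feeds the shift register and which outputs go to which multiplexer inputs are hypothetical. The one piece of arithmetic that does follow from the numbers above is that a 4-bit counter clocked at 16 MHz wraps every 16 ticks, i.e. once per microsecond, which matches the 1 MHz CPU cycle. The select input follows the 6502 convention that R/W is high for a read.

```python
# Hypothetical model of a counter + shift register + 2-to-1 mux timing
# generator. The wiring below is invented for illustration, not taken
# from the PET schematic.

FAST_CLOCK_HZ = 16_000_000  # DRAM timing clock mentioned in the text
COUNTER_BITS = 4            # a 74191-style 4-bit counter

def counter_period_seconds():
    """Time for the 4-bit counter to wrap: 16 ticks at 16 MHz = 1 us."""
    return (2 ** COUNTER_BITS) / FAST_CLOCK_HZ

def shift_164(q, serial_in):
    """One clock of a 74164-style 8-bit serial-in, parallel-out register."""
    return [serial_in] + q[:7]

def mux_157(a_inputs, b_inputs, select):
    """74157-style quad 2-to-1 mux: select low -> A inputs, high -> B inputs."""
    return list(b_inputs) if select else list(a_inputs)

def simulate_cycle(r_w):
    """Return the four mux outputs for each of the 16 ticks in one CPU cycle.

    r_w follows the 6502 convention: True (high) = read, False (low) = write.
    """
    q = [False] * 8
    trace = []
    for count in range(2 ** COUNTER_BITS):
        # Hypothetical: shift a single 1 in when the counter is at zero.
        q = shift_164(q, serial_in=(count == 0))
        write_taps = q[1:5]  # hypothetical wiring to the A inputs
        read_taps = q[0:4]   # hypothetical wiring to the B inputs
        trace.append(mux_157(write_taps, read_taps, select=r_w))
    return trace

if __name__ == "__main__":
    print(counter_period_seconds())
    for tick, outs in enumerate(simulate_cycle(r_w=True)[:5]):
        print(tick, [int(x) for x in outs])
```

In this made-up wiring, write cycles see every timing edge one fast-clock tick (62.5 ns) later than read cycles; the point is only that a single select line can swap between two sets of shift-register taps.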
Refresh is handled with a separate counter, and each address line is hooked up to a multiplexer that selects either the "real" address or the refresh address.
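The refresh multiplexing can be sketched the same way. This assumes RAS-only refresh (during a refresh cycle only RAS is pulsed, with the row address taken from the refresh counter) and a hypothetical 8-bit row address, so 256 refresh cycles cover every row. The PET's actual counter width and refresh scheme may differ.

```python
# Sketch of refresh address multiplexing, assuming RAS-only refresh and a
# hypothetical 8-bit row address. Not taken from any specific schematic.

ROW_BITS = 8  # hypothetical: 256 rows must be refreshed

class RefreshAddressMux:
    """Selects between the CPU's row address and a separate refresh counter."""

    def __init__(self):
        self.refresh_row = 0  # the separate refresh counter

    def row_address(self, cpu_address, refresh):
        """Row address presented to the DRAM while RAS is asserted.

        refresh=True selects the refresh counter and advances it, so
        successive refresh cycles sweep every row; otherwise the CPU's
        row bits pass straight through and the counter is untouched.
        """
        mask = (1 << ROW_BITS) - 1
        if refresh:
            row = self.refresh_row
            self.refresh_row = (self.refresh_row + 1) & mask
            return row
        return cpu_address & mask

if __name__ == "__main__":
    m = RefreshAddressMux()
    print([m.row_address(0, refresh=True) for _ in range(3)])  # [0, 1, 2]
    print(hex(m.row_address(0x1234, refresh=False)))           # 0x34
```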
Parts of this logic also seem to generate timings for the video subsystem. I'm sure it can be simplified for your particular needs, but I think something similar could be useful: a high-frequency counter, a shift register, and multiplexers.