Is using floor plan tool during FPGA design ever actually useful or required?
Since nobody has answered, here are a few things you can do in the floorplanner (my experience is with Xilinx tools, but I expect the others are similar):
Verify visually that particular resources have been used, for example carry chains, block RAMs, clock management tiles, etc.
Verify that highly interconnected logical functions have been placed where they can communicate using local rather than regional or global routing resources.
Verify that logic sharing the same clock has been efficiently placed with respect to clock routing resources.
Visually edit the connections to the debug logic.
Many years ago, I manually placed certain logic very close to the associated I/O blocks to obtain the fastest possible I/O. I don't know if this would still be necessary or helpful.
- Why would one ever need to use these floor plan tools to lock design logic into specific regions? Is there any benefit to doing this? Is this ever really required?
There are certainly reasons why it is useful, but it really depends on the design.
For massively interconnected designs which don't have nice groupings (e.g. there are lots of processing cores which depend heavily on all the other cores, rather than each core operating independently), the synthesis tools can struggle to see the wood for the trees.
They try to bunch all of the logic as close together as possible for timing, but because the tools can't see how to group it into small sections, this can actually result in a worse Fmax, as bits of cores get scattered among other cores due to resource scarcity or routing congestion.
By using LogicLock regions or equivalent, you can help the tools to see blocks which should be grouped together, and this can improve the timing performance as the tools can more tightly pack parts within the LogicLock regions.
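As a rough illustration of the "or equivalent" case, the Xilinx counterpart of a LogicLock region is a pblock, which Vivado creates with the `create_pblock`, `add_cells_to_pblock` and `resize_pblock` commands. The Python sketch below only formats those commands as text for a constraints file. The pblock name, the instance name `u_core0` and the slice range are hypothetical placeholders, not taken from any real design.

```python
def pblock_constraints(name: str, cell: str, site_range: str) -> list[str]:
    """Return Vivado XDC/Tcl lines that confine one hierarchical cell to a pblock."""
    pblock = f"[get_pblocks {name}]"
    return [
        # Create an empty placement region.
        f"create_pblock {name}",
        # Assign the hierarchical instance to that region.
        f"add_cells_to_pblock {pblock} [get_cells {cell}]",
        # Give the region a physical extent on the device.
        f"resize_pblock {pblock} -add {{{site_range}}}",
    ]


# Hypothetical names and slice range, for illustration only.
lines = pblock_constraints("pblock_core0", "u_core0", "SLICE_X0Y0:SLICE_X39Y59")
print("\n".join(lines))
```

Generating one such group per core is one way to tell the tools which logic belongs together. The right size and position of each region still has to come from looking at the floorplan.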
If there are many clocks in a design, you can also LogicLock registers that belong to one clock into a specific region to try to reduce the number of global clocks required. The synthesis tools are quite good at this nowadays, so this is probably not needed.
Another reason is if you have logic which is being pulled strongly in two directions (e.g. memory PHY in one corner, processor in the other corner, interconnect fabric in between). If one part is running at, say, a higher frequency than the other, then ideally any clock crossing would sit closer to the high-speed portion to cope with its timing requirements. However, if the logic is being pulled strongly in two directions, it can be hard for the tools to optimise. There have been times where adding a LogicLock region for this sort of reason has taken designs I've worked on from failing timing to passing.
For more exotic use cases, such as Time to Digital conversion, you would use long carry chains to convert a pulse width into a multi-bit code. This technique typically requires precisely controlled and repeatable propagation delays, so constraining even to the exact register or LUT can be required.
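To make the carry-chain idea concrete, here is a minimal Python sketch of the decoding step, assuming the sampled taps form a thermometer code (1 for every tap the edge has already passed). The uniform `tap_delay_ps` value is a hypothetical placeholder. Real carry chains have non-uniform tap delays and normally need calibration, which is exactly why the placement has to be fixed and repeatable.

```python
def thermometer_to_count(taps):
    """Count the taps the edge has passed.

    Counting ones (rather than finding the first 0) tolerates "bubbles",
    i.e. isolated out-of-order bits near the edge position.
    """
    return sum(1 for bit in taps if bit)


def estimate_interval_ps(taps, tap_delay_ps):
    """Estimate the measured interval, assuming every tap has the same delay."""
    return thermometer_to_count(taps) * tap_delay_ps


# Hypothetical 8-tap samples: a clean code and one with a bubble.
clean = [1, 1, 1, 1, 1, 0, 0, 0]
bubbled = [1, 1, 1, 0, 1, 1, 0, 0]

TAP_DELAY_PS = 15.0  # assumed value, for illustration only

print(thermometer_to_count(clean))                   # 5
print(thermometer_to_count(bubbled))                 # 5
print(estimate_interval_ps(clean, TAP_DELAY_PS))     # 75.0
```

If the tools were free to move the chain between builds, the delay per tap would change and any calibration would be invalidated.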
- Also, if we have done this logic locking to specific regions, what if we want to add debug logic, e.g. a SignalTap II (Quartus) instance or an Identify instance (Libero), into that logic?
I can't speak for Libero, but for Quartus, unconstrained logic can still be placed within unused portions of the LogicLock region (unless you specifically disallow this). If you add debug logic like SignalTap, the tool will be free to place it wherever it wants (unless you constrain SignalTap to a region), including adding the tap logic within the LogicLocked region.
Finally, you might want to reserve a region of the FPGA for a specific future expansion, and so constrain the current design to a smaller portion of the FPGA so that you know you have the space you need later on.
- How does one decide what part of design should be locked into what part of the FPGA floor plan? For complex designs, it will certainly be very difficult to make a decision on this by human. This is why I don't understand the point of these tools.
Unless you have a reason to do so, it's usually best to leave it up to the synthesis tools and not overconstrain the design to begin with.
If you start running into issues with, say, timing analysis, then you could start to investigate whether there are lots of long timing paths that appear to be due to high-speed logic being widely distributed rather than packed tightly. The Chip Planner is quite useful here, as in Quartus at least you can get it to show timing paths.
The fix might be to add more pipelining, or to start constraining logic to certain regions. Adding regional constraints can also allow you to pick apart complex designs: you can, say, group the high-speed logic and then see how that affects other paths from perhaps lower-speed regions, which could then point towards good places to add pipelining.
The other answers already give several important points and I'll add another:
When you work in a safety-critical environment, you might want to spatially separate functions in order to harden them against single event upsets (SEUs), for example by triplicating the functionality and then majority voting the results. There are several ways of doing this, such as triplicating all the registers or triplicating whole blocks. All these methods have in common that, to protect effectively against SEUs, the redundant elements must be sufficiently separated, because SEUs are physically local phenomena (on the order of a few hundred microns in diameter on the die when a particle passes through it). You can either do this manually with constraints or use a standardized flow from your tool vendor. An example of this in the Xilinx world is described here: https://www.xilinx.com/support/documentation/application_notes/xapp1335-isolation-design-flow-mpsoc.pdf
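The majority voting mentioned above can be sketched in a few lines of Python. This is only a behavioural model of a bitwise two-out-of-three voter, not any vendor's flow, and the register values are made up for illustration.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 vote: each output bit follows at least two inputs."""
    return (a & b) | (a & c) | (b & c)


# Hypothetical register contents, for illustration only.
golden = 0b1011_0010
upset = golden ^ 0b0000_1000  # the same value with one bit flipped

# One replica hit: the voter masks the error.
print(majority_vote(golden, upset, golden) == golden)  # True

# Two replicas hit in the same bit: the voter passes the error on.
print(majority_vote(upset, upset, golden) == golden)   # False
```

The second case is the reason for the spatial separation: if two replicas sit close enough together for one particle strike to corrupt both, the vote no longer helps.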