I/O timing constraints for FPGA/ASIC #6: Sink-synchronous output


6 actionable steps to get fool-proof and reliable constraints. This article aims to be as hands-on and practical as possible while also discussing the principles and theory behind reliable constraints. A full Vivado example build is included, where you can see the steps in action.

Introduction: Why are constraints needed?

An output interface sends data to a peripheral device, where it will at some point be sampled. Constraints make sure that the data is stable whenever the device samples it. If the data changes inside the receiver's setup/hold window, the sampling flip-flop can go metastable or capture the wrong value. Your build tool's timing engine performs this analysis for every path between flip-flops inside the device. For external interfaces, however, the tool does not know how the receiver behaves, so you have to provide this information.

If you have proper constraints in place and your routed timing passes, the tool guarantees that your interface will work the same on every device unit, under all operating conditions covered by the timing analysis, and in all future builds.

The engineers of yesteryear commonly skipped constraints, which I would not recommend, since it leaves you with no guarantees whatsoever. Expect intermittent errors and build/device/temperature-dependent behavior if you go down this foolish route.

Step 0: Make sure you are reading the correct article

This article is about sink-synchronous output interfaces, that is, situations where the peripheral device sends out a clock and the FPGA/ASIC sends data to the peripheral that is synchronous to this clock (see banner picture above). Keep reading if your setup looks like this. Otherwise, I would recommend one of these articles:

Step 1: Understand the SDC notation

The SDC command set_output_delay expresses setup and hold requirements using a max and min value. The values can be illustrated like this:

It is important to note that the command specifies the boundaries of the valid data window. The max value is the time before the clock edge at which data is guaranteed to have assumed a valid value. The min value is the time before the clock edge at which data might assume an invalid value.

Arguments to the command are expressed in nanoseconds (ns). Values can be negative depending on the situation. It is worth noting that the meaning of max and min is NOT the same for an output constraint as for an input constraint.
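As a sanity check on the notation, the valid window can be computed back from a pair of SDC values. The sketch below is plain Tcl that runs in tclsh or in the Vivado Tcl console; the proc name `sdc_to_valid_window` and the example values are hypothetical, and all times are in ns relative to the sampling clock edge.

```tcl
# Convert SDC output delay values (ns) to the valid data window that the
# peripheral requires, relative to its sampling clock edge.
# Negative times are before the edge, positive times are after it.
# sdc_to_valid_window is a hypothetical helper, not a tool command.
proc sdc_to_valid_window {max min} {
    set win_start [expr {-$max}]       ;# data must be valid from here...
    set win_end [expr {-$min}]         ;# ...until here
    set win_width [expr {$max - $min}] ;# width of the valid window
    return [list $win_start $win_end $win_width]
}

# Example: max = 2.0 ns, min = -1.0 ns
lassign [sdc_to_valid_window 2.0 -1.0] win_start win_end win_width
puts [format "Valid window: %.2f ns to %.2f ns (width %.2f ns)" \
    $win_start $win_end $win_width]
```

A positive max and a negative min describe a window that straddles the clock edge; the width max − min is what the FPGA/ASIC must be able to deliver.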

Margin and pessimism

In upcoming steps we will often encounter uncertainties or ranges in our values, and in order to deal with that, we have to establish the following principle:

Margin/pessimism for an output constraint means making the valid window larger. I.e. adding to max and subtracting from min.
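The principle can be written as a tiny Tcl helper. This is a sketch: the proc name `widen_valid_window` and the 0.25 ns margin are hypothetical, and values are in ns.

```tcl
# Add pessimism to an SDC output constraint by widening the valid window:
# max grows and min shrinks by the same margin (all values in ns).
# widen_valid_window is a hypothetical helper, not a tool command.
proc widen_valid_window {max min margin} {
    if {$margin < 0} {
        error "margin must be non-negative, got $margin"
    }
    return [list [expr {$max + $margin}] [expr {$min - $margin}]]
}

# Example: widen a 2.0 / -1.0 ns requirement by a hypothetical 0.25 ns
lassign [widen_valid_window 2.0 -1.0 0.25] max min
puts [format "max = %.2f ns, min = %.2f ns" $max $min]
```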

Step 2: Express peripheral timing requirement using SDC

Here lies the first challenge: there are quite a few ways to express digital timing, and different datasheets use different representations. You need to consult the datasheet of your peripheral device and identify how it is expressed.

If it's expressed like the SDC max/min values, then you are in luck and can move on to the next step. If not, then one of the guides below will show you how to translate it to SDC notation. Note that the very general names "Ta" and "Tb" are used since there is no consensus on what these entities should be called. Expect your datasheet to use different names.

Formulation #1: Valid window around clock

Sometimes the datasheet specifies timing using the following:

  1. Ta = time before clock edge that data must be valid.

  2. Tb = time after clock edge that data is allowed to go invalid.

This resolves almost trivially to

  1. max = Ta

  2. − min = Tb ⇔ min = − Tb

Note that min is negative since the point where data is allowed to go invalid is after the clock edge. If your datasheet specifies ranges for one or both of these, you should use the most pessimistic values. In this case, that means the maximum for both values.
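A minimal Tcl sketch of Formulation #1, assuming each datasheet parameter is given as a list of values (for example min/typ/max) so that the most pessimistic one can be picked automatically. The proc name and the example numbers are hypothetical.

```tcl
# Formulation #1: Ta = time before the edge that data must be valid,
# Tb = time after the edge that data is allowed to go invalid.
# Each argument is a list of datasheet values in ns.
# formulation1_to_sdc is a hypothetical helper, not a tool command.
proc formulation1_to_sdc {ta_values tb_values} {
    # Pessimism: the largest Ta and the largest Tb give the widest valid window.
    set ta [tcl::mathfunc::max {*}$ta_values]
    set tb [tcl::mathfunc::max {*}$tb_values]
    return [list $ta [expr {-$tb}]]
}

# Hypothetical datasheet values: Ta = 1.5..2.0 ns, Tb = 0.8..1.0 ns
lassign [formulation1_to_sdc {1.5 2.0} {0.8 1.0}] max min
puts "max = $max ns, min = $min ns"
```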

Formulation #2: Invalid window around clock

Other times, the datasheet might specify timing using

  1. Ta = time before clock edge that data is allowed to go invalid.

  2. Tb = time after clock edge that data must be valid.

Let's draw some more lines that will help us derive the SDC formulation:

Note that the distances at the bottom are negative max/min, since both the start and end of the valid window are after the clock edge. From this, we can deduce that

  1. − max = Tb ⇔ max = − Tb

  2. − min = Tperiod − Ta ⇔ min = Ta − Tperiod

If your datasheet specifies ranges for one or more of these, you should use the most pessimistic values. In this case, that means minimum Ta, minimum Tb, and maximum Tperiod.
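The same derivation as a Tcl sketch, again assuming each parameter is passed as a list of datasheet values. The proc name and numbers are hypothetical; the formulas are the ones derived above.

```tcl
# Formulation #2: invalid window around the clock edge.
#   Ta = time before the edge that data is allowed to go invalid
#   Tb = time after the edge that data must be valid
# The valid window runs from Tb to (Tperiod - Ta) after the edge, so
#   max = -Tb and min = Ta - Tperiod (all values in ns).
# formulation2_to_sdc is a hypothetical helper, not a tool command.
proc formulation2_to_sdc {ta_values tb_values tperiod_values} {
    # Pessimism: minimum Ta, minimum Tb and maximum Tperiod.
    set ta [tcl::mathfunc::min {*}$ta_values]
    set tb [tcl::mathfunc::min {*}$tb_values]
    set tperiod [tcl::mathfunc::max {*}$tperiod_values]
    return [list [expr {-$tb}] [expr {$ta - $tperiod}]]
}

# Hypothetical values: Ta = 1.0..1.2 ns, Tb = 2.0..2.5 ns, Tperiod = 10.0 ns
lassign [formulation2_to_sdc {1.0 1.2} {2.0 2.5} {10.0}] max min
puts "max = $max ns, min = $min ns"
```

Both results are negative here, which matches the observation that the whole valid window lies after the clock edge.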

Formulation #3: Invalid window after clock

Another possible datasheet timing formulation might be

  1. Ta = time after clock edge that data is allowed to go invalid.

  2. Tb = time after clock edge that data must be valid.

With some helper lines we will again easily find our SDC values:

Note that the distance in the bottom-right is negative min, since the end of the valid data window is after the clock edge. From this, we can deduce that

  1. max = Tperiod − Tb

  2. − min = Ta ⇔ min = − Ta

If your datasheet specifies ranges for one or more of these, you should use the most pessimistic values. In this case, that means maximum Ta, minimum Tb, and maximum Tperiod.
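A corresponding Tcl sketch for Formulation #3, under the same assumption that parameters are passed as lists of datasheet values. The proc name and numbers are hypothetical.

```tcl
# Formulation #3: invalid window after the clock edge.
#   Ta = time after the edge that data is allowed to go invalid
#   Tb = time after the edge that data must be valid
# Relative to the sampling edge: max = Tperiod - Tb and min = -Ta (ns).
# formulation3_to_sdc is a hypothetical helper, not a tool command.
proc formulation3_to_sdc {ta_values tb_values tperiod_values} {
    # Pessimism: maximum Ta, minimum Tb and maximum Tperiod.
    set ta [tcl::mathfunc::max {*}$ta_values]
    set tb [tcl::mathfunc::min {*}$tb_values]
    set tperiod [tcl::mathfunc::max {*}$tperiod_values]
    return [list [expr {$tperiod - $tb}] [expr {-$ta}]]
}

# Hypothetical values: Ta = 0.5..0.7 ns, Tb = 3.0..3.5 ns, Tperiod = 10.0 ns
lassign [formulation3_to_sdc {0.5 0.7} {3.0 3.5} {10.0}] max min
puts "max = $max ns, min = $min ns"
```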

Step 3: Compensate for trace delays

For discussion's sake, we can imagine three trace delay situations. First a reference situation, then with the data path elongated, and then with the clock path elongated.

It is clear that trace delays shift the valid data window, so we have to compensate for that in the FPGA/ASIC to place the valid window where the peripheral device wants it. Sketches like the one above show that longer data and clock traces both mean that the data window at the FPGA/ASIC output pin must be shifted earlier in time: a longer clock trace makes the FPGA/ASIC see each clock edge later than the peripheral does, and a longer data trace makes the data arrive later at the peripheral.

In other words, trace delays impact the SDC max/min values like this:

  1. max += Ttrace_clock_max + Ttrace_data_max

  2. min += Ttrace_clock_min + Ttrace_data_min

Note that we use the maximum delays for max and the minimum delays for min. This applies the pessimism that makes the valid window larger.
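The trace compensation can be sketched in Tcl as below. The proc name, the {min max} pair convention, and the delay numbers are hypothetical; the arithmetic is exactly the two formulas above.

```tcl
# Shift SDC output delay values (ns) to compensate for PCB trace delays.
# Both the clock trace (peripheral -> FPGA) and the data trace
# (FPGA -> peripheral) force the FPGA to drive data earlier, so both add.
# Pessimism: maximum delays go into max, minimum delays into min.
# add_trace_delays is a hypothetical helper, not a tool command.
proc add_trace_delays {max min clock_trace data_trace} {
    lassign $clock_trace clock_min clock_max
    lassign $data_trace data_min data_max
    set new_max [expr {$max + $clock_max + $data_max}]
    set new_min [expr {$min + $clock_min + $data_min}]
    return [list $new_max $new_min]
}

# Hypothetical values: max/min from the datasheet step, traces as {min max} pairs
lassign [add_trace_delays 2.0 -1.0 {0.45 0.55} {0.27 0.33}] max min
puts [format "max = %.2f ns, min = %.2f ns" $max $min]
```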

Finding trace delays with the PCB CAD tool

Trace delays depend on the physical properties of the trace and the dielectric around it. If you have access to the CAD files for your board, the PCB tool should give you signal delay values.

I find it suitable to add a ±10% margin to form the minimum/maximum values. This is to compensate for tool uncertainties, material/production variations, temperature gradients, etc.
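A small Tcl sketch of the ±10% rule, assuming the PCB tool reports one nominal delay per trace. The proc name and the nominal values are hypothetical.

```tcl
# Turn a nominal trace delay reported by the PCB tool into a {min max} pair
# by applying a relative margin (10 % by default). Values in ns.
# delay_range is a hypothetical helper, not a tool command.
proc delay_range {nominal {margin 0.10}} {
    set delay_min [expr {$nominal * (1.0 - $margin)}]
    set delay_max [expr {$nominal * (1.0 + $margin)}]
    return [list $delay_min $delay_max]
}

# Hypothetical nominal delays from the PCB CAD tool
lassign [delay_range 0.50] clock_min clock_max
lassign [delay_range 0.30] data_min data_max
puts [format "clock trace: %.3f..%.3f ns, data trace: %.3f..%.3f ns" \
    $clock_min $clock_max $data_min $data_max]
```

Keeping the margin as an optional argument makes it easy to use a larger value for traces whose delay figures you trust less.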

Finding trace delays the hard way

In some unfortunate scenarios, we might not have access to the delay figures. This is typically the case when using an evaluation board from a manufacturer who does not understand that constraints are necessary. If there are no other options, we can visually estimate the trace lengths, which, along with some assumptions about signal propagation speed, gives the delay. Since we don't know the materials, I would personally assume that the signal propagation speed is somewhere between 30% and 100% of the speed of light.

  1. trace_delay_min [s] = trace_length_min [m] / (1 * c [m/s])

  2. trace_delay_max [s] = trace_length_max [m] / (0.3 * c [m/s])

See the example code of article #2 for an example of this. This approach should be a last resort, and I don't recommend it in general. It is unlikely to work for any kind of high-speed bus. If you do use it, remember to be pessimistic in the trace length range.
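The two estimation formulas can be written as a Tcl sketch like the one below. It is not the article #2 example code; the proc name and the 5 to 10 cm length range are hypothetical, and the 30%/100% propagation-speed bounds are the assumption stated above.

```tcl
# Last-resort estimate of a trace delay from a visually estimated length range,
# assuming the propagation speed is between 30 % and 100 % of the speed of light.
# Lengths in meters, result in ns.
# estimate_trace_delay is a hypothetical helper, not a tool command.
proc estimate_trace_delay {length_min length_max} {
    set c 299792458.0 ;# speed of light in m/s
    set delay_min [expr {$length_min / (1.0 * $c) * 1e9}]
    set delay_max [expr {$length_max / (0.3 * $c) * 1e9}]
    return [list $delay_min $delay_max]
}

# Hypothetical board: trace estimated to be between 5 cm and 10 cm long
lassign [estimate_trace_delay 0.05 0.10] dmin dmax
puts [format "trace delay: %.3f ns to %.3f ns" $dmin $dmax]
```

The resulting range is very wide, which illustrates why this approach rarely works for high-speed buses.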

Step 4: Compensate for driver chips, cables, etc.

If there are logic gates, driver chips, or anything else in the clock/data paths, they need to be taken into account as well. One cannot assume that the latency through parallel drivers is the same. Most driver datasheets specify a range for the latency. Use this range, and the same reasoning as for trace delays above, to adapt max/min. The same goes for cables and other elements: adapt max/min while being as pessimistic as possible.

Devices/cables/etc on the data or clock path impact the SDC max/min like this:

  1. max += Tdelay_max

  2. min += Tdelay_min
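Since every element on the clock or data path contributes in the same way, the adjustments can be accumulated in one loop. A Tcl sketch, assuming each element is described by a hypothetical {min max} pair taken from its datasheet; the proc name and the delay numbers are made up for illustration.

```tcl
# Accumulate any number of delay elements (traces, driver chips, cables...)
# into the SDC max/min values. Each element is a {min max} pair in ns.
# Clock-path and data-path elements are treated alike, as in the formulas above.
# add_path_delays is a hypothetical helper, not a tool command.
proc add_path_delays {max min elements} {
    foreach element $elements {
        lassign $element delay_min delay_max
        if {$delay_min > $delay_max} {
            error "element {$element} has min > max"
        }
        set max [expr {$max + $delay_max}]
        set min [expr {$min + $delay_min}]
    }
    return [list $max $min]
}

# Hypothetical path: clock trace, clock buffer chip, data trace, data buffer chip
set elements {
    {0.45 0.55}
    {1.00 3.50}
    {0.27 0.33}
    {1.00 3.50}
}
lassign [add_path_delays 2.0 -1.0 $elements] max min
puts [format "max = %.2f ns, min = %.2f ns" $max $min]
```

Note how the wide latency range of the driver chips grows the required valid window far more than the traces do.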

Step 5: Putting it all together

Since TCL is a full-fledged scripting language, we can formulate our calculations and constraints in a very structured and formal way. Using variables, loops, printouts, etc., we can make a script that is quite readable and avoids magic numbers.

An example TCL constraint script for a sink-synchronous output interface is available here on GitHub. It showcases the following:

  1. Converting peripheral device constraints to SDC notation.

  2. Looping over multiple data bits.

  3. Adjusting max/min for pessimistic trace delays.

  4. Adjusting max/min for parallel driver chips.

  5. Phase-shifting the launch clock using an MMCM.

All while keeping track of units and providing some useful printouts. The script is written for Vivado, but most of the code, and certainly the ideas behind it, should carry over to other tools. Since this is a TCL file rather than a plain XDC file, it needs a flag when it is loaded:

read_xdc -unmanaged "<path.tcl>"

This is done automatically when using tsfpga. Some of the commands in the script might not work during synthesis, so enabling the constraint for implementation only could be a good idea (see here).
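A condensed sketch of such a script is shown below. It is not the GitHub script referenced above: the port names, clock period, and delay values are hypothetical placeholders, and the datasheet conversion is reduced to fixed numbers for brevity. When the Vivado commands are missing (for example in a plain tclsh), small stub procs record the constraints instead, so the calculation can be dry-run outside the tool.

```tcl
# Dry-run stubs so the sketch can also run in a plain tclsh. Inside Vivado the
# real commands exist and these stubs are skipped.
if {[llength [info commands set_output_delay]] == 0} {
    set ::constraint_log {}
    proc get_ports {pattern} { return $pattern }
    proc get_clocks {pattern} { return $pattern }
    proc create_clock {args} { lappend ::constraint_log [list create_clock {*}$args] }
    proc set_output_delay {args} { lappend ::constraint_log [list set_output_delay {*}$args] }
}

# Hypothetical interface: port names, period and delays are placeholders (ns).
set clock_port "periph_clk"
set data_port_base "periph_data"
set num_data_bits 4
set clock_period 10.0

# Peripheral requirement in SDC notation (e.g. Formulation #1: Ta = 2.0, Tb = 1.0)
set max 2.0
set min -1.0

# Pessimistic trace delays, as min/max values
lassign {0.45 0.55} clock_trace_min clock_trace_max
lassign {0.27 0.33} data_trace_min data_trace_max

set max [expr {$max + $clock_trace_max + $data_trace_max}]
set min [expr {$min + $clock_trace_min + $data_trace_min}]
puts [format "Output delay: max = %.3f ns, min = %.3f ns" $max $min]

# The clock is defined on the input port where it enters the FPGA.
create_clock -name periph_clk -period $clock_period [get_ports $clock_port]

for {set bit 0} {$bit < $num_data_bits} {incr bit} {
    # Escape the brackets, otherwise Tcl would treat [$bit] as a command.
    set port [get_ports "${data_port_base}\[$bit\]"]
    set_output_delay -clock [get_clocks periph_clk] -max $max $port
    set_output_delay -clock [get_clocks periph_clk] -min $min $port
}
```

Looping per bit, rather than constraining a wildcard such as periph_data[*], leaves room for per-bit trace delays taken from the PCB tool.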

Step 6: Verification

The question of design verification is as relevant for I/O as it is for internal logic. If you can run a write-read loop on your device, or it has some test pattern, then that is great. This enables bit-exact testing, which is of course the gold standard.

However, I would argue that a test like this is an indication but not proof that the interface is correctly constrained. It does not prove that the interface will work on all devices, under all conditions, or even in the next build. It could very well be that your timing is on the edge and will fail if the device gets hot.

This is the same situation that I've been nagging about in my article series about clock domain crossings.

A passing on-device test definitely increases the confidence in our constraints, but I think we have to do some offline activities also. For example, we can increase confidence further by:

  1. Studying the topic (reading articles such as this).

  2. Observing the actual phase differences using an oscilloscope, as far as possible.

  3. Experimenting with unreasonable constraint values and seeing the timing fail.

  4. Code review from peers.

Please let me know in the comments if you have other tools, processes, or ideas. Other than that, this was the last step and I will close out by giving some useful tips below.

Tip 1: Use a clock-capable input pin

Using a clock-capable FPGA pin for the clock input is almost a must. These pins have dedicated routing to the global clock network and direct access to conveniently placed MMCM/PLL blocks. If using AMD/Xilinx, these are called "GC" or "CC" pins.

Before deciding your pinning and manufacturing your PCB, you must consult the datasheet of your FPGA and choose appropriate pins for all clock inputs. If you use a regular data pin for your clock, the resulting FPGA-internal clock that you launch data with is going to have very bad jitter properties. This means less margin in the output data window and a higher likelihood of timing failure.

Tip 2: Write constraints before you order PCBs

Theoretical discussions about device capabilities are almost always based on datasheets, at least in the early stage of a system design. But I would argue that the ground truth to many questions comes not from the datasheet but from the build tool: Can the chosen pins be used the way we want to? Will the timing window be large enough? Is the chip fast enough? Can we generate the clocks that we need? And so on.

There are a lot of caveats, corner cases, and gotchas when it comes to these questions. I believe it should be part of the PCB design review process to set up an FPGA build that utilizes the chosen pins and has at least a rough version of the constraints in place. If this build passes, you can be very confident in the choice of FPGA pinning/architecture.

Tip 3: Use flip-flop in I/O buffer

Using the flip-flop in the FPGA I/O buffer (IOB) is very beneficial when sending out data, since it gives a short clock-to-output delay that does not vary from build to build. The example code of this article shows how to do this. It is only possible, however, if

  1. there is no logic between the flip-flop and the port, and

  2. the flip-flop value goes to the port and nowhere else.

You must construct the logic of your transmitter to fulfill these conditions if you want to use the IOB flip-flop. Because of these restrictions, it is possible to set an IOB constraint and then have the placer ignore it. If using Vivado, I would recommend raising the "Place 30-722" message to severity "ERROR" and making sure your build system fails if any "ERROR"-level messages occur. When using tsfpga, this is done automatically.
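In Vivado, the two measures could look roughly like the sketch below. The port name periph_data[*] is hypothetical, and the set_msg_config line would typically go in the Tcl build script rather than in the constraint file. Stub procs are defined only when the Vivado commands are missing, so the snippet can be dry-run in a plain tclsh.

```tcl
# Dry-run stubs for a plain tclsh; inside Vivado the real commands are used.
if {[llength [info commands set_property]] == 0} {
    proc get_ports {pattern} { return $pattern }
    proc set_property {args} { puts "set_property $args" }
    proc set_msg_config {args} { puts "set_msg_config $args" }
}

# Ask the placer to pack the output flip-flops into the I/O buffers.
# "periph_data[*]" is a hypothetical port name.
set_property IOB TRUE [get_ports {periph_data[*]}]

# Raise the "IOB constraint could not be honored" message to an error, so that
# the build stops instead of silently placing the flip-flop in the fabric.
set_msg_config -id {Place 30-722} -new_severity {ERROR}
```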

Tip 4: Phase shift the launch clock

Unless your required window of valid data is neatly placed after the clock edge with sufficient margin, it is quite likely that your timing will fail. It is quite common that you have to create phase-shifted clock variants that place the edge outside the valid data window. I won't go into the details of this since it is beyond the scope of the article, but you can see an example in the example code for this article.

I do want to comment on how this impacts the constraint, however: it doesn't. Your constraint on the data port shall still be applied relative to the clock port, that is, the original incoming clock, not the phase-shifted internal launch clock.

Tip 5: Debug

One can always hope, but it's quite likely that your timing will not pass on the first try. It's hard to say something general about debugging I/O constraints since there are so many unique situations. But I would recommend the commands

  1. report_timing -setup -to [get_ports "my_data_port"]

  2. report_timing -hold -to [get_ports "my_data_port"]

This will show you exactly the paths your clock and data take, how they are delayed in each step, and where the requirements are violated. You can also use the "Path report" tool in Vivado, which presents the same information but a little more visually. This information, along with printouts from the constraint script, should indicate where the issue is.

Note that the "-setup" report analyzes whether your SDC max requirement is satisfied, while "-hold" analyzes SDC min. I would recommend investing some time to understand these reports; they give a lot of information that might be hard to digest, but they are very useful.
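To keep both reports at hand for every build, the two commands can be wrapped in a loop that writes them to files. A sketch, assuming hypothetical port and file names; `-max_paths` and `-file` are standard report_timing options in Vivado. Stub procs make the loop dry-runnable in a plain tclsh.

```tcl
# Dry-run stubs for a plain tclsh; inside Vivado the real commands are used.
if {[llength [info commands report_timing]] == 0} {
    set ::report_log {}
    proc get_ports {pattern} { return $pattern }
    proc report_timing {args} {
        lappend ::report_log $args
        puts "report_timing $args"
    }
}

# One setup report (checks SDC max) and one hold report (checks SDC min)
# for the output data ports. Port and file names are hypothetical.
set data_ports [get_ports {periph_data[*]}]
foreach check {setup hold} {
    report_timing -$check -to $data_ports -max_paths 10 -file "output_timing_$check.rpt"
}
```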

Summary

These articles are the culmination of ten years of frustration whenever I had to write I/O constraints. If you're anything like me, you've probably felt the same frustration and wished for better tutorials. I did my best to present things in a systematic and clear way here. Hopefully I managed to strike a decent balance between theoretical and practical, easy-to-digest and exhaustive.

I hope that you learned something and that it will be useful for you. If nothing else, I'm happy to have these articles as a reference for myself whenever I have to write constraints in the future.

If you enjoyed this article, you will probably enjoy my article series about clock domain crossings. Please connect or follow me here on LinkedIn so you don't miss future FPGA articles that I publish.
