I/O timing constraints for FPGA/ASIC #3: Sink-synchronous input

6 actionable steps to get fool-proof and reliable constraints. This article aims to be as hands-on and practical as possible while also discussing the principles and theory behind reliable constraints. A full Vivado example build is included, where you can see the steps in action.

Introduction: Why are constraints needed?

Your build tool timing engine needs to know where every single flip-flop in your design can sample its data input, relative to its clock input, in order to avoid metastable values. For device-internal flip-flops, this is calculated and handled automatically. But for external interfaces, the behavior of the data line is not known by the tool, so you have to provide this information through constraints.

If you have proper constraints in place and your routed timing passes, the tool guarantees that your interface will work exactly the same on all individuals/nodes, under all conditions, and in all upcoming builds as well.

The engineers of yesteryear commonly skipped constraints, which I would not recommend, since it leaves you with no guarantees whatsoever. Expect intermittent errors and build/device/temperature-dependent behavior if you go down this foolish route.

Step 0: Make sure you are reading the correct article

This article is about sink-synchronous input interfaces, meaning situations where input data from a peripheral device is synchronous to a clock that is sent out by the FPGA/ASIC (see banner picture above). Keep reading if your setup looks like this. Otherwise, I would recommend one of the other articles in this series.

Step 1: Understand the SDC notation

The SDC command set_input_delay expresses setup and hold requirements using a min and max value. The values can be illustrated like this:

[Figure: timing diagram of the SDC min and max values relative to the clock edge]

It is important to note that the command specifies the boundaries of the invalid data window. The min value is the earliest time after the clock edge at which data might become invalid. The max value is the latest time after the clock edge at which data is guaranteed to have settled to its new valid value.

Arguments to the command are expressed in nanoseconds (ns). Values can be negative depending on the situation. It is worth noting that the meaning of min and max is NOT the same for an input constraint as for an output constraint.
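As a minimal sketch of what this looks like in a constraint file, the two values are set with separate calls. The clock name "clock_out", the port name "data_in", and the numbers are placeholders made up for illustration, not values from any real device.

```tcl
# Hypothetical names and values, for illustration only.
# "clock_out" is assumed to be a clock object that already exists in the design.
set clock [get_clocks "clock_out"]
set data_port [get_ports "data_in"]

# Earliest time (ns) after the clock edge that data might become invalid.
set_input_delay -clock $clock -min 1.5 $data_port

# Latest time (ns) after the clock edge that data is guaranteed valid again.
set_input_delay -clock $clock -max 6.0 $data_port
```

Passing -min and -max explicitly keeps the two values independent. A call without either flag would set both to the same number.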

Margin and pessimism

The timing constraints we apply tell the build tool where the valid data window is, i.e. where it is legal for the input flip-flop to sample data in relation to the clock. In upcoming steps we will often encounter uncertainties or ranges in our values. To deal with that, we have to establish the following principle:

Margin/pessimism for an input constraint means making the valid window smaller, i.e. adding to max and subtracting from min.

Step 2: Express peripheral timing using SDC

Here is the first challenge because there are quite a few ways to express digital timing, and different datasheets use different representations. You need to consult the datasheet of your peripheral device and identify how it's expressed.

If it's expressed like the SDC min/max values, then you are in luck and can move on to the next step. If not, then one of the guides below will show you how to translate it to SDC notation. Note that the very general names "Ta" and "Tb" are used since there is no consensus on what these entities should be called. Expect your datasheet to use different names.

Formulation #1: Valid window around clock

Sometimes the datasheet specifies timing using the following:

  1. Ta = time before clock edge that data is guaranteed valid.
  2. Tb = time after clock edge that data might go invalid.

[Figure: timing diagram with Ta before the clock edge and Tb after it]

This is very common and quite easy to grasp. Let's draw some more lines that will help us derive the SDC formulation:

[Figure: the same timing diagram with helper lines for deriving SDC min and max]

It can be seen quite clearly that

  1. min = Tb
  2. max = Tperiod − Ta

If your datasheet specifies ranges for one or more of these, you should use the most pessimistic values. In this case, that means minimum Ta, minimum Tb, and maximum Tperiod.
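The conversion for this formulation can be sketched in plain Tcl as below. The variable names and the datasheet numbers are hypothetical, chosen only to make the arithmetic easy to follow.

```tcl
# Hypothetical datasheet values in nanoseconds, already picked pessimistically.
set t_period_max 10.0  ;# Longest clock period
set ta_min 3.0         ;# Shortest time before the clock edge that data is valid
set tb_min 1.0         ;# Shortest time after the clock edge before data might go invalid

# Formulation #1: min = Tb, max = Tperiod - Ta
set sdc_min $tb_min
set sdc_max [expr {$t_period_max - $ta_min}]

puts "Formulation 1: min = ${sdc_min} ns, max = ${sdc_max} ns"
```

With these example numbers, min is 1.0 ns and max is 7.0 ns.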

Formulation #2: Valid window after clock

Other times, the datasheet might specify timing using

  1. Ta = time after clock edge that data is guaranteed valid.
  2. Tb = time after clock edge that data might go invalid.

[Figure: timing diagram with both Ta and Tb after the clock edge]

With some helper lines we will again easily find our SDC values:

[Figure: the same timing diagram with helper lines for deriving SDC min and max]

Note that the distance in the bottom-left is negative min, since data goes invalid before the clock edge. From this, we can deduce that

  1. − min = Tperiod − Tb ⇔ min = Tb − Tperiod
  2. max = Ta

If your datasheet specifies ranges for one or more of these, you should use the most pessimistic values. In this case, that means maximum Ta, minimum Tb, and maximum Tperiod.
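A corresponding sketch for this formulation, again with hypothetical names and numbers. Note how min comes out negative, since data goes invalid before the clock edge.

```tcl
# Hypothetical datasheet values in nanoseconds, already picked pessimistically.
set t_period_max 10.0  ;# Longest clock period
set ta_max 2.0         ;# Longest time after the clock edge until data is valid
set tb_min 8.0         ;# Shortest time after the clock edge before data might go invalid

# Formulation #2: min = Tb - Tperiod, max = Ta
set sdc_min [expr {$tb_min - $t_period_max}]
set sdc_max $ta_max

puts "Formulation 2: min = ${sdc_min} ns, max = ${sdc_max} ns"
```

With these example numbers, min is -2.0 ns and max is 2.0 ns.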

Formulation #3: Invalid window around clock

Another possible datasheet timing formulation might be

  1. Ta = time before clock edge that data might go invalid.
  2. Tb = time after clock edge that data is guaranteed valid.

[Figure: timing diagram with Ta before the clock edge and Tb after it, marking the invalid window]

This resolves almost trivially to

  1. − min = Ta ⇔ min = − Ta
  2. max = Tb

If your datasheet specifies ranges for one or both of these, you should use the most pessimistic values. In this case, that means the maximum for both values.

Step 3: Compensate for trace delays

For discussion's sake, we can imagine three trace delay situations. First a reference situation, then with the data path elongated, and then with the clock path elongated.

[Figure: three trace delay situations: reference, longer data path, and longer clock path]

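Using logical reasoning, and sketches like the one above, we can see that longer clock or data traces both shift the data window later in time. A longer clock trace means the clock edge reaches the peripheral later, and a longer data trace means the data takes longer to travel back to the FPGA, so the two delays add up. Trace delays therefore impact the SDC min/max values like this: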

  1. min += Ttrace_clock_min + Ttrace_data_min
  2. max += Ttrace_clock_max + Ttrace_data_max

Note that we use the minimum delays for min and the maximum delays for max. This applies the pessimism that makes the invalid window larger.
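As a small sketch in plain Tcl, the adjustment could look like this. All variable names and values are hypothetical.

```tcl
# Hypothetical peripheral timing in SDC notation, in nanoseconds.
set sdc_min 1.0
set sdc_max 7.0

# Hypothetical trace delay ranges in nanoseconds.
set trace_clock_min 0.45
set trace_clock_max 0.55
set trace_data_min 0.36
set trace_data_max 0.44

# Shortest delays go to min, longest delays go to max.
set sdc_min [expr {$sdc_min + $trace_clock_min + $trace_data_min}]
set sdc_max [expr {$sdc_max + $trace_clock_max + $trace_data_max}]

puts "After trace delays: min = ${sdc_min} ns, max = ${sdc_max} ns"
```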

Finding trace delays with the PCB CAD tool

Trace delays depend on the physical properties of the trace and the dielectric around it. If you have access to the CAD files for your board, the PCB tool should give you signal delay values.

I find it suitable to add a ±10% margin to form the minimum/maximum values. This is to compensate for tool uncertainties, material/production variations, temperature gradients, etc.
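One way to keep the margin out of the individual numbers is a small helper that turns a nominal delay into a min/max pair. The procedure name "delay_range" and the 0.5 ns nominal value are made up for this sketch.

```tcl
# Return a {min max} pair from a nominal delay and a relative margin.
# A margin of 0.1 means plus/minus 10%.
proc delay_range {nominal {margin 0.1}} {
  set min [expr {$nominal * (1.0 - $margin)}]
  set max [expr {$nominal * (1.0 + $margin)}]
  return [list $min $max]
}

# Hypothetical nominal clock trace delay of 0.5 ns reported by the PCB tool.
lassign [delay_range 0.5] trace_clock_min trace_clock_max
puts "Clock trace delay: ${trace_clock_min} to ${trace_clock_max} ns"
```

The margin is an optional argument, so a different percentage can be used for traces where the uncertainty is larger.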

Finding trace delays the hard way

In some unfortunate scenarios, we might not have access to the delay figures. This typically happens when using an evaluation board from a manufacturer who does not understand that constraints are necessary. If we don't have any other options, we can do a visual estimation of the trace lengths, which, along with some assumptions about signal propagation speed, gives the delay. Since we don't know the materials, I would personally assume that signal propagation speed is somewhere between 30% and 100% of the speed of light.

  1. trace_delay_min [s] = trace_length_min [m] / (1 * c [m/s])
  2. trace_delay_max [s] = trace_length_max [m] / (0.3 * c [m/s])

See the example code of article #2 for an example of this. This approach should be a last resort, and I don't recommend it in general. It is unlikely to work for any kind of high-speed bus. If you do use it, remember to be pessimistic in the trace length range.
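The two formulas above can be sketched in plain Tcl like this, including the conversion from seconds to nanoseconds. The trace length range is a hypothetical visual estimate, not a measurement from any real board.

```tcl
# Speed of light in meters per second.
set c 299792458.0

# Hypothetical visual estimate of the trace length in meters, with a generous range.
set trace_length_min 0.03
set trace_length_max 0.06

# The fastest assumed propagation (100% of c) gives the minimum delay,
# the slowest (30% of c) gives the maximum delay.
# Multiply by 1e9 to convert from seconds to nanoseconds.
set trace_delay_min_ns [expr {1e9 * $trace_length_min / (1.0 * $c)}]
set trace_delay_max_ns [expr {1e9 * $trace_length_max / (0.3 * $c)}]

puts [format "Trace delay: %.3f to %.3f ns" $trace_delay_min_ns $trace_delay_max_ns]
```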

Step 4: Compensate for driver chips, cables, etc

If there are gate driver chips, or anything else, in the clock/data paths, that needs to be taken into account as well. One cannot assume that the latency through parallel drivers is the same. Most driver datasheets will show a range for the latency. Use this, and the same reasoning as for trace delays above, to adapt min/max. The same goes for any cables, etc.: adapt min/max, while being as pessimistic as possible.

Devices/cables/etc on the clock or data path impact the SDC min/max like this:

  1. min += Tdelay_min
  2. max += Tdelay_max
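When there are several such contributions, a list of min/max pairs keeps the bookkeeping readable. This is only a sketch in plain Tcl. The three entries below stand for a clock buffer, a data buffer, and a cable, all with made-up delays.

```tcl
# Hypothetical starting point in SDC notation, in nanoseconds.
set sdc_min 1.0
set sdc_max 7.0

# Hypothetical {min max} delays in nanoseconds, one entry per element in
# the clock or data path: clock buffer, data buffer, cable.
set path_delays {
  {0.5 1.5}
  {0.5 1.5}
  {0.9 1.1}
}

foreach delay $path_delays {
  lassign $delay delay_min delay_max
  set sdc_min [expr {$sdc_min + $delay_min}]
  set sdc_max [expr {$sdc_max + $delay_max}]
}

puts "After path delays: min = ${sdc_min} ns, max = ${sdc_max} ns"
```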

Step 5: Putting it all together

Since TCL is a full-fledged scripting language, we can formulate our calculations and constraints in a very structured and formal way. Using variables, loops, printouts, etc, we can make a script that is quite readable and avoids any magic numbers.
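To give a feel for the structure, here is a condensed sketch of such a script for Vivado. It is not the script from the linked repository. The clock name "clock_out", the port name "data_in", the bus width, and all timing values are hypothetical.

```tcl
# All names and values below are hypothetical, for illustration only.
# Times are in nanoseconds.
set clock [get_clocks "clock_out"]

# Peripheral timing in SDC notation (result of step 2).
set peripheral_min 1.0
set peripheral_max 7.0

# Clock trace delay, shared by all data bits.
set trace_clock_min 0.45
set trace_clock_max 0.55

# Data trace delay {min max}, one entry per data bit, starting with bit 0.
set trace_data {
  {0.36 0.44}
  {0.38 0.46}
  {0.40 0.48}
  {0.37 0.45}
}

set bit 0
foreach delay $trace_data {
  lassign $delay trace_data_min trace_data_max

  set sdc_min [expr {$peripheral_min + $trace_clock_min + $trace_data_min}]
  set sdc_max [expr {$peripheral_max + $trace_clock_max + $trace_data_max}]

  # Square brackets must be escaped inside double quotes in Tcl.
  set data_port [get_ports "data_in\[${bit}\]"]
  puts "data_in\[${bit}\]: min = ${sdc_min} ns, max = ${sdc_max} ns"

  set_input_delay -clock $clock -min $sdc_min $data_port
  set_input_delay -clock $clock -max $sdc_max $data_port

  incr bit
}
```

The printouts end up in the build log, which makes it easy to check later which values were actually applied.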

An example TCL constraint script for a sink-synchronous input interface is available here on GitHub. It showcases the following:

  1. Converting peripheral device constraints to SDC notation.
  2. Looping over multiple data bits.
  3. Adjusting min/max for pessimistic trace delays.
  4. Automatically creating the output clock.
  5. Using an ODDR primitive to minimize the output clock jitter.

All while keeping track of the unit and providing some useful printouts. This script is written for Vivado, but most of the code, and certainly the ideas behind it, should work in other tools also. Since this is a TCL file, it needs a flag when it's loaded:

read_xdc -unmanaged "<path.tcl>"

This is done automatically when using tsfpga. Some of the commands in the script might not work during synthesis, so enabling the constraint for implementation only could be a good idea (see here).
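As a sketch of how this might be done in a Vivado build script, the file can be excluded from synthesis with the USED_IN_SYNTHESIS file property. The file path is hypothetical, and whether this matches the approach behind the link above is an assumption.

```tcl
# Hypothetical path to the constraint script.
set constraint_file "constraints/sink_synchronous_input.tcl"

# Load the Tcl constraint file.
read_xdc -unmanaged $constraint_file

# Use the file in implementation but not in synthesis.
set_property USED_IN_SYNTHESIS false [get_files "*sink_synchronous_input.tcl"]
```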

Step 6: Verification

The question of design verification is as relevant for I/O as it is for internal logic. If your peripheral can send out a test pattern, or you can run a write-read loop on it, then that is great. This enables bit-exact testing, which is of course the gold standard.

However, I would argue that a test like this is an indication but not proof that the interface is correctly constrained. It does not prove that the interface will work on all devices, under all conditions, or even in the next build. It could very well be that your timing is on the edge and will fail if the device gets hot.

This is the same situation that I've been nagging about in my article series about clock domain crossings.

A passing on-device test definitely increases the confidence in our constraints, but I think we have to do some offline activities also. For example, we can increase confidence further by:

  1. Studying the topic (reading articles such as this).
  2. Observing the actual phase differences using an oscilloscope, as far as possible.
  3. Experimenting with unreasonable constraint values and seeing the timing fail.
  4. Code review from peers.

Please let me know in the comments if you have other tools, processes, or ideas. Other than that, this was the last step and I will close out by giving some useful tips below.

Tip 1: Avoid this topology

While sending out a clock on an FPGA pin is technically possible and can work quite well, it's never going to be that great, you know. Broadly speaking, I would be very hesitant to use this topology for any clock rate above 50 MHz. If there are cables or other devices between the FPGA and the peripheral, the limit would be even lower. Converting to a system-synchronous interface is preferred since a discrete clock generator is going to have much better jitter and electrical drive properties.

The advantage of using the FPGA as a clock source is, of course, that you save in terms of footprint and cost. It could be suitable for slow interfaces where the BOM really matters.

Tip 2: Use an ODDR primitive for less output clock jitter

When using AMD/Xilinx FPGAs, any pin can be used to send out a clock. Rather than simply assigning the clock signal to the output port, however, an ODDR primitive yields much better clock performance. A quick experiment on a 7-series device showed that the ODDR primitive resulted in a 0.8 ns larger setup+hold slack window. This is because the clock signal never has to leave the dedicated clock network.

See the example code of this article for details on how to do this.
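On the constraint side, the forwarded clock is typically declared as a generated clock on the output port, with the ODDR clock pin as its source. A minimal sketch, assuming an ODDR instance named "clock_out_oddr_inst" and an output port named "clock_out", both hypothetical:

```tcl
# Hypothetical names: "clock_out_oddr_inst" is the ODDR instance in the design,
# "clock_out" is the top-level port that carries the clock to the peripheral.
# Assumes the ODDR data inputs are tied so that the output is not inverted
# relative to the C input (D1 = 1, D2 = 0).
create_generated_clock \
  -name "clock_out" \
  -source [get_pins "clock_out_oddr_inst/C"] \
  -divide_by 1 \
  [get_ports "clock_out"]
```

The input delay constraints on the data ports can then refer to this clock by name.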

Tip 3: Write constraints before you order PCBs

Theoretical discussions about device capabilities are almost always based on datasheets, at least in the early stage of a system design. But I would argue that the ground truth to many questions comes not from the datasheet but from the build tool: Can the chosen pins be used the way we want to? Will the timing window be large enough? Is the chip fast enough? Can we generate the clocks that we need? And so on.

There are a lot of caveats, corner cases, and gotchas when it comes to these questions. I believe it should be part of the PCB design review process to set up an FPGA build that utilizes the chosen pins and has at least a rough version of the constraints in place. If this build passes, you can be very confident in the choice of FPGA pinning/architecture.

Tip 4: Use flip-flop in I/O buffer

Using the flip-flop in the FPGA I/O buffer (IOB) to capture incoming data is very beneficial. The example code of this article shows how to do this. It is only possible, however, if

  1. there is no logic on the flip-flop input, and
  2. the port value goes to this flip-flop and nowhere else.

You must construct the logic of your receiver to fulfill these conditions if you want to use the IOB flip-flop. Given the restrictions above, it is possible to set an IOB constraint but then have it ignored by the placer. If using Vivado, I would recommend raising the "Place 30-722" message to severity "ERROR" and making sure your build system crashes if any "ERROR"-level messages occur. When using tsfpga, this is done automatically.
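In Vivado Tcl, the two pieces could be sketched like this. The port name "data_in" is hypothetical, and the message ID is the one mentioned above.

```tcl
# Request that the capture flip-flops are packed into the I/O buffer.
# "data_in" is a hypothetical port name. Braces avoid Tcl command substitution.
set_property IOB TRUE [get_ports {data_in[*]}]

# Turn the "IOB constraint could not be honored" message into an error,
# so that the build does not silently continue. Run this before placement.
set_msg_config -id {Place 30-722} -new_severity {ERROR}
```

The message configuration belongs in the build script rather than the constraint file, since it has to be in effect when the placer runs.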

Tip 5: Phase shift either the output or capture clock

Unless your window of valid data is neatly placed around the clock edge with sufficient margin, it is quite likely that your timing will fail. It is quite common that you have to create phase-shifted clock variants that place the edge in the middle of the valid data window. The details of how to do that are beyond the scope of this article, but you can see an example in the example code of article #2.

In the case of a sink-synchronous input, you can choose to shift either the output clock or the capture clock. Remember that the constraint of the data port shall always be applied relative to the clock port, i.e. whatever is actually sent out. Depending on how your clock creation is formulated, you might have to update the constraint as you change the output clock.

Tip 6: Debug

One can always hope, but it's quite likely that your timing will not pass on the first try. It's hard to say something general about debugging I/O constraints since there are so many unique situations. But I would recommend the commands

  1. report_timing -setup -from [get_ports "my_data_port"]
  2. report_timing -hold -from [get_ports "my_data_port"]

This will show you exactly the paths your clock and data take, how they are delayed in each step, and where the requirements are violated. You can also use the "Path report" tool in Vivado, which presents the same information but a little more visually. This information, along with printouts from the constraint script, should indicate where the issue is.

Note that the "-setup" report analyzes whether your SDC max requirement is satisfied, while "-hold" analyzes SDC min. I would recommend investing some time to understand these reports; they give a lot of information that might be hard to digest, but they are very useful.

Summary

These articles are the culmination of ten years of frustration whenever I had to write I/O constraints. If you're anything like me, you've probably felt the same frustration and wished for better tutorials. I did my best to present things in a systematic and clear way here. Hopefully I managed to strike a decent balance between theoretical and practical, easy-to-digest and exhaustive.

I hope that you learned something and that it will be useful for you. If nothing else, I'm happy to have these articles as a reference for myself whenever I have to write constraints in the future.

If you enjoyed this article you will probably enjoy my article series about clock domain crossings. Please Connect or Follow me here on LinkedIn so you don't miss future FPGA articles that I publish.

Thank you Lukas Vik for sharing this insightful article! I was wondering if you could also help me understand whether these concepts still apply when using an IDELAY unit, since it allows adjusting the input delay as needed. If they do, could you kindly explain how?