I/O timing constraints for FPGA/ASIC #2: System-synchronous input

6 actionable steps to get fool-proof and reliable constraints. This article aims to be as hands-on and practical as possible while also discussing the principles and theory behind reliable constraints. A full Vivado example build is included, where you can see the steps in action.

Introduction: Why are constraints needed?

Your build tool timing engine needs to know where every single flip-flop in your design can sample its data input, relative to its clock input, in order to avoid metastable values. For device-internal flip-flops, this is calculated and handled automatically. But for external interfaces, the behavior of the data line is not known by the tool, so you have to provide this information through constraints.

If you have proper constraints in place and your routed timing passes, the tool guarantees that your interface will work exactly the same on all individuals/nodes, under all conditions, and in all upcoming builds as well.

The engineers of yesteryear commonly skipped constraints, which I would not recommend, since it leaves you with no guarantees whatsoever. Expect intermittent errors and build/device/temperature-dependent behavior if you go down this foolish route.

Step 0: Make sure you are reading the correct article

This article is about system-synchronous input interfaces, meaning situations where input data from a peripheral device is synchronous to a clock from a separate clock generator circuit (see banner picture above). Keep reading if your setup looks like this. Otherwise, I would recommend one of the other articles in this series.

Step 1: Understand the SDC notation

The SDC command set_input_delay expresses setup and hold requirements using a min and max value. The values can be illustrated like this:

It is important to note that the command specifies the boundaries of the invalid data window. The min value is the earliest time after the clock edge at which data might assume an invalid value. The max value is the latest time after the clock edge at which data is guaranteed to have assumed a new valid value.

Arguments to the command are expressed in nanoseconds (ns). Values can be negative depending on the situation. It is worth noting that the meaning of min and max is NOT the same for an input constraint as for an output constraint.
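As a minimal sketch of the notation, the two boundaries become a pair of constraints on the data port. The port names "clk" and "data_in", the 10 ns period, and the delay values are placeholders that I made up for illustration. They do not come from any datasheet.

```tcl
# Hypothetical names and values, for illustration only.
# Clock as it arrives at the FPGA/ASIC clock port: 10 ns period.
create_clock -name "clk" -period 10.0 [get_ports "clk"]

# min: earliest time (ns) after the clock edge that data might go invalid.
set_input_delay -clock [get_clocks "clk"] -min 1.5 [get_ports "data_in"]

# max: latest time (ns) after the clock edge that new data is guaranteed valid.
set_input_delay -clock [get_clocks "clk"] -max 6.0 [get_ports "data_in"]
```

With these numbers, data may be invalid between 1.5 ns and 6.0 ns after each clock edge, and the tool must place the sampling point outside that range.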

Margin and pessimism

The timing constraints we apply will instruct the build tool where the valid window of data is, i.e. where it is legal for the input flip-flop to sample data in relation to the clock. In upcoming steps we will often encounter uncertainties or ranges in our values, and in order to deal with that, we have to establish the following principle:

Margin/pessimism for an input constraint means making the valid window smaller, i.e. adding to max and subtracting from min.
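The principle can be sketched as a small helper. The name "apply_margin" and the numbers are my own assumptions for illustration.

```tcl
# Shrink the valid window by 'margin' ns on each side.
# Returns a list: {min max}.
proc apply_margin {min max margin} {
  set min [expr {$min - $margin}]
  set max [expr {$max + $margin}]
  return [list $min $max]
}

# Hypothetical values: min = 1.5 ns, max = 6.0 ns, 0.2 ns of extra margin.
lassign [apply_margin 1.5 6.0 0.2] min max
puts "min = $min ns, max = $max ns"
```

A smaller min and a larger max both widen the invalid window, so the tool has less room to place the sampling point. That is the pessimistic direction.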

Step 2: Express peripheral timing using SDC

This is the first challenge, because there are quite a few ways to express digital timing, and different datasheets use different representations. You need to consult the datasheet of your peripheral device and identify how its output timing is expressed.

If it's expressed like the SDC min/max values, then you are in luck and can move on to the next step. If not, then one of the guides below will show you how to translate it to SDC notation. Note that the very general names "Ta" and "Tb" are used since there is no consensus on what these entities should be called. Expect your datasheet to use different names.

Formulation #1: Valid window around clock

Sometimes the datasheet specifies timing using the following:

  1. Ta = time before clock edge that data is guaranteed valid.

  2. Tb = time after clock edge that data might go invalid.

This is very common and quite easy to grasp. Let's draw some more lines that will help us derive the SDC formulation:

It can be seen quite clearly that

  1. min = Tb

  2. max = Tperiod − Ta

If your datasheet specifies ranges for one or more of these, you should use the most pessimistic values. In this case, that means minimum Ta, minimum Tb, and maximum Tperiod.
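A minimal sketch of this translation, using made-up datasheet numbers. The procedure name and the values are assumptions for illustration.

```tcl
# Formulation #1: valid window around the clock edge. All values in ns.
#   ta_min:     minimum time before the edge that data is guaranteed valid.
#   tb_min:     minimum time after the edge before data might go invalid.
#   period_max: maximum clock period.
# Returns a list: {min max}.
proc sdc_from_window_around_clock {ta_min tb_min period_max} {
  set min $tb_min
  set max [expr {$period_max - $ta_min}]
  return [list $min $max]
}

# Hypothetical datasheet: Ta >= 2.0 ns, Tb >= 1.0 ns, Tperiod <= 10.0 ns.
lassign [sdc_from_window_around_clock 2.0 1.0 10.0] min max
puts "min = $min ns, max = $max ns"
```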

Formulation #2: Valid window after clock

Other times, the datasheet might specify timing using

  1. Ta = time after clock edge that data is guaranteed valid.

  2. Tb = time after clock edge that data might go invalid.

With some helper lines we will again easily find our SDC values:

Note that the distance in the bottom-left is negative min, since data goes invalid before the clock edge. From this, we can deduce that

  1. − min = Tperiod − Tb ⇔ min = Tb − Tperiod

  2. max = Ta

If your datasheet specifies ranges for one or more of these, you should use the most pessimistic values. In this case, that means maximum Ta, minimum Tb, and maximum Tperiod.
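The same kind of sketch for this formulation, again with a made-up procedure name and made-up values:

```tcl
# Formulation #2: valid window after the clock edge. All values in ns.
#   ta_max:     maximum time after the edge until data is guaranteed valid.
#   tb_min:     minimum time after the edge before data might go invalid.
#   period_max: maximum clock period.
# Returns a list: {min max}.
proc sdc_from_window_after_clock {ta_max tb_min period_max} {
  set min [expr {$tb_min - $period_max}]
  set max $ta_max
  return [list $min $max]
}

# Hypothetical datasheet: Ta <= 3.0 ns, Tb >= 9.0 ns, Tperiod <= 10.0 ns.
lassign [sdc_from_window_after_clock 3.0 9.0 10.0] min max
puts "min = $min ns, max = $max ns"
```

Note that min comes out negative here, which is expected since data goes invalid before the clock edge.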

Formulation #3: Invalid window around clock

Another possible datasheet timing formulation might be

  1. Ta = time before clock edge that data might go invalid.

  2. Tb = time after clock edge that data is guaranteed valid.

This resolves almost trivially to

  1. − min = Ta ⇔ min = − Ta

  2. max = Tb

If your datasheet specifies ranges for one or both of these, you should use the most pessimistic values. In this case, that means the maximum for both values.
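And a sketch for the third formulation, with a hypothetical procedure name and values:

```tcl
# Formulation #3: invalid window around the clock edge. All values in ns.
#   ta_max: maximum time before the edge that data might go invalid.
#   tb_max: maximum time after the edge until data is guaranteed valid.
# Returns a list: {min max}.
proc sdc_from_invalid_window {ta_max tb_max} {
  set min [expr {-$ta_max}]
  set max $tb_max
  return [list $min $max]
}

# Hypothetical datasheet: Ta <= 0.5 ns, Tb <= 4.0 ns.
lassign [sdc_from_invalid_window 0.5 4.0] min max
puts "min = $min ns, max = $max ns"
```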

Step 3: Compensate for trace delays

For discussion's sake, we can imagine four trace delay situations. First a reference situation, then with the data path elongated, and then with either clock path elongated.

Using logical reasoning, and sketches like the one above, we can infer that relative to the clock at the FPGA/ASIC input:

  1. A longer data trace moves the data window forward (later in time).

  2. A longer clock-to-FPGA/ASIC trace moves the data window backward (earlier in time).

  3. A longer clock-to-peripheral trace moves the data window forward (later in time).

Meaning, trace delays impact the SDC min/max values like this:

  1. min += Ttrace_data_min + Ttrace_clock_to_peripheral_min − Ttrace_clock_to_fpga_max

  2. max += Ttrace_data_max + Ttrace_clock_to_peripheral_max − Ttrace_clock_to_fpga_min

It is important to note that we use the minimum/maximum range of the delays in order to apply the pessimism that makes the invalid window larger.
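The two update formulas can be sketched like this. The procedure name, the argument layout with {min max} pairs, and all the numbers are assumptions for illustration.

```tcl
# Adjust SDC min/max for trace delays. All values in ns.
# Each trace delay argument is a {min max} pair.
# Returns a list: {min max}.
proc adjust_for_traces {min max data clock_to_peripheral clock_to_fpga} {
  lassign $data data_min data_max
  lassign $clock_to_peripheral clk_per_min clk_per_max
  lassign $clock_to_fpga clk_fpga_min clk_fpga_max

  # The clock-to-FPGA delay is subtracted, so its range is used crosswise.
  set min [expr {$min + $data_min + $clk_per_min - $clk_fpga_max}]
  set max [expr {$max + $data_max + $clk_per_max - $clk_fpga_min}]
  return [list $min $max]
}

# Hypothetical values: peripheral min/max of 1.0/8.0 ns and three trace ranges.
lassign [adjust_for_traces 1.0 8.0 {0.45 0.55} {0.27 0.33} {0.36 0.44}] min max
puts "min = $min ns, max = $max ns"
```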

Finding trace delays with the PCB CAD tool

Trace delays depend on the physical properties of the trace and the dielectric around it. If you have access to the CAD files for your board, the PCB tool should give you signal delay values.

I find it suitable to add a ±10% margin to form the minimum/maximum values. This is to compensate for tool uncertainties, material/production variations, temperature gradients, etc.
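A minimal sketch of turning a nominal delay from the PCB tool into a range. The procedure name and the 0.5 ns nominal value are made up.

```tcl
# Form a {min max} delay range from a nominal value.
# The tolerance is a fraction and defaults to 0.10, i.e. +-10%.
proc delay_range {nominal {tolerance 0.10}} {
  set delay_min [expr {$nominal * (1.0 - $tolerance)}]
  set delay_max [expr {$nominal * (1.0 + $tolerance)}]
  return [list $delay_min $delay_max]
}

# Hypothetical nominal trace delay of 0.5 ns reported by the PCB tool.
lassign [delay_range 0.5] delay_min delay_max
puts "delay_min = $delay_min ns, delay_max = $delay_max ns"
```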

Finding trace delays the hard way

In some unfortunate scenarios, we might not have access to the delay figures, typically when using an evaluation board from a manufacturer who does not understand that constraints are necessary. If we don't have any other options, we can do a visual estimation of the trace lengths, which, along with some assumptions about signal propagation speed, gives the delay. Since we don't know the materials, I would personally assume that signal propagation speed is somewhere between 30% and 100% of the speed of light.

  1. trace_delay_min [s] = trace_length_min [m] / (1 * c [m/s])

  2. trace_delay_max [s] = trace_length_max [m] / (0.3 * c [m/s])

This approach should be a last resort, and I don't recommend it in general. It is unlikely to work for any kind of high-speed bus. If you do use it, remember to be pessimistic in the trace length range.
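If you do end up here, the two formulas could be sketched as below, with the result converted to nanoseconds so it can be used directly in the constraints. The procedure name and the estimated trace length are assumptions.

```tcl
# Speed of light in vacuum, expressed in metres per nanosecond.
set c_m_per_ns 0.299792458

# Pessimistic {min max} trace delay in ns from an estimated length range in metres.
# Assumes a propagation speed between 30% and 100% of the speed of light.
proc trace_delay_range_ns {length_min_m length_max_m} {
  global c_m_per_ns
  set delay_min [expr {$length_min_m / (1.0 * $c_m_per_ns)}]
  set delay_max [expr {$length_max_m / (0.3 * $c_m_per_ns)}]
  return [list $delay_min $delay_max]
}

# Hypothetical visual estimate: the trace is between 40 mm and 60 mm long.
lassign [trace_delay_range_ns 0.040 0.060] delay_min delay_max
puts [format "delay_min = %.3f ns, delay_max = %.3f ns" $delay_min $delay_max]
```

Note how wide the resulting range is compared to the ±10% used with real CAD figures. That spread eats directly into the valid window.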

Step 4: Compensate for driver chips, cables, etc

If there are gate driver chips, or anything else in the clock/data paths, that needs to be taken into account as well. One cannot assume that the latency through parallel drivers is the same. Most driver datasheets will show a range for the latency. Use this, and the same reasoning as for trace delays above, to adapt min/max. The same goes for any cables, etc.: adapt min/max, while being as pessimistic as possible.

Devices/cables/etc on the data path or the clock-to-peripheral path impact the SDC min/max like this:

  1. min += Tdelay_min

  2. max += Tdelay_max

While devices/cables/etc on the clock-to-FPGA/ASIC path have an impact like this:

  1. min += −Tdelay_max

  2. max += −Tdelay_min
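The rules above can be sketched as one helper that is called once per component. The procedure name, the path labels, and the buffer latency range are assumptions for illustration.

```tcl
# Adjust SDC min/max for one component (driver, cable, ...) in a path.
#   delay: {min max} latency of the component, in ns.
#   path:  "data", "clock_to_peripheral" or "clock_to_fpga".
# Returns a list: {min max}.
proc add_path_delay {min max delay path} {
  lassign $delay delay_min delay_max
  switch -- $path {
    data -
    clock_to_peripheral {
      set min [expr {$min + $delay_min}]
      set max [expr {$max + $delay_max}]
    }
    clock_to_fpga {
      set min [expr {$min - $delay_max}]
      set max [expr {$max - $delay_min}]
    }
    default {
      error "Unknown path: $path"
    }
  }
  return [list $min $max]
}

# Hypothetical: identical buffers (1.0 to 2.5 ns) on data and on clock-to-FPGA.
lassign [add_path_delay 1.0 8.0 {1.0 2.5} "data"] min max
lassign [add_path_delay $min $max {1.0 2.5} "clock_to_fpga"] min max
puts "min = $min ns, max = $max ns"
```

In this example, min/max go from 1.0/8.0 ns to −0.5/9.5 ns. The two identical buffers do not cancel out, since we cannot assume that their latencies match. The full latency spread is removed from the valid window on both sides.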

Step 5: Putting it all together

Since TCL is a full-fledged scripting language, we can formulate our calculations and constraints in a very structured and formal way. Using variables, loops, printouts, etc, we can make a script that is quite readable and avoids any magic numbers.

An example TCL constraint script for a system-synchronous input interface is available here on GitHub. It showcases the following:

  1. Converting peripheral device constraints to SDC notation.

  2. Calculating pessimistic trace delays based on rough length estimates.

  3. Looping over multiple data bits.

  4. Adjusting min/max for trace delays.

  5. Phase-shifting the capture clock using an MMCM.

All while keeping track of the unit and providing some useful printouts. This script is written for Vivado, but most of the code, and certainly the ideas behind it, should work in other tools also. Since this is a TCL file, it needs a flag when it's loaded:

read_xdc -unmanaged "<path.tcl>"

This is done automatically when using tsfpga. Some of the commands in the script might not work during synthesis, so enabling the constraint for implementation only could be a good idea (see here).
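To give a flavor of the structure, here is a heavily simplified sketch. It is not the GitHub script. It assumes that a clock named "clk" has already been created on the clock port, that the data ports are called "data_in[0]" to "data_in[3]", and that the peripheral values have already been converted to SDC notation. Clock trace delays are left out for brevity, and all numbers are made up.

```tcl
# Hypothetical names and values, for illustration only. All values in ns.
set clock [get_clocks "clk"]

# Peripheral timing, already converted to SDC min/max notation.
set peripheral_min 1.0
set peripheral_max 8.0

# Nominal data trace delay for each bit, index 0 first.
set data_trace_delays {0.50 0.52 0.48 0.55}

set bit 0
foreach nominal $data_trace_delays {
  # Apply +-10% to the nominal trace delay, in the pessimistic direction.
  set min [expr {$peripheral_min + 0.9 * $nominal}]
  set max [expr {$peripheral_max + 1.1 * $nominal}]

  # The brackets are escaped so that Tcl does not treat them as a command.
  set port_name "data_in\[${bit}\]"
  puts "${port_name}: min = ${min} ns, max = ${max} ns"

  set_input_delay -clock $clock -min $min [get_ports $port_name]
  set_input_delay -clock $clock -max $max [get_ports $port_name]

  incr bit
}
```

The printouts end up in the build log, which makes it possible to check afterwards exactly which values were applied to each port.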

Step 6: Verification

The question of design verification is as relevant for I/O as it is for internal logic. If your peripheral can send out a test pattern, or you can run a write-read loop on it, then that is great. This enables bit-exact testing, which is of course the gold standard.

However, I would argue that a test like this is an indication but not proof that the interface is correctly constrained. It does not prove that the interface will work on all devices, under all conditions, or even in the next build. It could very well be that your timing is on the edge and will fail if the device gets hot.

This is the same situation that I've been nagging about in my article series about clock domain crossings.

A passing on-device test definitely increases the confidence in our constraints, but I think we have to do some offline activities also. For example, we can increase confidence further by:

  1. Studying the topic (reading articles such as this).

  2. Observing the actual phase differences using an oscilloscope, as far as possible.

  3. Experimenting with unreasonable constraint values and seeing the timing fail.

  4. Code review from peers.

Please let me know in the comments if you have other tools, processes, or ideas. Other than that, this was the last step and I will close out by giving some useful tips below.

Tip 1: Use a clock-capable input pin

Using a clock-capable FPGA pin for the clock input is almost a must. These pins have dedicated routing to the global clock network and direct access to conveniently placed MMCM/PLL blocks. If using AMD/Xilinx, these are called "GC" or "CC" pins.

Before deciding your pinning and manufacturing your PCB, you must consult the datasheet of your FPGA and choose appropriate pins for all clock inputs. If you use a regular data pin for your clock, the resulting FPGA-internal clock that you sample data with is going to have very bad jitter properties. This means a narrower sampling window and a higher likelihood of timing failure.

Tip 2: Write constraints before you order PCBs

Theoretical discussions about device capabilities are almost always based on datasheets, at least in the early stage of a system design. But I would argue that the ground truth to many questions comes not from the datasheet but from the build tool: Can the chosen pins be used the way we want? Will the timing window be large enough? Is the chip fast enough? Can we generate the clocks that we need? And so on.

There are a lot of caveats, corner cases, and gotchas when it comes to these questions. I believe it should be part of the PCB design review process to set up an FPGA build that utilizes the chosen pins and has at least a rough version of the constraints in place. If this build passes, you can be very confident in the choice of FPGA pinning/architecture.

Tip 3: Use flip-flop in I/O buffer

Using the flip-flop in the FPGA I/O buffer (IOB) to capture incoming data is very beneficial. The example code of this article shows how to do this. It is only possible, however, if

  1. there is no logic on the flip-flop input, and

  2. the port value goes to this flip-flop and nowhere else.

You must construct the logic of your receiver to fulfill these conditions if you want to use the IOB flip-flop. Given the restrictions above, it is possible to set an IOB constraint but then have it ignored by the placer. If using Vivado, I would recommend raising the "Place 30-722" message to severity "ERROR" and making sure your build system crashes if any "ERROR"-level messages occur. When using tsfpga, this is done automatically.
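In Vivado, the two pieces could look something like the sketch below. The port name pattern "data_in[*]" is a placeholder. The message ID is the one mentioned above.

```tcl
# Hypothetical port names. Request that the capture flip-flops are placed in the IOB.
# Braces are used so that Tcl does not interpret the brackets as a command.
set_property "IOB" "TRUE" [get_ports {data_in[*]}]

# Raise the message about an IOB request that is not honored to "ERROR" severity.
set_msg_config -id {Place 30-722} -new_severity {ERROR}
```

The message configuration has to be in effect before placement runs, so it typically belongs in a build-setup script or hook rather than alongside the timing constraints.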

Tip 4: Phase shift the capture clock

Unless your window of valid data is neatly placed around the clock edge with sufficient margin, it is quite likely that your timing will fail. It is quite common that you have to create phase-shifted clock variants that place the edge in the middle of the valid data window. I won't go into the details of this since it is beyond the scope of the article, but you can see an example in the example code for this article.

I do want to comment on how this impacts the constraint, however: It doesn't. Your constraint of the data port shall still be applied relative to the clock port, i.e. the original incoming clock, not the phase-shifted internal sampling clock.

Tip 5: Debug

One can always hope, but it's quite likely that your timing will not pass on the first try. It's hard to say something general about debugging I/O constraints since there are so many unique situations. But I would recommend the commands

  1. report_timing -setup -from [get_ports "my_data_port"]

  2. report_timing -hold -from [get_ports "my_data_port"]

This will show you exactly the paths your clock and data take, how they are delayed in each step, and where the requirements are violated. You can also use the "Path report" tool in Vivado, which presents the same information but a little more visually. This information, along with printouts from the constraint script, should indicate where the issue is.

Note that the "-setup" report analyzes whether your SDC max requirement is satisfied, while "-hold" analyzes SDC min. I would recommend investing some time to understand these reports; they give a lot of information that might be hard to digest, but they are very useful.
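If you want the worst slack figures in the build log rather than a full report, a sketch along these lines might help. It assumes an implemented design is open in Vivado, that "my_data_port" is your port name, and that at least one timing path is found for it.

```tcl
# Hypothetical port name.
set port [get_ports "my_data_port"]

# Fetch the worst path for each analysis type.
set setup_path [get_timing_paths -setup -from $port -max_paths 1]
set hold_path [get_timing_paths -hold -from $port -max_paths 1]

# Negative slack means that the requirement is violated.
puts "Worst setup slack (SDC max): [get_property "SLACK" $setup_path] ns"
puts "Worst hold slack (SDC min): [get_property "SLACK" $hold_path] ns"
```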

Summary

These articles are the culmination of ten years of frustration whenever I had to write I/O constraints. If you're anything like me, you've probably felt the same frustration and wished for better tutorials. I did my best to present things in a systematic and clear way here. Hopefully I managed to strike a decent balance between theoretical and practical, easy-to-digest and exhaustive.

I hope that you learned something and that it will be useful for you. If nothing else, I'm happy to have these articles as a reference for myself whenever I have to write constraints in the future.

If you enjoyed this article you will probably enjoy my article series about clock domain crossings. Please Connect or Follow me here on LinkedIn so you don't miss future FPGA articles that I publish.

Sebastian Hellgren

FPGA expert with a passion for testing and continuous integration

5mo

Nice write up. Saved as bookmark. As you mentioned, the hardest part which actually requires some mental gymnastics is the suppliers' data sheet. It seems to me like every other manufacturer of for example ADCs has their own notation of data valid window, and often I have had to consult multiple colleagues on how to interpret the numbers. 😆

Rezwanur Rahman, PhD

Principal Digital Hardware Engineer

5mo

Excellent!

Debayan Paul

360° with FPGAs(SRAM & Flash based) used in embedded devices with domain knowledge in automotive data loggers and data acquisition systems.

5mo

A typical RMII interface would look like the block diagram shown, where the Peripheral Device can be the Ethernet PHY.

Joseph (Yousef) Mahdian

FPGA & Embedded Systems Engineer | VHDL & Embedded C | Zynq SoC/MPSoC/STM32

5mo

Very informative. I know many developers who are a mixture of ignorance and laziness. They're ignorant to the consequences of ignoring timing constraints and the importance of that in serious projects. And when you tell them, they're too lazy to learn it.
