Rajit Manohar
Cornell NYC Tech, New York, NY 10011
February 21, 2014
Technical note: AVLSI-TN-2014-1

Since the publication of papers on the topic of timing speculation, I have had to review innumerable incorrect papers on related ideas that all exhibit a lack of understanding of the problem of metastability [1]. This is not surprising: the issue is quite subtle, textbooks don't explain it very well, and a large number of published papers make the same mistake. People who build on results from other papers don't always spend the effort to verify them carefully (indeed, if they did, research would not move forward; we would put the "re" back into research).

There appears to be a widely held belief that it is possible to build a "metastability detector" circuit that can respond in a bounded amount of time. When the question is asked, "What happens if there is metastability?", it is met with the quick reply: "Oh, detect it and take corrective action." The purpose of this note is to provide a simple, clear explanation as to why this is impossible, as far as we know today. I don't believe this is a new result, but the collection of papers that I went through (e.g. by Marino and others) did not state exactly what I wanted. If someone reading this can point me to a reference, that would be appreciated.

It is clear that bounded time metastability resolution cannot exist if we assume classical physics; whether quantum mechanics provides an out is yet to be seen, but many attempts have failed thanks to the uncertainty principle [2].

To make the argument about metastability detectors as simple as possible, I will state it by relying on an impossibility result that most researchers are aware of, and then connecting that to the metastability detector problem.

A one-bit synchronizer is a circuit that has two inputs: a data input and a clock input. The data input is sampled by the synchronizer using the positive edge of the clock.
The output of the synchronizer is the sampled data value. If the clock edge arrives when the input is changing, the output of the synchronizer is permitted to be either 0 or 1. Synchronizers are required on asynchronous inputs to synchronous circuits. It is known that we cannot build a failure-free circuit that has this property: there is a non-zero probability that the output will be metastable. In practice these circuits are designed to make their failure probability small enough that the overall system-level failure rate is acceptable. (The good news is that the error probability decays exponentially with the delay budgeted for the synchronizer circuit.)

This brings us to our assumption, which is an assumption because we don't have a definitive answer from quantum mechanics (we know it is true in the classical case).

Assumption. It is impossible to build a failure-free bounded time synchronizer.

What is a metastability detector? The simplest definition I have seen is that it is a circuit with one input and one output. The output is 0 if the input is not metastable, and 1 if the input is metastable.

How does one define a bounded-time metastability detector? This question is actually more subtle than one might expect. The obvious answer is: a metastability detector that produces the correct output in a bounded amount of time. This interpretation is fraught with peril. Consider the following scenarios:
Claim. It is impossible to build a failure-free bounded delay
metastability detector circuit.
Why is that the case? Because if F is not going to be metastable, then M.o tells us so in a bounded amount of time. In that case the output of the OR gate is F.o (a copy of F.i at the clock edge), which is valid because we know it is not metastable. If F is metastable, then M.o = 1, which means the output of the OR gate is 1 regardless of F.o. This is also valid, since the synchronizer can report any value when the input is changing. Hence the output of the OR gate satisfies the requirements of a synchronizer.

I have omitted a lot of details to provide the basic intuition. I've also assumed that the metastability detector is fast enough that the OR gate output is produced before the next clock edge. What if M is really slow, with a delay larger than the period of the sampling clock? We then have a basic issue: the metastability detector has a lower throughput than the data it is supposed to operate on. If we assume that a "wave of input changes" can propagate through M, and that the changes stay spaced out, then we can insert a matched delay on the other input to the OR gate and use a delayed version of the sample clock to sample the output of the OR gate when we know it is safe. If all we know is that M can only operate, say, once every k clock cycles, we can have k parallel detectors that operate in a phase-staggered manner to keep up with the sampled data throughput. All of these are possibilities, but in reality their feasibility would depend on the specific internals of a proposed M.

If we are happy with not requiring full-throughput sampling, the circuit can be easily modified with a clock enable. The clock enable goes high when a sample is needed, and cannot go high again for at least k clock cycles, where k is large enough that k clock periods are longer than the time required by the metastability detector and OR gate. That is equivalent to requiring that the effective clock frequency in the circuit above is low enough.
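The reduction above can be sketched as a small logical model. This is only an illustration under stated assumptions: the marker `X` for a metastable level, the three-valued OR, and the function names `detector_M`, `composed_synchronizer`, and `min_enable_spacing` are hypothetical and do not come from the note. Timing is abstracted away entirely; the point is that if a bounded-time M existed, the OR gate output would always be a legal synchronizer output.

```python
import math

X = "X"  # hypothetical marker for a metastable (undefined) logic level


def detector_M(f_out):
    """Hypothetical bounded-time detector M: 1 if F.o is metastable, else 0."""
    return 1 if f_out == X else 0


def or_gate(a, b):
    """Three-valued OR: a 1 on either input forces the output to 1."""
    if a == 1 or b == 1:
        return 1
    if a == X or b == X:
        return X
    return 0


def composed_synchronizer(f_out):
    """Output of the OR gate fed by F.o and M.o."""
    return or_gate(f_out, detector_M(f_out))


def min_enable_spacing(t_detector, t_or, t_clk):
    """Smallest integer k such that k * t_clk > t_detector + t_or."""
    return math.floor((t_detector + t_or) / t_clk) + 1


if __name__ == "__main__":
    for f_out in (0, 1, X):
        print(f_out, "->", composed_synchronizer(f_out))
    # Illustrative delays in arbitrary time units (assumed values).
    print("k =", min_enable_spacing(7, 1, 2))
```

In this model the composed output is never `X`, which is exactly what the assumption says cannot be achieved in bounded time. Since the flop and the OR gate can be built, the hypothetical `detector_M` is the piece that cannot exist.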
In summary, since we know C does not exist and we know that F and O do exist, the only conclusion to be drawn is that M does not exist.

Is that so terrible? Not really, apart from causing some minor embarrassment (minor because, as I noted earlier, this is a subtle issue that is rarely explained well). Is this catastrophic for design? Not really, because it is well known that we can make the failure probability as small as we want by paying for it in latency. Failure rates below 10^-50 can be achieved with practical circuits. It is just that there is a difference between zero failure and a very, very small probability of failure.

Note that one can reliably identify the exit from metastability [3]. The issue is that we cannot place a bound on how long that will take.

A final comment. I have taken some liberties with terminology, such as saying "an input is metastable." To be precise, it really means that the circuit that generated the input is in a metastable state, and the input signal samples part of that state.
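To make the latency trade-off discussed above concrete, here is a minimal sketch using the exponential model commonly quoted for synchronizers, in which the failure rate is T_w * f_clk * f_data * exp(-t / tau). The note does not give this formula or any parameter values; the model choice, the function names, the units (failures per second), and the numbers below are all assumptions for illustration.

```python
import math


def failure_rate(t_resolve, tau, t_window, f_clk, f_data):
    """Expected failures per second under the assumed exponential model."""
    return t_window * f_clk * f_data * math.exp(-t_resolve / tau)


def resolve_time_for(target_rate, tau, t_window, f_clk, f_data):
    """Resolution time needed to reach target_rate under the same model."""
    return tau * math.log(t_window * f_clk * f_data / target_rate)


if __name__ == "__main__":
    # Illustrative (assumed) parameters, not taken from the note.
    params = dict(tau=20e-12, t_window=50e-12, f_clk=1e9, f_data=1e8)
    t = resolve_time_for(1e-50, **params)
    print("resolution time (s):", t)
    print("failure rate at that time:", failure_rate(t, **params))
```

Under this model every additional tau of waiting divides the failure rate by e, so a very small rate costs only a modest amount of latency, but no finite resolution time makes the model's rate exactly zero.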
I hope this note is helpful to anyone who reads it, and helps to dispel the
myth of the bounded time metastability detector circuit. I also hope some
of the discussion helps people appreciate the tricky issues involved.
© 2014 Rajit Manohar