# Variability of Multistage Synchronizers

Salomon Beer, Member, IEEE, Jerome Cox, Fellow, IEEE, Ran Ginosar, Senior Member, IEEE, Tom Chaney, Life Member, IEEE, David M. Zar, Member, IEEE

Abstract— System on Chip (SoC) designs typically employ multiple clock domains to interface several externally clocked circuits operating at different frequencies, and to reduce power and area by breaking large clock trees into multiple small ones. The principal challenge of such Globally Asynchronous Locally Synchronous (GALS) architectures is the need to reliably communicate between the different clock domains. To achieve high reliability margins in modern process technologies, multistage synchronizers are often used. In this work we develop analytical formulae to calculate the probability of failure and the number of stages to use in such synchronizers. This work compares the model developed to previous publications and shows that while most of the existing models underestimate MTBF, some models overestimate it. The model developed here calculates an MTBF lower bound with significantly smaller margins. The concept of an effective resolution time-constant for multistage synchronizers is introduced and the important effects of clock duty cycle and process variability are addressed. These process variability effects can be minimized by use of simple design rules for the synchronizer. For safety-critical applications, calculation of the probability of a failure-free lifetime for all products in a production run is developed and a simple lower bound is derived.

## *Index Terms*—Metastability, MTBF, multistage synchronizers, synchronization, synchronizer, tau effective.

### I. INTRODUCTION

THE SYSTEM ON CHIP (SOC) designer who wishes to use a synchronizer from a standard cell library would like to know the MTBF (Mean Time Between Failures) of the system including the synchronizer before design signoff. This knowledge is increasingly valuable in nanoscale SoC designs because several factors have emerged that jeopardize the reliability of synchronizers. In particular, the number of synchronizers in a design is growing rapidly; the variability of semiconductor parameters is troubling as is the sensitivity to operational conditions.

Prediction of *MTBF* in clock-domain-crossing (CDC) scenarios (Figure 1) depends on a variety of parameters. Some of these parameters are extrinsic; they are related to how a synchronizer is used in the application at hand. For example, the clock frequency  $f_c$  (1/*T*), rate of data transitions  $f_D$ , clock duty cycle  $\alpha$  and the number of stages in the synchronizer *N* are all parameters related to the application. Note the input flip-flop in the receiving clock domain (Figure 1); there are *N*+1 flip-flops in total, allocating *N* clock cycles for metastability resolution [1].

Other essential parameters are related to synchronizer intrinsic characteristics. The most important of these are the resolution time-constants $\tau_i$ of the synchronizer's bi-stable stages, $i = 1, 2 \dots N$. Also important is the aperture width $T_W$. These parameters must be determined by physical measurement, or by circuit simulation. They are strongly dependent on the characteristics of the semiconductor process and the synchronizer operating conditions, such as supply voltage and temperature.



Figure 1. A typical multistage synchronizer

Finding values for all of these parameters and determining their influence on *MTBF* is challenging. Physical measurement of synchronizer characteristics is usually limited to the very first stage [2][3][4], because of the unbounded time required to carry out measurements on later synchronizer stages. Reliable simulation of the entire synchronizer is now possible, however, due to state-of-the-art simulation methods [14], and has been validated against first-stage measurements [15]. Thus, the overall *MTBF* of a multistage synchronizer can be evaluated by simulation for a selected set of extrinsic and intrinsic parameters.

Due to long compute time, it is desirable to avoid simulating for a large variety of extrinsic parameter combinations. To discern the contribution of each parameter, we seek a formula that calculates *MTBF* for an arbitrary set of extrinsic parameters, and is based on the set of intrinsic parameters determined from simulations. This approach would still require simulations for each synchronizer circuit, for each transistor model and for each set of operating conditions, but the variations in results arising from changes in extrinsic parameters can be dealt with analytically. Another reason for the importance of an accurate analytical expression is that currently available formulae provide pessimistic lower bounds on the *MTBF*. The result is a relatively large increase in latency due to unneeded synchronizer stages that degrade the overall performance.

Separation of extrinsic and intrinsic parameters has substantial advantages for both the synchronizer circuit designer and the SoC designer. In today's silicon IP marketplace these roles are likely to be performed by different individuals who may work for different organizations. Because of the trend toward developing synchronizers as specialized standard cells, only the cell designer may have access to the semiconductor process models necessary to support estimation of the intrinsic parameters of a synchronizer cell. Similarly, extrinsic parameters depend on the application and are decided by the system integrator or SoC designer. This work develops a formula that separates intrinsic and extrinsic parameters and enables *MTBF* estimation in multistage synchronizers. The formula is an intuitive expression for *MTBF* that the SoC designer will find easier to use than most published methods.

Section II provides a survey of previously published *MTBF* formulae for multistage synchronizers. In Section III we develop a novel formula for multistage *MTBF* and introduce the concepts of $\tau_{eff}$ and $T_W(N)$, an effective resolution time-constant and an effective aperture width. Section IV provides a discussion of the model and an analysis of the effects of process variability on reliability, and Section V is a comparison of two synchronizer form-factors based on the model. Section VI shows simulations that confirm the derived formulae, followed by conclusions. In Appendix A we provide proofs of derivations shown in Sections IV.A and IV.B, while Appendix B demonstrates formulae used in Section IV.C.

### II. RELATED WORK

Several *MTBF* models have been explored since the discovery of the metastability effect [5]. Table I shows a summary of published formulae for multistage *MTBF* calculation.

The column *Formula in work* presents the *MTBF* formula as it appears in each publication, while the *Unified model* column uses a standardized nomenclature so that the expressions can be compared more easily. In [6] the term $t_s^S$ represents the average position of the metastability window in the slave input. In [7], $t_{su}$ represents the setup time of the latches used in the flip-flops (FFs). In [10] $\Delta t_{in_j}(T_{s_j})$ represents the data-clock separation at the input of stage *j* that generates a resolution time of $T_{s_j}$ at its output. In [13] $T_j^W$ and $\tau_j$ represent the aperture width and the resolution time-constant of stage *j*.

In [9], [11] and [12] the MTBF of N + 1 flip-flop stages is taken to be that obtained by waiting a time NT for metastability resolution. Such a proposition assumes that all concatenated flip-flops are exactly the same, an assumption that fails in modern technologies due to the high variability of the fabrication process. These formulae also assume that the effect of N + 1 flip-flops is equivalent to a resolution time of NT. While this assumption is convenient and greatly simplifies the equation, we demonstrate in the following sections that it leads to high inaccuracy. To account for that inaccuracy, formulae [5]-[8] and [11] subtract the propagation delay, the setup time, or both, of each flip-flop in the chain from the resolution time NT. Even though this provides a better estimate than the basic formula [12], it still represents a heuristic correction of the model that provides loose bounds. On the other hand, [6], [7], and [13] predict that $T_W$ has an exponential relation with N. In all the surveyed papers except [10] and [13], the flip-flops in the synchronizer were taken to be identical, and no differentiation has been made between the master and slave latches composing the flip-flops. Formulae [6], [10] and [13] provide higher accuracy than the others, but their usage is non-trivial since several independent simulations are needed to estimate $t_s^S$, $\tau_I$, $\Delta t_{in_j}(T_{s_j})$ and $T_j^W$ for each stage. The accuracy obtained by these formulae can be traded for the ease of calculation in [9] and [12]. The influence of clock duty cycle and the effect of process variability on the flip-flops in the synchronizer are not discussed in any of the surveyed formulae.

TABLE I Summary of existing multistage synchronizers MTBF models

| Ref | Year | Formula in work (MTBF) | Unified model (MTBF) |
|-----|------|------------------------|----------------------|
| [5]                                                      | 1987              | $\frac{e^{\frac{NT_{C}-(N-1)t_{p}}{\tau}}}{\lambda T_{o}}$                                                                                              | $\frac{e^{\frac{NT-(N-1)t_{pd}}{\tau}}}{T_W f_C f_d}$                                                                                                     |
| [6]                                                      | 1992              | $\left[\alpha_D f^{N+1} \left(\frac{\tau_I e^{-1/f\tau} e^{2t_s^S/\tau}}{\tau}\right)^N\right]^{-1}$                                                    | $\tau^N \frac{e^{\frac{NT-2Nt_s^S}{\tau}}}{T_W^{2N} f_c^N f_d}$                                                                                           |
| [7]                                                      | 1997              | $\frac{e^{\frac{t_r - \frac{1}{f_c} - t_{su}}{\tau}}}{T_W^2 f_c^2 f_d}$                                                                                 | $\frac{e^{\frac{2T-t_{pd}-t_{su}}{\tau}}}{T_W^2 f_c^2 f_d}$                                                                                               |
| [8]                                                      | 2003              | $\frac{e^{\frac{N(T-t_{pd})}{\tau}}}{\lambda T_o/T}$                                                                                                    | $\frac{e^{\frac{N(T-t_{pd})}{\tau}}}{T_W f_c f_d}$                                                                                                        |
| [9]                                                      | 2007              | $\frac{e^{\frac{NT}{\tau}}}{T_W f_c f_d}$                                                                                                               | $\frac{e^{\frac{NT}{\tau}}}{T_w f_c f_d}$                                                                                                                 |
| [10]                                                     | 2009 <sup>a</sup> | $\overline{T_W f_c f_d}^{-1} \left[ f_c f_d \Delta t_{in,N}(T_{s,N}) \prod_{j=1}^{N-1} \frac{\Delta t_{in,j}(T_{s,j})}{\tau_j} \right]^{-1}$            |                                                                                                                                                           |
| [11]                                                     | 2010              | $\frac{e^{\frac{\sum_{i=1}^{N}T_{met,i}}{C_2}}}{c_1F_cF_d}$                                                                                             | $\frac{e^{\frac{N(T-t_{pd})}{\tau}}}{T_W f_c f_d}$                                                                                                        |
| [12]                                                     | 2011              | $\frac{e^{\frac{NT_c}{\tau}}}{T_W F_c F_d}$                                                                                                             | $\frac{e^{\frac{NT}{\tau}}}{T_w f_c f_d}$                                                                                                                 |
| [13]                                                     | 2012              | $\left(\prod_{j=1}^{N-1} \frac{\tau_j e^{\sum_{i=1}^N \frac{T_i^S}{\tau_i}}}{T_j^W}\right) \frac{e^{\sum_{i=1}^N \frac{T_i^S}{\tau_i}}}{F_c F_d T_N^W}$ | $\left(\prod_{j=1}^{N-1} \frac{\tau_j e^{NT \sum_{i=1}^{N} \frac{1}{\tau_i}}}{T_W^j}\right) \frac{e^{NT \sum_{i=1}^{N} \frac{1}{\tau_i}}}{f_c f_d T_W^N}$ |

<sup>a</sup> Original formula in paper was for N = 4 latches. Result can be extended for N latches.
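To give a feel for how far apart the simpler entries of Table I can be, the following minimal sketch evaluates the unified-model forms of [9]/[12] and of [8]/[11] for one parameter set. All numerical values are hypothetical and chosen only for illustration; they are not taken from any of the surveyed works.

```python
import math

def mtbf_basic(n, t_clk, tau, t_w, f_c, f_d):
    """Unified-model form of [9] and [12]: exp(N*T/tau) / (T_W * f_c * f_d)."""
    return math.exp(n * t_clk / tau) / (t_w * f_c * f_d)

def mtbf_delay_corrected(n, t_clk, t_pd, tau, t_w, f_c, f_d):
    """Unified-model form of [8] and [11]: exp(N*(T - t_pd)/tau) / (T_W * f_c * f_d)."""
    return math.exp(n * (t_clk - t_pd) / tau) / (t_w * f_c * f_d)

# Hypothetical parameter values (assumptions for illustration only).
f_c = 1e9          # clock frequency, Hz
t_clk = 1.0 / f_c  # clock period T, s
f_d = 1e8          # data transition rate, Hz
tau = 20e-12       # resolution time-constant, s
t_w = 10e-12       # aperture width T_W, s
t_pd = 50e-12      # flip-flop propagation delay, s
n = 2              # number of clock cycles allotted for resolution

basic = mtbf_basic(n, t_clk, tau, t_w, f_c, f_d)
corrected = mtbf_delay_corrected(n, t_clk, t_pd, tau, t_w, f_c, f_d)

# The two models differ by the factor exp(N * t_pd / tau).
ratio = basic / corrected
print(f"MTBF [9]/[12]: {basic:.3e} s")
print(f"MTBF [8]/[11]: {corrected:.3e} s")
print(f"ratio:         {ratio:.3e}")
```

Because the correction sits in the exponent, even a propagation delay of a few time-constants changes the estimate by orders of magnitude, which is why the choice among these formulae matters.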

### III. MODEL

We start by analyzing a master-slave flip-flop and then extend the results to a chain of an arbitrary number of flip-flops in the next sub-section.



Figure 2. Master-slave circuit

#### A. Master-Slave Analysis

The circuit shown in Figure 2 is used throughout this paper. The master and slave regenerating inverter pairs are within the dashed lines. The master latch is transparent when the clock (C) is low and captures the data (D) when C goes high. The slave latch is transparent when C is high so the captured D appears at $Q_S$ a clock-to-Q delay later ($t_{pd}$). When C falls, the state of the master is captured by the slave. If, however, D changes during a window of vulnerability near the rising edge of the clock C, $Q_S$ may fail to be a valid voltage at the next rising edge of C. This presents a metastability hazard and a possible system failure. Failure may occur when $Q_S$ is not a valid voltage (in the excluded range in Figure 3, $V_{Q_S} \in (V_{IL}, V_{IH})$). If $Q_S$ is delivered to multiple flip-flops, some may register a high and others a low logic level. Although all of these flip-flops may each have valid outputs, a system failure may occur because an illegal system state may exist if not all versions of $Q_S$ are the same.

Figure 3 shows a simulation of a master-slave synchronizer flip-flop exhibiting metastability. In this simulation, D changed close to C causing metastability at  $Q_M$ .  $Q_M$  is changing near the falling edge of C causing metastability at  $Q_S$ .



Figure 3. Simulation of metastable nodes in a master-slave synchronizer



Figure 4. Timing diagram of a master-slave synchronizer

Figure 4 shows timing diagrams of the outputs of the master and the slave during metastability. The timing diagram shows only the resolution of the outputs, but is useful as an introduction to the theory developed in this section. This theory disregards second order effects such as latch propagation delays, realistic rise and fall times, inter-stage delays, nonlinear effects, setup-time delays and the effects of noise. These realities are addressed in sub-section D where it is shown that these simplifications incur no loss in generality. In the top case in Figure 4, for a data-clock offset  $\delta$  in the red vulnerability window for D, the output  $Q_M$  resolves at a time near or past the falling edge of C. Specifically,  $Q_M$  resolves high for  $\delta > \delta_M$  and low for  $\delta < \delta_M$  as shown by the arrows. In the bottom case, the narrower window of vulnerability causes the output  $Q_S$  to resolve near or past the next rising edge of C. As above,  $Q_S$  resolves high for  $\delta > \delta_S$  and low for  $\delta < \delta_S$ . In this case, when  $Q_{\rm S}$  is still metastable at the next rising edge of the clock, a synchronizing error for the complete flip-flop may occur. The precise data-clock offsets,  $\delta_M$  and  $\delta_S$ , are the theoretical values that would produce indefinite metastability in the master and the slave, respectively, and their values are not necessarily the same.

There are two significant observations associated with Figure 3 and Figure 4. One is that while the clock is high, the resolving behavior at  $Q_M$  is a function of  $\tau_M$ , the master resolution time-constant, and while the clock is low,  $Q_S$  is a function of  $\tau_S$ , the slave resolution time-constant. The second observation is that if  $Q_M$  is changing within the vulnerability window for the slave latch as the clock goes low, metastable behavior at  $Q_S$  will ensue.

Three voltage constants and two voltage functions are defined in the analysis of the master-slave chain:

| Symbol | Definition |
|--------|------------|
| $V_{Q_M}(t,\delta)$ | Voltage at $Q_M$, a function of time $t$ and offset $\delta$ |
| $V_{Q_S}(t,\delta)$ | Voltage at $Q_S$, a function of time $t$ and offset $\delta$ |
| $V_m(M)$ | Metastable voltage at $Q_M$, generated by time-offset $\delta_M$: $V_m(M) = \lim_{t \to \infty} V_{Q_M}(t, \delta_M)$ |
| $V_m(S)$ | Metastable voltage at $Q_S$, generated by time-offset $\delta_S$: $V_m(S) = \lim_{t \to \infty} V_{Q_S}(t, \delta_S)$ |
| $V_v(M)$ | Vulnerability voltage at $Q_M$, causes slave metastability |

Due to noise, perfectly constant metastable voltages $V_m(M)$ and $V_m(S)$ are not physically achievable, but the idea does define the line of separation, or separatrix, between the high-resolving and low-resolving outputs of a latch. As shown in Figure 4, the vulnerability window around $\delta_M$ is wider than that around $\delta_S$ and this wider window contains the narrower window. Also, $\delta_M$ is always within the wider window and $\delta_S$ is always within the narrower window.

With these definitions, let the origin of time (t = 0) be at the first rising clock edge, T be the clock period and  $\alpha$  be the fraction of T for which the clock is high. After the normal propagation time  $t_{pd}$ , before  $\alpha T$  and near metastability, the master output at  $Q_M$  is linear and for small variations away from  $V_m(M)$  the behavior of  $V_{Q_M}(t, \delta)$  is given, for  $t > t_{pd}$ , by the equation:

$$V_{Q_M}(t,\delta) - V_m(M) = G_{tv} \exp\left(\frac{t}{\tau_M}\right) (\delta - \delta_M) \tag{1}$$

Here, $\delta$ is the data-clock offset in time and $\delta_M$ is the particular offset that produces an indefinitely long period of metastability of the master, meaning $V_{Q_M} = V_m(M)$. Near metastability we assume linearity, which means that all voltage and current values are continuous and that the relevant circuit parameters, such as $\tau_S$ and $\tau_M$, are constant. Therefore, the circuit can be modeled by a set of linear ordinary differential equations. There must be at least one positive root of the associated characteristic equation if there is to be regeneration and the resulting growing exponential behavior. We assume that the solution associated with the largest positive root characterizes the eventual circuit behavior, and we neglect solutions associated with other roots. The coefficient $G_{tv}$ of this exponential solution is the time-to-voltage gain through the circuit from the data-clock offset $\delta$ to the node $Q_M$ and has units of V/s. The value of $G_{tv}$ depends on the origin of time and we define it at the midpoint of the rising clock edge, for convenience. This convention implies that (1) is invalid for $t < t_{pd}$.

Near the falling clock edge and for a data-clock offset $\delta_S$ at the input to the master, there will be a critical voltage $V_{Q_M}(\alpha T, \delta_S) = V_v(M)$ at the input to the slave that causes marginal triggering of the slave. This vulnerability voltage, $V_v(M)$, becomes significant some time before the falling clock edge at $\alpha T$, causing the output of the slave, after $t_{pd}$, to reside at $V_m(S)$ indefinitely. Thus, $V_m(S)$ is the slave separatrix between high and low resolving traces. Assume the setup time is negligible so that an expression similar to (1) for $V_{Q_S}(t, \delta)$ for $t \in (\alpha T + t_{pd}, T)$ can be written:

$$V_{Q_S}(t,\delta) - V_m(S) = G_{vv} \exp\left(\frac{t-\alpha T}{\tau_S}\right) \left(V_{Q_M}(\alpha T,\delta) - V_v(M)\right) \tag{2}$$

Later, in Section D, we justify how the non-negligible setup time can be covered in this analysis. Linearity of the slave circuit near  $V_m(S)$  is used to establish the linearity of (2). The coefficient  $G_{vv}$  is a voltage-to-voltage gain between the slave input and the node  $Q_S$ . Combining (2) and (1) for  $t \in (\alpha T + t_{pd}, T)$  yields:

$$V_{Q_S}(t,\delta) - V_m(S) = G_{vv} \exp\left(\frac{t-\alpha T}{\tau_S}\right) \left[ V_m(M) + G_{tv} \exp\left(\frac{\alpha T}{\tau_M}\right) (\delta - \delta_M) - V_v(M) \right] \tag{3}$$

After  $t_{pd}$ , the data-clock offset  $\delta_S$  leads to indefinite metastability in the slave and a constant slave output  $V_m(S)$ . To make (3) independent of time during metastability, the value of  $\delta_S$  must be such that the bracketed expression in (3) vanishes:

$$V_m(M) + G_{tv} \exp\left(\frac{\alpha T}{\tau_M}\right) (\delta_S - \delta_M) - V_v(M) = 0 \tag{4}$$

Subtracting (4) from the bracketed expression in (3) and evaluating at t = T yields:

$$V_{Q_S}(T,\delta) - V_m(S) = G_{tv}G_{vv}\exp\left(\frac{\alpha T}{\tau_M} + \frac{(1-\alpha)T}{\tau_S}\right)(\delta - \delta_S) \tag{5}$$
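The cancellation that leads from (3) and (4) to (5) can be written out term by term. Since the left-hand side of (4) equals zero, subtracting it leaves the value of the bracket in (3) unchanged, while the constants $V_m(M)$ and $V_v(M)$ and the offset $\delta_M$ cancel:

$$\begin{aligned}
&\left[ V_m(M) + G_{tv} e^{\frac{\alpha T}{\tau_M}} (\delta - \delta_M) - V_v(M) \right] - \left[ V_m(M) + G_{tv} e^{\frac{\alpha T}{\tau_M}} (\delta_S - \delta_M) - V_v(M) \right] \\
&\qquad = G_{tv} e^{\frac{\alpha T}{\tau_M}} (\delta - \delta_S)
\end{aligned}$$

Substituting this result into (3) and setting $t = T$, so that $(t - \alpha T)/\tau_S = (1-\alpha)T/\tau_S$, gives the sum of exponents that appears in (5).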

From (5), we define  $\delta_+$  as the clock-data separation that yields the voltage  $V_{Q_S}(T, \delta_+) = V_{IH}$  at time t = T. Likewise, define  $\delta_-$  so that  $V_{Q_S}(T, \delta_-) = V_{IL}$ . It is then possible to calculate the vulnerability window within which a data-clock offset  $\delta$  will produce an invalid output,

$$\delta_{+} - \delta_{-} = \frac{\overbrace{V_{Q_S}(T,\delta_{+}) - V_{Q_S}(T,\delta_{-})}^{V_b}}{G_{tv}G_{vv}} \exp\left(-\frac{\alpha T}{\tau_M} - \frac{(1-\alpha)T}{\tau_S}\right) \tag{6}$$

The coefficient  $V_b$  defines the voltage difference between borderline valid voltages at the output of the second latch. Only between these voltages will the slave cause marginal triggering of any following flip-flops. Note that  $V_b/G_{vv}$  is the voltage window of vulnerability at the input to the slave.

For a uniform distribution of data-clock offsets  $\delta$  over the clock period *T*, the probability of failure is bounded by:

$$\Pr(fail) \le \frac{\delta_+ - \delta_-}{T} \tag{7}$$

All data-clock offsets inside the metastability window ($\delta_+ - \delta_-$) will generate traces with voltages within an output window whose size is $V_b$ at t = T and hence are prone to produce metastability in following stages. Since the details of the next stages may be unknown, not all traces in this window will actually produce metastability in a following stage. Hence, the inequality represents an upper bound on the failure probability. (For now, we assume the availability of a full clock period of resolution time. Logic delays, multiple destinations or long wires may interfere with that assumption and such circumstances are addressed in section D.)
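As a rough numerical illustration of (6) and (7), the sketch below evaluates the width of the vulnerability window and the resulting bound on the failure probability. The function names and every parameter value are assumptions made for this example only; in practice the gains and time-constants would come from simulation of the actual cell.

```python
import math

def vulnerability_window(v_b, g_tv, g_vv, alpha, t_clk, tau_m, tau_s):
    """Width (delta_plus - delta_minus) of the data-clock offset window, per (6)."""
    exponent = alpha * t_clk / tau_m + (1.0 - alpha) * t_clk / tau_s
    return v_b / (g_tv * g_vv) * math.exp(-exponent)

def failure_probability_bound(window, t_clk):
    """Upper bound on Pr(fail) for offsets uniform over one clock period, per (7)."""
    return window / t_clk

# Hypothetical values (assumptions for illustration only).
v_b = 0.5        # borderline voltage range V_IH - V_IL, V
g_tv = 1e10      # time-to-voltage gain G_tv, V/s
g_vv = 1.0       # voltage-to-voltage gain G_vv, dimensionless
alpha = 0.5      # clock duty cycle
t_clk = 1e-9     # clock period T, s
tau_m = 20e-12   # master resolution time-constant, s
tau_s = 25e-12   # slave resolution time-constant, s

window = vulnerability_window(v_b, g_tv, g_vv, alpha, t_clk, tau_m, tau_s)
p_fail = failure_probability_bound(window, t_clk)
print(f"window width    = {window:.3e} s")
print(f"Pr(fail) bound  = {p_fail:.3e}")
```

The window shrinks exponentially with the time spent resolving in each latch, so the master and slave contributions add in the exponent rather than multiply.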

From (7), with a data transition rate  $f_D$ , the *MTBF* is:

$$\text{MTBF} = \frac{1}{\Pr(fail) f_D} \ge \frac{T}{(\delta_+ - \delta_-)f_D} = \frac{G_{tv}G_{vv}T}{V_b f_D} \exp\left(\frac{\alpha T}{\tau_M} + \frac{(1-\alpha)T}{\tau_S}\right) \tag{8}$$

To make (8) resemble the familiar formula for *MTBF* of a single latch, we define an effective resolution time-constant:

$$\tau_{eff} = \left(\frac{\alpha}{\tau_M} + \frac{(1-\alpha)}{\tau_S}\right)^{-1} \tag{9}$$

The lower bound on the *MTBF* of a master-slave flip-flop (8) then becomes:

$$MTBF \ge \frac{G_{tv}G_{vv}T}{V_b f_D} \exp\left(\frac{T}{\tau_{eff}}\right) \tag{10}$$
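The dependence of (9) and (10) on the clock duty cycle can be explored with a short sketch such as the one below. The function names and all numerical values are hypothetical, chosen so that the master is faster than the slave; they are not measurements of any real flip-flop.

```python
import math

def tau_eff(alpha, tau_m, tau_s):
    """Effective resolution time-constant of a master-slave flip-flop, per (9)."""
    return 1.0 / (alpha / tau_m + (1.0 - alpha) / tau_s)

def mtbf_master_slave(g_tv, g_vv, v_b, t_clk, f_d, alpha, tau_m, tau_s):
    """Lower bound on MTBF of a single master-slave flip-flop, per (10)."""
    return g_tv * g_vv * t_clk / (v_b * f_d) * math.exp(t_clk / tau_eff(alpha, tau_m, tau_s))

# Hypothetical values (assumptions for illustration only).
g_tv, g_vv, v_b = 1e10, 1.0, 0.5   # V/s, dimensionless, V
t_clk, f_d = 1e-9, 1e8             # s, Hz
tau_m, tau_s = 20e-12, 25e-12      # s; master assumed faster than slave

results = {}
for alpha in (0.3, 0.5, 0.7):
    te = tau_eff(alpha, tau_m, tau_s)
    results[alpha] = mtbf_master_slave(g_tv, g_vv, v_b, t_clk, f_d, alpha, tau_m, tau_s)
    print(f"alpha = {alpha:.1f}: tau_eff = {te * 1e12:.2f} ps, MTBF >= {results[alpha]:.3e} s")
```

With these assumed values, a larger duty cycle gives the faster master latch more of the period, which lowers $\tau_{eff}$ and raises the bound. If the slave were the faster latch, the trend would reverse.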

#### B. N Concatenated Flip-flops

Eq. (10) provides the lower bound on the *MTBF* of a single master-slave flip-flop. To extend this result to a chain of *N* flip-flops, the process described in (1) to (8) for a master-slave can be repeated multiple times. Each flip-flop after the first one aggregates an additional factor  $G(i) = G_{vv}^M(i)G_{vv}^S(i)$  and an additional term in the exponent. The general equation for the *MTBF* for *N* flip-flops becomes:

$$MTBF(N) = \frac{T \cdot \prod_{i=1}^{N} G(i)}{V_b(N) f_D} \exp\left(\sum_{i=1}^{N} \frac{T}{\tau_{eff}(i)}\right) \tag{11}$$

where $G(1) = G_{tv}G_{vv}$ and $G(i) = G_{vv}^{M}(i)G_{vv}^{S}(i)$ for i > 1; $\tau_{eff}(i)$ is the effective resolution time-constant for the *i*<sup>th</sup> flip-flop and $V_b(N)$ is the borderline voltage range for the last flip-flop. Define $G_{tv}^*(N) = \prod_{i=1}^N G(i)$ as the overall time-to-voltage gain from the D input of the first flip-flop to the Q output of the $N^{th}$ and last flip-flop. When all flip-flops are identical, G(i) = G(2) for i > 1 and $G_{tv}^*(N)$ is given by:

$$G_{tv}^*(N) = G_{tv}G_{vv}(G_{vv}^M G_{vv}^S)^{N-1} = G(1)(G(2))^{N-1} \tag{12}$$

We also define an overall effective resolution time-constant  $\tau_N$  by:

$$\frac{1}{\tau_N} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\tau_{eff}(i)} \tag{13}$$

The combination of (12) and (13) with (11) gives a familiar bound on the *MTBF* of an *N*-flip-flop chain

$$MTBF(N) \ge \frac{G_{tv}^{*}(N)T}{V_{b}(N)f_{D}} \exp\left(\frac{NT}{\tau_{N}}\right) \tag{14}$$

Often, $G_{tv}^*$ and $V_b$ are lumped together in a single constant $T_W(N) = V_b(N)/G_{tv}^*(N)$ that has dimensions of time. Using this simplification, and the clock frequency $f_C = 1/T$, we obtain:

$$MTBF(N) \ge \frac{1}{T_W(N)f_D f_C} \exp\left(\frac{NT}{\tau_N}\right) \tag{15}$$
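A minimal sketch of (13) and (15) for stages that are not identical is given below. It assumes the per-stage values $\tau_{eff}(i)$ and the lumped constant $T_W(N)$ are already known; the function names and numbers are hypothetical, and $T_W(N)$ is held fixed across the two cases purely to isolate the effect of the time-constants.

```python
import math

def tau_overall(tau_effs):
    """Overall effective time-constant tau_N from (13): N divided by the sum of 1/tau_eff(i)."""
    return len(tau_effs) / sum(1.0 / t for t in tau_effs)

def mtbf_multistage(t_w_n, f_d, f_c, tau_effs):
    """Lower bound (15) for a chain whose stages have the given tau_eff(i) values."""
    n = len(tau_effs)
    t_clk = 1.0 / f_c
    return math.exp(n * t_clk / tau_overall(tau_effs)) / (t_w_n * f_d * f_c)

# Hypothetical values (assumptions for illustration only).
f_c, f_d = 1e9, 1e8                    # Hz
t_w_n = 5e-12                          # lumped T_W(N), s
matched = [22e-12, 22e-12, 22e-12]     # three identical stages
one_slow = [22e-12, 22e-12, 33e-12]    # one stage degraded by variability

for name, taus in (("matched", matched), ("one slow stage", one_slow)):
    print(f"{name}: tau_N = {tau_overall(taus) * 1e12:.2f} ps, "
          f"MTBF >= {mtbf_multistage(t_w_n, f_d, f_c, taus):.3e} s")
```

Because (13) averages the reciprocals of the per-stage time-constants, $N T/\tau_N$ is simply the sum of the per-stage exponents in (11), and a single slow stage reduces the bound by the exponential of its own shortfall.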

#### C. Multistage Synchronizer with Identical Stages

If all *N* flip-flops of a synchronizer standard cell have identical characteristics, (15) can be evaluated in a straightforward manner. In this case, from (9) and (13) it can be shown that  $\tau_N = \tau_{eff} = \left(\frac{\alpha}{\tau_M} + \frac{(1-\alpha)}{\tau_S}\right)^{-1}$ . Both  $\tau_M$  and  $\tau_S$  can be found using simulation methods such as in [3][14].

The value of  $T_W(N)$  can be calculated from  $G_{tv}^*(N)$  and  $V_b(N)$  or simulated directly using

$$T_W(N) = \frac{V_b(N)}{G_{tv}^*(N)} = (\delta_+ - \delta_-) \exp\left(\frac{NT}{\tau_{eff}}\right) \tag{16}$$

Here,  $\delta_+$  and  $\delta_-$  are those values of data-clock offset that just reach  $V_{IH}$  and  $V_{IL}$ , respectively. For identical stages,  $V_b$  is independent of the value of N and we may combine (12) and (16) to obtain a recurrence relation for  $T_W(N)$ :

$$T_W(N) = \frac{V_b}{G(1)G(2)^{N-1}} = \frac{T_W(N-1)}{G(2)} \tag{17}$$

If the standard-cell vendor characterizes the synchronizer flip-flops and provides the parameters  $T_W(1)$ ,  $T_W(2)$ ,  $\tau_M$  and  $\tau_S$ , all the terms in (17) are then available to the SoC designer for the estimation of *MTBF*. The parameters *N*,  $\alpha$ ,  $f_D$  and  $f_C$  come from the application. The effective resolution time-constant  $\tau_{eff}$  can be calculated from (9) given  $\alpha$ ,  $\tau_M$  and  $\tau_S$ . Thus, (15) cleanly separates extrinsic and intrinsic parameters. This approach disentangles the design of the logic inside the synchronous clock domain from the design of the synchronizer.
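The division of labor described above can be sketched as follows: the four intrinsic parameters stand in for what a cell vendor might supply, and the extrinsic parameters stand in for the application. The helper `min_stages`, the target MTBF and all numerical values are assumptions introduced for this example, not part of the model itself.

```python
import math

def tau_eff(alpha, tau_m, tau_s):
    """Effective resolution time-constant, per (9)."""
    return 1.0 / (alpha / tau_m + (1.0 - alpha) / tau_s)

def t_w(n, t_w1, t_w2):
    """T_W(N) from the recurrence (17), using G(2) = T_W(1) / T_W(2)."""
    g2 = t_w1 / t_w2
    return t_w1 / g2 ** (n - 1)

def mtbf(n, alpha, f_c, f_d, t_w1, t_w2, tau_m, tau_s):
    """Lower bound (15) for N identical stages."""
    t_clk = 1.0 / f_c
    exponent = n * t_clk / tau_eff(alpha, tau_m, tau_s)
    return math.exp(exponent) / (t_w(n, t_w1, t_w2) * f_d * f_c)

def min_stages(target_mtbf, max_stages=10, **params):
    """Smallest N whose bound (15) meets the target, or None if none does (hypothetical helper)."""
    for n in range(1, max_stages + 1):
        if mtbf(n, **params) >= target_mtbf:
            return n
    return None

# Hypothetical intrinsic (vendor) and extrinsic (application) parameters.
params = dict(
    alpha=0.5, f_c=2e9, f_d=1e8,     # extrinsic: duty cycle, clock rate, data rate
    t_w1=10e-12, t_w2=5e-12,         # intrinsic: T_W(1), T_W(2), s
    tau_m=20e-12, tau_s=25e-12,      # intrinsic: master and slave time-constants, s
)
stages = min_stages(1e20, **params)  # target MTBF in seconds (assumed)
print(f"stages needed: {stages}")
for n in range(1, 4):
    print(f"N = {n}: T_W(N) = {t_w(n, 10e-12, 5e-12):.2e} s, MTBF >= {mtbf(n, **params):.3e} s")
```

Changing only the extrinsic entries of `params` re-evaluates the bound without any new circuit simulation, which is the practical benefit of the separation.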

#### D. Model Assumptions

As shown in Figure 4, when metastability spans multiple stages, each latch may be metastable for almost half of a clock period. During the first half of the period, the voltage at the master output grows with the resolution time-constant $\tau_M$ and during the last half period, the slave output grows with $\tau_S$. This exponential behavior is repeated for each succeeding pair of latches throughout a multistage synchronizer, but delayed by the partial period between clock edges as metastability flows from latch to latch. Circuit simulation can identify the parameters associated with the *i*<sup>th</sup> clock period so that G(i) and $\tau_{eff}(i)$ can be evaluated. From (6) the difference between the slave-output voltage that resolves high and that which resolves low is

$$V_{Q_S}(t,\delta_{+}(i)) - V_{Q_S}(t,\delta_{-}(i)) = G_{tv}^{*}(i) \left(\delta_{+}(i) - \delta_{-}(i)\right) \exp\left(\frac{\alpha T}{\tau_{M}(i)} + \frac{t - \alpha T}{\tau_{S}(i)}\right) \tag{18}$$

Here the resolution time *t* is the same for the traces resolving high as those resolving low for the clock-data offsets $\delta_+(i)$ and $\delta_-(i)$, respectively. If we sample the voltage at $t = T - \varepsilon$, instead of at t = T, the result is equivalent to multiplying (18) by a factor $\beta = \exp\left(\frac{-\varepsilon}{\tau_S(i)}\right)$ which can be incorporated into the multiplicative coefficient of (18).

During the normal propagation time following a clock edge, there will be substantial transients. In our analysis, however, we are interested in the synchronizer's behavior during metastability, behavior that can be adequately characterized by four intrinsic parameters:  $T_W(1)$ ,  $T_W(2)$ ,  $\tau_M$  and  $\tau_S$ . By determining these parameters through simulation, we include the effects of all nonlinear transients on the following metastable epoch, but do so only implicitly. However, these nonlinear transients are explicitly included in the simulations that yield the four intrinsic parameter estimates.

Simplifications about signal edges were made in the derivation of (15). For example, realistic clock edges will have non-zero rise and fall times. However, they can be modeled by a zero-rise or fall time edge that is slightly shifted in time. This observation introduces a small variation in timing of the various clock edges, but because of the argument associated with (18) this variation does not change the general character of the result. Similarly, the setup time, preceding the falling clock edge at  $\alpha T$  in (2), only changes the multiplicative coefficient. In both cases, the simulation discovers the modified coefficients  $T_W(1)$  and  $T_W(2)$  so that (15) gives a tight bound on *MTBF*.

It is important to note that most prior work described in section II considers logic gate delay to be time "lost" to synchronization and is deducted from the exponent in MTBF. In fact, these gates do contribute some gain and contribute to the overall gain-bandwidth product of the synchronizer. Neglecting these contributions causes these models to underestimate MTBF significantly.

There may be multiple exponentially decaying solutions to the linear differential equations modeling the metastable behavior of the master-slave synchronizer. These transients are not modeled in the above equations, but their effects can be largely removed from simulation by techniques for handling common-mode effects [9]. Since the metastable voltage is reached after those transient effects have ended, the clock period should be constrained to be greater than some minimum value in order to provide sufficient time for the metastable condition to develop.

A last note regarding the model assumptions concerns the last-stage flip-flop in Figure 1, the one contained within the block labelled "Synchronous clock domain." This flip-flop serves two purposes: 1) a known electrical load for the $N^{th}$ stage flip-flop and 2) confinement of any logic delay within the synchronous clock domain. The known electrical load is needed in Section C above in order to obtain a uniform characterization of all N stages. Confinement of any delay associated with interstage logic is needed to avoid any compromise of the available resolution time of the N<sup>th</sup> stage flip-flop. This flip-flop could be viewed as stage N + 1 of the synchronizer, but since the inter-stage logic delay is unknown to the synchronizer designer, its contribution to the available resolution time NT must be assumed negligible. We therefore prefer to place it outside the synchronizer boundary, cleanly separating extrinsic and intrinsic parameters.

#### IV. Effective $\tau$

We start this section with an analysis of the effect of duty cycle variations on the effective resolution time-constant for a master-slave synchronizer  $\tau_{eff}$  (9) and for a multistage synchronizer  $\tau_N$  (13). Variations in process parameters also give rise to deviations in these resolution time-constants. Finally, in this section, we determine a useful measure of the risk of synchronizer failure in the presence of these random parameter variations.

### A. Dependence on Clock Duty Cycle

Figure 5 and Figure 6 show results for $\tau_{eff}$ as a function of duty cycle (9). When the resolution time-constants of the master and the slave latch are equal ($\tau_M = \tau_S = \tau$), then $\tau_{eff} = \tau$. When there is a large mismatch between $\tau_M$ and $\tau_S$, the behavior of $\tau_{eff}$ is highly dependent on the duty cycle of the circuit. For duty cycles $\alpha$ ranging from 0 to 1, $\tau_{eff}$ changes non-linearly from $\tau_S$ to $\tau_M$. Different values of $\tau_S$ and $\tau_M$ may arise from intra-die process variations [14]-[18]. A significant mismatch between $\tau_S$ and $\tau_M$ can lead to considerable variation in $\tau_{eff}$ with respect to duty cycle. However, for small differences the change of $\tau_{eff}$ with respect to duty cycle is nearly linear.
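The dependence on duty cycle described above can be reproduced numerically. The following is a minimal Python sketch, assuming the form of (9) that is restated in (32), $\tau_{eff} = [\alpha/\tau_M + (1-\alpha)/\tau_S]^{-1}$. The function name `tau_eff` and the example values of 20 ps and 100 ps are illustrative assumptions, not values taken from the simulations.

```python
def tau_eff(alpha, tau_m, tau_s):
    """Effective resolution time-constant of one master-slave stage.

    Weighted harmonic mean of tau_m and tau_s, with the duty cycle
    alpha as the weight of the master latch (form assumed from (9)).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("duty cycle must lie in [0, 1]")
    return 1.0 / (alpha / tau_m + (1.0 - alpha) / tau_s)


if __name__ == "__main__":
    # Illustrative values in picoseconds (assumed, not measured).
    tau_m, tau_s = 20.0, 100.0
    for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
        value = tau_eff(alpha, tau_m, tau_s)
        print(f"alpha={alpha:.2f}  tau_eff={value:.2f} ps")
```

At the end points the sketch returns $\tau_S$ (for $\alpha = 0$) and $\tau_M$ (for $\alpha = 1$), and for matched latches it returns $\tau$ for every duty cycle.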



Figure 5.  $\tau_{eff}$  vs. duty cycle



Following (13), $\tau_N$ is the harmonic mean of $\tau_{eff}(i)$. It can be observed (Appendix A) that the function $\tau_N(\alpha)$ is also monotonic for any values of $\tau_{eff}(i)$. A known property of the harmonic mean is that it never exceeds the arithmetic mean and is often close to the minimum value; hence $\tau_N$ will not exceed the average $\tau_{eff}(i)$ of the constituent flip-flop stages. In Figure 7a, $\tau_{N=2}(\alpha)$ for two concatenated flip-flops (FF1: $\tau_M = 20$ psec, $\tau_S = 100$ psec; FF2: $\tau_M = 100$ psec, $\tau_S = 20$ psec) is shown. $\tau_{eff}(1)$ and $\tau_{eff}(2)$ are also shown to demonstrate the symmetric behavior of the resolution time-constants of the constituents. Even though $\tau_{eff}(i)$ of the two flip-flops vary with $\alpha$, the resulting $\tau_2$ is constant for every duty cycle, and is lower than the average $(\tau_{eff}(1) + \tau_{eff}(2))/2$. Figure 7b also shows a two flip-flop synchronizer (FF1: $\tau_M = 85$ psec, $\tau_S = 112$ psec; FF2: $\tau_M = 105$ psec, $\tau_S = 65$ psec), but without the symmetry. The resulting $\tau_{N=2}(\alpha)$ is monotonic and lower than the average, as expected. Figure 7c shows $\tau_{N=5}(\alpha)$ for a pipeline of five flip-flops, where the resolution time-constants, $\tau_M(i), \tau_S(i), i \in \{1, ..., 5\}$, for each flip-flop were drawn from a random sample with distribution $\mathcal{N}(\mu = 100, \sigma = 20)$. Note how the resulting time-constant is monotonic and does not demonstrate any local optimum along the duty cycle axis, in spite of the large number of flip-flops. Below we show that different copies of the same circuit may demonstrate either increasing or decreasing resolution time-constant as a function of the duty cycle. Without any further knowledge, it appears that 50% duty cycle would be a good engineering choice; when it is known whether $\tau_S$ is larger (or smaller) than $\tau_M$, other values of duty cycle may be preferred.

Figure 7. Effective $\tau$ for different multistage synchronizers. (a) two flip-flop synchronizer with constant $\tau_2$ (b) two flip-flop synchronizer with non-constant $\tau_2$ (c) five flip-flop synchronizer.
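The two-stage examples of Figure 7a and 7b can be checked with a short calculation. The sketch below is a hedged illustration in Python, assuming that $\tau_N$ is the plain harmonic mean of the per-stage $\tau_{eff}(i)$ as stated for (13) and that all stages see the same duty cycle. The helper names `tau_eff` and `tau_n` are hypothetical.

```python
def tau_eff(alpha, tau_m, tau_s):
    """Per-stage effective time-constant (form assumed from (9))."""
    return 1.0 / (alpha / tau_m + (1.0 - alpha) / tau_s)


def tau_n(alpha, stages):
    """Harmonic mean of tau_eff over a list of (tau_m, tau_s) stages."""
    inverses = [1.0 / tau_eff(alpha, tau_m, tau_s) for tau_m, tau_s in stages]
    return len(inverses) / sum(inverses)


# Stage values in picoseconds, as quoted for Figure 7a and Figure 7b.
SYMMETRIC = [(20.0, 100.0), (100.0, 20.0)]
ASYMMETRIC = [(85.0, 112.0), (105.0, 65.0)]

if __name__ == "__main__":
    for step in range(0, 11, 2):
        alpha = step / 10.0
        print(f"alpha={alpha:.1f}  "
              f"7a: {tau_n(alpha, SYMMETRIC):.2f} ps  "
              f"7b: {tau_n(alpha, ASYMMETRIC):.2f} ps")
```

For the symmetric pair the $\alpha$-dependent terms of the two stages cancel, so `tau_n` does not change with duty cycle; for the asymmetric pair it varies monotonically.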

### B. Variability in Resolution Time-Constants

In this sub-section, we analyze the impact of random process variations on $\tau_{eff}$ and $\tau_N$. Modern *VLSI* technologies exhibit process-parameter variations that can be large in submicron technologies [17]. In-die variability, the variability between adjacent transistors in the same die, may exceed 50% in 40 nm technologies and below [18].

When a multistage synchronizer is designed, the common practice is to reproduce the same flip-flop several times to create a homogeneous pipeline synchronizer. Even when these stages are designed identically, there will exist a mismatch in their resolution time-constants after fabrication (and possibly also because of voltage and temperature variations). This mismatch affects  $\tau_N$  and hence the *MTBF*. Noise is manifested in clock trees as jitter in the timing of clock edges, producing random fluctuations in the duty cycle. We start our analysis by assuming normal distributions of the form  $\mathcal{N}(\mu_{\alpha}, \sigma_{\alpha}^2)$ ,  $\mathcal{N}(\mu_M, \sigma_M^2)$  and  $\mathcal{N}(\mu_S, \sigma_S^2)$  for  $\alpha(i)$ ,  $\tau_M(i)$  and  $\tau_S(i)$ , respectively, in stage *i* of the synchronizer. We further assume that these random variables are mutually independent and are identically distributed.

In the general case, the probability distribution of $\tau_{eff}$, calculated from the weighted harmonic mean of $\tau_M$ and $\tau_S$, presents a strongly asymmetric and bimodal character as a consequence of the reciprocal transformation. The result is a distribution with Cauchy-like tails [19], for which none of the integer moments exists. However, as shown in Appendix A, it turns out [19] that if the mean and standard deviation of the denominator variables ($\tau_M$ and $\tau_S$) are such that the probability of a zero or negative denominator is negligible, the distribution of the ratio may be approximated reasonably well by a normal distribution with mean and variance given by:

$$E(\tau_{eff}(i)) = \left[\frac{\mu_{\alpha}}{\mu_{M}} + \frac{1 - \mu_{\alpha}}{\mu_{S}}\right]^{-1}$$
(19)

$$\operatorname{var}(\tau_{eff}) \cong \left[\frac{\mu_{\alpha}}{\mu_{M}} + \frac{1 - \mu_{\alpha}}{\mu_{S}}\right]^{-4} \left[ \left(\frac{1}{\mu_{M}} - \frac{1}{\mu_{S}}\right)^{2} \sigma_{\alpha}^{2} + \frac{\mu_{\alpha}^{2}}{\mu_{M}^{4}} \sigma_{M}^{2} + \frac{(1 - \mu_{\alpha})^{2}}{\mu_{S}^{4}} \sigma_{S}^{2} \right]$$
(20)

Assuming no clock jitter ( $\sigma_{\alpha}^2 = 0$ ), equal master and slave resolution time-constants ( $\mu_M = \mu_S$ ) and equal variances ( $\sigma_M^2 = \sigma_S^2 = \sigma^2$ ), the variance of  $\tau_{eff}$  in (20) becomes

$$var(\tau_{eff}) = \sigma_{\tau_{eff}}^2 \cong (1 - 2\mu_{\alpha} + 2\mu_{\alpha}^2)\sigma^2$$
(21)

This leads to a reduction in the variance by a factor depending on the duty cycle. As shown by (21) and Figure 9, a duty cycle of 50% ($\mu_{\alpha} = \frac{1}{2}$) cuts the variance in half.
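As a numerical cross-check, the general variance expression (20) can be evaluated directly and compared with the reduced form $(1 - 2\mu_{\alpha} + 2\mu_{\alpha}^2)\sigma^2$. The Python sketch below is illustrative only; the function name `var_tau_eff` and the numerical values are assumptions.

```python
def var_tau_eff(mu_alpha, mu_m, mu_s, sigma_alpha, sigma_m, sigma_s):
    """First-order variance of tau_eff, following the structure of (20)."""
    a = mu_alpha / mu_m + (1.0 - mu_alpha) / mu_s
    bracket = (
        (1.0 / mu_m - 1.0 / mu_s) ** 2 * sigma_alpha ** 2
        + mu_alpha ** 2 / mu_m ** 4 * sigma_m ** 2
        + (1.0 - mu_alpha) ** 2 / mu_s ** 4 * sigma_s ** 2
    )
    return bracket / a ** 4


if __name__ == "__main__":
    # Assumed example: matched latches, mu = 100 ps, sigma = 10 ps, no jitter.
    for mu_alpha in (0.1, 0.3, 0.5, 0.7, 0.9):
        ratio = var_tau_eff(mu_alpha, 100.0, 100.0, 0.0, 10.0, 10.0) / 10.0 ** 2
        print(f"mu_alpha={mu_alpha:.1f}  variance ratio={ratio:.3f}")
```

With matched means and no jitter, the printed ratio follows $1 - 2\mu_{\alpha} + 2\mu_{\alpha}^2$, which has its minimum of one half at a 50% duty cycle.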



Figure 9.  $\sigma^2_{\tau_{eff}}/\sigma^2$  vs. duty cycle.

In an approach similar to that used for (19) and (20), Appendix A shows that the mean and variance of $\tau_N$ are given by:

$$E(\tau_N) = \left[\frac{\mu_\alpha}{\mu_M} + \frac{1 - \mu_\alpha}{\mu_S}\right]^{-1}$$
(22)

$$var(\tau_N) \cong \frac{var(\tau_{eff})}{N}$$
 (23)

Because of the way $\tau_N$ is defined, its expected value is the same for a single master-slave synchronizer as for a chain of N such synchronizers. However, the variance of $\tau_N$ diminishes as the number of flip-flops in the synchronizer increases. For synchronizers with N stages, each with $\mu_M = \mu_S$, assuming equal variations on both master and slave ($\sigma_M^2 = \sigma_S^2 = \sigma^2$) and a fixed 50% duty cycle ($\mu_{\alpha} = 0.5$), the resulting standard deviation is $\sigma_{\tau_N} = \sigma/\sqrt{2N}$. This is an important result because it indicates that the variability of synchronizer chains, which may be needed in submicron technologies [12], diminishes with the number of flip-flops in the synchronizer and can be minimized for $\mu_M = \mu_S$ and a 50% duty cycle.
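The $\sigma/\sqrt{2N}$ result is a first-order approximation, so it can be useful to compare it with a direct Monte Carlo estimate. The following Python sketch is one hedged way to do so, assuming independent normal $\tau_M(i)$ and $\tau_S(i)$ with a common mean and standard deviation and a fixed 50% duty cycle. The function names, the sample size, the seed and the values $\mu = 100$, $\sigma = 5$ are all assumptions chosen for illustration.

```python
import math
import random


def sample_tau_n(rng, n_stages, mu, sigma, alpha=0.5):
    """Draw one synchronizer and return its tau_N (harmonic mean of tau_eff)."""
    inv_sum = 0.0
    for _ in range(n_stages):
        tau_m = rng.gauss(mu, sigma)
        tau_s = rng.gauss(mu, sigma)
        inv_sum += alpha / tau_m + (1.0 - alpha) / tau_s  # 1 / tau_eff(i)
    return n_stages / inv_sum


def monte_carlo_std(n_stages, mu=100.0, sigma=5.0, trials=20000, seed=1):
    """Sample standard deviation of tau_N over many random synchronizers."""
    rng = random.Random(seed)
    samples = [sample_tau_n(rng, n_stages, mu, sigma) for _ in range(trials)]
    mean = sum(samples) / trials
    return math.sqrt(sum((x - mean) ** 2 for x in samples) / trials)


def predicted_std(n_stages, sigma=5.0):
    """First-order prediction sigma / sqrt(2 N)."""
    return sigma / math.sqrt(2.0 * n_stages)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"N={n}  simulated={monte_carlo_std(n):.3f}  "
              f"predicted={predicted_std(n):.3f}")
```

The agreement is expected to degrade as $\sigma/\mu$ grows, because the higher-order terms dropped in the linearization are then no longer negligible.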

### C. Failure Estimates under Conditions of Parameter Variability

Failures of a multistage synchronizer can be modeled as a random point Poisson process with independent increments (Appendix B). This model is usually employed to obtain the estimates of mean time between failures (MTBF) and *failures in time* (FIT). The FIT is the inverse of MTBF, provided both measures are expressed in compatible units. A conversion factor may be required since MTBF is often expressed as the expected number of years before the first failure while FIT, to avoid a large number of zeroes, is often expressed in failures per billion device-hours of operation. The MTBF metric has the advantage that it directly indicates the failure-free lifetime of a component. On the other hand, the FIT of a multi-component system is merely the sum of the FIT values of the components.

Modern SoC designs have several features that lead to added complexity in the estimation of both MTBF and FIT.

- Increased variability of transistor parameters appears at each new semiconductor process node
- A variety of synchronizer designs may be required to meet all reliability goals and power budgets

- Thousands of synchronizers may be required in some multiprotocol designs
- Safety-critical, cyber-physical systems produced in huge volumes typically require extreme reliability
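The unit conversion between MTBF and FIT mentioned above, and the additivity of FIT across components, can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the year length of 365.25 days is an assumption.

```python
# Assumed year length (365.25 days) for the unit conversion.
HOURS_PER_YEAR = 365.25 * 24.0


def mtbf_years_to_fit(mtbf_years):
    """Convert an MTBF in years to failures per 1e9 device-hours."""
    return 1e9 / (mtbf_years * HOURS_PER_YEAR)


def fit_to_mtbf_years(fit):
    """Convert failures per 1e9 device-hours to an MTBF in years."""
    return 1e9 / (fit * HOURS_PER_YEAR)


def system_fit(component_fits):
    """FIT of a system is the sum of the FIT values of its components."""
    return sum(component_fits)


if __name__ == "__main__":
    # Hypothetical example: 1000 synchronizers, each with an MTBF of 1e6 years.
    per_sync = mtbf_years_to_fit(1e6)
    total = system_fit([per_sync] * 1000)
    print(f"per-synchronizer FIT = {per_sync:.4f}")
    print(f"system FIT = {total:.2f}, system MTBF = {fit_to_mtbf_years(total):.1f} years")
```

The example shows why FIT is convenient for multi-component systems: the per-component values simply add, whereas the system MTBF has to be recovered by inverting the sum.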

The calculation of MTBF for a single multistage synchronizer is given by (15). Clearly then

$$FIT(N) = \frac{1}{MTBF(N)} \le T_W(N) f_D f_C \exp\left(-\frac{NT}{\tau_N}\right)$$
(24)

The results of Appendix A provide the basis for a preliminary estimate of these failure measures taking process variability into account. For the optimum case, where  $\mu_M = \mu_S$  and duty cycle is 50%, we can set  $\tau_N = \mu_M + \eta(\sigma_M / \sqrt{2N})$  giving a conservative estimate of FIT. With  $\eta = 2$  and  $\eta = 3$ , evaluation of  $\tau_N$  can be used to predict synchronizer performance at the 95% and 99% levels of confidence respectively. The multiplicative parameter  $T_W(N)$  also depends on process variability, but its impact is small compared to that of  $\tau_N$ .
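A conservative FIT estimate of this kind can be sketched as follows. The Python fragment assumes the bound (24) with $T = 1/f_C$ and the padded time-constant $\tau_N = \mu_M + \eta\,\sigma_M/\sqrt{2N}$. All function names and numerical values are hypothetical and are chosen only to show the mechanics.

```python
import math


def conservative_tau_n(mu_m, sigma_m, n_stages, eta):
    """Pad the mean time-constant by eta standard deviations of tau_N."""
    return mu_m + eta * sigma_m / math.sqrt(2.0 * n_stages)


def fit_upper_bound(t_w, f_d, f_c, n_stages, tau_n):
    """Failure-rate bound of (24) in failures per second, with T = 1 / f_c."""
    period = 1.0 / f_c
    return t_w * f_d * f_c * math.exp(-n_stages * period / tau_n)


if __name__ == "__main__":
    # Hypothetical parameters (SI units), not taken from the paper's data.
    t_w, f_d, f_c, n_stages = 10e-12, 200e6, 1e9, 2
    mu_m, sigma_m = 20e-12, 2e-12
    for eta in (0, 2, 3):
        tau = conservative_tau_n(mu_m, sigma_m, n_stages, eta)
        rate = fit_upper_bound(t_w, f_d, f_c, n_stages, tau)
        print(f"eta={eta}  tau_N={tau * 1e12:.2f} ps  bound={rate:.3e} failures/s")
```

Because $\tau_N$ sits in the exponent, even a modest padding of the time-constant raises the failure-rate bound by a large factor, which is why the choice of $\eta$ matters.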

Appendix B considers the case of a SoC made in high volume for a safety-critical application. An individual SoC may include several different synchronizer cells each with a different distribution of failure rates. It is shown that an upper bound on the failure rate is obtained by summing all the individual failure-rate upper bounds. This result is independent of the variance of the individual distributions of failure rates.

To illustrate this approach, consider a production run of P units, each containing Q different styles of $N_i$-stage synchronizers with $S_i$ instances of each style. The overall FIT is then upper-bounded by

$$FIT = P \sum_{i=1}^{Q} S_i FIT(N_i)$$

$$\leq P \sum_{i=1}^{Q} S_i T_{Wi}(N_i) f_{Di} f_{Ci} \exp\left(-\frac{N_i T_i}{\tau_{N_i}}\right)$$
(25)

By taking the inverse of FIT the MTBF is obtained. The probability of successfully providing a failure-free lifetime L for all products in a production run is

$$P(success) \ge e^{-FIT \cdot L} \ge 1 - FIT \cdot L \tag{26}$$

Because *FIT* in (25) contains a sum of exponentials, the exponential term in (26) introduces a product of double exponentials. However, since we wish P(success) to be close to unity,  $FIT \cdot L$  must be small compared to unity. Consequently, the inequality at the right will generally be tight.
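The production-run calculation of (25) and (26) can be sketched in Python as shown below. This is a minimal illustration under stated assumptions: the per-style clock period is taken as $T_i = 1/f_{Ci}$, the failure rate is expressed in failures per second, and the `SyncStyle` record and both function names are hypothetical.

```python
import math
from typing import NamedTuple, Sequence


class SyncStyle(NamedTuple):
    """One synchronizer style within a chip (hypothetical record)."""
    instances: int   # S_i, instances of this style per unit
    t_w: float       # T_W(N_i) in seconds
    f_d: float       # data rate in Hz
    f_c: float       # clock rate in Hz, so T_i = 1 / f_c (assumed)
    n_stages: int    # N_i
    tau_n: float     # tau_N for this style in seconds


def production_failure_rate(units: int, styles: Sequence[SyncStyle]) -> float:
    """Upper bound of (25): P times the sum over styles of S_i * FIT(N_i)."""
    per_unit = sum(
        s.instances * s.t_w * s.f_d * s.f_c
        * math.exp(-s.n_stages / (s.f_c * s.tau_n))
        for s in styles
    )
    return units * per_unit


def success_lower_bounds(rate: float, lifetime: float):
    """Both bounds of (26): exp(-FIT * L) and the looser 1 - FIT * L."""
    x = rate * lifetime
    return math.exp(-x), 1.0 - x
```

For small values of $FIT \cdot L$ the two returned bounds are nearly identical, which is the regime of interest for safety-critical products.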

#### V. SYNCHRONIZER MTBF PERFORMANCE

In this section, we use the derived model to compare two similar synchronizer configurations based on their synchronization performance. The first configuration (Figure 10) is the one studied so far, relying on a series of concatenated flip-flops. The second configuration (Figure 11) is a two flip-flop synchronizer that allows for an effective resolution time equal to that of the concatenated flip-flops, but assumes that the asynchronous input changes slowly enough to be captured by a sampling rate $f_C/N$.



Figure 10. NT resolution time synchronizer composed of N+1 flip-flops, operating at f<sub>c</sub>



Figure 11. NT resolution-time synchronizer composed of two concatenated flip-flops and operating at f<sub>c</sub>/N

The MTBF in the former configuration (*conf.*1) can be bounded by using (15)

$$MTBF(N) \ge \frac{T}{T_W(N)f_D} \exp\left(\frac{NT}{\tau_N}\right)$$
 (27)

In the second configuration (*conf.*2), the MTBF is obtained using the synchronizer formula from (15) with N = 1

$$MTBF(1) \ge \frac{TN}{T_W(1)f_D} \exp\left(\frac{TN}{\tau_{eff}}\right)$$
(28)


Combining equations (27) and (28)

$$\frac{MTBF_{conf.1}}{MTBF_{conf.2}} \approx \frac{\frac{T}{T_W(N)f_D} \exp\left(\frac{NT}{\tau_N}\right)}{\frac{TN}{T_W(1)f_D} \exp\left(\frac{TN}{\tau_{eff}}\right)}$$
(29)

Assuming that all flip-flops are identical,  $\tau_i = \tau \quad \forall i = 1 \dots N$ , and  $\tau_N = \tau_{eff} = \tau$ , then (29) can be simplified to obtain:

$$\frac{MTBF_{conf.1}}{MTBF_{conf.2}} \approx \frac{T_W(1)}{T_W(N)} \frac{1}{N}$$
(30)

Usually $T_W(N)$ will be smaller than $T_W(1)$ since the metastability window decreases as the number of flip-flops in the synchronizer increases. If, as N is increased, $T_W(N)$ decreases faster than $1/N$ (that is, $T_W(N) < T_W(1)/N$), then *conf*.1 would perform better. On the other hand, if $T_W(N)$ decreases more slowly than $1/N$, *conf.*2 is more robust. Based on (17) the ratio in (30) becomes $\frac{T_W(1)}{T_W(N)N} = (G_{vv}^M G_{vv}^S)^{N-1} \frac{1}{N}$ and the discussion reduces to whether $G_{vv}^M G_{vv}^S > N^{1/(N-1)}$. For N > 1 the right-hand side of this inequality is never greater than 2. In most cases, the gain near the metastability point will be higher than that, leading to the likelihood of better performance of the concatenated flip-flops. In fact, the additional flip-flops add gain, and thus contribute to the overall gain-bandwidth product of the synchronizer.
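The break-even condition on the gain product can be tabulated with a short Python sketch. It assumes identical stages and the relation $T_W(1)/T_W(N) = (G_{vv}^M G_{vv}^S)^{N-1}$ quoted from (17); the function names and the example gain of 3 are hypothetical.

```python
def config_ratio(gain_product, n_stages):
    """MTBF(conf.1) / MTBF(conf.2) for identical stages, from (30) and (17)."""
    return gain_product ** (n_stages - 1) / n_stages


def break_even_gain(n_stages):
    """Gain product at which both configurations give the same MTBF bound."""
    if n_stages < 2:
        raise ValueError("comparison requires at least two stages")
    return n_stages ** (1.0 / (n_stages - 1))


if __name__ == "__main__":
    for n in range(2, 7):
        print(f"N={n}  break-even gain={break_even_gain(n):.3f}  "
              f"ratio at gain 3.0: {config_ratio(3.0, n):.2f}")
```

The break-even gain is largest at N = 2, where it equals 2, and falls toward 1 as N grows, so any per-stage gain product above 2 favors the concatenated configuration for every N > 1.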

#### VI. SIMULATIONS

In this section we present simulations of the model derived and discuss implications of the results.

Figure 12 shows MTBF using the formulae of previous publications referenced in Table I and the formula derived in this work (15). The calculations are compared with simulations performed using the method of [14]. MTBF is calculated for different clock periods for a four flip-flop synchronizer. All four stages were taken to be identical, with a 50% duty cycle and $f_D$ = 200 MHz. Simulation values, parameters for calculation and circuit netlists were obtained using a commercial 90nm process. The comparisons include formulae [5], [6] and [12]. Since those publications do not differentiate between $\tau_M$ and $\tau_S$, we provide two calculations, one for each case. Calculations using the published formulae, but with $\tau_{eff}$ as in (9), are also shown. The values of $T_W$ for a single flip-flop were used for all the referenced calculations. The results show a significant improvement in accuracy with our model, which gives the tightest lower bound on the MTBF. Formulae from [10] and [13] may provide accuracy similar to that of our model, but their formulations require knowledge of additional parameters and are less intuitive, so it is hard to make comparisons over a wide range of situations.

Figure 14 shows an example set of calculations and simulations for multiple flip-flop synchronizers and the match between simulation and the developed model. The calculated points (red circles) are all calculated using the intrinsic parameters ($T_W(1)$, $T_W(2)$, $\tau_M$ and $\tau_S$) obtained at T = 800 ps. The simulated points (black squares) show a departure from the expected straight line on the log plot for T < 800 ps. This is because, at these small clock periods and at this process corner, the minimum clock width requirements of the latches have been violated.



Figure 14. MTBF for different stage synchronizers; calculations vs simulations

#### CONCLUSIONS

We developed an expression to accurately estimate a lower bound on the MTBF of multistage synchronizers that enables calculation for an arbitrary number of stages. The formula is based on four extrinsic parameters, N, $\alpha$, $f_D$ and $f_C$, and four intrinsic parameters, $T_W(1)$, $T_W(2)$, $\tau_M$ and $\tau_S$. We introduced the concept of $\tau_{eff}$ and showed the influence of the duty cycle on the formula for the MTBF bound. Variations in transistor parameters and in clock waveforms resulting from processing differences, both inter-die and intra-die, lead to variations in MTBF estimates. Conditions that minimize these variations were derived, both for a single master-slave synchronizer and for multistage synchronizers. In fact, these variations in MTBF diminish with the number of flip-flops in a multistage synchronizer. For safety-critical applications, calculation of the probability of a failure-free lifetime for all products in a production run was shown to depend on the mean values of the various random variables and we derived a simple bound.

Figure 12. MTBF comparison for 4 flip-flop synchronizer

The formula for MTBF derived here was compared with previously published formulae. Some formulae compromise accuracy for ease of use, while others provide good estimates, but their computation requires many parameters. Our formula was demonstrated to be accurate, easy to use and intuitive. Unlike the other methods, ours provides a tight lower bound on the *MTBF*. For example, Figure 9 shows an expanded view of a typical result for a clock period of 1 ns. The simulation indicates an MTBF that is about a factor of two greater than the slightly more conservative bound calculated according to (15). The formulae from the literature give bounds that are two to five orders of magnitude more conservative.

#### ACKNOWLEDGMENT

The work of Salomon Beer was supported in part by HPI institute for scalable computing. The work of Jerome Cox, Tom Chaney and David Zar was supported in part by the National Science Foundation under Grant No. 0924010 and in part by the National Innovation Fund. Many years ago our late colleague, Charles E. Molnar, developed many of the basic notions used in this work.

#### REFERENCES

- [1] S. Beer, R. Ginosar, J. Cox, T. Chaney, D. Zar, "MTBF Bounds for Multistage Synchronizers," Asynchronous Circuits and Systems (ASYNC), 2013 IEEE 19th International Symposium on, pp. 158-165, 19-22 May 2013.
- [2] D. Kinniment, K. Heron and G. Russell, "Measuring Deep Metastability," ASYNC 2006.
- [3] C. Dike and E. Burton, "Miller and noise effects in synchronizing flipflop," JSSC, 34(6):849-855, 1999.
- [4] S. Beer, R. Ginosar, M. Priel, R.Dobkin, A. Kolodny, "An on-chip metastability measurement circuit to characterize synchronization behavior in 65nm", ISCAS 2011
- [5] S. Lubkin, Asynchronous Signals in Digital Computers, Discussion, Proc ACM 1952, pp 238-241; L. Kleeman and A. Cantoni, "Metastable behavior in Digital Systems", IEEE Design & Test of Computers, 4(6), 4-19, 1987.
- [6] T.J. Gabara, G.J. Cyr and C.E. Stroud, "Metastability of CMOS masterslave flip-flops", IEEE Transactions on Circuits and Systems II - Analog and Digital Signal Processing, 734-740, 1992.
- [7] C. Brown and K. Feher, "Measuring metastability and its effect on communication signal processing systems", IEEE Transactions on Instrumentation and Measurement, 46(1), 1997.
- [8] C. Myers, E. Mercer, and H. Jacobson, Verifying synchronization strategies, in Formal Methods for Globally Asynchronous Locally Synchronous (GALS) Architecture, 2003.
- [9] D. Kinniment, Synchronization and Arbitration in Digital Systems, Wiley 2007.
- [10] I.W. Jones, S. Yang and M. Greenstreet, "Synchronizer Behavior and Analysis," ASYNC 2009.
- [11] D. Chen, D. Singh et al., "A comprehensive approach to modelling, characterizing and optimizing for metastability in FPGAs," FPGA 2010.
- [12] S. Beer, R. Ginosar, et.al "The Devolution of synchronizers," ASYNC 2010.
- [13] Terrence Mak, Truncation Error Analysis of MTBF Computation for Multi-Latch Synchronisers, Microelectronics Journal, pp. 1-10, 2011.
- [14] S. Yang and M. Greenstreet, "Computing synchronizer failure probabilities," DATE 2007.
- [15] S. Beer, R. Ginosar, J. Cox, D. Zar, T. Chaney, "Metastability challenges for 65nm and beyond; Simulations and measurements", DATE 2013.

- [16] S. Nassif, K. Bernstein, D. Frank, A. Gattiker, W. Haensch, B. Ji, E. Nowak, D. Pearson and N.J Rohrer, "High Performance CMOS Variability in the 65nm Regime and Beyond," IEEE Int. Electron Devices Meeting, vol. 10, no. 12, pp. 569--571, Dec. 2007.
- [17] International Technology Roadmap for Semiconductors (ITRS), 2006 update.
- [18] J.Zhou, D.Kinniment, G.Russell, and A. Yakovlev, "Adapting synchronizers to the effects of on chip variability," Proc. IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC), 2008.
- [19] G. Marsaglia, "Ratios of Normal Variables" Journal of statistical software, vol. 16, no. 4, May, 2006.

#### APPENDIX A

In this appendix we derive several results used in Section IV. The slope, with respect to  $\alpha$ , of a master-slave synchronizer's effective resolution time-constant  $\tau_{eff}$  is easily obtained from (9),

$$\frac{d\tau_{eff}}{d\alpha} = \frac{d\left[\frac{\alpha}{\tau_M} + \frac{1-\alpha}{\tau_S}\right]^{-1}}{d\alpha} = \frac{\frac{1}{\tau_S} - \frac{1}{\tau_M}}{\left[\frac{\alpha}{\tau_M} + \frac{1-\alpha}{\tau_S}\right]^2}$$
(31)

Because of the square in the denominator of (31), the slope of $\tau_{eff}$ with respect to $\alpha$ never changes sign, so $\tau_{eff}$ is monotonic in $\alpha$. In particular, the slope is always positive for $\tau_M > \tau_S$, always negative for $\tau_S > \tau_M$ and vanishes for $\tau_M = \tau_S$. These results are based on the functional form of $\tau_{eff}$ and apply equally well to $E(\tau_{eff})$ below. Because $\tau_N$ is the harmonic mean of the $\tau_{eff}(i)$, similar observations can be made for $\tau_N$ and $E(\tau_N)$.
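The closed-form slope in (31) can be compared against a finite-difference estimate as a sanity check. The Python sketch below is illustrative; the function names, the step size and the example time-constants are assumptions.

```python
def tau_eff(alpha, tau_m, tau_s):
    """Effective time-constant of one master-slave stage (form of (9))."""
    return 1.0 / (alpha / tau_m + (1.0 - alpha) / tau_s)


def slope_closed_form(alpha, tau_m, tau_s):
    """d(tau_eff)/d(alpha) as written in (31)."""
    a = alpha / tau_m + (1.0 - alpha) / tau_s
    return (1.0 / tau_s - 1.0 / tau_m) / a ** 2


def slope_numeric(alpha, tau_m, tau_s, h=1e-6):
    """Central finite-difference estimate of the same derivative."""
    upper = tau_eff(alpha + h, tau_m, tau_s)
    lower = tau_eff(alpha - h, tau_m, tau_s)
    return (upper - lower) / (2.0 * h)


if __name__ == "__main__":
    # Assumed example values in picoseconds.
    for tau_m, tau_s in ((85.0, 112.0), (105.0, 65.0), (100.0, 100.0)):
        exact = slope_closed_form(0.4, tau_m, tau_s)
        approx = slope_numeric(0.4, tau_m, tau_s)
        print(f"tau_M={tau_m}, tau_S={tau_s}: closed={exact:.4f}, numeric={approx:.4f}")
```

The sign of the result depends only on whether $\tau_M$ exceeds $\tau_S$, which is the basis of the monotonicity argument.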

Next we study the effect of random variations in the circuit parameters on  $\tau_{eff}(\alpha, \tau_M, \tau_S)$ . Introducing incremental changes in the random variables  $\alpha$ ,  $\tau_M$  and  $\tau_S$  with respect to their mean values,  $\mu_{\alpha}$ ,  $\mu_M$  and  $\mu_S$  yields,

$$\tau_{eff}(\alpha, \tau_M, \tau_S) = \left[\frac{\alpha}{\tau_M} + \frac{1-\alpha}{\tau_S}\right]^{-1} = \left[\frac{\mu_\alpha + \Delta\alpha}{\mu_M + \Delta\tau_M} + \frac{1-\mu_\alpha - \Delta\alpha}{\mu_S + \Delta\tau_S}\right]^{-1}$$
(32)

These random variables are assumed to be drawn from the independent, normal distributions  $\mathcal{N}(\mu_{\alpha}, \sigma_{\alpha}^2)$ ,  $\mathcal{N}(\mu_M, \sigma_M^2)$  and  $\mathcal{N}(\mu_S, \sigma_S^2)$ , respectively. Also assume the standard deviations,  $\sigma_{\alpha}$ ,  $\sigma_M$  and  $\sigma_S$ , are small compared to their respective mean values so that  $\tau_{eff}(\alpha, \tau_M, \tau_S)$  can be expressed as a multivariate Taylor series in the incremental variables,  $\Delta \alpha$ ,  $\Delta \tau_M$  and  $\Delta \tau_S$ ,

$$\tau_{eff}(\alpha, \tau_M, \tau_S) = \mu_{\tau} - \left[\frac{\mu_{\alpha}}{\mu_M} + \frac{1 - \mu_{\alpha}}{\mu_S}\right]^{-2} \left[ \left(\frac{1}{\mu_M} - \frac{1}{\mu_S}\right) \Delta \alpha - \frac{\mu_{\alpha}}{\mu_M^2} \Delta \tau_M - \frac{1 - \mu_{\alpha}}{\mu_S^2} \Delta \tau_S + O((\Delta \alpha)^2, (\Delta \tau_M)^2, (\Delta \tau_S)^2) \right]$$
(33)

where $\mu_{\tau} = \tau_{eff}(\mu_{\alpha}, \mu_{M}, \mu_{S})$ is the value of (32) when all incremental variables vanish. In addition, the distributions are such that $\alpha \in [0,1]$ and $\tau_{M}, \tau_{S} > 0$. If the incremental variables in (33) satisfy these constraints, it is clear that

$$E(\tau_{eff}) \cong \mu_{\tau} = \left[\frac{\mu_{\alpha}}{\mu_{M}} + \frac{1 - \mu_{\alpha}}{\mu_{S}}\right]^{-1}$$
(34)

where the expectations of all the linear terms in the incremental variables vanish and the approximation reflects the fact that the expectations of terms beyond the linear ones are small and can be ignored. Also, by retaining the linear terms and dropping the higher order terms, the variance of  $\tau_{eff}$  becomes,

$$\sigma_{\tau}^{2} \triangleq var(\tau_{eff}) \cong \left[\frac{\mu_{\alpha}}{\mu_{M}} + \frac{1 - \mu_{\alpha}}{\mu_{S}}\right]^{-4} \left[ \left(\frac{1}{\mu_{M}} - \frac{1}{\mu_{S}}\right)^{2} \sigma_{\alpha}^{2} + \frac{\mu_{\alpha}^{2}}{\mu_{M}^{4}} \sigma_{M}^{2} + \frac{(1 - \mu_{\alpha})^{2}}{\mu_{S}^{4}} \sigma_{S}^{2} \right]$$
(35)

Since each of the component distributions is normal, the distribution of the resulting linear combination is also normal,  $\mathcal{N}(\mu_{\tau}, \sigma_{\tau}^2)$ . The variance in (35) can be reduced by making the coefficient of  $\sigma_{\alpha}^2$  vanish. This can be done by making  $\mu_M = \mu_S$ . In addition, the condition  $\sigma_M^2 = \sigma_S^2$  causes the variance  $\sigma_{\tau}^2$  to be simplified as follows,

$$\sigma_{\tau}^2 = (1 - 2\mu_{\alpha} + 2\mu_{\alpha}^2)\sigma_M^2 \tag{36}$$

For a 50% duty cycle ( $\mu_{\alpha} = 0.5$ ), the coefficient of  $\sigma_M^2$  is minimized and (36) becomes  $\sigma_{\tau}^2 = \frac{1}{2}\sigma_M^2$ . Thus, for minimum  $\sigma_{\tau}^2$ , the synchronizer designer should strive to make  $\mu_M = \mu_S$ ,  $\sigma_M^2 = \sigma_S^2$  and  $\mu_{\alpha} = \frac{1}{2}$ .

Next, we derive equations for the variability of a multistage synchronizer's resolution time-constant  $\tau_N$ . These equations are based on a normal distribution of the duty cycle  $\alpha$  and the resolution time-constants  $\tau_M$  and  $\tau_S$  in a single-stage master-slave synchronizer. We have shown above that the  $\tau_{eff}$  for a single master-slave stage has a normal distribution  $\mathcal{N}(\mu_{\tau}, \sigma_{\tau}^2)$ . As we assumed in connection with (33), the distribution mean  $(\mu_{\tau})$  is always positive and the probability of  $\tau_{eff}$  being zero or negative is negligible. In practical cases when  $\mu_{\tau} \gg \sigma_{\tau}$  these assumptions will hold, since the probability of any physical time constant being zero vanishes. We also assume that the  $\tau_{eff}$  of each stage is statistically independent of the  $\tau_{eff}$  of any other stage in the synchronizer.

Now assume all stages, $i \in \{1, \ldots, N\}$, have identical design, but the $\tau_{eff}(i)$ fluctuate with process and duty cycle variations. Using an approach similar to (32)–(35) we get approximate expressions for the resolution time-constant $\tau_N$ and its mean and variance,

$$\begin{aligned} \tau_{N} &= \left[\frac{1}{N}\sum_{i=1}^{N}\frac{1}{\tau_{eff}(i)}\right]^{-1} = \left[\frac{1}{N}\sum_{i=1}^{N}\frac{1}{\mu_{\tau} + \Delta\tau(i)}\right]^{-1} \\ &\cong \mu_{\tau_{N}} + N\left[\sum_{i=1}^{N}\frac{1}{\mu_{\tau}}\right]^{-2}\left[\sum_{i=1}^{N}\frac{\Delta\tau(i)}{\mu_{\tau}^{2}}\right] \\ &= \mu_{\tau_{N}} + \frac{1}{N}\sum_{i=1}^{N}\Delta\tau(i) \end{aligned}$$
(37)

$$E(\tau_N) \triangleq \mu_{\tau_N} \cong \mu_{\tau} = \left[\frac{\mu_{\alpha}}{\mu_M} + \frac{1 - \mu_{\alpha}}{\mu_S}\right]^{-1}$$
(38)

$$var(\tau_{N}) \triangleq \sigma_{\tau_{N}}^{2} \cong \frac{1}{N^{2}} \sum_{1}^{N} \sigma_{\tau}^{2} = \frac{1}{N} \sigma_{\tau}^{2}$$
$$= \frac{1}{N} \left[ \frac{\mu_{\alpha}}{\mu_{M}} + \frac{1 - \mu_{\alpha}}{\mu_{S}} \right]^{-4} \left[ \left( \frac{1}{\mu_{M}} - \frac{1}{\mu_{S}} \right)^{2} \sigma_{\alpha}^{2} + \frac{\mu_{\alpha}^{2}}{\mu_{M}^{4}} \sigma_{M}^{2} + \frac{(1 - \mu_{\alpha})^{2}}{\mu_{S}^{4}} \sigma_{S}^{2} \right]$$
(39)

For  $\mu_M = \mu_S$ ,  $\sigma_M^2 = \sigma_S^2$  and a 50% duty cycle, (38) and (39) become,

$$E(\tau_N) = \mu_{\tau_N} \cong \mu_M \tag{40}$$

$$var(\tau_N) = \sigma_{\tau_N}^2 \cong \frac{1}{2N} \sigma_M^2$$
 (41)

Finally, the coefficient  $T_W(N)$  in (15) is also subject to random variations in circuit parameters. We assume  $T_W(N)$  is a random variable drawn from a distribution  $\mathcal{N}(\mu_W, \sigma_W^2)$ . From (40) and (41) we know that  $\tau_N$  can be represented as a random variable drawn from the distribution  $\mathcal{N}(\mu_{\tau_N}, \sigma_{\tau_N}^2)$ . Now, define  $g(T_W(N), \tau_N) \triangleq T_W(N) \exp(-NT/\tau_N)$ , a function of these two random variables. Using this definition the inequality (15) can be rewritten as

$$\frac{FIT(N)}{f_D f_C} \le g(T_W(N), \tau_N) = T_W(N)e^{\left[-\frac{NT}{\tau_N}\right]}$$
(42)

Following the approach used in (33), (34) and (35) above and assuming that  $T_W(N) > 0$  we have

$$g(T_{W}(N), \tau_{N}) = g(\mu_{W}, \mu_{\tau_{N}}) + T_{W}(N)e^{\left[-\frac{NT}{\tau_{N}}\right]} \left[\frac{\Delta T_{W}}{T_{W}(N)} + \left(\frac{NT}{\tau_{N}}\right)\frac{\Delta \tau_{N}}{\tau_{N}} + O((\Delta T_{W})^{2}, (\Delta \tau_{N})^{2})\right]$$
(43)

Dropping the second order terms, the mean and variance can then be approximated,

$$E[g(T_W(N), \tau_N)] \cong g(\mu_W, \mu_{\tau_N}) = \mu_W e^{\left[-\frac{NT}{\mu_{\tau_N}}\right]}$$
(44)

$$\begin{aligned} \operatorname{var}[g(T_W(N), \tau_N)] &\cong T_W(N)^2 e^{\left[-\frac{2NT}{\tau_N}\right]} \left[\frac{\sigma_W^2}{T_W(N)^2} + \left(\frac{NT}{\tau_N}\right)^2 \frac{\sigma_{\tau_N}^2}{\tau_N^2}\right] \end{aligned}$$
(45)

Since it is desirable to make $NT/\tau_N$ large to achieve a small FIT or a large MTBF, the $\sigma_{\tau_N}^2$ term in (45) will tend to dominate the $\sigma_W^2$ term. In any case, the variance of the bound in (42) diminishes rapidly as $NT/\tau_N$ becomes large, allowing the variance to be ignored, and we can use the expected value (44) in calculations of failure risk.

#### APPENDIX B

A homogeneous Poisson process can often be used as a model for a sequence of synchronizer failures. If the probability of synchronizer failure in a single clock period corresponds to (7), a sequence of such clock periods becomes a Bernoulli trials process, a discrete-time process that, as the number of trials becomes large, approaches a continuous-time Poisson process. Alternatively, if the occurrence of synchronizer trials can be modeled as a Poisson process, a series of those trials that produce synchronizer failures is also a Poisson process. Let us define success as the failure-free operation of a synchronizer over a product lifetime of length L. Assuming a homogeneous Poisson process with a failure rate  $\lambda \triangleq FIT = 1/MTBF$ , the probability of success is bounded by

$$P(success|\lambda) \ge e^{-\lambda L} \tag{46}$$

We have  $\lambda, L > 0$  and desire  $\lambda L \ll 1$  so that the probability of success is close to unity. We also expect that the product is operating successfully at the beginning of its life so that  $P(success|\lambda) = 1$  for L = 0.
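The limit from Bernoulli trials to a Poisson process can be illustrated with a short calculation. This is a minimal sketch: the per-cycle failure probability, clock frequency, and lifetime are hypothetical values chosen for illustration, and the function names are not from this work.

```python
import math


def bernoulli_success(p_fail_per_cycle, cycles):
    """Probability of no failure in `cycles` independent clock periods."""
    # log1p keeps precision when the per-cycle probability is tiny.
    return math.exp(cycles * math.log1p(-p_fail_per_cycle))


def poisson_success(failure_rate, lifetime):
    """Probability of no failure for a homogeneous Poisson process."""
    return math.exp(-failure_rate * lifetime)


if __name__ == "__main__":
    clock_hz = 1e9                        # assumed 1 GHz clock
    p_fail = 1e-20                        # assumed failure probability per cycle
    lifetime_s = 10 * 365 * 24 * 3600     # assumed 10-year lifetime, in seconds
    rate = p_fail * clock_hz              # failures per second, lambda
    print(bernoulli_success(p_fail, clock_hz * lifetime_s))
    print(poisson_success(rate, lifetime_s))
```

For per-cycle probabilities this small the two results are indistinguishable in double precision, which is why the continuous-time model is convenient.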

Increasingly, as a result of semiconductor process variability,  $\lambda$  is randomly distributed across all the instances of synchronizers in a chip and across chips in a production run. Modeling the failures of each instance of a synchronizer by an independent Poisson process with failure rate  $\lambda_i$  leads to an aggregate Poisson process with failure rate

$$\lambda = \sum_{i=1}^{N} \lambda_i \tag{47}$$

In most cases N is the number of synchronizers in the product, but when a synchronizer failure may put human lives at risk, N should include all synchronizers in the entire production run.
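The aggregation in (47) can be sketched as follows. The per-synchronizer failure rates and the lifetime are hypothetical, and the helper names are introduced only for illustration. Because the processes are assumed independent, the probability that no synchronizer fails is the product of the individual survival probabilities, which equals the survival probability at the summed rate.

```python
import math


def aggregate_failure_rate(rates):
    """Sum of independent Poisson failure rates, as in (47)."""
    return math.fsum(rates)


def success_probability(rates, lifetime):
    """Probability that none of the synchronizers fails over `lifetime`."""
    return math.exp(-aggregate_failure_rate(rates) * lifetime)


if __name__ == "__main__":
    rates = [1e-12, 3e-12, 5e-13]         # assumed failures per second
    lifetime_s = 10 * 365 * 24 * 3600     # assumed 10-year lifetime, in seconds
    total = aggregate_failure_rate(rates)
    print("aggregate rate:", total, "per second")
    print("aggregate MTBF:", 1.0 / total, "seconds")
    print("P(success):", success_probability(rates, lifetime_s))
```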

The synchronizers in a chip may be of differing designs in order to satisfy different specifications. However, assuming their failure rates  $\lambda_i$  are drawn from normal distributions, the aggregate failure-rate distribution  $P_{\lambda}(\lambda)$  can, in turn, be modeled by a normal distribution,  $\mathcal{N}(\mu_{\lambda}, \sigma_{\lambda}^2)$  where  $\sigma_{\lambda} \ll \mu_{\lambda}$  so that  $P_{\lambda}(\lambda) \cong 0$  for  $\lambda \le 0$ . Under these conditions the probability of success (no failures over a product lifetime *L*) is

$$\begin{aligned}
P(success) &= \int_{0}^{\infty} P(success|\lambda) P_{\lambda}(\lambda)\, d\lambda \\
&\geq \int_{0}^{\infty} e^{-\lambda L} \frac{e^{-\frac{1}{2}\left(\frac{\lambda-\mu_{\lambda}}{\sigma_{\lambda}}\right)^{2}}}{\sqrt{2\pi}\sigma_{\lambda}}\, d\lambda \\
&= \frac{e^{-(\mu_{\lambda}L - \frac{1}{2}\sigma_{\lambda}^{2}L^{2})}}{\sqrt{2\pi}} \left[ \int_{-\infty}^{\infty} e^{-\frac{1}{2}y^{2}}\, dy - \int_{-\infty}^{-y_{0}} e^{-\frac{1}{2}y^{2}}\, dy \right] \\
&= e^{-(\mu_{\lambda}L - \frac{1}{2}\sigma_{\lambda}^{2}L^{2})} [1 - P(y \leq -y_{0})] \\
&\approx e^{-(\mu_{\lambda}L - \frac{1}{2}\sigma_{\lambda}^{2}L^{2})} > e^{-\mu_{\lambda}L}
\end{aligned} \tag{48}$$

where  $y = \frac{\lambda - \mu_{\lambda}}{\sigma_{\lambda}} + \sigma_{\lambda}L$  and  $y_0 = \frac{\mu_{\lambda}}{\sigma_{\lambda}} - \sigma_{\lambda}L$ . The approximation in the last line of (48) holds provided the normal cumulative distribution function,  $P(y \le -y_0) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{-y_0} e^{-\frac{1}{2}y^2} dy$ , is small compared to unity. Since  $P(y \le -2) = 0.023$ , the error is small whenever  $y_0 > 2$ . Furthermore, the second term in the exponent can be dropped to produce a slightly looser lower bound on the probability of success. Alternatively, we can say that  $\mu_{\lambda}$  gives an upper bound on the overall failure rate.
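The closed form in (48) can be compared against direct numerical integration. The sketch below is illustrative only: the values of $\mu_{\lambda}$, $\sigma_{\lambda}$ and $L$ are assumptions in arbitrary consistent units, the function names are hypothetical, and Simpson's rule is used simply because it needs no external library.

```python
import math


def closed_form(mu, sigma, lifetime):
    """Fourth line of (48): exp(-(mu*L - sigma^2*L^2/2)) * [1 - P(y <= -y0)]."""
    y0 = mu / sigma - sigma * lifetime
    tail = 0.5 * math.erfc(y0 / math.sqrt(2.0))      # P(y <= -y0)
    exponent = -(mu * lifetime - 0.5 * (sigma * lifetime) ** 2)
    return math.exp(exponent) * (1.0 - tail)


def numeric_integral(mu, sigma, lifetime, steps=20_000):
    """Simpson integration of exp(-lambda*L) * N(lambda; mu, sigma^2) over lambda >= 0."""
    def integrand(lam):
        z = (lam - mu) / sigma
        pdf = math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)
        return math.exp(-lam * lifetime) * pdf

    upper = mu + 12.0 * sigma        # the normal tail beyond this is negligible
    n = steps + (steps % 2)          # Simpson's rule needs an even step count
    h = upper / n
    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0


if __name__ == "__main__":
    mu, sigma, lifetime = 1e-3, 2e-4, 100.0   # assumed values, so y0 is about 5
    print("numeric integral :", numeric_integral(mu, sigma, lifetime))
    print("closed form      :", closed_form(mu, sigma, lifetime))
    print("simple lower bound:", math.exp(-mu * lifetime))
```

With these assumed values $y_0 \approx 5$, so the tail term is negligible and the simple bound $e^{-\mu_{\lambda}L}$ lies slightly below the integral. When $y_0$ is not large, the tail term can outweigh the $\frac{1}{2}\sigma_{\lambda}^{2}L^{2}$ correction, so the condition on $y_0$ should be checked before relying on the simple bound.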