31. CMOS Logic Implementations
[Figure: two versions of the same logic implementation with inputs A, B, C, output F and load CL, one labeled "Without sizing" and one "With sizing"; transistor width multipliers (1x to 4x) are annotated on the devices.]
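37. Fan-In Considerations
[Figure: four-input gate with series transistors A, B, C, D, internal node capacitances C1, C2, C3 and load CL, analyzed with the distributed RC model (Elmore delay).]
$t_{dHL} = 0.69\,R_{eqn}\,(C_1 + 2C_2 + 3C_3 + 4C_L)$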
Propagation delay deteriorates
rapidly as a function of fan-in –
quadratically in the worst case.
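The Elmore expression above generalizes naturally to a chain of N series transistors. The following is a minimal sketch under assumptions that are not in the slides: every transistor has the same equivalent resistance, and the function names and unit capacitance values are purely illustrative.

```python
def elmore_t_dhl(r_eqn, c_internal, c_load):
    """High-to-low delay of a series chain using the Elmore model.

    r_eqn      -- equivalent resistance of each series transistor (assumed equal)
    c_internal -- internal node capacitances [C1, C2, ..., C_{N-1}]
    c_load     -- load capacitance CL at the output
    Returns 0.69 * R * (1*C1 + 2*C2 + ... + (N-1)*C_{N-1} + N*CL).
    """
    n = len(c_internal) + 1  # fan-in N
    weighted = sum(k * c for k, c in enumerate(c_internal, start=1))
    return 0.69 * r_eqn * (weighted + n * c_load)


def equal_cap_delay(n, r_eqn=1.0, c=1.0):
    """Delay when all capacitances are equal (hypothetical unit values)."""
    return elmore_t_dhl(r_eqn, [c] * (n - 1), c)


if __name__ == "__main__":
    # With equal capacitances the weighted sum is N(N+1)/2,
    # which is the quadratic growth mentioned on the slide.
    for n in (2, 4, 8, 16):
        print(n, equal_cap_delay(n))
```

With four inputs and unit values this reduces to the slide's formula, 0.69 R (C1 + 2C2 + 3C3 + 4CL).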
38. tp as a Function of Fan-In
[Plot: tp (psec, 0 to 1250) versus fan-in (2 to 16); the tpHL curve grows quadratically.]
Gates with a fan-in greater than 4 should be avoided.
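56. Multistage logic networks
[Figure: five-stage path with inputs A and B, input capacitance 10, internal node capacitances w, x, y, z and a load of 20.]
Stage 1: g = 1, f = w/10
Stage 2: g = 4/3, f = x/w
Stage 3: g = 5/3, f = y/x
Stage 4: g = 1, f = z/y
Stage 5: g = 5/3, f = 20/z
• Path Parasitic Delay: $P = \sum p_i$
• Path Delay: $D = P + \sum g_i f_i$
• How do we minimize D? How do we select the sizing?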
57. Path effort is an indirect measure of the path delay
• Path Electrical Effort: $F = C_{out}/C_{in} = \prod f_i$
• Path Logical Effort: $G = \prod g_i$
• Path Effort: $H = GF$
• The above does not include any consideration of the effect of
fanout within the path
• H counts only the fanout of the output
• We need to express the branching behavior along the path
59. Path Effort
$H = GFB = \prod g_i f_i$
Path Delay
$D = P + \sum g_i f_i$
Minimized when each stage bears the same effort:
$g_i f_i = \hat{h} = H^{1/N}$
$D = P + N\hat{h} = P + N H^{1/N}$
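The minimum-delay result can be checked numerically. The sketch below uses the logical efforts listed for the five-stage path on the multistage-network slide (g = 1, 4/3, 5/3, 1, 5/3, with F = 20/10 and no branching). The parasitic delays are hypothetical placeholders, since the slides do not give them, and the function name is illustrative.

```python
import math


def min_path_delay(g, f_path, b_path, p):
    """Return (H, h_hat, D) for a path of N = len(g) stages.

    g      -- logical effort of each stage
    f_path -- path electrical effort F = Cout / Cin
    b_path -- path branching effort B (1 when there is no branching)
    p      -- parasitic delay of each stage (hypothetical values here)
    """
    n = len(g)
    big_g = math.prod(g)             # path logical effort G
    big_h = big_g * f_path * b_path  # path effort H = G F B
    h_hat = big_h ** (1.0 / n)       # equal stage effort at the optimum
    delay = sum(p) + n * h_hat       # D = P + N * H^(1/N)
    return big_h, h_hat, delay


if __name__ == "__main__":
    g = [1, 4 / 3, 5 / 3, 1, 5 / 3]
    p = [1, 2, 2, 1, 2]              # assumed parasitics, for illustration only
    H, h_hat, D = min_path_delay(g, 20 / 10, 1.0, p)
    print(f"H = {H:.3f}, h_hat = {h_hat:.3f}, D = {D:.3f}")
```

The stage effort comes out at roughly 1.49, independent of the assumed parasitics, which only shift D by a constant.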
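61. Example
[Figure: five-stage path with inputs A and B, input capacitance 10, internal node capacitances w, x, y, z and a load of 20.]
Stage 1: g = 1, f = w/10
Stage 2: g = 4/3, f = x/w
Stage 3: g = 5/3, f = y/x
Stage 4: g = 1, f = z/y
Stage 5: g = 5/3, f = 20/z
$G = \prod g_i = 1 \cdot \frac{4}{3} \cdot \frac{5}{3} \cdot 1 \cdot \frac{5}{3} = \frac{100}{27}$
$F = \frac{C_{out}}{C_{in}} = \frac{20}{10} = 2$
$B = 1$
$H = GFB = \frac{100}{27} \cdot 2 \cdot 1 = \frac{200}{27}$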
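62. Example (continued)
[Figure: same five-stage path, input capacitance 10, internal node capacitances w, x, y, z, load 20.]
$\hat{h} = H^{1/N} = \left(\frac{200}{27}\right)^{1/5} = 1.49$
$\hat{h} = g_5 f_5 = \frac{5}{3} \cdot \frac{20}{z} = 1.49 \Rightarrow z = 22.3$
$\hat{h} = g_4 f_4 = 1 \cdot \frac{z}{y} = 1.49 \Rightarrow y = 15.0$
$\hat{h} = g_3 f_3 = \frac{5}{3} \cdot \frac{y}{x} = 1.49 \Rightarrow x = 16.8$
$\hat{h} = g_2 f_2 = \frac{4}{3} \cdot \frac{x}{w} = 1.49 \Rightarrow w = 15.0$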
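The sizing in this example works backward from the load: each stage's input capacitance is its logical effort times its output capacitance, divided by the common stage effort. A minimal sketch of that procedure follows, assuming no branching (B = 1); the function name is illustrative and not from the slides.

```python
def size_path(g, c_in, c_load):
    """Size a chain of gates for minimum delay, assuming no branching.

    g      -- logical effort of each stage, first stage first
    c_in   -- fixed input capacitance of the first stage
    c_load -- load capacitance driven by the last stage
    Returns (h_hat, caps), where caps holds the input capacitance of
    stages 2..N (w, x, y, z in the slide's example).
    """
    n = len(g)
    path_g = 1.0
    for gi in g:
        path_g *= gi
    path_h = path_g * (c_load / c_in)   # H = G * F, with B = 1
    h_hat = path_h ** (1.0 / n)

    caps = []
    c_out = c_load
    # Walk from the last stage back to stage 2; stage 1's input is fixed.
    for gi in reversed(g[1:]):
        c_stage_in = gi * c_out / h_hat  # from h_hat = g * (c_out / c_in)
        caps.append(c_stage_in)
        c_out = c_stage_in
    caps.reverse()
    return h_hat, caps


if __name__ == "__main__":
    h_hat, (w, x, y, z) = size_path([1, 4 / 3, 5 / 3, 1, 5 / 3], 10, 20)
    print(f"h_hat={h_hat:.2f} w={w:.1f} x={x:.1f} y={y:.1f} z={z:.1f}")
```

The results agree with the slide's values to within rounding; the slide rounds the stage effort to 1.49 before each division, so the later values differ slightly in the first decimal place.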
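63. Example
[Figure: two-stage path with branching; inputs A and B, branch gates of size z, capacitance labels 10.]
$G = \prod g_i = 1 \cdot \frac{4}{3} = \frac{4}{3}$
$F = \frac{C_{out}}{C_{in}} = \frac{10}{10} = 1$
$B = \frac{C_{total}}{C_{on\,path}} = \frac{2z}{z} = 2$
$H = GFB = \frac{4}{3} \cdot 1 \cdot 2 = \frac{8}{3}$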
80. Euler path method
Convert schematic into a graph
E.g. F = AB + C + DE
[Figure: transistor schematic and the corresponding graph, with edges labeled A, B, C, D, E and node y.]
94. Layout strategies
• Ordering of inputs
[Figure: two-input NAND gate with inputs A and B.]
• If A arrives before B, it will charge up the
capacitance to VDD. When B arrives, it
needs to discharge the capacitor.
• If B arrives before A, the capacitor will be
discharged before A arrives.
• Try to put the early-arriving signal close
to ground; this can speed up a NAND by up to 20%.
102. Where Does Power Go in CMOS?
• Dynamic Power Consumption
Charging and Discharging Capacitors
• Short Circuit Currents
Short Circuit Path between Supply Rails during Switching
• Leakage
Leaking diodes and transistors
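106. Power dissipation
Dynamic power dissipation
$P = \frac{1}{T}\int_0^{T/2} i_n(t)\,V_{out}\,dt + \frac{1}{T}\int_{T/2}^{T} i_p(t)\,(V_{DD} - V_{out})\,dt$
$\;\; = \frac{C_L}{T}\int_0^{V_{DD}} V_{out}\,dV_{out} + \frac{C_L}{T}\int_0^{V_{DD}} (V_{DD} - V_{out})\,dV_{out}$
$\;\; = \frac{C_L V_{DD}^2}{2T} + \frac{C_L V_{DD}^2}{2T} = \frac{C_L V_{DD}^2}{T} = C_L V_{DD}^2 f$
(each integral uses the magnitude of the capacitor current, $C_L\,|dV_{out}/dt|$)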
118. Technology Scaling Models
• Full Scaling (Constant Electrical Field)
ideal model: dimensions and voltage scale
together by the same factor S
• Fixed Voltage Scaling
most common model until recently:
only dimensions scale, voltages remain constant
• General Scaling
most realistic for today's situation:
voltages and dimensions scale with different factors
138. Power dissipation
$P = \frac{1}{T}\int_0^{T/2} i_n(t)\,V_{out}\,dt + \frac{1}{T}\int_{T/2}^{T} i_p(t)\,(V_{DD} - V_{out})\,dt$
$\;\; = \frac{C_L}{T}\int_0^{V_{DD}} V_{out}\,dV_{out} + \frac{C_L}{T}\int_0^{V_{DD}} (V_{DD} - V_{out})\,dV_{out} = C_L V_{DD}^2 f$
$P = C_L V_{DD}^2\,\alpha_{0\to1}\,f$
$\alpha_{0\to1}$ is the activity factor, i.e. the probability
that a clock event results in a 0→1
transition
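As a quick numeric illustration of P = α(0→1) · CL · VDD² · f, here is a minimal sketch. The component values are hypothetical and chosen only to show the arithmetic; the function name is not from the slides.

```python
def dynamic_power(c_load, v_dd, f_clk, alpha=1.0):
    """Average dynamic power in watts.

    c_load -- switched load capacitance CL in farads
    v_dd   -- supply voltage in volts
    f_clk  -- clock frequency in hertz
    alpha  -- activity factor: probability that a clock event
              causes a 0->1 transition at the output
    """
    return alpha * c_load * v_dd ** 2 * f_clk


if __name__ == "__main__":
    # Hypothetical node: 10 fF load, 1.2 V supply, 1 GHz clock.
    every_cycle = dynamic_power(10e-15, 1.2, 1e9)            # alpha = 1
    typical = dynamic_power(10e-15, 1.2, 1e9, alpha=0.1)     # assumed activity
    print(f"alpha=1.0: {every_cycle * 1e6:.2f} uW")
    print(f"alpha=0.1: {typical * 1e6:.2f} uW")
```

Because power depends on the square of VDD but only linearly on the other terms, halving the supply in this model cuts dynamic power by a factor of four.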
162. The MxN Array Multiplier: Critical Path
[Figure: array of half adders (HA) and full adders (FA), with Critical Path 1, Critical Path 2 and the segment common to Critical Paths 1 & 2 marked.]
194. Dynamic CMOS
nMOS logic structure with precharged pullup
[Figure: N logic block with INPUTS, clocked by CLK.]
• Precharge to VDD when clock is low
• Evaluate when clock is high
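200. Issues in Dynamic Design 1: Charge Leakage
[Figure: dynamic gate with precharge device Mp, evaluate device Me, input A, clock Clk, load CL and output Out, with the leakage sources marked; waveforms of CLK and VOut over the Precharge and Evaluate phases.]
Dominant component is subthreshold current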
201. Solution to Charge Leakage
[Figure: dynamic gate with Mp, Me, inputs A and B, clock Clk, load CL and output Out, plus a keeper transistor Mkp.]
• Same approach as level restorer for pass-transistor logic
• Increase size of inverter to increase capacitance
202. Issues in Dynamic Design 2: Charge Sharing
[Figure: dynamic gate with Mp, Me, inputs A and B = 0, clock Clk, output Out with capacitance CL, and internal node capacitances CA and CB.]
Charge stored originally on CL is redistributed (shared)
over CL and CA, leading to reduced robustness.
203. Dynamic CMOS: Charge Sharing
[Figure: dynamic gate clocked by CLK with output capacitance Co and internal node capacitances Ci.]
• Assume that the internal capacitances have been discharged
• In the precharge phase, the output capacitance gets charged
• During evaluation, if all the inputs are high except the bottom
one, the charge on the output capacitance gets distributed to the internal
capacitances
• The output voltage will drop to $V_{DD}\,\frac{C_o}{C_o + 2C_i}$
• This could be low enough to trigger the inverter, causing a
wrong value on the output
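The charge-sharing drop, VDD · Co / (Co + 2Ci), is easy to evaluate for candidate capacitance ratios. The sketch below is illustrative only: the capacitance values and the inverter switching threshold of VDD/2 are assumptions, not figures from the slides, and the parameter for the number of internal nodes generalizes the slide's factor of 2.

```python
def shared_output_voltage(v_dd, c_out, c_int, n_internal=2):
    """Output voltage after charge sharing.

    The precharged output capacitance c_out shares its charge with
    n_internal initially discharged internal capacitances c_int
    (the slide's case is n_internal = 2).
    """
    return v_dd * c_out / (c_out + n_internal * c_int)


def may_flip_inverter(v_dd, c_out, c_int, v_switch=None):
    """True if the shared voltage falls below the inverter threshold.

    v_switch defaults to v_dd / 2, an assumed switching threshold.
    """
    if v_switch is None:
        v_switch = v_dd / 2
    return shared_output_voltage(v_dd, c_out, c_int) < v_switch


if __name__ == "__main__":
    # Hypothetical values: 2.5 V supply, 50 fF output, 10 fF per internal node.
    print(shared_output_voltage(2.5, 50e-15, 10e-15))
    print(may_flip_inverter(2.5, 50e-15, 10e-15))
    # A much smaller output capacitance makes the drop far worse.
    print(may_flip_inverter(2.5, 10e-15, 10e-15))
```

Under these assumptions, the gate is safe only while the output capacitance dominates the sum of the internal capacitances.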
205. Dynamic CMOS: Cascade problem
[Figure: two cascaded N logic dynamic stages with INPUTS and a common CLK.]
Since the evaluation of the first stage takes some time, the
second stage will start evaluating with the precharged input
rather than the evaluated input.
207. Domino Logic
Solves cascade problem
[Figure: two cascaded N logic dynamic stages with INPUTS and a common CLK.]
Since the precharged output from the first stage is 0, it will
never activate the pulldown network in the second stage until
the first stage evaluation has completed.
208. NP Domino (Zipper) CMOS
[Figure: N logic stage with INPUTS and CLK, followed by a P logic stage with its own CLK input.]
Since the second stage is built from p-logic, the precharged
output from the first stage will not activate the inputs of the
second stage.
210. Latch versus Register
Latch: stores data when clock is low
Register: stores data when clock rises
[Figure: latch and register symbols with inputs D and Clk and output Q, each with timing waveforms of Clk, D and Q.]
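The difference between the two storage elements shows up clearly in a cycle-by-cycle behavioral model. The sketch below assumes the latch is transparent while the clock is high (the slide only states that it stores while the clock is low) and that both elements start at 0; the function name and the sample waveforms are hypothetical.

```python
def simulate(clk, d):
    """Return a list of (q_latch, q_register) samples.

    clk, d -- equal-length sequences of 0/1 samples.
    Latch:    follows D while clk is high, holds while clk is low
              (transparency on the high phase is an assumption).
    Register: captures D only on a rising clock edge.
    """
    q_latch = 0
    q_reg = 0
    prev_clk = 0
    trace = []
    for c, x in zip(clk, d):
        if c == 1:
            q_latch = x              # transparent phase
        if prev_clk == 0 and c == 1:
            q_reg = x                # rising edge
        prev_clk = c
        trace.append((q_latch, q_reg))
    return trace


if __name__ == "__main__":
    clk = [0, 1, 1, 0, 0, 1, 1, 0]
    d = [1, 1, 0, 0, 1, 1, 0, 1]
    for t, (ql, qr) in enumerate(simulate(clk, d)):
        print(t, clk[t], d[t], ql, qr)
```

In this trace the latch output changes when D changes during the high phase, while the register output keeps the value sampled at the edge.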
218. Writing into a Static Latch
Use the clock as a decoupling signal
that distinguishes between the transparent and opaque states.
[Figure: two latch implementations with D, Q and CLK: "Converting into a MUX" and "Forcing the state" (can implement as NMOS-only).]
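230. 3-Transistor DRAM Cell
[Figure: 3T cell with transistors M1, M2, M3, storage node X with capacitance CS, write word line WWL, read word line RWL and bit lines BL1 and BL2; waveforms of WWL, RWL, X, BL1 and BL2 with levels VDD and VDD - VT and a swing ΔV.]
No constraints on device ratios
Reads are non-destructive
Value stored at node X when writing a “1” = VWWL - VTn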
232. 1-Transistor DRAM Cell
[Figure: 1T cell with access transistor M1, word line WL, bit line BL, storage capacitance CS and bit-line capacitance CBL; "Write 1" and "Read 1" waveforms of WL, X (between GND and VDD - VT) and BL (levels VDD/2 and VDD, swing ΔV during sensing).]
Write: CS is charged or discharged by asserting WL and BL.
Read: Charge redistribution takes place between bit line and storage capacitance.
$\Delta V = V_{BL} - V_{PRE} = (V_{BIT} - V_{PRE})\,\frac{C_S}{C_S + C_{BL}}$
Voltage swing is small; typically around 250 mV.
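The read swing follows directly from charge redistribution, ΔV = (VBIT - VPRE) · CS / (CS + CBL). A minimal sketch of that calculation follows; all component values are hypothetical and serve only to show how strongly the capacitance ratio attenuates the signal.

```python
def bitline_swing(v_bit, v_pre, c_s, c_bl):
    """Bit-line voltage change when a 1T cell is read.

    v_bit -- voltage stored on the cell capacitor before the read
    v_pre -- bit-line precharge voltage
    c_s   -- storage capacitance CS
    c_bl  -- bit-line capacitance CBL
    """
    return (v_bit - v_pre) * c_s / (c_s + c_bl)


if __name__ == "__main__":
    # Assumed values: VDD = 2.5 V, VT = 0.5 V, bit line precharged to VDD/2,
    # CS = 50 fF, CBL = 1 pF.
    v_dd, v_t = 2.5, 0.5
    read_one = bitline_swing(v_dd - v_t, v_dd / 2, 50e-15, 1e-12)
    read_zero = bitline_swing(0.0, v_dd / 2, 50e-15, 1e-12)
    print(f"reading a 1: {read_one * 1e3:+.1f} mV")
    print(f"reading a 0: {read_zero * 1e3:+.1f} mV")
```

The sign of the swing encodes the stored value, and its small magnitude relative to the supply is the reason each bit line needs a sense amplifier.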
233. DRAM Cell Observations
• 1T DRAM requires a sense amplifier for each bit line, due
to charge redistribution read-out.
• DRAM memory cells are single ended in contrast to
SRAM cells.
• The read-out of the 1T DRAM cell is destructive; read
and refresh operations are necessary for correct
operation.
• Unlike 3T cell, 1T cell requires presence of an extra
capacitance that must be explicitly included in the design.
• When writing a “1” into a DRAM cell, a threshold voltage
is lost. This charge loss can be circumvented by
bootstrapping the word lines to a higher value than VDD.
235. DRAM Cell
[Figure: cross-section (metal word line, poly, n+ diffusions, SiO2, field oxide, inversion layer induced by plate bias) and layout (capacitor, diffused bit line, polysilicon gate, polysilicon plate, M1 word line).]
Uses Polysilicon-Diffusion Capacitance
Expensive in Area
245. MOS NOR ROM Layout
Cell (9.5 x 7)
Programming using the Active Layer Only
[Figure: layout with bit lines BL0 to BL3, word lines WL0 to WL3 and GND rails. Legend: Polysilicon, Metal1, Diffusion, Metal1 on Diffusion.]
246. MOS NOR ROM Layout
Cell (11 x 7)
Programming using the Contact Layer Only
[Figure: layout. Legend: Polysilicon, Metal1, Diffusion, Metal1 on Diffusion.]
247. MOS NAND ROM
[Figure: NAND ROM array with pull-up devices to VDD, bit lines BL[0] to BL[3] and word lines WL[0] to WL[3].]
All word lines high by default with exception of selected row
248. MOS NAND ROM Layout
Cell (8 x 7)
Programming using the Metal-1 Layer Only
No contact to VDD or GND necessary;
drastically reduced cell size
Loss in performance compared to NOR ROM
[Figure: layout. Legend: Polysilicon, Diffusion, Metal1 on Diffusion.]
251. Decreasing Word Line Delay
(a) Driving the word line from both sides
(b) Using a metal bypass
(c) Use silicides
[Figure: (a) drivers at both ends of a polysilicon word line WL, alongside a metal word line; (b) metal bypass connected to the polysilicon word line every K cells.]
252. Precharged MOS NOR ROM
[Figure: NOR ROM array with precharge devices to VDD clocked by φpre, word lines WL[0] to WL[3], bit lines BL[0] to BL[3] and GND rails.]
PMOS precharge device can be made as large as necessary,
but clock driver becomes harder to design.
253. Non-Volatile Memories: The Floating-Gate Transistor (FAMOS)
[Figure: device cross-section (source and drain n+ regions in a p substrate, floating gate and gate separated by oxide layers of thickness tox) and schematic symbol with terminals D, G, S.]
264. Static CAM Memory Cell
[Figure: CAM cell with transistors M1 to M9, storage nodes S, internal node int, Bit lines, Word line and Match line; array of CAM cells sharing a wired-NOR match line.]
265. CAM in Cache Memory
[Figure: block diagram with CAM array, SRAM array, address decoder, hit logic, input drivers and sense amps / input drivers; signals Address, Tag, Hit, R/W, Data.]
276. Sense Amplifiers
$t_p = \frac{C \cdot \Delta V}{I_{av}}$
C is large and Iav is small, so make ΔV as small as possible.
Idea: Use Sense Amplifier
[Figure: small transition at the input, sense amplifier (s.a.), output.]
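The benefit of sensing a small swing follows from tp = C · ΔV / Iav: with C and Iav fixed by the array, delay scales linearly with the swing that must develop. The numbers in this minimal sketch are hypothetical and only illustrate the proportionality.

```python
def bitline_delay(c, delta_v, i_av):
    """Time for a current i_av to move a capacitance c by delta_v.

    c       -- bit-line capacitance in farads (large in a memory array)
    delta_v -- voltage swing that must develop, in volts
    i_av    -- average cell current in amperes (small in a memory array)
    """
    return c * delta_v / i_av


if __name__ == "__main__":
    c_bl, i_cell = 1e-12, 50e-6    # assumed: 1 pF bit line, 50 uA cell current
    full_swing = bitline_delay(c_bl, 2.5, i_cell)    # wait for a full 2.5 V
    small_swing = bitline_delay(c_bl, 0.25, i_cell)  # sense amp detects 250 mV
    print(f"full swing:  {full_swing * 1e9:.1f} ns")
    print(f"small swing: {small_swing * 1e9:.1f} ns")
    print(f"speed-up:    {full_swing / small_swing:.0f}x")
```

Under these assumptions, detecting one tenth of the swing shortens the bit-line delay by the same factor of ten, before adding the sense amplifier's own delay.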