2D Composition Engine

Agenda:
• Architecture Overview

BB_2DHWA Feature Summary
• Block Copy/Draw Operations
• Rotation (90/180/270 degrees) and Mirror/Flip operations
• Scaling (1/16x ~ 16x)
• Color Space and format Conversion
• Chroma Up/Down sampling
• ROP2/3 operations
• Alpha Blending/Compositing (Porter Duff Compositing)
• Destination Clipping
• Source Pattern Repeat

Image Attributes
• Source base address
• SrcWidth, SrcHeight
• (SrcXoffset, SrcYoffset)
• SurfWidth, SurfHeight
• Stride_Size
• Pattern: SrcPatWidth, SrcPatHeight
Data Types
• LUT/MONO-1/2/4/8
• YUV (420_2, 422, 444)
• RGB (aRGB16/24/32)
• Component Ordering
• Pre-multiplied
• Embedded Alpha
DMA Attributes
• Base Address
• Width/Height
• Stride
• Offsets
Operation Commands (SRC)
• CSC/CHRUS
• VC-remapping
• Color Expand
• Scaling
• Rotation
• Pattern Repeat
Operation Commands (DST)
• Blending/Compositing
• ROP2/3
• Clipping
• Color Fill
• CSC/CHRDS

BB_2DHWA Operation Overview
• SRC-1 image data → Color Space Conv → Color Expand → Scaling → Rotate → SRC-dst image
• SRC-2 image data → Color Space Conv → Color Expand → Scaling → Rotate → SRC-dst image
• Alpha (Remote/Mask)
• SRC-dst images and alpha → Blending (Compositing), ROP-2/3, ColorFill, Clipping → Color Space Conv → DST image data

BB_2DHWA SRC → DST types
Any source data type → non-sub-byte, non-LUT destination data types

Architecture Block Diagram
[Block diagram of bb_2dhwa. Labeled blocks:]
• bb_2dhwa_dp_cntl: L4 I/F (ocp2mmr), cfg
• bb_2dhwa_vpdma: ListMgr, L3 I/F, buf, BIMCDMA, ARB, pmem, SCR, vbusp_m, vbusp_s, R-clients (packed data, uv read, alpha), W-clients (packed data, uv)
• bb_2dhwa_dp_core
• bb_2dhwa_dp_src (shown twice): 420to422, YC_aligner, 422to444, VC1Range, CSC, Color Exp, V Scaler, H Scaler, ROT, with memories Lmem, cmem, SLmem, SAmem, rmem
• bb_2dhwa_dp_dst: ROP/Blend (src1, src2, alpha, dst), CSC, Color Red & Dithering, 444 to 422/420, smem, vbusp_s
• bb_2dhwa_clkc_int: INTC, CLK/RST, vbusp_s
• Legend: L3, L4, VPDMA, DP_SRC, DP_DST, DP_CORE

BB_2DHWA Architecture Block Diagram
Processing flow:
• VPDMA FW Initialization
• List Start
• Descriptor Download
• Descriptor Copied
• Client Configuration
• DMA Read Req
• Src Data Processing
• Dest Data Generation
• DMA Write Req
• (List) Cmd Done IRQ

BB_2DHWA External Interfaces
• Interconnects: MMR, HP, Interrupt, Clock/Reset, DFT, Memory BIST
• BB_2DHWA ports: _mmr_slv, _vpdma_mst, intr, l3_clk/clkdiv, l4_clk/clkdiv, rst_main_arst_n, dft, gpi, gpo

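Core Processing Unit (dp_src)
[Block diagram. Labeled elements:]
• Inputs: vpi_in (y, yuv, (a)rgb, bm) and vpi_in (u/v); data widths 8 and 32
• Blocks: 420to422, YC_aligner, 422 to 444, VC-1 rangemap, CSC, Color Exp, clut_loader (clut 32), V Scaler, H Scaler, Rotate Engine, cmem_mux, rmem_mux
• Memories: Lmem, Cmem, SLmem, SAmem, Rmem
• Other labels: uv_2x, argb 32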

Core Processing Unit (dp_dst)
[Block diagram. Labeled elements:]
• Inputs: argb 32, argb 32, alpha-1/8/32 (data widths 32, 32, 32)
• Blocks: dp_dst_src_gen, src1_pipe_fifo, src2_pipe_fifo, alpha_pipe_fifo, blend_pd, rop_engine, Color_Red, csc, Clip_Cntl (dst_col_fill), chr_ds
• Outputs: vpi_out_y (rgb, yuv444, yuv422, y(420), mono-8; width 32) and vpi_out_uv (u/v(420); width 8)

Terminologies
• Tile Mode
• Vslice Mode
• Chroma Expansion

Tile Mode (Rotation)
[Diagram, 90d rotate + scale. Labeled stages: vpdma (src); vpi i/f in reverse blocked, reverse raster order; 90d rotate with mirror-y; scale to 32x32 blk; vpdma (dst)]

Tile Mode (Rotation Modes)
[Diagram with four panels:]
• scale + 90d rotate
• scale + 90d rotate + mirror-y
• scale + 90d rotate + mirror-x
• scale + 270 rotate
[Stages shown in the panels: vpdma (src), vpi i/f (to core), 90d rotate with mirror-y, scale, vpdma (dst). Read-order labels: normal blocked; reverse ROW raster order; TB-RL tile read; TB-RL row ordering]

Scan Order Determination FlowChart
[Flowchart. Decisions: 90/270 rotated? 180 rot? src flip/mirror (flip only)? x or y axis? overlapped copy?]
• Rot_mir_mode: 0 (mode 0), 90 (mode 1), 180 (mode 2), 270 (mode 3), flip only on x-axis or y-axis (modes 4 & 5), 90+mx (mode 6), 90+my (mode 7)
• Scan orders for any src: LtUp, LtDn, RtUp, RtDn
• Tile scan orders, with the scan order listed next to each: UpRt(Tile) / RtUp, DnLt(Tile) / LtDn, LtUp(Tile) / LtUp, RtDn(Tile) / RtDn, LtDn(Tile) / LtDn, RtUp(Tile) / RtUp, DnRt(Tile) / RtDn, UpLt(Tile) / LtUp
• Overlapped copy: copy dir groups U | UR | UL | L, D | DR | DL, and R select among RtDn, RtUp, LtDn

Vslice Mode
• Used for YUV420 source data, or any data (scale_en), > 1020 pixels wide
[Diagram labels: src2, Vslice_tar_w, Src2_in_w, Src1_in_w]
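The slide gives only the 1020-pixel threshold, not how slice widths are chosen. As a rough sketch of the idea, the code below splits a wide surface into vertical slices that each stay within that width. The function name `split_vslices`, the near-equal split, and the absence of any alignment constraint are assumptions for illustration, not the hardware's actual rule.

```python
MAX_SLICE_W = 1020  # width threshold quoted on the slide


def split_vslices(width, max_w=MAX_SLICE_W):
    """Return (x_offset, slice_width) pairs covering [0, width).

    Hypothetical policy: use the fewest slices that fit within max_w
    and make them as equal in width as possible.
    """
    if width <= max_w:
        return [(0, width)]
    count = -(-width // max_w)          # ceiling division
    base, extra = divmod(width, count)  # first `extra` slices get one more pixel
    slices, x = [], 0
    for k in range(count):
        w = base + (1 if k < extra else 0)
        slices.append((x, w))
        x += w
    return slices


if __name__ == "__main__":
    print(split_vslices(1920))
```

A real driver would likely also align slice edges to the chroma subsampling grid and account for scaler phase at each boundary; neither is modeled here.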

Chroma Expansion
Over-fetching extra chroma pixels and/or lines to perform proper 420→422 and/or 422→444 chroma upsampling across tile/vslice boundaries
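To illustrate why the over-fetch matters, the sketch below upsamples one row of chroma samples by 2x, first as a whole row and then tile by tile. The linear-interpolation filter, the edge replication, and the names `upsample2x` and `upsample_tiled` are assumptions chosen for illustration; the slide does not state which filter the hardware uses. The same reasoning applies to over-fetched lines in the vertical direction.

```python
def upsample2x(chroma, right=None):
    """Double the sample count by linear interpolation.

    `right` is the first sample of the neighbouring tile (the over-fetched
    sample). When it is missing, the last sample is replicated instead.
    """
    out = []
    for i, c in enumerate(chroma):
        if i + 1 < len(chroma):
            nxt = chroma[i + 1]
        elif right is not None:
            nxt = right
        else:
            nxt = c
        out += [c, (c + nxt + 1) >> 1]
    return out


def upsample_tiled(chroma, tile_w, overfetch):
    """Upsample tile by tile, optionally over-fetching one sample per tile."""
    out = []
    for start in range(0, len(chroma), tile_w):
        end = start + tile_w
        tile = chroma[start:end]
        right = chroma[end] if overfetch and end < len(chroma) else None
        out += upsample2x(tile, right)
    return out


if __name__ == "__main__":
    row = [0, 10, 20, 30, 40, 50, 60, 70]
    print(upsample2x(row))
    print(upsample_tiled(row, 4, overfetch=False))
```

With over-fetch the tiled result matches the whole-row result; without it, the sample interpolated at each tile edge is wrong, which would show up as a seam at the boundary.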

Key Functional Processing Units
•Scaler
•Rotation Engine
•Porter-Duff Compositing Engine
•ROP engine

Scaler
[Block diagram. Labeled elements: L_buf(mem) for vs or P_buf(reg) for vs; accum line buf (mem) for vs; accum pix buf (reg) for hs; multipliers and adder; phase_in, phase_out; scale_f; in, out; cntl, cfg; rdy/req handshake on both sides; weighted blending]

Scaler (Vertical Scaler)
[Datapath diagram. Labeled elements: L_buf(mem), accum (mem), scale_f, Inv_Scale_f, scale_factor_c, fin, fout, frag_in, frag_out, frag_in_c, frag_out_c, frag_delta_v, frag_delta_thresh, src_row, src_row+1, src_row inc, tar_row inc, first_row_pix, last_row_pix, intensity, abs(a-b), cmp, upscaling, init, RND/SAT, TRUNC]
• Fixed-point formats shown: 8.0, 8.0s, 8.4, 12.4, 0.24, 1.24, 5.24, 1.13, 5.13, 9.13s
• out_valid = (a-b) > 0 or last_row_pix & (frag_∆_v < frag_∆_thresh)

Rotation Engine
[Diagram: input tile (quadrants 1 2 3 4) → r_mem → data read out → rotated output tile]
• Write in rotated order (addr + 32 pixel location), then read already rotated data
• Write in un-rotated order (addr + 1 pixel location), then read out in rotated order
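The two r_mem access patterns named on this slide can be sketched in software as follows. The 32-pixel tile edge comes from the "addr + 32 pixel location" and "32x32 blk" notes. The clockwise direction, the flat list standing in for r_mem, and the function names are assumptions made for illustration only.

```python
N = 32  # tile edge in pixels, per the "addr + 32 pixel location" note


def rotate_on_write(tile):
    """Write with a stride of N so that a linear read returns the rotated tile."""
    r_mem = [0] * (N * N)
    for i in range(N):
        addr = N - 1 - i          # input row i lands in output column N-1-i
        for j in range(N):
            r_mem[addr] = tile[i][j]
            addr += N             # "addr + 32 pixel location"
    return [r_mem[r * N:(r + 1) * N] for r in range(N)]


def rotate_on_read(tile):
    """Write linearly ("addr + 1 pixel location"), then read in rotated order."""
    r_mem = [p for row in tile for p in row]
    return [[r_mem[(N - 1 - c) * N + r] for c in range(N)] for r in range(N)]


if __name__ == "__main__":
    tile = [[i * N + j for j in range(N)] for i in range(N)]
    assert rotate_on_write(tile) == rotate_on_read(tile)
    print(rotate_on_write(tile)[0][:4])
```

Both routines produce the same 90-degree clockwise rotation; they differ only in whether the strided addressing happens on the write side or on the read side of the buffer.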

Porter-Duff Compositing
Simple blending: Out = a*S + (1-a)*D
PorterDuff_Rule Selection
0x0 : CLEAR
0x1 : SRC
0x2 : DST
0x3 : SRC_OVER
0x4 : DST_OVER
0x5 : SRC_IN
0x6 : DST_IN
0x7 : SRC_OUT
0x8 : DST_OUT
0x9 : SRC_ATOP
0xA : DST_ATOP
0xB : XOR
0xC : PLUS
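As a reference model for the rule codes above, the sketch below applies the standard Porter-Duff blend factors to one colour channel. The rule numbering follows the slide; the factor table is the textbook Porter-Duff definition rather than anything read from the hardware datapath, and the floating-point arithmetic and the function names `factors` and `composite` are assumptions for illustration.

```python
RULES = {
    0x0: "CLEAR", 0x1: "SRC", 0x2: "DST", 0x3: "SRC_OVER", 0x4: "DST_OVER",
    0x5: "SRC_IN", 0x6: "DST_IN", 0x7: "SRC_OUT", 0x8: "DST_OUT",
    0x9: "SRC_ATOP", 0xA: "DST_ATOP", 0xB: "XOR", 0xC: "PLUS",
}


def factors(rule, a_src, a_dst):
    """Return (Fsrc, Fdst) for alphas in [0, 1] (standard Porter-Duff table)."""
    table = {
        "CLEAR": (0.0, 0.0),
        "SRC": (1.0, 0.0),
        "DST": (0.0, 1.0),
        "SRC_OVER": (1.0, 1.0 - a_src),
        "DST_OVER": (1.0 - a_dst, 1.0),
        "SRC_IN": (a_dst, 0.0),
        "DST_IN": (0.0, a_src),
        "SRC_OUT": (1.0 - a_dst, 0.0),
        "DST_OUT": (0.0, 1.0 - a_src),
        "SRC_ATOP": (a_dst, 1.0 - a_src),
        "DST_ATOP": (1.0 - a_dst, a_src),
        "XOR": (1.0 - a_dst, 1.0 - a_src),
        "PLUS": (1.0, 1.0),
    }
    return table[RULES[rule]]


def composite(rule, c_src, a_src, c_dst, a_dst):
    """Blend one non-premultiplied channel; returns (c_out, a_out)."""
    f_src, f_dst = factors(rule, a_src, a_dst)
    a_out = min(1.0, a_src * f_src + a_dst * f_dst)
    if a_out == 0.0:
        return 0.0, 0.0
    # Blend in premultiplied form, then divide back to non-premultiplied.
    c_pre = a_src * c_src * f_src + a_dst * c_dst * f_dst
    return min(1.0, c_pre / a_out), a_out


if __name__ == "__main__":
    # SRC_OVER onto an opaque destination reduces to a*S + (1-a)*D.
    print(composite(0x3, 1.0, 0.25, 0.0, 1.0))
```

With an opaque destination, SRC_OVER reduces to the simple blending formula Out = a*S + (1-a)*D quoted on the slide.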

Porter-Duff Compositing Engine
[Datapath diagram. Labeled elements: SRC1 and SRC2 inputs (non_PreMult or Pre_Mult), Csrc, Asrc, Fsrc, Cdst, Adst, Fdst; products (Asrc*Csrc) and (Adst*Cdst); Fsrc', Fdst'; Asrc_mod, Adst_mod, Pre_ModAsrc, Pre_ModAdst; multipliers, adders, a divider, x 255 and 1 / 255 stages, clip stages; outputs Cout (Non-Pre_Mult or Pre_Mult) and αout; operand widths 8, 16, and 24 bits]
• 1 / 255 estimation: ((x<<8) + x + 256) >> 16
• Output mux select: 0 (pd_CLEAR), 1 (Cout_simple_no_div or Cout_simple_p2p), 2 (dst_pre_mult), 3 (else)
• Control terms: src_alpha_modulated & ~cout_simple, dst_alpha_modulated & ~cout_simple, pd_clear, cout_simple_src, cout_simple_p2p, cout_simple_p2np, Cout_simple_no_div
• Aout selection is based on data pipeline delay
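The 1 / 255 estimation quoted on this slide, ((x<<8) + x + 256) >> 16, replaces a divider with a shift and two adds. The sketch below checks it against integer division over the range produced by multiplying two 8-bit values (0 to 255*255), where it matches truncating division by 255 exactly. The function names are hypothetical, and inputs outside that range are not covered.

```python
def div255_estimate(x):
    """Shift-and-add estimate of x / 255, as written on the slide."""
    return ((x << 8) + x + 256) >> 16


def mul8(a, b):
    """Hypothetical helper: multiply two 8-bit fractions (0..255 means 0..1)."""
    return div255_estimate(a * b)


if __name__ == "__main__":
    worst = max(abs(div255_estimate(x) - x // 255) for x in range(255 * 255 + 1))
    print("max deviation from x // 255:", worst)
    print(mul8(255, 128), mul8(128, 128))
```

Because the estimate truncates rather than rounds, a design that needs round-to-nearest would add a rounding offset to x before the estimate; the slide does not show whether the hardware does so.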
