2D composition Engine
Agenda:
• Architecture Overview
BB_2DHWA Feature Summary
• Block Copy/Draw Operations
• Rotation (90/180/270 degrees) and Mirror/Flip operations
• Scaling (1/16x ~ 16x)
• Color Space and format Conversion
• Chroma Up/Down sampling
• ROP2/3 operations
• Alpha Blending/Compositing (Porter Duff Compositing)
• Destination Clipping
• Source Pattern Repeat
Image Attributes
• source base address
• SrcWidth, SrcHeight
• (SrcXoffset, SrcYoffset)
• SurfWidth, SurfHeight
• Stride_Size
• pattern: SrcPatWidth, SrcPatHeight
Data Types
• LUT/MONO-1/2/4/8
• YUV (420_2, 422, 444)
• RGB (aRGB16/24, 32)
• Component Ordering
• Pre-multiplied
• Embedded Alpha
DMA Attributes
• Base Address
• Width/Height
• Stride
• Offsets
Operation Commands (SRC)
• CSC/CHRUS
• VC-remapping
• Color Expand
• Scaling
• Rotation
• Pattern Repeat
Operation Commands (DST)
• Blending/Compositing
• ROP2/3
• Clipping
• Color Fill
• CSC/CHRDS
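As a rough illustration of how the image and DMA attributes above fit together, the sketch below computes the byte address of a pixel inside the source window from a base address, a stride, and X/Y offsets. The class name, the field names, and the `bytes_per_pixel` parameter are hypothetical; the slides do not give the engine's register layout or addressing formula, so this assumes a plain linear (raster) surface with the stride expressed in bytes.

```python
from dataclasses import dataclass


@dataclass
class SurfaceDesc:
    """Hypothetical software view of the attributes listed on the slide."""
    base_addr: int        # source base address, in bytes
    stride: int           # Stride_Size: bytes from one row to the next (assumed)
    x_offset: int         # SrcXoffset, in pixels
    y_offset: int         # SrcYoffset, in rows
    width: int            # SrcWidth, in pixels
    height: int           # SrcHeight, in rows
    bytes_per_pixel: int  # e.g. 4 for a 32-bit aRGB format (assumption)


def pixel_addr(s: SurfaceDesc, x: int, y: int) -> int:
    """Byte address of pixel (x, y), counted from the window's top-left corner."""
    if not (0 <= x < s.width and 0 <= y < s.height):
        raise ValueError("pixel outside the source window")
    row = s.y_offset + y
    col = s.x_offset + x
    return s.base_addr + row * s.stride + col * s.bytes_per_pixel


surf = SurfaceDesc(base_addr=0x8000_0000, stride=4096, x_offset=16,
                   y_offset=8, width=64, height=32, bytes_per_pixel=4)
first = pixel_addr(surf, 0, 0)
```

Stepping one pixel to the right advances the address by `bytes_per_pixel`, and stepping one row down advances it by `stride`, which is why the stride can be larger than the window width times the pixel size.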
BB_2DHWA Operation Overview
[Diagram: operation pipeline]
• SRC-1 Image Data: Color Space Conv, Color Expand, Scaling, Rotate, giving a SRC-dst Image
• SRC-2 Image Data: Color Space Conv, Color Expand, Scaling, Rotate, giving a SRC-dst Image
• Alpha (Remote/Mask)
• Destination stages: Blending (Compositing), ROP-2/3, ColorFill, Clipping, Color Space Conv
• DST Image Data
BB_2DHWA SRC → DST types
Any Source data types → Non-sub-byte and non-LUT Dest data types
Architecture Block Diagram
[Block diagram: bb_2dhwa top level]
• Control: bb_2dhwa_dp_cntl, L4 I/F (ocp2mmr)
• bb_2dhwa_vpdma (VPDMA): ListMgr, L3 I/F, buf, BIMCDMA, ARB, pmem, SCR, vbusp_m, vbusp_s; R-client and W-client ports (packed data, uv read, alpha, cfg)
• bb_2dhwa_dp_core (DP_CORE): two bb_2dhwa_dp_src instances (DP_SRC) and bb_2dhwa_dp_dst (DP_DST)
• bb_2dhwa_dp_src: VC1Range, 420to422, YC_aligner, 422to444, Lmem, CSC, Color Exp, cmem, V Scaler, H Scaler, SLmem, SAmem, ROT, rmem
• bb_2dhwa_dp_dst: ROP/Blend (src1, src2, alpha), smem, CSC, Color Red & Dithering, 444 to 422/420, dst, cfg (vbusp_s)
• bb_2dhwa_clkc_int: INTC, CLK/RST, vbusp_s
• Buses: L3, L4
BB_2DHWA Architecture Block Diagram
Processing sequence:
• VPDMA FW Initialization
• List Start
• Descriptor Download
• Descriptor Copied
• Client Configuration
• DMA Read Req
• Src Data Processing
• Dest Data Generation
• DMA Write Req
• (List) Cmd Done
• IRQ
BB_2DHWA External Interfaces
• Interconnects: MMR, HP, Interrupt, Clock/Reset, DFT, Memory BIST
• BB_2DHWA ports: _mmr_slv, _vpdma_mst, intr, l3_clk/clkdiv, l4_clk/clkdiv, rst_main_arst_n, dft, gpi, gpo
Core Processing Unit (dp_src)
[Block diagram: dp_src datapath]
• Inputs: vpi_in (two ports) carrying y, yuv, (a)rgb, bm and u/v
• Blocks: VC-1 rangemap, 420to422, YC_aligner, 422 to 444, Lmem, uv_2x, CSC, Color Exp, Cmem, clut_loader (clut 32), cmem_mux, rmem_mux, Rotate Engine, Rmem, V Scaler, H Scaler, SLmem, SAmem
• Data widths shown: 8, 32 (argb 32)
Core Processing Unit (dp_dst)
[Block diagram: dp_dst datapath]
• Inputs (dp_dst_src_gen): src1_pipe_fifo (argb 32), src2_pipe_fifo (argb 32), alpha_pipe_fifo (alpha-1/8/32)
• Blocks: blend_pd, rop_engine, Clip_Cntl (dst_col_fill), Color_Red, csc, chr_ds
• Outputs: vpi_out_y (rgb, yuv444, yuv422, y(420), mono-8), vpi_out_uv (u/v(420))
Terminologies
• Tile Mode
• Vslice Mode
• Chroma Expansion
Tile Mode (Rotation)
[Diagram: 90d rotate + scale]
• Labels: vpdma (src); vpi i/f, reverse blocked, reverse raster order; 90d rotate with mirror-y; scale; to 32x32 blk; vpdma (dst)
Tile Mode (Rotation Modes)
[Diagram panels: scale + 90d rotate; scale + 90d rotate + mirror-y; scale + 90d rotate + mirror-x; scale + 270 rotate]
• Stages labeled in the panels: vpdma (src), 90d rotate with mirror-y, scale, vpdma (dst)
• Other labels: vpi i/f (to core); vpi i/f, normal blocked, reverse ROW raster order; vpdma TB-RL tile read, TB-RL row ordering
Scan Order Determination FlowChart
[Flowchart]
• Inputs: Any Src, Rot_mir_mode
• Decisions: 90/270 Rotated? src flip/mirror? 180 Rot? Flip (only), x or y axis? overlapped copy? copy dir (U | UR | UL | L, D | DR | DL, R)
• Rot_mir_mode values: 0 (mode 0), 90 (mode 1), 180 (mode 2), 270 (mode 3), flip only on x-axis or y-axis (modes 4 & 5), 90+mx (mode 6), 90+my (mode 7)
• Resulting scan orders: LtUp, LtDn, RtUp, RtDn, and the tile variants LtUp(Tile), LtDn(Tile), RtUp(Tile), RtDn(Tile), UpLt(Tile), UpRt(Tile), DnLt(Tile), DnRt(Tile)
Vslice Mode
YUV420 Source Data or Any Data (scale_en) > 1020 pixels wide
[Diagram labels: src2, Vslice_tar_w, Src2_in_w, Src1_in_w]
Chroma Expansion
Over-fetching extra chroma pixels and/or lines to perform proper 420→422 and/or 422→444 chroma upsampling across tile/vslice boundaries
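To see why the over-fetch matters, the sketch below upsamples one chroma row from 4:2:2 to 4:4:4 with a simple 2-tap linear interpolation. The filter, the function name, and the `right_neighbor` parameter are assumptions made for illustration; the slides do not say which interpolation the engine uses. When a row is split into two tiles and each tile replicates its own last sample, the interpolated value at the seam is wrong; fetching one extra chroma sample from the neighboring tile restores the full-row result.

```python
def upsample_422_to_444(chroma, right_neighbor=None):
    """Double a chroma row horizontally with 2-tap linear interpolation.

    right_neighbor is the over-fetched sample just past the end of this
    tile; when it is None the last sample of the tile is replicated.
    """
    ext = list(chroma) + [chroma[-1] if right_neighbor is None else right_neighbor]
    out = []
    for i in range(len(chroma)):
        out.append(ext[i])                          # co-sited sample
        out.append((ext[i] + ext[i + 1] + 1) >> 1)  # interpolated sample
    return out


row = [10, 20, 40, 80]            # chroma samples of one full row
full = upsample_422_to_444(row)   # reference: whole row processed at once

left, right = row[:2], row[2:]
# Tiles processed independently: the sample at the seam is wrong.
naive = upsample_422_to_444(left) + upsample_422_to_444(right)
# Left tile over-fetches one sample from the right tile: seamless result.
seamless = (upsample_422_to_444(left, right_neighbor=right[0])
            + upsample_422_to_444(right))
```

The same reasoning applies vertically for 420 to 422 conversion, where the extra data is a chroma line rather than a chroma pixel.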
Key Functional Processing Units
• Scaler
• Rotation Engine
• Porter-Duff Compositing Engine
• ROP engine
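Scaler
[Block diagram: scaler datapath]
• Buffers: L_buf (mem) for vs or P_buf (reg) for vs; accum line buf (mem) for vs, accum pix buf (reg) for hs
• Datapath: multiply by scale_f, accumulate (+), weighted blending, phase_in, phase_out
• Interfaces: in, out (rdy/req on each side), cntl, cfg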
Scaler (Vertical Scaler)
[Block diagram: vertical scaler datapath]
• Signals: in, Out, src_row, src_row+1, src_row inc, tar_row inc, first_row_pix, last_row_pix, frag_in, frag_out, frag_in_c, frag_out_c, frag_delta_v, frag_delta_thresh, fin, fout, scale_f, Inv_Scale_f, scale_factor_c, intensity, upscaling, init, out_valid
• Blocks: L_buf (mem), accum (mem), multipliers (x), adders (+), subtractor on a and b (a-b, b-a, abs(a-b)), cmp, RND/SAT, TRUNC
• Fixed-point widths annotated: 8.0, 8.0s, 8.4, 12.4, 0.24, 1.24, 5.24, 1.13, 5.13, 9.13s
• out_valid = (a-b) > 0 or last_row_pix & (frag_delta_v < frag_delta_thresh)
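The width annotations in the scaler diagram (5.24, 1.24, 8.4 and so on) read like integer.fraction fixed-point formats. Assuming that "5.24" means an unsigned word with 5 integer bits and 24 fractional bits (an interpretation, not something the slides state), the sketch below quantizes a scale factor into such a word. The helper names are hypothetical.

```python
def to_fixed(value: float, int_bits: int, frac_bits: int) -> int:
    """Quantize a non-negative value to an unsigned int_bits.frac_bits word."""
    raw = round(value * (1 << frac_bits))
    if not 0 <= raw < (1 << (int_bits + frac_bits)):
        raise OverflowError(f"{value} does not fit in {int_bits}.{frac_bits}")
    return raw


def from_fixed(raw: int, frac_bits: int) -> float:
    """Convert a fixed-point word back to a float for inspection."""
    return raw / (1 << frac_bits)


# The feature summary quotes a scaling range of 1/16x to 16x.
max_scale = to_fixed(16.0, 5, 24)    # needs the fifth integer bit
min_scale = to_fixed(1 / 16, 5, 24)  # exactly representable in 24 fraction bits
```

Under this reading, five integer bits are the minimum that can hold the value 16, which is consistent with the quoted 16x upper limit.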
Rotation Engine
[Diagram: input tile (labeled 1 2 3 4), r_mem, data read out, rotated output tile]
• Write in rotated order (addr + 32 pixel location), then read already rotated data (addr + 1 pixel location)
• Write in un-rotated order (addr + 1 pixel location), then read out in rotated order (addr + 32 pixel location)
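The addressing idea in the rotation diagram can be sketched in software: pixels of a tile arrive in raster order, are written into a tile memory with an address step of one tile width, and are then read back sequentially. The function name and the flat-list memory model are assumptions for illustration, and the sketch covers only a 90 degree clockwise rotation of a square tile; it does not model the mirror variants or the actual r_mem organization.

```python
TILE = 32  # tile edge in pixels; the slides mention 32x32 blocks


def rotate90_via_rmem(pixels, n=TILE):
    """Rotate an n x n tile 90 degrees clockwise.

    pixels is a flat list in raster (row-major) order; the result is the
    rotated tile, also flat and in raster order.
    """
    assert len(pixels) == n * n
    r_mem = [None] * (n * n)
    k = 0
    for r in range(n):
        addr = n - 1 - r           # input row r lands in output column n-1-r
        for _ in range(n):
            r_mem[addr] = pixels[k]
            addr += n              # write in rotated order: addr + n
            k += 1
    # Read already rotated data sequentially: addr + 1.
    return [r_mem[a] for a in range(n * n)]


src = list(range(TILE * TILE))
dst = rotate90_via_rmem(src)
```

The second variant named on the slide swaps the two roles: write sequentially and read with the tile-width step. Both produce the same rotated tile, so a design could alternate between them on successive tiles; whether the engine does so is not stated in the slides.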
Porter-Duff Compositing
Simple Blending: Out = a*S + (1-a)*D
PorterDuff_Rule Selection
0x0 : CLEAR
0x1 : SRC
0x2 : DST
0x3 : SRC_OVER
0x4 : DST_OVER
0x5 : SRC_IN
0x6 : DST_IN
0x7 : SRC_OUT
0x8 : DST_OUT
0x9 : SRC_ATOP
0xA : DST_ATOP
0xB : XOR
0xC : PLUS
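The rule codes above can be tied to the standard Porter-Duff blend factors. The sketch below uses the textbook formulation on premultiplied values, Cout = Csrc*Fsrc + Cdst*Fdst and Aout = Asrc*Fsrc + Adst*Fdst, with floating-point values in [0, 1] for a single color channel. The rule codes come from the slide; the factor table is the commonly published one, and the function and table names are hypothetical. It is not a model of the hardware datapath.

```python
# rule code -> (name, function returning (Fsrc, Fdst) from (Asrc, Adst))
RULES = {
    0x0: ("CLEAR",    lambda a_s, a_d: (0.0, 0.0)),
    0x1: ("SRC",      lambda a_s, a_d: (1.0, 0.0)),
    0x2: ("DST",      lambda a_s, a_d: (0.0, 1.0)),
    0x3: ("SRC_OVER", lambda a_s, a_d: (1.0, 1.0 - a_s)),
    0x4: ("DST_OVER", lambda a_s, a_d: (1.0 - a_d, 1.0)),
    0x5: ("SRC_IN",   lambda a_s, a_d: (a_d, 0.0)),
    0x6: ("DST_IN",   lambda a_s, a_d: (0.0, a_s)),
    0x7: ("SRC_OUT",  lambda a_s, a_d: (1.0 - a_d, 0.0)),
    0x8: ("DST_OUT",  lambda a_s, a_d: (0.0, 1.0 - a_s)),
    0x9: ("SRC_ATOP", lambda a_s, a_d: (a_d, 1.0 - a_s)),
    0xA: ("DST_ATOP", lambda a_s, a_d: (1.0 - a_d, a_s)),
    0xB: ("XOR",      lambda a_s, a_d: (1.0 - a_d, 1.0 - a_s)),
    0xC: ("PLUS",     lambda a_s, a_d: (1.0, 1.0)),
}


def composite(rule, src, dst):
    """Composite one channel; src and dst are premultiplied (color, alpha)."""
    _, factors = RULES[rule]
    c_s, a_s = src
    c_d, a_d = dst
    f_s, f_d = factors(a_s, a_d)
    color = min(1.0, c_s * f_s + c_d * f_d)   # clamp matters for PLUS
    alpha = min(1.0, a_s * f_s + a_d * f_d)
    return color, alpha


# Simple blending is SRC_OVER on an opaque destination:
# S = 1.0, a = 0.25, D = 0.5 gives a*S + (1-a)*D = 0.625.
over = composite(0x3, (0.25 * 1.0, 0.25), (0.5, 1.0))
```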
Porter-Duff Compositing Engine
[Block diagram: compositing datapath]
• Inputs: SRC1 (Csrc, Asrc) and SRC2 (Cdst, Adst), pre-multiplied or non-pre-multiplied (src=np / src=p, dst=np / dst=p)
• Blend factors: Fsrc, Fdst, Fsrc', Fdst'
• Alpha modulation: Asrc_mod, Adst_mod, constants 0xFF / 255, Pre_ModAsrc, Pre_ModAdst
• Pre-multiply products: (Asrc*Csrc), (Adst*Cdst)
• Datapath: 8x8 multipliers with 16-bit results, 16x8 multipliers with 24-bit results, 24-bit adder, 1/255 scaling, divider (/), x 255, clip
• 1/255 estimation: ((x<<8) + x + 256) >> 16
• Control signals: pd_clear, cout_simple, cout_simple_src, cout_simple_no_div, cout_simple_p2p, cout_simple_p2np, src_alpha_modulated & ~cout_simple, dst_alpha_modulated & ~cout_simple
• Cout mux select: 0 (pd_CLEAR), 1 (Cout_simple_no_div or Cout_simple_p2p), 2 (dst_pre_mult), 3 (else)
• Outputs: Cout (Pre_Mult or Non-Pre_Mult), αout; Aout selection is based on data pipeline delay
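The 1/255 estimation quoted in the diagram, ((x<<8) + x + 256) >> 16, replaces a divider with two shifts and two adds. The sketch below checks it against integer division over the range an 8-bit by 8-bit blend can produce (0 to 255*255) and uses it in the simple blending formula. The function names are hypothetical, and the blend helper is only an illustration of where such a divide would be used, not the engine's exact arithmetic.

```python
def div255(x: int) -> int:
    """Shift-and-add estimate of x / 255 taken from the slide."""
    return ((x << 8) + x + 256) >> 16


def blend8(alpha: int, src: int, dst: int) -> int:
    """Out = a*S + (1-a)*D on 8-bit values, with alpha scaled to 0..255."""
    return div255(alpha * src + (255 - alpha) * dst)


# Exhaustive comparison with truncating division over the blend range.
exact_everywhere = all(div255(x) == x // 255 for x in range(255 * 255 + 1))
```

Over this range the estimate matches truncating division exactly: writing x = 255*q + r with r < 255 and q <= 255, the expression equals 65536*q + (257*r + 256 - q), and the bracketed term always stays between 1 and 65534, so the shift returns q.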
BB_2DHWA ROP
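The slides name ROP2/3 operations without giving an encoding. As a reference point, the sketch below evaluates a ternary raster operation under the widely used convention in which the 8-bit ROP code is a truth table indexed by the pattern, source, and destination bits, (P << 2) | (S << 1) | D. That convention, the function name, and the example codes are assumptions taken from common 2D graphics practice, not from this deck.

```python
def rop3(code: int, pattern: int, src: int, dst: int, bits: int = 8) -> int:
    """Apply a ternary raster operation bitwise to three operands."""
    mask = (1 << bits) - 1
    result = 0
    for idx in range(8):
        if not (code >> idx) & 1:
            continue                        # this minterm is not selected
        p = pattern if idx & 4 else ~pattern
        s = src if idx & 2 else ~src
        d = dst if idx & 1 else ~dst
        result |= p & s & d                 # OR in the selected minterm
    return result & mask


P, S, D = 0b10101010, 0b11001100, 0b11110000
copy_src = rop3(0xCC, P, S, D)   # code 0xCC selects the source unchanged
and_sd = rop3(0x88, P, S, D)     # code 0x88 is source AND destination
xor_sd = rop3(0x66, P, S, D)     # code 0x66 is source XOR destination
```

A ROP2 follows the same idea with two operands and a 4-bit truth table.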
