Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation
Daichi Horita, Naoto Inoue, Kotaro Kikuchi, Kota Yamaguchi, Kiyoharu Aizawa, CVPR2024
2024/05/30
n
•
•
•
BB
n
•
•
n
•
•
→ (RAG)
[O'Donovan+, CHI2015]
Retrieval-Augmented Generation
n RAG
• (LLM)
•
• LLM
•
n
•
•
RAG
n
•
[Yang+, ACM TOMM2016]
• NN [Li+, ICLR2019] [Arroyo+, CVPR2021]
•
• GAN
• VAE
• Diffusion Model
n
• ContentGAN [Zheng+, ACM TOG2019]
•
• CGL-GAN [Zhou+, IJCAI2022]
•
n (RAG)
•
n (RALF)
Paper introduction: Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation
Image encoder
CGL-GAN [Zhou+, IJCAI2022]
Retrieval augmentation module
Constraint encoder
Layout decoder
Transformer decoder
n
•
• CGL [Zhou+, IJCAI2022]
• 60,548
• 1,000
• 100
• PKU [Hsu+, CVPR2023]
• 9,974
• 905
• 100
n
•
• FID↓
• (Und↓)
• (Ove↓)
•
• Occ↓
• Rea↓
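As a rough illustration of how an overlap-style score such as Ove↓ can be computed, the sketch below averages the IoU over all pairs of layout elements, skipping underlay elements. This definition is an assumption made for illustration (the metric's definition is not spelled out on the slide), and the names `iou`, `overlay_score`, and the `"underlay"` label are hypothetical.

```python
from itertools import combinations


def iou(a, b):
    """IoU of two axis-aligned boxes given as (left, top, right, bottom)."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(inter_w, 0.0) * max(inter_h, 0.0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def overlay_score(boxes, labels, ignore_label="underlay"):
    """Mean pairwise IoU over elements whose label is not `ignore_label`.

    Assumed definition: lower is better, 0.0 means no two elements overlap.
    """
    kept = [box for box, label in zip(boxes, labels) if label != ignore_label]
    pairs = list(combinations(kept, 2))
    if not pairs:
        return 0.0
    return sum(iou(a, b) for a, b in pairs) / len(pairs)


boxes = [(0, 0, 2, 2), (1, 0, 3, 2), (0, 0, 3, 3)]
labels = ["text", "logo", "underlay"]
score = overlay_score(boxes, labels)
```

In this toy layout the text and logo boxes share an area of 2 out of a union of 6, so the score is 1/3; the underlay box is ignored even though it covers both.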
n
• K K=16
•
• 350, 240
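The K=16 setting above refers to how many examples are retrieved. A minimal sketch of top-K nearest-neighbour retrieval follows, assuming each image is already represented by a feature vector and that similarity is measured by cosine similarity. Both are assumptions for illustration, not a statement of the authors' implementation, and `cosine` and `retrieve_top_k` are hypothetical names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve_top_k(query, database, k=16):
    """Return indices of the k database vectors most similar to `query`.

    Indices are ordered from most to least similar. If the database has
    fewer than k entries, all indices are returned.
    """
    order = sorted(
        range(len(database)),
        key=lambda i: cosine(query, database[i]),
        reverse=True,
    )
    return order[:k]


# Toy 2-D features standing in for real image embeddings.
database = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.1]
neighbours = retrieve_top_k(query, database, k=2)  # -> [0, 2]
```

The layouts attached to the returned indices would then be passed on as reference examples. A brute-force sort is enough for a sketch; at dataset scale an approximate nearest-neighbour index would be the usual choice.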
n
n
n (RAG)
•
n
• K
n
•
•
•
•
•
C : S : P :
