Make Your Own Ray Tracing GPU with FPGA
COSCUP 2023
Owen Wu
fallingcat@gmail.com
Agenda
● Self-Intro
● Session Overview
● How to Start
○ HDL
○ EDA
○ FPGA
● Think Differently
● Ray Tracing
● HomebrewGPU project
Self-Intro
● Game developer
○ Game engine development
○ PC
○ Console
○ Mobile
● GPU software engineer
○ Optimization from the perspective of hardware
○ AMD
○ Arm
● https://tinyurl.com/owenwu
Session Overview
● This session is for software engineers
● This session is NOT for hardware engineers
● Basic intro for beginners
● Prototyping a chip on an FPGA is easy and cheap nowadays
○ 6 months from zero to a working GPU
● Turn your algorithm into hardware: think differently
● Many open-source projects on GitHub
How to Start
● HDL (Hardware Description Language)
○ Language
● EDA (Electronic Design Automation)
○ Compiler
● FPGA (Field Programmable Gate Array)
○ Hardware
HDL (Hardware Description Language)
● VHDL
○ Ada-like syntax
● Verilog
○ C-like syntax
● Connect modules to design a whole chip
● The clock synchronizes all modules
○ 100 MHz: 100 million clock cycles per second
● You can think of a module as a function in C
● Each module can do only a limited amount of work
○ The work needs to finish within one clock cycle
● Books for beginners
○ Programming FPGAs: Getting Started with Verilog
○ Introduction to Verilog
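To make the "modules synchronized by a clock" idea concrete, here is a minimal sketch in Python rather than Verilog. It is not taken from the talk: the names Counter, Doubler, and tick are hypothetical. Each module first computes its next value from the values visible before the clock edge, and then all registers update together.

```python
class Counter:
    """A 4-bit register that increments on every clock edge."""
    def __init__(self):
        self.q = 0        # registered output, visible to other modules
        self._next = 0

    def compute(self):
        self._next = (self.q + 1) % 16

    def commit(self):
        self.q = self._next


class Doubler:
    """Registers twice the counter value seen before the clock edge."""
    def __init__(self):
        self.q = 0
        self._next = 0

    def compute(self, counter_q):
        self._next = 2 * counter_q

    def commit(self):
        self.q = self._next


def tick(counter, doubler):
    """One clock edge: all modules read old values, then all update."""
    counter.compute()
    doubler.compute(counter.q)   # still the value from before the edge
    counter.commit()
    doubler.commit()


counter, doubler = Counter(), Doubler()
for _ in range(3):
    tick(counter, doubler)
print(counter.q, doubler.q)
```

Because the doubler samples the counter before the edge, its output lags the counter by one clock, which is the behavior a software engineer usually has to get used to first.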
EDA (Electronic Design Automation)
● Convert HDL into a bitstream file
● Upload the bitstream file to the FPGA to execute it
● Many steps
○ IC Design
○ Synthesis
○ Verification
○ Physical Design
● You can think of EDA as a compiler
● FPGA makers provide basic EDA tools for free
○ Xilinx Vivado
● You can also try EDA online
○ https://8bitworkshop.com/v3.10.1/?platform=verilog&file=clock_divider.v
FPGA (Field Programmable Gate Array)
● Upload the bitstream file to configure the logic blocks
● FPGA development boards integrate many components
○ VGA/HDMI output
○ Memory
○ LED
○ 7-segment display
○ SD Card
○ SoC
● FPGAs have different numbers of logic cells
○ This determines how complex the design can be
● Elbert V2 for beginners
● Nexys A7 for more complex designs
Think Differently
Parallel
● Software is serial
● Hardware is parallel
● Every module works simultaneously
● Software optimizations may not work in hardware
● Don’t use software thinking when designing hardware
[Figure: modules A() and B() both feed a MUX; signal C selects the result R]
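The difference between the two mindsets can be sketched in Python. This is an illustration only: the functions a, b, and mux are hypothetical, and which input the select signal picks is an assumption. In software only one branch runs; in hardware both modules always produce a result and a multiplexer picks one.

```python
def a(x):
    return x + 1      # stand-in for module A()


def b(x):
    return x * 2      # stand-in for module B()


def mux(select, in1, in0):
    """Two-input multiplexer: forwards in1 when select is true, else in0."""
    return in1 if select else in0


def software_style(c, x):
    # Only the chosen branch is executed.
    return a(x) if c else b(x)


def hardware_style(c, x):
    # Both modules compute every clock; the MUX selects the result R.
    result_a = a(x)
    result_b = b(x)
    return mux(c, result_a, result_b)


print(hardware_style(True, 3), hardware_style(False, 3))
```

Skipping a branch saves time in software, but in hardware both circuits exist on the chip anyway, so a "skip the work" optimization often gains nothing.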
Latency vs. Throughput
● CPU performance depends on latency
● Low latency means an instruction completes quickly
● Instructions have order dependencies
● A GPU only cares how long it takes to finish a frame
● Pixels have no order dependency
● Pixels can be processed simultaneously
● If the latency of one pixel is 32 clocks
● and the hardware processes 64 streams simultaneously
● the throughput is 2 pixels per clock
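The arithmetic behind the last three bullets is simply the number of pixels in flight divided by the latency of one pixel. A minimal sketch, with a hypothetical function name:

```python
def throughput(streams_in_flight, latency_clocks):
    """Average pixels completed per clock when all streams stay busy."""
    return streams_in_flight / latency_clocks


print(throughput(64, 32))
```

With a single stream the same hardware would deliver only 1/32 pixel per clock, which is why a GPU trades latency for parallelism.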
Pipeline
● Hardware has many different modules
● All modules need to work simultaneously to get the best performance
● Use a pipeline to split the task
● Every module processes a different pixel at the same time
Without pipelining: Surface (5 clk) + Shadow (5 clk) + Shading (5 clk) = 15 clk/pixel
With pipelining: 5 clk/pixel

          Surface   Shadow    Shading
clk 0     pixel 1
clk 5     pixel 2   pixel 1
clk 10    pixel 3   pixel 2   pixel 1
clk 15    pixel 4   pixel 3   pixel 2
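The schedule above can be reproduced with a few lines of Python. This is only a sketch of the timing model on the slide (three stages of 5 clocks each); the function names are hypothetical.

```python
STAGES = ["Surface", "Shadow", "Shading"]
STAGE_CLOCKS = 5


def serial_clocks(num_pixels):
    """Total clocks when each pixel finishes all stages before the next starts."""
    return num_pixels * len(STAGES) * STAGE_CLOCKS


def pipelined_clocks(num_pixels):
    """Total clocks when a new pixel enters the first stage every 5 clocks."""
    if num_pixels == 0:
        return 0
    return (len(STAGES) + num_pixels - 1) * STAGE_CLOCKS


def occupancy(clock, num_pixels):
    """Which pixel (1-based) each stage holds at the given clock, or None."""
    slot = clock // STAGE_CLOCKS
    row = {}
    for i, stage in enumerate(STAGES):
        pixel = slot - i + 1
        row[stage] = pixel if 1 <= pixel <= num_pixels else None
    return row


print(serial_clocks(4), pipelined_clocks(4))
print(occupancy(15, 4))
```

The first pixel still takes 15 clocks to come out, so latency is unchanged; only the rate at which finished pixels appear improves.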
Ray Tracing
● Ray tracing is easy to implement
● Rays hit objects, then reflect to the camera
● Invert the ray: trace it backward from the camera
● Cast a ray from each pixel of the screen
● Find the closest hit between the ray and the objects
● Decide the color of the pixel
○ If there is a hit, use the color of the hit object
○ If there is no hit, use the background color
● Ray Tracing in One Weekend (on GitHub)
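The closest-hit step can be sketched in software as follows. This is a minimal illustration, not code from HomebrewGPU: it assumes a scene of spheres (the project itself is voxel based), unit-length ray directions, and hypothetical names such as hit_sphere, SCENE, and trace.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hit_sphere(origin, direction, center, radius):
    """Distance along a unit-length ray to the sphere, or None if it misses."""
    oc = sub(origin, center)
    b = dot(oc, direction)
    disc = b * b - (dot(oc, oc) - radius * radius)
    if disc < 0:
        return None
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):
        if t > 1e-6:              # ignore hits behind the ray origin
            return t
    return None


BACKGROUND = (0, 0, 0)
SCENE = [
    {"center": (0, 0, -5), "radius": 1, "color": (255, 0, 0)},
    {"center": (0, 0, -9), "radius": 3, "color": (0, 0, 255)},
]


def trace(origin, direction):
    """Return the color of the closest object hit, or the background color."""
    closest_t, color = math.inf, BACKGROUND
    for obj in SCENE:
        t = hit_sphere(origin, direction, obj["center"], obj["radius"])
        if t is not None and t < closest_t:
            closest_t, color = t, obj["color"]
    return color


print(trace((0, 0, 0), (0, 0, -1)))
print(trace((0, 0, 0), (0, 1, 0)))
```

A full renderer would call trace once per pixel with a direction derived from the pixel position; that camera setup is omitted here.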
Ray Tracing - Reflection
● For a hit on an object
● If the object is reflective
● Cast a reflection ray from the hit point
● Find the closest hit of the reflection ray
● Recursively cast reflection rays
● Blend the colors of the reflected objects into the final shading
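Two small pieces of arithmetic sit behind these steps: mirroring the incoming direction about the surface normal, and mixing the reflected color into the surface color. The sketch below uses the standard mirror formula r = d - 2(d·n)n; the linear blend and the reflectivity parameter are assumptions for illustration, not the project's actual shading model.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def reflect(direction, normal):
    """Mirror a direction about a unit-length normal: r = d - 2 (d . n) n."""
    k = 2 * dot(direction, normal)
    return tuple(d - k * n for d, n in zip(direction, normal))


def blend(surface_color, reflected_color, reflectivity):
    """Linear mix: reflectivity 0 keeps the surface color, 1 is a perfect mirror."""
    return tuple(
        (1 - reflectivity) * s + reflectivity * r
        for s, r in zip(surface_color, reflected_color)
    )


print(reflect((1, -1, 0), (0, 1, 0)))
print(blend((100, 0, 0), (0, 0, 200), 0.5))
```

In a recursive tracer the reflected color is itself the result of tracing the reflection ray, so implementations usually cap the recursion depth.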
Ray Tracing - Shadow
● For a hit on an object
● Cast a new ray from the hit point toward the light
● Check whether the new ray hits anything
● If yes, the pixel is in shadow
● Otherwise the pixel is not in shadow
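The shadow test can be sketched like this. As before, the sphere occluders and the names hit_sphere and in_shadow are assumptions made for illustration. The one detail the bullets leave implicit is that only hits between the surface point and the light count as occluders.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hit_sphere(origin, direction, center, radius):
    """Distance along a unit-length ray to the sphere, or None if it misses."""
    oc = sub(origin, center)
    b = dot(oc, direction)
    disc = b * b - (dot(oc, oc) - radius * radius)
    if disc < 0:
        return None
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):
        if t > 1e-6:              # also avoids re-hitting the surface itself
            return t
    return None


def in_shadow(hit_point, light_pos, spheres):
    """True if any sphere blocks the segment from hit_point to the light."""
    to_light = sub(light_pos, hit_point)
    distance = math.sqrt(dot(to_light, to_light))
    direction = tuple(x / distance for x in to_light)
    for center, radius in spheres:
        t = hit_sphere(hit_point, direction, center, radius)
        if t is not None and t < distance:
            return True
    return False


print(in_shadow((0, 0, 0), (0, 10, 0), [((0, 5, 0), 1)]))
print(in_shadow((0, 0, 0), (0, 10, 0), [((5, 5, 0), 1)]))
```

Unlike the closest-hit search, the shadow ray can stop at the first occluder it finds, which makes it cheaper than a camera ray.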
BVH (Bounding Volume Hierarchy)
● Accelerates hit detection between rays and primitives
○ Quickly excludes nodes that the ray does not intersect
● Use AABBs (Axis-Aligned Bounding Boxes) to split the space
● Traverse the AABBs until reaching a leaf
● Test the ray against the primitives in the leaf
[Figure: BVH tree; a root AABB splits into a left node and a right node, each containing child AABBs]
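A software sketch of the traversal: the ray is tested against a node's AABB with the usual slab method, and subtrees whose box is missed are skipped entirely. The dictionary layout and the names hit_aabb and candidates are hypothetical; the hardware stores the tree differently.

```python
import math


def hit_aabb(origin, direction, box_min, box_max):
    """Slab test: True if the ray (t >= 0) passes through the box."""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0:
            if o < lo or o > hi:      # parallel to this slab and outside it
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def candidates(node, origin, direction):
    """Collect primitives from every leaf whose AABB the ray enters."""
    if not hit_aabb(origin, direction, node["min"], node["max"]):
        return []                     # whole subtree excluded
    if "primitives" in node:          # leaf node
        return list(node["primitives"])
    return (candidates(node["left"], origin, direction)
            + candidates(node["right"], origin, direction))


left = {"min": (-3, -1, -1), "max": (-1, 1, 1), "primitives": ["voxel_a"]}
right = {"min": (1, -1, -1), "max": (3, 1, 1), "primitives": ["voxel_b"]}
root = {"min": (-3, -1, -1), "max": (3, 1, 1), "left": left, "right": right}

print(candidates(root, (2, 0, 5), (0, 0, -1)))
print(candidates(root, (0, 5, 5), (0, 0, -1)))
```

Only the primitives returned here need an exact ray-primitive intersection test, which is where the speedup comes from.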
HomebrewGPU project
● Open-source project
○ https://github.com/fallingcat/HomebrewGPU
● Implements a basic ray tracing GPU
○ Voxel-based rendering
○ BVH acceleration
○ Shading/Reflection/Refraction/Shadow
● 6 months from zero to completion
Architecture
● Thread Generator
○ Generates one thread per clock for each ray core
○ Each thread represents one pixel
○ The thread goes through a ray core, which outputs the final color
● BVH Structure
○ Stores the BVH tree data
○ Accepts node or leaf queries from the ray core
○ Outputs the node or leaf data to the ray core
● Primitive Unit
○ Stores the raw data of all primitives
○ Accepts queries from the ray core and outputs primitive data
● Ray Core
○ Processes one thread and outputs its final color
○ Accepts threads from the thread generator or from reflection/refraction rays
● Frame Buffer Writer
○ Caches the output of the ray cores and writes the pixels to the frame buffer
○ Some threads with reflection/refraction take longer to get the final color
○ Waits until all threads in one cache set are finished, then writes the data to the frame buffer
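The "wait until the whole cache set is finished" behavior of the Frame Buffer Writer can be sketched as follows. This is a software model under assumptions: the class name, the set size of 4, and the rule that a pixel's set is its index divided by the set size are all hypothetical, not taken from the HomebrewGPU source.

```python
class FrameBufferWriter:
    """Buffers finished pixels and flushes a cache set only when it is full."""

    def __init__(self, set_size):
        self.set_size = set_size
        self.pending = {}        # set index -> {pixel index: color}
        self.frame_buffer = {}   # pixel index -> color

    def submit(self, pixel, color):
        """Accept one finished thread; return True if its set was flushed."""
        set_index = pixel // self.set_size   # assumed pixel-to-set mapping
        cache_set = self.pending.setdefault(set_index, {})
        cache_set[pixel] = color
        if len(cache_set) == self.set_size:
            self.frame_buffer.update(cache_set)
            del self.pending[set_index]
            return True
        return False


writer = FrameBufferWriter(set_size=4)
# Pixel 2 needs extra reflection passes, so it arrives last.
early = [writer.submit(p, "color") for p in (0, 1, 3)]
written_before = len(writer.frame_buffer)
late = writer.submit(2, "color")
print(early, written_before, late, len(writer.frame_buffer))
```

Grouping the writes this way lets threads finish out of order while the frame buffer still receives whole sets at a time.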
Architecture
● Surface stage
○ Processes the ray from the camera and finds its closest hit
○ Passes the hit information to the next stage
● Shadow stage
○ Casts a ray from the closest hit position to the light source
○ Passes the shadow information to the next stage
● Shade stage
○ Uses the closest-hit information to decide whether the color is final or reflection/refraction occurs
○ Casts a reflection/refraction ray and passes the data back to the surface stage
○ Recursively feeds back to the surface stage
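The feedback loop between the three stages can be modeled in a few lines of Python. Everything concrete here is an assumption for illustration: the scene is a pre-resolved lookup table instead of real intersection tests, colors are single grayscale numbers, shadowed surfaces are simply halved, and the bounce limit and 50/50 weighting are made up.

```python
MAX_BOUNCES = 2   # assumed limit so the feedback loop always terminates

# Hypothetical pre-resolved scene: ray name -> what that ray hits.
SCENE_HITS = {
    "camera": {"color": 200, "lit": True, "reflective": True, "next_ray": "bounce"},
    "bounce": {"color": 100, "lit": False, "reflective": False, "next_ray": None},
}


def surface_stage(thread):
    thread["hit"] = SCENE_HITS[thread["ray"]]   # closest hit of the current ray
    return thread


def shadow_stage(thread):
    thread["in_shadow"] = not thread["hit"]["lit"]
    return thread


def shade_stage(thread):
    """Accumulate color; return (thread, done). Not done means feed back."""
    hit = thread["hit"]
    local = hit["color"] * (0.5 if thread["in_shadow"] else 1.0)
    bounce = hit["reflective"] and thread["bounces"] < MAX_BOUNCES
    share = 0.5 if bounce else 1.0   # part of the remaining weight kept locally
    thread["color"] += thread["weight"] * share * local
    if bounce:
        thread["ray"] = hit["next_ray"]
        thread["weight"] *= 1.0 - share
        thread["bounces"] += 1
        return thread, False
    return thread, True


def run_thread():
    thread = {"ray": "camera", "color": 0.0, "weight": 1.0, "bounces": 0}
    done = False
    while not done:
        thread, done = shade_stage(shadow_stage(surface_stage(thread)))
    return thread["color"]


print(run_thread())
```

In this model the thread passes through the three stages twice: once for the camera ray and once for the reflection ray, after which the shade stage reports the color as final.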
Design Verification
Q & A
● Owen (fallingcat@gmail.com)
● https://github.com/fallingcat/HomebrewGPU
Thanks!
