Intel’s Larrabee

Vipin P. Nair
S7-EC
Roll no: 24CEK

Introduction
Larrabee is a multicore general-purpose graphics processing unit (GPGPU) that combines the functions of a multicore CPU and a GPU.

Larrabee is based on Intel’s x86 architecture.
Architectural convergence

Features
• Rasterization, depth testing and alpha blending are done entirely in software; texture filtering is handled by fixed-function texture units
• Binned renderer increases parallelism and reduces memory bandwidth
• Parallel processing for image processing, physical simulation, and medical and financial analysis
• GDDR5 RAM support
• Each core can deliver 32 GFLOPS at a 1 GHz clock (16 lanes × 2 operations per fused multiply-add), which gives several TFLOPS across the whole chip
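The per-core figure above can be checked with a little arithmetic. The sketch below assumes that one fused multiply-add counts as two floating-point operations and that a core issues one 16-wide vector FMA per clock. The core counts passed to `peak_tflops` are hypothetical, since the slides do not state how many cores a chip has.

```python
# Back-of-the-envelope peak throughput, under the assumptions stated above.
LANES = 16          # single-precision lanes in the vector unit (from the slides)
FLOPS_PER_FMA = 2   # assumption: a fused multiply-add = one multiply + one add
CLOCK_HZ = 1.0e9    # 1 GHz clock (from the slides)


def peak_gflops_per_core(lanes=LANES, flops_per_op=FLOPS_PER_FMA, clock_hz=CLOCK_HZ):
    """Peak single-precision GFLOPS for one core issuing one vector FMA per clock."""
    return lanes * flops_per_op * clock_hz / 1e9


def peak_tflops(num_cores, **kwargs):
    """Peak TFLOPS for a chip with num_cores cores (num_cores is hypothetical)."""
    return num_cores * peak_gflops_per_core(**kwargs) / 1000.0
```

With these assumptions `peak_gflops_per_core()` gives the 32 GFLOPS quoted above, and a hypothetical 64-core part would be needed to pass 2 TFLOPS.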

Differences from a CPU
• In-order execution (no out-of-order execution)
• Vector processing unit operates on 16 single-precision floating-point numbers at a time
• Texture sampling units: trilinear/anisotropic filtering and texture decompression
• 1024-bit ring bus between cores
• Cache control instructions
• 4-way multithreading

Differences from a GPU
• x86 instruction set with Larrabee-specific extensions
• Cache coherency across all its cores
• Z-buffering, clipping and blending without using graphics hardware

Architecture
• Cores communicate on a 1024-bit-wide ring bus
  – Fast access to memory, I/O interfaces and fixed-function blocks
  – Fast access for cache coherency
• L2 cache is partitioned among the cores
  – Provides high aggregate bandwidth
  – Allows data replication and sharing
• Optimized for highly parallel workloads using the vector processor
• In-order CPU core
• Separate scalar and vector units with separate registers

Vector unit: 16 32-bit ops/clock

In-order instruction execution

Direct connection to each core’s 256 KB subset of the L2 cache
Prefetch instructions load the L1 and L2 caches

Vector Unit
• Vector-complete instruction set
  – Scatter/gather for vector load/store
  – Mask registers select lanes to write, which allows data-parallel flow control
  – Masks also support data compaction
• Vector instructions support
  – Full speed when data is in the L1 cache
  – Fused multiply-add (three arguments)
  – Int32, Float32 and Float64 data
  – Can read 8-bit unorm, 8-bit uint, 16-bit sint and 16-bit float data and convert it to 32-bit floats or integers
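To make the masking idea concrete, here is a minimal software model of a 16-lane vector unit in plain Python. It is only an illustration of the concepts listed above (masked writes, gather/scatter, compaction, unorm conversion). All function names are hypothetical and are not Larrabee instructions or intrinsics.

```python
VLEN = 16  # lanes per vector register (from the slides)


def masked_fma(a, b, c, mask, dest):
    """Per lane: a*b + c where the mask bit is set; other lanes keep dest."""
    return [a[i] * b[i] + c[i] if mask[i] else dest[i] for i in range(VLEN)]


def gather(memory, indices, mask, dest):
    """Vector load from non-contiguous addresses, written only to active lanes."""
    return [memory[indices[i]] if mask[i] else dest[i] for i in range(VLEN)]


def scatter(memory, indices, values, mask):
    """Vector store to non-contiguous addresses, from active lanes only."""
    for i in range(VLEN):
        if mask[i]:
            memory[indices[i]] = values[i]


def compact(values, mask):
    """Pack the active lanes contiguously, preserving lane order."""
    return [v for v, m in zip(values, mask) if m]


def unorm8_to_float(byte):
    """Convert an 8-bit unsigned normalized value (0..255) to a float in 0..1."""
    return byte / 255.0


def negate_or_scale(x):
    """Data-parallel 'if x < 0: y = -x else: y = 2*x + 1' using two masked FMAs."""
    neg = [v < 0 for v in x]
    pos = [not m for m in neg]
    y = [0.0] * VLEN
    y = masked_fma(x, [-1.0] * VLEN, [0.0] * VLEN, neg, y)  # 'then' branch
    y = masked_fma(x, [2.0] * VLEN, [1.0] * VLEN, pos, y)   # 'else' branch
    return y
```

Both sides of the branch run over all 16 lanes, and the mask decides which lanes each side may overwrite. That is the sense in which masks give "data-parallel flow control".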

Fixed Function Logic
• Software takes the place of fixed-function logic for post-shader alpha blending, rasterization and interpolation
• Includes fixed-function texture filter logic
• Virtual memory for textures

Larrabee’s Binning Renderer
• Binning pipeline
  – Reduces synchronization
  – Front end processes vertex and geometry shading
  – Back end processes pixel shading, stencil testing and blending
  – Bin FIFO between them
• Multitasking by cores
  – Each orange box (in the slide figure) is a core
  – Cores run independently
  – Other cores can run other tasks, e.g. physics
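The front-end binning step can be sketched as follows. This is a simplified illustration, not Larrabee's actual implementation: it assigns each triangle to every screen tile its bounding box overlaps, which is conservative (a tile may receive a triangle that does not actually cover it). The 64-pixel tile size and the function name are assumptions.

```python
import math

TILE = 64  # tile edge in pixels; a hypothetical value, not taken from the slides


def bin_triangles(triangles, width, height, tile=TILE):
    """Front end: map (tile_x, tile_y) -> indices of triangles whose bounding box touches it.

    Each triangle is a sequence of three (x, y) screen-space points.
    """
    bins = {}
    for idx, tri in enumerate(triangles):
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        # Clamp the bounding box to the screen.
        x0 = max(math.floor(min(xs)), 0)
        x1 = min(math.floor(max(xs)), width - 1)
        y0 = max(math.floor(min(ys)), 0)
        y1 = min(math.floor(max(ys)), height - 1)
        if x0 > x1 or y0 > y1:
            continue  # entirely off-screen
        for ty in range(y0 // tile, y1 // tile + 1):
            for tx in range(x0 // tile, x1 // tile + 1):
                bins.setdefault((tx, ty), []).append(idx)
    return bins
```

Each bin can then be handed to the back end independently of the others, which is why different cores can work on different tiles without synchronizing.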

Back-end Rendering a Tile
• Orange boxes (in the slide figure) represent work on separate threads
• Three work threads do Z, pixel shading and blending
• Setup thread reads from bins and does pre-processing
• Combines task-parallel, data-parallel and sequential work
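A single-threaded sketch of the per-tile back-end work (depth test, then blending) is shown below. On Larrabee these stages are spread across threads and vectorized; this version only illustrates the logic. The fragment layout, the "smaller z is closer" depth convention and the source-over blend equation are assumptions made for the example.

```python
def render_tile(fragments, tile_w, tile_h, clear_color=(0.0, 0.0, 0.0), far=1.0):
    """Process already-shaded fragments for one tile, in submission order.

    Each fragment is (x, y, z, (r, g, b), alpha) with x, y local to the tile.
    Returns (color, depth) as row-major 2D lists.
    """
    depth = [[far] * tile_w for _ in range(tile_h)]
    color = [[clear_color] * tile_w for _ in range(tile_h)]
    for x, y, z, rgb, alpha in fragments:
        # Depth test: keep the fragment only if it is closer than what is stored.
        if z >= depth[y][x]:
            continue
        depth[y][x] = z
        # Alpha blend in software: result = alpha * src + (1 - alpha) * dst.
        dst = color[y][x]
        color[y][x] = tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(rgb, dst))
    return color, depth
```

Because the tile's color and depth buffers are small, they can stay in cache while every fragment in the bin is processed, and only the finished tile has to be written out to memory.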

Pipeline can be changed
• Parts can move between front end and back end
  – Vertex shading, tessellation, rasterization, etc.
  – Allows balancing computation against bandwidth
• New features
  – Transparency, shadowing, ray tracing, etc.
  – Each of these needs irregular data structures
  – It also helps to be able to “repack” the data

Transparency
Transparency with and without pre-resolve effects

Examples of using Tasks
• Applications
  – Scene traversal and culling
  – Procedural geometry synthesis
  – Physics contact group solve
  – Data-parallel strand groups
  – Distribute across threads/cores using the task system
  – Exploit core resources with SIMD
• Larrabee can submit work to itself!
  – Tasks can spawn other tasks
  – Exposed in the Larrabee Native programming interface (C/C++ compiler)
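The idea that tasks can spawn other tasks can be illustrated with a small work queue. This is a single-threaded sketch with hypothetical names. It is not the Larrabee Native interface, and a real task system would distribute the queue across threads and cores.

```python
from collections import deque


def run_tasks(root_tasks):
    """Run tasks until the queue drains; a task may return new tasks to enqueue."""
    queue = deque(root_tasks)
    executed = 0
    while queue:
        task = queue.popleft()
        spawned = task() or []   # a task returns an iterable of tasks, or None
        queue.extend(spawned)
        executed += 1
    return executed


def make_traversal_task(node, visited):
    """Hypothetical scene-traversal task: record a node, spawn one task per child.

    A node is a (name, children) pair.
    """
    def task():
        name, children = node
        visited.append(name)
        return [make_traversal_task(child, visited) for child in children]
    return task
```

For example, calling `run_tasks([make_traversal_task(scene, visited)])` on a small scene tree visits every node, and no code outside the tasks ever has to enumerate the tree.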

Scalability Studies
Based on memory bandwidth and texture filtering speed
Performance Breakdowns

Binning & Bandwidth Studies
Bandwidth
Immediate mode uses more bandwidth than binning:
  – 2.4 to 7 times more for F.E.A.R.
  – 1.5 to 2.6 times more for Gears of War
  – 1.6 to 1.8 times more for Half-Life 2: Episode Two

Conclusion
The Larrabee architecture opens a rich set of opportunities for both graphics rendering and throughput computing, and is an appropriate platform for the convergence of GPU and CPU.

Reference
• IEEE Digital Library: “Larrabee: A Many-Core x86 Architecture for Visual Computing”, Larry Seiler, Doug Carmean, Toni Juan (Intel Corporation), Jeremy Sugerman and Peter Hanrahan (Stanford University)
• IEEE Spectrum, January 2008
• ACM Transactions on Graphics, Article 18
• www.intel.com
• www.wikipedia.com
