Wednesday, October 15, 2008

Larrabee after SIGGRAPH 2008

I made a foundation post on Larrabee on my Flight Sim Blog last March here.

Since then SIGGRAPH 2008 happened and a good amount of Larrabee information was released.

The Larrabee hardware paper here.

The Beyond Programmable Shading course, with links to all the speakers' papers here.

The ACM Queue issue on GPUs here isn't Larrabee focused; it covers GPU architectures more broadly, and is interesting nonetheless.

Oh, and the Neoptica whitepaper that disappeared after the Intel purchase has reappeared here (thanks Aaron).

So what does this material tell us?

The Neoptica paper predicts a “brave new world” of heterogeneous processors and programmers using both CPU and GPU in a relatively equal way. It’s interesting from a historical perspective, but lacks the detail needed to draw any firm conclusions.

The Larrabee hardware paper describes the Larrabee hardware architecture in enough detail to decide it is very interesting. Larrabee is a bunch of CPUs with some graphics hardware. Working the problem from this angle gets us a few things:
1) CPU style memory behavior, e.g. caching
2) CPU style programming, e.g. no Shader Model 15, here is a C/C++ compiler
3) Flexibility due to parts of the system pipeline being in software

Another detail that might not be crystal clear is that Larrabee will be especially strong on compute resources. Will this change the typical graphics IHV advice (see the Beyond Programmable Shading ATI course) of 4 ALU ops per 1 texture op? TBD. But if so, that would enable some interesting things (mo betta physics, AI, etc) that the cycles do not exist for on today’s graphics hardware.

The Beyond Programmable Shading course is très intéressant, both for stimulating long term thinking and for practical things that could be lifted immediately. Long term it presents the 4 architectures and GPGPU programming models (traditional graphics IHV specific APIs eg CUDA/CAL, Larrabee, D3D11 ComputeShader, OpenCL) and it is worth ruminating over the similarities and differences. It is clear that thinking through your problem, comparing it to the kernel-stream paradigm, and moving in that direction can yield good software algorithm/architecture results. It is also clear that over the long run that may not be aggressive enough. Immediately useful tidbits include the depth-advance optimization presented in the id Software paper here.
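
For anyone who has not bumped into the kernel-stream paradigm before, here is a minimal sketch of the idea in Python. It is my own toy illustration, not code from any of the course material: the names saturate_kernel and run_kernel are made up, and a thread pool merely stands in for the parallel hardware. The point is that the kernel is a pure function of one stream element, so the runtime is free to process the elements in any order or all at once.

```python
from concurrent.futures import ThreadPoolExecutor


def saturate_kernel(x):
    """Hypothetical kernel: a pure function of a single stream element.

    It reads no shared state and writes nothing but its return value,
    which is what lets the elements be processed independently.
    """
    return min(max(x * 2.0 - 0.5, 0.0), 1.0)


def run_kernel(kernel, stream, workers=4):
    """Apply the kernel to every element of the input stream.

    The thread pool is only a stand-in for parallel hardware (an
    assumption for illustration). Executor.map returns results in
    input order, so the output stream lines up with the input stream.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(kernel, stream))


input_stream = [0.0, 0.25, 0.5, 0.75, 1.0]
output_stream = run_kernel(saturate_kernel, input_stream)
print(output_stream)
```

If your algorithm can be squeezed into this shape (independent elements, no hidden shared state), it is a decent candidate for any of the programming models above. If it cannot, that is usually where the interesting restructuring work starts.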

The ACM Queue issue has several good articles, and if you are considering parallelising your algorithm I highly suggest reading Chas’ article on Data-Parallel Computing.

And Kayvon’s blog is worth a gander if you like this sort of material.

For a lively discussion of parts of this material, see the Beyond3D forums here.

It is going to be an interesting couple of years, as we see the biggest change in GPU architecture since programmable shading. If this doesn’t get your mental juices flowing, check your pulse, as you may be DOA.