Wednesday, October 29, 2008

How much data to release, or the Goldilocks problem

Silicon vendors face a conundrum.

They must release information to ISVs about new parts, but how much?

Much like the Goldilocks fairy tale:
1) too little information, and ISVs don’t have the data to be successful;
2) too much information, and ISVs can hurt themselves (for instance, if future hardware changes a lot, ISV titles that are “over-optimized” for a specific architecture could perform worse);
3) just the right information, and ISVs have what they need.

Finding just the right level of data to share is a challenge, but getting it right is very valuable to the ISV community.

So what is known about Larrabee to date and how is Intel doing at this?

After the disclosures around the SIGGRAPH 2008 presentations, quite a few reviews popped up, like this one.

Salient points that are now known are:
• The Larrabee architecture has a scalar pipeline derived from the dual-issue Pentium processor, which uses a short execution pipeline with a fully coherent cache structure. The Larrabee architecture provides significant modern enhancements such as multi-threading, 64-bit extensions, and sophisticated pre-fetching.
• The Larrabee architecture enhances the x86 instruction set with new instructions, including wide vector processing operations and some specialized scalar instructions.
• Each core in the Larrabee architecture has fast access to its 256KB local subset of a coherent L2 cache.
• The Larrabee architecture specifies a 32KB instruction cache and a 32KB data cache for each core, with explicit cache control.
• The Larrabee architecture supports 4 execution threads per core, with separate register sets per thread.
• The Larrabee architecture gains computational density from a 3-operand, 16-wide vector processing unit (VPU), which executes integer, single-precision float, and double-precision float instructions.
• Task scheduling is performed entirely in software in the Larrabee architecture, rather than by fixed-function logic.
• The Larrabee architecture uses a 1024-bit-wide, bi-directional ring network (i.e., 512 bits in each direction) that lets agents communicate with each other with low latency.
• The Larrabee architecture supports full IEEE
• Larrabee programmers use high-level languages like C/C++ - no more Shader Model n.
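The 16-wide VPU in the list above is easiest to picture with a small emulation. This is a plain-Python sketch, not Larrabee's instruction set (those details aren't covered here); the names `vmadd` and `saxpy_16wide` are hypothetical, and the per-lane mask used to handle the ragged tail is an assumption of the sketch, a common SIMD technique rather than a documented Larrabee feature:

```python
VECTOR_WIDTH = 16  # lanes per vector instruction, matching the 16-wide VPU above


def vmadd(a, b, c, mask):
    """Emulated 3-operand multiply-add: lane i gets a[i] * b[i] + c[i] where
    mask[i] is set; masked-off lanes keep their old value c[i]."""
    return [ai * bi + ci if m else ci for ai, bi, ci, m in zip(a, b, c, mask)]


def saxpy_16wide(alpha, x, y):
    """Compute alpha * x + y, 16 elements per emulated vector op.

    The last chunk is usually partial, so a lane mask disables the lanes
    past the end of the arrays instead of running a scalar tail loop.
    """
    n = len(x)
    out = list(y)
    for base in range(0, n, VECTOR_WIDTH):
        lanes = range(base, base + VECTOR_WIDTH)
        mask = [i < n for i in lanes]
        # Fill every "register" with exactly 16 values; padded lanes are masked off.
        xv = [x[i] if i < n else 0.0 for i in lanes]
        yv = [out[i] if i < n else 0.0 for i in lanes]
        av = [alpha] * VECTOR_WIDTH  # scalar broadcast to all lanes
        result = vmadd(av, xv, yv, mask)
        for lane, i in enumerate(lanes):
            if mask[lane]:
                out[i] = result[lane]
    return out


print(saxpy_16wide(2.0, list(range(20)), [1.0] * 20)[:4])  # → [1.0, 3.0, 5.0, 7.0]
```

Twenty elements take two vector iterations: one full 16-lane op and one op with only 4 lanes enabled.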

Historically, Intel has done pretty well at this sort of disclosure for CPUs. For instance, cache size and cache organization data allow developers to optimize both data layouts and algorithms. The size and organization of Intel’s memory and cache architecture is regularly published for each CPU family, like this article for Nehalem, so developers know what they are dealing with. AMD is pretty good about this too; it is something the CPU IHVs have learned.

Contrast that with GPU IHVs. Think about the effort researchers have put into optimizing GPGPU algorithms like matrix multiplication here and here and here and here and here. Would data from the IHVs on GPU cache sizes have helped? Almost certainly. And that is on released products.
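To make the point about cache data concrete, here is a minimal sketch of cache blocking (tiling) for matrix multiplication, the kind of optimization those researchers had to tune without published numbers. It assumes the common heuristic that three T x T tiles should fit in cache, 4-byte floats, and a tile edge rounded down to a multiple of 16; `tile_size_for_cache` and `matmul_tiled` are hypothetical names, and pure Python is used only to show the loop structure, not for speed:

```python
import math


def tile_size_for_cache(cache_bytes, elem_bytes=4, granularity=16):
    """Largest square tile edge T (rounded down to a multiple of `granularity`)
    such that three T x T tiles, one each from A, B and C, fit in `cache_bytes`."""
    t = int(math.sqrt(cache_bytes / (3 * elem_bytes)))
    return max(granularity, t - t % granularity)


def matmul_tiled(a, b, tile):
    """C = A x B computed tile by tile so each tile's working set stays cache-resident."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for p0 in range(0, k, tile):
            for j0 in range(0, m, tile):
                # Multiply one tile of A by one tile of B and accumulate into C.
                for i in range(i0, min(i0 + tile, n)):
                    for p in range(p0, min(p0 + tile, k)):
                        a_ip = a[i][p]
                        for j in range(j0, min(j0 + tile, m)):
                            c[i][j] += a_ip * b[p][j]
    return c


# 256KB is the per-core L2 subset quoted in the Larrabee list above.
print(tile_size_for_cache(256 * 1024))  # → 144
```

With a published cache size, choosing the tile is simple arithmetic; without it, it becomes a search over candidate sizes on each new part.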

Considering that as a baseline for information disclosure, Intel has to be commended for sharing so much so soon. Here is hoping Intel continues to get the right information out at the right time.

Thursday, October 23, 2008

Brief History of recent HW Ray Tracing discussion in the graphics community

I wanted to spend a minute catching up on what I see as key points in the global discussion on where graphics hardware rendering is going.

Here are some links to capture what I see as key points in the historical discussion:


"Jensen: NVIDIA is the world leader of ray tracing solutions. Mental Images is the clear market leader of ray tracing renderers. And recently, we added to our capabilities with the RayScale team. At Siggraph, we demonstrated the world’s first real-time ray tracer running on CUDA. "


CUDA sample - the standing-room-only presentation from SIGGRAPH 2008 and NVISION08 that describes the how-to of the must-see experience from both shows: fully interactive, high quality ray tracing running in CUDA on the GPU


My SIGGRAPH Larrabee news coverage, here.


ATI does Ray tracing too


Ray Scale goes to nVidia

2008-April (Spring IDF)

Post-GDC clarification on Ars Technica-

"What I found is that the truth is out there—you just have to take a few steps back and look at the big picture. Intel's direct and indirect statements on Larrabee paint a pretty coherent picture, a picture that's only muddied when one starts parsing any one spokesperson's words too carefully. "

Tom Forsyth’s comments -

"There's no doubt Larrabee is going to be the world's most awesome raytracer...Raytracing on Larrabee is a fascinating research project, it's an exciting new way of thinking about rendering scenes, just like splatting or voxels or any number of neat ideas, but it is absolutely not the focus of Larrabee's primary rendering capabilities, and never has been - not even for a moment. "


"David Kirk, NVIDIA: I'm not sure which specific advantages you are referring to, but I can cover some common misconceptions that are promulgated by the CPU ray tracing community."

2008-Feb (GDC)

Discusses Daniel Pohl's GDC presentation about Q4 Ray Tracing


Mental Ray goes to nVidia


Rendering Games with Raytracing Will Revolutionize Graphics, discusses ray tracing benefits


Ray Tracing on a PS3


Ray Tracing Quake4 on a PC, Daniel Pohl's project

So I see this discussion as having several key points:

1) Larrabee is not “just” about raytracing. Larrabee will run traditional games on the traditional pipeline. It has to, since games won't switch on day one.

2) Ray tracing and other alternate rendering schemes are becoming possible in real time. It’s too early to tell where the industry will end up, but being freed from what I call the "tyranny of the triangle," and from being shackled to the traditional pipeline, can only be a good thing. The SGI-style pipeline has had a good run, but expecting it to be the "only way" forever just seems like very constrained thinking to me.

To support this, Larrabee also has a mode where you can take over the entire rendering pipeline and “do your own thing” like raytracing, deferred rendering, voxels, etc.

3) Hybrid approaches where the appropriate technique is used per object, or per material (like ray tracing for objects that act like a mirror) are clearly an option in the short term.

4) What I believe will happen is that, for the near term, hybrid renderers that mix polygon rendering and "x" rendering will be the most interesting option for game developers. And "x" can be ray tracing, voxels, or whatever developers come up with.

And in the long run, more general solutions to global illumination (like ray tracing, radiosity, etc.) will be the norm rather than the exception.

5) To get a good idea of where things can go, understanding GPGPU-style operations and kernel-stream processing lets you see these new classes of hardware as not just stream processors but algorithmic processors. To fully understand the potential, read my original Larrabee post on my old Flight Sim blog here, which links to a lot of the original papers. Especially check out the Stanford.EDU GPGPU papers like Brook.

Any algorithm, like ray tracing or deferred shading, that can be decomposed to have a phase that is “render a fullscreen quad and have a pixel shader run for each pixel” is easily parallelizable by the back-end driver and will just scale beautifully as the hardware gets more cores (stream processors).

And creating the data structures these algorithms require is being helped by better memory access on the GPU side, e.g. more general scatter-gather. On today’s hardware it's still harder than it will be on Larrabee to code up the ping-pong some of these algorithms need between the CPU side (data structure creation and traversal) and the GPU side (rendering), but even there the possibility exists today. Larrabee will just make it easier. And "the other guys" are on a trajectory to get there too.

6) This freedom to pick and choose is exciting and is where the biggest potential for large innovation is, at least in my opinion. Clearly an algorithmic sea change is coming. Maybe not all at once, but the fingerprints are there for those who want to play industry CSI.
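Point 5's claim that a "fullscreen quad with a pixel shader per pixel" phase parallelizes easily can be illustrated with a toy ray caster. This is a minimal sketch under stated assumptions: each pixel runs an independent kernel (`pixel_kernel`, a hypothetical name) that casts one primary ray at a single sphere, and a thread pool stands in for cores. Because CPython threads share one interpreter lock, it demonstrates the decomposition, not a speedup:

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 64, 48
SPHERE_CENTER = (0.0, 0.0, 3.0)  # sphere in front of a camera at the origin
SPHERE_RADIUS = 1.0


def pixel_kernel(x, y):
    """The 'pixel shader': cast one primary ray through pixel (x, y) and
    return 1 if it hits the sphere, else 0. Reads nothing but its own inputs."""
    # Map the pixel center onto an image plane at z = 1 spanning [-1, 1] x [-1, 1].
    dx = (x + 0.5) / WIDTH * 2.0 - 1.0
    dy = (y + 0.5) / HEIGHT * 2.0 - 1.0
    dz = 1.0
    cx, cy, cz = SPHERE_CENTER
    # Ray p(t) = t * d hits the sphere when |t*d - C|^2 = r^2 has a real root,
    # i.e. (d.C)^2 - (d.d)(C.C - r^2) >= 0. The camera is outside the sphere
    # and d.C > 0, so any real root has t > 0.
    d_dot_c = dx * cx + dy * cy + dz * cz
    d_dot_d = dx * dx + dy * dy + dz * dz
    c_dot_c = cx * cx + cy * cy + cz * cz
    disc = d_dot_c * d_dot_c - d_dot_d * (c_dot_c - SPHERE_RADIUS ** 2)
    return 1 if disc >= 0.0 else 0


def render_rows(rows):
    """One 'core' shades its assigned rows, one kernel invocation per pixel."""
    return [[pixel_kernel(x, y) for x in range(WIDTH)] for y in rows]


def render_fullscreen(num_cores):
    """Deal the screen's rows out round-robin to `num_cores` workers."""
    row_sets = [range(core, HEIGHT, num_cores) for core in range(num_cores)]
    image = [None] * HEIGHT
    with ThreadPoolExecutor(max_workers=num_cores) as pool:
        for rows, shaded in zip(row_sets, pool.map(render_rows, row_sets)):
            for y, row in zip(rows, shaded):
                image[y] = row
    return image
```

Because no pixel reads another pixel's result, the image is identical whether the rows are split across 1, 4, or 7 workers; adding cores only puts more rows in flight at once.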

Finally, I find the change in marketing tune over the last year quite interesting. At the beginning of the year, there were "misconceptions" about ray tracing. Now, the discussion is about who is "the clear leader." Which businesses have been acquired, and by whom, is also interesting. Plotting when they were acquired against the public talking points is an instructive exercise, and so is plotting when those talking points changed.

Another interesting observation: if newer hardware designs contain more "Larrabee-like" architectural features, supporting more general memory access and enabling CPU-style operations for data structure creation and traversal, can Intel be all that wrong?

Sunday, October 19, 2008

Evangelism is hard, Part II, or how the bogus “Larrabee is a 2006 GPU” claim came to be

Peter Glaskowsky made an interesting blog post on Larrabee. It commented on an Intel system of rating game hardware performance in terms of what I will call “Larrabee core equivalents” at certain GHz.

Now, overall I don’t want to comment on his thoughts about the Larrabee hardware, since that would require me to disclose details I am not yet ready to talk about. And Intel wouldn’t like it.

I may take the time to refute his ray tracing comments later, though, since I don’t believe he really understands polygon rendering and what freeing us from what I call “the tyranny of the triangle” would bring in terms of benefits. One thing he gets “partly” right is that turning polygons into pixels does require “triangle setup,” but that is only part of the story. At a minimum there is also vertex processing, interpolation, and pixel processing. So while that post is fair game, that's for later.

Back on topic, it is interesting what his comments have been turned into. Which is where I got my title about the bogus “Larrabee is a 2006 GPU” claim.

He even tried to correct the misperception. For which he gets major kudos.

But as usual with mud-slinging, the correction gets less coverage than the original fact-bending. See here for how far the mud-slinging disinformation got and how the correction didn’t get the same traction.

Lesson: don’t believe everything you hear or read without fully researching the sources. This also shows how some news gets more airtime than other news, go figure. Another example of how evangelism can be hard. 3D graphics can sometimes feel like a presidential campaign.

It’s going to be interesting to see what impact Larrabee makes on the industry over the next 3-5 years.

Evangelism is hard

Some recent events reminded me how hard evangelism and product creation are.

The hardest moments the DirectX evangelism team faced when I was there occurred early in my career, in the 1st year:
1) The John Carmack .Plan file Dec 23 1996.
2) The Chris Hecker Open Letter June 12 1997.
3) The Alex St John firing June 24 1997.
John’s missive happened when I had been on the job about 4 months, Chris’s open letter at about the 10-month mark, and then Alex’s demise at the 11-month mark in late June 1997.

It was a busy 1st year at MS for me, :-), and that was without the Hummer-ing through the construction sites on campus in Alex’s new toy; the Judgment Day II spaceship; or any of the other behind-the-scenes action.

These episodes taught me that if something can go wrong, it will; and later, the value of positive action in overcoming negative waves. For instance, it took 3 years of hard work and programmable shaders before D3D had proved to the world it did have a reason to exist.

So recently when the OpenGL crew had the shoe on the other foot my first reaction (to the surprise of many I am sure) was sympathy and a measure of identification.

I am talking about the OGL 3.0 foofaraw. Basically, the OGL powers-that-be announced a plan for a 3.0 release to clean many things up, went dark, changed the plan, and then appeared to be a bit surprised that their major unannounced change of plan wasn’t received well when it was finally announced. Any topic that spawns multiple threads, including ones that run to 170 pages of posts, is generating passion.

The lesson here is that even if it’s painful to disclose decisions as they happen, if you don’t, the negative energy that builds up will need an outlet eventually.

One good thing for OGL to come out of that discussion is the idea of an independent SDK and movement on making it happen, something OGL has long lacked.

OpenGL has a big problem, though: it isn’t cutting edge any more. That is a problem, but it also has advantages. It is a problem because innovation outside of extensions is slower, so OGL support for D3D10-level hardware is worse than D3D10's: there is no core API support for new units like Geometry Shaders, which are only available via extensions. It is an advantage because, through extensions, OpenGL in some respects supports D3D10-level hardware better: it is available on both XP and Vista, OGL gets full WDDM support on Vista, and OGL has no texture format changes that disadvantage previous-generation content. Following can also be an advantage, since OGL can decide to fix other rough edges as they become apparent, like the Pack/Unpack processor in OGL 2.0 (see sections 3.3 and 3.4) did for data type/range issues. This was a rough edge in D3D9 that was fixed in D3D10 with resource views.

It should be clear, though, that OGL 2.0 was a reaction to D3D9. You cannot be cutting edge and be in reaction mode, and you cannot simply depend on extensions to bail you out.

Extensions come with their own cost and are definitely a double-edged sword. Originally, if I recall, extensions were aimed at providing new functionality not in the core, and if they proved “worthy” they would become part of the new core. In practice they have been used to expose single-vendor features that never make it into the core. So OGL extensions may be good in one sense, but they are broken in another and can cause serious issues merely by their existence (see issue 2). You shouldn’t need dynamic memory allocation and a full parsing system to determine what your rendering APIs are. And a new driver shouldn’t break a game like that.
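Here is a minimal sketch of the parsing problem. The string stands in for what `glGetString(GL_EXTENSIONS)` returns, a single space-separated list; the helper names are hypothetical. A naive substring test gives false positives, which is why the usual advice is to match whole tokens. A related failure mode some older titles hit was copying the whole string into a fixed-size buffer, which broke once newer drivers reported longer lists.

```python
def has_extension_naive(extension_string, name):
    """Substring search: the classic mistake. "GL_EXT_texture" is found
    inside "GL_EXT_texture3D" even when the driver doesn't export it."""
    return name in extension_string


def has_extension(extension_string, name):
    """Exact match against the whitespace-separated tokens of the string,
    with no assumption that the string fits a fixed-size buffer."""
    return name in extension_string.split()


# Stand-in for the string glGetString(GL_EXTENSIONS) might return.
exts = "GL_ARB_multitexture GL_EXT_texture3D GL_EXT_texture_env_combine"
print(has_extension_naive(exts, "GL_EXT_texture"))  # → True (false positive)
print(has_extension(exts, "GL_EXT_texture"))        # → False
```

OpenGL 3.0 itself added an indexed query, `glGetStringi(GL_EXTENSIONS, i)` with the count from `GL_NUM_EXTENSIONS`, which avoids parsing one long string at all.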

In that sense it’s a good thing that D3D has held the line and not officially allowed extensions, even though there are unofficial ones if you know whom to ask and they decide to share with you. And I acknowledge that policy is a little less democratic than the OGL extension policy of being open to all comers.

Sigh. It’s just tough sometimes finding the golden mean. So to close this post, the OGL guys have my honest sympathy.

Saturday, October 18, 2008

More DirectX history, some people who deserve a shout-out

Since I am on a history kick I wanted to call out some people who have a place in the history of DirectX and don’t usually get a lot of mention.

I mentioned Craig Eisler’s DirectX history, so it’s only fair to link to something about Eric, and I include Alex for completeness.

The original three from RenderMorphics (Servan, Doug, and Kate) don’t get the credit they deserve. Doug was a monster at debugging, and he and Serv should be better known in the 3D world than they are. They are currently at Qube Software. I haven’t heard what Kate is up to in quite some time.

Steve Lacy was a very early addition to RenderMorphics and came to MS as part of the acquisition. Steve preceded me at Flight Sim after his time on the DX team and is now at Google.

Todd Laney was involved in Windows display drivers, Windows multimedia extensions, Windows multi-monitor support, DirectX, and Flight Sim. This is just one of the stories. I met him at the Windows multimedia extensions conference Dec 1990 and we have been friends ever since.

Andy Glaister has a history that is intertwined with DirectX’s, and he doesn’t get much mention at all. Andy, you should get out more :-).

Richard Huddy has been around and involved with RM and D3D for as long as I can remember.

Ken Nicholson is another unsung warrior in the trenches. I first ran into him when he was at ATI during the DX1-DX2 days.

Chas Boyd was the architect for D3D from D3D5 through D3D9 and deserves much credit for D3D’s programmable shading architecture.

Nick Wilt was the Dev Lead for D3D5 and DrawPrimitive at MS and for CUDA at nVidia.

Otto and also deserve a shout-out, they helped create the idea of XBox and get very little credit.

is a UK gamedev who did much of the engine work for Bullfrog ( Populous, Dungeon Keeper, etc ) back in the day.

This list is not meant to be exhaustive, so if I left you out, I apologize. Leave a comment and I will update the post.

Wednesday, October 15, 2008

Larrabee after SIGGRAPH 2008

I made a foundation post on Larrabee on my Flight Sim Blog last March here.

Since then SIGGRAPH 2008 happened and a good amount of Larrabee information was released.

The Larrabee hardware paper here.

The Beyond Programmable Shading course, with links to all the speakers papers here.

The ACM Queue issue on GPUs here isn't Larrabee-focused; it covers GPU architectures more broadly and is interesting nonetheless.

Oh, and the Neoptica whitepaper that disappeared after the Intel purchase has reappeared here (thanks Aaron).

So what does this material tell us?

The Neoptica paper predicts a “brave new world” of heterogeneous processors and programmers using both CPU and GPU in a relatively equal way. It’s interesting from a historical perspective, but lacks the detail needed to draw any firm conclusions.

The Larrabee hardware paper describes the Larrabee hardware architecture in enough detail to conclude that it is very interesting. Larrabee is a bunch of CPUs with some graphics hardware. Working the problem from this angle gets us a couple of things:
1) CPU-style memory behavior, e.g. caching
2) CPU-style programming, e.g. no Shader Model 15, here is a C/C++ compiler
3) Flexibility due to parts of the system pipeline being in software

Another detail that might not be crystal clear is that Larrabee will be especially strong on compute resources. Will this change the typical graphics IHV advice (see the Beyond Programmable Shading ATI course) of 4 ALU ops per 1 texture op? TBD. But if so, it would enable some interesting things (mo betta physics, AI, etc.) that today’s graphics hardware doesn't have the cycles for.
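The 4:1 rule of thumb is a ratio between two instruction counts, so a shader's balance can be estimated roughly from its instruction mix. A minimal first-order sketch, assuming a flat list of opcode names as input (a hypothetical format; real compilers report instruction counts differently) and treating the opcode set below as an illustrative assumption; it ignores latency hiding, bandwidth, and cache behavior:

```python
# Hypothetical opcode names treated as texture fetches (D3D9-style asm uses texld).
TEXTURE_OPS = {"texld", "sample"}


def alu_to_tex_ratio(opcodes):
    """ALU instructions per texture fetch in a flat list of shader opcodes."""
    tex = sum(1 for op in opcodes if op in TEXTURE_OPS)
    alu = len(opcodes) - tex
    return float("inf") if tex == 0 else alu / tex


def likely_limiter(opcodes, hw_alu_per_tex=4.0):
    """First-order guess at what bounds a shader on hardware that can issue
    `hw_alu_per_tex` ALU ops in the time of one texture op."""
    ratio = alu_to_tex_ratio(opcodes)
    if ratio < hw_alu_per_tex:
        return "texture-bound"
    if ratio > hw_alu_per_tex:
        return "ALU-bound"
    return "balanced"


shader = ["texld"] + ["mad"] * 8
print(likely_limiter(shader))                       # → ALU-bound
print(likely_limiter(shader, hw_alu_per_tex=16.0))  # → texture-bound
```

Raising `hw_alu_per_tex` is one way to model a part that is especially strong on compute: the same shader moves from ALU-bound to texture-bound, and the idle ALU time is the headroom that physics or AI work could use.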

The Beyond Programmable Shading course is very interesting, both for how it stimulates long-term thinking and for practical things that could be lifted immediately. Long term, it presents 4 architectures and GPGPU programming models (traditional graphics IHV-specific APIs e.g. CUDA/CAL, Larrabee, D3D11 ComputeShader, OpenCL), and it is worth ruminating over their similarities and differences. It is clear that thinking through your problem, comparing it to the kernel-stream paradigm, and moving in that direction can yield good software algorithm/architecture results. It is also clear that over the long run that may not be aggressive enough. Immediately useful tidbits include the depth-advance optimization presented in the id Software paper here.

The ACM Queue issue has several good articles, and if you are considering parallelising your algorithm I highly suggest reading Chas’ article on Data-Parallel Computing.

And Kayvon’s blog is worth a gander if you like this sort of material.

For a lively discussion of parts of this material, see the Beyond3D forums here.

It is going to be an interesting couple of years, as we see the biggest change in GPU architecture since programmable shading. If this doesn’t get your mental juices flowing, check your pulse, as you may be DOA.

DirectX history

I wanted to take a link I found recently and repost the material. It's a "History of DirectX" post I had forgotten I made back in 2004.

"Some one should store this somewhere for when it gets asked again :-).

Alex, Craig, and Eric burst onto stage at CGDC 1995 and showed a beta of the GamesSDK v1, with DDraw, DSound, DInput, and DPlay. The logo used was the radiation symbol, the internal code-name was "The Manhattan Project" and the technology was smoking hot compared to any previous Windows graphics API. They got a standing O, as no one had ever seen an MS library delivering refresh-rate graphics. It was quite a scene, and the BBQ and open rides at Great America contributed to the feeling that something great was happening.

In the summer of 1995, MS bought RenderMorphics with the intent of adding a 3D API to the GSDK. I know this because I was at Kaleida Labs, which was a source licensee of RL, and I was the 3D developer for ScriptX, both Mac and Windows. Thus Kaleida and I were notified about the transaction.

GSDK v1 shipped in Oct 1995, just after Windows 95 shipped in August 1995. The famous Judgment Day Halloween party, complete with Gwar in the Haunted House at the Redwest parking lot, celebrated the event.

The day after Judgment Day, "The Aftermath" was held to brief developers on D3D. At that time a beta of D3D was made available. The immediate mode API was always called D3D. The higher-level RL API was renamed to be D3D Retained Mode.

At some point a 2nd version of RL was indeed made available, but at the cost of being COM-ified. RL lost a bit of its performance with this architectural change. But D3D IM and accessing 3D HW is where the action was anyway.

DX 2.0 shipped in June 1996. The Games SDK was indeed renamed at this point, but still used the old radiation symbol logo. It contained updates to the original 4 APIs. D3D IM and RM were aimed at DX 3.0.

I arrived at MS on Alex’s team during Meltdown week August 1996. DX 3.0 shipped August "43" ( Sept 12 to the rest of us ) because Eric promised it would ship in August. D3D IM with execute buffers and D3D RM were in that release.

The release plan Alex, Craig, and Eric had outlined before the community at CGDC 96 and again at Meltdown Aug '96 showed: DX 3 in Aug 96, DX 4 in Dec 96, DX 5 in June 97. DX 4 was basically a bug fix release to make sure DX 3 actually worked on real 3D HW, since the Voodoo 1 HW release was going to be after DX 3 shipped. Similar fix-ups have happened since. After a lot of discussion, saner heads prevailed and convinced the 3 that a release right before Christmas would be a bad idea in terms of consumer satisfaction and game developer sales, since there would be little testing and a high likelihood of bad experiences and returns. So the question was what to do about release numbers. Since a DX 5 release in 97 was already on tap and had been discussed publicly, it was decided to skip DX 4 and go to DX 5. That way the community got the expected release in 97.

At the same time, both Japanese and German geographies provided feedback that the radiation symbol logo was not appropriate for those geographies. Kevin Dalles was product manager at the time (before Kevin Bachus) and produced the 4-arm logo over winter 96-97. The logo change was unrelated to any release vehicle.

EDIT: Craig provides a similar account of history here, but with more details from the inside, since I joined between DX2 and DX3 (hey, I did ship DX2 games at Dynamix). Between DX3 and DX5, Craig, Alex, and Eric went off to do the Chrome browser that used the 3D hardware for rendering. Way cool, but it was too far ahead of its time, and the team wanted to charge for it.

DX 5 was basically the DrawPrimitive release for D3D and did ship summer of 1997 and is what I used for my Unreal and 3DSMax ports. If I recall correctly, Force Feedback DInput was in this release too.

DX 6 was basically multi-texture, with the fixed function multi-texture cascade, from the D3D perspective. Released 1998.

DX 7 was basically hardware transform and light from the D3D perspective. Released 1999. This was the first full release where DMusic was in the SDK, even though it was initially released in the 6.1 "update".

DX 8 was the 1st rev of programmable shading for D3D, but was very limited for pixel shaders. Very. Just as a note, DPlay actually got a lot better in this release. At this point I became the DX SDK PM. Released 2000.

DX 8.1 was an intermediate step from the D3D perspective, with support for an advanced pixel shader model, 1.4, which was better than the 1.1 in D3D8 but still not very general. Released 2001.

DX 9 was the SM2.0 release with really good pixel shaders, compatibility with DShow, the last unified toolkit release, and was in my opinion the high-water mark of DirectX. I have an internal politics post that I may decide to make some day to discuss what has happened to the integrated multimedia toolkit that was DirectX, but I digress. Released Dec 2002.

DX 10 is really Direct3D 10 and is only about 3D. It is also Vista-only, which was a huge break with the past. The API churn was certainly huge, the flip of the texture formats disabled previous-generation D3D content, and Geometry Shaders and some of the other interesting bits of D3D10 have not really fulfilled their promise yet; time will tell if they end up doing so.

Now we await D3D11 in Windows 7 and are expecting it to also appear in Vista. ComputeShader, Tessellator, SM 5.0, and a few other bits are the E-ticket rides.

As a retrospective flashback, Alex started appearing down in the valley at Ken Nicholson's "GamePC Consortium" meetings (which I attended) in 1994 and was talking about graphics acceleration, "Funstones" as a benchmark to measure graphics performance, etc. - all a huge hint he was cooking something up back in Redmond.

Second Hand Smoke captures some of that perspective. Although I have to say WinG wasn’t that laughable, just limited to 8-bit modes, which wasn't enough. DDraw accelerated all modes by comparison, which made it way more valuable.

And the perspective on how RL became D3D IM is a bit limited, in that it’s accurate enough for RM but doesn't hit enough data points for the IM API.

And D3D definitely was not "jumped on" by 3Dfx, ATI, and nVidia. 3Dfx did all they could to counter-evangelize Glide. ATI and nVidia were helpful, true. Nor is Rich Seidner's "Crushed by MS" totally true - I worked with Rich at Kaleida and thus know quite a bit about what really happened.

So don't read Omits' article or any of these articles and believe all of it.

All that and I didn't mention the spaceship at all :-)”

Someone did store the original of this post away here, thanks to Keith Ditchburn. I just touched it up to bring it up to date, and found the links to Omit’s and Rich’s articles. And I still haven't discussed the spaceship story. :-).

1st Post, who am I and what is FutureGPU.Org?

So who am I? See .

So it's late on my 7th day (Day 2 of Week 2) at my new gig at Intel in the Larrabee space, and I have to admit I was missing blogging. My Flight Sim blog got me hooked.

Now I have my own piece of real estate on the web, and I am going to use it to post my thoughts and musings on things GPU, graphics, and gaming. Maybe not in that order, and maybe occasionally other topics as well. Who knows, maybe I will even get to the DirectX spaceship story.

No I will not talk about any Intel secrets. But I do have a rather unique perspective given my history in the industry and can share that without compromising what I know that isn't public knowledge.

And once things become public knowledge, then I can certainly talk about them. And there are lots of topics that are public. DirectX 11, OpenGL 3.0, various CPU and GPU roadmaps, speculation that is wrong about Larrabee - these are all fair game. :-).

Before I go, I have to say my 1st 7 days at Intel have impressed me already. This is an organization that is serious about engineering. For instance, the approach to meetings is just spot on, taught to everyone, and seems more intentionally organized than at Microsoft. From perusing other blogs, some feel that Intel values like "constructive confrontation" are not used with the same vigor as in the past, but I'll have to make my own observations. And so far the new location isn't as horrid as the switch from single office to cube might sound. Far from it. All in all a good start!