Silicon vendors suffer a conundrum.
They must release information to ISVs about new parts, but how much?
Much like the Goldilocks fairy-tale:
1) too little information and ISVs don’t have the data to be successful,
2) too much information and ISVs can hurt themselves (for instance, if future hardware changes significantly, ISV titles that are “over-optimized” for a specific architecture could perform worse),
3) just the right info and ISVs have what they need.
Thus, finding just the right level of data to share can be a challenge, but doing so is very valuable to the ISV community.
So what is known about Larrabee to date and how is Intel doing at this?
After the disclosures around the SIGGRAPH 2008 presentations, quite a few reviews popped up, like this one.
Salient points that are now known are:
• The Larrabee architecture has a scalar pipeline derived from the dual-issue Pentium processor, which uses a short execution pipeline with a fully coherent cache structure. The Larrabee architecture provides significant modern enhancements such as multi-threading, 64-bit extensions, and sophisticated pre-fetching.
• The Larrabee architecture enhances the x86 instruction set with the addition of new instructions, including wide vector processing operations and some specialized scalar instructions.
• Each core in the Larrabee architecture has fast access to its 256KB local subset of a coherent L2 cache.
• The Larrabee architecture specifies 32KB for the instruction cache and 32KB data cache for each core, with explicit cache control.
• The Larrabee architecture supports 4 execution threads per core, with separate register sets per thread.
• The Larrabee architecture gains computational density from a 3-operand, 16-wide vector processing unit (VPU), which executes integer, single-precision float, and double-precision float instructions.
• Task scheduling is performed entirely in software in the Larrabee architecture, rather than in fixed-function logic.
• The Larrabee architecture uses a 1024-bit-wide, bi-directional ring network (i.e., 512 bits in each direction) to allow agents to communicate with each other in a low-latency manner.
• The Larrabee architecture supports full IEEE floating-point arithmetic.
• Larrabee programmers use high-level languages like C/C++ - no more Shader Model n.
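To make that last point concrete, here is a minimal sketch of what "C/C++ instead of Shader Model n" could look like in practice. This is not Larrabee's actual instruction set or any published intrinsics (those details were not public at the time); it is ordinary portable C written so that a vectorizing compiler could map the inner loop onto a 16-lane VPU like the one described above, with the leftover elements falling back to the scalar x86 pipeline. The function name and the `VPU_WIDTH` constant are invented for this illustration.

```c
#include <stddef.h>

/* Width of the hypothetical vector unit; matches the 16-wide VPU
 * described in the disclosure, but the constant itself is ours. */
#define VPU_WIDTH 16

/* SAXPY (y = a*x + y) structured as a vector-friendly main loop plus a
 * scalar remainder. A vectorizing compiler can turn the inner lane loop
 * into a single 16-wide vector operation per iteration. */
void saxpy(float a, const float *x, float *y, size_t n)
{
    size_t i = 0;

    /* Main loop: 16 lanes per trip, no cross-lane dependencies. */
    for (; i + VPU_WIDTH <= n; i += VPU_WIDTH)
        for (size_t lane = 0; lane < VPU_WIDTH; ++lane)
            y[i + lane] = a * x[i + lane] + y[i + lane];

    /* Remainder: handled one element at a time on the scalar pipeline. */
    for (; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

The point is not the kernel itself but the programming model: this is plain C that any developer can read, debug, and profile, with no shader language or driver compiler in between.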
Historically, Intel has done pretty well at this sort of disclosure for CPUs. For instance, cache size and cache memory organization data allow developers to optimize both data layout and algorithms. The size and organization of Intel’s memory and cache architecture are regularly published per CPU family, like this article for Nehalem, so developers know what they are dealing with. AMD is pretty good about this too; it is something the CPU IHVs have learned.
Contrast that with GPU IHVs. Now think about the effort researchers put into optimizing GPGPU algorithms like matrix multiplication here and here and here and here and here. Would data from the IHVs on GPU cache sizes have helped? Almost certainly. And that is on released products.
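As a sketch of why published cache sizes matter for exactly this kind of work, consider cache blocking in a matrix multiply. The tile size below is derived from the disclosed 256KB per-core L2 subset (three square float tiles must fit: 3 × B² × 4 bytes ≤ 256KB gives B up to roughly 140, so a power-of-two 64 leaves comfortable headroom); without a published cache size, that number is a guess found by trial and error. The function name and `TILE` constant are ours.

```c
#include <stddef.h>

/* Tile dimension chosen so three TILE x TILE float tiles (one each of
 * A, B, C) fit well inside a 256KB L2 subset with room to spare.
 * This is the knob that vendor cache disclosures let you set rationally. */
#define TILE 64

/* Blocked multiply of n x n row-major matrices: C += A * B.
 * Caller must zero C first. Each tile of A, B, and C is reused many
 * times while it is resident in cache, instead of streaming the full
 * matrices through memory on every pass. */
void matmul_blocked(const float *A, const float *B, float *C, size_t n)
{
    for (size_t i0 = 0; i0 < n; i0 += TILE)
        for (size_t k0 = 0; k0 < n; k0 += TILE)
            for (size_t j0 = 0; j0 < n; j0 += TILE)
                for (size_t i = i0; i < i0 + TILE && i < n; ++i)
                    for (size_t k = k0; k < k0 + TILE && k < n; ++k) {
                        float a = A[i * n + k];
                        for (size_t j = j0; j < j0 + TILE && j < n; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```

The exact loop order and tile size would be tuned per part, which is precisely the point: without the cache numbers, researchers reverse-engineer them; with them, the optimization is straightforward.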
Considering that as a baseline for information disclosure, Intel has to be commended for sharing so much so soon. Here is hoping Intel continues to get the right information out at the right time.