by Jon Stokes
In a series of announcements and conference calls, culminating in a recent analyst meeting, AMD has been slowly revealing pieces of the big picture regarding where they plan to take their platforms in the coming years. I've been following the coverage, and I've put together a synthesis of it below.
Fusion and Torrenza: die-level vs. board-level
AMD's long-term plan appears to go as follows. First, they'll push Torrenza as a platform for doing board-level integration with specialized coprocessors. This way, system builders will have the option of tailoring systems to particular workloads by changing the mix of processors that inhabit the coherent HyperTransport sockets on a motherboard.
So for instance, for a four-socket motherboard, one customer may want two stream processors and two multicore general-purpose processors (i.e., Opteron or Athlon), while another may want one Java + XML coprocessor and three multicore general-purpose processors. Thus, system builders can mix and match to get the best performance per watt per dollar for the type of workload that their customers typically run.
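To make the mix-and-match idea concrete, here is a minimal Python sketch of how a system builder might pick a socket mix for one workload. Everything in it is an assumption for illustration: the part names, the throughput, wattage, and price figures, and the scoring rule (total throughput divided by total watts times total dollars) are all invented, not AMD data or a real sizing method.

```python
from itertools import combinations_with_replacement

# Hypothetical per-part figures: (relative throughput on a streaming-heavy
# workload, watts, dollars). Invented for illustration, not vendor data.
PARTS = {
    "general_purpose": (1.0, 95.0, 300.0),
    "stream": (4.0, 120.0, 500.0),
    "java_xml": (0.5, 60.0, 400.0),
}


def score(mix):
    """Return throughput per watt per dollar for a tuple of part names."""
    throughput = sum(PARTS[p][0] for p in mix)
    watts = sum(PARTS[p][1] for p in mix)
    dollars = sum(PARTS[p][2] for p in mix)
    return throughput / (watts * dollars)


def best_mix(sockets=4, min_general_purpose=1):
    """Enumerate every way to fill the sockets and keep the best-scoring mix.

    At least `min_general_purpose` sockets must hold a general-purpose part,
    on the assumption that something has to run the operating system.
    """
    candidates = [
        mix
        for mix in combinations_with_replacement(sorted(PARTS), sockets)
        if mix.count("general_purpose") >= min_general_purpose
    ]
    return max(candidates, key=score)


if __name__ == "__main__":
    print(best_mix())
```

Swapping in a different throughput column (say, one where the Java + XML part dominates) changes the winning mix, which is the whole pitch: the board stays the same and the socket population follows the workload.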
Fusion and die-level integration
AMD also plans to pursue the same mix-and-match strategy at the level of the processor die by offering an array of heterogeneous multicore parts that fit different workloads. A "quad-core" processor from AMD might have two general-purpose cores and two specialized processor cores (e.g., a stream processor and a Java + XML coprocessor), one general-purpose core and three specialized cores, and so on. AMD refers to these specialized computing cores as application processing units (APUs), and they plan to develop an array of such application-specific, modularly designed cores that can be dropped onto a die and fabbed at minimal cost.
In addition to modular processor cores that can be mixed and matched to suit different application types, AMD plans to make other parts of the processor modular as well. Specifically, the processor will host one or more HyperTransport modules and other I/O interfaces, a memory controller, different levels and amounts of cache, and so on.
All of this die-level mixing and matching goes by the name of Fusion, with the result that Fusion and Torrenza are essentially the same idea, but at the die level and board level respectively.
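One way to see the parallel is to model both levels with the same kind of structure. The sketch below is purely illustrative: the `Die` and `Board` classes and the core names are hypothetical, and it assumes that heterogeneous Fusion-style dies would sit in Torrenza-style sockets, which is a reading of the combined strategy rather than anything AMD has specified.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Die:
    """Die-level (Fusion-style) mix: names of the cores on one chip."""
    cores: tuple


@dataclass(frozen=True)
class Board:
    """Board-level (Torrenza-style) mix: one Die per populated socket."""
    sockets: tuple

    def core_mix(self):
        """Count cores of each kind across every socket on the board."""
        return Counter(core for die in self.sockets for core in die.cores)


# A heterogeneous "quad-core" die and a dedicated single-purpose coprocessor.
hetero = Die(("general_purpose", "general_purpose", "stream", "java_xml"))
stream_only = Die(("stream",))

# A four-socket board populated with two of each.
board = Board((hetero, hetero, stream_only, stream_only))

if __name__ == "__main__":
    print(board.core_mix())
```

The same question ("what mix of units suits this workload?") gets answered twice, once when choosing what goes on each die and once when choosing what goes in each socket.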
AMD would like to use the Fusion/Torrenza combo to escape an n-core race with Intel. Intel's approach to the nascent n-core race, in which the company starts out with package-level integration before moving to die-level integration, leaves it perpetually ahead of AMD in terms of cores per socket, so it's easy to see why AMD doesn't want an n-core race to replace the MHz race.
So will AMD's flexible, mix-and-match, Fusion/Torrenza approach beat Intel's brute-force multicore approach over the medium term? It's really hard to say at this point, but the question is worth thinking about from a historical perspective.
Integration and reincarnation
To understand the ebb and flow of the modern (post-integrated circuit) history of computing, you need to know three fundamental rules that govern the evolution of computing systems:
1. Moore's Law: die-level integration is cheaper than board-level integration, and it gets ever cheaper as transistors shrink.
2. The abstraction vs. specialization tradeoff: modularity and abstraction can be cheaper (depending on the amount of overhead involved), but specialization is almost always faster.
3. Sutherland's "wheel of reincarnation": generally speaking, it's cheaper to do things with a combination of software and general-purpose hardware than it is to do them with specialized hardware, but sometimes the market demands faster instead of cheaper (see Rule #2). So functionality will move from the processor die to a dedicated coprocessor for speed's sake, and then back on to the processor die when the coprocessor becomes bloated with features and too expensive.
Let me take these three rules one at a time, and discuss their implications for AMD's overall strategy.
Rule #1 above may not look like the traditional formulation of Moore's Law, but if you've read my article on Moore's Law then you know that the traditional formulation doesn't look much like what Moore actually said. In fact, I think that my formulation here captures Moore's larger point, which was a point about the economics of integration, better than the traditional "transistor densities double every 18 months" phrase does.
At any rate, this rule is the number one factor working against AMD's QuadFX initiative, which is basically a pitch to sell board-level integration using coherent HyperTransport and PCIe. The overwhelming economic advantages of die-level integration will forever relegate platforms that rely on board-level integration to more specialized niches. These specialized niches are profitable, but the really huge money is in the mass market, and the mass market has an endless appetite for die-level integration.
So the result of rule #1 for AMD's grand strategy is that Torrenza is a niche server/workstation technology, while Fusion could be used in everything from the high end of the market all the way down to mobile devices. The economics of microprocessors dictate that this is and will always be the case.
Rule #2 may mean that Fusion-based parts are generally slower and cheaper than whatever specialized hardware Intel is producing at the moment. I say this because Fusion's basic idea of a library of modular functional blocks that are mixed and matched for different implementations seems to imply heavy use of automated layout tools, with less of the hand customization that suits a particular process and pushes clockspeed. So the question for Fusion is, how much performance- and profit-killing overhead is implied in the modular design, and is the tradeoff worth it in terms of cost?
My point here is that Fusion is interesting mainly as a way to improve performance per dollar (and per watt), not raw performance. Note that this characterization is as true for servers as it is for, say, mobile phones. Leaving coprocessors on the system board (i.e., Torrenza) will probably make for better raw performance in most cases.
The other, closely related issue with such modular designs (software or hardware) is that they're forward-looking, with the idea being that an up-front investment in modularity and flexibility will pay off over a projected long-term scenario in which flexibility is important. But what if flexibility turns out not to be very important? Specifically, what if there are really only two or three types of functionality that will ever be worth putting on a processor die?
Put differently, AMD has pitched this idea of an APU as a sort of placeholder, and invited all of us to imagine many different types of specialized coprocessors that people will want to put on the CPU die. To help jump-start our imaginations, they've offered up the GPU as an example of functionality that's headed for the processor die in the very near future. Then, they have a few more minor, "hey, someone might do this" examples, like Java + XML processors and physics processors. But what if it turns out to be the case that the GPU (or, rather, a generic stream processor) is really the only specialized kind of coprocessor that's worth putting on the CPU die for the foreseeable future? What if it turns out that there's essentially no such thing as an "APU," because in the real world all "APUs" are just stream processors?
This last point brings me to rule #3, the wheel of reincarnation. Sutherland's observation that functionality moves back and forth between the main microprocessor and specialized, off-die hardware was originally made with respect to graphics processing. And indeed, graphics processing is really the only domain I can think of where the observation has consistently held true. So it's not insane to think that GPU functionality is ultimately the only thing that will continue to make that transition.
But maybe I'm wrong, and there are other specialized coprocessors that could profitably make the jump onto the CPU die. (Notice the word "profitably," because I'm sure there are some that could do so unprofitably, where the market wouldn't reward the time spent developing and implementing them.) If you can think of some other examples, post them in the discussion thread.