OK, I have read a lot of reviews now. Some things are clearer now.
I suppose I overreacted a bit in my previous blog post. Zambezi runs hot, but overall it's not a slow chip. It performs rather well in MT applications. It does have some weaknesses which AMD must correct. Some of the weaknesses are not solely AMD's fault, but GloFo's too.
So this is what, in my humble opinion, AMD must focus on in the future (think Piledriver and Steamroller):
1) First and foremost, AMD must invest heavily in its relationship with developers. They must hire a brand new team of young and motivated guys who will literally go out and help developers maximize the potential of the Bulldozer design. This first iteration is just that: the first. It has some flaws which AMD will try to fix, and hopefully succeed. But the underlying design, which is truly revolutionary, will need GOOD software support in order to give the best performance to end users. This means FMA4, XOP, BMI and the rest will need to be properly supported in future multimedia desktop workloads. Notice I'm speaking about the DESKTOP space here. Server has no such need, since recompiling is the norm there.
2) AMD must improve cache performance, especially L1 and L2 writes. This is a major bottleneck and it shows its ugly face in many workloads. AMD is aware of this, and hopefully Piledriver has at least somewhat better write performance in these two levels of cache. L3 looks fine, even better than fine. It is much faster than the L3 on Thuban.
They also need to work on improving the FP unit. It may be great in FMA4 stuff, but it's much less impressive in legacy SSE or AVX128 workloads. Maybe expanding it a bit and enlarging the buffers could help. Single-thread performance is nowhere near what this thing SHOULD be capable of, so there must be a bottleneck somewhere, since in numerous SIMD workloads it's no faster than a single K10 core (and its 128b unit).
3) AMD must twist GloFo's arm very hard and very fast. Not only is their 32nm production yielding many defective Llano parts (which is truly a shame, since most of the time it's the GPU that is broken and then it's not an APU any more), but now they can't break the 3.6GHz barrier on a design that was SPECIFICALLY DESIGNED FOR CLOCK speed (while it does have some IPC improvements in certain areas too). The original goal set by AMD was a 30% clock uplift at the same power draw as the previous design. We get this ONLY in limited Turbo mode now. We should have 125W Zambezi parts clocked at 4.1GHz stock, with 4.7GHz half-core turbo and 4.5GHz full-core turbo. This Zambezi would effectively be 12% faster than the 8150 at the same power draw. This Zambezi would also allow AMD to use an SMT-style core affinity scheme and release a patch for Windows 7 that would send threads to separate modules first, and only then to the second core of each module. The performance uplift ranges from 5% to a massive 40% in some cases, averaging around 15-20% depending on benchmark selection.
So what we need is a 95W 3.6GHz FX8150, a 125W 4GHz 8170 and a 4-4.2GHz 125W 8270 (Piledriver).
This lineup would hold off SB and IB, at least in the mid and mid-high performance segments, without many problems.
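To put rough numbers on the clocks from point 3, here is a quick back-of-the-envelope sketch. The FX8150 clocks used here (3.6GHz base, 3.9GHz full-core turbo, 4.2GHz half-core turbo) are my assumption of its stock settings and are not taken from the text above; the wished-for part uses the 4.1/4.5/4.7GHz figures. It only compares clocks, so it silently assumes performance scales linearly with frequency, which is optimistic.

```python
# Assumed stock FX8150 clocks in GHz (an assumption, not from the text above).
FX8150 = {"base": 3.6, "full_turbo": 3.9, "half_turbo": 4.2}

# The hypothetical 125W Zambezi described in point 3.
WISHED = {"base": 4.1, "full_turbo": 4.5, "half_turbo": 4.7}


def clock_uplift(old, new):
    """Return the percentage clock gain of `new` over `old` for each mode."""
    return {mode: round((new[mode] / old[mode] - 1) * 100, 1) for mode in old}


uplift = clock_uplift(FX8150, WISHED)
for mode, pct in uplift.items():
    print(f"{mode}: +{pct}%")
```

Under these assumptions the half-core turbo gain comes out at roughly 12%, with base and full-core turbo a few points higher.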
4) AMD should work closely with MS and release a patch to the Windows scheduler. As shown in the link I provided above, the performance uplift is not small: a very nice 15-20%.
The trade-off is power draw, though. All of this is explained well in the great review by hardware.fr.
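The "modules first" idea can be sketched in a few lines. This is only an illustration of the placement order, not how the Windows scheduler actually works. The function names are made up for this example, and I'm assuming core IDs are numbered so that cores 2m and 2m+1 share module m.

```python
def module_first_order(num_modules=4, cores_per_module=2):
    """Return core IDs in the order a module-first policy would fill them.

    Assumes (hypothetically) that cores m*cores_per_module + slot
    belong to module m.
    """
    order = []
    for slot in range(cores_per_module):      # first core of each module, then second
        for module in range(num_modules):
            order.append(module * cores_per_module + slot)
    return order


def place_threads(num_threads, num_modules=4, cores_per_module=2):
    """Pick the cores for `num_threads` busy threads, one per module first."""
    return module_first_order(num_modules, cores_per_module)[:num_threads]


print(place_threads(4))  # [0, 2, 4, 6]: one thread per module
print(place_threads(6))  # [0, 2, 4, 6, 1, 3]: two modules now share
```

With four threads every module runs a single thread and nothing is shared inside a module, but all four modules stay awake, which is where the extra power draw comes from.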
So there you have it. Bulldozer is not what we expected, but it's not a complete failure either. It's a solid chip which will shine in future applications, which are moving towards multiple threads. Single-core speed, while still important, is not the main selling point any more. Those who want good single-core performance along with great MT performance (but still slower MT performance than the FX8150) can pick the 2500K. It's currently the best chip from Intel from a perf./$ POV. The 8150 is not as good, but very close! It needs a 10% shave off its MSRP and AMD may sell a sh*t load of these things :).