CPU Test - the results

ppowellaa ( ) posted Thu, 28 March 2002 at 9:08 AM

P4 1.5 GHz, 512 MB SDRAM, using Poser 3: 44 sec.

mjtdevries ( ) posted Thu, 28 March 2002 at 9:49 AM

Don't hold your breath waiting for code to be optimized for the P4. For most code it will never give a noticeable performance gain, or it is much too time-consuming or too difficult to implement. Only in very rare cases will it give a real performance boost. It is the same story as with MMX, ISSE, ISSE2, 3DNow! etc. MPEG encoding and some specific Photoshop filters are indeed examples of those rare cases where you can get a huge performance boost. But just as with MMX and ISSE, those are only a very small part of current applications. There is no reason why the P4 optimizations would suddenly work so much better than the P3, PPro or Pentium MMX optimizations. (BTW, Photoshop is THE benchmarketing application. It was used to show how fantastic MMX was, how fantastic a Mac is/was, how fantastic an Athlon or a P4 is. You can twist Photoshop results any way you want.)

I wonder how you have determined that the P4 is so great for 3DsMax. In all the tests I have seen, the P4 performs nicely in viewports, but its final rendering performance is pretty sad.

Also, the good performance of the P4 in MPEG encoding is mainly because Intel's programmers are much better than the programmers who wrote the initial code. Now that AMD programmers have also helped with the codec, the Athlon is once again (a little bit) faster than the P4. So there goes another myth about the P4.

A few things to consider when reading Jim's results:

Don't look at memory above 256MB. It is not used, so it doesn't influence the benchmark results. (It is different in more complex scenes of course, so memory is still important for Poser.)

CPU usage is not 100% during rendering. On my machine I only got 100% CPU usage during the first 6 seconds, when the shadow map was created. Something else is the bottleneck here. I suspect it has to do with the temp file that is created, but I don't have definite proof of that. There might also be other factors, such as whether the dancing man is turned off.

It would be interesting to learn more about the systems that somehow perform better or worse than similar systems. For example: the 24 seconds of the 2000+ systems is so much faster that it can't be caused by the CPU alone. 68 seconds for an 1800+ is so bad that something else must be holding it back. In general P4 systems seem to perform badly compared to Athlon and P3 systems. But that doesn't mean the CPU is slow. Since the CPU isn't used 100% during rendering anyway, other factors in the P4 systems, such as RDRAM instead of DDR RAM, different IDE controllers, and so on, have to be looked at as possible causes too. And maybe Poser REALLY likes a powerful FPU. That will remain the weak spot of the P4.

duanemoody ( ) posted Thu, 28 March 2002 at 2:16 PM

To recap what he's saying above: the redesign of the P3 into the P4 involved removing some fairly basic math instruction subprocessors and emulating them with less efficient processors, so that those processes now take more time to execute. Since most of the compilers out there for x86 were built around optimization based on the existence of those subprocessors, the resulting compiled code isn't as efficient as it's supposed to be. The Tom's Hardware article I'm paraphrasing went on to say that the benefits of the PPC RISC architecture have been hamstrung by the overhead, which is why they're not as fast as they could be compared to the x86/Pentium CISC legacy.

mjtdevries ( ) posted Fri, 29 March 2002 at 2:10 AM

It's not just that the compiled code isn't as efficient as it's supposed to be. It is also quite simply that the P4 has a weak FPU, regardless of the code. Now, if the P4 were the only CPU your code ran on, that wouldn't have to be that big of a problem: instead of using the FPU you can often use ISSE2 instructions. But that poses two major problems for programmers:

1) You have to manually optimize your program for the P4 instructions, and you can't do that in a high-level language. That means a lot of extra time, money and expertise is needed for something which was not and is not needed for any other processor.

2) You have to make a program that uses different code for the P4 and for other x86 processors. That means more time and money is needed. It also means that the program becomes more complex, more difficult to maintain, and will have more bugs.

Lots of companies don't even want to write for multiple operating systems (Win32 and MacOS), let alone for multiple CPUs. Wherever possible they will write one piece of code in a high-level language and let the compiler make a Win32 and a Mac version. If a compiler is able to make some optimizations all by itself, that is fine. (But again: that is not enough for the P4.)
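To make point 2) concrete, here is a minimal Python sketch of what shipping two code paths looks like. It is not code from Poser or any real product: the kernel names, the feature-set argument and the pair-at-a-time loop are all hypothetical. The pure-Python loop only mimics the way an SSE2 routine handles two 64-bit doubles per instruction, and real feature detection would query the CPU (CPUID) rather than take a set of strings.

```python
from __future__ import annotations

from typing import Callable, Sequence

Kernel = Callable[[Sequence[float], float], list]


def scale_generic(values: Sequence[float], k: float) -> list[float]:
    """Hypothetical generic path: one value at a time (think plain x87 code)."""
    return [v * k for v in values]


def scale_two_wide(values: Sequence[float], k: float) -> list[float]:
    """Hypothetical 'SSE2-style' path: handles values in pairs, mimicking
    SSE2's two 64-bit lanes, and needs a separate case for a leftover element."""
    out: list[float] = []
    n = len(values)
    for i in range(0, n - 1, 2):  # main loop over pairs
        out.append(values[i] * k)
        out.append(values[i + 1] * k)
    if n % 2:  # odd length: one extra element that is easy to get wrong
        out.append(values[-1] * k)
    return out


def pick_kernel(cpu_features: set[str]) -> Kernel:
    """Choose a code path at run time from a (hypothetical) set of CPU feature names."""
    return scale_two_wide if "sse2" in cpu_features else scale_generic


# Both paths must give identical results, so both must be written and tested.
data = [1.0, 2.0, 3.0]
p4_result = pick_kernel({"mmx", "sse", "sse2"})(data, 2.0)
athlon_result = pick_kernel({"mmx", "sse"})(data, 2.0)
```

The extra tail case and the need to keep two implementations in lockstep are exactly the kind of added complexity and bug risk the post describes.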

soulhuntre ( ) posted Fri, 29 March 2002 at 3:07 AM

Wow... I posted a long reply before but it looks like it got eaten. I'll recap it in shorter form :)

Basically, from the information I have, you are incorrect on both points. There are numerous sources indicating that the primary requirement to benefit from the P4 is to recompile your code with a compiler that knows how to do it. In fact the folks at Discreet did exactly that to get the performance version of Max and saw a 5-30% increase right off the bat.

Their results indicate that there has been NO performance decrease under other processors as a result of this change.

That about covers it. Recompile with an optimizing compiler. Doesn't seem like a big problem, and there is a version of the compiler for Linux that shows similar speed increases.

So, to recap for my points:

All the software I use that needs the boost is already optimized, and those optimizations are part of the primary branch and will continue in future releases. This leaves Poser as the only exception... but I can't imagine that Poser 5 won't be optimized when it is trivial to do so.
The 3D display library (DirectX) most used by the games we develop and play here benefits from the P4 optimizations. Quake for instance is dramatically faster on the P4. Serious Sam is not, but that's due mostly to its rather "old school" rendering engine.
The application software we are developing for clients is all under .NET on both the server and client side. The applications that can benefit do so automatically.

I have seen nothing that indicates that this is a nightmare of hand optimization in assembly. In fact all the information I have seen indicates that it is a simple compiler flag. Interestingly, not only is Max faster with the optimization, but more accurate (the internal format used now is 80 bits).

Personally, I am not concerned about the Athlons... I want everyone to be fast :) It's fine by me. But the P4 is not a bad chip and it has some advantages. Like most things in the tech world, when teams of smart people attack a problem there are trade-offs. For me, the P4 fits my uses better :)

mjtdevries ( ) posted Fri, 29 March 2002 at 5:30 AM

I'm sorry, but that bit of marketing on the website doesn't support your claims: they didn't just recompile with an Intel compiler. They worked together with Intel personnel to change the DLLs. (If recompiling for P4 optimizations is such a trivial task, why did they need to work with Intel to do it?) Indeed, they recompiled some DLLs, but they also MODIFIED key DLLs so that they now make use of SSE. You don't have to be a rocket scientist to understand that that is where most of the performance gain comes from.

BTW, a 5 to 30% performance increase isn't even all that much. Simply using a better compiler can already give you 10% (and I'm not talking about P4 optimizations here). And Intel has very talented programmers who have realised such performance gains just by optimizing code, without making use of any specific P4 or SSE features.

Also, I wonder how much the P3 and Athlon XP benefit from that package. The modifications to the DLLs involved SSE, which is useful for the P4 but also for the P3 and Athlon XP, which have exactly the same SSE functions. Although the pack is marketed as a P4 pack, the text clearly indicates that it was made to improve both the P3 and the P4.

Now your other points:

- The programs you have with some sort of optimization (one that's actually noticeable, and not just marketing) are very likely all much more expensive than Poser4 or Poser5. For Lightwave, Cinema4D, or 3DsMax it is much easier to hire Intel for some help, and they probably have more experienced coders to start with anyway. For relatively low-cost programs like Poser the situation is quite different.

- Quake3 is a very interesting case. The GAME is very much optimised for the P4 (again with help from Intel programmers). The performance of the P4 with the game is stellar. But the ENGINE apparently doesn't have that many optimizations, because in all other games that use the Quake3 engine the P4 doesn't perform nearly as well as with the Quake3 game. In fact in most of those games it loses to the Athlons.

- .NET has very little to do with CPU optimizations at all.

Don't get me wrong. The P4 isn't a bad CPU, but it was mainly developed from a marketing standpoint (people are easily fooled by high MHz numbers), it simply isn't better than the competition, and on top of that it is more expensive than the competition. I don't see any advantages in its design. It can shine in niche products where SSE2 can be used effectively. But support in programs is rare, and by the time lots of programs use it, the competition will have SSE2 too. (That's not new. It happened with MMX and SSE too.) In cases where it can't profit from SSE2 it is often slower than the competition.

I buy a CPU for the programs I use NOW and not for future optimizations. Right NOW programs use SSE optimizations at best. Next year there might be lots of programs that use SSE2, but by that time I will want to buy a new CPU anyway, and the competition will have SSE2 support then too. Right NOW, for Poser4 the P4 is clearly not the best choice. Whether you care about that is another question, especially since other factors are clearly very important to Poser4 performance too. We'll see what happens when Poser5 finally arrives. (If the P4 is still used by that time ;-))

soulhuntre ( ) posted Sat, 30 March 2002 at 3:49 AM

Hmmm... OK :)

"I'm sorry, but that bit of marketing on the website doesn't support your claims:
They didn't just recompile with an Intel compiler. They worked together with Intel personnel to change the DLLs. (If recompiling for P4 optimizations is such a trivial task, why did they need to work together with Intel to do it?)"

I don't see anything on there that indicates hand optimization - nor do I find any references on the 'net to hand optimization being necessary for P4 optimization. I would be happy to look over any references you might have to that being a requirement for significant gains.

"BTW 5 to 30% performance increase isn't even all that much. Simply using a better compiler can already give you 10%."

I'll take it, thanks :)

"And Intel has very talented programmers that have realised such performance gains just by optimizing code without making use of any specific P4 or SSE parts."

Intel just plain has some talented people. That is why I am always amazed when folks assume those same talented folks are being purely driven by their marketing department into making silly choices :)

"Those programs you have that have some sort of optimization (that's actually noticable, and not just marketing) are very likely all much more expensive than Poser4 or Poser5."

Well, yeah. That was sort of my point. For the higher-end uses that we put these systems to, the P4 is a good choice for us... the chip is fast and the quality is good, it comes from vendors we like, and the optimizations exist in all the software we use where it would matter.

BTW - that is not the only circumstance for the gains. There has been a bit of a stir in the Linux community because some projects are advocating the use of the Intel compiler over the GNU compiler set because of its superior performance... especially on P4 chips.

The P4 is a chip with a LONG pipeline and some interesting instruction choices - it needs to be paired with a compiler that knows what to do to make it hum along - fortunately Intel has built one.

"For relatively low-cost programs like Poser the situation is quite different."

The stuff I see on the net and the experiences of a number of developers say something else: that significant performance gains are possible simply with a recompile.

"Quake3 is a very interesting case. The GAME is very much optimised for the P4. (Again with help from Intel programmers). The performance of the P4 with the game is stellar. But the ENGINE apparently doesn't have that many optimizations. Because in all other games that use the Quake3 engine, the P4 doesn't perform nearly as well as with the quake3 game. In fact in most of those games it loses to the Athlons."

Yup, it is interesting :) Especially because Quake 3 is completely bound by its rendering engine for performance. There is no AI to speak of and almost no housekeeping done. The performance is totally tied to the graphics engine. The fact that it performs so well on the P4 is a useful pointer to what is possible with that chip and a smart development team.

The fact that those folks who license the engine manage to sacrifice that advantage speaks of them more than the chip :)

".NET has very little to do with CPU optimizations at all."

Actually, this is not entirely the case. .NET software (we are developing a number of programs under it) makes heavy use of the runtime environment and libraries provided by MS. Since those are P4 aware and optimized, the software that sits on them is also optimized that way. This would be true for an Athlon-optimized version if one existed.

As more software is developed on this framework, more software will see the benefits. Simple :)

"I buy a CPU for the programs I use NOW and not for future optimizations."

So do I :) And the programs we use most often are P4 aware. The performance is on par or faster than the Athlon parts and the systems come from vendors we trust (a whole other discussion). As there is (for us) no disadvantage to running the P4 we are happy to do so.

As for the future, we keep systems in service for years... I see no reason not to bet on the industry leader and take the advantages as they come later, in addition to the ones we get now :)

"Right NOW for Poser4 the P4 is clearly not the best choice."

While I can agree that there seems to be a slight advantage to the Athlon XP at a similar clock speed for Poser, the benchmark also (as you mentioned) shows that Poser is extremely sensitive to other issues - the wide disparity in performance across machines of similar CPU is a critical clue.

Long before I would worry about my brand of CPU for Poser I would worry about the other factors that seem to be so important to it.

"We'll see what happens when Poser5 finally arrives."

Given the dramatic changes in the rendering system and the soft body dynamics that have been hinted at, I think we can see that Poser 5 will have to be a more grown-up piece of software. They will no longer be able to ignore the hardware acceleration available for previews, and they will be using good compilers for the system.

Me, I would LOVE to see Poser 4 compiled with the Intel compiler... just to see what happens :)

mjtdevries ( ) posted Sun, 31 March 2002 at 9:08 AM

I'll try to make my reply short this time :-)

"I don't see anything on there that indicates hand optimization - nor do I find any references on the 'net to hand optimization being necessary for P4 optimization. I would be happy to look over any references you might have to that being a requirement for significant gains."

The webpage you referred to has the proof: "In addition to being recompiled with the Intel compiler, the following DLLs have had key performance functions optimized using SSE (Streaming SIMD Extensions). Pentium III class processors and higher will be able to take advantage of these optimizations. Rend.dlr Blur.dlv" I think you'll agree that rend.dlr would be the most likely candidate if you want to modify a DLL to improve performance. I wouldn't even be surprised if you could gain up to 30% from just those 2 DLLs, without touching the rest.

"Intel just plain has some talented people. That is why I am always amazed when folks assume those same talented folks are being purely driven by their marketing department into making silly choices :)"

I never said it was a silly choice. It has been a very smart move by Intel, and AMD still has a lot of problems trying to inform people that MHz alone doesn't say anything about performance. It would have been a silly choice if the impact of that design decision had been so large that they could not keep up with AMD. Right now the situation is that both have equally powerful CPUs. But also remember that those talented folks still want to be paid. And that sometimes means that you have to accept what management decides and try to make the best of it.

Lastly, I'll counter your story about .NET with past experience with DirectX. DirectX got optimized for ISSE and later for 3DNow! Did we ever notice anything? Did you ever notice that the ATI and Nvidia drivers got optimized for them? No! Those generic optimizations will hardly ever give really big performance gains. You need to make optimizations specifically for your application to really get performance gains. That has been proven time and again. Take a look at Quake. Only when game developers get help from Intel (or AMD) are really big performance gains realized. Without help from Intel they can't accomplish it. The same goes for the special performance pack for Max.

BTW, most performance gains I have seen for CPUs have been geared towards the use of SSE and SSE2 (or 3DNow!), with MPEG in Windows Media Player being the best example. The P4 did very well in those benchmarks, and people thought that was because of the P4 architecture. Later it was discovered that Media Player didn't activate the SSE optimizations on the Athlon XP CPU. The moment those were activated, the Athlon got just as much of a performance gain as the P4, and the Athlon beat the P4 again in that department. That's why I am also interested in the performance gain from that pack when used with the P3 and the Athlon XP, since the most important modifications seem to be geared towards SSE and not just the P4.

Hmm, it seems I didn't manage to make a short reply after all.... ;-)

One last remark. I don't agree with you that there is such a wide disparity in performance across machines with similar CPUs. Most systems perform as expected and are grouped in clusters. I see only a few exceptions:
- XP 1800+, 68 seconds. As I said before, I really wonder what is holding that system back.
- Dual 1700+. Not really an exception, since a dual configuration adds overhead, and the AGP implementation on dual systems is not that good.
- The 1GHz Athlon systems seem out of place. Maybe they are Slot A Athlons instead of Thunderbirds?
But differences in having the dancing man activated or not should also be considered.

BTW, I would also love to see Poser4 compiled with the Intel compiler. But I would like to see two compilations: one with P4 optimizations turned on, and one with normal settings but still with the Intel compiler. Just to see the difference between the Intel compiler and another compiler, and the difference between different settings within the compiler. I remember IBM made the Windows 3.1 inside OS/2 10% faster just by taking the source code from Microsoft and compiling it with their own compiler. They didn't modify any file for that, and it didn't use any CPU-specific settings either.
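The suspicion that most of the Max gain comes from the two SSE-modified DLLs is essentially Amdahl's law: speeding up one part of a program helps only in proportion to the share of run time that part accounts for. A minimal Python sketch; the 60% share and the 1.5x local speedup are made-up illustration values, not measurements of rend.dlr or of 3ds max.

```python
def amdahl_speedup(fraction: float, local_speedup: float) -> float:
    """Overall speedup when `fraction` (0..1) of the run time is made
    `local_speedup` times faster and the rest is unchanged (Amdahl's law)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Hypothetical: the renderer DLL takes 60% of render time and becomes 1.5x faster.
overall = amdahl_speedup(0.60, 1.5)  # about 1.25, i.e. roughly 25% faster overall

# Ceiling for the same 60% share, even with an infinitely fast DLL: about 2.5x.
ceiling = 1.0 / (1.0 - 0.60)
```

Under these assumed numbers, optimizing one hot DLL alone lands inside the quoted 5-30% range, which is the shape of the argument in the post.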

mjtdevries ( ) posted Sun, 31 March 2002 at 9:11 AM

Is there some rule that prevents long replies from being posted 9 times out of 10.....? Took me quite a lot of tries to post the above....

soulhuntre ( ) posted Sun, 31 March 2002 at 2:45 PM

BTW - thanks for the discussion - it's nice to be able to discuss these things without it turning into zealotry and anger :)

"The webpage you refered to has the proof:"

Optimized in the source code, in an HLL, is not the same thing as the claimed requirement to hand optimize the assembly. Surely you can agree to that?

"It has been a very smart move by Intel and AMD still has a lot of problems trying to inform people that Mhz alone doesn't say anything about performance."

I disagree that the interest was purely clock speed. I think the reality is that Intel sees some benefits in the long pipeline architecture. Those benefits mean that the P4 performs as well as its competition (though it is more expensive; the Intel parts usually are... people will pay for the higher quality of the brand), and they mean that the chip has a long, improvable life in the architecture.

The P4 is an evolutionary, not revolutionary chip - but it does well and certainly is no slouch :)
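Both posters agree that clock speed alone doesn't determine performance. The usual first-order model is throughput = clock frequency x average instructions per clock (IPC). A minimal Python sketch; the clock/IPC pairs below are invented to show how a higher-clocked chip with lower IPC can still complete fewer instructions per second, and are not measured P4 or Athlon figures.

```python
def mips(clock_mhz: float, ipc: float) -> float:
    """Millions of instructions per second for a clock in MHz and an average IPC."""
    return clock_mhz * ipc


# Invented example values, not benchmarks:
high_clock = mips(1700, 0.9)  # long pipeline, high clock, lower IPC: about 1530
low_clock = mips(1400, 1.2)   # shorter pipeline, lower clock, higher IPC: about 1680
```

Real IPC varies per program (x87-heavy code versus SSE2 code, for instance), which is why the same two chips can swap places from one benchmark to the next.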

"DirectX got optimized for ISSE and later for 3DNow! Did we ever notice anything?"

I certainly noticed when DirectX got the optimizations for the processors of the time, MMX was it? SSE? Whatever :) And I notice now when I have to compress things. So these optimizations DO matter. SSE2 is a good thing... a very good thing. Yeah, AMD will be able to copy it, they have before, but I don't mind supporting the folks who are doing the job of getting it supported.

"That's why I am also interested in the performance gain with that pack when used with the P3 and the AthlonXP. Since the most important modifications seem to be geared towards SSE and not just P4."

I'd be happy to see it as well, since at the moment the Athlons don't support SSE2 as I recall. If the optimizations use that, then there should be a dramatic difference. I am all for seeing how this all turns out :)

I looked and looked out there, and I couldn't find a benchmark that had the P4 and the Athlon XP both with and without the performance update under Max as the benchmark. Very disappointing.

I would REALLY like to know what else Poser is dependent on :(

Anyway, thanks again!

OH, for long posts? I have noticed that Renderosity threads "time out". If I read a thread then take a while to compose a reply, it is best to "refresh" the thread page before I actually enter the reply. So I compose it in another editor. When I do that it almost always actually posts.

duanemoody ( ) posted Sun, 31 March 2002 at 3:06 PM

Much to my displeasure I learned this morning that Apple's 9.0 installer refuses to install the AltiVec extensions if it detects an off-brand G4 update (like my newertech MAXpowr G4). Fortunately I also learned that Apple's own Tome Viewer is a can opener for their installers and bypasses this nonsense, extracting the four necessary extensions. However, the improvement in rendering time is not colossal: 468 seconds instead of 476, or 1.7% faster. I'd like to go back and create some extension sets that eliminate most of the stuff Poser isn't using and report again.
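The 1.7% figure checks out. "x% faster" can mean either the share of time saved or the increase in speed; for a change this small both round to the same value. A short Python sketch using the timings from this post (the function names are only illustrative):

```python
def time_saved_pct(old_s: float, new_s: float) -> float:
    """Percentage of the original render time that was saved."""
    return (old_s - new_s) / old_s * 100.0


def speedup_pct(old_s: float, new_s: float) -> float:
    """Percentage increase in rendering speed (renders per second)."""
    return (old_s / new_s - 1.0) * 100.0


# 476 s without the AltiVec extensions, 468 s with them.
saved = time_saved_pct(476, 468)  # about 1.68%
faster = speedup_pct(476, 468)    # about 1.71%
```

For large changes the two definitions diverge noticeably (halving the time saves 50% but doubles the speed), so it is worth saying which one a benchmark quotes.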

duanemoody ( ) posted Sun, 31 March 2002 at 4:38 PM

Poser requires Apple's OpenTransport library (presumably for the direct link to Curious), so disabling the internet components is not an option. After experimenting with disabling other system components, results were poor: 473 seconds. My guess is that Poser's mathematical operations don't call on the AltiVec hardware much. Of course, my machine is hardly typical: it's a souped up 7500, so USB support is software, and Poser is residing on an external drive because the Mac's still sporting the 1.5G SCSI internal it came with.

mjtdevries ( ) posted Mon, 01 April 2002 at 3:01 AM

"Optimized in the source code, in a HLL is not the same thing as the claimed requirement to hand optimize the assembly. Surely you can agree to that?"

I'll grant you that it doesn't mean you have to optimize in assembler. But a compiler won't change source code. So any optimization in that source code has to have been done by a software engineer, and therefore it counts as hand-optimized to me.

Indeed, the Athlon XP doesn't support SSE2 yet. (The next generation CPU will.) But although the performance pack is called a P4 pack, its description only talks about SSE and about improving the P4 AND P3. If the modifications to the source code had been for SSE2 and had only benefited the P4, I'm convinced they would have mentioned SSE2 instead of just SSE.

About the bottlenecks for Poser4: as you have also seen, memory doesn't make much difference in this benchmark. You just need 256MB to make sure performance isn't degraded by swapping. But whether you have 1GB or 256MB doesn't matter at all in this test.

When I render, I see that the CPU is at 100% while the shadow maps are created. The shadow maps aren't big here, so that is just 6s of my 35s render time. After that my CPU usage is about 70% and I hear the hard disk. I assumed Poser writes the rendered image to a temporary file, and I thought that might be the bottleneck. But I have done the test with Performance Monitor running and I don't see any bottlenecks at all. Pages/sec is low (1.6), and so is the average disk write queue (0.397 max). Disk write bytes/sec doesn't exceed 400KB/s. So it's not the disk that is holding things back. It's not the CPU, nor the amount of memory. What's left?

I think there are more things like the dancing man holding things back. I didn't have him disabled this time, and I couldn't see that in the performance counters. There are probably more things like that, which slow it down without us being able to point to a specific bottleneck.
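The Performance Monitor observations can be turned into a rough estimate of how long the CPU sits idle during the 35-second render. A minimal Python sketch, assuming the roughly 70% utilisation holds steady for the whole 29 seconds after the shadow maps and a single-CPU machine; the two-phase list is a simplification of what the post describes, not a measured trace.

```python
from __future__ import annotations


def cpu_busy_seconds(phases: list[tuple[float, float]]) -> float:
    """Sum of duration * utilisation over (seconds, utilisation 0..1) phases."""
    return sum(duration * utilisation for duration, utilisation in phases)


# Shadow maps: 6 s at 100%; the rest of the 35 s render at roughly 70% (assumed constant).
phases = [(6.0, 1.0), (29.0, 0.70)]
total = sum(duration for duration, _ in phases)  # 35 s
busy = cpu_busy_seconds(phases)                  # about 26.3 s
idle = total - busy                              # about 8.7 s waiting on something else
```

Under those assumptions roughly a quarter of the render is spent waiting, which is consistent with the post's point that neither a faster CPU nor more memory would remove that time on its own.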

soulhuntre ( ) posted Mon, 01 April 2002 at 4:56 AM

"Indeed AthlonXP doesn't yet support SSE2. (Next generation CPU will) But although the performance pack is called a P4 pack, the explanation about it only talks about SSE and talks about improving the P4 AND P3."

I guess until we get some comparative benchmarks we are at a standstill on this - guessing about code we can't see :)

"So it's not the disk that is holding things back. It's not the CPU nor the amount of memory. What's left?"

A good question! I am curious why a scene with 2 Millennium figures (Vicky and Steph), complex hair and a UD light setup renders faster (MUCH faster) than that benchmark did. Maybe a second benchmark file is needed as another data point?

mjtdevries ( ) posted Tue, 02 April 2002 at 1:11 AM

That was probably a scene rendered in a different program? In that case there are so many variables to take into account that it's impossible to judge whether the other program has a more efficient renderer, or maybe renders less, or the light setup was simpler, etc. Light can make a LOT of difference. Still, performance can differ a lot between programs. The raytracers of Bryce and Vue seem extremely slow compared to Cinema4D. (I haven't done much with it, just a bit of playing around. I'm having lots of problems with transmaps in San Francisco hair. Maybe that's why it is faster too. Who knows? :-)) You could try a second benchmark for Poser with hi-res Vicky and hi-res shadow maps, although I don't expect much difference. I think you would mainly see much higher memory demands, but as long as you have enough memory, not much relative difference from the current list. If you make a Poser scene and put it in Free Stuff, I guess we'll find enough people to give it another try. And this time we'll have to make sure that everybody has the dancing man turned off :-)

soulhuntre ( ) posted Tue, 02 April 2002 at 5:08 AM

"If you make a Poser scene and put it in free stuff I guess we'll find enough people to give it another try. And this time we'll have to make sure that everybody has the dancing man turned off :-)" Actually it was Poser... maybe it is the light setup. I'll see if I can find time to set up a test file sometime soon.

Jim Burton ( ) posted Tue, 02 April 2002 at 1:12 PM

Gee, I like the dancing man! I'd suggest a few fewer characters though; I think the one I put up was bigger than what most people work with most of the time. I was also surprised to see my hard drive light come on (after I hooked it up!). With 768 MB RAM I thought it would run in memory, but maybe Poser always has to open all the files.

mjtdevries ( ) posted Wed, 03 April 2002 at 4:21 AM

I guess the dancing man won't make much difference for larger renders, just for the short ones. 5 seconds is a lot when the total render time is 44 seconds, but nobody will care about 5 seconds out of a total of 440. I saw the hard disk light although I have 1GB, but that is not caused by swapping or anything like that. During rendering Poser writes the image to a temporary file on the hard disk, and that is what you see. (In a performance log you will see that the hard disk activity is just writing.) You could try one or more Vicky figures with high-res textures and large shadow maps. It would focus the benchmark more on textures and less on polygons. I wouldn't use Stephanie or Mike, since a lot of people won't have those figures. A lot of people do have Vicky, though. (Which texture to use might be a problem. Is there a high-res texture in Free Stuff?)
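The dancing-man point is about a fixed cost: a constant number of seconds weighs heavily on a short render and hardly at all on a long one. A small Python sketch using the 5, 44 and 440 second figures from this post, assuming the 5 s overhead stays the same regardless of scene size:

```python
def overhead_share_pct(overhead_s: float, total_s: float) -> float:
    """Share of the total render time taken by a fixed overhead, in percent."""
    return overhead_s / total_s * 100.0


short_render = overhead_share_pct(5, 44)  # about 11.4% of a 44 s render
long_render = overhead_share_pct(5, 440)  # about 1.1% of a 440 s render
```

A difference of that size on the short benchmark is larger than some of the gaps between CPUs in the results list, which is why the post asks that everyone turn the dancing man off next time.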

soulhuntre ( ) posted Wed, 03 April 2002 at 10:36 AM

Is there any way to alter the path of the temporary file? I am tempted to point it at a ramdisk and see if that helps :)

mjtdevries ( ) posted Thu, 04 April 2002 at 12:32 AM

I've been thinking along those lines too :-) I think Poser just uses the %temp% variable. I don't know how to point that to a ramdisk in W2K/XP. Then again, the disk write queue and bytes/sec I measured are not high at all, so I wonder how much it would help anyway.
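Whether Poser 4 really takes its scratch location from %temp% is only a guess in this post. As an illustration of how that convention usually works, here is a Python sketch using the standard library's `tempfile` module, which picks its directory from the TMPDIR, TEMP and TMP environment variables before falling back to platform defaults. (The Win32 `GetTempPath` call follows a similar convention, checking TMP and then TEMP.) The ramdisk is simulated by an ordinary temporary folder, and the helper name `temp_dir_with` is hypothetical.

```python
import os
import tempfile


def temp_dir_with(ramdisk_path: str) -> str:
    """Hypothetical helper: report the temp directory a program using Python's
    tempfile module would pick if TMPDIR pointed at `ramdisk_path`."""
    saved = os.environ.get("TMPDIR")
    os.environ["TMPDIR"] = ramdisk_path
    tempfile.tempdir = None  # drop the cached value so gettempdir() looks again
    try:
        return tempfile.gettempdir()
    finally:
        # Restore the original environment and clear the cache once more.
        if saved is None:
            del os.environ["TMPDIR"]
        else:
            os.environ["TMPDIR"] = saved
        tempfile.tempdir = None


# Stand-in for a ramdisk mount point: just a fresh temporary folder.
fake_ramdisk = tempfile.mkdtemp()
chosen = temp_dir_with(fake_ramdisk)  # the fake ramdisk's path
```

For a program that follows this convention, redirecting its scratch files is just a matter of setting the environment variable before it starts; for one that hard-codes a path, it won't help.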

phoenix4 ( ) posted Mon, 08 April 2002 at 3:04 AM

News from Curious Labs on the question :)

---------------------------------------
From: "Curious Labs Technical Support"
To:
Cc: "TECH SUPPORT CURIOUS LABS"
Sent: Monday, April 08, 2002 2:14 AM
Subject: Re: Why does Poser Rende...

Poser 4 was released before computers with your specifications existed. We are constantly striving to improve our products. We cannot announce any updates or other developments in advance of press releases. Please visit our Web site at www.curiouslabs.com for the latest information about our products.

> Why does Poser Render soooo slowly on a Pentium 4??
>
> A Pentium 4 has 6!!!! Floating point units and an Athlon only has 2!!!. If software is compiled properly a P4 will!! eat anything in its path.
>
> Why isnt it written to handle a Pentiium 4s floating point system and handle Rambus memory?????
>
> I have notice up to a 60 second difference between an Athlon and a P4 on render.
>
> Is there an Update to make it perform faster on a P4??
>
> Thanks,

mjtdevries ( ) posted Mon, 08 April 2002 at 4:05 AM

There is some misinformation in the mail to which Curious Labs has responded. I'll try to clarify in normal English. The techies should take a look at http://www6.tomshardware.com/cpu/00q4/001120/p4-10.html, which tells you the same thing, but with a more in-depth and complex explanation.

First: the P4 does NOT have 6 floating point units. How someone got to this number I can't imagine, but it is just not true! That person probably confused FPUs with execution units, but even then the number 6 is wrong (just as the 2 for the Athlon is).

Some background: the design goal for the P4 was a chip that could run at as high a clock speed as possible (no matter what). Furthermore, it would feature another extension to SSE to improve performance. The result was that some compromises had to be made, which produced a CPU that behaves quite differently (and, for some people, unpredictably) from its predecessors.

One of the compromises is that SSE2 was introduced, but at the expense of the FPU, MMX, and SSE1, which are less powerful than on the P3 and Athlon. That isn't a bad thing as long as everybody rewrites their programs and changes the x87 FPU instructions into SSE2 instructions. (And as long as they don't mind that their precision goes down from 80-bit to 64-bit floating point.) The result shows very clearly in the benchmarks. In programs which are specially optimized to use SSE2 instructions the P4 shines, but in code that uses x87 FPU instructions the P4 does poorly compared to the P3 and Athlon, because it was designed that way. Poser4 was of course made for x87 FPU instructions.

Lots of companies find that changing software to make use of SSE2 is difficult, time-consuming and thus costly. Therefore you can see that it is only done in very expensive software where reasonably big performance gains can be reached (Lightwave, 3dsMax etc.). The question is whether that is affordable for a small company like Curious Labs, which creates a low-priced program: Poser5. Time will tell (or maybe Curious Labs will).

Marc.

P.S. For people who don't want to read the entire article I mentioned (or are put off by the technical terms), I've quoted some parts specifically about FPU performance:

"Things look worse if you have a look at the red boxes, which represent the FPU-part of Pentium 4. Please take the time and compare this part to the Pentium III block diagram. You will see that Intel has actually castrated quite a bit of the SSE/MMX part of Pentium 4. Pentium III used to have two MMX and two SSE units, but Pentium 4 has only got one of each. Intel claims that additional units would not have improved the SSE/SSE2, MMX or FPU performance. However, our benchmark results speak a different language."

"Intel hopes that software developers will soon replace the old x87-FPU-instructions with the double-precision FP instructions of SSE2, so that Intel's currently false claim that Pentium 4 has the most powerful FPU finally becomes reality. AMD is very impressed with SSE2 as well, which is why it announced to us only a few days ago that the upcoming Hammer-line of x86-64 processors will include SSE2 as well. I personally have my doubts if SSE2 will be able to replace x87-instructions in scientific software. We should not forget that the original FPU is using 80-bit FP-values, not the less exact 64-bit FP-values offered by SSE2."
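The 80-bit versus 64-bit remark is about significand width: an x87 extended-precision value carries a 64-bit significand, while an IEEE 754 double (the format SSE2's double-precision instructions use, and what a Python float is on common platforms) carries 53 bits. A minimal Python sketch of what that means in decimal digits, plus one value a double cannot hold but an 80-bit x87 register can:

```python
import math
import sys

DOUBLE_BITS = sys.float_info.mant_dig  # 53 significand bits for an IEEE 754 double
X87_EXTENDED_BITS = 64                 # significand bits of the x87 80-bit format


def decimal_digits(significand_bits: int) -> float:
    """Approximate number of decimal digits a binary significand can hold."""
    return significand_bits * math.log10(2)


double_digits = decimal_digits(DOUBLE_BITS)          # about 15.95
extended_digits = decimal_digits(X87_EXTENDED_BITS)  # about 19.27

# 1 + 2**-53 needs 54 significand bits: fine for x87 extended precision,
# but in a 64-bit double it rounds back to exactly 1.0 (round-half-to-even).
lost_in_double = (1.0 + 2.0 ** -53) == 1.0  # True
```

So the switch costs a little over three decimal digits per value, which rarely matters for rendering but is the concern the quoted article raises for scientific software.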

duanemoody ( ) posted Wed, 14 January 2004 at 1:10 AM

Flash forward nearly two years: a dual 1.8GHz G5 running Poser 4 under Mac OS 9.2 in Classic mode renders the scene in 92.97 seconds.

Renderosity Forums / Poser - OFFICIAL

Welcome to the Poser - OFFICIAL Forum

Subject: CPU Test - the results