Different Speed in different OS's
Hi guys,
I recently set up a common working directory for Prime95 shared by my Windows and Linux partitions. Whenever I run MPrime in Linux on a first-time LL test with an exponent in the 53 million range, I get around .086 seconds per iteration. Right now I'm running exactly the same job in Windows and am getting around .069-.072 seconds per iteration. Why is the Windows version faster than the Linux one? Both are 64-bit versions. The processor is an AMD Phenom II X6 1055T, six cores at 2.8 GHz apiece. In Linux I got my numbers after shutting just about everything down, including the graphical user interface (that didn't actually improve performance much, but it shows nothing is using the CPUs besides MPrime). Is there anything I can do to speed up Linux, or is the code simply not optimized as heavily for Linux? Thanks, Dubslow |
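To put those per-iteration times in perspective, here is a minimal sketch of what they mean for a whole test. It relies on the fact that a Lucas-Lehmer test of exponent p takes p - 2 squaring iterations. The exponent 53,000,000 is a round stand-in for "the 53 million range", and the function name is made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def ll_test_days(exponent, sec_per_iter):
    """Estimate wall-clock days for one LL test at a constant iteration time.

    A Lucas-Lehmer test of exponent p performs p - 2 squaring iterations.
    """
    iterations = exponent - 2
    return iterations * sec_per_iter / SECONDS_PER_DAY


if __name__ == "__main__":
    p = 53_000_000  # assumed round figure, not the actual assignment
    for label, t in [("Linux", 0.086), ("Windows (best)", 0.069)]:
        print(f"{label}: {ll_test_days(p, t):.1f} days")
```

At these speeds the gap works out to roughly ten days per test, which is why the difference is worth chasing.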
That may depend on how 'fresh' your installation is so far and what you've installed.
Admittedly, my folder share setup is the same as yours and I get a similar result, but the difference is not significant. When I run Ubuntu 10.04.2 x64 and mprime 26.6 on my Intel Core 2 6300 at 10,000-iteration intervals, I get about 0.59. This system is fairly 'clean': I don't install many applications, although I use Ubuntu rather than Windows 95% of the time. When I run Windows XP Professional x64 and prime95 26.6 on the same system, it can get as low as .057. These times come from idle periods. If your times vary greatly, that may be the result of how resources are set to be utilized (whether by the OS or by mprime/prime95). I leave all applicable settings at their defaults. This is contrary to what I would expect; however, having installed many systems, I find that, overall, Linux consumes more processing resources after a clean install than Windows [XP] does, even after the drivers and some applications are installed. A curious side note: I've used a power meter to check the power draw on both OSes after a clean install, and that Core 2 uses 88W at idle under Windows and 92W at idle under Ubuntu. I repeated this twice and allowed a full five minutes to reach idle state. Not much of a difference, but I thought this would give some credence to the comparison between System Monitor (Ubuntu) and Task Manager (Windows) in reporting CPU usage. As for Vista and later, I have little to no experience, so I cannot say. |
Have you tried assigning each worker to a specific core? It may not help you as much as it did with my core2quad with its cores assembled in pairs, but it's worth a try.
Rather than the software not being optimised for Linux, it may be more that Linux is not as optimised for the hardware. In general, hardware drivers, eg for graphics cards, for Windows have far more resources poured into them by the hardware companies than Linux ones do. |
Also worth thinking about: P95/mprime gets faster over time (GW is still putting effort into optimisations, like better FFT sizes and the latest instructions), so you should ensure you are comparing the same version....
|
[QUOTE=markr;266417]Have you tried assigning each worker to a specific core? It may not help you as much as it did with my core2quad with its cores assembled in pairs, but it's worth a try.[/QUOTE]
As a follow up to that, make sure that both have the same number of workers working on the same number of assignments. |
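For anyone curious what "assigning a worker to a specific core" means at the OS level, here is a rough Linux-only sketch using Python's `os.sched_setaffinity`. This is not how Prime95/mprime sets affinity internally (it has its own settings for that); the helper names `one_core_per_worker` and `pin_current_process` are made up for illustration.

```python
import os


def one_core_per_worker(n_workers, cores):
    """Map worker numbers (1-based) to distinct logical CPUs."""
    cores = sorted(cores)
    if n_workers > len(cores):
        raise ValueError("more workers than available cores")
    return {worker + 1: cores[worker] for worker in range(n_workers)}


def pin_current_process(core):
    """Restrict the calling process to one logical CPU (Linux only)."""
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("CPU affinity calls are not available on this platform")
    if core not in os.sched_getaffinity(0):
        raise ValueError(f"core {core} is not in the allowed set")
    os.sched_setaffinity(0, {core})  # pid 0 means the calling process
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Hypothetical six-core layout like the Phenom II X6 in this thread.
    print(one_core_per_worker(6, range(6)))
```

The point of a fixed one-to-one mapping is that the scheduler can no longer migrate a worker between cores or stack two workers on one core while another sits idle.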
Huh...
Having been away at camp for a week, I am now back with some interesting observations. When I ran the Minecraft server I host in Windows, my iteration times suddenly went to around .086 for some workers and up to ~.115 for one thread, which is worse than what I get in Linux, even with the server up. In Linux the server is set to start on boot, so when I tested by killing the GUI and the server, my guess is that something about starting the server at boot and then stopping it left all the threads at .086 seconds.
Now for the most interesting part: when I got back, out of some minor sort of OCD, I assigned some TF work to workers 2-6 because (understandably) worker 1 was falling behind. As soon as I did that, the iteration time for worker 1 dropped to .063 for a couple of hours, went up to .071 for an hour, and then settled down to ~.066 seconds per iteration, which is better than I ever got in Windows. Since last night, one of the TF workers has found a factor and so has resumed LL testing. That worker (4, I think) now has iteration times around .063-.065 seconds, while worker 1 is at .076 seconds per iteration. And this is all with people connected to my server and my GUI running as normal. Any thoughts? |
Well, I can't find a way to edit my above post, so I'll put this link here: Thread on why specific cpu assignment isn't working, and I think this will help because of the facts in the post above.
[url]http://mersenneforum.org/showthread.php?p=267455#post267455[/url] Edit: HAHA!!! Now that I've got the assignments worked out, it works wonders! Now all my threads average .060-.065 seconds per iteration. :D I'll need to pretty much reexamine the difference between the OSes now that I've got this worked out. |
Even after getting the workers assigned to CPUs, I'm still finding that putting one or more workers on trial factoring speeds up the other LL workers by 2-4 ms per iteration per TF worker. (So with three TF and three LL, the three LL workers run ~8 ms per iteration faster than when all six do LL: 71 ms - 63 ms.)
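The arithmetic behind those figures can be sketched as follows. It only uses the 71 ms and 63 ms numbers above; the function names are made up, and it says nothing about why the speedup happens.

```python
def per_worker_speedup(all_ll_ms, mixed_ms):
    """Fractional drop in iteration time for one LL worker."""
    return (all_ll_ms - mixed_ms) / all_ll_ms


def ll_iterations_per_second(n_ll_workers, ms_per_iter):
    """Combined LL iterations per second across all LL workers."""
    return n_ll_workers * 1000.0 / ms_per_iter


if __name__ == "__main__":
    print(f"per-worker speedup: {per_worker_speedup(71, 63):.1%}")
    print(f"6 LL at 71 ms: {ll_iterations_per_second(6, 71):.1f} iter/s")
    print(f"3 LL at 63 ms: {ll_iterations_per_second(3, 63):.1f} iter/s")
```

Each LL worker gets about 11% faster, but total LL throughput is still higher with all six on LL. The mixed setup only pays off if the TF work on the other three cores is wanted anyway.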
|
Remarkable... A couple of days ago, my iteration times spiked to .1 s in Linux and .09 s in Windows, a significant hit. I can think of no reason under the sun why that might have happened. So on a hunch just now I set one thread to TF, and now the other 5 LL workers are back to their normal .065 s iteration times. How weird.
|
[QUOTE=markr;266417] Rather than the software not being optimised for Linux, it may be more that Linux is not as optimised for the hardware. In general, hardware drivers, eg for graphics cards, for Windows have far more resources poured into them by the hardware companies than Linux ones do.[/QUOTE]
Graphics card drivers aside, everything else in the Linux kernel is highly optimized. Countless highly skilled programmers have worked, and still work, on ways to improve kernel performance. I would rather think the mprime binary isn't compiled with the right options. Maybe optimization wasn't given as much thought for the Linux build. |
1 Attachment(s)
Here is an example of the same situation, but with Win XP 32-bit versus Win 7 64-bit. The 64-bit version seems to run .005 sec. faster than the 32-bit version.
This is on a dual-boot, Phenom II x6 1090T system. It is currently running at 17x205=~3.5GHz. It has 8GB of RAM running at PC1600. The mobo is an Asus M4A89GTD Pro/USB3. Note that P95 is running the same exponents on both OS's. The exponents are arranged in ascending order for easy comparison. |