mersenneforum.org

MPrime Significantly Faster than P95 (https://www.mersenneforum.org/showthread.php?t=16289)

Dubslow 2011-12-04 22:10

MPrime Significantly Faster than P95
 
I first saw this a couple of months ago: I was running DCs then, and MPrime was consistently getting 50% faster times than P95. Attached is a screenshot from just now, my first time back in Linux since then.
[url]http://filesmelt.com/dl/screenshot281.png[/url]

I'll come back sometime within 24 hours with a P95 screenshot. In P95 I've been getting iteration times of about 27-28 ms for all three threads. I'm not running mfaktc ATM; that's gonna take me a while to get working in Linux.

Dubslow 2011-12-04 23:49

Edit: With mfaktc-linux running, Worker 2 reported 19-20ms iteration times, while the other two stayed the same.

Note: Windows detects AffinityScramble2=01234567, and that's what I put in local.txt because it didn't always detect HT threads correctly. MPrime usually reports 04152637, but that was overridden with the Windows setting, which is how it's currently running. I have not tested it in the 'normal' configuration (will do so soon). (mfaktc affinity is set to 6 when running both here and in Winblows.)

Edit: Just tried running MPrime without the AffinityScramble, and sure enough, it detected 04152637, and here are the timings: [url]http://filesmelt.com/dl/screenshot291.png[/url]
Clearly, for whatever reason, not pairing the cores up correctly is correlated with bizarrely huge performance increases.

Edit2: Not a huge performance increase; (26+26+14)/3=22, whereas (24+24+24)/3=24. Still, that is an increase, and the manner in which it comes is very bizarre.
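One way to reason about the two scramble strings is to work out which workers end up sharing a physical core. The sketch below is only an illustration under stated assumptions: it assumes each worker takes one consecutive pair of digits from the AffinityScramble2 string (two threads per worker, three workers), and that the real topology is the one MPrime detected, with logical CPUs n and n+4 being the two hyperthreads of physical core n. The function names are made up for this example and are not part of MPrime.

```python
from collections import defaultdict


def worker_pairs(scramble, workers):
    """Split a scramble string into consecutive logical-CPU pairs, one per worker."""
    cpus = [int(ch) for ch in scramble]
    pairs = [tuple(cpus[i:i + 2]) for i in range(0, len(cpus), 2)]
    return pairs[:workers]


def physical_core(logical_cpu, cores=4):
    # ASSUMPTION: logical CPUs n and n+cores are hyperthreads of physical core n
    # (the 04152637 layout that MPrime reported on this machine).
    return logical_cpu % cores


def core_sharing(scramble, workers=3):
    """Map each physical core to the list of workers that have a thread on it."""
    usage = defaultdict(set)
    for worker, pair in enumerate(worker_pairs(scramble, workers), start=1):
        for cpu in pair:
            usage[physical_core(cpu)].add(worker)
    return {core: sorted(ws) for core, ws in sorted(usage.items())}


if __name__ == "__main__":
    for scramble in ("01234567", "04152637"):
        print(scramble, core_sharing(scramble))
```

Under those assumptions, 01234567 puts workers 1 and 3 on the same two physical cores while worker 2 gets two cores to itself, which would be consistent with one worker running much faster than the other two. With 04152637 each worker gets one physical core of its own. mfaktc is left out of this picture.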

Dubslow 2011-12-05 07:36

So confused. Apparently MPrime is affecting mfaktc, even when it shouldn't... I'll have to come back to this tomorrow. See [URL="http://mersenneforum.org/showthread.php?p=281006"]here[/URL].

axn 2011-12-05 07:42

[QUOTE=Dubslow;281011]Edit2: Not a huge performance increase; (26+26+14)/3=22, whereas (24+24+24)/3=24. Still, that is an increase, and the manner in which it comes is very bizarre.[/QUOTE]

Hmmm... 3/(1/26+1/26+1/14) = 20.2. Not 22.
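The figure above is the harmonic mean of the three iteration times. Each worker completes 1/t iterations per millisecond, so total throughput is the sum of the reciprocals, and the single per-worker time that would give the same throughput is n divided by that sum. A minimal sketch comparing the two averages (the function names are just for illustration):

```python
def arithmetic_mean(times_ms):
    """Plain average of the iteration times."""
    return sum(times_ms) / len(times_ms)


def harmonic_mean(times_ms):
    """Per-worker iteration time that yields the same total throughput."""
    return len(times_ms) / sum(1.0 / t for t in times_ms)


mixed = [26, 26, 14]   # two slow workers, one fast worker
even = [24, 24, 24]    # all workers at the same speed

print(round(arithmetic_mean(mixed), 1))  # 22.0
print(round(harmonic_mean(mixed), 1))    # 20.2
print(round(harmonic_mean(even), 1))     # 24.0
```

When all workers run at the same speed the two averages agree; they only diverge when the timings are uneven, as in the 26/26/14 case.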

Do you have Turbo Boost (i3, i5, i7), by any chance?

EDIT: You must have an i7, considering that you have 8 "cores", right? Turning off HT might give you consistent timings.

Bdot 2011-12-05 12:38

[QUOTE=Dubslow;281011]
Note: Windows detects AffinityScramble2=01234567 and that's what I put in local.txt because it didn't always detect HT threads correctly. MPrime usually reports 04152637, but was overridden with the Windows settings,

Edit: Just tried running MPrime without the AffinityScramble, and sure enough, it detected 04152637, and here are the timings: [URL]http://filesmelt.com/dl/screenshot291.png[/URL]
[/QUOTE]

Get yourself a tool that can report the core multiplier and load (like ThrottleStop on Windows; I'm not sure about Linux, but "top" may be a start). Then start each mfaktc instance and each thread of mprime/prime95 separately to see which HT and which core they end up on. The 26+26+14 seems to indicate that the first two are running on the same core ... 24+24+24 suggests each is running on a core where the other HT is busy as well, but maybe not with prime95 (and therefore has a little more room to breathe).
Another thing: I have a laptop where the BIOS disables SpeedStep/Turbo/dynamic overclocking, or whatever the name is for my i7. Using ThrottleStop I re-enable it and run all cores with a clock multiplier of 23 instead of just 20. Maybe your two OSes handle that differently, too.
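For the Linux side of the suggestion above (seeing which logical CPU each thread ends up on), one option besides "top" is to read the per-thread stat files under /proc. The sketch below relies on field 39 of /proc/<pid>/task/<tid>/stat being the CPU the thread last ran on, as documented in proc(5). The helper names are made up for this example, and the PID of the mprime or mfaktc process has to be supplied by hand.

```python
import os


def last_cpu_from_stat(stat_line):
    """Return the CPU a task last ran on, given the text of its stat file."""
    # Field 2 (the command name) is wrapped in parentheses and may itself
    # contain spaces or parentheses, so split after the LAST ')' first.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 39 (processor) is rest[36].
    return int(rest[36])


def thread_cpus(pid):
    """Map thread ID -> last CPU for every thread of a process (Linux only)."""
    result = {}
    for tid in os.listdir(f"/proc/{pid}/task"):
        with open(f"/proc/{pid}/task/{tid}/stat") as f:
            result[int(tid)] = last_cpu_from_stat(f.read())
    return result
```

Calling thread_cpus(pid) a few times while the workers are running should show whether each thread stays on the logical CPU its affinity setting asks for. It only reports the last CPU used at the moment of reading, so it is a spot check, not a trace.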

Dubslow 2011-12-05 21:50

In Windows, CPU-Z usually reports the (stock) multiplier of 34 (i.e. no turbo boosting). I found something I can pin to Ubuntu's panel, but as far as I can tell it only shows when the multiplier drops, and can't show turbo boost. Because turbo is BIOS/temperature based, though, I would assume it behaves the same in both OSes. Unfortunately my mobo won't let me set the base multiplier; it'll only let me screw with the Turbo multipliers (which I've kept at default due to the crappy stock heatsink, though a 212+ with my name on it is somewhere between here and New Jersey).

As for HT, I have it set to two threads per worker, and when I originally tested it over the summer, I got about the same speeds this way as with HT disabled.

@Bdot: As I said, I didn't get 14 ms timings even when I turned off HT and ran only four cores over the summer. The only way that would make sense is if both Prime95 and MPrime weren't setting the affinities correctly, which I doubt. I'm going to do some more detailed experiments later today, and I'll do them while watching each thread start in the core load section of the 'task manager' in Ubuntu.

Dubslow 2011-12-06 05:00

Well, got mfaktc figured out at least. Set that to run on 7.

Otherwise MPrime is setting affinities as reported, so nothing funky that way. When I run AS2=01234567, the worker that runs on 2 and 3 (that would be #2) consistently gets faster iterations, and it's always that worker.

Also MPrime just randomly unreserved a bunch of work from Worker 1 for no particular reason that I can see. I've had them there for at least a week and a half. LaurV also had this problem recently, and I understand why it pissed him off so much... I restarted it and it decided that this time, all the work on Worker 1 was dumb, and I shouldn't have it. The other workers are fine.

debrouxl 2011-12-06 08:02

On Linux, the "i7z" tool ( [url]https://code.google.com/p/i7z/[/url] ) enables gathering the frequency and multipliers on Core i3/i5/i7.
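As a rough cross-check that needs no extra tool, the kernel also reports a per-CPU clock in /proc/cpuinfo on x86 Linux. The sketch below parses the "cpu MHz" lines and divides by a base clock to estimate the multiplier. The 100 MHz base clock is an assumption, not something stated in this thread, and the kernel's figure may not reflect turbo on every system, so this is not a replacement for i7z. The function names are made up for this example.

```python
def parse_cpu_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value of each logical CPU found in /proc/cpuinfo text."""
    mhz = []
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "cpu MHz":
            mhz.append(float(value))
    return mhz


def multipliers(mhz_values, base_clock_mhz=100.0):
    # ASSUMPTION: base clock of 100 MHz; adjust if the board uses another value.
    return [round(m / base_clock_mhz, 1) for m in mhz_values]
```

On a Linux box, passing the contents of /proc/cpuinfo to parse_cpu_mhz and the result to multipliers gives one estimated multiplier per logical CPU, which can be compared with the 34 that CPU-Z shows in Windows.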

Dubslow 2011-12-06 15:44

Cool, thanks.


I'm getting a compiler error -- [code]i7z_Single_Socket.c: In function ‘print_i7z_socket_single’:
i7z_Single_Socket.c:557:16: warning: unused variable ‘time_to_save’
i7z_Dual_Socket.c: In function ‘print_i7z_socket’:
i7z_Dual_Socket.c:495:9: warning: implicit declaration of function ‘logCpuFreq_dual_d’
[/code]


All times are UTC.