mersenneforum.org  

Old 2012-02-17, 20:33   #45
drh
 
Jan 2011
Cincinnati, OH

2²·5² Posts

I'm seeing similar results. On my i5-2500K (no OC), the iteration time on 1 core has dropped from 0.032 to 0.022 sec, and P-1 Stage 2 has dropped from 450 sec to 355 sec, also on 1 core, with 2 instances of mfaktc running on 2 of the other cores.

Huge improvement, great job!
Doug
Old 2012-02-17, 20:58   #46
monst
 
Mar 2007

263₈ Posts

Here's what I'm seeing for iteration times on my i5-2500K running 2 instances
of Prime95 (on M26161123 and M26161217) and 2 instances of mfaktc.
(The chip is overclocked to 4.5 GHz.)

26.6 (64-bit) --> 12.4 ms

27.2 (32-bit) --> 9.8 ms

27.3 (32-bit) --> 9.6 ms
27.3 (64-bit) --> 9.1 ms

Nice improvement!!
Old 2012-02-17, 23:15   #47
fivemack
(loop (#_fork))
 
Feb 2006
Cambridge, England

1917₁₆ Posts

Quote:
Originally Posted by James Heinrich
With hyperthreading disabled:
1: 12.8ms
2: 13.2ms
3: 13.6ms
4: 14.8ms
5: 16.7ms
6: ~19ms (ranges from 18.3 to 21.1 in different workers)

I would be intrigued to see, if it's not too awkward to run, what the speed-as-you-add-more-workers progression is for v26.6.3.
Old 2012-02-17, 23:53   #48
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

11×311 Posts

Quote:
Originally Posted by fivemack
I would be intrigued to see, if it's not too awkward to run, what the speed-as-you-add-more-workers progression is for v26.6.3.

v26.6.3 vs v27.1.3 (both 64-bit):
1: 21.1ms vs 12.8ms (65% faster)
2: 21.2ms vs 13.2ms (61% faster)
3: 21.4ms vs 13.6ms (57% faster)
4: 21.5ms vs 14.8ms (45% faster)
5: 22.0ms vs 16.7ms (32% faster)
6: 22.6ms vs 19.0ms (19% faster)

Edit: I just realized I ran v27.1.3 with hyperthreading disabled and v26.6.3 with it enabled, so the two columns above are not a like-for-like comparison.
I will re-run the benchmarks later.
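
For anyone checking the arithmetic, the "% faster" figures in the list above look like throughput gains, i.e. old time divided by new time, minus one. A minimal Python sketch under that assumption (the function name `percent_faster` is made up for illustration; the times are the ones posted above):

```python
# Iteration times in ms for 1..6 workers, copied from the post above.
old_ms = [21.1, 21.2, 21.4, 21.5, 22.0, 22.6]  # v26.6.3
new_ms = [12.8, 13.2, 13.6, 14.8, 16.7, 19.0]  # v27.1.3


def percent_faster(old, new):
    """Throughput gain in percent: how many more iterations per second."""
    return (old / new - 1.0) * 100.0


gains = [round(percent_faster(o, n)) for o, n in zip(old_ms, new_ms)]
for workers, gain in enumerate(gains, start=1):
    print(f"{workers}: {gain}% faster")
```

Note this is not the same as the reduction in time per iteration, which would be `(1 - new / old) * 100` and gives smaller numbers.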

Last fiddled with by James Heinrich on 2012-02-18 at 00:34
Old 2012-02-18, 00:18   #49
fivemack
(loop (#_fork))
 
Feb 2006
Cambridge, England

6423₁₀ Posts

These numbers are a bit confusing to analyse because everyone seems to be running their machines at different clock speeds and memory speeds.

I appreciate that it involves multiple reboots and makes running other jobs difficult while you're doing it, but I think that a conclusive analysis of the effect of memory bandwidth really would benefit from benchmarks from 27.3 at two CPU multipliers as far apart as possible, with memory speed kept the same, and turbo and hyperthreading turned off in both cases.

(Ideally there would also be data points at two different memory speeds with the CPU multiplier kept the same, but I don't know whether X79 BIOSes let you set that conveniently.)

The idea's to solve for runtime as A + B/cpuspeed + C/memoryspeed and see if anything interesting shows up in the values of A, B and C. I've done this analysis with the SPEC99 benchmarks to divide them into CPU-intensive and memory-intensive ones.
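
A rough sketch of how such a fit could be done, in plain Python with no libraries. Everything here is illustrative: the function names `solve` and `fit_runtime_model` are made up, the clock speeds are just plausible values, and the runtimes are synthetic, generated from assumed coefficients so the fit can be checked against a known answer. No real benchmark data is used.

```python
def solve(matrix, rhs):
    """Solve a small square linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_runtime_model(samples):
    """Least-squares fit of runtime = A + B/cpu + C/mem.

    samples: list of (cpu_ghz, mem_ghz, runtime_ms); needs at least three
    configurations that vary both CPU and memory speed.
    """
    rows = [[1.0, 1.0 / cpu, 1.0 / mem] for cpu, mem, _ in samples]
    times = [t for _, _, t in samples]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, times)) for i in range(3)]
    return solve(xtx, xty)


# SYNTHETIC data only: assumed coefficients, not measurements from this thread.
true_a, true_b, true_c = 1.0, 30.0, 8.0
configs = [(3.3, 1.333), (4.5, 1.333), (3.3, 2.133)]  # (CPU GHz, memory GHz)
synthetic = [(cpu, mem, true_a + true_b / cpu + true_c / mem) for cpu, mem in configs]

a, b, c = fit_runtime_model(synthetic)
print(f"A={a:.3f}  B={b:.3f}  C={c:.3f}")
```

The three configurations mirror the proposal above: two CPU speeds at one memory speed, plus a second memory speed at the original CPU speed. With real measurements, a relatively large C would point to a memory-bound workload and a relatively large B to a CPU-bound one.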

Last fiddled with by fivemack on 2012-02-18 at 00:19
Old 2012-02-18, 00:27   #50
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

2·3,767 Posts

Linux executables should be available. Untested. Sometimes Primenet doesn't recognize new versions, but I have to do some evening entertaining right now.

P.S. This is the second time Ubuntu has toasted the root disk. Arcane fsck command restored it both times. I don't know how a novice user would ever recover...

Last fiddled with by Prime95 on 2012-02-18 at 00:31
Old 2012-02-18, 00:43   #51
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

11×311 Posts

Quote:
Originally Posted by fivemack
I think that a conclusive analysis of the effect of memory bandwidth really would benefit from benchmarks from 27.3 at two CPU multipliers as far apart as possible, with memory speed kept the same, and turbo and hyperthreading turned off in both cases. (really ideally would also be data points at two different memory speeds with CPU multiplier kept the same, but I don't know if X79 BIOSes allow you to set that conveniently)

I'll see if I can get you this data tomorrow.
Old 2012-02-18, 01:54   #52
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

22665₈ Posts

Quote:
Originally Posted by fivemack
Maybe I'm misunderstanding the request, but I think the question is whether there's a slowdown running six one-thread workers on six different jobs

You are right, the memory traffic would be higher in that case. For me there is no slowdown when running 4 different workers on 4 different jobs on 4 physical cores (alone, or used as 8 logical cores, with helpers) compared with 27.2; I would say it is slightly faster, and much faster than 26.5/26.6.
Old 2012-02-18, 07:07   #53
firejuggler
 
Apr 2010
Over the rainbow

2×1,303 Posts
i5-2500K, stock speed, Windows 7 Home Premium
Code:
Compare your results to other computers at http://www.mersenne.org/report_benchmarks
Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz
CPU speed: 3336.07 MHz, 4 cores
CPU features: Prefetch, MMX, SSE, SSE2, SSE4, AVX
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 6 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 64-bit version 27.3, RdtscTiming=1
Best time for 768K FFT length: 4.720 ms., avg: 4.901 ms.
Best time for 896K FFT length: 5.770 ms., avg: 5.987 ms.
Best time for 1024K FFT length: 6.469 ms., avg: 6.606 ms.
Best time for 1280K FFT length: 8.261 ms., avg: 8.452 ms.
Best time for 1536K FFT length: 10.157 ms., avg: 10.394 ms.
Best time for 1792K FFT length: 12.174 ms., avg: 12.454 ms.
Best time for 2048K FFT length: 13.578 ms., avg: 13.884 ms.
Best time for 2560K FFT length: 17.210 ms., avg: 17.559 ms.
Best time for 3072K FFT length: 21.431 ms., avg: 21.703 ms.
Best time for 3584K FFT length: 25.954 ms., avg: 26.447 ms.
Best time for 4096K FFT length: 29.306 ms., avg: 29.481 ms.
Best time for 5120K FFT length: 37.961 ms., avg: 38.296 ms.
Best time for 6144K FFT length: 45.709 ms., avg: 47.362 ms.
Best time for 7168K FFT length: 55.606 ms., avg: 55.943 ms.
Best time for 8192K FFT length: 63.926 ms., avg: 64.368 ms.
Timing FFTs using 2 threads.
Best time for 768K FFT length: 2.611 ms., avg: 2.691 ms.
Best time for 896K FFT length: 3.123 ms., avg: 3.200 ms.
Best time for 1024K FFT length: 3.512 ms., avg: 3.643 ms.
Best time for 1280K FFT length: 4.557 ms., avg: 4.876 ms.
Best time for 1536K FFT length: 5.513 ms., avg: 5.684 ms.
Best time for 1792K FFT length: 6.637 ms., avg: 6.861 ms.
Best time for 2048K FFT length: 7.410 ms., avg: 7.569 ms.
Best time for 2560K FFT length: 9.301 ms., avg: 9.652 ms.
Best time for 3072K FFT length: 11.599 ms., avg: 11.857 ms.
Best time for 3584K FFT length: 14.025 ms., avg: 14.427 ms.
Best time for 4096K FFT length: 15.921 ms., avg: 16.137 ms.
Best time for 5120K FFT length: 20.980 ms., avg: 23.371 ms.
Best time for 6144K FFT length: 24.257 ms., avg: 24.651 ms.
Best time for 7168K FFT length: 29.402 ms., avg: 29.815 ms.
Best time for 8192K FFT length: 34.335 ms., avg: 34.642 ms.
Timing FFTs using 3 threads.
Best time for 768K FFT length: 1.838 ms., avg: 1.916 ms.
Best time for 896K FFT length: 2.223 ms., avg: 2.361 ms.
Best time for 1024K FFT length: 2.547 ms., avg: 3.664 ms.
Best time for 1280K FFT length: 3.276 ms., avg: 3.400 ms.
Best time for 1536K FFT length: 4.017 ms., avg: 4.120 ms.
Best time for 1792K FFT length: 4.768 ms., avg: 4.914 ms.
Best time for 2048K FFT length: 5.392 ms., avg: 5.509 ms.
Best time for 2560K FFT length: 6.934 ms., avg: 7.173 ms.
Best time for 3072K FFT length: 8.511 ms., avg: 8.696 ms.
Best time for 3584K FFT length: 10.403 ms., avg: 10.938 ms.
Best time for 4096K FFT length: 11.650 ms., avg: 12.018 ms.
Best time for 5120K FFT length: 14.835 ms., avg: 15.071 ms.
Best time for 6144K FFT length: 17.789 ms., avg: 18.049 ms.
Best time for 7168K FFT length: 21.164 ms., avg: 21.397 ms.
Best time for 8192K FFT length: 24.641 ms., avg: 25.339 ms.
Timing FFTs using 4 threads.
Best time for 768K FFT length: 1.670 ms., avg: 1.723 ms.
Best time for 896K FFT length: 2.032 ms., avg: 2.079 ms.
Best time for 1024K FFT length: 2.305 ms., avg: 2.403 ms.
Best time for 1280K FFT length: 2.997 ms., avg: 3.066 ms.
Best time for 1536K FFT length: 3.637 ms., avg: 5.094 ms.
Best time for 1792K FFT length: 4.386 ms., avg: 4.509 ms.
Best time for 2048K FFT length: 4.895 ms., avg: 7.237 ms.
Best time for 2560K FFT length: 6.309 ms., avg: 6.483 ms.
Best time for 3072K FFT length: 7.560 ms., avg: 7.742 ms.
Best time for 3584K FFT length: 9.366 ms., avg: 9.645 ms.
Best time for 4096K FFT length: 10.515 ms., avg: 11.590 ms.
Best time for 5120K FFT length: 13.031 ms., avg: 13.184 ms.
[Fri Feb 17 22:29:50 2012]
Best time for 6144K FFT length: 15.449 ms., avg: 15.707 ms.
Best time for 7168K FFT length: 18.263 ms., avg: 18.686 ms.
Best time for 8192K FFT length: 21.290 ms., avg: 21.571 ms.
Best time for 61 bit trial factors: 2.294 ms.
Best time for 62 bit trial factors: 2.309 ms.
Best time for 63 bit trial factors: 2.607 ms.
Best time for 64 bit trial factors: 2.698 ms.
Best time for 65 bit trial factors: 3.169 ms.
Best time for 66 bit trial factors: 3.740 ms.
Best time for 67 bit trial factors: 3.709 ms.
Best time for 75 bit trial factors: 3.614 ms.
Best time for 76 bit trial factors: 3.596 ms.
Best time for 77 bit trial factors: 3.633 ms.
tl;dr: running more than 3 cores on one task is useless
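
To put numbers on the scaling, here is a small Python sketch that turns the best times from the log above into speedup and parallel efficiency. Only three FFT lengths are copied over to keep it short; the helper name `scaling` is made up for illustration.

```python
# Best times in ms for 1, 2, 3 and 4 threads, copied from the log above.
best_ms = {
    "768K": [4.720, 2.611, 1.838, 1.670],
    "2048K": [13.578, 7.410, 5.392, 4.895],
    "8192K": [63.926, 34.335, 24.641, 21.290],
}


def scaling(times):
    """Return (threads, speedup over 1 thread, parallel efficiency) per entry."""
    base = times[0]
    return [(n, base / t, base / t / n) for n, t in enumerate(times, start=1)]


for fft, times in best_ms.items():
    for threads, speedup, efficiency in scaling(times):
        print(f"{fft:>6} FFT, {threads} threads: {speedup:.2f}x, {efficiency:.0%} efficiency")
```

On these three sizes the fourth thread still shortens the best time by roughly 9% to 14% compared with three threads, so the gain is small rather than zero; whether that counts as useless depends on what else the fourth core could be doing.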

Last fiddled with by firejuggler on 2012-02-18 at 07:08
Old 2012-02-18, 07:58   #54
fivemack
(loop (#_fork))
 
Feb 2006
Cambridge, England

3·2,141 Posts

Quote:
Originally Posted by firejuggler
i5-2500K, stock speed, Windows 7 Home Premium

Useful data - what's the memory speed here?

(I think you can see something like a memory-bandwidth effect by comparing this to the 4429/2133 i5-2500K data and seeing that the gap widens as the number of threads goes up - the 4429/2133 machine is 48% faster at 4 threads and only 34% faster at 1 thread - but that would imply that the memory on this i5-2500K is 1333 MHz, so if it isn't I'll have to revise my analysis.)
Old 2012-02-18, 13:50   #55
firejuggler
 
Apr 2010
Over the rainbow

101000101110₂ Posts

Yes, memory speed is 1333 MHz.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.