mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > Software

2012-02-17, 15:19   #23
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England

1918₁₆ Posts

Quote:
Originally Posted by James Heinrich
What about when all 6 cores are running?
Maybe I'm misunderstanding the request, but I think the question is whether there's a slowdown running six one-thread workers on six different jobs
2012-02-17, 16:31   #24
Robert_47
Mar 2009

2×11 Posts

Just FYI, neither version runs on an AMD FX-4100 Bulldozer.
2012-02-17, 16:42   #25
Zero
Dec 2011

14₈ Posts

Here's a run on an i5-2500K @ 4.5GHz for comparison, with 4GB RAM @ 2133MT/s.
Code:
 [Fri Feb 17 23:16:05 2012]
Compare your results to other computers at http://www.mersenne.org/report_benchmarks
Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz
CPU speed: 4429.34 MHz, 4 cores
CPU features: Prefetch, MMX, SSE, SSE2, SSE4, AVX
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 6 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 64-bit version 27.3, RdtscTiming=1
Best time for 768K FFT length: 3.528 ms., avg: 3.556 ms.
Best time for 896K FFT length: 4.288 ms., avg: 4.299 ms.
Best time for 1024K FFT length: 4.817 ms., avg: 4.835 ms.
Best time for 1280K FFT length: 6.145 ms., avg: 6.158 ms.
Best time for 1536K FFT length: 7.547 ms., avg: 7.597 ms.
Best time for 1792K FFT length: 9.048 ms., avg: 9.057 ms.
Best time for 2048K FFT length: 10.071 ms., avg: 10.144 ms.
Best time for 2560K FFT length: 12.760 ms., avg: 12.794 ms.
Best time for 3072K FFT length: 15.845 ms., avg: 15.856 ms.
Best time for 3584K FFT length: 19.112 ms., avg: 19.134 ms.
Best time for 4096K FFT length: 21.419 ms., avg: 21.444 ms.
Best time for 5120K FFT length: 27.735 ms., avg: 27.755 ms.
Best time for 6144K FFT length: 33.359 ms., avg: 33.404 ms.
Best time for 7168K FFT length: 40.513 ms., avg: 40.526 ms.
Best time for 8192K FFT length: 46.788 ms., avg: 46.831 ms.
Timing FFTs using 2 threads.
Best time for 768K FFT length: 1.947 ms., avg: 1.956 ms.
Best time for 896K FFT length: 2.317 ms., avg: 2.457 ms.
Best time for 1024K FFT length: 2.587 ms., avg: 2.743 ms.
Best time for 1280K FFT length: 3.333 ms., avg: 3.344 ms.
Best time for 1536K FFT length: 4.058 ms., avg: 4.316 ms.
Best time for 1792K FFT length: 4.872 ms., avg: 4.898 ms.
Best time for 2048K FFT length: 5.403 ms., avg: 5.732 ms.
Best time for 2560K FFT length: 6.829 ms., avg: 6.868 ms.
Best time for 3072K FFT length: 8.434 ms., avg: 8.447 ms.
Best time for 3584K FFT length: 10.214 ms., avg: 10.232 ms.
Best time for 4096K FFT length: 11.372 ms., avg: 11.385 ms.
Best time for 5120K FFT length: 14.721 ms., avg: 14.776 ms.
Best time for 6144K FFT length: 17.614 ms., avg: 17.627 ms.
Best time for 7168K FFT length: 21.228 ms., avg: 21.244 ms.
Best time for 8192K FFT length: 24.790 ms., avg: 24.846 ms.
Timing FFTs using 3 threads.
Best time for 768K FFT length: 1.346 ms., avg: 1.360 ms.
Best time for 896K FFT length: 1.605 ms., avg: 1.624 ms.
Best time for 1024K FFT length: 1.807 ms., avg: 1.829 ms.
Best time for 1280K FFT length: 2.309 ms., avg: 2.336 ms.
Best time for 1536K FFT length: 2.804 ms., avg: 2.847 ms.
Best time for 1792K FFT length: 3.341 ms., avg: 3.373 ms.
Best time for 2048K FFT length: 3.759 ms., avg: 3.793 ms.
Best time for 2560K FFT length: 4.749 ms., avg: 4.785 ms.
Best time for 3072K FFT length: 5.897 ms., avg: 5.939 ms.
Best time for 3584K FFT length: 7.119 ms., avg: 7.186 ms.
Best time for 4096K FFT length: 8.076 ms., avg: 8.121 ms.
Best time for 5120K FFT length: 10.241 ms., avg: 10.296 ms.
Best time for 6144K FFT length: 12.166 ms., avg: 12.191 ms.
Best time for 7168K FFT length: 14.617 ms., avg: 14.654 ms.
Best time for 8192K FFT length: 17.305 ms., avg: 17.379 ms.
Timing FFTs using 4 threads.
Best time for 768K FFT length: 1.175 ms., avg: 1.182 ms.
Best time for 896K FFT length: 1.407 ms., avg: 1.414 ms.
Best time for 1024K FFT length: 1.586 ms., avg: 1.595 ms.
Best time for 1280K FFT length: 2.027 ms., avg: 2.038 ms.
Best time for 1536K FFT length: 2.452 ms., avg: 2.467 ms.
Best time for 1792K FFT length: 2.949 ms., avg: 2.963 ms.
Best time for 2048K FFT length: 3.260 ms., avg: 3.270 ms.
Best time for 2560K FFT length: 4.160 ms., avg: 4.212 ms.
Best time for 3072K FFT length: 5.050 ms., avg: 5.078 ms.
Best time for 3584K FFT length: 6.244 ms., avg: 6.264 ms.
Best time for 4096K FFT length: 6.917 ms., avg: 6.943 ms.
Best time for 5120K FFT length: 8.485 ms., avg: 8.560 ms.
Best time for 6144K FFT length: 10.061 ms., avg: 10.113 ms.
Best time for 7168K FFT length: 11.901 ms., avg: 11.986 ms.
Best time for 8192K FFT length: 14.386 ms., avg: 14.399 ms.
Best time for 61 bit trial factors: 1.715 ms.
Best time for 62 bit trial factors: 1.731 ms.
Best time for 63 bit trial factors: 1.957 ms.
Best time for 64 bit trial factors: 2.029 ms.
Best time for 65 bit trial factors: 2.376 ms.
Best time for 66 bit trial factors: 2.804 ms.
Best time for 67 bit trial factors: 2.776 ms.
Best time for 75 bit trial factors: 2.702 ms.
Best time for 76 bit trial factors: 2.698 ms.
Best time for 77 bit trial factors: 2.699 ms.
JH, it would be interesting to see your results with HT disabled.
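To boil the thread-scaling numbers in this run down to a couple of figures, you can divide the 1-thread time by the n-thread time (speedup) and then by n (efficiency). A minimal sketch in Python, using only the 2048K FFT best times from the run above; the dictionary and function names are made up for illustration:

```python
# 2048K FFT "Best time" values (ms) from the i5-2500K benchmark above,
# keyed by thread count.
best_ms_2048k = {1: 10.071, 2: 5.403, 3: 3.759, 4: 3.260}


def scaling(times):
    """Return {threads: (speedup, efficiency)} relative to the 1-thread time."""
    base = times[1]
    return {n: (base / t, base / t / n) for n, t in sorted(times.items())}


for n, (speedup, eff) in scaling(best_ms_2048k).items():
    print(f"{n} thread(s): speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

For these four values, efficiency comes out at roughly 93% with 2 threads, 89% with 3 and 77% with 4. Picking a different FFT length from the table gives somewhat different figures.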
2012-02-17, 16:55   #26
bcp19
Oct 2011

7·97 Posts

Just ported the new version to my 2400. The system has 2 cores running mfaktc, 1 doing LL on a 46M exponent and 1 doing P-1. LL per-iteration time dropped from .024 to .019. Nice speed boost.
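For a feel of what that drop means over a whole test, here is a rough sketch. It assumes the .024 and .019 figures are seconds per iteration (the post doesn't give units), uses 46,000,000 as a stand-in for "a 46M" exponent, and counts p - 2 squarings for a Lucas-Lehmer test of 2^p - 1; the helper name is hypothetical.

```python
def ll_days(exponent, sec_per_iter):
    """Approximate wall-clock days for a Lucas-Lehmer test: p - 2 squaring iterations."""
    return (exponent - 2) * sec_per_iter / 86400


p = 46_000_000            # hypothetical stand-in for "a 46M" exponent
old, new = 0.024, 0.019   # per-iteration times, assumed to be seconds

print(f"time per iteration: {1 - new / old:.1%} lower")
print(f"throughput: {old / new - 1:.1%} higher")
print(f"LL test: {ll_days(p, old):.1f} days -> {ll_days(p, new):.1f} days")
```

Under those assumptions the iteration time is about 21% lower, throughput is about 26% higher, and a full test on that core drops from roughly 12.8 to 10.1 days of wall-clock time.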
Am I right in thinking the P-1 coding should be unaffected? I had the LL on 27.2 and the P-1 on 26.6 due to memory, but with this being a 64-bit build, I can now run both on 27.3.

2500K bench:
Code:
Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz
CPU speed: 4260.11 MHz, 4 cores
CPU features: Prefetch, MMX, SSE, SSE2, SSE4, AVX
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 6 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 64-bit version 27.3, RdtscTiming=1
Best time for 768K FFT length: 3.677 ms., avg: 3.826 ms.
Best time for 896K FFT length: 4.489 ms., avg: 4.539 ms.
Best time for 1024K FFT length: 5.041 ms., avg: 5.064 ms.
Best time for 1280K FFT length: 6.452 ms., avg: 6.476 ms.
Best time for 1536K FFT length: 7.924 ms., avg: 8.121 ms.
Best time for 1792K FFT length: 9.499 ms., avg: 9.518 ms.
Best time for 2048K FFT length: 10.590 ms., avg: 10.612 ms.
Best time for 2560K FFT length: 13.410 ms., avg: 13.509 ms.
Best time for 3072K FFT length: 16.680 ms., avg: 16.714 ms.
Best time for 3584K FFT length: 20.142 ms., avg: 20.180 ms.
Best time for 4096K FFT length: 22.639 ms., avg: 22.790 ms.
Best time for 5120K FFT length: 29.448 ms., avg: 29.816 ms.
Best time for 6144K FFT length: 35.307 ms., avg: 35.353 ms.
Best time for 7168K FFT length: 42.849 ms., avg: 42.887 ms.
Best time for 8192K FFT length: 49.787 ms., avg: 49.882 ms.
Timing FFTs using 2 threads.
Best time for 768K FFT length: 2.037 ms., avg: 2.138 ms.
Best time for 896K FFT length: 2.429 ms., avg: 2.466 ms.
Best time for 1024K FFT length: 2.724 ms., avg: 2.755 ms.
Best time for 1280K FFT length: 3.529 ms., avg: 3.551 ms.
Best time for 1536K FFT length: 4.288 ms., avg: 4.332 ms.
Best time for 1792K FFT length: 5.137 ms., avg: 5.180 ms.
Best time for 2048K FFT length: 5.752 ms., avg: 5.786 ms.
Best time for 2560K FFT length: 7.260 ms., avg: 7.306 ms.
Best time for 3072K FFT length: 8.987 ms., avg: 9.026 ms.
Best time for 3584K FFT length: 10.890 ms., avg: 11.477 ms.
Best time for 4096K FFT length: 12.189 ms., avg: 12.224 ms.
Best time for 5120K FFT length: 15.746 ms., avg: 16.710 ms.
Best time for 6144K FFT length: 18.816 ms., avg: 19.186 ms.
Best time for 7168K FFT length: 22.616 ms., avg: 23.328 ms.
Best time for 8192K FFT length: 26.472 ms., avg: 26.990 ms.
Timing FFTs using 3 threads.
Best time for 768K FFT length: 1.420 ms., avg: 1.455 ms.
Best time for 896K FFT length: 1.708 ms., avg: 1.758 ms.
Best time for 1024K FFT length: 1.955 ms., avg: 1.998 ms.
Best time for 1280K FFT length: 2.532 ms., avg: 2.578 ms.
Best time for 1536K FFT length: 3.118 ms., avg: 3.171 ms.
Best time for 1792K FFT length: 3.683 ms., avg: 3.720 ms.
Best time for 2048K FFT length: 4.193 ms., avg: 4.600 ms.
Best time for 2560K FFT length: 5.345 ms., avg: 5.787 ms.
Best time for 3072K FFT length: 6.541 ms., avg: 6.711 ms.
Best time for 3584K FFT length: 8.069 ms., avg: 8.950 ms.
Best time for 4096K FFT length: 9.011 ms., avg: 9.303 ms.
Best time for 5120K FFT length: 11.368 ms., avg: 11.669 ms.
Best time for 6144K FFT length: 13.815 ms., avg: 14.036 ms.
Best time for 7168K FFT length: 16.336 ms., avg: 16.566 ms.
Best time for 8192K FFT length: 19.011 ms., avg: 19.205 ms.
Timing FFTs using 4 threads.
Best time for 768K FFT length: 1.289 ms., avg: 1.307 ms.
Best time for 896K FFT length: 1.572 ms., avg: 1.589 ms.
Best time for 1024K FFT length: 1.773 ms., avg: 1.825 ms.
Best time for 1280K FFT length: 2.309 ms., avg: 2.366 ms.
Best time for 1536K FFT length: 2.817 ms., avg: 3.212 ms.
Best time for 1792K FFT length: 3.364 ms., avg: 3.433 ms.
Best time for 2048K FFT length: 3.795 ms., avg: 3.886 ms.
Best time for 2560K FFT length: 4.860 ms., avg: 5.039 ms.
Best time for 3072K FFT length: 5.842 ms., avg: 6.479 ms.
Best time for 3584K FFT length: 7.207 ms., avg: 7.550 ms.
Best time for 4096K FFT length: 8.130 ms., avg: 8.508 ms.
Best time for 5120K FFT length: 10.159 ms., avg: 10.619 ms.
Best time for 6144K FFT length: 12.097 ms., avg: 13.624 ms.
Best time for 7168K FFT length: 14.258 ms., avg: 14.404 ms.
Best time for 8192K FFT length: 16.324 ms., avg: 16.675 ms.
Best time for 61 bit trial factors: 1.787 ms.
Best time for 62 bit trial factors: 1.796 ms.
Best time for 63 bit trial factors: 2.036 ms.
Best time for 64 bit trial factors: 2.107 ms.
Best time for 65 bit trial factors: 2.468 ms.
Best time for 66 bit trial factors: 2.911 ms.
Best time for 67 bit trial factors: 2.886 ms.
Best time for 75 bit trial factors: 2.809 ms.
Best time for 76 bit trial factors: 2.814 ms.
Best time for 77 bit trial factors: 2.810 ms.
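Since this is the same CPU model as the run in post #25, a rough comparison is to scale that run's time by the ratio of the reported clock speeds. A sketch in Python, assuming FFT time is simply inversely proportional to clock speed (a simplification) and using the 1-thread 2048K best times; the names are illustrative:

```python
# 1-thread 2048K FFT "Best time" (ms) and reported CPU speed (MHz)
# from the two i5-2500K benchmarks in this thread.
zero = {"mhz": 4429.34, "ms": 10.071}   # post #25
bcp19 = {"mhz": 4260.11, "ms": 10.590}  # this post


def clock_scaled_ms(ref, target_mhz):
    """Predict time at target_mhz, assuming time is inversely proportional to clock."""
    return ref["ms"] * ref["mhz"] / target_mhz


predicted = clock_scaled_ms(zero, bcp19["mhz"])
gap = bcp19["ms"] / predicted - 1
print(f"predicted {predicted:.3f} ms, measured {bcp19['ms']:.3f} ms, gap {gap:+.1%}")
```

This predicts about 10.47 ms against the measured 10.590 ms, so this run is about 1% slower than clock speed alone would suggest. The posts don't isolate the cause; post #25 lists 2133MT/s RAM, and this one doesn't state a memory speed.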

Last fiddled with by bcp19 on 2012-02-17 at 17:09
2012-02-17, 17:12   #27
Lennart
"Lennart"
Jun 2007

2⁵×5×7 Posts

Quote:
Originally Posted by Robert_47
Just FYI, neither version runs on an AMD FX-4100 Bulldozer.
What software did you use?


Lennart
2012-02-17, 17:15   #28
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

19×397 Posts

Quote:
Originally Posted by fivemack
Maybe I'm misunderstanding the request, but I think the question is whether there's a slowdown running six one-thread workers on six different jobs
That is correct. James, try adding "TimingOutput=4" to prime.txt. Restart and run just one worker. Note the per-iteration times. Now start the second worker, note the times, etc. Do the workers slow down a lot?

On my machine (all workers running 2400K FFTs), I get times of 1 worker - 13.7ms, 2 workers - 13.9ms, 3 workers - 14.5ms, 4 workers - 16.6ms.
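One way to answer "Do the workers slow down a lot?" from numbers like these is to compare each worker's per-iteration time with the single-worker time, and to compute total throughput as n × (1-worker time / n-worker time). A small Python sketch using the four timings above; the names are illustrative:

```python
# Per-iteration times (ms) for 2400K FFTs with 1..4 workers running, from the post above.
times_ms = {1: 13.7, 2: 13.9, 3: 14.5, 4: 16.6}


def worker_scaling(times):
    """For each worker count, return (per-worker slowdown, total throughput vs 1 worker)."""
    base = times[1]
    return {n: (t / base - 1, n * base / t) for n, t in sorted(times.items())}


for n, (slowdown, throughput) in worker_scaling(times_ms).items():
    print(f"{n} worker(s): each {slowdown:+.1%} slower, total {throughput:.2f}x")
```

With these timings each worker is about 21% slower when four are running, but the machine as a whole still does roughly 3.3 times the work of a single worker.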
2012-02-17, 17:17   #29
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

1D77₁₆ Posts

Quote:
Originally Posted by Robert_47
Just FYI, neither version runs on an AMD FX-4100 Bulldozer.
Grrrr. Does Options/CPU identify the chip as supporting AVX?

If not, can you add the line "CpuSupportsAVX=1" to local.ini and let me know if your benchmarks indicate prime95 runs faster with AVX vs. v26 using SSE2? Thanks.
2012-02-17, 17:24   #30
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

19·397 Posts

Quote:
Originally Posted by James Heinrich
Both v26.6 and v27.3 detected my 4500MHz Sandy-E as 4428MHz.

And both still don't recognize the architecture:

Is this a speed-testing alpha, or should it be considered a semi-stable beta and suitable for production work?
I got the family/model number from cpu-world.com. Your CPU will be recognized properly in the next release.

I think this version is fairly stable and suitable for production work.
2012-02-17, 17:30   #31
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

19×397 Posts

Quote:
Originally Posted by bcp19
Am I right in thinking the P-1 coding should be unaffected?
If you are asking: "Can I use 27.3 and resume a P-1 that was partially completed by an earlier version?" The answer is yes.
2012-02-17, 17:47   #32
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)

6249₁₀ Posts

Quote:
Originally Posted by Prime95
That is correct. James, try adding "TimingOutput=4" to prime.txt. Restart and run just one worker. Note the per-iteration times. Now start the second worker, note the times, etc. Do the workers slow down a lot?

On my machine (all workers running 2400K FFTs), I get times of 1 worker - 13.7ms, 2 workers - 13.9ms, 3 workers - 14.5ms, 4 workers - 16.6ms.
Just curious, would this memory-bandwidth bottleneck affect (relatively) very small FFTs that fit entirely in-cache (or something like that...I forget exactly how it works), such as those often used with LLR?

Last fiddled with by mdettweiler on 2012-02-17 at 17:47
2012-02-17, 17:58   #33
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

1D77₁₆ Posts

Quote:
Originally Posted by mdettweiler
Just curious, would this memory-bandwidth bottleneck affect (relatively) very small FFTs that fit entirely in-cache (or something like that...I forget exactly how it works), such as those often used with LLR?
Probably not. My L3 cache is 6MB, or 1.5 MB / core. A double-precision float is 8 bytes, so the max FFT size is 192K. The sin/cos data, the program itself, and the OS will all want cache space too, so maybe a 128K FFT will fit in the L3 cache. At 20 bits per float, you might be able to test numbers of about 2.5 million bits. If you try this, let me know if you see a slowdown as you run more workers.
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.