mersenneforum.org > Great Internet Mersenne Prime Search > PrimeNet
2014-10-07, 02:50   #45
axn

Quote:
Originally Posted by VBCurtis
If all 4 doing TF is slower per worker than 2 doing TF, you have a heat problem- probably a misaligned heatsink, maybe a non-functioning CPU fan. Testing all TF removes the memory bottleneck, I believe.

Two LLs could saturate single-stick memory throughput, but I would think TF's lighter memory use would allow LL to much-less-than-double when you add two TFs to 2 LLs; so I think you have a serious heat problem, rather than just a memory bottleneck.
I think you missed the post title which says "Well with 4 workers doing LL .... IT REALLY SUCKS".
2014-10-07, 02:55   #46
axn

Quote:
Originally Posted by petrw1
I am going to guess about the same, and in the end I will be running TF on 2 workers and LL on the other 2.
Another (potential) option would be to run small-exponent ECM on two of the cores (instead of TF), which would run out of cache and thus wouldn't contribute to the memory bottleneck. The reason I am recommending ECM over TF on a CPU is ...

If you don't mind, could you try out that combination as well, to see what the performance hit to the LLs is?
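The cache argument can be roughed out numerically. This is only a back-of-envelope sketch: the ~18 bits-per-word FFT packing density is an approximation, the 1M ECM exponent is an illustrative assumption, and the 8 MB L3 figure is the i7-4770's spec.

```python
def fft_working_set_mib(exponent, bits_per_word=18.0):
    """Rough size of a Mersenne residue in Prime95's floating-point FFT:
    the exponent's bits are spread across doubles at ~18 bits per word
    (an approximation; the real density varies with FFT length)."""
    words = exponent / bits_per_word
    return words * 8 / 2**20  # 8 bytes per double -> MiB

L3_MIB = 8  # i7-4770 L3 cache

for label, p in [("small-exponent ECM", 1_000_000),   # illustrative exponent
                 ("35M DC", 35_000_000),
                 ("68M LL", 68_000_000)]:
    ws = fft_working_set_mib(p)
    verdict = "fits in L3" if ws <= L3_MIB else "spills to RAM"
    print(f"{label:>18}: ~{ws:6.1f} MiB working set ({verdict})")
```

Both LL/DC residues come out well above 8 MiB, so every iteration streams through main memory; a small ECM residue stays under 1 MiB and never leaves cache.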
2014-10-07, 03:38   #47
VBCurtis

Quote:
Originally Posted by axn
I think you missed the post title which says "Well with 4 workers doing LL .... IT REALLY SUCKS".
No, I was suggesting a way to possibly narrow down whether it's a memory bottleneck from single-channel memory or an instant-overheat problem. I agree that ECM is a smarter way to reduce memory contention; I forgot that's a part of GIMPS these days.
2014-10-08, 04:15   #48
petrw1
Test #1 ... more extreme than expected.

With 4 workers doing LL/DC
Worker #2: 35M DC  ms/iter: 34.34
Worker #4: 68M LL  ms/iter: 66.02
(#1 is doing a 35M DC; #3 is doing a 55M LL, both with relatively similar iteration times)

With only Workers #2 and #4 running.
Worker #2: 35M DC  ms/iter: 16.30
Worker #4: 68M LL  ms/iter: 32.42

In both cases, LESS than half the time per iteration.
The difference was immediate: as soon as workers #1 and #3 were stopped, the iteration times dropped. In fact they dropped to just OVER half the time per iteration initially, then got a few percent faster over the next few hours before stabilizing at the times posted above.
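Putting the posted figures into throughput terms makes the comparison concrete. This quick sketch uses only the ms/iter numbers above:

```python
# Combined throughput of workers #2 and #4, from the posted ms/iter figures.
# Workers #1 and #3 are ignored entirely; they reportedly ran at similar
# rates, so the full 4-worker total is roughly double the #2+#4 figure.

def iters_per_sec(ms_per_iter):
    return 1000.0 / ms_per_iter

four_workers = [34.34, 66.02]   # workers #2 and #4, all four running
two_workers  = [16.30, 32.42]   # same workers, with #1 and #3 stopped

t4 = sum(iters_per_sec(ms) for ms in four_workers)
t2 = sum(iters_per_sec(ms) for ms in two_workers)
print(f"4 workers running: {t4:.1f} iter/s across #2+#4")
print(f"2 workers running: {t2:.1f} iter/s across #2+#4")
print(f"ratio: {t2/t4:.2f}x")
```

The two remaining workers more than doubled their combined rate, so even crediting the stopped workers with similar output, the machine did more total work with 2 workers than with 4.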

That being said, even with only 2 workers running they are still quite a bit over the benchmark I ran only a few weeks ago.

Benchmark times are almost 40% faster still:
2048K FFT: as fast as 10.90 ms (the DC above uses a 1920K FFT)
3584K FFT: as fast as 20.33 ms

I would like to believe that, barring a REALLY bad installation, running 2 out of 4 cores would not be enough to overheat the CPU.

PS: There seems to be some native Turbo Boost (OC) going on. The chip is rated at 3.4 GHz but is running at 3.7 GHz.

SO....
Next I can add TF and/or ECM to workers #1 and #3.
I know what will happen with TF, because I started this PC with workers #1 and #2 doing TF (inherited from the 2-core predecessor) and workers #3 and #4 doing LL/DC, with iteration times very close to the 2-worker times above.
So I know I can add TF without impacting LL/DC...

As for ECM: axn is also suggesting it will NOT slow down the LL/DC.
If that is the case, I think I will let them run ECM and leave TF for the GPUs.

I have 800 MB allocated to Prime95 (out of 4 GB) ... I believe that is more than enough for ECM (not ECM on Fermat numbers).
2014-10-08, 14:28   #49
kracker

Assuming you have the latest Prime95, try the benchmark and post your results here?

2014-10-08, 15:03   #50
petrw1

Quote:
Originally Posted by kracker
Assuming you have the latest Prime95, try the benchmark and post your results here?
Benchmark attached... note that both lines show v26.5 in the benchmark, but it was in fact upgraded to v28.5 for the second benchmark (hence the MUCH better numbers). Apparently there is some issue where the benchmark software may not record the version change if a benchmark is run right after upgrading?

Code:
Last Activity	2014-10-07 23:59, Updated 2014-10-07 23:59, Registered 2014-08-25 17:48
GUID	63D1FAC485154FFF8132C68B36D70C9D
Software Version	Windows64,Prime95,v28.5,build 2
Model	Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
Features	4 core, hyperthreaded, Prefetch,SSE,SSE2,SSE4,AVX,AVX2,FMA,
Speed	3.392 GHz (19.762 GHz P4 effective equivalent)
L1/L2 Cache	32 / 256 KB
Computer Memory	4008 MB   configured usage 800 MB day / 800 MB night
[Attached thumbnail: BenchMark.png]
2014-10-08, 15:38   #51
pinhodecarlos

Your CPU is dying....
2014-10-08, 15:43   #52
kracker

Quote:
Originally Posted by petrw1
Benchmark attached...note both lines show 26.5 in the benchmark but in fact it was upgraded to 28.5 for the second benchmark (hence the MUCH better numbers). Apparently there is some issue with the benchmark software that may not note the version change if it is upgraded and a benchmark run right after???
Hmm, I see.

Well, what I really meant was for you to post the log of Prime95's benchmark (in results.txt, I think, after it's finished).

Here's mine for example:
Code:
Compare your results to other computers at http://www.mersenne.org/report_benchmarks
Intel(R) Core(TM) i5-4670K CPU @ 3.40GHz
CPU speed: 3534.93 MHz, 4 cores
CPU features: Prefetch, SSE, SSE2, SSE4, AVX, AVX2, FMA
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 6 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 64-bit version 28.5, RdtscTiming=1
Best time for 1024K FFT length: 3.806 ms., avg: 3.890 ms.
Best time for 1280K FFT length: 4.897 ms., avg: 5.268 ms.
Best time for 1536K FFT length: 5.942 ms., avg: 7.670 ms.
Best time for 1792K FFT length: 7.161 ms., avg: 9.124 ms.
Best time for 2048K FFT length: 8.152 ms., avg: 8.240 ms.
Best time for 2560K FFT length: 10.399 ms., avg: 12.288 ms.
Best time for 3072K FFT length: 12.516 ms., avg: 12.628 ms.
Best time for 3584K FFT length: 14.924 ms., avg: 15.013 ms.
Best time for 4096K FFT length: 17.032 ms., avg: 20.831 ms.
Best time for 5120K FFT length: 21.642 ms., avg: 23.980 ms.
Best time for 6144K FFT length: 26.061 ms., avg: 28.492 ms.
Best time for 7168K FFT length: 31.057 ms., avg: 32.707 ms.
Best time for 8192K FFT length: 35.937 ms., avg: 36.401 ms.
Timing FFTs using 2 threads.
Best time for 1024K FFT length: 2.057 ms., avg: 2.094 ms.
Best time for 1280K FFT length: 2.613 ms., avg: 4.898 ms.
Best time for 1536K FFT length: 3.181 ms., avg: 3.390 ms.
Best time for 1792K FFT length: 3.828 ms., avg: 4.099 ms.
Best time for 2048K FFT length: 4.351 ms., avg: 4.538 ms.
Best time for 2560K FFT length: 5.509 ms., avg: 5.555 ms.
Best time for 3072K FFT length: 6.687 ms., avg: 7.799 ms.
Best time for 3584K FFT length: 7.878 ms., avg: 10.341 ms.
Best time for 4096K FFT length: 9.130 ms., avg: 9.168 ms.
Best time for 5120K FFT length: 11.478 ms., avg: 17.221 ms.
Best time for 6144K FFT length: 13.682 ms., avg: 16.085 ms.
Best time for 7168K FFT length: 16.449 ms., avg: 16.585 ms.
Best time for 8192K FFT length: 19.097 ms., avg: 21.210 ms.
Timing FFTs using 3 threads.
Best time for 1024K FFT length: 1.455 ms., avg: 1.485 ms.
Best time for 1280K FFT length: 1.873 ms., avg: 1.907 ms.
Best time for 1536K FFT length: 2.274 ms., avg: 2.319 ms.
Best time for 1792K FFT length: 2.733 ms., avg: 2.774 ms.
Best time for 2048K FFT length: 3.198 ms., avg: 3.241 ms.
Best time for 2560K FFT length: 4.037 ms., avg: 4.079 ms.
Best time for 3072K FFT length: 4.846 ms., avg: 5.031 ms.
Best time for 3584K FFT length: 5.773 ms., avg: 5.833 ms.
Best time for 4096K FFT length: 6.659 ms., avg: 8.788 ms.
Best time for 5120K FFT length: 8.332 ms., avg: 8.451 ms.
Best time for 6144K FFT length: 10.070 ms., avg: 10.253 ms.
Best time for 7168K FFT length: 11.973 ms., avg: 12.058 ms.
Best time for 8192K FFT length: 13.976 ms., avg: 14.087 ms.
Timing FFTs using 4 threads.
Best time for 1024K FFT length: 1.226 ms., avg: 1.260 ms.
Best time for 1280K FFT length: 1.555 ms., avg: 1.600 ms.
Best time for 1536K FFT length: 1.937 ms., avg: 3.906 ms.
Best time for 1792K FFT length: 2.353 ms., avg: 2.400 ms.
Best time for 2048K FFT length: 2.795 ms., avg: 2.846 ms.
Best time for 2560K FFT length: 3.538 ms., avg: 3.576 ms.
Best time for 3072K FFT length: 4.157 ms., avg: 4.215 ms.
Best time for 3584K FFT length: 5.036 ms., avg: 7.363 ms.
Best time for 4096K FFT length: 5.831 ms., avg: 5.877 ms.
Best time for 5120K FFT length: 7.318 ms., avg: 9.553 ms.
Best time for 6144K FFT length: 8.738 ms., avg: 8.837 ms.
Best time for 7168K FFT length: 10.378 ms., avg: 11.727 ms.
Best time for 8192K FFT length: 12.161 ms., avg: 12.284 ms.

Timings for 1024K FFT length (1 cpu, 1 worker):  3.91 ms.  Throughput: 255.67 iter/sec.
Timings for 1024K FFT length (2 cpus, 2 workers):  4.09,  4.08 ms.  Throughput: 489.46 iter/sec.
Timings for 1024K FFT length (3 cpus, 3 workers):  4.91,  4.54,  4.47 ms.  Throughput: 647.90 iter/sec.
Timings for 1024K FFT length (4 cpus, 4 workers):  5.76,  5.77,  5.41,  5.29 ms.  Throughput: 720.83 iter/sec.
Timings for 1280K FFT length (1 cpu, 1 worker):  4.95 ms.  Throughput: 201.94 iter/sec.
Timings for 1280K FFT length (2 cpus, 2 workers):  5.23,  5.21 ms.  Throughput: 383.17 iter/sec.
Timings for 1280K FFT length (3 cpus, 3 workers):  6.14,  5.79,  5.72 ms.  Throughput: 510.31 iter/sec.
Timings for 1280K FFT length (4 cpus, 4 workers):  7.29,  7.32,  6.76,  6.65 ms.  Throughput: 571.91 iter/sec.
Timings for 1536K FFT length (1 cpu, 1 worker):  6.00 ms.  Throughput: 166.56 iter/sec.
Timings for 1536K FFT length (2 cpus, 2 workers):  6.41,  6.30 ms.  Throughput: 314.66 iter/sec.
Timings for 1536K FFT length (3 cpus, 3 workers):  7.43,  6.94,  6.86 ms.  Throughput: 424.38 iter/sec.
Timings for 1536K FFT length (4 cpus, 4 workers):  8.94,  8.87,  8.17,  8.00 ms.  Throughput: 471.97 iter/sec.
Timings for 1792K FFT length (1 cpu, 1 worker):  7.19 ms.  Throughput: 139.09 iter/sec.
Timings for 1792K FFT length (2 cpus, 2 workers):  7.57,  7.52 ms.  Throughput: 264.99 iter/sec.
Timings for 1792K FFT length (3 cpus, 3 workers):  8.95,  8.36,  8.23 ms.  Throughput: 352.84 iter/sec.
Timings for 1792K FFT length (4 cpus, 4 workers): 10.40, 10.34,  9.72,  9.53 ms.  Throughput: 400.69 iter/sec.
Timings for 2048K FFT length (1 cpu, 1 worker):  8.49 ms.  Throughput: 117.84 iter/sec.
Timings for 2048K FFT length (2 cpus, 2 workers):  8.64,  8.70 ms.  Throughput: 230.66 iter/sec.
Timings for 2048K FFT length (3 cpus, 3 workers): 10.46,  9.72,  9.34 ms.  Throughput: 305.48 iter/sec.
Timings for 2048K FFT length (4 cpus, 4 workers): 12.02, 12.05, 11.31, 11.00 ms.  Throughput: 345.52 iter/sec.
Timings for 2560K FFT length (1 cpu, 1 worker): 10.45 ms.  Throughput: 95.66 iter/sec.
Timings for 2560K FFT length (2 cpus, 2 workers): 11.03, 11.00 ms.  Throughput: 181.59 iter/sec.
Timings for 2560K FFT length (3 cpus, 3 workers): 13.08, 12.19, 12.03 ms.  Throughput: 241.58 iter/sec.
Timings for 2560K FFT length (4 cpus, 4 workers): 14.99, 15.16, 14.06, 14.13 ms.  Throughput: 274.58 iter/sec.
Timings for 3072K FFT length (1 cpu, 1 worker): 12.87 ms.  Throughput: 77.68 iter/sec.
Timings for 3072K FFT length (2 cpus, 2 workers): 13.30, 13.28 ms.  Throughput: 150.46 iter/sec.
Timings for 3072K FFT length (3 cpus, 3 workers): 15.58, 14.55, 14.42 ms.  Throughput: 202.29 iter/sec.
Timings for 3072K FFT length (4 cpus, 4 workers): 18.44, 17.97, 16.73, 16.47 ms.  Throughput: 230.37 iter/sec.
Timings for 3584K FFT length (1 cpu, 1 worker): 15.05 ms.  Throughput: 66.42 iter/sec.
Timings for 3584K FFT length (2 cpus, 2 workers): 15.64, 15.58 ms.  Throughput: 128.13 iter/sec.
Timings for 3584K FFT length (3 cpus, 3 workers): 18.51, 17.31, 17.03 ms.  Throughput: 170.51 iter/sec.
Timings for 3584K FFT length (4 cpus, 4 workers): 21.54, 21.45, 19.83, 19.43 ms.  Throughput: 194.95 iter/sec.
Timings for 4096K FFT length (1 cpu, 1 worker): 18.21 ms.  Throughput: 54.92 iter/sec.
Timings for 4096K FFT length (2 cpus, 2 workers): 19.41, 18.67 ms.  Throughput: 105.07 iter/sec.
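The multi-worker "Timings" lines already quantify how far throughput falls short of linear scaling. A short sketch can pull that out of the log; here it parses two lines copied from the output above (a real run would read the whole results.txt instead):

```python
import re

# Two lines copied verbatim from the benchmark output above.
log = """\
Timings for 2048K FFT length (1 cpu, 1 worker):  8.49 ms.  Throughput: 117.84 iter/sec.
Timings for 2048K FFT length (4 cpus, 4 workers): 12.02, 12.05, 11.31, 11.00 ms.  Throughput: 345.52 iter/sec.
"""

pat = re.compile(r"(\d+)K FFT length \((\d+) cpus?, \d+ workers?\).*?"
                 r"Throughput:\s*([\d.]+) iter/sec")

tput = {}  # (fft_k, n_workers) -> iter/sec
for fft_k, n, t in pat.findall(log):
    tput[(int(fft_k), int(n))] = float(t)

# Scaling efficiency: measured N-worker throughput vs N x single-worker rate.
for fft_k in sorted({k for k, _ in tput}):
    base = tput.get((fft_k, 1))
    for n in (2, 3, 4):
        if base and (fft_k, n) in tput:
            eff = tput[(fft_k, n)] / (n * base)
            print(f"{fft_k}K FFT, {n} workers: {eff:.0%} scaling efficiency")
```

On this dual-channel i5-4670K the 2048K FFT scales to about 73% efficiency at 4 workers; the figure for a single-channel machine would be expected to come out considerably lower.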

2014-10-08, 17:31   #53
petrw1

Quote:
Originally Posted by pinhodecarlos
Your CPU is dying....
I hope not. It is a brand new PC at a friend's workplace.

I am inclined to believe it is just a very poor setup.
That is, a Lamborghini of a CPU paired with Ford Fiesta RAM.
2014-10-08, 17:51   #54
pinhodecarlos

Would you be so kind as to post some pictures of your BIOS settings? If you have nowhere to host them, just send me an email (my nickname@yahoo.com) and I will host them for you.

Carlos
2014-10-08, 18:20   #55
VBCurtis

Quote:
Originally Posted by axn
I think you missed the post title which says "Well with 4 workers doing LL .... IT REALLY SUCKS".
OK, it was my reading comprehension at fault here. I missed the "from"; I thought iteration time doubled with TF×2 + LL×2 versus LL×2 with two idle cores.

On the bright side, that may mean this is merely a single-channel memory problem, and not a defective heatsink mount.

OP-
If the heatsink is not mounted flush, it can work at less than 10% efficiency, and you could have overheating/throttling problems with even a single core at full blast. However, Turbo Boost kicking in on a single thread suggests heat is not the culprit, leaving memory as the problem (as your analogy indicates).
Two ECMs and two LLs seem like the plan, then.