#716 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2×13×283 Posts |
I briefly suspected my cat of standing on the "off" button of a laptop, until I noticed the power cord was not fully inserted into the adapter, and the unit had run on battery for a while. Probably had an intermittent power connection.
Running ECM on wavefront exponents is surprising, and 4 GiB seems low to me. Such ECM might be better off with v30.9b1 and more RAM. P-1 in v30.8b14 or later is OK with 4 GiB of RAM at the first-test wavefront, but is a lot more effective with more RAM for stage 2, and the RAM needed would scale upward with exponent. See the attachment, which is a snapshot of a work in progress.
Last fiddled with by kriesel on 2022-09-22 at 15:46 |
#717 | |
"James Heinrich"
May 2004
ex-Northern Ontario
3×23×59 Posts |
![]() Quote:
|
|
#718 |
Random Account
Aug 2009
Not U. + S.A.
2³·3²·5·7 Posts |
#719 |
"Mihai Preda"
Apr 2015
10110100001₂ Posts |
I have changed my CPU (to i9-10940x), and afterwards I ran a small benchmark for 6272K FFT with such results:
Code:
    Prime95 64-bit version 30.8, RdtscTiming=1
    FFTlen=6272K, Type=3, Arch=8, Pass1=896, Pass2=7168, clm=4 (13 cores, 1 worker): 2.44 ms. Throughput: 410.00 iter/sec.
    FFTlen=6272K, Type=3, Arch=8, Pass1=896, Pass2=7168, clm=4 (14 cores, 1 worker): 2.44 ms. Throughput: 410.00 iter/sec.
    FFTlen=6272K, Type=3, Arch=8, Pass1=896, Pass2=7168, clm=2 (13 cores, 1 worker): 2.12 ms. Throughput: 472.00 iter/sec.
    FFTlen=6272K, Type=3, Arch=8, Pass1=896, Pass2=7168, clm=2 (14 cores, 1 worker): 2.09 ms. Throughput: 479.26 iter/sec.
    FFTlen=6272K, Type=3, Arch=8, Pass1=896, Pass2=7168, clm=1 (13 cores, 1 worker): 2.07 ms. Throughput: 482.38 iter/sec.
    FFTlen=6272K, Type=3, Arch=8, Pass1=896, Pass2=7168, clm=1 (14 cores, 1 worker): 2.03 ms. Throughput: 492.55 iter/sec.
    FFTlen=6272K, Type=3, Arch=8, Pass1=2048, Pass2=3136, clm=1 (13 cores, 1 worker): 2.17 ms. Throughput: 460.30 iter/sec.
    FFTlen=6272K, Type=3, Arch=8, Pass1=2048, Pass2=3136, clm=1 (14 cores, 1 worker): 2.11 ms. Throughput: 474.48 iter/sec.

    Using AVX-512 FFT length 6272K, Pass1=896, Pass2=7K, clm=2, 14 threads

So it does not select the most efficient FFT according to the benchmark (it chooses clm=2 instead of clm=1). Why?
Does P-1 do auto-bench, and how often? Is there a way to force an auto-bench? Or what should I do to select the "best" FFT implementation? I'm ready to run a lengthy benchmark once, given the new CPU.
Are the benchmark results uploaded automatically, so that they are available to others?
Thanks! |
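As a rough way to see which benchmarked configuration wins, the 14-core lines from the benchmark above can be parsed and ranked by throughput. This is only a sketch: the regular expression and the helper names `parse_bench` and `best_for` are assumptions made here to match the line format shown in the post, and are not part of prime95 or mprime.

```python
import re

# 14-core benchmark lines copied from the post above.
BENCH = """\
FFTlen=6272K, Type=3, Arch=8, Pass1=896, Pass2=7168, clm=4 (14 cores, 1 worker): 2.44 ms. Throughput: 410.00 iter/sec.
FFTlen=6272K, Type=3, Arch=8, Pass1=896, Pass2=7168, clm=2 (14 cores, 1 worker): 2.09 ms. Throughput: 479.26 iter/sec.
FFTlen=6272K, Type=3, Arch=8, Pass1=896, Pass2=7168, clm=1 (14 cores, 1 worker): 2.03 ms. Throughput: 492.55 iter/sec.
FFTlen=6272K, Type=3, Arch=8, Pass1=2048, Pass2=3136, clm=1 (14 cores, 1 worker): 2.11 ms. Throughput: 474.48 iter/sec.
"""

# Hypothetical pattern, written to fit the lines above only.
LINE_RE = re.compile(
    r"FFTlen=(?P<fft>\d+K), .*?Pass1=(?P<pass1>\d+), Pass2=(?P<pass2>\d+), "
    r"clm=(?P<clm>\d+) \((?P<cores>\d+) cores?, (?P<workers>\d+) workers?\): "
    r"(?P<ms>[\d.]+) ms\. +Throughput: (?P<ips>[\d.]+) iter/sec\."
)


def parse_bench(text):
    """Return one dict per benchmark line found in text."""
    rows = []
    for m in LINE_RE.finditer(text):
        rows.append({
            "fft": m["fft"],
            "pass1": int(m["pass1"]),
            "pass2": int(m["pass2"]),
            "clm": int(m["clm"]),
            "cores": int(m["cores"]),
            "ms": float(m["ms"]),
            "ips": float(m["ips"]),
        })
    return rows


def best_for(rows, cores):
    """Pick the row with the highest throughput for a given core count."""
    candidates = [r for r in rows if r["cores"] == cores]
    return max(candidates, key=lambda r: r["ips"])


rows = parse_bench(BENCH)
best = best_for(rows, 14)
print(best["pass1"], best["pass2"], best["clm"], best["ips"])
# prints: 896 7168 1 492.55
```

On these numbers the clm=1 variant is about 2.8% faster than the clm=2 variant the program reported using (492.55 versus 479.26 iter/sec), which is the gap the question is about.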
#720 | |
Jan 2021
California
11·47 Posts |
![]() Quote:
|
|
#721 | |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2·13·283 Posts |
Also, in prime95 v30.8b14, the FFT sizes for S1 and S2 of the same P-1 run on an exponent differ from each other, in the systematic first P-1 and retry P-1 runs I have been tabulating. I've seen the S2 FFT size range from ~4% to 12% larger than the S1 FFT size on the same exponent: a lower ratio at low allowed S2 RAM (4 GiB), a higher ratio at higher allowed S2 RAM (up to 56 GiB at least). It would not surprise me if the FFT size chosen for PRP or LL were smaller than that for S1 P-1. The difference might have something to do with a difference in the number of carries expected in the code.
I have observed stage 1 P-1 interrupted by benchmarking for a specific FFT size. Unfortunately the worker window does not indicate which FFT size(s).
Quote:
Last fiddled with by kriesel on 2022-09-24 at 17:59 |
|
#722 | |
"Mihai Preda"
Apr 2015
11·131 Posts |
mprime did run an auto-bench during the night, and appended this to results.bench.txt:
Quote:
Using AVX-512 FFT length 6272K, Pass1=896, Pass2=7K, clm=2, 13 threads
which is not the optimal config according to its own auto-bench. I wonder why, and how to make it choose the optimal config? |
|
#723 |
"Oliver"
Sep 2017
Porta Westfalica, DE
1,319 Posts |
Do you still have the gwnum-file with old entries?
|
#724 | |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2×13×283 Posts |
Contexts first, then questions, in bold font.
Prime95 v30.8b14 appears to limit the stage 2 memory setting to ~90% of installed RAM. Reviewing undoc.txt, I did not find a way to adjust that limit. (On a 64 GiB single-CPU-package system, 57.4 GiB is the most the program will accept for stage 2 memory. Before upgrading the RAM, I ran with up to 12 GiB allowed and 16 GiB installed on the same system. Little else is running on that system.)
For effective RAM use, we can run 2 workers, each allowed to use nearly all RAM (not only ~45% of RAM), and they will de-sync so that prime95 runs with the workers alternating S1 and S2 phases:
Code:
    W1  W2
    S1  S2
    S2  S1
    S1  S2
    S2  S1
Quote:
1. Is there a way to allow up to ~60 GiB on a 64 GiB system?
Also, in the case of a dual-socket Xeon system, I think it would be best to run 4 workers, limit high memory usage per worker to ~45% of installed RAM, and confine two workers each to the RAM on the same side of the NUMA interconnect as their respective CPU packages. For example, with 128 GiB of installed RAM, leaving 24 GiB for other activity, and with the memory units in local.txt being MiB:
Code:
    MaxHighMemWorkers=2
    Memory=106496
    [Worker #1]
    Memory=53248
    [Worker #2]
    Memory=53248
    [Worker #3]
    Memory=53248
    [Worker #4]
    Memory=53248
I didn't see a way to specify "stay on your own side of the NUMA fence" for memory usage. What I'd prefer for efficiency:
Code:
    CPUa    CPUb     (each has 8 DIMMs, quad channel; QPI NUMA interconnect between the two)
    -----   -----
    W1 W2   W3 W4
    S1 S2   S1 S2
    S2 S1   S2 S1
    S1 S2   S1 S2
    S2 S1   S2 S1
Code:
    W1 W2 W3 W4
    S1 S1<S2 S2    ( < or > indicating lots of QPI traffic, " " indicating little or none)
    S2 S2>S1 S1
    S1 S1<S2 S2
    S2 S2>S1 S1
Last fiddled with by kriesel on 2022-09-25 at 15:42 |
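The MiB figures in the local.txt example above follow from simple arithmetic, which can be sketched as below. The key names `MaxHighMemWorkers`, `Memory` and `[Worker #n]` are taken from the post; the function `local_txt_memory` and the rule "per-worker limit = total allowance divided by the number of high-memory workers" are assumptions made here because they reproduce the posted numbers, not documented prime95 behaviour.

```python
GIB_TO_MIB = 1024  # local.txt memory values are in MiB, per the post


def local_txt_memory(installed_gib, reserved_gib, workers, max_high_mem_workers):
    """Build local.txt-style lines for a given RAM budget (hypothetical helper)."""
    # Total allowance: installed RAM minus what is left for other activity.
    total_mib = (installed_gib - reserved_gib) * GIB_TO_MIB
    # Assumed rule: split the allowance evenly among concurrent stage 2 workers.
    per_worker_mib = total_mib // max_high_mem_workers
    lines = [f"MaxHighMemWorkers={max_high_mem_workers}", f"Memory={total_mib}"]
    for w in range(1, workers + 1):
        lines.append(f"[Worker #{w}]")
        lines.append(f"Memory={per_worker_mib}")
    return lines


lines = local_txt_memory(installed_gib=128, reserved_gib=24,
                         workers=4, max_high_mem_workers=2)
print("\n".join(lines))
```

With 128 GiB installed and 24 GiB reserved this gives `Memory=106496` overall and `Memory=53248` per worker, matching the values in the post.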
|
#725 | |
P90 years forever!
Aug 2002
Yeehaw, FL
3·11·13·19 Posts |
![]() Quote:
Auto-bench is done every 21(?) hours until there are several data points. I'm looking into why it is running 13-core benchmarks when it only uses 16-core bench results (a bug). Benchmarks are not uploaded. They are not particularly useful to others given all the combinations of overclocking, memory speeds, etc. |
|
#726 |
"Oliver"
Sep 2017
Porta Westfalica, DE
1,319 Posts |
Does it detect three of your cores as efficiency cores when they are not?
|