#23 | |
|
"David Kirkby"
Jan 2021
Althorne, Essex, UK
1B0₁₆ Posts |
Quote:
|
|
|
|
|
|
|
#24 | |
|
Jun 2003
2·2,543 Posts |
Quote:
The P-1 bounds selection uses a hardware-neutral costing function. The theory is simple -- PRP uses a series of multiplications using a given FFT. So do P-1 stage 1 and stage 2. Hence all you need to do is compare the number of multiplications used in PRP with the number of multiplications expected for P-1 (for a given B1/B2/memory allocation), together with the probability of finding a factor (as detailed in https://www.mersenne.org/various/math.php#p-1_factoring). The only variable here is the memory allocation, which affects how many stage 2 multiplications are needed. The hardware is not very relevant (because the same FFT computation is used on both sides). Having said that, the calculation uses a fudge factor, where a stage 2 multiplication counts as 1.2 (?) regular multiplications because stage 2 is somewhat less cache friendly. Conceivably, hardware with a higher memory-bandwidth-to-CPU ratio might have an edge over hardware with a lower ratio. But this is not factored into the bounds calculation (AFAIK).
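To make the comparison concrete, here is a minimal sketch of that kind of costing, not the actual Prime95 code. The function name, the `tests_saved` parameter, and the 1.44 × B1 estimate for stage 1 squarings are assumptions for illustration. The stage 2 multiplication count and the factor probability are plain inputs, since they depend on the memory allocation and on the formula at the linked page. The 1.2 weight is the uncertain fudge factor mentioned above.

```python
def p1_net_saving(exponent, b1, stage2_mults, factor_probability,
                  tests_saved=1.0, stage2_weight=1.2):
    """Expected saving from P-1, measured in regular FFT multiplications.

    All parameter names are hypothetical; this only illustrates the idea of
    comparing multiplication counts on both sides.
    """
    # A PRP test needs roughly one squaring per bit of the exponent.
    prp_mults = exponent
    # Assumption: stage 1 costs about B1 / ln(2) ~ 1.44 * B1 squarings.
    stage1_mults = 1.44 * b1
    # Stage 2 multiplications are weighted because they are less cache friendly.
    p1_cost = stage1_mults + stage2_weight * stage2_mults
    # Finding a factor saves the PRP work that would otherwise be done.
    expected_gain = factor_probability * tests_saved * prp_mults
    return expected_gain - p1_cost


if __name__ == "__main__":
    # Illustrative, made-up inputs; a positive result means P-1 is worthwhile.
    saving = p1_net_saving(exponent=104_000_000, b1=500_000,
                           stage2_mults=1_000_000, factor_probability=0.04)
    print(f"net saving: {saving:,.0f} multiplications")
```

A bounds search would evaluate something like this for many B1/B2 candidates and keep the pair with the largest net saving.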
|
|
|
|
|
|
#25 | ||
|
"David Kirkby"
Jan 2021
Althorne, Essex, UK
2⁴·3³ Posts |
Quote:
https://www.mersenne.org/various/math.php#p-1_factoring which is ultimately what matters for maximum chance of finding a prime. How accurate are the estimates of finding a factor for P-1?
Quote:
I'm currently running 4 PRP tests. According to top, I'm using just 2.1 MB RAM.
Code:
%Cpu(s): 94.1/1.3 95[||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ]
MiB Mem : 2.1/385610.4 [|| ]
MiB Swap: 7.6/2048.0 [||||||||
Anyway, when I have a bit of time I will test these out, and convince myself one way or the other.
||
|
|
|
|
|
#26 |
|
Jun 2003
2·2,543 Posts |
I believe the current first-time tests use either a 5.5M or a 6M FFT. That means each FFT consumes either 44MB or 48MB (8 bytes per element), plus some ancillary memory for lookup tables. If you're running two per CPU, that is about 100MB per CPU. The 2MB figure you're seeing is a big fat lie.
Obviously 100MB is not going to run out of 35MB L3 cache. So your performance will be very much dependent on RAM bandwidth (and somewhat on RAM latency). Thus PRP, P-1 stage 1 and P-1 stage 2 are all very much dependent on RAM bandwidth. It is just that stage 2 also needs to access a lot more memory in short order, so there is some efficiency loss. You can check this yourself. At the end of both stages, P95 will print the "number of transforms" (if you have stopped/restarted in the middle, the count will be from the restart so it won't work -- we need an unbroken run). Divide the number of transforms by the runtime to find how many transforms/sec in each stage. That will give you the relative inefficiency of stage 2 vs stage 1.
Last fiddled with by axn on 2021-07-23 at 14:28
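The check described above is just two divisions and a ratio. A small sketch, assuming you copy the transform counts and runtimes by hand from an unbroken run; the function names and the sample numbers are made up for illustration and are not real measurements.

```python
def transforms_per_second(transforms, runtime_seconds):
    """Throughput of one stage from its transform count and wall-clock runtime."""
    if runtime_seconds <= 0:
        raise ValueError("runtime must be positive")
    return transforms / runtime_seconds


def stage2_relative_cost(stage1, stage2):
    """Ratio of stage 1 throughput to stage 2 throughput.

    Each argument is a (transforms, runtime_seconds) pair. A result of 1.2
    would mean a stage 2 transform takes 1.2x as long as a stage 1 transform.
    """
    return transforms_per_second(*stage1) / transforms_per_second(*stage2)


if __name__ == "__main__":
    # Placeholder values only; substitute the figures printed by your own run.
    stage1 = (1_500_000, 3_000.0)
    stage2 = (2_000_000, 4_800.0)
    print(f"stage 2 costs {stage2_relative_cost(stage1, stage2):.2f}x per transform")
```

Comparing the result with the 1.2 fudge factor would show whether a particular machine is better or worse than the bounds calculation assumes.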
|
|
|
|
|
#27 | ||
|
"David Kirkby"
Jan 2021
Althorne, Essex, UK
2⁴×3³ Posts |
Quote:
Quote:
|
||
|
|
|
|
|
#28 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts |
Let's see, Linux top apparently lying to drkirkby at 2.1 MiB;
assume as a check an exponent size of ~104M, which in packed binary would require up to 104000000/8 = 13 MB for each packed multiprecision binary integer value mod Mp, per worker; 13 * 4 = 52 MB, so top's figure is more than an order of magnitude too small.
5.5 Mi FFT size * 8 B/word = 44 MiB per worker (~46 MB). Inner loops will fit in L3 cache but apparently the outermost won't, even if workers were reduced to 2 total, one per CPU package. Four workers * 44 MiB = 176 MiB at least expected. 2 big CPUs with 35 MiB L3 each, divided among 4 mprime workers: 2 * 35 MiB / 4 ~ 17.5 MiB available per mprime worker.
On Win10, a prime95 one-worker instance running a 105M PRP occupies ~267 MB at "5600K" FFT length (~5.5Mi?). 267/46 ~ 5.8. Later, still running, 197 MB; paused, ~7 MB.
Maybe mprime workers were paused at the time the 2.1 MiB figure was obtained by drkirkby? Pausing prime95 drastically reduced RAM usage. A pause is not consistent with 94% CPU though.
I've seen other utilities apparently misrepresent large RAM usage mod some large 2^n. (GPU-Z at some version IIRC.)
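The arithmetic above can be reproduced in a few lines. This is only a sketch of the back-of-envelope estimate, with hypothetical helper names; it counts the FFT data alone and ignores lookup tables and other overhead.

```python
MIB = 1024 * 1024  # bytes in one MiB


def fft_mib(fft_length_words, bytes_per_word=8):
    """Size of one FFT work array in MiB, at 8 bytes per word by default."""
    return fft_length_words * bytes_per_word / MIB


def l3_share_mib(l3_per_package_mib, packages, workers):
    """L3 cache available per worker if it were split evenly."""
    return l3_per_package_mib * packages / workers


def packed_residue_mb(exponent_bits):
    """Size in MB (10^6 bytes) of one packed binary residue mod Mp."""
    return exponent_bits / 8 / 1e6


if __name__ == "__main__":
    per_worker = fft_mib(int(5.5 * MIB))          # 5.5 Mi words
    print(f"FFT per worker:  {per_worker:.1f} MiB")
    print(f"four workers:    {4 * per_worker:.1f} MiB")
    print(f"L3 per worker:   {l3_share_mib(35, 2, 4):.1f} MiB")
    print(f"packed residue:  {packed_residue_mb(104_000_000):.1f} MB")
```

Either way, the per-worker working set is several times the L3 share, which is why RAM bandwidth dominates.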
|
|
|
|
|
#29 | |
|
Jun 2003
2×2,543 Posts |
Quote:
Indeed! But you would also need numactl & two instances to guarantee proper RAM mapping. |
|
|
|
|
|
|
#30 | |
|
"David Kirkby"
Jan 2021
Althorne, Essex, UK
2⁴×3³ Posts |
Quote:
Code:
[Worker #1 Jul 23 16:04] Setting affinity to run worker on CPU core #1
[Worker #1 Jul 23 16:04] Setting affinity to run helper thread 7 on CPU core #8
[Worker #1 Jul 23 16:04] Setting affinity to run helper thread 8 on CPU core #9
[Worker #1 Jul 23 16:04] Setting affinity to run helper thread 9 on CPU core #10
[Worker #1 Jul 23 16:04] Setting affinity to run helper thread 10 on CPU core #11
[Worker #1 Jul 23 16:04] Setting affinity to run helper thread 11 on CPU core #12
[Worker #1 Jul 23 16:04] Setting affinity to run helper thread 12 on CPU core #13
[Worker #1 Jul 23 16:04] Setting affinity to run helper thread 6 on CPU core #7
[Worker #1 Jul 23 16:04] Setting affinity to run helper thread 5 on CPU core #6
[Worker #1 Jul 23 16:04] Setting affinity to run helper thread 4 on CPU core #5
[Worker #1 Jul 23 16:04] Setting affinity to run helper thread 3 on CPU core #4
[Worker #1 Jul 23 16:04] Setting affinity to run helper thread 2 on CPU core #3
[Worker #1 Jul 23 16:04] Setting affinity to run helper thread 1 on CPU core #2
[Worker #2 Jul 23 16:04] Setting affinity to run worker on CPU core #14
[Worker #2 Jul 23 16:04] Setting affinity to run helper thread 7 on CPU core #21
[Worker #2 Jul 23 16:04] Setting affinity to run helper thread 8 on CPU core #22
[Worker #2 Jul 23 16:04] Setting affinity to run helper thread 9 on CPU core #23
[Worker #2 Jul 23 16:04] Setting affinity to run helper thread 10 on CPU core #24
[Worker #2 Jul 23 16:04] Setting affinity to run helper thread 11 on CPU core #25
[Worker #2 Jul 23 16:04] Setting affinity to run helper thread 12 on CPU core #26
[Worker #2 Jul 23 16:04] Setting affinity to run helper thread 6 on CPU core #20
[Worker #2 Jul 23 16:04] Setting affinity to run helper thread 5 on CPU core #19
[Worker #2 Jul 23 16:04] Setting affinity to run helper thread 4 on CPU core #18
[Worker #2 Jul 23 16:04] Setting affinity to run helper thread 3 on CPU core #17
[Worker #2 Jul 23 16:04] Setting affinity to run helper thread 2 on CPU core #16
[Worker #2 Jul 23 16:04] Setting affinity to run helper thread 1 on CPU core #15
[Worker #3 Jul 23 16:04] Setting affinity to run worker on CPU core #27
[Worker #3 Jul 23 16:04] Setting affinity to run helper thread 7 on CPU core #34
[Worker #3 Jul 23 16:04] Setting affinity to run helper thread 8 on CPU core #35
[Worker #3 Jul 23 16:04] Setting affinity to run helper thread 9 on CPU core #36
[Worker #3 Jul 23 16:04] Setting affinity to run helper thread 10 on CPU core #37
[Worker #3 Jul 23 16:04] Setting affinity to run helper thread 11 on CPU core #38
[Worker #3 Jul 23 16:04] Setting affinity to run helper thread 12 on CPU core #39
[Worker #3 Jul 23 16:04] Setting affinity to run helper thread 6 on CPU core #33
[Worker #3 Jul 23 16:04] Setting affinity to run helper thread 5 on CPU core #32
[Worker #3 Jul 23 16:04] Setting affinity to run helper thread 4 on CPU core #31
[Worker #3 Jul 23 16:04] Setting affinity to run helper thread 3 on CPU core #30
[Worker #3 Jul 23 16:04] Setting affinity to run helper thread 2 on CPU core #29
[Worker #3 Jul 23 16:04] Setting affinity to run helper thread 1 on CPU core #28
[Worker #4 Jul 23 16:04] Setting affinity to run worker on CPU core #40
[Worker #4 Jul 23 16:04] Setting affinity to run helper thread 7 on CPU core #47
[Worker #4 Jul 23 16:04] Setting affinity to run helper thread 8 on CPU core #48
[Worker #4 Jul 23 16:04] Setting affinity to run helper thread 9 on CPU core #49
[Worker #4 Jul 23 16:04] Setting affinity to run helper thread 10 on CPU core #50
[Worker #4 Jul 23 16:04] Setting affinity to run helper thread 11 on CPU core #51
[Worker #4 Jul 23 16:04] Setting affinity to run helper thread 12 on CPU core #52
[Worker #4 Jul 23 16:04] Setting affinity to run helper thread 6 on CPU core #46
[Worker #4 Jul 23 16:04] Setting affinity to run helper thread 5 on CPU core #45
[Worker #4 Jul 23 16:04] Setting affinity to run helper thread 4 on CPU core #44
[Worker #4 Jul 23 16:04] Setting affinity to run helper thread 3 on CPU core #43
[Worker #4 Jul 23 16:04] Setting affinity to run helper thread 2 on CPU core #42
[Worker #4 Jul 23 16:04] Setting affinity to run helper thread 1 on CPU core #41
Last fiddled with by drkirkby on 2021-07-23 at 17:28
|
|
|
|
|
|
#31 |
|
"David Kirkby"
Jan 2021
Althorne, Essex, UK
660₈ Posts |
I have found the problem - the 2.1 was percent! I'm just running top now as mprime says
Code:
[Worker #1 Jul 23 17:10] Using 311330MB of memory.
Code:
top - 18:38:14 up 6 days, 45 min, 7 users, load average: 49.87, 49.93, 50.31
Tasks: 888 total, 1 running, 887 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 1.4 sy, 94.2 ni, 4.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 82.7/385610.4 [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ]
MiB Swap: 7.6/2048.0 [|||||||| ]
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
635539 drkirkby 30 10 308.9g 305.0g 7464 S 4961 81.0 7540:01 mprime
637133 drkirkby 20 0 21360 5036 3436 S 1.3 0.0 0:01.39 top
637124 drkirkby 20 0 21360 4780 3436 R 1.0 0.0 0:01.60 top
34729 drkirkby 20 0 5881712 292588 47492 S 0.7 0.1 40:02.49 WolframKernel
Last fiddled with by drkirkby on 2021-07-23 at 18:21 |
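For anyone else caught out by this: in top's bar-graph summary the first number on the "MiB Mem" line is a percentage of the total shown after the slash, not a size. A small sketch of the conversion, using the figures from the two top listings in this thread; the function name is just illustrative.

```python
def percent_to_mib(percent_used, total_mib):
    """Convert top's 'percent used' summary figure into MiB."""
    return percent_used / 100.0 * total_mib


if __name__ == "__main__":
    total_mib = 385610.4  # total RAM reported on the "MiB Mem" line

    # The earlier reading: 2.1 is a percentage, so roughly 8 GiB, not 2.1 MiB.
    print(f"2.1%  -> {percent_to_mib(2.1, total_mib):,.1f} MiB")

    # The later reading, while mprime reports "Using 311330MB of memory".
    print(f"82.7% -> {percent_to_mib(82.7, total_mib):,.1f} MiB")

    # Cross-check against the per-process line: RES 305.0g and %MEM 81.0.
    print(f"81.0% -> {percent_to_mib(81.0, total_mib):,.1f} MiB")
    print(f"305.0 GiB = {305.0 * 1024:,.1f} MiB")
```

The %MEM and RES columns agree with each other to within rounding, and both are close to the figure mprime prints.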
|
|
|
|
|
#32 | |
|
Jun 2003
2·2,543 Posts |
Quote:
We need to set up two folders with different copies of local.txt, prime.txt and worktodo.txt. The executable (and libraries) themselves do not need to be copied. We'll do 2 workers each and 13 threads per worker. In each worktodo.txt, there should be two sections (Worker 1, Worker 2), and similarly in each local.txt. In the local.txt files, we'll add Affinity lines.
local.txt #1, Worker #1
Affinity=(0,26),(1-12,27-38)
local.txt #1, Worker #2
Affinity=(13,39),(14-25,40-51)
local.txt #2, Worker #1
Affinity=(52,78),(53-64,79-90)
local.txt #2, Worker #2
Affinity=(65,91),(66-77,92-103)
Finally, we can do:
numactl -m 0 -N 0 ./mprime -d -wfolder0
numactl -m 1 -N 1 ./mprime -d -wfolder1
where folder0 and folder1 would be the folders you created.
|
|
|
|
|
|
#33 |
|
Jun 2003
2×2,543 Posts |
On further experimenting, the affinity numbers might be all wrong. Can you run lstopo-no-graphics and post the output?
EDIT:- lstopo-no-graphics --no-io
Last fiddled with by axn on 2021-07-24 at 05:08