2022-09-25, 15:38   #724
kriesel
Questions on prime95 memory limit and location

Context first, then the two numbered questions.


Prime95 V30.8b14 appears to limit stage 2 memory settings to ~90% of installed ram.
Reviewing undoc.txt, I did not find a way to adjust that limit. (On a 64 GiB single-cpu-package system, 57.4 GiB is the most the program will allow for stage 2 memory. Before the ram upgrade, I ran with up to 12 GiB allowed out of 16 GiB installed on the same system. Little else runs on that system.)
For effective ram use, we can run 2 workers, each allowed to use nearly all of ram (not just ~45% of it); they will de-sync, so prime95 runs with the workers alternating S1 and S2 phases:

Code:
W1 W2
S1 S2
S2 S1
S1 S2
S2 S1
undoc.txt says this about memory in P-1 stage 2:
Quote:
The Memory=n setting in local.txt refers to the total amount of memory the
program can use. You can also put this in the [Worker #n] section to place
a maximum amount of memory that one particular worker can use.

You can set MaxHighMemWorkers=n in local.txt. This tells the program how
many workers are allowed to use lots of memory. This occurs doing stage 2
of P-1, P+1, or ECM. Default is 1.
(There's nothing there about the upper limit or modifying it.)
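For concreteness, here is the local.txt sketch I would expect to use for the 2-worker layout above on the 64 GiB box. This is untested and assumes the ~90% cap stays where it is, that the units are MiB (58368 MiB = 57 GiB, just under the 57.4 GiB currently allowed), and that leaving MaxHighMemWorkers at its default of 1 (stated explicitly here) is what keeps the two workers' stage 2 runs from overlapping:
Code:
MaxHighMemWorkers=1
Memory=58368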
1. Is there a way to allow up to ~60 GiB on a 64 GiB system?


Also, in the case of a dual-socket Xeon system, I think it would be best to run 4 workers, limit high-memory usage to ~45% of installed ram per worker, and confine each pair of workers to the ram on the same side of the NUMA interconnect as their CPU package.

For example, with 128 GiB of installed ram, leaving 24 GiB for other activity, and with the memory units in local.txt being MiB:
Code:
MaxHighMemWorkers=2
Memory=106496

[Worker #1]
Memory=53248

[Worker #2]
Memory=53248

[Worker #3]
Memory=53248

[Worker #4]
Memory=53248
This might accomplish what's intended, or it might put 2 high-memory workers on one CPU package at the same time, with one of them traversing the NUMA interconnect and running slower as a result. (I've seen performance decline on the same dual-socket system in a 2-worker configuration, one worker doing P-1 and the other PRP, when P-1 stage 2 was allowed more than half of total system ram.)
I didn't see a way to tell prime95 to keep a worker's memory usage on its own side of the NUMA fence.

What I'd prefer for efficiency:
Code:
CPUa    CPUb     (each has 8 DIMMs, quad channel; QPI NUMA interconnect between the two)
-----   -----
W1 W2   W3 W4
S1 S2   S1 S2
S2 S1   S2 S1
S1 S2   S1 S2
S2 S1   S2 S1
Not this (one worker's S2 memory accesses traverse the QPI instead of staying local to its CPU's memory channels):
Code:
W1 W2   W3 W4
S1 S1  <S2 S2    ('<' or '>' indicates lots of QPI traffic; a blank indicates little or none)
S2 S2  >S1 S1
S1 S1  <S2 S2
S2 S2  >S1 S1
2. Is there a way to ensure a worker's memory access & allocation remains entirely or mostly on the same side of the NUMA boundary as the worker's CPU cores on a multi-Xeon system?
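If there isn't, one OS-level workaround I have in mind but have not tested (so treat the details as my assumption, not anything prime95 documents) is to run two completely separate prime95/mprime instances from two directories, one per socket, and let the OS bind each instance's cores and memory to its own NUMA node. On Linux that would look roughly like:
Code:
# Hypothetical sketch: two independent mprime instances, each bound to one
# NUMA node for both cpu and memory; inst0/ and inst1/ are separate install
# directories, and -d just keeps each instance's output on the console.
(cd inst0 && numactl --cpunodebind=0 --membind=0 ./mprime -d) &
(cd inst1 && numactl --cpunodebind=1 --membind=1 ./mprime -d) &
On Windows, "start /NODE 0 ..." and "start /NODE 1 ..." are the closest equivalents I know of. Split instances are clumsier than doing it within one prime95 instance, which is why I'd prefer a local.txt answer.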
