mersenneforum.org Prime95 version 29.6/29.7/29.8

2020-04-01, 17:22   #496
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

2·5·691 Posts

Quote:
 Originally Posted by jdhedden I'm trying to run mprime on an AMD Ryzen 7 3800X 8-Core Processor. I've run with different numbers of threads with the following approximate throughputs:
1 thread : 20ms/iter
2 threads: 31ms/iter
4 threads: 60ms/iter
7 threads: 100ms/iter
We need to work on terminology first. Since your timings are getting worse, I think you are timing 1,2,4,7 workers (that is testing 1,2,4,7 different exponents). You can also time one worker using multiple threads. In this case, timings will decrease.

In either scenario, at some point your timings will likely be constrained by your RAM's bandwidth.
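
To see why this looks like a bandwidth ceiling, the reported timings can be converted into total throughput. This is a minimal sketch, assuming the four figures quoted above are per-worker ms/iter and that every worker is testing an exponent of similar size; the function name is only illustrative.

```python
# Per-worker timings reported in the quoted post (workers -> ms per iteration).
timings_ms = {1: 20, 2: 31, 4: 60, 7: 100}


def total_throughput(workers: int, ms_per_iter: float) -> float:
    """Aggregate iterations per second summed over all workers."""
    return workers * 1000.0 / ms_per_iter


throughput = {w: total_throughput(w, ms) for w, ms in timings_ms.items()}

for w, t in throughput.items():
    print(f"{w} worker(s): {t:.1f} iter/s total")
```

Under those assumptions the total rises from 50 iter/s with one worker to roughly 65 with two, then barely moves all the way up to seven workers, which is consistent with memory bandwidth rather than core count being the limit.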

2020-04-01, 20:58   #497
jdhedden

Mar 2020

2 Posts

Quote:
 Originally Posted by Prime95 We need to work on terminology first. Since your timings are getting worse, I think you are timing 1,2,4,7 workers (that is testing 1,2,4,7 different exponents). You can also time one worker using multiple threads. In this case, timings will decrease. In either scenario, at some point your timings will likely be constrained by your RAM's bandwidth.
Yes, I agree that my terminology may be off. I have 2 exponents. I'm configured to run 2 workers. When running, the system reports 200% cpu usage, indicating 2 threads are being used. (I'm on a Linux machine (Debian OS) with an AMD Ryzen 7 which has 8 cores = 16 cpus (threads).)

When I was using 4 workers/exponents, it said 400% cpu - again, 4 threads used. However, ms/iter also went up by a factor of 2, which indicates that total "throughput" remained constant even though more cpus were involved.

I've tried setting cpu frequencies:
Code:
for xx in 8 9 10 11 12 13 14 15; do
    sudo cpufreq-set --freq 3.90GHz --cpu "$xx"
done
In conjunction with cpu affinities in local.txt:
Code:
[Worker #1]
Affinity=8-9,10-11
[Worker #2]
Affinity=12-13,14-15
But this had no effect on reducing the ms/iter.

So it may be that I'm not configured to make the most of my computer's resources. Suggestions?

2020-04-01, 22:52   #498
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

1101011111110₂ Posts

Quote:
 Originally Posted by jdhedden Suggestions?
Ryzen owners may be able to offer better advice. I would guess you will get maximum throughput with 1 worker using 8 threads.

The benchmark menu choice will tell you which configuration gives the best throughput. Assuming you are doing first time tests, benchmark the 5120K FFT size, 8 cores, and 1, 2, 4, and 8 workers.

Afterwards use the Worker Windows menu choice to create the proper number of workers using the proper number of threads. Directly editing prime.txt and local.txt is not recommended for newcomers.

2020-04-13, 16:43   #499
ixfd64
Bemusing Prompter

"Danny"
Dec 2002
California

4346₈ Posts

I found an issue on macOS: if you keep checking and unchecking the "Merge All Workers" option, then Prime95 will keep creating new windows.

2020-04-13, 22:57   #500
pepi37

Dec 2011
After milion nines:)

505₁₆ Posts

https://www.dropbox.com/s/gsew1px1wr...52020.jpg?dl=0

I have seen this behavior many times, on several of my own computers, with older and newer Intel CPUs and with DDR3 or DDR4 memory.

I posted an image so you can see that Prime95 uses 97% of CPU time; you can also see the other processes and the big difference between worker 1 and worker 2. Worker 2 is always faster than worker 1. Always. The difference is around 15%, which is huge.

How can I pinpoint which process (or processes) steals cycles from worker 1 but not from worker 2, and why is only worker 1 affected? Why is the "stealing" not evenly distributed between the workers?

I must run the same test on two parallel instances of LLR with 3 CPU cores each to verify whether the problem is the same.

In the meantime: any suggestions? The picture shows all the info you need: i5-9600K (no HT), 8 GB DDR4.

2020-04-14, 00:29   #501
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

2·5·691 Posts

There was some speculation years ago that Windows serviced interrupts on the first CPU core. I do not know if that is the case or if your machine is getting enough interrupts to explain the timing difference you are seeing.

2020-04-14, 14:14   #502
pepi37

Dec 2011
After milion nines:)

5·257 Posts

Quote:
 Originally Posted by Prime95 There was some speculation years ago that Windows serviced interrupts on the first CPU core. I do not know if that is the case or if your machine is getting enough interrupts to explain the timing difference you are seeing.

Last night I ran two instances of LLR (3 cores per instance) and the problem was gone. The run times were near identical (the difference was less than 50 seconds in 9600 seconds).

Now I am running two instances of Prime95 to see whether they will have identical run times.

2020-04-14, 19:04   #503
pepi37

Dec 2011
After milion nines:)

5·257 Posts

And the last test is done: two instances of Prime95, each with 3 cores. While on LLR the largest time difference was 114 seconds (so below 2 minutes), on the first result the time difference in Prime95 is 5 minutes. So it looks like it is a Prime95-specific problem.

https://www.dropbox.com/s/v970gc6uu6j5kfn/002.jpg?dl=0

2020-04-14, 19:21   #504
James Heinrich

"James Heinrich"
May 2004
ex-Northern Ontario

3·31² Posts

Quote:
 Originally Posted by pepi37 two instances of Prime95, both have 3 cores... So it looks like it is Prime95 specific problem.
Ah, but which cores? Unless you specifically set which cores to use in each instance, each Prime95 instance will attempt to use the same cores and the Windows scheduler will share between them (and all other programs) as best it can, so it's not unexpected that their runtimes would be approximately similar. Whereas with a single instance of Prime95, each worker is running on a specific set of cores and some of these may be more in-demand than others by other programs (or Windows itself), leading to slightly different runtimes for different workers.

Why does this matter, anyways?

2020-04-14, 19:38   #505
pepi37

Dec 2011
After milion nines:)

5×257 Posts

Quote:
 Originally Posted by James Heinrich Ah, but which cores? Unless you specifically set which cores to use in each instance, each Prime95 instance will attempt to use the same cores and the Windows scheduler will share between them (and all other programs) as best it can, so it's not unexpected that their runtimes would be approximately similar. Whereas with a single instance of Prime95, each worker is running on a specific set of cores and some of these may be more in-demand than others by other programs (or Windows itself), leading to slightly different runtimes for different workers. Why does this matter, anyways?
Thanks for pointing that out, and you were partially right! It looks like in this case (2 instances with 3 cores each) both use the same cores (by number). Obviously that cannot really be true, because if they used the SAME cores then the time would be doubled or even more. And the time is not doubled (compared to the LLR time).

Second: when you run one instance with two workers, Prime95 chooses the right ones.
So even though it looks like a solution, it is not.

Quote:
 Why does this matter, anyways?
It is very important: look at my post above with the times for LLR with two instances of 3 cores each (I did not assign any cores in LLR), where I got near-identical run times. With Prime95 I cannot get the same results, so I am trying to pinpoint where the problem with Prime95 is.

Maybe it is not a big deal that one worker is 10-15% faster than the other on a dedicated machine, but why is it faster?
And please don't take this in the wrong direction: maybe it is not important for you, but it is important for me.
Thanks

Last fiddled with by pepi37 on 2020-04-14 at 19:40

2020-04-14, 20:40   #506
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

2·5·691 Posts

I believe LLR does not set affinity. So, the OS is in charge of placing the 3 threads amongst the 6 cores. Prime95 uses hwloc to assign affinities. I think there is a way to tell prime95 to assign threads to any core like LLR -- poke around in undoc.txt or the worker windows dialog box.

BTW, what is the total throughput of LLR and prime95? If the throughput is the same, prime95's affinity tricks have merely concentrated all the OS overhead into one worker, whereas LLR has spread it over both workers.


All times are UTC.
