mersenneforum.org  

Old 2020-04-01, 17:22   #496
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

5×23×61 Posts

Quote:
Originally Posted by jdhedden View Post
I'm trying to run mprime on an AMD Ryzen 7 3800X 8-Core Processor. I've run with different numbers of threads with the following approximate throughputs:
1 thread : 20ms/iter
2 threads: 31ms/iter
4 threads: 60ms/iter
7 threads: 100ms/iter.
We need to work on terminology first. Since your timings are getting worse, I think you are timing 1, 2, 4, or 7 workers (that is, testing 1, 2, 4, or 7 different exponents). You can also time one worker using multiple threads. In that case, timings will decrease.

In either scenario, at some point your timings will likely be constrained by your RAM's bandwidth.
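A rough way to check this from the quoted numbers, assuming each figure is the per-worker ms/iter with that many workers running at the same time (the function name is only for illustration):

```python
def total_throughput(workers, ms_per_iter):
    """Combined iterations per second when each of `workers` workers
    needs `ms_per_iter` milliseconds per iteration."""
    return workers * 1000.0 / ms_per_iter

# (workers, ms/iter) pairs taken from the quoted post.
timings = [(1, 20), (2, 31), (4, 60), (7, 100)]

for workers, ms in timings:
    rate = total_throughput(workers, ms)
    print(f"{workers} worker(s): {rate:.1f} iter/s in total")
```

Under that assumption the total goes from 50 iter/s with one worker to roughly 65 to 70 iter/s with 2, 4, or 7 workers, so adding workers beyond two buys very little. That pattern is what one would expect if memory bandwidth is the limit.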
Old 2020-04-01, 20:58   #497
jdhedden
 
Mar 2020

2 Posts

Quote:
Originally Posted by Prime95 View Post
We need to work on terminology first. Since your timings are getting worse, I think you are timing 1, 2, 4, or 7 workers (that is, testing 1, 2, 4, or 7 different exponents). You can also time one worker using multiple threads. In that case, timings will decrease.

In either scenario, at some point your timings will likely be constrained by your RAM's bandwidth.
Yes, I agree that my terminology may be off. I have 2 exponents. I'm configured to run 2 workers. When running, the system reports 200% cpu usage, indicating 2 threads are being used. (I'm on a Linux machine (Debian OS) with an AMD Ryzen 7 which has 8 cores = 16 cpus (threads).)

When I was using 4 workers/exponents, it said 400% cpu, so again 4 threads were in use. However, ms/iter also went up by a factor of 2, which indicates that total "throughput" remained constant even though more cpus were involved.

I've tried setting cpu frequencies:
Code:
for xx in 8 9 10 11 12 13 14 15; do
    sudo cpufreq-set --freq 3.90GHz --cpu $xx
done
In conjunction with cpu affinities in local.txt:
Code:
[Worker #1]
Affinity=8-9,10-11
[Worker #2]
Affinity=12-13,14-15
But this had no effect on the ms/iter.

So it may be that I'm not configured to make the most of my computer's resources. Suggestions?
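Before pinning workers to logical CPUs 8-15 it can help to see which logical CPUs share a physical core. Below is a minimal sketch for Linux that reads the kernel's `thread_siblings_list` files under sysfs. The helper names are invented for illustration, and the parser only handles the plain "a-b,c" list format the kernel uses; it makes no claim about how mprime itself interprets the `Affinity=` line.

```python
from pathlib import Path

def parse_cpu_list(text):
    """Expand a CPU list such as "8-9,10-11" or "0,8" into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus

def physical_cores(sysfs_root="/sys/devices/system/cpu"):
    """Return one tuple of logical CPU numbers per physical core.
    Returns an empty list if the sysfs files are not present."""
    groups = set()
    pattern = "cpu[0-9]*/topology/thread_siblings_list"
    for path in Path(sysfs_root).glob(pattern):
        groups.add(tuple(parse_cpu_list(path.read_text())))
    return sorted(groups)

if __name__ == "__main__":
    print("Affinity list expands to:", parse_cpu_list("8-9,10-11"))
    for siblings in physical_cores():
        print("core with logical CPUs:", siblings)
```

If two logical CPUs in one worker's list turn out to be siblings of the same core, that worker is effectively getting one physical core for those two threads.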
Old 2020-04-01, 22:52   #498
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

15547₈ Posts

Quote:
Originally Posted by jdhedden View Post
Suggestions?
Ryzen owners may be able to offer better advice. I would guess you will get maximum throughput with 1 worker using 8 threads.

The benchmark menu choice will tell you which configuration gives the best throughput. Assuming you are doing first-time tests, benchmark the 5120K FFT size with 8 cores and 1, 2, 4, and 8 workers.

Afterwards use the Worker Windows menu choice to create the proper number of workers using the proper number of threads. Directly editing prime.txt and local.txt is not recommended for newcomers.
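For reference, the worker counts mentioned for the benchmark correspond to the following worker/thread layouts, assuming the 8 cores are divided evenly among the workers. This is only a sketch of the arithmetic; `even_splits` is a made-up helper, not something Prime95 provides.

```python
def even_splits(cores, worker_counts):
    """List (workers, threads_per_worker) pairs for every worker count
    that divides the core count evenly."""
    return [(w, cores // w) for w in worker_counts if cores % w == 0]

for workers, threads in even_splits(8, [1, 2, 4, 8]):
    print(f"{workers} worker(s) x {threads} thread(s) each")
```

The layout that shows the highest total throughput in the benchmark is the one to set up afterwards in the Worker Windows dialog.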
Old 2020-04-13, 16:43   #499
ixfd64
Bemusing Prompter
 
"Danny"
Dec 2002
California

29×79 Posts

I found an issue on macOS: if you keep checking and unchecking the "Merge All Workers" option, then Prime95 will keep creating new windows.
Old 2020-04-13, 22:57   #500
pepi37
 
Dec 2011
After milion nines:)

2×653 Posts

https://www.dropbox.com/s/gsew1px1wr...52020.jpg?dl=0


I have seen this behavior many times, on several of my own computers, with older and newer Intel CPUs and with DDR3 or DDR4 memory.
I posted the image so you can see that Prime95 uses 97% of CPU time. You can also see the other processes and the big difference between worker 1 and worker 2.
Worker 2 is always faster than worker 1. Always.
As you can see, the difference is around 15%, which is huge.
How can I pinpoint which process (or processes) steals cycles from worker 1 but not from worker 2, and why is only worker 1 affected? Why is the "stealing" not evenly distributed between the workers?
I must run the same test on two parallel instances of LLR with 3 CPU cores each to verify whether the problem is the same.

In the meantime, any suggestions?

The picture shows all the info you need:
i5-9600K (no HT), 8 GB DDR4
Old 2020-04-14, 00:29   #501
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

5·23·61 Posts

There was some speculation years ago that Windows serviced interrupts on the first CPU core. I do not know if that is the case or if your machine is getting enough interrupts to explain the timing difference you are seeing.
Old 2020-04-14, 14:14   #502
pepi37
 
Dec 2011
After milion nines:)

2·653 Posts

Quote:
Originally Posted by Prime95 View Post
There was some speculation years ago that Windows serviced interrupts on the first CPU core. I do not know if that is the case or if your machine is getting enough interrupts to explain the timing difference you are seeing.

Last night I ran two instances of LLR (3 cores per instance) and the problem was gone. The run times were nearly identical (the difference was less than 50 seconds out of 9600 seconds).

Now I am running two instances of Prime95 to see whether they have identical run times.
Old 2020-04-14, 19:04   #503
pepi37
 
Dec 2011
After milion nines:)

2·653 Posts

And the last test is done: two instances of Prime95, each with 3 cores. While with LLR the largest time difference was 114 seconds (so below 2 minutes), the time difference on the first Prime95 result is 5 minutes. So it looks like a Prime95-specific problem.


https://www.dropbox.com/s/v970gc6uu6j5kfn/002.jpg?dl=0
Old 2020-04-14, 19:21   #504
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

3×977 Posts

Quote:
Originally Posted by pepi37 View Post
two instances of Prime95, both have 3 cores... So it looks like it is Prime95 specific problem.
Ah, but which cores? Unless you specifically set which cores to use in each instance, each Prime95 instance will attempt to use the same cores and the Windows scheduler will share between them (and all other programs) as best it can, so it's not unexpected that their runtimes would be approximately similar. Whereas with a single instance of Prime95, each worker is running on a specific set of cores and some of these may be more in-demand than others by other programs (or Windows itself), leading to slightly different runtimes for different workers.

Why does this matter, anyways?
Old 2020-04-14, 19:38   #505
pepi37
 
Dec 2011
After milion nines:)

1306₁₀ Posts

Quote:
Originally Posted by James Heinrich View Post
Ah, but which cores? Unless you specifically set which cores to use in each instance, each Prime95 instance will attempt to use the same cores and the Windows scheduler will share between them (and all other programs) as best it can, so it's not unexpected that their runtimes would be approximately similar. Whereas with a single instance of Prime95, each worker is running on a specific set of cores and some of these may be more in-demand than others by other programs (or Windows itself), leading to slightly different runtimes for different workers.

Why does this matter, anyways?
Thanks for pointing that out, and you were partially right! It looks like in this case (2 instances with 3 cores each) both use the same cores (by number). Obviously that cannot be true, because if they used the SAME cores the time would be doubled or worse. And the time is not doubled (compared to the LLR time).

Second: when you run one instance with two workers, Prime95 chooses the right cores.
So even though it looks like a solution, it is not.

Quote:
Why does this matter, anyways?
It is very important: look at my post above with the times for two LLR instances with 3 cores each (I did not assign any cores for LLR), which gave nearly identical run times. With Prime95 I cannot get the same result, so I am trying to pinpoint where the problem with Prime95 is.


Maybe it is not a big deal that one worker is 10-15% faster than the other on a dedicated machine, but why is it faster?
And please don't take this the wrong way: maybe it is not important to you, but it is important to me.
Thanks

Last fiddled with by pepi37 on 2020-04-14 at 19:40
Old 2020-04-14, 20:40   #506
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

5×23×61 Posts

I believe LLR does not set affinity. So, the OS is in charge of placing the 3 threads amongst the 6 cores.

Prime95 uses hwloc to assign affinities. I think there is a way to tell prime95 to assign threads to any core like LLR -- poke around in undoc.txt or the worker windows dialog box.

BTW, what is the total throughput of LLR and prime95? If the throughput is the same, prime95's affinity tricks have merely concentrated all the OS overhead into one worker, whereas LLR has spread it over both workers.
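To illustrate that last point with made-up numbers (they are not measurements from this thread): suppose each worker would run at the same base rate on an idle machine, and OS overhead costs 10% of one worker's time in total. Whether that 10% lands on one worker or is split 5%/5% across both, the combined rate comes out the same.

```python
import math

def combined_rate(base_rate, overhead_fractions):
    """Sum of per-worker rates after each worker loses its overhead fraction."""
    return sum(base_rate * (1.0 - f) for f in overhead_fractions)

BASE = 100.0  # hypothetical iterations per second per worker

concentrated = combined_rate(BASE, [0.10, 0.00])  # all overhead on worker 1
spread = combined_rate(BASE, [0.05, 0.05])        # overhead shared evenly

print(f"concentrated: {concentrated:.1f} iter/s")
print(f"spread:       {spread:.1f} iter/s")
print("same total:", math.isclose(concentrated, spread))
```

So a 10-15% gap between workers does not by itself mean lost throughput; comparing the summed rates of both programs is the more telling test.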

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.