mersenneforum.org  

Old 2021-06-15, 19:35   #1
drkirkby
 
"David Kirkby"
Jan 2021
Althorne, Essex, UK

1BF₁₆ Posts
Suggestion - include 3 workers in default benchmark of mprime

I ran the mprime 30.6b4 benchmark on a dual-processor machine with 24 cores per CPU (48 in total) on Amazon AWS (instance c5.metal in N. Virginia, to be precise). The mprime benchmark defaulted to testing 1, 2, 4 and 48 workers.

Might I suggest that 3 be added to the default list? Although I don't know the reason, and it seemed fairly illogical to me, 3 workers gave optimal throughput. I assume the benchmark was running 3 workers each with 16 cores, but I don't actually know, as the benchmark does not report that information.
I don't think most people with dual-processor machines would consider testing 3 workers, but for some reason I did, and found it optimal. Note that c5.metal on Amazon, with 48 cores (96 vCPUs, as Amazon calls them), runs on the "bare metal". One has the complete server to oneself and is not using any form of virtualisation, so this rather odd number is not the result of other users sharing the same hardware.

I ran the benchmark on my Dell 7920 too, which is also dual-processor. That gave optimal throughput with 2 workers, which seemed more logical.

In all cases I just benchmarked the single FFT size needed for the exponents being tested.
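
The guess above that 3 workers means 16 cores each can be tabulated for the other worker counts too. This is only a minimal sketch, assuming the cores are split as evenly as possible among the workers; the benchmark does not report its real allocation, so the function name and the even-split rule are illustrative assumptions, not mprime's behaviour.

```python
# Sketch: how an even split of cores among workers would look.
# Assumption: cores are divided as evenly as possible; the benchmark
# itself does not report how it actually allocates them.

def cores_per_worker(total_cores, workers):
    """Return (cores each worker gets, cores left over) for an even split."""
    return divmod(total_cores, workers)

TOTAL_CORES = 48  # c5.metal: two CPUs with 24 cores each

# 1, 2, 4 and 48 are the benchmark defaults; 3 is the suggested addition.
for workers in (1, 2, 3, 4, 48):
    each, leftover = cores_per_worker(TOTAL_CORES, workers)
    print(f"{workers:2d} workers: {each:2d} cores each, {leftover} left over")
```

With two CPUs of 24 cores each, a 16-core worker cannot sit entirely on one CPU for all three workers, which is part of why the result looks odd.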
Old 2021-06-15, 19:54   #2
drkirkby
 
"David Kirkby"
Jan 2021
Althorne, Essex, UK

3×149 Posts

I just had a thought. Those 3 GHz Intel Xeon Platinum 8275CL CPUs used on the Amazon AWS c5.metal are not common CPUs. They are almost certainly supplied only to Amazon. No data sheet can be found on the Intel website. But I would expect them to have 6 memory channels, like other 82xx series Xeon Platinum CPUs. 3 divides 6, which could conceivably be a reason that 3 workers worked best.

Due to a sub-optimal memory configuration, I'm only using 4 of the 6 memory channels on one of the CPUs in my Dell 7920. When I finally get the memory sorted out optimally, I will try benchmarking 3 workers on that, but for now at least, 2 workers is optimal on my Dell.
Old 2021-06-15, 22:39   #3
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

23·17·73 Posts

Quote:
Originally Posted by drkirkby View Post
Although I don't know the reason, and it seemed fairly illogical to me, 3 workers gave optimal throughput.
If I may...

When you're "renting kit", have you considered that possibly only retaining a single CPU attached to an instance might be "optimal"? Particularly when you take costs into consideration. And that you're running someone else's code.

You seem to be a bit fixated about your dual-CPU Dell 7920. To be honest, it's not really all that impressive by today's standards.

I'm truly not trying to be negative here. It's great that you're learning. Have you drilled down on affinity?

Since you've already taken the leap into AWS, why don't you try running some experiments with single CPU instances? No need to go "bare metal", except in very unusual situations. Compare the results, and come to your own conclusion as to where the economic curves cross.

I hope that comes across the way it is intended. 8-)
Old 2021-06-16, 01:30   #4
drkirkby
 
"David Kirkby"
Jan 2021
Althorne, Essex, UK

1BF₁₆ Posts

Quote:
Originally Posted by chalsall View Post
If I may...

When you're "renting kit", have you considered that possibly only retaining a single CPU attached to an instance might be "optimal"? Particularly when you take costs into consideration. And that you're running someone else's code.

You seem to be a bit fixated about your dual-CPU Dell 7920. To be honest, it's not really all that impressive by today's standards.

I'm truly not trying to be negative here. It's great that you're learning. Have you drilled down on affinity?

Since you've already taken the leap into AWS, why don't you try running some experiments with single CPU instances? No need to go "bare metal", except in very unusual situations. Compare the results, and come to your own conclusion as to where the economic curves cross.

I hope that comes across the way it is intended. 8-)
To take your points.

1) I have a couple of instances running on single vCPUs on Amazon AWS accounts. They are painfully slow, using very old Xeon CPUs. For example:
M110832257 was assigned on 4th April, and is now 10.0% complete. (P-1 factoring has been done too)
M108888137 was assigned on 30th May, and is now 3.4% complete. (P-1 factoring has been done too)
(When the expiry time for the exponents comes around, I will move them to a faster machine)

2) I'm not renting space on Amazon - I got given some free credits, which do expire, making the use of very slow hardware impractical.

3) I had PM'd Ben to ask if he had any suggestions on what was most cost-effective on Amazon. He said he had not benchmarked them since 2019, but told me what he had concluded back then: the instances he suggested had either 48 or 96 vCPUs (i.e. 24 or 48 cores) in N. Virginia. These larger instances have CPUs supporting AVX-512 (https://en.wikipedia.org/wiki/AVX-512), which I believe mprime can put to good use.

* Ben is clearly a very bright guy, with a maths and computing background.
* Ben had benchmarked different instances, not guessed.
* Based on prices I could find, I did not see anything more cost-effective now. (Yes, I realise spot instances are cheaper, but I don't have time to keep an eye on when the instance closes.)

4) I was keen to see the performance I could get from AWS if I needed some EM simulations done with memory requirements exceeding those of my Dell. Upgrading the RAM in that is possible, but expensive.

5) I am well aware of the performance of the Dell 7920 compared to other hardware available. Before saying it's not impressive by today's standards, you need to define impressive. The Dell is a desktop that's virtually silent in operation - not something that needs to be in a rack, or makes a lot of noise. (The UPS makes more noise than the computer!) The Dell's slightly obscure 8176M CPUs, which Intel will not give any information about, give reasonable performance for their modest cost (£300 GBP / $422 USD each).

I think there are some more logical choices in what I have done than what you give me credit for. They are not all about running GIMPS software.
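
To put "painfully slow" in point 1 into numbers, the two progress figures can be extrapolated linearly. This is a rough sketch only. It assumes the assignments were made in 2021, that progress was read on the date of this post (2021-06-16), and that the rate stays constant. The elapsed time also includes the P-1 work, so the estimate is on the pessimistic side.

```python
from datetime import date

def estimated_total_days(assigned, observed, fraction_done):
    """Extrapolate the total run time in days, assuming constant progress."""
    elapsed_days = (observed - assigned).days
    return elapsed_days / fraction_done

# Assumptions: assignment year is 2021 and progress was observed on the
# date of the post; neither is stated explicitly in the thread.
observed = date(2021, 6, 16)
tests = {
    "M110832257": (date(2021, 4, 4), 0.100),   # 10.0% complete
    "M108888137": (date(2021, 5, 30), 0.034),  # 3.4% complete
}

for exponent, (assigned, fraction) in tests.items():
    total = estimated_total_days(assigned, observed, fraction)
    print(f"{exponent}: about {total:.0f} days in total at the current rate")
```

On those assumptions both tests would take well over a year each on a single vCPU.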
Old 2021-06-16, 18:10   #5
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

23·17·73 Posts

Quote:
Originally Posted by drkirkby View Post
I think there are some more logical choices in what I have done than what you give me credit for. They are not all about running GIMPS software.
But for the use case OF running mprime, I would argue you should optimize for that alone.

WRT Spot Instance and the significant cost savings... It's trivial to codify the automatic restarting of the instance(s). Or just bid high enough that you're not likely to be killed.

Edit: Just for kicks, I drilled down on the pricing of the On-Demand options. c5.metal: $4.08 an hour; $0.044 per v-core. c6g.4xlarge: $0.544 an hour; $0.034 per v-core. The question then becomes, what is the cost per completed job? Knowing, of course, that you can spin up as many instances to run in parallel as you can afford.
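
The per-v-core arithmetic, and the "cost per completed job" question, can be sketched as follows. The hourly prices are the ones quoted above. The 96 vCPUs for c5.metal come from earlier in the thread, while the 16 vCPUs for c6g.4xlarge and both function names are assumptions for illustration. Since nobody here has measured job times, the sketch only works out how much slower the cheaper instance could be before it stops being cheaper per job.

```python
def price_per_vcpu(hourly_usd, vcpus):
    """On-Demand price per vCPU-hour."""
    return hourly_usd / vcpus

def break_even_slowdown(dearer_hourly_usd, cheaper_hourly_usd):
    """How many times longer a job may take on the cheaper instance
    before it costs more per completed job than on the dearer one."""
    return dearer_hourly_usd / cheaper_hourly_usd

# (hourly price in USD, vCPUs); the c6g.4xlarge vCPU count is an assumption.
instances = {"c5.metal": (4.08, 96), "c6g.4xlarge": (0.544, 16)}

for name, (hourly, vcpus) in instances.items():
    print(f"{name}: ${price_per_vcpu(hourly, vcpus):.4f} per vCPU-hour")

ratio = break_even_slowdown(4.08, 0.544)
print(f"c6g.4xlarge wins per job if a job takes under {ratio:.1f}x as long")
```

The comparison is whole instance against whole instance, so the measured job time on each one is still the missing input.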

Last fiddled with by chalsall on 2021-06-16 at 18:55
Old 2021-06-17, 00:38   #6
drkirkby
 
"David Kirkby"
Jan 2021
Althorne, Essex, UK

447₁₀ Posts

Quote:
Originally Posted by chalsall View Post
Edit: Just for kicks, I drilled down on the pricing of the On-Demand options. c5.metal: $4.08 an hour; $0.044 per v-core. c6g.4xlarge: $0.544 an hour; $0.034 per v-core. The question then becomes, what is the cost per completed job? Knowing, of course, that you can spin up as many instances to run in parallel as you can afford.
C6g is ARM-based (https://aws.amazon.com/ec2/instance-types/c6/), so mprime will not run on it. I am unaware of any ARM-based software for GIMPS that is written in assembly code. I would assume a compiled C program is going to be significantly slower than George's mprime, so I would expect c6g to work out more expensive for a given amount of work than the 48 or 96 vCPU instances on Skylake. However, I have not tested this.

I doubt I will be spending any money with Amazon - I am only using up some free credits I got given.

I might be wrong, but I get the feeling that if one wants more processing power for GIMPS, it is more cost-effective to just buy the hardware for oneself. The electricity cost is less than Amazon, and one owns the hardware which has a resale value.
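
That feeling can be turned into a simple break-even estimate. This is a minimal sketch, assuming the owned and rented machines deliver the same throughput; the function name is hypothetical and every number in the example call is a placeholder, not a figure from this thread.

```python
def break_even_hours(purchase_usd, resale_usd,
                     cloud_usd_per_hour, electricity_usd_per_hour):
    """Hours of use after which owning becomes cheaper than renting.

    Assumes the owned and the rented machine have the same throughput.
    """
    net_hardware_cost = purchase_usd - resale_usd
    hourly_saving = cloud_usd_per_hour - electricity_usd_per_hour
    if hourly_saving <= 0:
        raise ValueError("renting is not dearer per hour, so there is no break-even")
    return net_hardware_cost / hourly_saving

# Placeholder inputs only: none of these figures come from the thread.
hours = break_even_hours(purchase_usd=3000.0, resale_usd=1000.0,
                         cloud_usd_per_hour=1.00, electricity_usd_per_hour=0.20)
print(f"break-even after about {hours:.0f} hours ({hours / 24:.0f} days)")
```

For a machine that runs GIMPS work around the clock, the hours accumulate quickly, which is what tilts the comparison towards owning.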

I'm most likely to buy time on AWS if I need to run software that requires more RAM than I have. That is a distinct possibility. Upgrading my RAM by a factor of 2 means buying 12 expensive RDIMMs. Anything beyond a factor of 2 would mean swapping the 32 GB RDIMMs for 64 or 128 GB LRDIMMs, and those are really pricey. One can't even mix RDIMMs and LRDIMMs.

Last fiddled with by drkirkby on 2021-06-17 at 00:45
Old 2021-06-17, 00:56   #7
Uncwilly
6809 > 6502
 
"""""""""""""""""""
Aug 2003
101×103 Posts

97×103 Posts

Quote:
Originally Posted by drkirkby View Post
I am unaware of any ARM based software for GIMPS that is written in assembly code. I would assume a compiled C program is going to be significantly slower than George's mprime, so I would expect c6g to work out more expensive for a given amount of work than the 48 or 96 vCPU instances on Skylake.
There is a forum for that. https://www.mersenneforum.org/forumdisplay.php?f=118
Search more. post less.
Old 2021-06-17, 01:08   #8
drkirkby
 
"David Kirkby"
Jan 2021
Althorne, Essex, UK

447₁₀ Posts

Quote:
Originally Posted by Uncwilly View Post
There is a forum for that. https://www.mersenneforum.org/forumdisplay.php?f=118
Search more. post less.
Thank you, but I can see an obvious problem there. According to https://www.rieselprime.de/ziki/Mlucas it is for Lucas-Lehmer tests. But Primenet will not issue AIDs for LL tests, as they need double-checking. Maybe there's some PRP capable ARM code. I have not looked, and it is time for bed here.
Old 2021-06-17, 01:15   #9
Uncwilly
6809 > 6502
 
"""""""""""""""""""
Aug 2003
101×103 Posts

9991₁₀ Posts

Quote:
Originally Posted by drkirkby View Post
Thank you, but I can see an obvious problem there. According to https://www.rieselprime.de/ziki/Mlucas it is for Lucas-Lehmer tests. But Primenet will not issue AIDs for LL tests, as they need double-checking. Maybe there's some PRP capable ARM code. I have not looked, and it is time for bed here.
Read more, post less:
https://www.mersenneforum.org/showthread.php?t=26573

Also, users have been known to get assignments and modify the worktodo entries and move them between machines. Lurk more.
Old 2021-06-17, 01:59   #10
drkirkby
 
"David Kirkby"
Jan 2021
Althorne, Essex, UK

3·149 Posts

Quote:
Originally Posted by Uncwilly View Post
Read more, post less:
https://www.mersenneforum.org/showthread.php?t=26573

Also, users have been known to get assignments and modify the worktodo entries and move them between machines. Lurk more.
It is not practical to read every forum post. I went to the homepage of the software, which says it does LL tests, with no mention of P-1 or PRP. The post you link to indicates P-1 is coming soon.

I don't feel I want to try to use ARM, as I don't have a decent ARM CPU here, and I have no intention of paying Amazon for ARM. I just don't have the time needed to do a proper comparison of the different Amazon instances. If I were going to be spending hundreds of pounds on running on AWS, then I would put in the time to benchmark them more. But at the moment I don't want to mess with ARM.

Last fiddled with by drkirkby on 2021-06-17 at 02:01
Old 2021-06-17, 04:06   #11
VBCurtis
 
"Curtis"
Feb 2005
Riverside, CA

7×23×31 Posts

Quote:
Originally Posted by drkirkby View Post
It is not practical to read every forum post.
It is not practical for *us* to read every one of your forum posts, when a strong majority of them have already been asked and answered. You're really quite full of yourself to continue to declare that you don't have time to read, yet expect us to answer multiple inane queries per day from your brainstorming.

You're wearing out your welcome, and you seem to not take hints. Uncwilly has been rather direct with you, asking you to post less, and you reply with excuses for why you can't be bothered.

Stop bothering us.
Post less.
Really.
We mean it.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.