mersenneforum.org  

Old 2019-01-26, 10:26   #188
ET_
Banned
 
"Luigi"
Aug 2002
Team Italia

4814₁₀ Posts

Quote:
Originally Posted by Prime95
29.5 build 9 for GP2 and ATH to test.

1) FixedHardwareUID=1 implementation changed
2) Hang in multithreaded add and subtract fixed.
3) JSON tweaks per James' request.

Again, this is likely the last 29.5 build. My plan is for the next release to be 29.6 -- a release candidate.

Linux 64-bit: ftp://mersenne.org/gimps/p95v295b9.linux64.tar.gz
Windows 64-bit: ftp://mersenne.org/gimps/p95v295b9.win64.zip
What's on your to-do list for version 29.6? Just curious...
Old 2019-01-26, 13:38   #189
GP2
 
Sep 2003

2×5×7×37 Posts

Quote:
Originally Posted by GP2
Rather than manually inventing some ComputerGUID, would it be possible to have a setting that forces the ComputerGUID to be the hardware GUID?
Actually, come to think of it, I'm not sure how Primenet would react to this.

Based on the "Computer Properties" for each CPU known to Primenet, I would assume that the same ComputerGUID should only be shared among instances with the same CPU chip type and speed and number of cores, the same RAM size, and the same work-type preference.

So I suppose it makes more sense to just invent one GUID for all instances doing ECM using m cores, another GUID for LL using n cores, etc.

Also, since the GUID is a 128-bit number... maybe it's intended to be unique across all users? What would happen if two people both chose 00000000000000000000000000000000 ?
Old 2019-01-26, 19:16   #190
Madpoo
Serpentine Vermin Jar
 
Jul 2014

37×89 Posts

Quote:
Originally Posted by GP2
Rather than manually inventing some ComputerGUID, would it be possible to have a setting that forces the ComputerGUID to be the hardware GUID?...
In short, no. The computer GUID is generated on the server and is used as a unique ID between tables (ties it to the model/CPU info for reporting).

If you use the feature to keep your hardware GUID the same even when copying your settings to a new computer, it'll keep the same computer GUID even though it's not the same computer.

Why does Primenet care if it's the same computer or not? Because different computers have different CPUs, memory, speed, etc. And they also can have wildly different accuracy. Being able to track an *actually* different computer from the same user is helpful for knowing if we should hand out low exponents to them (will they complete it quickly?), and when I look for systems with terrible accuracy so we can start double-checking their results earlier.

In some cases like cloud computers where they're all generally the same and all generally error-free, not as big a deal. For home computers, I imagine the average user upgrading from one computer to another is doing so because they are vastly different in specs, quality, whatever, so using the same identifier for both of them is taking away some valuable data points.

I don't have any knowledge of what triggers the client to generate a new hardware ID (which then tells Primenet "hey, I'm a totally different computer"). There could be some discussion about whether an OS upgrade would or should qualify... I'd say that maybe the same motherboard but a new CPU, or a change in memory, perhaps should trigger a new hardware GUID. Why? Because a CPU swap or adding more memory are things that could either improve or degrade the quality of results.

Maybe one option would be to set up a special computer ID for cloud computers, similar to how "manual testing" is handled. Each user who checks out assignments manually, or returns any using the manual results page, has a "manual testing" computer created for them. If the client checked whether it's running as a virtual machine, it could do something similar. Just a thought.
Old 2019-01-26, 19:24   #191
Madpoo
Serpentine Vermin Jar
 
Jul 2014

37·89 Posts

Quote:
Originally Posted by GP2
Also, since the GUID is a 128-bit number... maybe it's intended to be unique across all users? What would happen if two people both chose 00000000000000000000000000000000 ?
It's a GUID, so rather than manually choosing one, just generate a GUID using whatever method. (Nearly) guaranteed to be globally unique.

examples:
TSQL: select newid()
online: GUID Generator

Last fiddled with by Madpoo on 2019-01-26 at 19:24
Old 2019-01-27, 01:43   #192
Mark Rose
 
"/X\(‘-‘)/X\"
Jan 2013

B73₁₆ Posts

Quote:
Originally Posted by Madpoo
It's a GUID, so rather than manually choosing one, just generate a GUID using whatever method. (Nearly) guaranteed to be globally unique.

examples:
TSQL: select newid()
online: GUID Generator
Or `uuidgen` on the Linux CLI.
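For anyone scripting this rather than using the tools above, a minimal Python sketch follows. It assumes the value wanted is 32 hex digits with no dashes, as in the all-zeros example quoted earlier in the thread; the function name `new_computer_guid` is made up for illustration.

```python
import uuid


def new_computer_guid() -> str:
    """Return a random (version 4) GUID as 32 lowercase hex digits, no dashes."""
    # uuid4() draws 122 random bits, so a collision between two users is
    # extremely unlikely, though not strictly impossible.
    return uuid.uuid4().hex


if __name__ == "__main__":
    print(new_computer_guid())
```

Each call returns a fresh value, so generate it once per computer (or per group of identical instances) and reuse it.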
Old 2019-01-27, 02:14   #193
GP2
 
Sep 2003

2·5·7·37 Posts

Quote:
Originally Posted by Prime95
29.5 build 9 for GP2 and ATH to test.
I'm now running eight instances of the type that produced the hangs for PRP-2 b=3; we'll see if anything happens in the next few days. If not, no news is good news.

As far as I can tell there are no problems with the FixedHardwareUID=1 stuff. I set the fixed values of ComputerGUID in the local.txt files and merged the CPUs in https://www.mersenne.org/cpus/
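A rough sketch of that setup in Python, for illustration only: one randomly generated GUID per group of identical instances, rendered as the two settings named in this thread. The group labels and the helper name `settings_lines` are hypothetical, and whether both keys belong in local.txt (rather than FixedHardwareUID going in another settings file) is an assumption to check against the program's documentation.

```python
import uuid


def settings_lines(computer_guid: str) -> list[str]:
    """Render the two settings discussed in this thread as key=value lines."""
    return [
        "FixedHardwareUID=1",
        f"ComputerGUID={computer_guid}",
    ]


# Hypothetical groups: every instance in a group shares one GUID.
groups = ["ecm-2-cores", "ll-4-cores"]
guid_for_group = {name: uuid.uuid4().hex for name in groups}

if __name__ == "__main__":
    for name, guid in guid_for_group.items():
        print(f"# {name}")
        print("\n".join(settings_lines(guid)))
```

The point of generating the GUIDs once and storing them is that every instance in a group gets the same lines, so Primenet sees the group as one computer.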
Old 2019-01-28, 15:02   #194
ET_
Banned
 
"Luigi"
Aug 2002
Team Italia

2×29×83 Posts
version 29.5 build 8

I am running a double check and getting the following messages.

While I see that the exponent is very close to the FFT limit, I wonder whether that limit is a bit too aggressive... or whether I should release this exponent.

Code:
[Work thread Jan 28 15:48] Gerbicz error check passed at iteration 5381952.
[Work thread Jan 28 15:49] Gerbicz error check passed at iteration 5392356.
[Work thread Jan 28 15:49] Iteration: 5400000 / 79109021 [6.82%], ms/iter:  3.465, ETA: 70:57:13
[Work thread Jan 28 15:49] Hardware errors have occurred during the test!
[Work thread Jan 28 15:49] 15 or more Gerbicz/double-check errors.
[Work thread Jan 28 15:49] Confidence in final result is excellent.
[Work thread Jan 28 15:49] Gerbicz error check passed at iteration 5403172.
[Work thread Jan 28 15:50] Gerbicz error check passed at iteration 5414621.
[Work thread Jan 28 15:51] Gerbicz error check passed at iteration 5426502.
[Work thread Jan 28 15:52] Gerbicz error check passed at iteration 5439046.
[Work thread Jan 28 15:52] Gerbicz error check passed at iteration 5452042.
[Work thread Jan 28 15:53] Gerbicz error check passed at iteration 5465731.
[Work thread Jan 28 15:54] Gerbicz error check passed at iteration 5480131.
[Work thread Jan 28 15:55] Gerbicz error check passed at iteration 5495260.
[Work thread Jan 28 15:55] Iteration: 5500000 / 79109021 [6.95%], ms/iter:  3.346, ETA: 68:24:36
[Work thread Jan 28 15:55] Hardware errors have occurred during the test!
[Work thread Jan 28 15:55] 15 or more Gerbicz/double-check errors.
[Work thread Jan 28 15:55] Confidence in final result is excellent.
[Work thread Jan 28 15:56] Gerbicz error check passed at iteration 5510885.
[Work thread Jan 28 15:57] Gerbicz error check passed at iteration 5527269.
[Work thread Jan 28 15:58] Gerbicz error check passed at iteration 5544430.
[Work thread Jan 28 15:59] Gerbicz error check passed at iteration 5562655.
[Work thread Jan 28 16:00] ERROR: Comparing Gerbicz checksum values failed.  Rolling back to iteration 5562655.
[Work thread Jan 28 16:00] Continuing from last save file.
[Work thread Jan 28 16:00] Setting affinity to run helper thread 1 on CPU core #3
[Work thread Jan 28 16:00] Setting affinity to run helper thread 2 on CPU core #4
[Work thread Jan 28 16:00] Setting affinity to run helper thread 3 on CPU core #5
[Work thread Jan 28 16:00] Resuming Gerbicz error-checking PRP test of M79109021 using AVX-512 FFT length 4200K, Pass1=1920, Pass2=2240, clm=1, 4 threads
[Work thread Jan 28 16:00] Iteration: 5562656 / 79109021 [7.03%].
[Work thread Jan 28 16:00] Hardware errors have occurred during the test!
[Work thread Jan 28 16:00] 15 or more Gerbicz/double-check errors.
[Work thread Jan 28 16:00] Confidence in final result is excellent.
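To see how often this happens over a long run, the log can be tallied with a short script. This is only a sketch: the regular expressions are written against the exact message wording shown above and would need adjusting if the wording differs in other builds, and `summarize` is an illustrative name.

```python
import re

PASSED = re.compile(r"Gerbicz error check passed at iteration (\d+)\.")
ROLLBACK = re.compile(
    r"Comparing Gerbicz checksum values failed\.\s+Rolling back to iteration (\d+)\."
)


def summarize(log_text: str) -> dict:
    """Count passed Gerbicz checks and collect rollback target iterations."""
    passed = [int(m.group(1)) for m in PASSED.finditer(log_text)]
    rollbacks = [int(m.group(1)) for m in ROLLBACK.finditer(log_text)]
    return {
        "checks_passed": len(passed),
        "last_good_iteration": max(passed) if passed else None,
        "rollbacks": rollbacks,
    }


# Three lines copied from the log above.
SAMPLE = """\
[Work thread Jan 28 15:58] Gerbicz error check passed at iteration 5544430.
[Work thread Jan 28 15:59] Gerbicz error check passed at iteration 5562655.
[Work thread Jan 28 16:00] ERROR: Comparing Gerbicz checksum values failed.  Rolling back to iteration 5562655.
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

In the log above the rollback target equals the last iteration that passed its check, so only the work done since that check had to be repeated.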

Last fiddled with by ET_ on 2019-01-28 at 15:03
Old 2019-01-28, 16:33   #195
simon389
 
Aug 2013

3×29 Posts

My AVX512 machine is totally fine with regular green double checks on version 29.4 b8, but when I run 29.5 b9 it has hardware errors, like 0.49 > 0.4.
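For readers unfamiliar with that kind of message: a figure like 0.49 > 0.4 is most likely the round-off check, where each value coming out of the floating-point FFT multiplication should be very close to an integer and the largest distance from the nearest integer is compared against a limit. The sketch below illustrates the idea only; it is not the program's actual code, and the function names and sample values are invented. The 0.4 limit is the one quoted above.

```python
def max_roundoff_error(values):
    """Largest distance between any value and its nearest integer."""
    return max(abs(v - round(v)) for v in values)


def exceeds_limit(values, limit=0.4):
    """True when the worst round-off error is above the limit."""
    return max_roundoff_error(values) > limit


# Invented sample: two values sit close to integers, one (7.49) does not.
sample = [3.02, 7.49, -1.1]

if __name__ == "__main__":
    print(max_roundoff_error(sample), exceeds_limit(sample))
```

An error near 0.5 means the value could round either way, which is why such a result cannot be trusted.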


Quote:
Originally Posted by ET_
I am running a double check and getting the following messages.

While I see that the exponent is very close to the FFT limit, I wonder whether that limit is a bit too aggressive... or whether I should release this exponent.

Old 2019-01-28, 16:52   #196
ET_
Banned
 
"Luigi"
Aug 2002
Team Italia

1001011001110₂ Posts

Quote:
Originally Posted by simon389
My AVX512 machine is totally fine with regular green double checks on version 29.4 b8, but when I run 29.5 b9 it has hardware errors, like 0.49 > 0.4.
On what (class of) exponent(s)?
Old 2019-01-28, 17:27   #197
GP2
 
Sep 2003

2×5×7×37 Posts

Quote:
Originally Posted by ET_
While I see that the exponent is very close to the FFT limit, I wonder whether that limit is a bit too aggressive... or whether I should release this exponent.
You should continue with the exponent.

I think we have enough confidence in Gerbicz error checking now, so the program can just continue to run with the smaller FFT length and recover from errors as necessary.
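To put a number on "close to the FFT limit", one can work out how many bits of the exponent each FFT element carries, using the figures from ET_'s log (M79109021, FFT length 4200K, Pass1=1920, Pass2=2240). This sketch assumes K means 1024, which agrees with Pass1 × Pass2 from the log; the function name is illustrative, and the actual crossover value the program uses is not stated in this thread.

```python
def bits_per_fft_word(exponent: int, fft_length_k: int) -> float:
    """Average number of bits stored per FFT element (K taken as 1024)."""
    return exponent / (fft_length_k * 1024)


# Figures from the log: Pass1 * Pass2 matches 4200K when K = 1024.
assert 1920 * 2240 == 4200 * 1024

if __name__ == "__main__":
    print(f"{bits_per_fft_word(79109021, 4200):.3f} bits per word")
```

For this exponent the figure comes out to roughly 18.4 bits per element. The more bits packed into each element, the larger the round-off error tends to be, which is the trade-off behind choosing the smaller FFT length.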
Old 2019-01-28, 17:32   #198
ET_
Banned
 
"Luigi"
Aug 2002
Team Italia

2×29×83 Posts

Quote:
Originally Posted by GP2
You should continue with the exponent.

I think we have enough confidence in Gerbicz error checking now, so the program can just continue to run with the smaller FFT length and recover from errors as necessary.
I guessed the same. My doubt was whether the FFT limit is perhaps a bit too aggressive.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.