mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   SkylakeX teasers (aka prime95 29.5) (https://www.mersenneforum.org/showthread.php?t=23723)

ET_ 2019-01-26 10:26

[QUOTE=Prime95;506861]29.5 build 9 for GP2 and ATH to test.

1) FixedHardwareUID=1 implementation changed
2) Hang in multithreaded add and subtract fixed.
3) JSON tweaks per James' request.

Again, this is likely the last 29.5 build. My plan is for the next release to be 29.6 -- a release candidate.

Linux 64-bit: [url]ftp://mersenne.org/gimps/p95v295b9.linux64.tar.gz[/url]
Windows 64-bit: [url]ftp://mersenne.org/gimps/p95v295b9.win64.zip[/url][/QUOTE]

What's on your to-do list for version 29.6? Just curious... :smile:

GP2 2019-01-26 13:38

[QUOTE=GP2;506880]Rather than manually inventing some ComputerGUID, would it be possible to have a setting that forces the ComputerGUID to be the hardware GUID?[/QUOTE]

Actually, come to think of it, I'm not sure how Primenet would react to this.

Based on the "Computer Properties" for each CPU known to Primenet, I would assume that the same ComputerGUID should only be shared among instances with the same CPU chip type and speed and number of cores, the same RAM size, and the same work-type preference.

So I suppose it makes more sense to just invent one GUID for all instances doing ECM using m cores, another GUID for LL using n cores, etc.

Also, since the GUID is a 128-bit number... maybe it's intended to be unique across all users? What would happen if two people both chose 00000000000000000000000000000000 ?

Madpoo 2019-01-26 19:16

[QUOTE=GP2;506880]Rather than manually inventing some ComputerGUID, would it be possible to have a setting that forces the ComputerGUID to be the hardware GUID?...[/QUOTE]

In short, no. The computer GUID is generated on the server and is used as a unique ID between tables (ties it to the model/CPU info for reporting).

If you use the feature to keep your hardware GUID the same even when copying your settings to a new computer, it'll keep the same computer GUID even though it's not the same computer.

Why does Primenet care if it's the same computer or not? Because different computers have different CPUs, memory, speed, etc. And they also can have wildly different accuracy. Being able to track an *actually* different computer from the same user helps in deciding whether we should hand out low exponents to it (will it complete them quickly?), and it helps when I look for systems with terrible accuracy so we can start double-checking their results earlier.

In some cases like cloud computers where they're all generally the same and all generally error-free, not as big a deal. For home computers, I imagine the average user upgrading from one computer to another is doing so because they are vastly different in specs, quality, whatever, so using the same identifier for both of them is taking away some valuable data points.

I don't have any knowledge of what triggers the client to generate a new hardware ID (which then tells Primenet "hey, I'm a totally different computer"). There could be some discussion about whether an OS upgrade would or should qualify... I'd say that maybe the same motherboard but a new CPU, or a change in memory, perhaps should trigger a new hardware GUID. Why? Because a CPU swap or adding more memory are things that could either improve or degrade the quality of results.

Maybe one option would be to set up a special computer ID for cloud computers, similar to how "manual testing" is handled. Each user who checks out assignments manually, or returns any using the manual results page, has a "manual testing" computer created for them. If the client checked whether it's running as a virtual machine, it could do something similar? Just a thought.

Madpoo 2019-01-26 19:24

[QUOTE=GP2;506894]Also, since the GUID is a 128-bit number... maybe it's intended to be unique across all users? What would happen if two people both chose 00000000000000000000000000000000 ?[/QUOTE]

It's a GUID, so rather than manually choosing one, just generate a GUID using whatever method. (Nearly) guaranteed to be globally unique. :smile:

examples:
TSQL: select newid()
online: [URL="https://www.guidgenerator.com/"]GUID Generator[/URL]
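
To put a rough number on "(nearly) guaranteed", here is a back-of-the-envelope Python sketch using the birthday approximation. It assumes the GUIDs are random version-4 UUIDs (122 random bits), and the one million computers is an arbitrary figure for illustration; neither number comes from Primenet.

```python
import math

def collision_probability(count: int, random_bits: int = 122) -> float:
    """Birthday approximation: chance that any two of `count` random IDs match."""
    space = 2 ** random_bits
    pairs = count * (count - 1) / 2
    # 1 - exp(-pairs/space), computed with expm1 to stay accurate for tiny values
    return -math.expm1(-pairs / space)

if __name__ == "__main__":
    # One million computers is an arbitrary figure chosen for illustration.
    print(f"{collision_probability(1_000_000):.3e}")
```

None of this applies to a GUID picked by hand, such as all zeros, since that isn't random at all.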

Mark Rose 2019-01-27 01:43

[QUOTE=Madpoo;506914]It's a GUID, so rather than manually choosing one, just generate a GUID using whatever method. (Nearly) guaranteed to be globally unique. :smile:

examples:
TSQL: select newid()
online: [URL="https://www.guidgenerator.com/"]GUID Generator[/URL][/QUOTE]

Or `uuidgen` on the Linux CLI.
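
The same thing from Python, for anyone scripting instance setup. This is a sketch that assumes the client wants 32 hex digits with no dashes, which is what a 128-bit value written out in hex looks like; check an existing local.txt for the exact form before relying on that.

```python
import uuid

def new_guid_hex() -> str:
    """Return a random (version 4) GUID as 32 lowercase hex digits, no dashes."""
    return uuid.uuid4().hex

if __name__ == "__main__":
    print(new_guid_hex())
```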

GP2 2019-01-27 02:14

[QUOTE=Prime95;506861]29.5 build 9 for GP2 and ATH to test.[/QUOTE]

I'm now running eight instances of the type that produced the hangs for PRP-2 b=3; we'll see if anything happens in the next few days. If not, no news is good news.

As far as I can tell there are no problems with the FixedHardwareUID=1 stuff. I set the fixed values of ComputerGUID in the local.txt files and merged the CPUs in [url]https://www.mersenne.org/cpus/[/url]
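
For several instances, the local.txt edit can be scripted. A rough Python sketch, assuming local.txt is plain `Key=value` lines and that ComputerGUID belongs in the top part of the file, ahead of any `[section]` header. The helper name `set_option` and the sample key `SomeOtherKey` are made up for illustration, and the client should presumably be stopped before the file is touched.

```python
def set_option(text: str, key: str, value: str) -> str:
    """Return `text` with `key=value` set, replacing an existing line or
    inserting one ahead of the first [section] header."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("["):
            # Reached a section header without finding the key: insert before it.
            lines.insert(i, f"{key}={value}")
            break
        if line.split("=", 1)[0].strip() == key:
            lines[i] = f"{key}={value}"
            break
    else:
        # No section header and no existing key: append at the end.
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # SomeOtherKey is a hypothetical placeholder, not a real client option.
    sample = "SomeOtherKey=1\nComputerGUID=old\n"
    print(set_option(sample, "ComputerGUID", "0123456789abcdef0123456789abcdef"), end="")
```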

ET_ 2019-01-28 15:02

Version 29.5 build 8.

I am running a double check and getting the following messages.

While I see that the exponent is very close to the FFT limit, I wonder whether that limit is a bit too aggressive... or whether I should release this exponent.

[code]
[Work thread Jan 28 15:48] Gerbicz error check passed at iteration 5381952.
[Work thread Jan 28 15:49] Gerbicz error check passed at iteration 5392356.
[Work thread Jan 28 15:49] Iteration: 5400000 / 79109021 [6.82%], ms/iter: 3.465, ETA: 70:57:13
[Work thread Jan 28 15:49] Hardware errors have occurred during the test!
[Work thread Jan 28 15:49] 15 or more Gerbicz/double-check errors.
[Work thread Jan 28 15:49] Confidence in final result is excellent.
[Work thread Jan 28 15:49] Gerbicz error check passed at iteration 5403172.
[Work thread Jan 28 15:50] Gerbicz error check passed at iteration 5414621.
[Work thread Jan 28 15:51] Gerbicz error check passed at iteration 5426502.
[Work thread Jan 28 15:52] Gerbicz error check passed at iteration 5439046.
[Work thread Jan 28 15:52] Gerbicz error check passed at iteration 5452042.
[Work thread Jan 28 15:53] Gerbicz error check passed at iteration 5465731.
[Work thread Jan 28 15:54] Gerbicz error check passed at iteration 5480131.
[Work thread Jan 28 15:55] Gerbicz error check passed at iteration 5495260.
[Work thread Jan 28 15:55] Iteration: 5500000 / 79109021 [6.95%], ms/iter: 3.346, ETA: 68:24:36
[Work thread Jan 28 15:55] Hardware errors have occurred during the test!
[Work thread Jan 28 15:55] 15 or more Gerbicz/double-check errors.
[Work thread Jan 28 15:55] Confidence in final result is excellent.
[Work thread Jan 28 15:56] Gerbicz error check passed at iteration 5510885.
[Work thread Jan 28 15:57] Gerbicz error check passed at iteration 5527269.
[Work thread Jan 28 15:58] Gerbicz error check passed at iteration 5544430.
[Work thread Jan 28 15:59] Gerbicz error check passed at iteration 5562655.
[Work thread Jan 28 16:00] ERROR: Comparing Gerbicz checksum values failed. Rolling back to iteration 5562655.
[Work thread Jan 28 16:00] Continuing from last save file.
[Work thread Jan 28 16:00] Setting affinity to run helper thread 1 on CPU core #3
[Work thread Jan 28 16:00] Setting affinity to run helper thread 2 on CPU core #4
[Work thread Jan 28 16:00] Setting affinity to run helper thread 3 on CPU core #5
[Work thread Jan 28 16:00] Resuming Gerbicz error-checking PRP test of M79109021 using AVX-512 FFT length 4200K, Pass1=1920, Pass2=2240, clm=1, 4 threads
[Work thread Jan 28 16:00] Iteration: 5562656 / 79109021 [7.03%].
[Work thread Jan 28 16:00] Hardware errors have occurred during the test!
[Work thread Jan 28 16:00] 15 or more Gerbicz/double-check errors.
[Work thread Jan 28 16:00] Confidence in final result is excellent.

[/code]
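
A log like this can be tallied with a few lines of Python. This is a sketch that relies only on the two message formats visible in the excerpt above; other builds may word them differently, and the function name `summarize` is just for illustration.

```python
import re

PASSED = re.compile(r"Gerbicz error check passed at iteration (\d+)\.")
FAILED = re.compile(
    r"ERROR: Comparing Gerbicz checksum values failed\. "
    r"Rolling back to iteration (\d+)\."
)

def summarize(log_text: str) -> dict:
    """Count passed checks and collect the iterations that were rolled back to."""
    passed = [int(m.group(1)) for m in PASSED.finditer(log_text)]
    rollbacks = [int(m.group(1)) for m in FAILED.finditer(log_text)]
    return {
        "passed": len(passed),
        "last_good": max(passed) if passed else None,
        "rollbacks": rollbacks,
    }
```

Fed the excerpt above, it should report 14 passed checks and a single rollback to iteration 5562655.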

simon389 2019-01-28 16:33

My AVX512 machine is totally fine with regular green double checks on version 29.4 b8 but when I run 29.5 b9 it has hardware errors. Like 0.49 > 0.4.
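
Assuming "0.49 > 0.4" refers to the client's round-off check: after a floating-point FFT multiplication every result should land very close to an integer, and the distance to the nearest integer is the round-off error. At 0.5 the rounding is a coin toss, so anything above the 0.4 threshold is treated as suspect. A minimal Python illustration of that measure, with invented sample values rather than real FFT output:

```python
def max_roundoff(values):
    """Largest distance from any value to its nearest integer."""
    return max(abs(v - round(v)) for v in values)

def roundoff_ok(values, limit=0.4):
    """True if every value is within `limit` of an integer."""
    return max_roundoff(values) <= limit

if __name__ == "__main__":
    # Invented numbers for illustration only.
    print(roundoff_ok([3.02, -7.49, 12.0]))
```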


[QUOTE=ET_;507025]I am running a double check and getting the following messages.

While I see that the exponent is very close to the FFT limit, I wonder whether that limit is a bit too aggressive... or whether I should release this exponent.

[/QUOTE]

ET_ 2019-01-28 16:52

[QUOTE=simon389;507039]My AVX512 machine is totally fine with regular green double checks on version 29.4 b8 but when I run 29.5 b9 it has hardware errors. Like 0.49 > 0.4.[/QUOTE]

On what (class of) exponent(s)?

GP2 2019-01-28 17:27

[QUOTE=ET_;507025]While I see that the exponent is very close to the FFT limit, I wonder whether that limit is a bit too aggressive... or whether I should release this exponent.[/QUOTE]

You should continue with the exponent.

I think we have enough confidence in Gerbicz error checking now, so the program can just continue to run with the smaller FFT length and recover from errors as necessary.
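
For anyone wondering why a failed comparison can be trusted to catch a bad iteration, here is a toy Python sketch of the Gerbicz idea for a base-3 squaring chain. It is not prime95's code: the modulus 2^61 - 1, the block size of 8, and the injected fault are all chosen just for the demo. For simplicity it compares at every block, whereas a real client would compare much less often to keep the overhead small.

```python
def gerbicz_run(n, base, iterations, block, fault_at=None):
    """Square `base` repeatedly mod n, verifying every `block` iterations.

    Returns the iteration at which the first check failed, or None if
    every check passed.
    """
    x0 = base % n
    x = x0
    d = x0                          # running product of checkpoint values
    for i in range(1, iterations + 1):
        x = x * x % n
        if i == fault_at:
            x = (x + 1) % n         # simulated hardware error
        if i % block == 0:
            d_prev, d = d, d * x % n
            # Identity: new product == old product ** (2 ** block) * x0 (mod n)
            expected = pow(d_prev, 1 << block, n) * x0 % n
            if d != expected:
                return i
    return None

if __name__ == "__main__":
    n = 2**61 - 1
    print(gerbicz_run(n, 3, 64, 8))               # clean run
    print(gerbicz_run(n, 3, 64, 8, fault_at=21))  # fault inside the third block
```

The clean run returns None. With the fault at iteration 21 the check at iteration 24 fails, which is the point where a client would roll back to its last verified state and redo the block.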

ET_ 2019-01-28 17:32

[QUOTE=GP2;507046]You should continue with the exponent.

I think we have enough confidence in Gerbicz error checking now, so the program can just continue to run with the smaller FFT length and recover from errors as necessary.[/QUOTE]

I guessed the same. My doubt was whether the FFT limit is maybe a bit too aggressive.
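
To put "close to the FFT limit" in numbers, the figure that matters is how many bits of the exponent each FFT word has to carry. A small Python calculation using the values from the log, assuming "4200K" means 4200 × 1024 words (the Pass1 × Pass2 product in the log, 1920 × 2240, comes to the same number). The client's actual crossover point for this FFT size isn't stated in the thread, so this only shows the load, not the limit.

```python
def bits_per_word(exponent: int, fft_len_k: int) -> float:
    """Average number of bits stored per FFT word, with K taken as 1024."""
    return exponent / (fft_len_k * 1024)

if __name__ == "__main__":
    # M79109021 at FFT length 4200K, both taken from the log.
    print(round(bits_per_word(79109021, 4200), 3))
```

That works out to about 18.39 bits per word.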


All times are UTC.