mersenneforum.org

Prime95 version 28.6 / 28.7 (28.7 now available!) (https://www.mersenneforum.org/showthread.php?t=20156)

Prime95 2015-08-10 03:09

[QUOTE=pepi37;406262]Can you compile 287 version?[/QUOTE]

OK, OK, 28.7 now available.

pepi37 2015-08-10 05:25

[QUOTE=Prime95;407580]OK, OK, 28.7 now available.[/QUOTE]
Thanks! :bow:
TitleOutputFrequency works!

retina 2015-08-10 05:39

[QUOTE=Prime95;407573]Carry propagation takes a "shortcut" ...[/QUOTE]Shortcuts make long delays, [size=1][sub]but inns make longer ones[/sub][/size].

Tolkien had this right.

S485122 2015-08-10 08:16

[QUOTE=Madpoo;407572]...
Anyway, I don't worry about it too much. Yeah, it makes the resulting error code non-zero, but if it's reproducible then the error code reflects that and it's not really too bad.
...[/QUOTE]My point was that the errors were reproducible: they occurred at exactly the same iteration, with the same round-off error, yet only the second time was the error considered reproducible. In the end the test was successful.

Anyway, changing the crossover point will stop this from happening. These errors play havoc with the computer's PrimeNet reliability and thus influence the work units one gets.

Jacob

Madpoo 2015-08-10 15:43

[QUOTE=S485122;407587]... It plays havoc with the PrimeNet reliability of the computer and thus influences the work-units one gets.[/QUOTE]

I don't remember the details, but I don't think the occasional reproducible round-off error will cause the computer's reliability index to suffer too much... at least not to the point where it won't get preferred assignments. George would have more input on that, but I think a reliability of 0.98+ is all it takes to get preferred work, so there is some accounting for "typical" round-off cases.

Plus you can log in, tick your CPUs, and mark them "corrected" with the "Reset, I fixed the hardware" option to set the reliability back to 1.0.

I'm more concerned at the moment with all of the flaky machines I'm finding (of course, I'm [B]looking[/B] for them, so it's not surprising I'm finding them) that have high reliability and returned results with a zero error code, yet the residue is bad. Obviously these machines suffered some problem or other along the way... overclocked memory perhaps.

Some systems return bad results so often that I'm surprised they aren't crashing on a daily basis. :smile:

So yeah, when systems like that can skate by (at least until the double checkers discover their true nature) and good machines get flagged because of reproducible round-offs, it can seem unfair, but like I said, if you know your system is okay and all you've ever seen are those repro round-offs, just mark the "I fixed it" option and carry on.

ewmayer 2015-08-12 00:57

[QUOTE=S485122;407587]My point was that the errors were reproducible : they occurred at exactly the same iteration, with the same round off error and only the second time was the error considered reproducible. In the end the test was successful.[/QUOTE]

Sorry if I am misunderstanding some detail here - I just happened across the last few posts and am too busy/lazy to read back in time - but wouldn't one need the 'second time' to establish reproducibility? Or are you referring to 'second time' in the sense of a second 'hit-error/restart-from-last-savefile/see-if-retry-hits-same-error' program-flow cycle?

Thanks,
-Ernst

Prime95 2015-08-12 02:15

[QUOTE=ewmayer;407713]Sorry if I am misunderstanding some detail here[/QUOTE]

1) You get a roundoff > 0.4 on iteration X
2) Prime95 rolls back to the last save file
3) When you get to iteration X again, on good hardware prime95 expects to get a roundoff > 0.4 error again. However, this is not guaranteed.
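The three steps above can be sketched as a small classification routine. This is only an illustration of the described flow, not Prime95's actual code: the function name, the `roundoff_at` callback and the return values are all hypothetical, and what the program does after classifying the error is not modelled.

```python
ROUNDOFF_LIMIT = 0.4  # threshold quoted in the steps above


def replay_from_save(last_save_iter, error_iter, roundoff_at):
    """Classify a roundoff error after rolling back to the last save file.

    roundoff_at is a hypothetical callback mapping an iteration number to
    the roundoff value observed when that iteration is run again.
    Returns (classification, iterations_redone).
    """
    if last_save_iter >= error_iter:
        raise ValueError("the save file must predate the failing iteration")

    # Step 2: everything after the save file has to be computed again.
    iterations_redone = error_iter - last_save_iter

    # Step 3: on good hardware the same iteration is expected to exceed
    # the limit again, but that is not guaranteed.
    if roundoff_at(error_iter) > ROUNDOFF_LIMIT:
        return "reproducible", iterations_redone
    return "not reproducible", iterations_redone


if __name__ == "__main__":
    # Made-up numbers: save file at iteration 1000, error at iteration 1500.
    print(replay_from_save(1000, 1500, lambda it: 0.45))
    print(replay_from_save(1000, 1500, lambda it: 0.25))
```

The cost of each retry is the number of iterations between the save file and iteration X, which is why the save interval matters.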

Madpoo 2015-08-12 02:53

[QUOTE=Prime95;407716]1) You get a roundoff > 0.4 on iteration X
2) Prime95 rolls back to the last save file
3) When you get to iteration X again, on good hardware prime95 expects to get a roundoff > 0.4 error again. However, this is not guaranteed.[/QUOTE]

This reminds me... can you change the interval between save files from 30 minutes to maybe 5 minutes in an attempt to lower the impact of these rollback/retry cycles?

Or would 5 minutes not roll back enough iterations to retry using the alternate method for those cases when it's reproducible?

I see about one of my workers per day doing this on various 34M exponents that I presume were near the FFT crossover... just thinking of ways to recoup that lost hour or so.

sdbardwick 2015-08-12 03:18

[QUOTE=Madpoo;407717]This reminds me... can you alter the frequency that save files are generated from 30 minutes to maybe 5 minutes in an attempt to lower the impact of these rollback/retry cycles?[/QUOTE]This? Minimum is 10 minutes from the GUI. Don't know if changing [I]DiskWriteTime=30[/I] in prime.txt to lower than 10 works.

[ATTACH]12955[/ATTACH]
EDIT: Manually editing the prime.txt seems to work:[ATTACH]12956[/ATTACH]
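For anyone who would rather script the manual edit than open the file, here is a minimal sketch. It assumes prime.txt is a plain list of `key=value` lines and that `DiskWriteTime` is given in minutes, as the GUI setting suggests; the helper name `set_option` is made up for illustration. Whether Prime95 honours every value written this way is not something the sketch can confirm.

```python
def set_option(text, key, value):
    """Return text with `key=value` set.

    An existing line for the key is replaced in place; otherwise a new line
    is appended at the end. Caveat: if the file contains section headers,
    an appended line would land in the last section.
    """
    lines = text.splitlines()
    new_line = f"{key}={value}"
    for i, line in enumerate(lines):
        if line.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            break
    else:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "DiskWriteTime=30\nOtherOption=1\n"  # made-up file contents
    print(set_option(sample, "DiskWriteTime", 5), end="")
```

To apply it to a real file, read prime.txt into a string, pass it through `set_option`, and write the result back, keeping a backup copy first.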

Madpoo 2015-08-12 04:03

[QUOTE=sdbardwick;407718]This? Minimum is 10 minutes from the GUI. Don't know if changing [I]DiskWriteTime=30[/I] in prime.txt to lower than 10 works.

[ATTACH]12955[/ATTACH]
EDIT: Manually editing the prime.txt seems to work:[ATTACH]12956[/ATTACH][/QUOTE]

Yeah, I just set it to 10 minutes but I didn't know if it was safe to do so in terms of retrying the iterations with the alternate method.

Well, I guess it should be fine... I have most of my 34M work being done by 6-core workers, and even those can go through an amazing number of iterations at that FFT size in 30 minutes. :smile:

I assume then that with 10 minutes between intermediate writes, at most I'd lose 20 minutes instead of an hour. Works for me.
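The arithmetic behind those figures can be written down in a few lines. The factor of two save intervals per event is simply read off the numbers in this thread (about an hour at 30 minutes, 20 minutes at 10); it is an assumption, not a documented Prime95 constant, and the function name is hypothetical.

```python
def worst_case_loss_minutes(disk_write_time, intervals_lost=2):
    """Upper bound on time lost to one rollback/retry event, in minutes.

    disk_write_time: minutes between save files.
    intervals_lost:  save intervals lost per event (assumed to be 2,
                     matching the figures quoted in the thread).
    """
    if disk_write_time <= 0:
        raise ValueError("disk_write_time must be positive")
    return disk_write_time * intervals_lost


if __name__ == "__main__":
    for minutes in (30, 10, 5):
        print(f"DiskWriteTime={minutes}: up to {worst_case_loss_minutes(minutes)} minutes lost")
```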

sdbardwick 2015-08-12 04:21

IIRC (I did a search of the forum but no joy), unless you set ErrorCheck=1 and SumInputsErrorCheck=1 to force error checking on every iteration, the error checking occurs roughly every 100 iterations (give or take a factor of 10), so any positive integer setting of DiskWriteTime would be fine from a number-of-iterations standpoint.

I might have read that in an ancient readme or undoc file.

[SIZE="1"][COLOR="SlateGray"][FONT="Arial Narrow"](Then again, I might be misinterpreting the whole situation...)[/FONT][/COLOR][/SIZE]
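A quick way to sanity-check that reasoning is to compare the number of iterations completed between save files with the spacing between error checks. The 5 ms per iteration used below is a made-up timing, not a measurement, and the default of 1000 iterations is the pessimistic end of the "100, give or take a factor of 10" recollection; both function names are hypothetical.

```python
def iterations_per_save(disk_write_time, ms_per_iteration):
    """Iterations completed in one save interval (disk_write_time in minutes)."""
    if disk_write_time <= 0 or ms_per_iteration <= 0:
        raise ValueError("both arguments must be positive")
    return int(disk_write_time * 60_000 / ms_per_iteration)


def interval_spans_a_check(disk_write_time, ms_per_iteration, check_every=1000):
    """True if at least one error check falls inside every save interval."""
    return iterations_per_save(disk_write_time, ms_per_iteration) >= check_every


if __name__ == "__main__":
    for minutes in (1, 10, 30):
        n = iterations_per_save(minutes, 5.0)  # 5 ms/iteration is hypothetical
        print(minutes, n, interval_spans_a_check(minutes, 5.0))
```

Under these assumed numbers even a one-minute interval covers many error checks, which is consistent with the conclusion above.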

