mersenneforum.org  

Old 2012-07-13, 22:01   #67
ixfd64
Bemusing Prompter
 
"Danny"
Dec 2002
California

5·479 Posts

I'm not sure if this is a bug, but stopping all workers will cause those running to display a superfluous "Resuming worker at user request" line:

Quote:
[Jul 13 14:47] M56479121 stage 1 is 10.02% complete. Time: 17.976 sec.
[Jul 13 14:47] M56479121 stage 1 is 10.08% complete. Time: 18.237 sec.
[Jul 13 14:48] M56479121 stage 1 is 10.15% complete. Time: 18.213 sec.
[Jul 13 14:48] M56479121 stage 1 is 10.21% complete. Time: 20.183 sec.
[Jul 13 14:48] Stopping worker at user request.
[Jul 13 14:57] Resuming worker at user request.
[Jul 13 14:57] Worker stopped.
Old 2012-07-13, 23:15   #68
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

7,537 Posts

Quote:
Originally Posted by ixfd64 View Post
I'm not sure if this is a bug, but stopping all workers will cause those running to display a superfluous "Resuming worker at user request" line:
It isn't a bug, but it certainly qualifies as an ugliness. It results from prime95 originally being designed without multi-threading in mind (one worker window). After multi-threading was wedged in, the "I'd like to stop and start specific windows" feature was added as a wart on top of a wart.

I can probably find a way to bypass the offending output. However, the equally ugly wart where starting just one worker outputs starting and stopping messages in all other worker windows will be a bit harder to fix.
Old 2012-07-15, 17:38   #69
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

2^3·17·71 Posts
How do I change (increase) the default FFT size for LL?

I am having trouble with 4 expos in the 45M area. p95 is trying to LL them with a 2400K FFT and always fails somewhere between 35% and 50% of the work, telling me that "some error" appeared, which is "not hardware" because it is "reproducible" (:P), and that the "confidence is low". As I don't feel confident myself with such stuff on screen, I killed everything, deleted all temp files, and started again. Same story. I think I need to increase the FFT size. So, how can I teach p95 to use a larger FFT in this area?

(edit: CL selects 2480k for this range, which works fine, but is slower than the "manually tuned" one, 2592k, which is about 15% faster on a GTX 580. I would like to "play" with this selection in p95 too, if possible.)

Last fiddled with by LaurV on 2012-07-15 at 17:45
Old 2012-07-15, 18:03   #70
Dubslow
Basketry That Evening!
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3×29×83 Posts

Test=N/A,FFT2=2480K,45xxxxxx,72,1
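
For anyone filling in the placeholder exponent by hand, here is a minimal sketch that builds a worktodo line of the same shape as the one above. The helper name `worktodo_line` and its parameter names are hypothetical, and the field meanings (assignment ID, forced FFT size, exponent, trial-factoring depth in bits, P-1 done flag) are my reading of the line, not something confirmed in this thread.

```python
def worktodo_line(exponent, fft_k, tf_bits=72, pminus1_done=True, assignment_id="N/A"):
    """Build a line shaped like: Test=N/A,FFT2=2480K,<exponent>,72,1

    Assumed field order (copied from the example line above):
    assignment ID, FFT2=<size>K, exponent, TF depth in bits, P-1 done flag.
    """
    return f"Test={assignment_id},FFT2={fft_k}K,{exponent},{tf_bits},{int(pminus1_done)}"


# Exponent taken from the log posted later in this thread.
line = worktodo_line(45487279, 2480)
print(line)
```

Check the result against the worktodo documentation shipped with your prime95 version before using it.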
Old 2012-07-20, 11:58   #71
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

2^3×17×71 Posts

Quote:
Originally Posted by Dubslow View Post
Test=N/A,FFT2=2480K,45xxxxxx,72,1
Thanks. Unfortunately, the problem is not there. It seems that the old p95 bug is still not fixed:
Code:
[Thu Jul 19 18:10:31 2012]
Iteration 9000000 / 45487279
M45487279 interim We4 residue DC2998592C8F8E27 at iteration 9000000
M45487279 interim We4 residue B7FE9F1B55C2199D at iteration 9000001
M45487279 interim We4 residue 2E2CEE049ED14D2C at iteration 9000002
[Thu Jul 19 18:17:07 2012]
Iteration 9000000 / 45502937
M45502937 interim We4 residue 44D0C96A87C14E92 at iteration 9000000
M45502937 interim We4 residue 5788E205D7003EED at iteration 9000001
M45502937 interim We4 residue 3E3E082E7D85BF02 at iteration 9000002
[Thu Jul 19 19:45:51 2012]
Iteration 9000000 / 45463799
M45463799 interim We4 residue 26F03738AA9A4E88 at iteration 9000000
M45463799 interim We4 residue 3C14C10E7396B46B at iteration 9000001
M45463799 interim We4 residue 1C163E6C41506F5D at iteration 9000002
[Fri Jul 20 07:46:29 2012]
Iteration: 9076902/45487279, ERROR: ROUND OFF (244419.1983) > 0.40
Continuing from last save file.
Iteration: 9076902/45487279, ERROR: ROUND OFF (244419.1983) > 0.40
Continuing from last save file.
Disregard last error.  Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Iteration: 9076903/45487279, ERROR: ROUND OFF (244419.1983) > 0.40
Continuing from last save file.
Disregard last error.  Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Iteration: 9076904/45487279, ERROR: ROUND OFF (244419.1983) > 0.40
Continuing from last save file.
Disregard last error.  Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Iteration: 9076905/45487279, ERROR: ROUND OFF (244419.1983) > 0.40
Continuing from last save file.
Disregard last error.  Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Iteration: 9076906/45487279, ERROR: ROUND OFF (244419.1983) > 0.40
Continuing from last save file.
Disregard last error.  Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Iteration: 9076907/45487279, ERROR: ROUND OFF (244419.1983) > 0.40
Continuing from last save file.
Disregard last error.  Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Iteration: 9076908/45487279, ERROR: ROUND OFF (244419.1983) > 0.40
Continuing from last save file.
Disregard last error.  Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Iteration: 9076909/45487279, ERROR: ROUND OFF (244419.1983) > 0.40
Continuing from last save file.
Disregard last error.  Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Iteration: 9076910/45487279, ERROR: ROUND OFF (244419.1983) > 0.40
Continuing from last save file.
Disregard last error.  Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Iteration: 9076911/45487279, ERROR: ROUND OFF (244419.1983) > 0.40
Continuing from last save file.
Disregard last error.  Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Iteration: 9076912/45487279, ERROR: ROUND OFF (244419.1983) > 0.40
Continuing from last save file.
Disregard last error.  Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Iteration: 9076913/45487279, ERROR: ROUND OFF (244419.1983) > 0.40
Continuing from last save file.
Disregard last error.  Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Iteration: 9076914/45487279, ERROR: ROUND OFF (244419.1983) > 0.40
Continuing from last save file.
Disregard last error.  Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Iteration: 9076915/45487279, ERROR: ROUND OFF (244419.1983) > 0.40
Continuing from last save file.
Disregard last error.  Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Iteration: 9076916/45487279, ERROR: ROUND OFF (244419.1983) > 0.40
Continuing from last save file.
Disregard last error.  Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Iteration: 9076917/45487279, ERROR: ROUND OFF (244419.1983) > 0.40
Continuing from last save file.
Disregard last error.  Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Iteration: 9076918/45487279, ERROR: ROUND OFF (244419.1983) > 0.40
Continuing from last save file.
Disregard last error.  Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Iteration: 9076919/45487279, ERROR: ROUND OFF (244419.1983) > 0.40
Continuing from last save file.
Disregard last error.  Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Iteration: 9076920/45487279, ERROR: ROUND OFF (244419.1983) > 0.40
Continuing from last save file.
Disregard last error.  Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Iteration: 9076921/45487279, ERROR: ROUND OFF (244419.1983) > 0.40
Continuing from last save file.
Disregard last error.  Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Iteration: 9076922/45487279, ERROR: ROUND OFF (244419.1983) > 0.40
Continuing from last save file.
Disregard last error.  Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Then about 8-9 megabytes of the same crap follow, rendering one of the 4 workers completely unproductive for the whole day (the last 13-14 hours since I left the keyboard).

Any clue?
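
With several megabytes of repeated messages, a summary is easier to read than the raw log. Below is a rough sketch that counts the "ROUND OFF" lines per exponent and records the largest reported value. The regular expression is derived only from the line format shown in the code block above, and the function name `summarize` is hypothetical.

```python
import re
from collections import Counter

# Matches lines like:
# Iteration: 9076902/45487279, ERROR: ROUND OFF (244419.1983) > 0.40
ROUNDOFF = re.compile(
    r"Iteration: (\d+)/(\d+), ERROR: ROUND OFF \(([\d.]+)\) > ([\d.]+)"
)


def summarize(lines):
    """Return (error count per exponent, worst round-off value per exponent)."""
    counts = Counter()
    worst = {}
    for line in lines:
        m = ROUNDOFF.search(line)
        if not m:
            continue  # skip "Continuing from last save file." and similar lines
        exponent = int(m.group(2))
        error = float(m.group(3))
        counts[exponent] += 1
        worst[exponent] = max(worst.get(exponent, 0.0), error)
    return counts, worst


sample = [
    "Iteration: 9076902/45487279, ERROR: ROUND OFF (244419.1983) > 0.40",
    "Continuing from last save file.",
    "Iteration: 9076903/45487279, ERROR: ROUND OFF (244419.1983) > 0.40",
]
counts, worst = summarize(sample)
print(counts, worst)
```

To run it over a real log, pass an open file object instead of `sample`; the file name depends on your own setup.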
Old 2012-07-20, 12:25   #72
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

11·311 Posts

Quote:
Originally Posted by LaurV View Post
ERROR: ROUND OFF (244419.1983) > 0.40
That seems very suspicious: it's not slightly too large, as would normally be the case (e.g. "0.45 > 0.40"), it's roughly 600,000 times the limit.
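
For scale, the ratio between the reported round-off value and the limit in that log line can be checked directly. This is just arithmetic on the two numbers quoted above; the variable names are my own.

```python
reported = 244419.1983  # value in parentheses in the quoted error line
limit = 0.40            # threshold printed after the ">" sign

ratio = reported / limit
print(f"{ratio:,.0f} times the limit")
```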
Old 2012-07-20, 13:46   #73
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

7,537 Posts

One other user reported the same symptoms: a stable computer one day starts spitting out massive roundoff errors. He deleted the save files, restarted, and has had no trouble since.

All signs point to a bug in prime95 but I could not reproduce it from his save files.

Do you remember if you did anything unusual before this started happening?
Old 2012-07-20, 14:54   #74
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

9656₁₀ Posts

Quote:
Originally Posted by Prime95 View Post
One other user reported the same symptoms: a stable computer one day starts spitting out massive roundoff errors. He deleted the save files, restarted, and has had no trouble since.

All signs point to a bug in prime95 but I could not reproduce it from his save files.

Do you remember if you did anything unusual before this started happening?
I remember that discussion, but I can't find the thread. It had something to do with the initialization of variables, and the symptom I have is certainly related to restarts. I am still digging into it. If P95 is not stopped/resumed, it runs fine. After stop+continue, or exit+restart, one or two of the workers start to display the "low confidence" story. If I kill it with Task Manager, it may start correctly after many retries. If I stop it with Test/Stop or Test/Exit (so it rewrites the temp files), it will continue to display the "confidence" message. Now I made the mistake of using "stop" and lost all the work for worker 4: every restart came up with 3 workers working and one cycling through the story in the code tags above. I deleted the temp files of the 4th worker and restarted P95. The 4th worker works fine (but from scratch!) and... now the third worker starts displaying the "confidence" shit. I am trying the debug switches.
Old 2012-07-20, 19:06   #75
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

1110101110001₂ Posts

The confidence crap is normal. It just says that sometime during the LL test some bad errors were happening.

What I need to know is whether you did anything unusual right before it started giving you those big roundoff errors. One more thing: are these multi-threaded workers?
Old 2012-07-21, 00:58   #76
Dubslow
Basketry That Evening!
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

1C35₁₆ Posts

Quote:
Originally Posted by LaurV View Post
I remember that discussion but I can't find the thread.
That roundoff bug was apparently only for really small FFTs, though this evidence suggests otherwise.
Old 2012-07-21, 06:56   #77
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

2^3·17·71 Posts

@Dubslow: exactly, that was it, thanks. I am reading through it.

@George: yes, I know the "crap" is normal, I was just telling the story FYI.
I don't remember doing anything unusual. I will watch for it more carefully next time. Now I have had to restart workers 3 and 4 from scratch. The first two workers still survive. Of course, I lost the behavior and can't reproduce it anymore, but rest easy, it will come back! Usually at 35-50% (the workers that survived are now at about 28-30%).

Something has been wrong since I switched versions. Looking back, I have had these 45M expos assigned from GPU272 since about the same period (March-April) and have never been able to finish them. I did plenty of other things in the meantime, but the 45M expos never finished: they always returned errors, I always had to restart them, and I always (incorrectly) blamed the "bad FFT size", and even asked the forum for help about it.

Yes, these were multi-threaded workers: 4 workers on an i7-2600K, each having 2 threads and using a full physical core with its helper (HT enabled). Also, the "roundoff" and "sum(inputs)" error checks were enabled ("checked" in the P95 menu). According to the log, the error appeared a few minutes after a p95 restart. I later tried with and without the "error check" options (each one and both together) and with and without multi-threading; the errors still appeared, but maybe that was because the files were already corrupted. Interestingly, the test was somehow still progressing: P95 used the "slow method" every time, and the iteration number kept increasing at snail speed.

Anyhow, I will bother you again next time. I would like to hope that you have gotten rid of me, but the realistic part of me says the error will be back. If I can do something to help you diagnose the cause, please tell me. I still don't get what exactly the debug switch does. I expected more verbose messages/logs, but that wasn't the case.




All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.