mersenneforum.org  

2021-07-15, 03:08   #331
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

7632₁₀ Posts

Sounds like a bug in workers coordinating memory allocation. Read undoc.txt and look for the option that limits each worker's maximum memory. Also I'd consider limiting the number of workers that can be in stage 2 at the same time.

Something like: for 6 workers / 16GB, only allow 3 workers to do stage 2 at one time, with a 5.3GB cap on each worker's memory usage.
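
The figures above look like an even split of the memory budget among the workers allowed into stage 2 at once: 16GB / 3 ≈ 5.3GB. A minimal Python sketch of that arithmetic, assuming an even split; the function name `stage2_cap_gb` is made up for illustration, and the actual option names have to be taken from undoc.txt.

```python
def stage2_cap_gb(total_gb, workers, max_stage2_workers):
    """Per-worker memory cap (GB) when at most max_stage2_workers run stage 2 at once.

    Assumption: the memory budget is split evenly among concurrent stage 2 workers.
    """
    if not 0 < max_stage2_workers <= workers:
        raise ValueError("max_stage2_workers must be between 1 and workers")
    return total_gb / max_stage2_workers


# 6 workers, 16GB, 3 concurrent stage 2 workers, as in the post
print(f"{stage2_cap_gb(16, 6, 3):.1f} GB per worker")
```

Allowing fewer workers into stage 2 at once raises the cap each of them can use, at the cost of the others waiting their turn.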
2021-07-15, 17:44   #332
Viliam Furik

"Viliam Furík"
Jul 2018
Martin, Slovakia

2³·5·17 Posts

Quote:
Originally Posted by Prime95
Sounds like a bug in workers coordinating memory allocation. Read undoc.txt and look for the option that limits each worker's maximum memory. Also I'd consider limiting the number of workers that can be in stage 2 at the same time.

Something like: for 6 workers / 16GB, only allow 3 workers to do stage 2 at one time, with a 5.3GB cap on each worker's memory usage.

I'll keep that in mind in case I need to apply it - so far, 6 well-fed workers are working fine.

But there is a roundoff-related bug. The P-1 test of M1041949 (B1=10,000,000, FFT=50K, Pass1=640, Pass2=80, clm=1) seems to have the same error: it reports a possible roundoff error with value 0.4375 and keeps cycling back to the last savefile, always at the same place in Stage 1 (49.15%). The behavior repeats after deleting all related savefiles and starting fresh.
2021-07-15, 18:00   #333
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

1DD0₁₆ Posts

Quote:
Originally Posted by Viliam Furik
But there is a roundoff-related bug. The P-1 test of M1041949 (B1=10,000,000, FFT=50K, Pass1=640, Pass2=80, clm=1) seems to have the same error: it reports a possible roundoff error with value 0.4375 and keeps cycling back to the last savefile, always at the same place in Stage 1 (49.15%). The behavior repeats after deleting all related savefiles and starting fresh.

I'll work on improving error recovery in 30.7. For now, you'll need to force a larger FFT length. SafetyMargin in undoc.txt may help.
2021-07-15, 18:48   #334
Viliam Furik

"Viliam Furík"
Jul 2018
Martin, Slovakia

2³·5·17 Posts

Quote:
Originally Posted by Prime95
I'll work on improving error recovery in 30.7. For now, you'll need to force a larger FFT length. SafetyMargin in undoc.txt may help.

Setting it to 0.005 should do the trick?

EDIT:
For some reason, setting the ExtraSafetymargin to 0.005 and 0.01 just makes it crash right at the beginning.

Last fiddled with by Viliam Furik on 2021-07-15 at 19:18
2021-07-19, 14:33   #335
drkirkby

"David Kirkby"
Jan 2021
Althorne, Essex, UK

447₁₀ Posts

One can add lines to worktodo.txt whilst mprime is running by use of worktodo.add, as documented in undoc.txt. A similar facility to remove entries, using a file worktodo.remove, would be useful.
2021-07-19, 14:55   #336
S485122

"Jacob"
Sep 2006
Brussels, Belgium

2⁴·109 Posts

Quote:
Originally Posted by drkirkby
One can add lines to worktodo.txt whilst mprime is running by use of worktodo.add, as documented in undoc.txt. A similar facility to remove entries, using a file worktodo.remove, would be useful.

One obvious way to do that, and it can also help clean up manual assignments one no longer cares about, is to go to the Assignments page on PrimeNet and unassign them. If they are not manual assignments, the next synchronisation, be it manual or automatic, will remove them from the work list.

Jacob
2021-07-19, 15:24   #337
ATH
Einyen

Dec 2003
Denmark

5²×127 Posts

Quote:
Originally Posted by Viliam Furik
Setting it to 0.005 should do the trick?

EDIT:
For some reason, setting the ExtraSafetymargin to 0.005 and 0.01 just makes it crash right at the beginning.

At 200K AVX-512 FFT I had to use "ExtraSafetymargin=0.03" for it to use the next higher 240K FFT, but I guess it depends on how far from the border you are, and maybe on the current FFT.

The next FFT above 50K for you should be 60K, so keep increasing ExtraSafetymargin until it chooses 60K for the assignments.
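
The "keep increasing until it chooses 60K" advice amounts to a simple step search. A rough Python sketch, assuming a hypothetical callback `reported_fft_k(margin)` that stands in for restarting the program with that ExtraSafetymargin value and reading the FFT size it reports; nothing here talks to Prime95 itself, and the start, step and limit values are arbitrary.

```python
def find_margin(reported_fft_k, target_k, start=0.005, step=0.005, limit=0.5):
    """Return the first margin at which the reported FFT size reaches target_k.

    reported_fft_k is a hypothetical stand-in for "set the margin, restart,
    and read the FFT size (in K) from the program's output".
    Returns None if the limit is reached without a switch.
    """
    steps = int(round((limit - start) / step))
    for i in range(steps + 1):
        margin = round(start + i * step, 6)  # avoid float drift in the step sum
        if reported_fft_k(margin) >= target_k:
            return margin
    return None


# Fake chooser modelled on the 200K -> 240K case described above
def fake_chooser(margin):
    return 240 if margin >= 0.03 else 200


print(find_margin(fake_chooser, 240))
```

Where the switch happens will differ per exponent and FFT size, as noted above, so the fake chooser only illustrates the search, not real behavior.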
2021-07-19, 16:13   #338
Viliam Furik

"Viliam Furík"
Jul 2018
Martin, Slovakia

2³·5·17 Posts

Quote:
Originally Posted by ATH
At 200K AVX-512 FFT I had to use "ExtraSafetymargin=0.03" for it to use the next higher 240K FFT, but I guess it depends on how far from the border you are, and maybe on the current FFT.

The next FFT above 50K for you should be 60K, so keep increasing ExtraSafetymargin until it chooses 60K for the assignments.

I've made it 0.1. I will see if it works.
2021-07-19, 18:56   #339
drkirkby

"David Kirkby"
Jan 2021
Althorne, Essex, UK

447₁₀ Posts

Quote:
Originally Posted by S485122
One obvious way to do that, and it can also help clean up manual assignments one no longer cares about, is to go to the Assignments page on PrimeNet and unassign them. If they are not manual assignments, the next synchronisation, be it manual or automatic, will remove them from the work list.

Jacob

Thank you.
2021-07-19, 19:16   #340
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

13227₈ Posts

Quote:
Originally Posted by drkirkby
to remove entries, using a file worktodo.remove, would be useful.

Why not use a text editor and save the file, then stop all workers, continue all workers, force a manual communication, and send new expected completion dates to the server?
2021-07-19, 19:53   #341
drkirkby

"David Kirkby"
Jan 2021
Althorne, Essex, UK

677₈ Posts

Quote:
Originally Posted by kriesel
Why not use a text editor and save the file, then stop all workers, continue all workers, force a manual communication, and send new expected completion dates to the server?

That requires stopping all the workers, which I would rather not do. I suspect George must have felt it desirable to be able to add entries to worktodo.txt without stopping workers, which is probably why the worktodo.add facility exists.

As a test I just
  1. Added something like "PRP=1,2,big_exponent,-1,76,1" to worktodo.add
  2. Forced a manual communication with mprime -c. The server assigned me the big exponent, so it got added to worktodo.txt
  3. Unassigned the unwanted exponent at https://www.mersenne.org/workload/
  4. Forced a second manual communication (mprime -c). The entry got removed from worktodo.txt, just as S485122 wrote it would.
I think that's preferable to stopping all workers.
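
Steps 1, 2 and 4 above could be scripted; step 3 (unassigning on the website) stays manual. A minimal Python sketch, assuming mprime keeps its files in one working directory and that `mprime` is the right executable name on that machine; the helper names `queue_work` and `force_communication` are made up for illustration.

```python
import subprocess
from pathlib import Path


def queue_work(work_dir, entry):
    """Append one work entry to worktodo.add in work_dir and return the file path."""
    add_file = Path(work_dir) / "worktodo.add"
    with add_file.open("a", encoding="ascii") as fh:
        fh.write(entry.rstrip("\n") + "\n")  # exactly one entry per line
    return add_file


def force_communication(work_dir, mprime_path="mprime"):
    """Run `mprime -c` from work_dir, as in steps 2 and 4 of the post.

    Assumption: mprime_path resolves to the mprime binary (on PATH or a full path).
    """
    subprocess.run([mprime_path, "-c"], cwd=work_dir, check=True)


# Usage (not executed here, paths are placeholders):
#   queue_work("/path/to/mprime_dir", "PRP=1,2,big_exponent,-1,76,1")
#   force_communication("/path/to/mprime_dir")
```

Appending rather than overwriting keeps any entries already waiting in worktodo.add.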
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.