mersenneforum.org  

2018-03-06, 12:19   #1
yoyo
Oct 2006
Berlin, Germany
3·197 Posts

ECM RAM issues

Hello,

My minions in yoyo@home are now also running ECM stage 2 workunits, which need up to 10 GB of RAM. On systems with many cores (e.g. 8 cores and 32 GB RAM) it happens that BOINC doesn't react fast enough to pause some ECM tasks and the system runs out of memory. In this case ecm (or GMP) can't allocate memory. This would be OK if ecm returned an error in this case. But unfortunately it exits with 0 or e.g. 3, which are considered successful.

Some examples:

GMP can't allocate memory, but ecm returns with 0
http://www.rechenkraft.net/yoyo_ops/...lt&id=47192784
Code:
wrapper: running ecm ( -v -timestamp -chkpnt checkpnt -maxmem 10000 -resume in 2900000000)
GNU MP: Cannot allocate memory (size=96)

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

Assertion failed in ecm, but ecm returns with 3
http://www.rechenkraft.net/yoyo_ops/...lt&id=47238831
Code:
wrapper: running ecm ( -v -timestamp -chkpnt checkpnt -maxmem 10000 -resume in 2900000000)

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
Assertion failed!

Program: C:\ProgramData\BOINC\slots\2\ecm
File: stage2.c, Line 425

Expression: Tree[i] != ((void *)0)
EXIT_STATUS: 3

http://www.rechenkraft.net/yoyo_ops/...lt&id=47267247
Code:
wrapper: running ecm (-resume checkpnt -param 0 -v -timestamp -chkpnt checkpnt -inp in -maxmem 1800 850000000)

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
Out of memory in mpzspm_init()
Assertion failed!

Program: C:\ProgramData\BOINC\slots\0\ecm
File: stage2.c, Line 359

Expression: mpzspm != ((void *)0)
EXIT_STATUS: 3

I would like ecm to return -1 in all such cases.
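
Until ecm itself reports these failures, one workaround is to have the calling script look at the output as well as the exit status. The following is only a minimal sketch and not part of GMP-ECM or the BOINC wrapper: `check_ecm_result` is a hypothetical helper name, and it assumes that ecm's output (stdout and stderr) has been captured to a log file. It searches for the three messages seen in the examples above and returns 255 (what a shell reports for an exit of -1) if any of them appears; otherwise it passes the original status through.

```shell
# check_ecm_result STATUS LOGFILE
# Hypothetical helper: STATUS is the exit status ecm returned,
# LOGFILE holds the captured output of that run.
check_ecm_result() {
    status=$1
    log=$2
    # Messages copied from the failed runs quoted above.
    if grep -q -e 'Cannot allocate memory' \
               -e 'Out of memory' \
               -e 'Assertion failed' "$log"; then
        return 255   # treat the run as failed, whatever ecm returned
    fi
    # No known failure message: keep whatever ecm reported.
    return "$status"
}
```

A caller would run ecm with its output redirected to a file and then call `check_ecm_result $? that_file`. The list of messages is only as complete as the examples in this post.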

yoyo

2018-03-06, 18:46   #2
xilman
Bamboozled!
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across
2×3×1,699 Posts

Quote:
Originally Posted by yoyo View Post
Hello,

My minions in yoyo@home are now also running ECM stage 2 workunits, which need up to 10 GB of RAM. On systems with many cores (e.g. 8 cores and 32 GB RAM)
Is it possible to tell your minions to run ecm with the maxmem parameter set to something sane? By default, that is. Please let those with oodles of RAM work at full speed.

2018-03-07, 07:59   #3
yoyo
Oct 2006
Berlin, Germany
24F₁₆ Posts

I already limit RAM usage to 10 GB with -maxmem 10000. If I limit it further, workunits will run very long. In most cases it works OK: BOINC pauses workunits or doesn't start them. But sometimes BOINC reacts too slowly. E.g. BOINC starts 8 workunits because there is 30 GB of free RAM. But after some time all 8 jump at the same time to a usage of 10 GB each, and it takes some seconds until BOINC jumps in and suspends some tasks.
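
To put numbers on that example: the arithmetic below is a small sketch using the figures from this post (30 GB free, -maxmem 10000, 8 workunits), with memory counted in MB as -maxmem is used here. `max_tasks` is a hypothetical helper name, and the calculation assumes every task really reaches its -maxmem limit at the same moment.

```shell
# max_tasks FREE_MB PER_TASK_MB
# Hypothetical helper: number of tasks that fit into FREE_MB if each
# one grows to PER_TASK_MB (integer division rounds down).
max_tasks() {
    echo $(( $1 / $2 ))
}

free_mb=30000     # free RAM when BOINC started the tasks
maxmem_mb=10000   # value passed to -maxmem
started=8         # workunits BOINC started

echo "peak demand: $(( started * maxmem_mb )) MB"
# prints: peak demand: 80000 MB
echo "tasks that fit: $(max_tasks "$free_mb" "$maxmem_mb")"
# prints: tasks that fit: 3
```

So only 3 of the 8 tasks can be at their stage 2 peak at once, and the other 5 have to be suspended before they get there.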

2018-03-07, 09:23   #4
henryzz
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
3×5×383 Posts

Quote:
Originally Posted by yoyo View Post
I already limit RAM usage to 10 GB with -maxmem 10000. If I limit it further, workunits will run very long. In most cases it works OK: BOINC pauses workunits or doesn't start them. But sometimes BOINC reacts too slowly. E.g. BOINC starts 8 workunits because there is 30 GB of free RAM. But after some time all 8 jump at the same time to a usage of 10 GB each, and it takes some seconds until BOINC jumps in and suspends some tasks.
Wouldn't it be fairly easy to split stage 1 and 2 into separate jobs?

2018-03-07, 09:38   #5
yoyo
Oct 2006
Berlin, Germany
3×197 Posts

Yes, I already do that for such monsters. The problem happens with the stage 2 workunits.

2018-03-07, 16:44   #6
chris2be8
Sep 2009
29·67 Posts

An error return would also be helpful if ecm ran out of memory because something else is using a lot of memory. You want to be able to tell that an error occurred whether or not you were expecting one.

Chris

2018-04-28, 05:28   #7
kosta
Jan 2013
23·7 Posts

Personally, I have run into this problem on my computers. In fact, the computers on one cluster had no virtual memory at all, and this led to a kernel panic on a few occasions.

What I do these days is *always* start ecm instances in staggered fashion, so that stage 2 does not run simultaneously in different threads. This is in my ecm.py. I also have a bash script which does it after the fact: it pauses already running ecm processes in order to achieve the same effect.

The problem is that a good value for the DELAY variable depends a lot on the number being factored and on B1. The Python script tries to take that into account; if anybody wants it, PM me.

Code:
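#!/bin/bash
# Stagger running ecm processes after the fact: stop each one in turn,
# keeping each later process stopped longer than the one before it.
# Optional first argument: delay step in seconds (default 3600).

# To stop all of them at once:
# ps -o pid= -C ecm | xargs kill -STOP

PROGRAMNAME=ecm

ARG=$1

# Use the argument as the delay only if it evaluates to at least 1.
if [[ $((ARG)) -ge 1 ]]; then
    DELAY=$ARG
else
    DELAY=3600
fi
echo "Using DELAY=$DELAY"

SLEEPTIME=0

# "pid=" suppresses the header line, so no grep is needed.
PIDS=$(ps -o pid= -C "$PROGRAMNAME" | xargs echo)
echo "Working with PIDs: $PIDS"

for id in $PIDS
do
    echo "Now SIGSTOP-ing $id for $((SLEEPTIME/60)) minutes"
    kill -STOP "$id"
    sleep "$SLEEPTIME"
    kill -CONT "$id"
    SLEEPTIME=$((SLEEPTIME+DELAY))
done

# Signal names are used instead of the numbers 19 (SIGSTOP) and
# 18 (SIGCONT), because the numbers differ between platforms.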
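
For comparison, staggering at launch time could be sketched as below. This is not the ecm.py mentioned above (that script is not shown in this thread); `start_staggered` is a hypothetical name, and the sketch uses one fixed delay between launches rather than a value tuned to the number being factored and B1.

```shell
# start_staggered DELAY_SECONDS COUNT COMMAND [ARGS...]
# Hypothetical helper: launch COUNT copies of COMMAND in the background,
# waiting DELAY_SECONDS between launches, then wait for all of them.
start_staggered() {
    delay=$1
    count=$2
    shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@" &
        i=$(( i + 1 ))
        # No need to sleep after the last launch.
        if [ "$i" -lt "$count" ]; then
            sleep "$delay"
        fi
    done
    wait
}
```

With a delay of 3600 and a count of 4, the copies would start one hour apart, so their stage 2 phases are less likely to overlap if each run takes roughly the same time.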

2018-04-28, 05:51   #8
retina
Undefined
"The unspeakable one"
Jun 2006
My evil lair
5879₁₀ Posts

Perhaps a better solution would be for the ECM code to pause and try later to get the memory it needs. Say a loop with a 60 second delay or whatever. Maybe giving up if it has to wait more than 12 hours or something. That way the threads/processes will auto-stagger themselves.
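
The idea can be sketched at the script level, with the caveat that the suggestion above is about retrying the allocation inside the ECM code itself. The sketch below instead reruns a whole command; `retry_with_delay` is a hypothetical name, and it only helps if the command reliably reports failure and can pick up from a checkpoint when rerun, both of which are assumptions here.

```shell
# retry_with_delay DELAY_SECONDS MAX_WAIT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: run COMMAND until it succeeds, sleeping
# DELAY_SECONDS between attempts and giving up once the total
# time spent sleeping has reached MAX_WAIT_SECONDS.
retry_with_delay() {
    delay=$1
    max_wait=$2
    shift 2
    waited=0
    while :; do
        "$@" && return 0
        if [ "$waited" -ge "$max_wait" ]; then
            return 1   # waited long enough, give up
        fi
        sleep "$delay"
        waited=$(( waited + delay ))
    done
}
```

The figures from this post would be `retry_with_delay 60 43200 ...`, i.e. a 60 second delay and a 12 hour limit. Processes that fail at the same moment would still retry at the same moment, so some random jitter in the delay may be needed for them to actually stagger.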

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.