mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > PrimeNet

2008-02-01, 06:39   #100
chappjc
 
Jul 2007

2²·5 Posts

@James: I have seen all of that before, which is why I was curious to see if anything has changed.

Lo and behold, it works fine now in the x64 version. Two workers split the RAM roughly in half (e.g. 475MB and 496MB) and the other two do primality tests. ("Other threads are using lots of memory now. Looking for work that uses less memory. Starting primality test...")

Interestingly, if I close and restart prime95, different workers (definitely different exponents) get the bulk of the RAM. It seems like a random decision: whichever starts first, I guess.
2008-02-01, 18:25   #101
ET_
Banned
 
"Luigi"
Aug 2002
Team Italia

11322₈ Posts

chappjc, the same happened to me, with two threads fighting for RAM during ECM. I decided to trial-factor with the first thread.

OTOH, I noticed that my PC, which had served me fine for more than a year with no errors in exponent results, now behaves badly.

I have a dual-core AMD X2 @ 2 GHz and ONE bank of RAM (DDR2-533).

I used to run different factoring and primality programs at once without problems, but now that I run prime95 v25.6 (two threads, factoring and ECM) and prime95 v24.14 for LL tests, I keep finding my PC on a BSOD.

Maybe I should buy two sticks of DDR2-800 RAM to cope with all the processes.

I'll let you all know.

Luigi
2008-02-01, 22:27   #102
chappjc
 
Jul 2007

10100₂ Posts

Quote:
Originally Posted by ET_
I used to run different factoring and primality programs at once, without problems, but now that I run prime95 v25.6 (two threads, factoring and ecm) and prime95 24.14 for LL-tests, I find my PC with the BSOD
Yet another problem we have had in common. To my surprise, it turned out to be my network driver that was triggering the blue screens (not that this is necessarily your problem)! Something about how prime95 used the network upset the network driver. I found out by:
  1. Install Debugging Tools for Windows and the debug symbols.
  2. Run WinDbg.
  3. If you didn't install symbols, set the symbols path to:
    SRV*c:\symbols*http://msdl.microsoft.com/download/symbols
  4. File - Open crash dump.
  5. Select the minidump file in C:\windows\Minidump
  6. Inspect the output once it has finished processing (this can take a minute).
Very helpful tool, that WinDbg. :) Oh yeah, updating the driver fixed it for me, but I think I got lucky...
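For anyone who has to repeat steps 3 to 6 often, here is a small Python sketch (an illustration only, not part of Prime95 or the Debugging Tools) that picks the newest minidump and builds a WinDbg command line for it. It assumes windbg.exe is on PATH and that dumps live in C:\Windows\Minidump; the helper names are hypothetical. The -y (symbol path), -z (dump file) and -c (initial command) switches are standard WinDbg options, and "!analyze -v" prints the detailed bugcheck analysis.

```python
import glob
import os
from typing import List, Optional

# Defaults taken from the steps above; adjust for your system.
MINIDUMP_DIR = r"C:\Windows\Minidump"
SYMBOL_PATH = r"SRV*c:\symbols*http://msdl.microsoft.com/download/symbols"


def newest_minidump(dump_dir: str = MINIDUMP_DIR) -> Optional[str]:
    """Return the most recently modified .dmp file in dump_dir, or None."""
    dumps = glob.glob(os.path.join(dump_dir, "*.dmp"))
    if not dumps:
        return None
    return max(dumps, key=os.path.getmtime)


def windbg_args(dump_file: str, symbol_path: str = SYMBOL_PATH) -> List[str]:
    """Build (but do not launch) a WinDbg command line for one crash dump.

    -y sets the symbol path, -z opens the dump, and -c runs "!analyze -v"
    as soon as the dump has loaded.
    """
    return ["windbg", "-y", symbol_path, "-z", dump_file, "-c", "!analyze -v"]


if __name__ == "__main__":
    dump = newest_minidump()
    if dump is None:
        print("No minidumps found in", MINIDUMP_DIR)
    else:
        # With the Debugging Tools installed, this list could be passed
        # to subprocess.run() to open the dump directly.
        print(windbg_args(dump))
```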
2008-02-02, 12:21   #103
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

11×311 Posts

The RAM problem isn't as random as I thought. I have 3 threads doing TF now, and only 1 thread doing anything that needs a sizable amount of RAM (ECM on F25). And it's running out of memory all by itself:
Code:
[Feb 1 23:41] F25 curve 1 stage 1 at prime 47051 [94.10%]. Time: 5349.774 sec.
[Feb 2 00:16] Stage 1 complete. 1078497 transforms, 1 modular inverses. Time: 2069.933 sec.
[Feb 2 00:16] Using 1003MB of memory in stage 2.
[Feb 2 00:17] Out of memory!
[Feb 2 00:17] Out of memory!
[Feb 2 00:17] Work thread stopped.
[Feb 2 06:51] Work thread starting
[Feb 2 06:51] Using all-complex FFT length 2048K
[Feb 2 06:51] ECM on F25: curve #1 with s=7600232995194080, B1=50000, B2=5000000
[Feb 2 06:52] F25 curve 1 stage 1 at prime 48523 [97.04%].
[Feb 2 07:10] Stage 1 complete. 39827 transforms, 1 modular inverses. Time: 1105.292 sec.
[Feb 2 07:10] Using 1003MB of memory in stage 2.
[Feb 2 07:11] Out of memory!
[Feb 2 07:11] Out of memory!
[Feb 2 07:11] Out of memory!
[Feb 2 07:11] Work thread stopped.
You can see I tried restarting the thread and it failed again, about 1 minute after starting stage 2 (Task Manager shows the RAM in use slowly increasing, as expected, until it fails). Available RAM is set to 1792MB in Prime95, and the machine has about 2260MB of RAM actually free (3580MB usable by Windows, 1320MB already in use), so I don't see why it would be running out of memory.

Edit: maybe it was just weirdness on my machine. Within an hour Windows Explorer was perpetually crashing on itself and I had to reboot; since the reboot Prime95 seems to be behaving itself and hasn't run out of memory.

Last fiddled with by James Heinrich on 2008-02-02 at 13:03
2008-02-03, 20:24   #104
ckdo
 
Dec 2007
Cleves, Germany

2·5·53 Posts
ERROR: Unable to open spool file.

Here's one of the more absurd bugs.

"mprime -s" will create prime.spl if there is none. All well and good, except that it sets the file's permissions to 644 and its owner to whoever invoked the command.

Now, if you run mprime under a user account, you may not be too happy to find a prime.spl owned by root just because you wanted to look at your status from a root shell...

So either the owner of prime.spl should be copied from worktodo.txt (or whatever), or the permissions should be 666, or the file should never be deleted in the first place.

Similar problems should be expected with other files that are created automagically when they don't exist.
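The first suggestion (copy the owner from worktodo.txt) could be implemented roughly as in this Python sketch. create_owned_like is a hypothetical helper, not mprime's actual code, and it is POSIX-only (os.geteuid, os.fchown). It creates the file only if it is missing and, when running as root, hands it to the owner and group of a reference file.

```python
import os


def create_owned_like(path: str, reference: str, mode: int = 0o666) -> bool:
    """Create path if it does not exist, owned like reference.

    Hypothetical helper; mode is still filtered through the process umask.
    Returns True if this call created the file.
    """
    ref = os.stat(reference)
    try:
        # O_EXCL: only the call that really creates the file touches ownership.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, mode)
    except FileExistsError:
        return False  # never change the owner of an existing file
    try:
        # Only root may give a file away; an ordinary user already owns
        # whatever it creates, which is the normal case.
        if os.geteuid() == 0:
            os.fchown(fd, ref.st_uid, ref.st_gid)
    finally:
        os.close(fd)
    return True
```

For example, create_owned_like("prime.spl", "worktodo.txt"), run from a root shell in the mprime directory, would leave prime.spl owned by the same user as worktodo.txt instead of by root.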
2008-02-03, 21:52   #105
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

11×311 Posts

The 25.6 x64 build isn't playing nice on Vista64 (AMD X2, 2GB RAM). One thread LL, one thread TF.

For some reason it's causing svchost.exe to use basically a full core doing I don't know what, so I've got about 50% CPU on Prime95 and 50% on svchost.exe.
The 32-bit version doesn't do this (on either Vista32 or Vista64), and 25.5 x64 didn't do this either.
2008-02-03, 21:57   #106
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

5×11×137 Posts

Quote:
Originally Posted by ckdo
"mprime -s" will create prime.spl if there is none. All good and well, except it will set the file's permissions to 644 and the owner to whoever invoked the command.
Try "umask 0"
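Why "umask 0" helps with the permissions half of the problem: a newly created file gets the mode the program requests ANDed with the complement of the umask. The Python sketch below assumes mprime requests 0666 when creating prime.spl (consistent with the 644 seen under the common default umask of 022, but not confirmed). The helper names are illustrative; note that the umask only affects permission bits, never the file's owner.

```python
import os
import stat


def effective_mode(requested: int, mask: int) -> int:
    """Permission bits a new file ends up with: requested & ~umask."""
    return requested & ~mask & 0o777


def create_under_umask(path: str, requested: int, mask: int) -> int:
    """Create path under a temporary umask and return its permission bits."""
    old = os.umask(mask)  # installs the new mask and returns the previous one
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, requested)
        os.close(fd)
    finally:
        os.umask(old)  # put the caller's umask back
    return stat.S_IMODE(os.stat(path).st_mode)


print(oct(effective_mode(0o666, 0o022)))  # 0o644 with the common default umask
print(oct(effective_mode(0o666, 0o000)))  # 0o666 after "umask 0"
```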
2008-02-04, 09:51   #107
ckdo
 
Dec 2007
Cleves, Germany

1022₈ Posts

Quote:
Originally Posted by Prime95
Try "umask 0"
Well ... no, sorry, not too helpful.

I wanted to prevent any of mprime's data files from getting the wrong owner (or permissions) when mprime is accidentally invoked as the wrong user (usually root).

I have now turned on the SUID and SGID bits on the mprime executable, which should solve the problem, at least for me. I'm still concerned about other people running into the same problem who have not taken any preventive measures.

I guess that being unable to write to prime.spl will cause mprime to happily proceed through its worktodo.txt without ever returning a result from that point on, discarding the exponents it completes. That's what it did for me, at least, if only on two intermediate factoring results.

Without having checked, I'd guess that being unable to write to worktodo.txt will cause mprime to check out exponents that are never processed and/or to process the same exponent time and again. One can certainly come up with a whole bunch of problems arising from being unable to write to (or read from) any of the data files, and I think the code should take care of this, at least to some extent.

For example, by alerting the user, giving him the chance to solve the problem, and then retrying the failed operation (which is probably hard to implement in all the right places). Or by checking at launch time that the SUID and SGID bits on the executable are set (if my logic is right). Or by allowing only the owner of the mprime executable to actually run it, regardless of the file permissions. Whatever.
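A launch-time check along these lines might look like the Python sketch below. startup_problems is hypothetical (mprime is not known to do this) and only looks at the two files named in this thread. It first compares the effective uid with the owner of the data directory, which catches an accidental root run (os.access always succeeds for root), then checks that each data file can be read and written, or created if missing.

```python
import os
from typing import Iterable, List

# Only the files named in this thread; the real program uses others too.
DATA_FILES = ("worktodo.txt", "prime.spl")

# Check against the effective uid (what SUID changes) where the OS supports it.
_EFFECTIVE = os.access in os.supports_effective_ids


def startup_problems(directory: str, names: Iterable[str] = DATA_FILES) -> List[str]:
    """Return human-readable problems; an empty list means all looks fine."""
    problems = []
    owner = os.stat(directory).st_uid
    if os.geteuid() != owner:
        problems.append(
            f"running as uid {os.geteuid()}, but {directory} belongs to uid {owner}"
        )
    for name in names:
        path = os.path.join(directory, name)
        if os.path.exists(path):
            if not os.access(path, os.R_OK | os.W_OK, effective_ids=_EFFECTIVE):
                problems.append(f"cannot read and write {path}")
        elif not os.access(directory, os.W_OK | os.X_OK, effective_ids=_EFFECTIVE):
            problems.append(f"cannot create {path}")
    return problems
```

A program using this would report the problems and refuse to start (or wait for the user to fix them) whenever the list is non-empty. Comparing against the directory owner is a design choice: a shared directory owned by another account would also be flagged.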

Just my $.02, of course. The problem has supposedly always been there and it's unlikely that anyone can tell how much trouble it has caused.
2008-02-04, 12:59   #108
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

110101011101₂ Posts

Just curious how the server hands out TF assignments. I seem to get exponents to TF at 63-64 bits and at 66-67 bits. Should I assume it randomly pulls an exponent from the pool, has it factored up by (only?) 1 bit level, and throws it back into the pool (assuming no factor is found)? As opposed to the previous technique of factoring from whatever level it's currently at up to the maximum (67/68/69/etc. bits) all at once? Is this more efficient than assigning 63->67 to one computer?
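Why splitting by bit level should not change the total work much: every factor of M_p (p prime) has the form q = 2kp + 1, so the number of candidates in [2^b, 2^(b+1)) roughly doubles with each bit level. The Python sketch below counts raw candidates and adds up the idealized cost of a range, assuming every candidate costs the same. Real TF sieves candidates first (for example, factors must also be 1 or 7 mod 8), which should scale each level by roughly the same factor. The exponent is just an example and the function names are illustrative.

```python
def candidate_count(p: int, bits: int) -> int:
    """Count k >= 1 with 2**bits <= 2*k*p + 1 < 2**(bits + 1)."""
    lo, hi = 2 ** bits, 2 ** (bits + 1)
    k_min = max(1, -(-(lo - 1) // (2 * p)))  # ceiling of (lo - 1) / (2p)
    k_max = (hi - 2) // (2 * p)               # largest k with 2kp + 1 <= hi - 1
    return max(0, k_max - k_min + 1)


def relative_work(start_bits: int, end_bits: int, unit_bits: int = 63) -> float:
    """Idealized cost of start_bits->end_bits in units of one unit_bits level,
    assuming each bit level costs twice the previous one."""
    return sum(2.0 ** (b - unit_bits) for b in range(start_bits, end_bits))


p = 43112609  # example exponent only
print(candidate_count(p, 64) / candidate_count(p, 63))  # very close to 2
print(relative_work(63, 67))                           # 15.0 = 1 + 2 + 4 + 8
print(relative_work(63, 66) + relative_work(66, 67))   # also 15.0
```

Under this model 63->67 costs the same 15 units whether it is done as one assignment or four, so splitting only adds per-assignment overhead.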
2008-02-04, 13:06   #109
garo
 
Aug 2002
Termonfeckin, IE

2764₁₀ Posts

James, I think so. I remember George saying somewhere that TF will be assigned one bit level at a time. I doubt it is computationally more efficient; probably the same, but with added overhead. I think the motivation was to break TF up into smaller pieces in order to 1) allow slow computers to participate and 2) not lose all the work when a computer completes every TF level but the last and then never reports.
2008-02-04, 13:32   #110
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

11×311 Posts

Quote:
Originally Posted by garo
I remember George saying somewhere that TFs will be assigned one bit level at a time.
I remember George saying that v25.x reports status after each TF bit level (not just at the end of everything), so even if a computer died halfway through factoring 63->67, the levels it had completed should already have been reported, so that work isn't really "lost". However, it would take 60+ days to realize that the computer isn't going to report again (and that the rest of the factoring should be reassigned). So in a perfect world (where all computers finish all their assignments) this method has more overhead and is less efficient, but in the real world (while still technically less efficient due to the extra overhead) it should let compact ranges be completed faster.

However, in the interest of efficiency, I think it would be good to assign TF more selectively based on computer power. Faster systems (e.g. Core2 or equivalent) should be given preference for the higher bit levels. On my C2Q @ 3.5GHz it takes ~3.5h to factor 66->67, and only 20 minutes for 63->64. I'd prefer to see faster machines only getting TF in the 66+ range. Perhaps a rough formula like "what is the largest TF bit level I can factor in (say) 12 hours?", ideally also taking into account which architectures are most efficient at a particular TF depth.
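A rough Python sketch of that "largest bit level in N hours" rule. largest_level_within is hypothetical (nothing like it is known to exist on the server). It calibrates from one measured level and assumes the time doubles with each further bit level.

```python
def largest_level_within(budget_min: float, ref_bits: int, ref_min: float) -> int:
    """Largest b such that TF level b->b+1 is estimated to fit in budget_min.

    ref_bits -> ref_bits + 1 was measured to take ref_min minutes; each
    higher level is assumed to take twice as long as the one below it.
    """
    if ref_min > budget_min:
        raise ValueError("the reference level alone exceeds the budget")
    b = ref_bits
    while ref_min * 2 ** (b + 1 - ref_bits) <= budget_min:
        b += 1
    return b


# Timings quoted above for a C2Q @ 3.5GHz, with a 12-hour budget.
print(largest_level_within(12 * 60, 63, 20))   # 68: estimated 640 min for 68->69
print(largest_level_within(12 * 60, 66, 210))  # 67: estimated 420 min for 67->68
```

The two calibrations disagree because the measured 66->67 time is about 10.5 times the 63->64 time rather than the idealized 8 times, so calibrating from a level close to the target looks like the safer choice.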

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.