mersenneforum.org  

Old 2002-09-16, 02:52   #1
ADBjester
 
Aug 2002

2×3×5 Posts
Server failure caused a lockup?

On Sep 13th at about 2 am UTC, two of my machines tried to contact Primenet to update their progress. Both machines (each running 22.9) displayed "Contacting Primenet Server"... and hung.

CPU dropped to zero.
No further iterations were done.
No further contact with the server was done.

Both clients were locked up hard enough to require a system reboot (though the machines themselves continued to run other apps well). Very little in common between these two machines. One, a Duron 950, runs Windows NT 4. The other, a P-III 450, runs Windows 2000.

I have other 22.9 machines that did their updates at other times, and they worked fine.... no lockups.

It seems that only the machines that updated at about 2 am, +/- an hour, had the problem.

I think it's an extraordinarily bad thing that a failed communication attempt with the server can cause the client to stop crunching.... at worst, it should have said "unable to contact host, will try later", and kept doing iterations.
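A minimal sketch of the behavior asked for here, written in Python rather than taken from Prime95's source: every network step carries a timeout, and any failure is reported as "no reply" so the caller can keep iterating. The name `contact_server` and the 30-second default are assumptions made for illustration only.

```python
import socket

def contact_server(host, port, request, timeout=30.0):
    """Send one request and return the raw reply, or None on any network failure."""
    try:
        # The timeout applies to the connect and to each later send/recv call.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            chunks = []
            while True:
                data = sock.recv(4096)  # raises a timeout error if the server goes silent
                if not data:            # server closed the connection: reply is complete
                    break
                chunks.append(data)
            return b"".join(chunks)
    except OSError:
        # Refused connections, timeouts and resets all end up here.
        return None

def report_progress(host, port, request):
    """Try one update; never block the caller indefinitely."""
    reply = contact_server(host, port, request)
    if reply is None:
        print("unable to contact host, will try later")
    return reply
```

The timeout here is per socket operation, not a cap on the whole exchange, so a server that trickles bytes slowly could still take longer than 30 seconds in total.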

1) Did this happen to anyone else?

2) Is there going to be a fix, if it is a problem in the client?

Thanks, George!

Jeff Woods
Old 2002-09-16, 04:59   #2
dswanson
 
Aug 2002

2^3·5^2 Posts

I've run into this same problem, using v22.8. Same solution too - complete reboot required to revive...
- One P-III 800 on Windows ME.
- One P4 1800 on Windows XP.
- One P4 1800 on Windows 98.
Old 2002-09-16, 09:14   #3
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

1101111100100_2 Posts

Try putting Debug=1 in primenet.ini. If you can replicate the problem, then send me the prime.log file and maybe we can figure out where it hung and fix it.
Old 2002-09-16, 11:08   #4
Xyzzy
 
"Mike"
Aug 2002

1E24_16 Posts

Does "Debug=1" slow Prime95 down any?
Old 2002-09-16, 14:14   #5
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

7140_10 Posts

Debug=1 in PRIMENET.INI does not slow things down. It does fill up your prime.log file with a lot of verbose text every time you contact the server.
Old 2002-09-17, 00:46   #6
garo
 
garo's Avatar
 
Aug 2002
Termonfeckin, IE

9CE_16 Posts

I have run across this problem too. There is a better way than rebooting, though: open Task Manager and terminate the Prime95 process. That usually seems to work.
Garo
Old 2002-09-17, 07:20   #7
NickGlover
 
Aug 2002
Richland, WA

2^2×3×11 Posts

It seems that the server is down right now, and v22.9 took 15 minutes to fail, but I had turned on debugging. Here is the relevant info from "prime.log": ( Machine is Win98 SE, WinModem Internet connection )

Times from my computer are Eastern

[Tue Sep 17 02:40:53 2002 - ver 22.9]
Updating computer information on the server
Sending expected completion date for M7901281: Sep 22 2002
Sending expected completion date for M12304217: Oct 04 2002
Sending expected completion date for M12304921: Oct 15 2002
Getting exponents from server

// Nick: This was before I turned debugging on. It just got stuck here for a few minutes and I eventually killed the task.

[Tue Sep 17 02:44:49 2002 - ver 22.9]
host = mersenne.org, port = 80
IP-addr = 216.120.70.80
GET /cgi-bin/pnHttp.exe?ps&32516&.&. HTTP/1.0



[Tue Sep 17 02:59:49 2002 - ver 22.9] // Finally failed after 15 minutes
RECV: HTTP/1.1 502 Gateway Error

Server: Microsoft-IIS/4.0

Date: Tue, 17 Sep 2002 06:58:56 GMT

Content-Length: 186

Content-Type: text/html



<head><title>CGI Application Timeout</title></head>
<body><h1>CGI Timeout</h1>The specified CGI application exceeded the allowed time for processing. The server has deleted the process.
Return code is not 200
ERROR: Primenet error 1

// The next two communication attempts failed "gracefully" and quickly.

-- // A bug in prime.log: it sometimes doesn't write the date/time separator
host = mersenne.org, port = 80
IP-addr = 216.120.70.80
Error in connect call: 10061
// I think this also gave error 2250 in the program window

--
host = mersenne.org, port = 80
IP-addr = 216.120.70.80
Error in connect call: 10061
// I think this also gave error 2250 in the program window
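The length of each stall can be read off the bracketed timestamps in the log above. As a rough sketch (the regular expression and function names are illustrative, not part of Prime95), the following Python parses the `[... - ver 22.9]` header lines and reports the gap between consecutive entries:

```python
import re
from datetime import datetime

# Matches header lines such as "[Tue Sep 17 02:44:49 2002 - ver 22.9]".
HEADER = re.compile(r"\[(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) - ver [\d.]+\]")

def header_times(log_text):
    """Return the timestamp of every header line found in the log text."""
    return [datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y")
            for m in HEADER.finditer(log_text)]

def gaps_in_seconds(log_text):
    """Return the seconds elapsed between each pair of consecutive headers."""
    times = header_times(log_text)
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]

# The three header lines quoted in this post.
excerpt = """\
[Tue Sep 17 02:40:53 2002 - ver 22.9]
[Tue Sep 17 02:44:49 2002 - ver 22.9]
[Tue Sep 17 02:59:49 2002 - ver 22.9] // Finally failed after 15 minutes
"""

print(gaps_in_seconds(excerpt))  # [236.0, 900.0]
```

The second gap, 900 seconds, is the 15-minute wait between sending the GET request and receiving the 502 reply. The entries without a timestamp header cannot be measured this way.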

Hopefully this helps George prevent the program from locking up in the future. Would it be possible to have Stop/Exit kill the communication attempt no matter what, so we don't have to kill the prime process manually ( which can lose a little bit of work )?
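One common way to get this kind of "Stop always wins" behavior, sketched in Python under the assumption that the communication runs on its own thread (nothing here reflects how Prime95 is actually structured, and `run_cancellable` is a hypothetical name): the caller polls a stop flag while waiting and simply abandons the attempt if the flag is set.

```python
import threading

def run_cancellable(task, stop_event, poll_interval=0.1):
    """Run task() on a helper thread; return its result, or None if stopped first."""
    result = {}

    def worker():
        result["value"] = task()

    # A daemon thread does not keep the process alive if it is still blocked at exit.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    while thread.is_alive():
        if stop_event.is_set():
            return None  # give up waiting; the blocked call is left behind
        thread.join(poll_interval)
    # If task() raised, nothing was stored and None is returned.
    return result.get("value")
```

Here `task` would be the blocking server call and `stop_event` would be set by the Stop/Exit handler. The abandoned call keeps running in the background until it returns on its own, so a real client would also want a socket timeout or an explicit socket close.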

Oh, and after the sequence from my "prime.log" above, I exited and started Prime95 again. When I tried to connect again, it looks like it is getting stuck for 15 minutes all over again.
All times are UTC.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.