mersenneforum.org  

View Poll Results: Should we buy a backup server?
Yes 12 40.00%
No 18 60.00%
Voters: 30. You may not vote on this poll

2003-11-29, 21:22   #12
Old man PrimeNet
Jan 2003
Altitude>12,500 MSL
101 Posts

There is no need for a backup server. And the crashes had nothing whatsoever to do with client loading.

Here's why:

The PrimeNet/Mersenne.org server, hosted at a Level 3 datacenter (now independently of Entropia), can handle well over 50 URL hits/sec, probably much more. It's been through a lot of load testing to verify multi-threaded safety and capacities: Prime95 has special compile flags for this purpose, creating accounts and turning around dummy factoring and LL results instantly to enable dozens of parallel instances to hammer hard at the server.

The current box is a high end dual-CPU Dell series server with redundant power supplies, redundant controllers and network, and a RAID 5 disk array - about $7000 when I bought it new, and literally about 20x stronger than PrimeNet needs for client loading. Even the hourly database reports drive only 60% capacity utilization for a few minutes. Here's a photo of PrimeNet's rack just prior to its Level 3 installation:
http://mersenne.org/primenet/primenet.jpg

As far as non-Prime95 clients go, PrimeNet v4 already supports networked Linux and Unix MPrime versions, as well as OS/2 clients. The server's manual testing web forms support an even broader variety of Mac and other clients - manual testing only because the folks working on those clients were unwilling or unable to perform the requisite work to add the necessary minimal network protocol support to talk to PrimeNet, a non-trivial effort. A second major challenge was security, because the v4 server was designed for a trusted-binary client model, not an open-source client model having transaction throttling/braking controls to block rogue or malicious clients - a situation that unfortunately happens too often, and will be addressed anew in the v5 design.
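For readers unfamiliar with the term, one common form of transaction throttling is a per-client token bucket. The sketch below is only an illustration of that general technique, not the v4 or planned v5 mechanism; the class name `TokenBucket` and its parameters are hypothetical, and the caller passes in the current time so the behaviour is deterministic.

```python
class TokenBucket:
    """Hypothetical per-client throttle: at most `capacity` requests in a
    burst, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # start with a full bucket
        self.updated = now          # time of the last refill

    def allow(self, now):
        """Return True if a request arriving at time `now` may proceed."""
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False                # client is over its budget: reject
```

A server would keep one bucket per client ID and refuse (or delay) any transaction for which `allow` returns False.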

Having 'failover' servers is a small part of the v5 implementation planning. The purpose of a failover server is mainly for geographically distributed disaster risk management, not necessarily fault-tolerance if the 'main' server goes offline. While failover capability sounds desirable, the added complexity of synchronizing multiple servers for real-time client cutovers will in all likelihood be a big headache that the new v5 team will not immediately want to face.

Remember, the client software is quite intentionally designed to reconnect and synchronize automatically if the server is for any reason unavailable -- and to keep busy in the interim period. The fact that the work units in GIMPS are nominally days if not weeks long makes this strategy particularly convenient.
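The reconnect-and-keep-busy behaviour described above can be sketched roughly as follows. This is a minimal illustration and not Prime95's actual code: the class `OfflineTolerantClient` and the `send` callable are assumptions made for the example.

```python
import collections


class OfflineTolerantClient:
    """Hypothetical client that queues results while the server is down."""

    def __init__(self, send):
        # `send` is an assumed callable that raises ConnectionError
        # when the server cannot be reached.
        self.send = send
        self.pending = collections.deque()  # results not yet delivered

    def report(self, result):
        """Record a finished result and try to deliver everything queued."""
        self.pending.append(result)
        return self.flush()

    def flush(self):
        """Deliver queued results in order; stop at the first failure."""
        while self.pending:
            try:
                self.send(self.pending[0])
            except ConnectionError:
                return False        # still offline: keep working, retry later
            self.pending.popleft()  # drop only after a successful send
        return True
```

Because a result leaves the queue only after a successful send, an outage costs nothing but a delay in reporting, which is tolerable when each work unit takes days or weeks.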

PrimeNet has been operating for nearly 7 years; the degree of concern expressed lately about it seems quite out of proportion with circumstance, and the amount of proposed technical infrastructure is far in excess of what GIMPS requires or is likely to require for the foreseeable future. Let's just watch those new server status icons and see ...

2003-11-29, 23:17   #13
jeff8765
Aug 2002
A Dyson Sphere
3^2·7 Posts

The main reason that I am concerned right now is the possibility that the server will be down at the exact moment that the new prime hits the news. If the server is down at that time, thousands of people may come to the site, see that the server is down, and leave thinking that it is always like that. I agree that a second server is unnecessary, but is there any way to make the server less likely to crash for about one day after the new prime is announced?

2003-11-30, 00:22   #14
Old man PrimeNet
Jan 2003
Altitude>12,500 MSL
101 Posts

Hey, I said I was rolling up my sleeves with my toolkit... ! I made two fixes to the server Thursday AM. Several of the 'outages' you may have seen Wed-Fri were caused by my debugger breakpoints trapping.

I've been holding off proclaiming success as I don't know the previous MTBF (mean time between failures) and several multiples of that interval are necessary to acquire confidence of a correct fix. Nonetheless, the server has been running the fixed code for a few days without incident.
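To put rough numbers on the "several multiples" point: if crashes are assumed to arrive at random with a constant rate (an exponential failure model, which is an assumption of this sketch and not something measured on PrimeNet), then an unfixed server survives one MTBF without a crash about 37% of the time, but three MTBFs only about 5% of the time.

```python
import math


def survival_probability(uptime, mtbf):
    """Chance of seeing no failure during `uptime`, assuming failures
    follow an exponential model with the given mean time between failures."""
    return math.exp(-uptime / mtbf)


# How convincing is a crash-free run, measured in multiples of the old MTBF?
for multiples in (1, 2, 3, 5):
    p = survival_probability(multiples, 1.0)
    print(f"{multiples} x MTBF crash-free: {p:.1%} chance if nothing was fixed")
```

The smaller that probability, the less plausible it is that a quiet stretch is just luck, which is why a few days of uptime means little until the old MTBF is known.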

The evidence suggests we don't need to worry about downtime when the M40 press brings new folks in over the next month or so. But I'm waiting for the Monday client traffic to see what happens - not for the load, but rather a broader variety of clients and transaction conditions.

Feels like old times, but long ago PrimeNet went through much greater growing pains while membership grew with discoveries of M37, M38 and M39. Having been through this several times before I am quite at ease about GIMPS growing and the v4 system holding up just fine.

2003-11-30, 17:08   #15
clowns789
Jun 2003
The Computer
2^3×7^2 Posts

I was thinking we could use the extra server in conjunction with the regular one. Consider the attached file. In the background, there are a lot of servers, but they are racked and all do the same task as opposed to two different servers. In other words, it would be like Prime95 where one computer calculates for everyone else in a circle.
Attached thumbnail: cairofn8.jpg (16.0 KB)

2003-11-30, 17:43   #16
GP2
Sep 2003
5·11·47 Posts

Quote:
Originally posted by Old man PrimeNet
Hey, I said I was rolling up my sleeves with my toolkit... ! I made two fixes to the server Thursday AM. Several of the 'outages' you may have seen Wed-Fri were caused by my debugger breakpoints trapping.
Scott, you're doing great work and you know we all appreciate it.

I have one minor suggestion: currently summary.txt is generated "in place", which means that every hour between N:00 and N:04 the file is usually nonexistent or truncated. How about generating it under a different temporary name, and only when complete rename the new file to summary.txt?
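A minimal sketch of that write-then-rename idea, assuming the report generator can be handed the full text of the file; the function name `write_atomically` is hypothetical, and nothing here reflects how the PrimeNet server actually produces summary.txt.

```python
import os
import tempfile


def write_atomically(path, text):
    """Write `text` to a temporary file next to `path`, then rename it into
    place, so readers see either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem as the target for the
    # final rename to be a single step.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())   # make sure the data is on disk first
        os.replace(tmp_path, path)   # swap the finished file into place
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)      # do not leave partial files behind
        raise
```

Calling `write_atomically("summary.txt", report_text)` once an hour would remove the window in which the file is missing or truncated. Note that `tempfile.mkstemp` creates the file readable only by its owner, so a real deployment would probably need to adjust permissions before the rename.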

2003-11-30, 20:12   #17
PrimeCruncher
Sep 2003
Borg HQ, Delta Quadrant
2×3^3×13 Posts

Quote:
Originally posted by clowns789
I was thinking we could use the extra server in conjunction with the regular one. Consider the attached file. In the background, there are a lot of servers, but they are racked and all do the same task as opposed to two different servers. In other words, it would be like Prime95 where one computer calculates for everyone else in a circle.
That's one of the problems that was discussed in that 9-page long thread. We can have multiple servers running BUT how can we be certain that they aren't giving out the same assignments, updating each other regularly and correctly, etc, etc. Replication might be worked into the v5 design but it won't be easy.

2003-12-04, 23:43   #18
QuintLeo
Oct 2002
Lost in the hills of Iowa
2^6×7 Posts

One master server, the rest "proxy" servers, is a model that works well for Distributed.Net - the master assigns "subblocks" of work to each subserver, and the individual subservers only assign work out of their specific subblocks.
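A rough sketch of the master/proxy split described above, with work items simplified to consecutive integers. The class names `Master` and `Proxy` are made up for the illustration; this is not Distributed.Net's or PrimeNet's actual protocol.

```python
class Master:
    """Hands out disjoint, contiguous ranges ("subblocks") of work IDs."""

    def __init__(self, start, block_size):
        self.next_start = start
        self.block_size = block_size

    def lease_block(self):
        """Return the next unleased subblock; no ID is ever leased twice."""
        block = range(self.next_start, self.next_start + self.block_size)
        self.next_start += self.block_size
        return block


class Proxy:
    """Assigns work only out of the subblock it currently holds."""

    def __init__(self, master):
        self.master = master
        self.queue = iter(())   # empty until the first lease

    def assign(self):
        for work_id in self.queue:
            return work_id      # hand out the next ID from the current block
        # Current block is used up: lease a fresh one from the master.
        self.queue = iter(self.master.lease_block())
        return next(self.queue)
```

Since each proxy draws only from blocks the master has leased to it, two proxies never hand out the same assignment, and they need to contact the master only once per block instead of once per client request.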

Given that the PrimeNet server HAS seen outages in the 1-2 week range, and that the *default* in Prime95 is to only keep "30 days" work on hand - which is ONE EXPONENT for some LL work - I'd say that either the Prime95 default needs to go up some (60-90 days) for clients doing LL work, or the server needs to get more redundant.

I personally have had cases where one of my machines finished an exponent, started on a second exponent, found a factor in less than a day in the TF stage, and needed a third exponent - all in less than a 24-hour period. If I had left those machines at the 30 day default, they WOULD have run out of work - and one of those cases happened during a PrimeNet outage of some days length.


My view might be skewed somewhat by the fact that all the LL work I do is on 10,000,000+ exponents - I don't know how long the current "leading edge" LL tests would take on any of my machines.

2003-12-05, 01:03   #19
Old man PrimeNet
Jan 2003
Altitude>12,500 MSL
101_10 Posts

There will be no more such outages, barring a natural disaster.

The server's design will be revealed in the v5 Server Development forum next month.

2003-12-05, 01:39   #20
ET_
Banned
"Luigi"
Aug 2002
Team Italia
3^2·5·107 Posts



WHEEEE!!!


That's what I like about this forum!

Luigi

2003-12-05, 02:03   #21
Complex33
Aug 2002
Texas
5×31 Posts

The anticipation is killing me me so excited

A second Christmas

Last fiddled with by Complex33 on 2003-12-05 at 02:04

2003-12-05, 19:46   #22
PrimeCruncher
Sep 2003
Borg HQ, Delta Quadrant
1010111110_2 Posts

Quote:
Originally posted by Complex33
The anticipation is killing me me so excited
I second that motion! Bring on the v5 server!

All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.