mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > PrimeNet

2003-01-19, 22:31   #144
Old man PrimeNet
 
Jan 2003
Altitude>12,500 MSL

101 Posts
proxies vs. distributed redundant servers

GIMPS has had proxy servers you can download & install anywhere since 1999. http://mersenne.org/ips/proxy.html

BTW a significant reason for PrimeNet's occasional unavailability is that all of its traffic routes through one of these PrimeNet proxy agents running on the www.entropia.com web server, which forwards transactions to the real PrimeNet server. Sometimes updating the web site leaves the proxy offline. It's one of the issues I tabled with Entropia, and I hope to address it soon. Stay tuned.

The GIMPS clients point to "entropia.com" because of an early design decision I made to ensure that IT folks analyzing traffic understood the connections went only to a single, trusted place, a high-profile concern back in 1998. Managing security and IT impacts is an important requirement for anyone building a grid system.

What are the main uses of a proxy? Pragmatically, they are:
(a) aggregating network traffic
(b) caching & managing intermediate work state
(c) additional security checkpoint
(d) decentralization of risk/control

My observation about the current entropia web server PrimeNet proxy illustrates that a proxy represents an additional system dependency, and therefore an operational risk. If the proxy goes down, its dependent machines are blocked unless they have 'plan B' connectivity pre-configured. Generally, distributed systems are best designed to minimize the number of things that can go wrong or jam up. Add statefulness to proxies and you multiply your headaches exponentially.
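To make the 'plan B' idea concrete, here is a minimal Python sketch, assuming a client keeps an ordered list of pre-configured endpoints and falls back to the next one when a connection fails. The host names and the transport callable are hypothetical placeholders, not actual PrimeNet settings.

```python
from typing import Callable, Sequence

# Hypothetical transport: sends `request` to `endpoint` and returns the reply,
# or raises ConnectionError if the endpoint is unreachable.
Transport = Callable[[str, str], str]


def send_with_fallback(request: str, endpoints: Sequence[str], transport: Transport) -> str:
    """Try each pre-configured endpoint in order and return the first reply."""
    errors = []
    for endpoint in endpoints:
        try:
            return transport(endpoint, request)
        except ConnectionError as exc:
            errors.append(f"{endpoint}: {exc}")  # remember why, then try plan B
    raise ConnectionError("all endpoints failed: " + "; ".join(errors))


def fake_transport(endpoint: str, request: str) -> str:
    # Simulates a proxy that went offline during a web site update.
    if endpoint == "proxy.example":
        raise ConnectionError("proxy offline")
    return f"{endpoint} ack {request}"


print(send_with_fallback("getwork", ["proxy.example", "primenet.example"], fake_transport))
# -> primenet.example ack getwork
```

The fallback list lives on the client, so a stateless pass-thru proxy stays optional: losing it costs one failed connection attempt rather than blocking the machine.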

Consider:
- Prime95 already capably supports (b) independently (& invariantly) of a proxy, and,
- even a stateful proxy needs to synchronize itself to a central state so we still have to deal with (re)connectivity & (re)syncing;
So went my earlier reasoning, and as a result the current version of the free downloadable PrimeNet proxy agent is stateless, providing only (a) network aggregation, which many IT departments of GIMPS-participant companies had requested. Was there a compelling argument for anything other than a pass-thru proxy? I might have missed something. Assignment handling for GIMPS is somewhat trickier than the key ranges issued on d.net, so I'm uncertain what we'd learn from that.

Regarding (d), complete decentralization is neither possible nor desirable - GIMPS still has to drive the global inputs and outputs with nominal authority and security.

However, a few geographically distributed, redundant, synchronized, open-source PrimeNet servers with DNS-directed load balancing are both interesting and achievable. Most of the synchronization challenges are most easily tackled at the infrastructure level, below the application design itself. I believe the redundant distributed open-source server approach holds the most promise as a next step for GIMPS. -sjk
2003-01-19, 22:32   #145
QuintLeo
 
 
Oct 2002
Lost in the hills of Iowa

2^6·7 Posts

I ran a Distribute.Net personal proxy for a few years.

Mostly what I got out of it was a *single* mass keyrate figure - and a little more buffering for when their master server went down.

It was needed a *lot* more with their projects, though - especially RC5. A semi-fast Athlon box could *easily* go through the contents of a *maxxed out* client buffer in 3 days, and they had the same connectivity problems at times as Entropia has had with the PrimeNet server.

Prime clients can buffer a LOT more work, so proxies aren't as necessary, though I suspect they're still very handy on a firewall box in some cases.

Not saying that a proxy server group would be no help at all, though - especially if the Prime group grows fast enough....
2003-01-19, 22:56   #146
aga
 
Oct 2002

25 Posts

Quote:
Originally Posted by Joe O
Don't knock proxies! Distributed Net used 18 - 21 proxies in addition to its master keyserver very effectively.
As has already been said, dnet is not GIMPS.

With dnet, each project consists of a finite number of blocks, neither the server nor the proxies track expiration dates or ETAs for particular blocks, and finding *the* block proves that the other blocks are 'empty' (in the case of a cryptography project) or lets them be processed much more efficiently (in the case of Golomb rulers).

With GIMPS, there is an infinite number of exponents, but lower exponents are significantly simpler to test, so the server tracks expiration dates to be able to reassign exponents before GIMPS reaches infinity :). Finding a Mersenne prime gives no advantage in testing other exponents, including smaller ones (at least not until the next great math discovery).
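A minimal sketch of the expiration tracking described above, assuming a hypothetical in-memory table keyed by exponent. Returning lapsed exponents lowest-first is an illustrative choice, not necessarily how PrimeNet orders reassignments.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class Assignment:
    computer: str
    expires: date  # hypothetical field: the assignment lapses after this date


def reclaim_expired(active: Dict[int, Assignment], today: date) -> List[int]:
    """Remove lapsed assignments and return their exponents, lowest first,
    so the cheapest-to-test exponents go back out before larger ones."""
    lapsed = sorted(exp for exp, a in active.items() if a.expires < today)
    for exp in lapsed:
        del active[exp]
    return lapsed


active = {
    12345: Assignment("abcd", date(2003, 1, 10)),
    34567: Assignment("efgh", date(2003, 3, 1)),
    23456: Assignment("ijkl", date(2002, 12, 31)),
}
print(reclaim_expired(active, date(2003, 1, 19)))  # -> [12345, 23456]
```

Because every assignment carries its own deadline, the server (not a proxy) is the natural owner of this table, which is the crux of the dnet/GIMPS difference.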

So, with dnet, proxies do not prevent you from taking a block from one proxy and returning it to another. With GIMPS, all exponents assigned by a particular proxy would become tied to it. That might still benefit a few particular computer farms, but generally it does not sound as attractive for GIMPS, so it's not a high priority.

Also, it's not clear what to do if a particular proxy dies but the GIMPS clients under it survive. This might be resolved somehow, but I see no elegant solution (except making backups).

Quote:
Originally Posted by Joe O
They also provided "personal proxies" for the "top producers" and teams. These were just stripped-down versions of their proxies
For a while I ran dnetpp on a single computer, serving only that one computer. :) In fact, the 'main' proxies run exactly the same software as the personal proxies; the only difference is that main proxy owners know the IP address of the master keyserver, and dnetc staff track main proxy availability and remove proxies from DNS as necessary.
2003-01-20, 00:42   #147
Prime95
P90 years forever!
 
 
Aug 2002
Yeehaw, FL

7,537 Posts

Quote:
Originally Posted by aga
I intend to completely avoid 'master server' and make servers truly symmetric.
I read the MySQL chapter on replication at http://www.mysql.com/documentation/mysql/bychapter/manual_MySQL_Database_Administration.html#Replication

It seems to support only a one master / many slaves configuration. All updates must be sent to the Master.

It does briefly mention setting up a circular master-slave relationship along with some warnings about being very careful ordering your updates and inserts.

In the master/slave relationship we still have the single-point-of-failure problem. In the circular relationship it looks like updates won't be propagated uniformly unless every server is online!
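A toy model of the circular case, assuming each server replicates only from its predecessor in the ring. It deliberately ignores that a real MySQL slave catches up once it comes back; it only shows that while one server is down, every server behind it in the ring lags, even if those servers are online.

```python
from typing import List, Set


def ring_propagate(servers: List[str], origin: str, online: Set[str]) -> Set[str]:
    """Servers reached by an update from `origin` when each server replicates
    only from its predecessor in a circle (A -> B -> C -> D -> A).
    The relay stops at the first offline server, so everything behind it lags."""
    reached = {origin}
    start = servers.index(origin)
    for step in range(1, len(servers)):
        nxt = servers[(start + step) % len(servers)]
        if nxt not in online:
            break
        reached.add(nxt)
    return reached


ring = ["A", "B", "C", "D"]
print(sorted(ring_propagate(ring, "A", online={"A", "B", "D"})))  # -> ['A', 'B']
```

Server D is up but never sees A's update until C returns, which is exactly the non-uniform propagation described above.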

Am I missing something?
2003-01-20, 00:55   #148
Old man PrimeNet
 
 
Jan 2003
Altitude>12,500 MSL

101 Posts
redundant servers

Still catching up on this thread.

A spooling message queue might be needed to exert enough explicit control over synchronization. It might also provide other opportunities, such as different database platforms at various servers, or visibility into spooled messages for debugging.
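One way such a spool might look, as a hypothetical sketch: one ordered outbound queue per peer, with messages removed only after successful delivery. The peer names and the deliver callback are placeholders for whatever transport is eventually chosen.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class Spool:
    """One FIFO of outbound messages per peer server (hypothetical design).
    A message leaves a peer's queue only after delivery succeeds, so an
    unreachable peer delays only its own queue, and anything still pending
    can be inspected when debugging."""

    def __init__(self, peers: List[str]) -> None:
        self.queues: Dict[str, Deque[str]] = {p: deque() for p in peers}

    def enqueue(self, message: str) -> None:
        for queue in self.queues.values():
            queue.append(message)

    def flush(self, deliver: Callable[[str, str], bool]) -> None:
        # Stop at the first failed delivery per peer to keep messages in order.
        for peer, queue in self.queues.items():
            while queue and deliver(peer, queue[0]):
                queue.popleft()

    def pending(self, peer: str) -> List[str]:
        return list(self.queues[peer])


delivered: List[str] = []
up = {"server2"}  # server3 is currently unreachable


def deliver(peer: str, message: str) -> bool:
    # Hypothetical transport: succeeds only for peers currently reachable.
    if peer not in up:
        return False
    delivered.append(f"{peer}:{message}")
    return True


spool = Spool(["server2", "server3"])
spool.enqueue("m1")
spool.enqueue("m2")
spool.flush(deliver)
print(spool.pending("server3"))  # -> ['m1', 'm2']
```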
2003-01-20, 01:45   #149
aga
 
Oct 2002

25 Posts

While MySQL replication is a great thing for backing up the database, and has some other nifty applications (for example, spreading heavy queries across slaves, as mentioned in the MySQL documentation), I think GIMPS could use a more flexible model.

The idea is to replicate not SQL update statements but higher-level operations (messages, events, business transactions), for example:

'computer abcd sent new completion date of 23 Jun 2009 for exponent M12345'
'the exponent M34567 that was earlier in common pool is now assigned to server 5 for allocation'

and even:

'I'm server 4, I'm getting short on safely assignable exponents, please hand me out few if you can'
'I'm server 3, handing exponent M56789, which was previously allocated to my assignable pool, over to server 4'
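These higher-level operations map naturally onto typed messages. A sketch assuming JSON encoding and hypothetical class and field names; nothing here is an agreed wire format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CompletionDateUpdate:
    computer: str
    exponent: int
    completion_date: str  # ISO date string


@dataclass(frozen=True)
class PoolTransfer:
    exponent: int
    from_server: int  # hypothetical convention: 0 means the common pool
    to_server: int


@dataclass(frozen=True)
class PoolRequest:
    server: int
    wanted: int  # how many exponents the server would like to receive


MESSAGE_TYPES = {cls.__name__: cls for cls in (CompletionDateUpdate, PoolTransfer, PoolRequest)}


def encode(msg) -> str:
    return json.dumps({"type": type(msg).__name__, "body": asdict(msg)}, sort_keys=True)


def decode(text: str):
    data = json.loads(text)
    return MESSAGE_TYPES[data["type"]](**data["body"])


examples = [
    CompletionDateUpdate("abcd", 12345, "2009-06-23"),
    PoolTransfer(34567, 0, 5),
    PoolRequest(4, 10),
    PoolTransfer(56789, 3, 4),
]
for msg in examples:
    print(encode(msg))
```

Each message is a few dozen bytes describing intent, which is what lets servers apply it to differently structured databases.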

As can be seen, a few messages will require 'master' capabilities on one of the servers to throw new exponents in. But the 'master' server just needs to spread them equally among the servers, and the servers will then communicate with each other if they face uneven traffic.

To make this part of the interserver communication scenario more complete: if a server runs out of safely assignable exponents (its private assignable pool is small or empty), the other servers for some reason don't donate exponents fast enough, and a client requests an assignment, the server just hands out a random unassigned exponent from some other server's assignable pool and immediately generates a message like:

'hey server 3, I just took your exponent M54321, please forgive me. All servers, please reflect the change'

Hopefully server 3 will not yet have assigned that exponent to someone else (so servers should avoid stealing the lowest exponents); if it just did, that's not fatal: the two assignments will effectively become an LL test and a double check running in parallel.

One more note: only exponents still assignable to clients should be assigned to servers. As soon as a server assigns an exponent to a client, the exponent loses its server-assignment record in the database. From then on, all servers can handle requests related to that exponent equally (since updating/returning an exponent has no potential for race conditions).
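The pool handling above could be sketched like this, under stated assumptions: pools are plain lists per server id, and when stealing, the sketch takes the donor's highest exponent rather than a random one, a deterministic variant of the 'avoid stealing the lowest exponents' advice. All names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class PoolState:
    """Each server's replicated view of the per-server assignable pools (hypothetical)."""

    def __init__(self, pools: Dict[int, List[int]]) -> None:
        # server id -> ascending list of exponents that server may hand out
        self.pools = {sid: sorted(exps) for sid, exps in pools.items()}

    def assign_to_client(self, server: int) -> Tuple[int, Optional[str]]:
        """Return (exponent, broadcast message or None).

        Removing the exponent from the pools models the rule that a
        client-assigned exponent loses its server-assignment record."""
        own = self.pools[server]
        if own:
            return own.pop(0), None  # lowest of our own exponents first
        donors = [sid for sid, exps in self.pools.items() if sid != server and exps]
        if not donors:
            raise LookupError("no assignable exponents left on any server")
        donor = max(donors, key=lambda sid: len(self.pools[sid]))
        # Steal the donor's highest exponent: the donor hands out its lowest
        # first, so this one is the least likely to be assigned concurrently.
        exponent = self.pools[donor].pop()
        return exponent, f"server {server} took M{exponent} from server {donor}; all servers please reflect the change"


state = PoolState({1: [5, 3], 2: [], 3: [7, 9, 11]})
print(state.assign_to_client(1))  # -> (3, None)
print(state.assign_to_client(2))
```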

Another note: if we assume that all live servers see each other, the messaging layer is pretty straightforward. But that is not a requirement: if we add a bitset 'seenby' field to each message (plus sequential message IDs assigned by the originating server), then servers can route messages, and thus RAIS keeps functioning even if fewer than all n*(n-1)/2 links work. This might happen with remote servers on the internet and last for dozens of minutes; I have experienced such behaviour quite a few times. Indeed, problems like a misconfigured router can last for weeks, and that is not a sufficient reason to let RAIS fall apart.
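A minimal in-process simulation of the 'seenby' routing, assuming simple flooding over whatever links are currently up. The (origin, seq) pair deduplicates messages that arrive over more than one route; class and field names are hypothetical, and the same message object is passed along in place of real network copies.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Message:
    origin: int
    seq: int           # sequential ID assigned by the originating server
    body: str
    seen_by: int = 0   # bitset: bit i set means server i already has it


class Server:
    def __init__(self, sid: int) -> None:
        self.sid = sid
        self.applied: Set[Tuple[int, int]] = set()  # (origin, seq) pairs already processed
        self.log: List[str] = []

    def receive(self, msg: Message, links: Dict[int, List[int]], servers: Dict[int, "Server"]) -> None:
        key = (msg.origin, msg.seq)
        if key in self.applied:
            return  # arrived again over another route
        self.applied.add(key)
        self.log.append(msg.body)
        msg.seen_by |= 1 << self.sid
        for peer in links[self.sid]:
            if not msg.seen_by & (1 << peer):  # forward only to servers not yet marked
                servers[peer].receive(msg, links, servers)


# Four servers, but only the chain 0-1-2-3 of the n*(n-1)/2 = 6 links is up.
links = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
servers = {sid: Server(sid) for sid in links}
servers[0].receive(Message(origin=0, seq=1, body="M54321 taken by server 0"), links, servers)
print([s.log for s in servers.values()])
```

Server 3 is reached through servers 1 and 2 even though it has no direct link to server 0.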

Having written all that, I naturally expect someone to say 'implementing our own non-SQL replication does not look good at all; it will involve coding an extra messaging layer'. In fact, using task-specific messages instead of SQL statements is very natural (for example, it allows servers to handle the database layer slightly differently, which might easily happen while we upgrade software concurrently; with loosely coordinated servers that might take a week or more), and I estimate it will add just 20-50Kb of source code. Considering all the flexibility task-specific messages provide (and also the reduction in interserver network traffic!), 50Kb is definitely worth the typing effort.
2003-01-20, 01:53   #150
aga
 
Oct 2002

25 Posts

I see Scott is quite a bit faster than me today. :) I'm also quite glad to see a very similar approach: since the two were developed independently, a drastic shared thinko is unlikely.

I wonder, what's the current status of the new-generation PrimeNet server at Entropia? Is it just at the 'would-be-nice-to-eventually-have' planning stage, or are some specific parts already ready?
2003-01-20, 02:55   #151
Prime95
P90 years forever!
 
 
Aug 2002
Yeehaw, FL

7,537 Posts

Quote:
Originally Posted by aga
While MySQL replication is a great thing for backing database up, I think GIMPS could use more flexible model.

The ideas is to replicate not SQL update statements, but use higher level operations (messages, events, business transactions).
Nuts. I had hoped the database would do this work for us:

1) We wouldn't have to write the code, it would come pretested.
2) A custom layer makes it incrementally harder for other DC projects to use our open source software (and making it easy for them should definitely be a goal)

One thing is for certain, we'd better debug the heck out of it. If these multiple server databases get out of sync we'd have a pretty big mess on our hands! It wouldn't hurt to have sanity check messages built in:

"I'm server 1 and I think I'm up-to-date as of 12:15 Jan 19 2003. I have 100,000 LL results and 5,200 outstanding reservations, ..."

Other servers could then verify that their databases match. If a discrepancy is found, they could then begin a dialogue to narrow down where it is.
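A sketch of how such sanity checks and the follow-up dialogue might work, assuming each server can list its records as exponent -> state strings (a hypothetical representation). Per-bucket SHA-256 digests let two servers find which slice of exponents disagrees without exchanging the whole database.

```python
import hashlib
from typing import Dict, List


def bucket_digests(records: Dict[int, str], buckets: int = 16) -> List[str]:
    """SHA-256 digest per bucket of exponents (bucket = exponent % buckets)."""
    hashers = [hashlib.sha256() for _ in range(buckets)]
    for exponent in sorted(records):  # fixed order, so equal data gives equal digests
        hashers[exponent % buckets].update(f"{exponent}:{records[exponent]};".encode())
    return [h.hexdigest() for h in hashers]


def status_message(server: int, records: Dict[int, str]) -> str:
    # One overall digest over the per-bucket digests, cheap to broadcast.
    overall = hashlib.sha256("".join(bucket_digests(records)).encode()).hexdigest()
    return f"I'm server {server}: {len(records)} records, digest {overall[:16]}"


def differing_buckets(mine: Dict[int, str], theirs: Dict[int, str], buckets: int = 16) -> List[int]:
    """Buckets whose digests disagree; only those need a record-by-record comparison."""
    a, b = bucket_digests(mine, buckets), bucket_digests(theirs, buckets)
    return [i for i in range(buckets) if a[i] != b[i]]


server1 = {12345: "result A", 34567: "result B"}
server2 = {12345: "result A", 34567: "result C"}
print(differing_buckets(server1, server2))  # -> [7]
```

If the broadcast digests differ, the servers compare the 16 bucket digests, then only the records in the mismatched bucket (here exponents congruent to 7 mod 16).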
2003-01-20, 09:40   #152
barcode
 
Jan 2003

228 Posts

Sounds like the perfect application for a SOAP-RPC style asynchronous web service... any thoughts on that? It would allow us to send messages in a strongly typed and fault-tolerant way, and also let us leverage the huge body of code and experience of doing things this way (as compared to a hand-rolled solution).
2003-01-20, 23:12   #153
Old man PrimeNet
 
 
Jan 2003
Altitude>12,500 MSL

1458 Posts
ok now what?

I was thinking along SOAP or XML lines, too.

More about redundant servers:
- How big is N redundant servers? 3, 5, 7?
- Why not evaluate MySQL replication? We will probably learn something useful even if we decide not to use it. What are representative test cases?
- What kind of replicated servers? Simple succession-failover redundancy, or load-balanced w/failover?

Has anyone done the top-down to-do bullet list to figure out all this stuff? I'm happy to kick it off if necessary.
2003-01-21, 01:27   #154
aga
 
Oct 2002

408 Posts
Re: ok now what?

Quote:
Originally Posted by Old man PrimeNet
- How big is N redundant servers? 3, 5, 7?
I was thinking that 2 to 5 should be OK, with 3 optimal. But the data structures should allow up to 30 servers; that should cover any imaginable practical need (remember, stats servers might piggyback on core servers without being included in RAIS) but will not increase database size by more than 4 bytes per record (message).
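The 4-byte figure works out if a set of servers is stored as a bitmask: 30 server ids fit in one unsigned 32-bit field. A sketch using Python's struct module; the little-endian field layout is an assumption.

```python
import struct
from typing import Iterable, List

MAX_SERVERS = 30  # upper bound from the proposal; ids 0..29 fit in 32 bits


def pack_servers(server_ids: Iterable[int]) -> bytes:
    """Encode a set of server ids as a 4-byte little-endian bitmask."""
    mask = 0
    for sid in server_ids:
        if not 0 <= sid < MAX_SERVERS:
            raise ValueError(f"server id {sid} out of range")
        mask |= 1 << sid
    return struct.pack("<I", mask)


def unpack_servers(data: bytes) -> List[int]:
    (mask,) = struct.unpack("<I", data)
    return [sid for sid in range(MAX_SERVERS) if mask >> sid & 1]


packed = pack_servers([0, 3, 29])
print(len(packed), unpack_servers(packed))  # -> 4 [0, 3, 29]
```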

Quote:
Originally Posted by Old man PrimeNet
- Why not evaluate MySQL replication? We will probably learn something useful even if we decide not to use it. What are representative test cases?
Well, learning MySQL cannot hurt at all. But instead of evaluating it, I prefer to rely on my experience and knowledge.

I see no way MySQL SQL-level replication could help. First and foremost, a MySQL server cannot have more than one master. That limits us to only 2 servers, or to a circle of servers, and if the circle breaks, the databases will quickly get out of sync.

Second, setting up MySQL replication requires full access to the MySQL server. If we do not rely on it, an unprivileged login is sufficient.

Third, a MySQL slave requires a login on the master with the FILE privilege. With decentralized servers, that is not secure at all.

Fourth, it will not always work. For example, if we keep a table of outstanding assignments, and a GIMPS client first updates an exponent and 15 seconds later finishes testing and returns the result to a different server, then MySQL replication might lead to a situation where a slave attempts to update a record that has already been deleted; that will shut down the slave thread, stopping replication until human intervention. With a custom messaging layer, that can be gracefully worked around by applying a rule like 'if the exponent is not assigned, its update can be safely disregarded' or maybe 'tried later' (and the incident logged, so that if the situation repeats suspiciously often, there is something to debug with).

Fifth, a MySQL server can log only one set of replication data. This precludes using a shared MySQL server.

And so on. Just believe me that for this task raw MySQL replication will cause more pain than benefit, and only a limited solution is possible with it.
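The workaround from the fourth point, sketched with a hypothetical in-memory assignment table: an update for an exponent that is no longer outstanding is disregarded and counted instead of halting replication. The logger name and method names are illustrative.

```python
import logging
from collections import Counter
from typing import Dict

log = logging.getLogger("rais")  # hypothetical logger name


class AssignmentTable:
    def __init__(self) -> None:
        self.outstanding: Dict[int, str] = {}  # exponent -> expected completion date
        self.orphan_updates: Counter = Counter()  # per-exponent incident counts, for debugging

    def apply_result(self, exponent: int) -> None:
        # A returned result closes the assignment; tolerate duplicates.
        self.outstanding.pop(exponent, None)

    def apply_update(self, exponent: int, completion_date: str) -> bool:
        """Apply a replicated 'new completion date' message.
        If the result already arrived via another server, the record is gone:
        disregard the update and log it instead of stopping replication."""
        if exponent not in self.outstanding:
            self.orphan_updates[exponent] += 1
            log.warning("update for unassigned M%d disregarded", exponent)
            return False
        self.outstanding[exponent] = completion_date
        return True


table = AssignmentTable()
table.outstanding[12345] = "2009-06-01"
table.apply_result(12345)                           # result reached this server first
print(table.apply_update(12345, "2009-06-23"))      # -> False
```

A 'tried later' policy would instead push the message back onto the spool; the counter is what lets a suspiciously frequent pattern stand out.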

On the other hand, MySQL replication will be perfect for piggybacking stats servers on the core server(s), and also for real-time backups.

Quote:
Originally Posted by Old man PrimeNet
- What kind of replicated servers? Simple succession-failover redundancy, or load-balanced w/failover?
A load-balanced RAIS would be the most interesting, and I believe the most worthwhile, thing to apply effort to. Since you said the reliability problems with the current PrimeNet server will mostly be solved soon, we have a few extra months to develop a top-notch solution, rather than resorting to a minimalistic approach dictated by whatever randomly chosen low-level software permits.

Quote:
Originally Posted by Old man PrimeNet
Has anyone done the top-down to-do bullet list to figure out all this stuff? I'm happy to kick it off if necessary.
Mostly, in the forum you can find more questions than answers. The majority of people here prefer discussing the best color of CPU cooler for a GIMPS server rather than sharing an opinion on questions like whether the database should keep all GIMPS-related data, or adopt the approach of the current PrimeNet implementation, where after an exponent is tested it's moved to another database maintained semi-manually offline. Things like that directly influence the database structure, regardless of how many servers there will be and how they will communicate with each other.

My high-level vision of the development stages is:

1) Decide which information is stored online, and for how long. This includes already-tested exponents and return history (information useful for estimating how fast and trustworthy a computer is, which influences whether small exponents get assigned to it).

2) Client-level operations allowed in the network. This will/might include: team creation, user account creation (also creation of anonymous accounts? Or should this rather be moved to the client side?), setting/changing a user's team, exponent reservation/assignment, exponent return/release, exponent return/result (or maybe 'exponent reservation for trial factoring', etc.), and reassigning an exponent from one computer to another (within the same user account). Also account alterations (updating email address, maybe joining accounts, etc.). The final list should include all parameters associated with each kind of operation, their types, and also limitations. For example, I think it's reasonable to add a rule like 'if someone returns an exponent that was not previously assigned to that user, the result is accepted but considered low-quality, and the user who had the exponent assigned continues testing it. Results from computers that earlier returned wrong results are considered low-quality. An exponent is considered tested only if it has 2 (3?) or more high-quality results' (and low-quality results can probably be thrown away most of the time, as they might be forged or stolen).

3) Now the core part of the database can be written down.

4) Replication messages. This will include everything from 2 (but now the important thing is not client experience but interserver communication free of dangerous race conditions), plus server-specific messages like 'I'm low on assignable exponents, share some with me'.

5) The database structure from 3 is extended with tables that queue messages, and additional fields to store things like the id of the server that owns a particular exponent.

6) Write down the algorithms used to process each message from 2 and 4. This will mostly be done while stages 2 and 4 take place, but here we formalize everything; if a fault is found, we go back to 2 or 4.

7) Decide on auxiliary software and protocols (for interserver messaging, etc.). Also, agree on the protocol modifications or extensions necessary for GIMPS client-side software.

8) Code everything

9) Test everything

10) Test everything once again :)

11) Switch GIMPS to the new server(s). The current PrimeNet server and the new RAIS might coexist for a prolonged period of time; there is no need to switch all traffic at once.
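The quality rule proposed in stage 2 might be expressed as follows. This is a sketch with hypothetical types; the required count is a parameter because the proposal leaves 2 versus 3 open.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Result:
    exponent: int
    user: str
    computer: str


def is_high_quality(r: Result, assignments: Set[Tuple[int, str]], bad_computers: Set[str]) -> bool:
    """High quality: the user actually held this exponent, and the computer has
    no history of wrong results."""
    return (r.exponent, r.user) in assignments and r.computer not in bad_computers


def is_tested(exponent: int, results: Iterable[Result], assignments: Set[Tuple[int, str]],
              bad_computers: Set[str], required: int = 2) -> bool:
    """An exponent counts as tested once it has `required` high-quality results;
    low-quality results are simply ignored here."""
    good = [r for r in results
            if r.exponent == exponent and is_high_quality(r, assignments, bad_computers)]
    return len(good) >= required


M = 12345
assignments = {(M, "alice"), (M, "bob")}
results = [Result(M, "alice", "c1"), Result(M, "bob", "c2"), Result(M, "mallory", "c3")]
print(is_tested(M, results, assignments, bad_computers=set()))  # -> True
```

Mallory's unassigned result is accepted into the list but never counts toward completion, matching the 'accepted but low-quality' wording.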

So if there are no corrections to this list, let's start with stage 1 (what information the online database should contain); that's actually what I'm waiting for. Here your and George's experience is of the highest value. Once stage 1 is done, we can move further at an increasing pace.




All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.