#144
Jan 2003
Altitude>12,500 MSL
1458 Posts
GIMPS has had proxy servers you can download & install anywhere since 1999. http://mersenne.org/ips/proxy.html
BTW, a significant reason for PrimeNet's occasional unavailability is that all of its traffic routes through one of these PrimeNet proxy agents running on the www.entropia.com web server, which forwards transactions to the real PrimeNet server. Sometimes updating the web site leaves the proxy offline. It's one of the issues I tabled with Entropia, and hope to address soon. Stay tuned.

That the GIMPS clients point to "entropia.com" is an early design decision I made to ensure IT folks analyzing traffic understood the connections were only going to a single, trusted place - a high-profile concern back in 1998. Managing security & IT impacts is an important requirement for anyone building a grid system.

What are the main utilities for a proxy? Pragmatically, they are:
(a) aggregating network traffic
(b) caching & managing intermediate work state
(c) an additional security checkpoint
(d) decentralization of risk/control

My observation about the current Entropia web server PrimeNet proxy illustrates the point that a proxy represents an additional system dependency, and therefore an operational risk. If the proxy goes down, its dependent machines are blocked unless they have 'plan B' connectivity pre-configured. Generally, distributed systems are best designed to minimize the number of things that can go wrong or jam up. Add statefulness to proxies and you multiply your headaches exponentially. Consider:
- Prime95 already capably supports (b) independently (& invariantly) of a proxy, and,
- even a stateful proxy needs to synchronize itself to a central state, so we still have to deal with (re)connectivity & (re)syncing.

And so my earlier reasoning went... As a result, the current version of the free downloadable PrimeNet proxy agent is stateless, providing only (a) network aggregation, a feature requested by the IT departments of many GIMPS-participant companies. Was there a compelling argument for anything other than a pass-thru proxy?
I might have missed something. Assignment handling for GIMPS is somewhat trickier than the key ranges issued on d.net; I'm uncertain what we'd learn from that. Regarding (d), complete decentralization is neither possible nor desirable - GIMPS still has to drive the global inputs and outputs with nominal authority and security. However, a few geographically distributed, redundant, synchronized, open-sourced PrimeNet servers with DNS-directed load balancing is both interesting and achievable. Most of the synchronization challenges are most easily tackled at the infrastructure level, below the application design itself. I believe the redundant distributed open-source server approach holds the most promise as a next step for GIMPS. -sjk
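The 'plan B' connectivity idea above can be sketched in a few lines. This is a hypothetical illustration, not the real PrimeNet proxy code: a stateless pass-thru proxy only forwards requests, so clients (or the proxy itself) need a pre-configured list of alternate upstreams to fall back to.

```python
# Hypothetical sketch of stateless pass-thru forwarding with 'plan B'
# fallback. The callables stand in for real network sends; no state is
# kept between calls, which is the whole point of a pass-thru proxy.

def forward(request, upstreams):
    """Try each upstream in order; return the first successful reply.

    `upstreams` is a list of callables (e.g. wrappers around HTTP POSTs).
    """
    last_error = None
    for send in upstreams:
        try:
            return send(request)
        except ConnectionError as exc:
            last_error = exc  # this upstream is down; try plan B
    raise RuntimeError("all upstreams unreachable") from last_error
```

Because the proxy holds no work state, losing it costs nothing but connectivity - which the fallback list restores.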
#145
Oct 2002
Lost in the hills of Iowa
26×7 Posts
I ran a distributed.net personal proxy for a few years.
Mostly what I got out of it was a *single* mass keyrate figure - and a little more buffering for when their master server went down. It was needed a *lot* more with their projects, though - especially RC5. A semi-fast Athlon box could *easily* go through the contents of a *maxxed-out* client buffer in 3 days, and at times they had the same connectivity problems as Entropia has had with the PrimeNet server. Prime clients can buffer a LOT more work - proxies aren't as needful, though I suspect they're still very handy for use on a firewall box in some cases. Not saying that a proxy server group would be no help at all, though - especially if the Prime group grows fast enough....
#146
Oct 2002
25 Posts
Quote:
With dnet, each project consists of a finite number of blocks; neither server nor proxies track expiration dates or ETAs for particular blocks, and finding *the* block proves that the other blocks are 'empty' (in the case of a cryptography project), or lets them be processed rather more effectively (in the case of Golomb rulers). With GIMPS, there is an infinite number of exponents, but lower exponents are significantly simpler to test; so the server tracks expiration dates to be capable of reassigning exponents before GIMPS reaches infinity :) . Finding a Mersenne prime gives no advantage in regard to testing other exponents, including smaller ones (at least not until the next great math discovery).

So, with dnet, proxies do not prevent you from taking a block from one proxy but returning it to another. With GIMPS, all exponents assigned by a particular proxy would become tied to it. It still might benefit a few particular computer farms, but generally it does not sound as attractive for GIMPS, so it's not of high priority. Also, it's not clear what to do if a particular proxy dies but the GIMPS clients under it survive. This might be resolved somehow, but I see no elegant solution (except making backups).

Quote:
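The expiration tracking described above - the key difference from d.net blocks - reduces to something very small. A hypothetical sketch (field names invented for illustration):

```python
# Each GIMPS exponent assignment carries an expiration date so the server
# can reclaim and reassign work that a client never finishes. This is an
# illustrative model, not the actual PrimeNet schema.

def expired_assignments(assignments, now):
    """Return exponents whose assignment has passed its expiration time.

    `assignments` maps exponent -> expiration timestamp.
    """
    return [exp for exp, expires in assignments.items() if expires <= now]
```

A periodic sweep over this list is all the server needs to put abandoned exponents back in the pool.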
#147
P90 years forever!
Aug 2002
Yeehaw, FL
7,537 Posts
Quote:
It seems to support only a one-master / many-slaves configuration. All updates must be sent to the master. It does briefly mention setting up a circular master-slave relationship, along with some warnings about being very careful ordering your updates and inserts. In the master/slave relationship we still have the single-point-of-failure problem. In the circular relationship it looks like updates won't be propagated uniformly unless every server is on-line! Am I missing something?
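The circular-replication worry can be made concrete with a toy simulation (purely illustrative, not MySQL itself): in a ring, each node forwards updates to the next node, so a single offline node blocks propagation to everything behind it.

```python
# Toy model of circular master-slave replication: an update originating
# at one node travels around the ring until it hits an offline node.

def propagate_ring(origin, nodes, online):
    """Return the set of nodes that receive an update starting at
    nodes[origin]. Propagation stops at the first offline node."""
    n = len(nodes)
    reached = {nodes[origin]}
    i = (origin + 1) % n
    while i != origin:
        node = nodes[i]
        if node not in online:
            break  # offline node blocks the rest of the ring
        reached.add(node)
        i = (i + 1) % n
    return reached
```

With four servers A-B-C-D and C down, an update from A reaches only A and B - D never hears about it until C comes back, exactly the non-uniform propagation worried about above.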
#148
Jan 2003
Altitude>12,500 MSL
1458 Posts
Still catching up on this thread.
A spooling message queue might be needed to exert enough explicit control over synchronization. It might also provide other opportunities, such as different database platforms at various servers, or visibility into spooled messages for debugging.
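The spool idea can be sketched minimally (a hypothetical API, not any particular queue product): messages stay in the spool until the peer acknowledges them, which gives both retransmission control and the debugging visibility mentioned above.

```python
# Minimal spooling-queue sketch: enqueued messages persist until acked,
# so unconfirmed traffic can be retried and inspected for debugging.

class Spool:
    def __init__(self):
        self._next_id = 0
        self._pending = {}  # id -> message; survives until acknowledged

    def enqueue(self, message):
        self._next_id += 1
        self._pending[self._next_id] = message
        return self._next_id

    def unacked(self):
        """Messages not yet confirmed by the peer - inspectable for debugging."""
        return dict(self._pending)

    def ack(self, msg_id):
        self._pending.pop(msg_id, None)
```

A real implementation would persist `_pending` to disk so the spool survives a server restart, but the control flow is the same.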
#149
Oct 2002
3210 Posts
While MySQL replication is a great thing for backing the database up, and has some other nifty applications (for example, spreading heavy queries among slaves, as mentioned in the MySQL documentation), I think GIMPS could use a more flexible model.
The idea is to replicate not SQL update statements, but higher-level operations (messages, events, business transactions), for example:

'computer abcd sent new completion date of 23 Jun 2009 for exponent M12345'
'the exponent M34567 that was earlier in the common pool is now assigned to server 5 for allocation'

and even:

'I'm server 4, I'm getting short on safely assignable exponents, please hand me out a few if you can'
'I'm server 3, handing out exponent M56789, which was previously allocated to my assignable pool, to server 4'

As can be seen, there will be a few messages that require some 'master' capabilities at one of the servers, to throw new exponents in. But the 'master' server just needs to spread them equally among the servers, and then the servers will communicate among each other if they somehow face uneven traffic. To make this part of the interserver communication algorithm more complete: if a server runs out of safely assignable exponents (its private assignable pool is small or empty), and for some reason other servers don't donate exponents fast enough, and a client comes and requests an assignment, the server just hands off a random unassigned exponent from some other server's assignable pool, and immediately generates a message like:

'hey server 3, I just took your exponent M54321, please forgive me. All servers, please reflect the change'

Hopefully server 3 will not yet have assigned the exponent to someone else (so servers should avoid stealing the lowest exponents); if it just did, that's not fatal - the LL test and double-check will simply run in parallel.

And one more note: only exponents still assignable to clients should be assigned to servers. As soon as some server assigns an exponent to a client, the exponent in the database loses its server-assignment record. From then on, all servers can handle requests related to that exponent equally (as updating/returning-exponent actions have no potential for race conditions).
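The scheme above boils down to every server applying the same stream of business-level messages to its own copy of the ownership table. A hypothetical sketch (message names invented for illustration):

```python
# Each replica applies high-level replication messages - not raw SQL - to
# its local view of who owns each exponent. Message kinds are made up to
# mirror the examples in the post.

def apply_message(state, msg):
    """Apply one replication message.

    `state` maps exponent -> owning server number, or 'client' once the
    exponent has been handed to a GIMPS client (no server owns it then).
    """
    kind = msg["kind"]
    if kind == "pool_assign":       # exponent placed in a server's pool
        state[msg["exponent"]] = msg["server"]
    elif kind == "steal":           # 'I just took your exponent, forgive me'
        state[msg["exponent"]] = msg["taker"]
    elif kind == "client_assign":   # assigned to a client; server record dropped
        state[msg["exponent"]] = "client"
    return state
```

Since every server sees the same messages, all local views converge without any server shipping SQL to another.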
And one more note: if we assume that live servers all see each other, the messaging layer is pretty straightforward. But that is not a requirement: if we add a bitset 'seenby' field to each message (plus sequential message IDs assigned by the originating server), then servers can route messages, and thus the RAIS continues functioning even when fewer than n*(n-1)/2 links work. This can happen with remote servers on the internet and last for dozens of minutes - I have experienced such behaviour quite a few times. Indeed, problems like a misconfigured router can last for weeks; that is not sufficient reason to let the RAIS fall apart.

After wording all that, I naturally expect someone will say 'implementing your own non-SQL replication does not look good at all, it will involve coding an extra messaging layer'. In fact, using task-specific messages instead of SQL statements is very natural (for example, it allows servers to handle the database layer slightly differently, which might easily happen if we upgrade software gradually - with loosely coordinated servers that might take a week or more), and I estimate it will add just 20-50Kb of source code. Considering all the flexibility task-specific messages provide (and also the reduction in interserver network traffic!), 50Kb is definitely worth the typing effort.
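The 'seenby' routing idea above fits in a few lines. A sketch under the post's own convention (servers numbered from 0, bit i set meaning server i has already seen the message):

```python
# Flood-routing with a 'seenby' bitset: a server forwards a message only
# to neighbors whose bit is not yet set, so messages still reach every
# server when some direct links are down.

def route(msg_seenby, me, neighbors):
    """Return (neighbors still needing the message, updated seenby bitset)."""
    seenby = msg_seenby | (1 << me)          # mark myself as having seen it
    targets = [n for n in neighbors if not seenby & (1 << n)]
    return targets, seenby
```

Duplicate delivery is suppressed by the bitset, and the per-origin sequential message IDs mentioned above let receivers detect gaps and re-request missing messages.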
#150
Oct 2002
25 Posts
I see Scott was quicker than me today. :) Also, I'm quite glad to see such a similar approach: since these were developed independently, it's unlikely there is a drastic double-thinko.
I wonder, what's the current status of the new-generation PrimeNet server at Entropia? Is it just in the planning 'would-be-nice-to-eventually-have' stage, or are some specific things already in place?
#151
P90 years forever!
Aug 2002
Yeehaw, FL
7,537 Posts
Quote:
1) We wouldn't have to write the code; it would come pretested.
2) It makes it incrementally harder for other DC projects to use our open source software (making that easy should definitely be a goal).

One thing is for certain: we'd better debug the heck out of it. If these multiple server databases get out of sync, we'd have a pretty big mess on our hands! It wouldn't hurt to have sanity-check messages built in: "I'm server 1 and I think I'm up-to-date as of 12:15 Jan 19 2003. I have 100,000 LL results and 5,200 outstanding reservations, ..." Other servers could then verify that their databases match. If a discrepancy is found, they could begin a dialogue to narrow down where it is.
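The sanity-check idea above is cheap to implement: each server periodically condenses its database into a small summary and peers compare. A hypothetical sketch (field names invented to match the example message):

```python
# Each server publishes a compact summary of its state; peers compare
# summaries to detect divergence before it becomes a big mess.

def summarize(db):
    """Condense a server's database into a comparable summary."""
    return {
        "ll_results": len(db["results"]),
        "outstanding": len(db["reservations"]),
    }

def in_sync(summary_a, summary_b):
    """True if two servers' summaries agree."""
    return summary_a == summary_b
```

If summaries disagree, the servers can recurse: exchange per-range counts (say, by exponent decade) to narrow the discrepancy down to a small set of records.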
#152
Jan 2003
2×32 Posts
Sounds like the perfect application for a SOAP-RPC-style asynchronous web service... any thoughts on that? It would let us send messages in a strongly typed and fault-tolerant way, and also let us leverage the huge body of code and experience around doing things this way (as compared to a hand-rolled solution).
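Real SOAP is verbose, but the strongly-typed-envelope idea itself is small. A minimal sketch using only the Python standard library - this is not SOAP, just an illustration of typed XML messages a receiver can validate before acting on:

```python
import xml.etree.ElementTree as ET

# Each field carries an explicit type attribute, so a receiving server
# can cast and validate values before touching its database.

def make_envelope(operation, fields):
    """Build an XML message. `fields` is a list of (name, type, value)."""
    root = ET.Element("message", operation=operation)
    for name, type_name, value in fields:
        ET.SubElement(root, name, type=type_name).text = str(value)
    return ET.tostring(root, encoding="unicode")

def parse_envelope(xml_text):
    """Return (operation, {field: typed value}); raises on unknown types."""
    root = ET.fromstring(xml_text)
    casts = {"int": int, "string": str}
    return root.get("operation"), {
        child.tag: casts[child.get("type")](child.text) for child in root
    }
```

A full SOAP stack adds WSDL contracts and fault elements on top of this, but the wire-level benefit - typed, self-describing messages - is already visible here.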
#153
Jan 2003
Altitude>12,500 MSL
101 Posts
I was thinking along SOAP or XML lines, too.
More about redundant servers:
- How big is N redundant servers? 3, 5, 7?
- Why not evaluate MySQL replication? We will probably learn something useful even if we decide not to use it. What are representative test cases?
- What kind of replicated servers? Simple succession-failover redundancy, or load-balanced w/failover?

Has anyone done the top-down to-do bullet list to figure all this out? I'm happy to kick it off if necessary.
#154
Oct 2002
3210 Posts
Quote:
Quote:
I see no way raw MySQL SQL-level replication could help.

First, and foremost, a MySQL server cannot have more than one master. That limits us to only 2 servers, or a circle of servers - and if the circle breaks, the databases will quickly become desynchronized.

Second, installing MySQL replication requires full access to the MySQL engine. If we don't rely on it, an unprivileged login is sufficient.

Third, a MySQL slave requires a login at the master with the File privilege. With decentralized servers, that is not secure at all.

Fourth, it will not always work. For example, if we keep a table of outstanding assignments, and a GIMPS client first updates an exponent and 15 seconds later finishes testing and returns the result to a different server, then MySQL replication might lead to a situation where a slave attempts to update a record that has already been deleted; that will shut down the slave thread, stopping replication until human intervention. With a custom messaging layer, this can be gracefully worked around by applying a rule like 'if the exponent is not assigned, its update can be safely disregarded' or maybe 'retried later' (and the incident logged, so that if the situation repeats suspiciously often, there is material for debugging).

Fifth, a MySQL server can log only one set of replication data. This precludes using a shared MySQL server.

And so on. Just believe me that for this task raw MySQL replication will cause more pain than it's worth, and only a limited solution is possible with it. On the other hand, MySQL replication is going to be a perfect thing for piggybacking stats servers on the core server(s), and also for doing realtime backups.

Quote:
Quote:
My high-level vision of the development stages is:

1) Decide which information is stored online, and how long it is maintained there. This includes already-tested exponents and returns history (to have information useful for estimating how fast and trustworthy a computer is, which influences whether small exponents get assigned to it).

2) Client-level operations allowed in the network. This will/might include: team creation, user account creation (also creation of anonymous accounts? Or maybe this is better moved to the client side?), setting/changing the team for a user, exponent reservation/assignment, exponent return/release, exponent return/result - or maybe 'exponent reservation for trial factoring', etc. - and reassigning an exponent from one computer to another (within the same user account). Also account alterations (updating email address, maybe joining accounts, etc.). The final list should include all parameters associated with each kind of operation, their types, and also limitations - for example, I think it's reasonable to add a rule like: 'if someone returns a result for an exponent that was not previously assigned to that user, the result is accepted but considered low-quality, and the user who had the exponent assigned continues testing it. Results from computers that earlier returned wrong results are considered low-quality. An exponent is considered tested only if it has 2 (3?) or more high-quality results' (and probably low-quality results can most often be thrown away, as they might be forged or stolen).

3) Now the core part of the database can be written down.

4) Replication messages. This will include everything from 2 (but now the important thing is not client experience but interserver communication free of dangerous race conditions), plus server-specific messages like 'I'm low on assignable exponents, share some with me'.
5) The database structure from 3 is extended with tables that queue messages, and additional fields to store things like the ID of the server that owns a particular exponent.

6) Write down the algorithms used to process each message from 2 and 4. This will indeed mostly be done while stages 2 and 4 take place, but here we formalize everything - if a fault is found, we return to 2 or 4.

7) Decide on auxiliary software and protocols (for interserver messaging, etc.). Also, agree on the protocol modifications or extensions necessary for the GIMPS client-side software.

8) Code everything.

9) Test everything.

10) Test everything once again :)

11) Switch GIMPS to the new server(s). The current PrimeNet server and the new RAIS might coexist for a prolonged period of time; there is no need to switch all traffic at once.

So if there are no corrections to this list, let's start with 1 (the information the online database should contain) - that's exactly what I'm waiting for. Here yours and George's experience is of the highest value. After stage 1 is done, we can move further at an increasing pace.
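The result-quality rule proposed in stage 2 is concrete enough to sketch. All names and the two-result threshold are illustrative, taken straight from the rule as worded:

```python
# A result is high-quality only if it came from the user the exponent was
# assigned to AND the submitting computer has no history of wrong results.
# An exponent counts as tested once it has enough high-quality results.

def result_quality(assigned_to, submitted_by, has_bad_history):
    """Classify a returned result as 'high' or 'low' quality."""
    if submitted_by != assigned_to or has_bad_history:
        return "low"
    return "high"

def is_tested(result_qualities, needed=2):
    """True once the exponent has `needed` high-quality results."""
    return sum(1 for q in result_qualities if q == "high") >= needed
```

Low-quality results still get recorded (they may later corroborate a high-quality run), but they alone never retire an exponent - which is what makes forged or stolen results harmless.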