Replacing Entropia's primenet server
With the many server outages over the last three weeks, the question has been raised: why not have our own GIMPS server?
First off, some background. Scott Kurowski founded Entropia as a for-profit startup in the field of distributed computing. GIMPS was, in essence, his proof of concept to investors. PrimeNet was launched in 1997 and steadily improved in 1998. All this was done at no cost to GIMPS, and Entropia has been gracious enough to continue supporting PrimeNet ever since. Server availability has been pretty good during that time: roughly one week a year we have an irritating outage that takes time to track down and resolve. Despite the recent outage, GIMPSers have been quite appreciative of all Entropia has done for us rather than flaming about lousy service.

The PrimeNet server is an NT machine. The database is SQL Server. Entropia owns all the server source code. The current situation has several disadvantages:

1) If Entropia folds or has a change in leadership, they may no longer support PrimeNet. Scott has assured me he has the legal right, and would be able, to get a new PrimeNet server up and running in short order.
2) I feel guilty asking a for-profit company that has been good to us to devote development dollars to new features that would improve the GIMPS experience. This includes teams with members having their own userids, P-1 as a separate work type, better charts and graphs, better synchronization (or even merging) of the master and PrimeNet databases, etc.
3) Brad is my point of contact. If he is out of town, it can be difficult getting him to nudge the proper folks to fix a PrimeNet problem.
4) The server is 3000 miles away from me. Many of these problems wouldn't exist if we were in the same town.

What would it take to run our own server?

1) A machine with enough bandwidth and a proper uptime guarantee.
2) Money to run such a machine.
3) All the server-side software would need to be rewritten, improved, and tested.
4) Existing accounts and stats would need to be migrated to the new database.

I did some preliminary investigative work last year.
Linux / Apache / PostgreSQL / PHP seems to provide all the basic tools. I guess MySQL would also work, as I see they now support Commit/Rollback. This would be a substantial development effort. How have other distributed projects handled this? Would a community development effort be feasible? I'm sure there are some issues I've overlooked and welcome any comments on the subject.
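The Commit/Rollback point is the crux: handing out exponents is a read-then-write operation, and without a transaction two clients could be assigned the same work. A minimal sketch of the idea, using Python with SQLite standing in for PostgreSQL/MySQL (the schema and exponent values are invented for illustration, not the real PrimeNet tables):

```python
import sqlite3

db = sqlite3.connect(":memory:", isolation_level=None)  # manage transactions by hand
db.execute("CREATE TABLE assignments (exponent INTEGER PRIMARY KEY, userid TEXT)")
db.executemany("INSERT INTO assignments VALUES (?, NULL)",
               [(13466917,), (13466963,), (13467121,)])

def checkout(db, userid):
    # Claim the smallest unassigned exponent. The BEGIN...COMMIT pair keeps
    # the SELECT-then-UPDATE atomic, so two clients can't get the same work.
    db.execute("BEGIN IMMEDIATE")
    try:
        row = db.execute("SELECT exponent FROM assignments "
                         "WHERE userid IS NULL ORDER BY exponent LIMIT 1").fetchone()
        if row is None:
            db.execute("ROLLBACK")
            return None                      # nothing left to hand out
        db.execute("UPDATE assignments SET userid = ? WHERE exponent = ?",
                   (userid, row[0]))
        db.execute("COMMIT")                 # both steps succeed or neither does
        return row[0]
    except Exception:
        db.execute("ROLLBACK")               # leave the table untouched on error
        raise
```

Any engine with real transactions (PostgreSQL, or MySQL with InnoDB) supports exactly this pattern; without it, a crash between the SELECT and the UPDATE leaves the database inconsistent.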
One more point - security. Any new server would need monitoring for DOS attacks, attempted or successful hackings, needed software upgrades, etc. I am not a cyber-security expert. My background is actually database programming.
How much bandwidth does the existing server use? What kind of machine is it currently being run on?
Colo cost and bandwidth are not cheap with Tier 1 providers, plus there's the cost of the machine itself. The main problem, I believe, is cost. Where would the money come from? Porting from NT to Linux isn't an easy task, and managing it is also totally different. For DoS attacks there really isn't much you can do on the server itself other than not responding to the individual packets; the provider needs to help out and block some of the traffic upstream.
Over the years, the Primenet server has proven to be both robust and stable. Reimplementing its functionality and maintaining a server are nontrivial tasks, to say the least. I'd like to know if there is anything that we can do to help keep it running, without reinventing the wheel. Unfortunately, I don't even know what direction to think in since there's so much I don't know. So... here are a few questions for you:
Are the issues related to failing hardware? If some new server hardware were to fall from heaven, would moving the software to a new machine fix the problem? If so, is moving the software even a realistic option? Would it be too labor-intensive for Entropia to do for free?

Is the problem related to excessive CPU/IO utilization? If a hardware upgrade is not a good answer (see above), would it be possible for volunteers to attempt to make speed improvements? I don't hold out much hope for this one due to IP issues and the difficulty of simulating performance without access to the actual server, but it's worth asking whether tuning a few queries might alleviate the problem. Incidentally, I've often had an itch to see the code that seems to take so long to generate that status report. ;)

Are the WWW pages being served by the same machine? Are they a significant resource drain? Would relocating this workload to a new server help alleviate things? Individual account reports probably could not be moved, but snapshot stats and anything static could be. This would also facilitate the maintenance/improvement of purely WWW content such as pretty charts, graphs, and other derivative statistics.

Admittedly, these are all band-aids and leave some core issues (new functionality, dependence on proprietary software) unaddressed, but might they be worthwhile?

Tofer
[quote="xtreme2k"]How much bandwidth does the existing server use? What kind of machine is it currently being run on?[/quote]
A dual 800MHz PIII with 36GB of fast RAID 5 disks (18GB usable) and 256MB of RAM, on a T1 or DS3 (not sure which), running Win2000.
I hope I didn't come across as harsh in the other thread. I was already familiar with the background and I appreciate everything Entropia has done for GIMPS. It would be difficult to set up a new system which would make an [i]overall[/i] improvement over what they are already doing. It would probably be a full-time job for someone to keep it running smoothly (or at least full-time on call).
If we want to try, I believe a community development effort would be feasible. Apparently a lot of people are willing to devote time and/or resources. But once it is developed, it cannot be run "by committee". Also, besides startup costs there would be ongoing costs, and we would need to ensure that George doesn't get stuck with those down the road. One thing I am curious about: would Scott be hurt, or relieved, if we were to move?
Entropia
Hi all,
please do not forget that GIMPS has done as much for Entropia as they have done for us, to say the least. What do you need when starting a company that wants to sell DC power? Yes, you need at least one running project, and that is GIMPS. So where would Entropia be without GIMPS...?

And I do not think the problems started in Dec 2001. I joined GIMPS in Dec 1998 and there were outages back then as well. A friend of mine got a new P4, but if you use the manual way to get exponents from the server, it doesn't know this type of processor. "Primenet news and information" is more than one year old and still talking about v19 of Prime95. The statistics charts... March 2001...

Regards
Achim
This server has MySQL and PHP and stuff, and is rated for 500MB a day... I don't know if that is enough, and I don't know what CPU load is required to run PrimeNet... The bandwidth here can be expanded...
That said, hosting the static web pages would be trivial... I have no clue about the database... I do have the capability of colocating a server for 100 bux a month... The hardware would cost around 1000 bux... But it would be Solaris based as this is all I know... The connection would be 1 megabit sustained... I'd be willing to donate a few hundred bux towards a solution like this...

I can also host a box here at home for around the same cost... If you can get Brad to mail me the box I am willing to just plug it in if you want... It would be a dedicated 768Kb/s SDSL line... If someone needs a MySQL database to goof around with to try to fix this, just let me know...

Personally, the longer I run GIMPS the less I find a need for PrimeNet... Sure the stats are fun, but George keeps the master database anyways... My concern now isn't about getting points, it is just making sure I don't replicate work... I'm not saying PrimeNet is worthless... I'm just saying that I don't need it... This is just my personal opinion, so please don't take it the wrong way... :)
[quote="Xyzzy"]
Personally, the longer I run GIMPS the less I find a need for PrimeNet... Sure the stats are fun, but George keeps the master database anyways... My concern now isn't about getting points, it is just making sure I don't replicate work... I'm not saying PrimeNet is worthless... I'm just saying that I don't need it... This is just my personal opinion, so please don't take it the wrong way... :)[/quote] Hi Mike! Unfortunately it seems to me that if we really want to attract newcomers to our own project, we have to arrange some nice packaging, like gift wrapping... An interesting stats page or an updated status page, as Achim said, may help, I think. Think about the graphical versions of some clients: you'll never be able to convince all the people to turn down their screen savers 'cause they're "stealing" CPU cycles we could use for other purposes... So there's no way around giving 'em an inefficient client which is nevertheless better than nothing... Even more, this is a math project, aiming at ethereal and impalpable stuff like a strange kind of numbers, unlike SETI, just for an example ;)
Well, let's put together a plan and set up a CVS.
I am scared whenever I hear that a mission-critical server runs on MS Windows of any kind. It would be good if we could move away from there. Even if we don't change the hosting, a switch to Linux could save us a lot of outages, I suppose. Do we get access to the current Windows server code and/or documents?

Startup costs are less of a concern. Those very capable rack-mount dual AMD servers go for under 3 grand (we won't run Prime95 on the server, will we?). If we don't generate results on the fly, we don't even need that much CPU power. The hosting, however, costs more in the long run. 500MB daily isn't enough; a T1 should do.

And we are going to hit that 10M digit prime tomorrow. That could fund us 20 times over. :D
[quote="Xyzzy"]I do have the capability of colocating a server for 100 bux a month... The hardware would cost around 1000 bux[/quote]
We're still in the early investigation stage. What would a home T1 line cost? Would a 100-bux/month colocation service agreement provide enough bandwidth and service? What are the downsides to colocation? What happens if there are hardware troubles on the colocated server, or if there is a botched software upgrade? Someone mentioned using a home DSL line; what are the disadvantages of that? Finally, if we decide to do this it won't happen quickly, at least not until I move to my new house in Jan. or Feb. next year.
I talked to a friend in the hosting business in the Bay Area. He told me the cost for a 1/4 rack and 1Mbps of committed bandwidth was about $500 per month. I asked if there was a price for a 1U unit, since a 1/4 rack is more than 15U. He said a 1/4 rack was the smallest he knew of. Fast Ethernet connection to the backbone.
Reliability of home DSL is hard to guarantee. My ADSL had an outage every other day for some two-week period. Some DSL service may be better, but I still doubt it is up to mission-critical use. I don't know much about home T1, but I can say they are very rare and very expensive. The market is small: who needs 1.5Mbps to the home?
The colo I was talking about is a one megabit line (1024Kb/s) while a T1 is 1536Kb/s... It is right down the street from me, so if need be I could pop in and restart it... That said, my particular solution has LOM, which allows remote reboots and crap like that...
A home DSL line is fine but not 100% reliable... My line loses a few hours a month due to whatever... The advantage is you can IM/email/call me and I can hobble over to the server and kick it... I seriously doubt that you'd need anything beyond a 768Kb/s SDSL line for this... There are a few pages that are super big, but those could be trimmed or compressed to save bandwidth...
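On the point about trimming or compressing the big pages: a status report is highly repetitive text, so it compresses dramatically, and even a thin SDSL line goes further than the raw page sizes suggest. A rough illustration with made-up report rows (the row format here is invented, not the real report layout):

```python
import zlib

# Build a fake report: 10,000 comma-separated rows of exponent, iteration
# percentage, work type, and userid. Real reports are similarly structured.
rows = "\n".join("%d,62,F,user%d" % (13000000 + 2 * i, i % 500)
                 for i in range(10000))
raw = rows.encode()
packed = zlib.compress(raw, 9)          # gzip-style DEFLATE at max effort
savings = 1 - len(packed) / len(raw)    # fraction of bandwidth saved
```

Serving the big reports gzip-compressed (Apache's mod_gzip did this transparently at the time) would cut the outbound bandwidth for those pages by well over half.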
Colo/dedicated is going to be a must for such a server. xDSL/cable can be a good net connection, but if you want the most stable option you will most likely need a colo.
With colo I personally think it is best to go with a Tier 1 provider for this purpose. It will cost more, but if you don't use all that much data it doesn't cost THAT much more. Some cheaper providers' connections can be very lousy and unstable. That is why I was asking earlier how much bandwidth the server was using, in terms of GB/month.
[quote="Xyzzy"]The colo I was talking about is a one megabit line (1024Kb/s) while a T1 is 1536Kb/s... It is right down the street from me, so if need be I could pop in and restart it... That said, my particular solution has LOM, which allows remote reboots and crap like that...
A home DSL line is fine but not 100% reliable... My line loses a few hours a month due to whatever... The advantage is you can IM/email/call me and I can hobble over to the server and kick it... [/quote] ... We have been waiting here so long for something better than ISDN... Maybe by the time you reach Mars we'll get a cable line!... It is just sour grapes... :(
You need to move, Guido :D And we definitely must have a dedicated T1 or its ilk running the server. Frequent outages in a DC project will affect way too many people, as there is always SOMEBODY trying to connect. If there's a serious move to relocate the server, do a professional job and create something that can expand, rather than someone's pet project. We already have an uber version of that. :)
[quote="Deamiter"]You need to move Guido :D[/quote]
Think you're right... :( To tell the truth, there are some cables here as well, but just in the biggest cities like Milan or Rome: let's hope that sooner or later they reach even the countryside... Otherwise... Regards
[quote]there are some cables here as well, but just in the biggest cities like Milan or Rome[/quote]
Guido, what we really need is professionalism from our carriers... Our ADSL lines work like yours... 24/36 hours of outages a week. Like having none. :( Too bad we can't help with the upgrade from here... Luigi
[quote="ET_"][quote]there are some cables here as well, but just in the biggest cities like Milan or Rome[/quote]
Guido what we really need is professionality of our Carriers... Our ADSL lines work like yours... 24/36 hours of outages a week. Like having none. :( Too bad we can't help to the upgrade from here... Luigi[/quote] Nice to know it enough! I've just ask Telecom for an ADSL line, 'cause it should be available soon here between the nothern mountains... I'll try to ask for some discount for each hour of outage... ;) Ciao Luigi! Saluti a Roma! Casomai passassi dalle parti del lago di Garda, fammi un fischio chè ti offro un caffè... (eng: Hi Luigi! If you have to come around here close to the Garda Lake, let me know: it will'be nice to have a cup of coffee together...) |
Are any web server stats available for the PrimeNet server? That would tell us the current traffic level and minimum requirements. Future levels could then be extrapolated.
Another issue is the server itself. You will have to evaluate the DB load, the web load, and the scheduled-task loads. I guess this is the area where the biggest improvement could be made.

I personally think this would be ideal for a university to host. They are usually very qualified technically, most of them have access to high-speed networks, and this is after all a research project worthy of their attention.

In the UK a dedicated 2M line (leased line or E1) is about 1200 GBP per month. That type of connection is not cheap. BTW, I run one of these from home :D. I could technically run such a server, but at the moment I do not have any dedicated server available.

Alf
Scott wrote to me a year ago (I'll ask Brad if he happens to know the current GB/month bandwidth needs):
The principal CPU requirement is to drive the database. The web server runs on the same box with little overhead, but the SQL reports often eat almost 100% CPU for minutes. RAM makes a big performance difference, since SQL Server caches pretty much the entire set of tables and only the write-throughs do I/O (thus reports are mostly CPU/bus bound).

Transactions average 2 to 3 per second for nominal (around-the-clock average) client load, with spikes of up to several dozen per second. Heavy loads are sometimes sustained for at most an hour or so, due to the default Prime95 retry timing, if the server/DNS/network has been offline for more than an hour. If the server has been offline for several hours or more, it's important to have the capacity to handle the initial onslaught of overdue client connections, or you will never get the server caught up and back into its normal rhythm.

Having a decent network pipe applies here, too. A DSL line would do most of the time, but it's important to have a T1+ or cable-modem-speed line for the inevitable occasional heavy traffic load.
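One way to soften the post-outage onslaught Scott describes would be smarter client-side retry timing: if every client that missed the server retries on the same fixed timer, they all come back in one spike. A sketch of jittered exponential backoff (all constants are invented for illustration; this is not what Prime95 actually does):

```python
import random

def retry_delays(base=30, cap=24 * 60, attempts=6, seed=1):
    """Return 'attempts' reconnect delays in minutes. Each retry window
    doubles up to 'cap', and the actual delay is drawn uniformly from the
    window ("full jitter"), so a fleet of clients spreads itself out
    instead of hammering the server in lockstep."""
    rng = random.Random(seed)
    window, delays = base, []
    for _ in range(attempts):
        delays.append(rng.uniform(0, window))  # anywhere in the current window
        window = min(window * 2, cap)          # back off further next time
    return delays
```

With thousands of overdue clients, drawing delays from a widening window turns the reconnect spike into a gentle ramp the server can absorb.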
I talked to Brad. It looks like mersenne.org uses 58MB of bandwidth per day - most of it outbound. That's about 1.7GB per month. I'd estimate that a revamped server with neat charts, more stats, and backups over the Internet would at least double that bandwidth requirement.
My opinion is to just use a dedicated/colocated server with a Tier 1 provider instead of running a server on cable/DSL. The latter are not stable enough for 'mission critical' server use and cost more than colocating. Getting a dedicated connection (e.g. ATM) is too costly.
As the usage isn't all that high, I am pretty sure you can get away with just a 50GB/month plan, which doesn't cost all that much even with a good ISP, I believe.
why not use distributed.net
i didn't see if this discussion came up, so forgive me if i repeat other people.
why not "transfer" this project to distributed.net? dnet has just finished rc5-64, which some people on this board have called a "waste of cpu". dnet says they're looking for new projects and have to release a new client anyhow. sure, gimps runs only on x86 now, but adding in the old lucas code for other architectures would be easy (and the others would eventually be optimized).

dnet has tons of clients, is an established tax-exempt org (needed if gimps wins the $100k anyhow), has a world-wide network of proxies, a master server, a dba, some web guys, lots of donated bandwidth, gigs of scsi drives, and a statistics system. a lot of the work is done already, so why not ask for their help? two established projects merging for the better of both.

the only minus that i see is the loss of "ownership" that the small gimps community has now. plus the prize money of $100k is a bit more than the RSA $10k, so the disbursement of the prize money might have to change, but george can negotiate with dnet for the optimal solution.

opinions?

-j
Re: why not use distributed.net
[quote="veggiespam"]why not "transfer" this project to distributed.net?[/quote]
1. Culture clash. People running d.net generally don't have the patience to run LL tests taking weeks or months to complete; and most people from that crowd would be interested in the prize, but not so much in doing initial (or double) checks on exponents below 30M.

2. Architecture. The problems which d.net was designed to attack have lots of tiny blocks which are simply "done" or "not done". Computers are assigned many blocks at a time; and they inform the server when they are done, but don't pass any answer back.

3. Independence. This discussion arose partly out of a desire to make GIMPS independent of Entropia; going from being dependent upon Entropia to being dependent upon United Devices wouldn't solve that issue. (Yes, d.net is officially a separate organization; but with all their major people employed by United Devices, that's rather a sham IMHO.)

While the above problems aren't insurmountable, I think they surpass the difficulty of setting up a separate server to coordinate GIMPS.
Re: why not use distributed.net
[quote]1. Culture clash. People running d.net generally don't have the patience to run LL tests taking weeks or months to complete; and most people from that crowd would be interested in the prize, but not so much in doing initial (or double) checks on exponents below 30M.[/quote]
with dnet, you can turn off projects that you do not wish to run. so, those without patience would turn off mprime. if it took 5 years to solve the last problem, i don't think taking 3 months to complete a single "block" would be a leap in many people's minds - the novelty of checking your stats every day wears off quickly.

as for initial/double checks, dnet double checks now by sending the block to different people to prevent stat-inflating fraud. it just hides this fact from the user. so, instead of actually telling users that they're doing a check, just tell them they're doing a value. the point of this checking for mprime is to see if a bad cpu said "not prime" when in fact it was prime. so a double check still has the ability to win the prize. it is just a matter of convincing users, or just not telling them all the information.

[quote]2. Architecture. The problems which d.net was designed to attack have lots of tiny blocks which are simply "done" or "not done". Computers are assigned many blocks at a time; and they inform the server when they are done, but don't pass any answer back.[/quote] this is simply not true. dnet has many projects, one of which is OGR (what it is: [url]http://www.distributed.net/ogr/[/url]). with OGR, you basically take a permutation of a bunch of numbers, record the sum, permute again, record again. in the end, you save the lowest value for your "branch" and send the results back.

[quote]3. Independence. This discussion arose partly out of a desire to make GIMPS independent of Entropia ...[/quote] which arose due to a lack of reliability of entropia's servers. dnet's stat servers seem to have load problems once in a while, but stats are not essential. proxies may go down, but your client will just try another proxy. if the master goes down, you'll never notice, since you're connecting to a proxy.
[quote]While the above problems aren't insurmountable, I think they surpass the difficulty of setting up a separate server to coordinate GIMPS.[/quote] we're talking about setting up a whole infrastructure here: coordination between the dba, the web guy, a new network code layer, the server client, and the people we politely beg to give us free space inside the hosting site. this sounds like more work. if gimps does not want to "transfer" or ask for dnet's help, then let's not do it, but for the right reasons. i'm willing to help with an independent (non-dnet, non-entropia) gimps project, but i do think asking for dnet's help is the best avenue. -j
Re: why not use distributed.net
[quote="veggiespam"]with dnet, you can turn off projects that you do not wish to run. so, those with out patience, would turn off mprime.[/quote]
Yes, but they'd probably turn it off *after* being assigned an exponent to test.

[quote]as for initial/double checks, dnet double checks now by sending the block to different people to prevent stat inflating fraud. it just hides this fact from the user. so, instead of actually telling users that they're doing a check, just tell them they're doing a value. the point of this checking for mprime is to see if a bad cpu said "not-prime" when in fact is was prime. so a double check has the ability to win the prize still. it is just a matter of convincing users or just not telling them all the information.[/quote] First, I don't think d.net does double checks; if they do, that's something I wasn't aware of. Second, the odds of a double check finding a prime are rather absurdly low. Third, I don't think "not telling them all the information" is a workable strategy when you're asking people to donate their spare cycles.

[quote]OGR[/quote] I hadn't forgotten about OGR, but I had forgotten that they get useful (non-boolean) results back.

[quote]we're talking about setting a whole infrastructure here, coordination between the dba, the web guy, a new network code layer, the server client, and the people we politely beg to give us free space inside of the hosting site. this sounds like more work.[/quote] I think people are overestimating the difficulty of setting up a server for something like this. When I ran PiHex, I had a "server" consisting of 200 lines of C running on my home PC -- a Windows 95 box connected to the internet via my cable modem. And for a recent paper (Computational investigations of the Prouhet-Tarry-Escott problem) I had 300 machines querying a 50-line script which I was running in my university-provided webspace. While I don't suggest anything quite so crude for GIMPS, the point remains that distributed computing -- especially on such coarse-grained problems as GIMPS -- is *not* a hard thing to automate.
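For a sense of scale, a coordination server of the PiHex sort really can be tiny. A toy sketch in Python (the endpoints and in-memory state are invented for illustration; a real server would persist assignments, authenticate users, and validate results):

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Toy in-memory "database": the next candidate to hand out, and completions.
state = {"next": 13466917, "done": []}
lock = threading.Lock()

class Coordinator(BaseHTTPRequestHandler):
    def do_GET(self):
        with lock:
            if self.path == "/get":
                reply = {"exponent": state["next"]}
                state["next"] += 2                 # next odd candidate (toy step)
            elif self.path.startswith("/done/"):
                state["done"].append(int(self.path.rsplit("/", 1)[1]))
                reply = {"ok": True}
            else:
                self.send_error(404)
                return
        body = json.dumps(reply).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):             # silence per-request logging
        pass

def start_coordinator():
    # Bind to an ephemeral port and serve in a background thread.
    server = HTTPServer(("127.0.0.1", 0), Coordinator)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

At four requests per minute, even this single-threaded toy has orders of magnitude of headroom; the hard part of a real PrimeNet replacement is the database and the reports, not the request handling.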
As ebx (and Mr T) said: [b]We need a plan![/b]
I would like to suggest that we start by designing proxies. Teams/individuals could set up proxies that buffer work-to-do and results when PrimeNet is down and communicate when PrimeNet is up. These proxies could pass all the authentication information back and forth so that we would still have accountability. This would require a slight change to PrimeNet to recognize and authenticate the proxies so that they could pass info for multiple machines. As far as checking out blocks, they could initially be checked out to the proxy as a machine, and then transferred to the individual machine by the proxy, which would tell PrimeNet what it had done.

A second, parallel effort would be to mirror the stats, either on George's site or on this one. PrimeNet could forward info to the new site, either as it comes in or periodically in batches. Everyone would be encouraged to use the mirror site, reducing the load on PrimeNet.

Once these two initiatives are complete, we could re-examine the possibility of moving. We would be in a much better position to do so, if we so chose. Even if we stayed, we would still be in a much better position.

Joe.
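The store-and-forward behaviour of such a proxy can be sketched in a few lines (the class and method names here are hypothetical, not part of any real PrimeNet protocol):

```python
from collections import deque

class BufferingProxy:
    """Queue client results locally; flush them upstream whenever the
    central server is reachable. Nothing is lost during an outage."""

    def __init__(self, send_upstream):
        # send_upstream is any callable that raises ConnectionError when
        # the central server is down.
        self.send_upstream = send_upstream
        self.pending = deque()

    def submit(self, result):
        """Accept a client result immediately, then try to pass it on."""
        self.pending.append(result)
        return self.flush()

    def flush(self):
        """Drain the queue in arrival order; stop (keeping everything
        buffered) the moment the server proves unreachable."""
        while self.pending:
            try:
                self.send_upstream(self.pending[0])
            except ConnectionError:
                return False          # still down; results stay queued
            self.pending.popleft()    # only drop after a confirmed send
        return True
```

The key detail is popping only after a successful send: a crash or second outage mid-flush leaves every unconfirmed result safely in the queue.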
Unlike most other DC projects, GIMPS is to a certain extent dominated by universities. I think we should appeal to one of those learned institutions for help.
Apart from that, implementing cache/proxy server functionality is smart. This project should be able to run on a medium-bandwidth network; 58 or even 200MB a day is not all that much. I believe the problem is the server(s). The project needs to be built on a high-performance platform - especially if we are to introduce more advanced server-supplied statistics. I could personally host a site like that from a network bandwidth point of view, but I am uncertain that I have a server with the necessary performance available. We need more data on what it is doing.

Meanwhile, I think it would be a good idea to collect, without prejudice, all ideas for future functionality. This is important, because this functionality is part of the reason people choose a project - client stability is another. After making up a long and impossible list, we can calmly remove the ones that are either too expensive or too difficult to implement. That is the time someone can make a reasonable guesstimate about real server requirements and also real network requirements. As far as I know there are already a few on the table, so let me start by listing them. Feel free to add as many as you can; nothing is stupid at this point.

1. Linux
2. Apache
3. PostgreSQL (or similar)
4. PHP (or similar)
5. Teams with members having their own userids
6. P-1 as a separate work type
7. Better charts and graphs
8. Better synchronization
9. Security (network, machine, OS, etc.)
10. Queue, cache, or proxy server functionality
11. Bandwidth
12. Separate (own) server(s) - architecturally probably better with 2: web and DB
13. More up-to-date PrimeNet stats
14. Graphical front-end to the client
15. Better support for team stats - see TPR as an example
16. Maybe 2 server locations (master and mirror) for availability reasons
17. DB server with lots of memory - performance issue
18. RAID, SCSI, and all the paraphernalia of a high-availability server solution
19. More stats (what do people want here, what is useful - list please)
20. ?

PS: Remember we are dealing with 30-40,000 users and hopefully growing. This is not a system you run on a Win95 box :rolleyes:

Alf
[quote="Prime Monster"]19. More stats (what does people want here, what is useful - list please)[/quote]
Ability to sort the status report and cleared exponents report by various criteria. For example, here's a list of ones I run regularly that I'd love to see automated:

Exponent status
- sort by days run
- sort by days to expiry
- sort by account ID
- sort by assignment type (i.e., F, D, D*, DF, *, " ")

Cleared exponents
- sort by date returned
- sort by factor (for factored exponents)
- sort by account ID

By extension, the ability to sort by -any- column in either of the reports, either ascending or descending, would be nice. Even better would be the ability for the user to mix and match sorts, combined with the ability to specify ranges. For example, here's a sort I do daily on the exponent status list:
- Find all exponents less than 7.1M
- Merge with all exponents less than 12M whose assignment type is * or " "
- Sort by days to expiry

This gives me a quick snapshot of which small exponents are expiring each day. Another one I do occasionally is a cross-check between the cleared exponents list and a slightly older version of the exponent status list, looking for recently cleared exponents where the cleared-report account ID doesn't match the status-report account ID. Good for finding poachers.
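Once the reports are backed by a real database, queries like these become trivial to automate. A sketch of the daily sort described above, using made-up report rows (the tuple layout and values are invented for illustration):

```python
# Hypothetical status-report rows: (exponent, assignment_type, days_to_expiry).
rows = [
    (7_050_000, "D", 3),
    (6_900_000, "F", 12),
    (11_500_000, "*", 1),
    (11_900_000, " ", 8),
    (13_200_000, "*", 2),   # above 12M, so it never qualifies
]

def expiring_small(rows):
    # The daily query: all exponents below 7.1M, merged with those below
    # 12M whose assignment type is "*" or " ", sorted by days to expiry.
    picked = [r for r in rows
              if r[0] < 7_100_000
              or (r[0] < 12_000_000 and r[1] in ("*", " "))]
    return sorted(picked, key=lambda r: r[2])
```

In SQL terms this is one WHERE clause and one ORDER BY; exposing the columns and letting users compose the filters is the whole feature.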
Since there is a time-to-market issue and we will never be able to release a perfect server in the first shot, it is important for us to classify all the requirements: what are the must-haves, what are the highly-desirables, and what are the would-be-nices. I say phase 1 is a prototype to prove ideas. Phase 2 is at least as good as the current server and starts serving the community. Phase 3 is high flying.
The platform is the easiest to decide, but we need to find a way to fund it. Next comes the hosting. That can, again, be phased in; we don't need any big pipe for the development or even the beta stages. The server code is the most time-consuming part. If we are sure where we are heading, it would be good to set up a home base (CVS, mailing list, etc.) so we can hand out work items just like GIMPS itself. The 58MB daily traffic surprised me by a large margin. I was afraid that I alone would generate that much. :D
Would someone want to set up a sourceforge account, or should it be hosted privately?
Also, what about asking Team Prime Rib to donate some of their stats code as a baseline or starting point? Andrew
[quote="adpowers"]Would someone want to set up a sourceforge account, or should it be hosted privately?
[/quote] I guess not. This is different in the sense that not everyone would want to run the code on their basement machines. Will SourceForge host non-GPL projects? A description/diagram of the current server would save us a lot of effort.
TNGG (The Next Generation of GIMPS)
Let's get back to basics here. How we do it and where we do it is, at the moment, not important. I am very serious: I could quite easily handle the network bandwidth requirement as it is today - personally :mrgreen:. What I am trying to find out is what we would like to see in The Next Generation of GIMPS (TNGG). Talking about SourceForge (or any similar solution) is a bit premature at the moment. This is not to say that this type of solution will not be needed in the future.
[b]dswanson[/b] is on the right track: what would he like to see in TNGG? And he gives his answer. I am only asking: what would you like to see in TNGG? I am not really concerned about feasibility, or 1st, 2nd, 3rd, or 4th generation functionality. If you do not make a well-designed DB, then you can forget about g2, g3, g4, etc., so I assume that is done reasonably well. What I am asking is: what do you want from TNGG? If we have or get a good database design, then it is a piece of cake to provide any report you can think of - the web programs might take a bit of time to implement, but that is another issue.

Let's get the requirements and wishlist in place first; then we, George, PrimeNet, and other parties will have a chance to give their opinions about the list. Then we design, and after that we implement.

We need someone to volunteer to be "project manager" for the "discovery" process, i.e. for identifying and documenting all requirements. I suggest Xyzzy, not only because he runs this bulletin board and is therefore available, but also because I think and believe he would not jeopardize the project and would only think in terms of doing what's best for TNGG.

Alf / Prime Monster / heretic / etc. :D
[quote="Prime Monster"]PS: Remember we are dealing with 30-40.000 users and hopefully growing. This is not a system you run on a win95 box :rolleyes: [/quote]
I wouldn't want to see GIMPS using a win95 box as a server... but let's look at that "30-40,000" figure for a moment. That's not 40,000 permanent connections to the server; that's 40,000 boxes which connect to the server maybe once a week. That means an average of about [i]four hits per minute[/i]. phpBB tells me that this page took 0.574978 seconds to generate (which says something about the sluggishness of php, but let's not get sidetracked); if it takes more than half that time to assign an exponent to a machine, something is severely broken. Slashdot normally handles 25 hits per second with 10 machines, and their pages are far more complicated than anything GIMPS-server would be doing. As for talk of proxies... there are two purposes behind proxies, and both are unnecessary. The first is to handle load; but the load which GIMPS would take is easily small enough for a single box to handle. The second is to hide transient server failures; a much better solution is to simply have prime95 not complain as much if it can't contact the server. It's been a week since George has posted here; perhaps we should put off our discussions until he clarifies further what he wants to do? |
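The load arithmetic in the post above checks out; here is a quick sketch of the same estimate, using the 40,000-hosts, once-a-week figures cperciva assumes:

```python
# Back-of-the-envelope server load estimate for GIMPS check-ins.
# Assumption (from the post above): ~40,000 hosts, each contacting
# the server roughly once per week.
hosts = 40_000
minutes_per_week = 7 * 24 * 60   # 10,080 minutes in a week

hits_per_minute = hosts / minutes_per_week
print(f"{hits_per_minute:.1f} hits/minute")  # prints "4.0 hits/minute"
```

Even with bursty arrival times, a single modest box handles that rate with ease, which is the post's point.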
[quote="cperciva"]As for talk of proxies... there are two purposes behind proxies, and both are unnecessary. The first is to handle load; but the load which GIMPS would take is easily small enough for a single box to handle. The second is to hide transient server failures; a much better solution is to simply have prime95 not complain as much if it can't contact the server.
It's been a week since George has posted here; perhaps we should put off our discussions until he clarifies further what he wants to do?[/quote] What I want to do??? I want my house construction nightmare to end! Seriously, this discussion has been useful. I had not considered designing in proxies. I'm not sure how that would work or be implemented. It is true that one server can handle the current load and small outages are a nuisance. But wouldn't proxies serve as insurance against catastrophic failure? A serious hardware failure could take a few days to repair. A hurricane could knock out power for a week. Also, redundancy helps keep our data safe. Are these advantages worth the headache of solving several technical hurdles? I don't know. I agree with previous posters that there are two major hurdles to solve. One is cost. I think enough interest has been expressed here that the roughly $100 / month cost could be handled. Even an initial server outlay may not be a problem. The second hurdle is design and implementation. Coming up with a grand plan and implementing in stages seems reasonable. Has xyzzy expressed interest in coordinating a planning document? |
Re: why not use distributed.net
[quote="cperciva"]why not "transfer" this project to distributed.net?[/quote]
> 1. Culture clash. People running d.net generally don't have the patience to run LL tests taking weeks or months to complete

Some might not. A lot would - and they've got easily 10 TIMES as many users at any given point. A LOT of us d.net folks have been active in d.net for 3+ years - over 5 in my case.

> 2. Architecture. The problems which d.net was designed to attack have lots of tiny blocks which are simply "done" or "not done".

Not true for OGR. OGR does in fact pass an answer back - the length and designation of the smallest ruler found in a given "block". The answers that Prime95 passes back shouldn't be any longer - I think they would be shorter. The data Prime95 pulls FROM the server is definitely shorter, though not a lot. The assigned "problems" are a lot shorter, I grant.

> 3. Independence. This discussion arose partly out of a desire to make GIMPS independent of Entropia; going from being dependent upon Entropia to being dependent upon United Devices

distributed.net IS NOT PART OF UNITED DEVICES. They do have quite a few (NOT a majority) of the major "names" working for UD. Doesn't invalidate the rest of your point here, though - it *is* still a change of dependency, not going "independent". |
With only 58 MB per day of bandwidth (what does this spike to on heavy days, like last year when M39 was discovered?), I was thinking this would be trivial to host on a $30/month account at pair.com (this forum is hosted there), but when George said the reports can gobble 100% CPU for several minutes, that went out the window. You can't run long-running, CPU-intensive processes on their shared machines. You'd need the services of one of their dedicated servers, the cheapest being $250/month. The only problem is that's a lot of money for a lot more bang than the project needs in terms of bandwidth (1 GB per day on the lowest level).
But it might be possible to share that with others. For instance, I need to find hosting for my daughter's school, and I've made inquiries with pair.com. In terms of space and bandwidth, we also need about what we can get from one of those $30 accounts, but the restrictions on what you can do with the shared machines (to ensure you don't step on everybody else's toes) make that solution less than desirable. We don't need CPU, we need scripting flexibility for small mailing lists and we want to run a newsserver, so cohabiting with a site that hogs the CPU for several minutes once an hour is not onerous as long as the web site stays responsive. I myself would like to open up a small site for myself, I'd probably go for the $18 a month option. If we get enough people that want a share of a dedicated server, it could fly. |
I must point out that if someone implements good stats served from the, well, server :) then the bandwidth usage will probably shoot through the roof.
|
[quote="trif"]In terms of space and bandwidth, we also need about what we can get from one of those $30 accounts, but the restrictions on what you can do with the shared machines (to ensure you don't step on everybody else's toes) make that solution less than desirable. We don't need CPU, we need scripting flexibility for small mailing lists and we want to run a newsserver, so cohabiting with a site that hogs the CPU for several minutes once an hour is not onerous as long as the web site stays responsive. I myself would like to open up a small site for myself, I'd probably go for the $18 a month option. If we get enough people that want a share of a dedicated server, it could fly.[/quote]
All of TPR, this BBS, and my many other sites are all on one of Pair's $30 a month plans... Just the other day all of my stuff used 1.6GB in one day and the server just motored on... The only thing I do not get is CPU time... Remember when this BBS had those awful times pulling pages? That was because someone else on the shared server was nuking it with a runaway process... The server will kill it eventually, but in the meantime the runaway process kills interactive performance... And you know human nature requires that if it don't work the first time, we must click the button again... :rolleyes: Personally, for a project like the stats, I'd prefer to have a bit more control over the system... Pair is nice, but they severely limit what I can do... I'm only here for the uptime... |
[quote="Prime95"]Has xyzzy expressed interest in coordinating a planning document?[/quote]
I have absolutely no experience with doing that, but that never stopped me before... :shock: I would suggest that if there is someone more qualified than me then they should do this... But I am willing to do whatever you all want me to... (I suppose I might have to find out what a "planning document" is!) :) |
[quote="Prime95"]Seriously, this discussion has been useful. I had not considered designing in proxies. I'm not sure how that would work or be implemented. It is true that one server can handle the current load and small outages are a nuisance. But wouldn't proxies serve as insurance against catastrophic failure? A serious hardware failure could take a few days to repair. A hurricane could knock out power for a week.[/quote]
If that is a real concern, I'd suggest getting a dedicated server from somewhere like rackspace.com; given that they advertise 99.999% reliability (and have, at least in the past, provided it), I think getting a server from them would mean that hardware issues are practically solved. (Of course, it would be more expensive on a monthly basis; on the other hand, it wouldn't require the initial purchase of a server.) But I don't really think it is a real concern. I always kept a couple weeks of work queued up; as long as any downtime doesn't exceed the duration for which people have their machines queuing work, there should be no damage (apart from perhaps scaring people with error messages). |
Xyzzy,
If you need any help :D I will be willing to help. This is the easy phase: just document what people come up with in the way of ideas and requirements. It might be a good idea to post something to the mailing list about this as well. The next phase is to turn all the ideas and requirements into a design/specification document, and that is somewhat harder. Not difficult, but you have to be very critical here, because what you are producing should be possible to implement. As for the web server itself: with the current load it can run nearly anywhere. It is the database server that is the difficult part. It will need to be a high-performance server, and as far as I know most service companies charge a large amount of dosh for those (if they provide them at all). As Lumly pointed out, if the server starts to provide better stats, then the load will go up quite a lot. But again, those stats will, to a large extent, be provided by the database server. Alf |
[quote="cperciva"]If that is a real concern, I'd suggest getting a dedicated server from somewhere like rackspace.com; given that they advertise 99.999% reliability (and have, at least in the past, provided it), I think getting a server from them would mean that hardware issues are practically solved. (Of course, it would be more expensive on a monthly basis; on the other hand, it wouldn't require the initial purchase of a server.)
[/quote] I cannot recommend Rackspace. They harbor spammers who pay a hefty premium, and as such a considerable amount of their IP space is in SPEWS (a public list of spam sources which many sites, including some US government sites, use to block spam), and their entire IP space is in many private blocking lists. GIMPS does need to send out mail now and then, even if the mailing list isn't moved to the new hosting (and I don't see why it shouldn't be). A site that can't deliver a large portion of that mail is much less valuable. This is one of the reasons why I recommended pair.com: they are similarly reliable, and they don't take pink contracts from spammers, so their IPs are not blocked. However, I've looked more closely at Pair's dedicated servers, and they don't allow multiple clients to share one dedicated server, and they don't allow "resale" out of their dedicated servers. But I still think this would be the way to go; we just need to find a reliable, non-spam-friendly provider. |
[quote="trif"]I cannot recommend Rackspace. They harbor spammers who pay a hefty premium, and as such a considerable amount of their IP space is in SPEWS (a public list of spam sources which many sites, including some US government sites, use to block spam with), and their entire IP space in many private blocking lists.[/quote]
Hmm, I hadn't heard about that. Rackspace *used* to be good... |
About joining forces with distributed.net
I don't believe GIMPS should team up with distributed net, at least not right now. And why? Because of what happened to the OGR effort.
This was a relatively small DC effort, with work being handed out and returned through web forms, in a system requiring human attention on both ends. In 1999, the project was growing to a size which required a different solution. In July that year, OGR-23 became the last work finished by the old project. Then distributed.net stepped in. And what happened next? Let us see what the d.net website says: "Feb 15, 2000 OGR-24 contest is started, but is later suspended due to unexpected problems. July 13, 2000 OGR-24 is officially relaunched! August 1, 2000 The first pass of OGR-24 is completed and distribution of OGR-25 begins. The second verification pass of OGR-24 is done gradually in parallel with the first pass of OGR-25." That is the last piece of information on OGR in the "History" section of their site. OGR-24 was supposed to be finished in just a few weeks. But to this day, no final result has come out of the distributed.net effort. The last news on the subject came about a week ago (note that this is not an exact quote, but it captures the spirit of the announcement pretty well): "Yes, there is still a problem with the current client (that everyone is still running), but we think we may be able to fix it, but don't expect anything to happen for a while because we want to get RC5-72 running first." I would hate to see GIMPS getting "swallowed and killed" by distributed.net. I don't mean to bash the d.net staff. I know they are volunteers and have made great efforts for the good of their projects. But the d.net organization clearly lacks the resources necessary to take on any more tasks right now. I understand that GIMPS is in a different situation than OGR. It has George taking great care of the client, and it has a working quality assurance effort. Therefore it is not certain that a GIMPS-d.net cooperation would be a fiasco like OGR, but I think the OGR experience is scary enough to make us want to stay away from d.net until all other solutions have been tried. |
I agree GIMPS should not be part of d.net. However, we should look at all their publicly available code and features. They certainly have some good features we should mimic.
One more item for the wish list: priority assignments. As mentioned in another thread, we could give out the 1000 lowest double-checks and first-time tests to computers that have completed at least one assignment and are fast enough. This helps the milestones move faster. This is a political hot potato, as it smacks of favoritism, so I'd have to tread carefully before really implementing it. |
[quote]This is a political hot-potato as it smacks of favoritism, so I'd have to tread carefully before really implementing it.[/quote]
If this is really a problem, then prioritize assignments by the most work returned in the last 30 or 60 days. Assign the most critical (read: oldest) work to those computers that have turned in the most work. This is not favoritism, because you know the oldest work will get done, and it NEEDS to get done. |
One other lesson to learn from DNet:
Have the stats server separate from the DataBase server. An outage on one will not kill the other. An overload on one will not kill the other.
Before someone says "We don't need this, one box is big enough!" let me say "Just because it fits on one box, doesn't mean it should run on one box!" |
I'm not sure if it would be worth doing it on two separate boxes. I think the web server takes up minimal bandwidth. We're going to have to have a high-speed connection for the database server, so we might as well put stats there too. Anyway, if the database were to go down, the stats wouldn't be updated, and would be useless to most of the people checking stats (those of us who look every hour for updates).
As for priority exponents, just get a few volunteers to do them. When they expire, George could have the server hold them; the volunteers would be notified of any new assignments through e-mail, and they would start crunching on them as soon as they could put them into their box's w2d. It wouldn't take many people, seeing as there aren't a vast number of small exponents expiring daily. Like oulnder said, it wouldn't be favoritism; it's just making sure what needs to be done gets done. I think a more dicey issue would be poaching when there are only a few exponents left below M38 and the people are going too slow for our liking. That's why I think we should do priority exponents, just to make sure a slow person doesn't get luckily assigned an important exponent. I know one person got lucky and reserved 20 exponents below 8M, but has yet to check in, and it's been more than a month :evil: . I might poach some of the latter ones they reserved. They're so far down, even if they do check in at some point, there's no way they've started to work on them (which in my book is a legal poach). |
[quote="Kevin"]We're going to have to have a high speed connection for the database server[/quote]
The database server in itself will not need a high-speed connection; what it will need is high (or large enough) processing speed. Most of the processing power will probably be used to work out stats. The webserver will serve the stats, not work them out. You could even operate with two distinct databases: one for assignments and updates, and one for the stats engine. The latter is the one with the hard job, and IMO it should run as a separate machine. Alf |
[quote="Kevin"]As for priority exponents, just get a few volunteers to do them. When they expire, George could have the server hold them, the volunteers would be notified of any new assignments through e-mail, and they would start crunching on it as soon as they could put it into their box's w2d.[/quote]
I could easily set up 1 or more "private" forums here for the purpose of assigning these exponents... Email is nice, but I feel that a forum is faster and easier to read... If your email is broken this still works... It also allows one message to be directed towards several participants easily... We could also make a private forum for the server committee... |
The forum idea would work too. I was just thinking that people would check e-mail more often than the forums (at least I hope they would). Depending on how things progress, a new forum area for planning the new server could be helpful. There would be a thread for each "issue" so they could be tracked and worked on separately instead of having them all jumbled in this one big thread.
|
[quote="Prime95"]
One more item for the wish list: Priority assignments. [/quote] We can associate with each machine some additional fields to indicate its speed and reliability based on work turned in the (recent) past. Someone has to figure out an exact formula. Assignments are easy with that data. |
[quote="ebx"]
We can associate with each machine some additional fields to indicate its speed and reliability based on work turned in the (recent) past. Someone has to figure out an exact formula. [/quote] ...and don't forget that some people can elude it by using the "vacation" option too often. Luigi |
[quote="ET_"]
...and don't forget that some people can elude it with the "vacation" option used too often. Luigi[/quote] That can easily be taken care. We can even add a client option that say no priority assignment for me please. We need to change the client a bit so the priority assignments can preempt the todo list - as soon as the current task is done, priority assignment is up. Some example criteria for calculation could be - at least 10 P90 years reported in the past 90 days, the more the better. (all cpu slower than 40x P90 are dropped. hosts younger than 90 days are dropped. fast cpu that dont report as expected are dropped). - one priority assignment each node - vacation hosts are dropped - current task finish date The beauty is that once the rules are set, all is automatic. |
Are these "priority exponents" ones that need factoring, or need LLing?
Or both? |
[quote="QuintLeo"]Are these "priority exponents" ones that need factoring, or need LLing?[/quote]
LL testing and double-checking. If the smallest available exponents in these categories are assigned to "trusted" clients, then you should get a more orderly progression of milestones (the "all exponents below X have been tested once" and "all exponents below Y have been double-checked"). At present, the server just blindly hands out small exponents to the next user to ask for an exponent of that work type. |
By "priority exponents" it is meant exponents below a given milestone. For example, the approx 130 exponents still to be double-checked in order to prove that M38 *is really* M38. If they are in the hands of slow or careless testers, the time needed to achieve this milestone may get unnecessarily long.
|
[quote="lycorn"]By "priority exponents" it is meant exponents below a given milestone. [/quote]
As for double-checking, lower, important exponents are often automagically given to slower or less reliable machines... Must be Murphy. :( Luigi |
Slower and less reliable machines are probably not going for the 10 million exponent prize.
I don't think it would be practical for me to swap my K5 and K6 machines over to doublechecking - they're REAL slow on floating point - but I figure that having them dedicated to factoring should still help the situation. |
woah that was random... I don't think there's any worries that these slow machines will tie up the winning 10M exponent here.
Still, it'd be a good idea to give out the smallest exponents to more reliable, or at least faster, machines. It would be extremely simple (as I see it) to limit the "priority exponents" to machines over 500 MHz or something. Then at least you'd have a better chance of getting them done (and you wouldn't have to go to all the trouble of determining reliability) |
[quote="Deamiter"]It would be extremely simple (as I see it) to limit the "priority exponents" to machines over 500 MHz or something. T'hen at least you'd have a better chance of getting them done (and you wouldn't have to go to all the trouble of determining reliability)[/quote]
I know primers who installed Prime95 on PCs at the office, running 8 hours a day, 5 days a week, and did not modify the "Hours per day this program will run" setting: so if they had a 750 MHz Pentium III, they would reserve DC exponents (or even LL) with the actual efficiency of a 178.5 MHz Pentium III... Luigi |
[quote="Kevin"] :evil: . I might poach some of the latter ones they reserved. They're so far down, even if they do check in at some point, there's no way they've started to work on it (which in my book is a legal poach).[/quote]
I'm surprised that no one picked up on Kevin's comment. [b]Is poaching condoned?[/b] Joe O. aka JMO |
[quote="Joe O"][quote="Kevin"] :evil: . I might poach some of the latter ones they reserved. They're so far down, even if they do check in at some point, there's no way they've started to work on it (which in my book is a legal poach).[/quote]
I'm surprised that no one picked up on Kevin's comment. [b]Is poaching condoned?[/b] [/quote] Not that I know of, but it isn't much frowned upon either, as long as it is kept within reasonable bounds. Obviously there has to be a point where a "reserved" exponent is tested by someone else, just to keep things moving along smoothly. |
So the fact that a person could be working on an exponent and someone else poaches it [b] is not frowned upon?[/b]
I am setting up a dual Celeron/450 to run Prime95. It has no internet connection. I put exponents from my internet box into the no-net box's worktodo.ini and start on them. Lo and behold, 2 of them are taken from me. One by the server; the other possibly by the server, possibly by a poacher. The CPU time that was wasted bothers me. [b]Especially if it were a poacher![/b] This is theft! And it is not frowned upon? |
When I posted that, I was under the impression that no-net boxes would use the internet form and get work from George, not from the server. I didn't realize that people would use one computer to have Primenet reserve exponents for a bunch of them. I never condone poaching (anymore), and didn't even bother with it then. My concern before was with making sure that all the ones below certain milestones were completed, and that problem is being dealt with on its own.
|
[quote="Joe O"]So the fact that a person could be working on an exponent and someone else poaches it [b] is not frowned upon?[/b][/quote]
Poaching is definitely frowned upon. You'll see this from a couple of heated discussions on the Mersenne mailing list over the last two years. In version 16, prime95 did not check in every 28 days. Thus, a user on a slow computer could get an exponent and tell the server it would take 2 years. If the user gave up, the server would not recycle the exponent for 2 years + 60 days!! Some users frustrated by this poached. Now that the client must check in every 28 days, there is no need to poach. The server will know in a timely manner if an exponent is not being worked on. I cannot stop poaching. It can happen either willfully or through ignorance. Even a very slow Pentium can complete a double-check in four months, so as long as your computer is running prime95 24 hours a day, you should rarely be the victim of a poacher. |
[quote="Prime95"]I cannot stop poaching. It can happen either willfully or through ignorance. [/quote]
[b]Yes, you can![/b] Don't give credit to a poacher. You already subtract credit if an LL is shown to be wrong. Subtract the credit for a poacher. If they find a factor, remove their name from it. If they find a prime, do what they do in the university for a plagiarizer: don't give them credit. Make it a published policy, part of the "terms of agreement" for using Prime95. Publish the names of the flagrant offenders. The stocks worked! Joe O. |
We can discourage poaching, but we can't stop it.
I agree CPU credit should not be given. Perhaps the new server can keep track of each users poaching activity and email them or cancel their accounts. I like the idea of some verbiage in the license. We must also be careful to differentiate between a malicious poacher and the occasional result submitted after a reservation expires. With money on the line if a prime is found, we'll need to tread extra carefully to avoid trouble in today's litigious society. At present there is a clause saying GIMPS is not responsible if someone finds a prime and you had the exponent reserved (that is, the poacher gets the credit/money). This needs to be discussed further, I hereby appoint you the person to make sure we don't drop the ball on this issue when the new server is coded. |
[quote]We must also be careful to differentiate between a malicious poacher and the occasional result submitted after a reservation expires.
[/quote] George has a point there. Sometimes poaching is inadvertent. As in: for some reason my box didn't update with the server, and the exponent assigned to it expired while I was working on it. Then it got assigned to you. But I returned a result before you did. So poaching is not completely avoidable. But as George mentions, there are ways to discourage out-and-out poachers. Those of us who have been looking at the milestones closely, such as dswanson, trif, daran and myself, know the folks who still regularly poach, and can name them and have evidence in the form of status reports and results. But you also have to admit that sometimes you lose exponents inadvertently too. |
I thought I had made it clear that I thought not all "poaching" was deliberate. Rereading my posts, I see that I have not done so. Yes, mistakes, glitches, "burps" happen. That is one reason I said "only flagrant poachers' names should be published." There is no reason to embarrass someone for something that they didn't mean to do, or didn't really do because the server had problems.
The problem, as I see it, is that George needs an automated mechanism (which he can override, if he so chooses) to detect poachers and deny them CPU credit. If this can be done in a way that allows the first-timers to get credit, then all the better. If it cannot, and everyone who reports on an unreserved exponent is denied credit, then there will have to be an appeal process. Hopefully this will not happen too often, and George can handle the few emails that will ensue. Does anyone already have an automated way of checking for "unreserved results"? This has to be an automated, easy process so as not to burden George any more than he already is. If no one has such a process, is anyone willing to design one? If not, I'm willing to give it a go. |
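The automated check proposed in this post - flag any result submitted for an exponent the submitter does not currently hold - could be sketched roughly as follows (the data structures are hypothetical, not Primenet's actual schema):

```python
# Rough sketch of an "unreserved results" detector. The assignments
# table and result format are invented here for illustration only.

def find_unreserved_results(assignments, results):
    """assignments: {exponent: userid currently holding it}
    results: list of (exponent, userid) pairs as they are submitted.
    Returns the results whose submitter does not hold the assignment."""
    flagged = []
    for exponent, userid in results:
        if assignments.get(exponent) != userid:
            flagged.append((exponent, userid))
    return flagged

assignments = {6972593: "alice", 8000003: "bob"}
results = [(6972593, "alice"), (8000003, "mallory")]
print(find_unreserved_results(assignments, results))
# prints "[(8000003, 'mallory')]"
```

Anything flagged this way would still need the human review and appeal process the post calls for, since server "burps" can produce the same signature as deliberate poaching.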
I really doubt that a poacher is poaching to get credit for CPU years, so penalizing them by denying them credit might not be effective... I used to really get riled up about poaching, but it doesn't bother me much anymore... If you stick with ordinary exponents and get the work done in a reasonable amount of time, you have nothing to worry about... If you are "manipulating" the system to get low exponents to put onto slow boxes, you just might have problems... I'm not saying you can't use slow boxes for GIMPS, just that maybe a double check in the regular range, though more time-consuming, might be your best bet...
|
[quote="Xyzzy"]If you are "manipulating" the system to get low exponents to put onto slow boxes, you just might have problems....[/quote]
I'm not "manipulating the system". I'm just trying to do trial factoring on a nonet box that is actually faster than the net box (AMD K6III/400) that I used to get the assignments, 2 1/4 times faster as a matter of fact. There is no other way to get trial factors. I don't think that the nonet box is fast enough for double checks. It is a dual Celeron/450. I am doing double checks on my other net box a PIII/500 and that is slow enough for me, thank you. When I finish the 2 exponents that it has, I will probably go to trial factors on it as well. [quote="Xyzzy"]I'm not saying you can't use slow boxes for GIMPS[/quote] I hope not. Joe. |
Joe-
I didn't mean you in particular... :) The majority of poachings occur when people pull recently released exponents at 0601UTC... Since these exponents are smaller, they are a bit more desirable for older computers... But sometimes these exponents are "hot", in that they are part of a few that need to be finished up for a milestone to get completed, so they are frequently poached... |
Since garo mentioned my name, I thought I'd chip in my $0.02. Without naming names, I've spotted three different poaching strategies in the low doublechecks (which I qualify as any exponent less than (probable) M38, 6972593).
Poacher 1 keeps a close eye on the days-to-expired value in the status report, and poaches only those numbers that are within a couple of days of expiring. This poacher returns the result usually within a day or two before the exponent expires, but occasionally misses and turns it in a bit after it expires (I once lost half a P4-day of work to this when a result was returned for my newly-assigned exponent :( ). This strategy stretches the rules a bit, but probably does little harm because the exponents most likely would expire anyway. It also prevents small exponents from rolling over to someone else who may either complete them slowly or not at all. Because of this strategy, no exponent less than 7M has actually expired in the past month or so. So those of us who specialize in grabbing and completing small exponents as they expire have been out of luck. Poacher 2 also watches the status report, but poaches those numbers that have been assigned for a long period of time (days-run usually in excess of 300) and still have a long time to go (days-to-go usually in excess of 200). This strategy is less ethical, but I can certainly see the temptation if your goal is to prove M38 as quickly as possible. Poacher(s) 3 simply grabs whatever looks interesting. I recently saw one exponent which the assignee had been working on for several hundred days (and reporting in regularly with slow but steady progress) get poached when it was within 7 days of being completed. This is the kind of behavior that can rapidly drive legitimate participants away from a DC project :evil: . Another observation is that the poaching is escalating as we get closer to proving M38. Poacher 1 started in midsummer when we still had 300+ exponents to go. Poacher 2 was very busy in September, but lately seems to have run out of numbers that meet his criteria. Poacher(s) 3 have been at work for the past few weeks. I expect that once M38 is proven, the poaching will drop way off. 
I haven't been watching the low first-times as closely, so I don't know if there is similar behavior going on there. I suspect not, since we're not really close to any significant milestone there. |
DSwanson, Thank you for your very informative post. I hesitate to disagree with you but I must to some extent. All poaching does harm, all poaching is unethical, all poaching is cheating, all poaching will drive legitimate participants away!
People are leaving SETI because of the widespread cheating. Many are stating that the only thing worse than the cheating is the fact that the sponsors refuse to do anything about it! I get very worried when I read [quote]I used to really get riled up about poaching, but it doesn't bother me much anymore.[/quote] Complacency is the first step on the road to disaster. The next step is "Well, everyone else is doing it, why shouldn't I do it too?"

[quote="dswanson"] This strategy stretches the rules a bit, but probably does little harm because the exponents most likely would expire anyway. This strategy is less ethical, but I can certainly see the temptation if your goal is to prove M38 as quickly as possible. This is the kind of behavior that can rapidly drive legitimate participants away from a DC project :evil: . Another observation is that the poaching is escalating as we get closer to proving M38. [/quote]

Again, by poaching I mean the deliberate stealing of an exponent assigned to someone else. I do not mean the cases where the server "burps" and doubly assigns work, or any other circumstances where someone inadvertently works on an exponent assigned to someone else. Joe. |
Not giving credit for poaching would deter some poachers, but definitely not others. But the server should always give credit if you have legitimately been assigned an exponent at any time in the past year. So, for the database folks: the server should keep track of the assignment history of an exponent. Probably the thing that would stop almost all poaching is if poachers got an email from George saying, "Please don't poach." :D It could even be automated: you return an exponent you were never assigned, and the server fires off an email. There also needs to be a mechanism in place for handling trailing-edge assignments that will not complete in a reasonable amount of time, such as contacting the person assigned the exponent. I think most of these poachers believe that if they don't poach, an exponent with a projected completion 300 days out will just sit there holding up a milestone.
Other things could be done to make poaching useless, or at least make it appear so. If someone turns in a result for an exponent not assigned to them, it doesn't show up in the Completed Exponents report, and it disappears from the AER only if the computer that holds the assignment checks in and reports that no work has been done on it (it is removed from the worktodo, so no work has been wasted). If it subsequently expires, of course, it is removed and not handed back out. George could write a strong license for the use of Prime95 and the PrimeNet server (including the report pages). Coupled with measures designed to prevent exponents from holding things up, poaching could be eliminated.

But one thing I have noticed is that diligent measures to work off exponents on the trailing edge just expose exponents farther up to poaching. If the trailing edge of doublechecks right now were 7.5M, there would still be poachers with fast P4's who would decide on their own that six months was too long to complete a doublecheck.

One problem right now is machines that aren't on very often, but whoever installed Prime95 didn't change the "this machine runs 24 hours a day" setting. Prime95 is fairly good at adjusting the completion estimate when a machine is slower than usual due to CPU load and speed, but it doesn't adjust completion estimates at all if Prime95 isn't being run, or the computer is completely off. This leads to machines reporting microprogress for a year or more, while showing an estimated completion date of "a few days" forever. That is not so bad on its own, but when these machines get down to less than the default 10 days expected completion, they go out and reserve another exponent, and that exponent will be out for the months or years it takes to complete the first assignment, plus the time taken to complete the second. 
The responsibility for adjusting requested assignments based on "do the work that makes the most sense" and expected time to completion is currently given to the client. I'd like to suggest that the new server code take over that responsibility. That would also solve the problem of old clients still using the old breakpoints for LL/DC/TF and requesting work that will take a very long time to complete. Poaching definitely hurts GIMPS, and something definitely needs to be done about it. |
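trif's credit rule above (credit only for results from a legitimate assignee within the past year, backed by an assignment-history record) can be sketched roughly as follows. This is purely illustrative - the in-memory table, the field names, and the one-year window are assumptions, not PrimeNet's actual schema:

```python
from datetime import datetime, timedelta

# Hypothetical in-memory stand-in for a server-side assignment-history
# table: exponent -> list of (userid, date assigned).
assignment_history = {
    6972593: [("alice", datetime(2002, 11, 1)),
              ("bob", datetime(2003, 9, 15))],
}

def credit_user(exponent, submitter, now=None):
    """Decide who gets credit for a submitted result.

    The submitter gets credit only if they held a legitimate assignment
    within the past year; otherwise credit goes to the most recent
    legitimate assignee, and the poacher could be sent an automated
    "please don't poach" email.
    """
    now = now or datetime.now()
    history = assignment_history.get(exponent, [])
    recent = [(u, d) for (u, d) in history if now - d < timedelta(days=365)]
    if any(u == submitter for (u, _) in recent):
        return submitter                              # legitimate assignee
    if recent:
        return max(recent, key=lambda ud: ud[1])[0]   # credit the real assignee
    return None  # never recently assigned: flag for review / automated email
```

Keeping only the last year of history per exponent also limits the database bloat worried about below.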
[quote="trif"]So for the database folks, the server should keep track of the assignment history of an exponent. [/quote]
I agree, it should be kept with the logs. (or can it be mined from the logs?) The one danger would be database bloating -- you'd want to clear the info once the exponent had been completely processed.[quote="trif"] ... but it doesn't adjust completion estimates at all if Prime95 isn't being run, or the computer is completely off.[/quote] When prime95 starts up, it should compare the clock to the last timestamp of the most frequently updated file. It could add this 'dead' time to a total that gets reduced/eliminated whenever the server is contacted. Prime95 could use this total to adjust its estimates. |
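Maybeso's dead-time bookkeeping could look roughly like this sketch (the state-file convention and the duty-cycle scaling rule are illustrative assumptions about how a client might do it, not a description of prime95's actual code):

```python
import os
import time

def accumulated_dead_time(state_file, saved_total=0.0):
    """At startup, add the gap since the state file was last written to a
    running 'dead time' total (Maybeso's idea). The total would be
    trimmed or reset whenever the server is successfully contacted."""
    gap = max(0.0, time.time() - os.path.getmtime(state_file))
    return saved_total + gap

def adjusted_days_to_go(base_days, dead_seconds, observed_seconds):
    """Stretch the completion estimate by the machine's observed duty
    cycle: a machine that is off half the time needs twice the calendar
    days."""
    if observed_seconds <= 0:
        return base_days
    duty = observed_seconds / (observed_seconds + dead_seconds)
    return base_days / duty
```

For example, a 10-day estimate on a machine observed running one day for every day off would stretch to 20 calendar days.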
Let them poach, but don't give them credit. Give the credit to whoever the exponent was assigned to.
This will either stop the poaching, since the poacher gets no credit, or at least ensure that the person the exponent was assigned to gets the credit, negating any ill will that person may feel. |
[quote="Maybeso"]When prime95 starts up, it should compare the clock to the last timestamp of the most frequently updated file. It could add this 'dead' time to a total that gets reduced/eliminated whenever the server is contacted. Prime95 could use this total to adjust its estimates.[/quote]
I can see the attraction, but think it would still be necessary to allow more advanced users (for want of a better name) to enter data about hours per day, etc. Past performance is not always an accurate guide to future performance. I know when a computer is likely to be down for a day or two, and can adjust the hours per day to give an accurate estimate of the average uptime. The computer doesn't and can't. Similarly, I know when "real life" work demands are likely to significantly reduce the CPU cycles available to Prime95 - the computer doesn't.

On a separate subject, but still in this thread, I disliked ebx's idea on allocating priority or 'juicy' assignments: [quote="ebx"]Some example criteria for calculation could be - at least 10 P90 years reported in the past 90 days[/quote] This would knock out a lot of users with just one or two fast boxes. Whether or not I can deal efficiently and quickly with an assignment does not depend on my total output in the last 90 days or any other period. It depends on the particular machine I run the assignment on and on my character/attitude. |
[quote="Ian_H"][quote="Maybeso"]When prime95 starts up, it should compare the clock to the last timestamp of the most frequently updated file. It could add this 'dead' time to a total that gets reduced/eliminated whenever the server is contacted. Prime95 could use this total to adjust its estimates.[/quote]
I can see the attraction, but think it would still be necessary to allow more advanced users (for want of a better name) to enter data about hours per day etc. Past performance is not always an accurate guide to future performance.[/quote] Prime95 does what maybeso suggests. The more your machine is off, the more your RollingAverage in local.ini decreases. Prime95 already does what Ian suggests too! Whenever you change the Hours-per-day value the RollingAverage is reset to 1000. |
[quote="Ian_H"]On a separate subject, but still in this thread, I disliked ebx's idea on allocating priority or 'juicy' assignments:
[quote="ebx"]Some example criteria for calculation could be - at least 10 P90 years reported in the past 90 days[/quote][/quote] I agree. A priority assignment should go to any computer that will complete it in a reasonable time period (say 3 months), where said computer has already turned in a result. We want to give everyone with a decent computer and a proven track record an equal shot at these assignments.

I also think a new server should expire exponents throughout the day, making it more difficult for folks to log in at a specific time to grab expiring exponents. Note that I don't think there is anything unethical about the users currently doing this. The Primenet server rules are well known, and these users are using intelligence and persistence to get the exponents they prefer. All users have this opportunity. |
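One simple way a new server could spread expirations over the whole day, as suggested above, is to derive each exponent's expiry time deterministically from a hash, eliminating the predictable daily batch window. A hypothetical sketch (a server could equally store a random offset with each assignment):

```python
import hashlib

def expiry_second_of_day(exponent: int) -> int:
    """Map an exponent to a second-of-day (0..86399) at which its
    assignment expires, spread roughly uniformly over 24 hours, so
    there is no single expiry batch for poachers to camp on."""
    digest = hashlib.sha1(str(exponent).encode()).digest()
    return int.from_bytes(digest[:4], "big") % 86400  # seconds after midnight
```

Because the mapping is deterministic, the server needs no extra state, yet different exponents expire at scattered times of day.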
[quote="Ian_H"]On a separate subject, but still in this thread, I disliked ebx's idea on allocating priority or 'juicy' assignments:
[quote="ebx"]Some example criteria for calculation could be - at least 10 P90 years reported in the past 90 days[/quote] This would knock out a lot of users with just one or two fast boxes. Whether or not I can deal efficiently and quickly with an assignment does not depend on my total output in the last 90 days or any other period. It depends on the particular machine I run the assignment on and on my character/attitude.[/quote] Just to make the point clear: as I said, those were example numbers. If 5 years or 2 years is better, go for it. The numbers are for the box, not the user account, by the way. True, one can move assignments around, but that's beyond our control. What's so good about running a priority assignment on my box? Why would I want it if I know my machine is slow? The key is to assign those numbers to credible boxes without human interference. Reporting consistently and predictably is the way to build credit. Meanwhile, the more capable machines should be given more work. |
[quote="ebx"]As I said, those were example numbers. If 5 years or 2 years is better, go for it. The numbers are for the box not the user account, by the way.[/quote]
I know I can look this up, and I will, but what kind of box delivers 10 P90 years in 90 days? Not my P4 1.7 running 24/7, and I'd say that, at the moment, that's a machine good enough to be allowed a priority assignment. You say that the figure could be 5 years or 2 years. I understand where you're coming from. But I still say that reliability may be best assessed by other measures. For example: how long has a user been around? How many numbers has she/he tested and returned? Priority assignments may need to be allocated to decent boxes, but they also need to go places we know they'll return from. |
A lot of machines can do 40 P90 years per year, or 10 per 90 days. A P4 1.7A should be real close, if it isn't there already.
All the measures we think are good could be incorporated. But if I were implementing the rules, recent history would carry more weight. |
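For what it's worth, the arithmetic behind "40 P90 years per year, or 10 per 90 days" checks out as a round-up:

```python
def p90_years_per_90_days(p90_years_per_year: float) -> float:
    """Convert an annual P90-CPU-year credit rate into a 90-day rate."""
    return p90_years_per_year * 90 / 365

# 40 P90 years/year works out to about 9.9 per 90 days, so
# "10 per 90 days" is the same figure, rounded.
```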
[quote="Ian_H"]You say that the figure could be 5 years or 2 years. I understand where you're coming from. But I still say that reliability may be best assessed by other measures. For example: how long has a user been around? How many numbers has she/he tested and returned? Priority assignments may need to be allocated to decent boxes, but they also need to go places we know they'll return from.[/quote] As George has said, they'd have to have completed an exponent in the past 'n' months or whatever. But seriously, with a fast machine, and at the lower milestones (we're routinely testing numbers MUCH higher than 7M), speed should be a higher priority than reliability, as any (Athlon) box over 1.7 GHz will complete the current doublechecks in under 3 days... unless I'm missing something... With stats like that, the few power crunchers that do DC will eat through any milestone with ease! |
[quote="ebx"]A lot of machines can do 40 P90 years per year, or 10 per 90 days. P4 1.7A should be real close if not already.[/quote]
There must be something wrong with my arithmetic or my PCs - I'm getting nowhere near that. [quote="ebx"]All measures we think good could be incorporated. But recent history carries more weight if I am to implement the rules.[/quote] An overdependence on one aspect of history, measured mechanically, would slow things down by excluding good participants with good machines. Suppose you get a new Pentium V 4.5GHz box next year: should you be expected to run it for 90 days before 'qualifying' for choice assignments? I'd rather trust you, based on a broader view of your reliability. |
[quote="Prime95"]Prime95 does what maybeso suggests. [...] Prime95 already does what Ian suggests too![/quote]
Neat ;) |
[quote="Ian_H"]Suppose you get a new Pentium V 4.5GHz box next year: should you be expected to run it for 90 days before 'qualifying' for choice assignments? I'd rather trust you, based on a broader view of your reliability.[/quote]
4.5G will be sweet. What percentage of boxes fits this bill? 1%? 5%? If 5% of the boxes can do the job, I don't see it as a problem any more. Excluding one or two fast machines won't hurt. Besides, the difference is 2x or 3x at most when comparing P4 to P4, not the 60x we're talking about against the original Pentium. |
[quote="ebx"]4.5G will be sweet.[/quote]
... it will be here sooner than we expect, along with SSE3, Prime95 v25, and complaints that double-checking everything below 33M is taking too long ;) [quote="ebx"]Besides, the difference is 2x or 3x at most when comparing P4 to P4[/quote] Fair point. And I must apologise for my arithmetic, which caused me to underestimate my P4 1.7's performance - in my haste I'd failed to include 6+ P90 years credited for one high number tested. |
[quote="Prime95"]Prime95 does what maybeso suggests. The more your machine is off, the more your RollingAverage in local.ini decreases.
[/quote] Actually, unless this has been changed in version 22, it doesn't. I normally run factoring under my own account. I installed a second copy of Prime95 in a separate directory in order to participate in the doublecheck gauntlet that TPR had going. My factoring copy didn't run for a month. When I fired it back up after the month was over, my rolling average remained as high as ever. Prime95 compensates for adjustments in how much CPU it is getting when it is running, but if it isn't running, it doesn't adjust. |
Could it be that your computer was running but not the program? It sounded like George had it set to monitor CPU on-time rather than just the amount of time the program is running, but I can't be sure.
|
[quote="Deamiter"]Could it be that your computer was running but not the program? It sounded like George had it set to monitor CPU on time rather than just the amount of time that the program is running but I can't be sure.[/quote]
Yes, but I also had a laptop that wasn't on for a few weeks, and there has been no adjustment of the rolling average. I had also installed Prime95 on my dad's computer, he only uses it sporadically. I didn't bother to try to estimate how many hours per day it would be on (it was going to be factoring anyway), because I figured Prime95 would estimate that better than I could. But it never adjusted, it was compensating for the CPU time that my dad was using for other things, but not for the time it was off. |
Ok, back to the poaching business...
[b]Trif[/b] wrote: [quote]There also needs to be in place a mechanism for handling trailing edge assignments that will not complete in a reasonable amount of time, such as contacting the person assigned the exponent. I think that most of these poachers think that if they don't poach then an exponent with a projected completion of 300 days out will just sit there holding up a milestone. [/quote] I completely agree with this statement. I am not in favour of poaching, nor have I ever done it, but I understand that it may be irresistible for some people, instead of just sitting there watching a milestone slide away due to Sxxxxxxx/Cxxxxxxx participants who may even have forgotten they are part of the GIMPS project. So I think those people should be contacted, in a polite manner, stating that they are holding a milestone back, and encouraging them either to release the exponent or to do their best at speeding up the testing of at least that particular exponent. This strategy is obviously not 100% effective, but I think it would help.

A maximum period of time could also be defined, depending on the exponent range, that any user would be allowed to retain an exponent, regardless of whether they report regularly. The user should be warned, say 1 month before that date, that the exponent was about to be reassigned to someone else. This should, however, be stated in the Terms and Conditions for using the program. But I think that definitely something has to be done to help move up the trailing edge of the project while avoiding unethical poaching, which I agree is harmful to DC initiatives. |
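The maximum-retention-with-warning policy described above might be sketched like this. The ranges, day limits, and the one-month warning window are all made-up example values:

```python
from datetime import date, timedelta

# Example retention limits per exponent range (illustrative numbers):
# assignments held longer than the limit get reassigned, with a warning
# emailed one month beforehand.
MAX_HOLD_DAYS = {
    (0, 10_000_000): 180,
    (10_000_000, 80_000_000): 365,
}

def assignment_status(exponent, assigned_on, today=None):
    """Return 'ok', 'warn' (time to send the one-month notice),
    or 'reassign' for an exponent assigned on `assigned_on`."""
    today = today or date.today()
    for (lo, hi), limit in MAX_HOLD_DAYS.items():
        if lo <= exponent < hi:
            deadline = assigned_on + timedelta(days=limit)
            if today >= deadline:
                return "reassign"
            if today >= deadline - timedelta(days=30):
                return "warn"
            return "ok"
    return "ok"  # outside the known ranges: no limit in this sketch
```

A server cron job could run this daily over all open assignments, emailing holders whose status flips to "warn" and expiring those at "reassign".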
I know that people get attached to their exponents, but wouldn't it be feasible to just make sure that if you are properly assigned an exponent and someone else returns the result (regardless of why they return it or how they got it), and you finish it properly, you get credit for it?
I guess with first-time tests the prize money might be an issue, but again, why not officially credit the first person to be properly assigned and to properly complete an exponent? If someone wants to run an exponent that is holding up a milestone, they are going to do it anyway. Almost always their goal is to speed up the milestone. If the server gets a good result, then the project can move on, but whenever the person who was properly assigned the exponent submits it, they get full credit for it. In a sense, they might feel like their work was wasted; that is the only downside I can see. However, people running these exponents and taking years are most likely not people who are highly active in GIMPS. They are people who fired it up and have left it running but forgotten about it, or people who reserved the exponent then stopped running GIMPS. As it is, if the exponent is poached, they REALLY end up with nothing, so the above-mentioned option would actually improve their situation. I am not a GIMPS maniac though, so I might be missing something vital. |
[quote="Prime95"]
Seriously, this discussion has been useful. I had not considered designing in proxies. I'm not sure how that would work or be implemented. It is true that one server can handle the current load and small outages are a nuisance. But wouldn't proxies serve as insurance against catastrophic failure? [/quote] Well, proxies are not the only possible approach. For example, traditional database replication might work well enough - if something drastic happens to the master server, the slave takes over the domain after a timeout, and only the last few seconds of master activity might be lost. Another example is mutually replicating servers. This requires much more effort in database design and coding, but gives even better protection against server failures, ensuring that at any moment there is at least one server on the internet that clients can contact - and unlike proxies, such servers would be able to provide more features to the project. More security issues need to be kept in mind, though. To choose which db/network architecture is better, one needs to ask: how should GIMPS look in the year 2005? 2010? That's a worthy thing to discuss.

[quote="Prime95"] I agree with previous posters that there are two major hurdles to solve. One is cost. I think enough interest has been expressed here that the roughly $100 / month cost could be handled. Even an initial server outlay may not be a problem. [/quote] Considering that the bandwidth requirements of the server are very low (what's 30Gb/month? Nothing. And someone said it's 1Gb/month - ridiculously low traffic), I could provide 2 (and maybe 3) accounts on different servers to GIMPS. All the servers reside in a large data center, which ensures excellent network quality and uptime. 
Of course, I will require that the hosted software be efficient enough to let the other stuff on the servers run smoothly (which should not be much of an issue - I currently run mprime on the servers testing exponents; stopping it will release a whole bunch of hardware resources). I saw others were willing to provide servers in their homes - as long as they have a fast enough internet connection, that can be used for GIMPS and add good overall value to the project. See below.

[quote="Prime95"] The second hurdle is design and implementation. Coming up with a grand plan and implementing in stages seems reasonable. [/quote] Well, let me throw in a few more thoughts. First, the primenet site should be broken into 3 parts:

- A module that handles reserving and releasing exponents, as well as storing results. This is the only module that requires replication. It cannot be effectively developed by a large community, but in fact there is no need to - as someone noted, the server-side stuff is not extraordinarily difficult. A working site might be ready for beta testing in less than a month, including the planning stage (as a lot of planning has already been done). Also, this module does not need much CPU, RAM or HD resources. If the proxy and/or one of the replication approaches is adopted, home computers might be used here even if their internet connection is not too reliable.

- A module that maintains accounts (adding accounts, forming/joining teams, changing passwords, etc., using web forms). There is no need for hot replication of this; a hot backup is sufficient - if this module is ever down for a couple of days, it will not hurt too much (and with a backup, this module can be brought up on another server in a matter of a few hours).

- A module that processes and shows stats. This is where most of the hardware resources are needed. Fortunately, it does not require absolute reliability - if it goes down for a couple of days, that's passable. 
Note that this module can be duplicated at many independent servers - every stats server just needs a stream of updates from the master servers. This also greatly reduces security concerns. That's where home computers with good internet connections would be most useful.

Now, about the development of all that. I would like to volunteer for the development of the first 2 modules. I have the necessary expertise: I've been involved with distributed projects since the old days of RC5-56 and OGR-23; specifically, I have participated in GIMPS/primenet for over 4 years already, and I have developed high-traffic server-side applications, specifically webcounter.com. I would prefer to form a small (2-3 person) team that develops, maintains, and independently provides hosting (to ensure that the new primenet never goes down, even if some team member dies tomorrow). Development should be cathedral-style.

Regarding stats: providing just the very basic stuff like what Entropia currently provides is no problem at all. But if someone wants advanced stats, I suggest that a bazaar-style project should be launched. It will take a lot of work; good stats are not a task one person could easily handle.

Another important module is the code within the GIMPS client software (prime95/mprime). I believe that's George's realm currently. Fortunately, the new primenet might use the currently used HTTP-based protocol - or another HTTP-based protocol; that should not take too much effort to implement.

If anyone is willing to seriously discuss these things (seriously means participating in development and/or hosting hardware worldwide), please drop me a note at: gimps at supplehost dot com. If someone has already formed a team that has started to work, please let me know. |
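On the client side, the replication schemes described above only pay off if clients fail over between replicas. A minimal sketch of that idea, with hypothetical server URLs and the transport abstracted as a `fetch(url, payload)` callback (in real use, an HTTP POST):

```python
# Hypothetical list of replicated PrimeNet endpoints; the client tries
# each in order, so a single server being down is invisible to users.
SERVERS = [
    "http://primenet-a.example.org/cgi-bin/primenet",
    "http://primenet-b.example.org/cgi-bin/primenet",
]

def send_report(payload, fetch, servers=SERVERS):
    """Try each replica in order. `fetch(url, payload)` must return the
    server's response bytes or raise OSError on a network failure."""
    last_err = None
    for url in servers:
        try:
            return fetch(url, payload)
        except OSError as err:
            last_err = err  # fall through to the next replica
    raise ConnectionError(f"all replicas unreachable: {last_err}")
```

The same loop works whether the replicas are master/slave (try the master first) or mutually replicating peers (any order).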