mersenneforum.org > Other Stuff > Archived Projects > NFSNET Discussion
Old 2004-09-30, 01:52   #12
S00113
 
S00113's Avatar
 
Dec 2003

216₁₀ Posts
Default

Quote:
Originally Posted by xilman
Almost certainly.
In that case the mind-breaking MAJOR problem is solvable. Just find a polynomial and start sieving!
[pre]
magnum $ /usr/sbin/swapinfo
                  Kb        Kb         Kb  PCT  START/      Kb
TYPE           AVAIL      USED       FREE USED   LIMIT RESERVE PRI NAME
dev          8192000         0    8183808   0%       0       -   1 /dev/vg00/lvol2
dev         71663616         0   71663616   0%       0       -   1 /dev/vg00/lvswap1
dev        102498304         0  102481920   0%       0       -   1 /dev/vg04fc60/swap2
dev        102498304         0  102481920   0%       0       -   0 /dev/vgeva/swap3
dev        102498304         0  102481920   0%       0       -   0 /dev/vgeva/swap4
reserve            -  36803788  -36803788
memory     133954988  33556464  100398524  25%
[/pre]
The machine is about 10 meters from my office.

Don't depend on it yet. There is prize money involved, so I want to get permission first and agree on what to use "my" share for. A node for our new cluster, perhaps, or some charity. And they'll almost certainly ask how long it will take in CPU time. Any estimates?
S00113 is offline  
Old 2004-09-30, 10:20   #13
xilman
Bamboozled!
 
xilman's Avatar
 
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across

10,753 Posts
Default

Quote:
Originally Posted by S00113
In that case the mind-breaking MAJOR problem is solvable. Just find a polynomial and start sieving!
If only it were that easy. Somehow I don't think you realise just how much work that second sentence really implies.

A big NFS factorization is like a swan on a river. Mostly calm and serene on the surface, but paddling furiously underneath just to maintain its position.
Quote:
Originally Posted by S00113
Don't depend on it yet. There is prize money involved, so I want to get permission first and agree on what to use "my" share for. A node for our new cluster, perhaps, or some charity. And they'll almost certainly ask how long it will take in CPU time. Any estimates?
The prize money is small and your portion of it, unless you do a lot of sieving too, is likely to be very small. Back in the days when Richard Wackerbarth, Don Leclair and myself were creating NFSNET we did some sieving on RSA-576. We spent quite a bit of time on it. Our share of the prize money was well under USD 100 but Jens Franke rounded it up to USD 100, which we donated to charity.

As for CPU time, the best estimate available is "lots". The matrix is likely to take a few dozen gigaHertz years to run.

Paul
xilman is offline  
Old 2004-09-30, 22:45   #14
Minty
 
Minty's Avatar
 
Sep 2004

2²·5 Posts
Default

I for one hadn't realised just how long the matrix would take to run; 40-80 GHz-years is a long time! Would the Superdome be available for a six-month to one-year project? (Presumably it would run at low priority, so the machine would still be usable for other work.)

One other thing: couldn't NFSNET sieve 'for free' (as it does now) and give a larger share of the prize money, say $6,000 to $8,000, to the matrix solver, effectively to 'pay' for the supercomputer's time, and keep the rest for prizes, development time, etc.? This seems quite a good idea, as it might be an incentive to put to use company machines out there that might not be fully used, meaning that everyone's a winner. The only potential problem I can see is handing over shedloads of work to a company environment that might be more financially motivated than honesty driven, and that might then go on to submit the factors and claim the entire prize for itself! I do not cast any doubt on anyone's intentions here, but I guess we have to be realistic; business can be quite sickening sometimes.

Either way I'm happy. If we end up sieving RSA challenges I'm happy to help, and if we don't, well, hey, my dream's still alive: one day I'll build a fast supercomputer and go around solving crypto challenges. I'll probably be called Cryptoman!
Minty is offline  
Old 2004-10-01, 00:16   #15
ColdFury
 
ColdFury's Avatar
 
Aug 2002

320₁₀ Posts
Default

Quote:
Developing a matrix solver that can be run on a distributed computing platform is the answer.
A lot of people have tried.
ColdFury is offline  
Old 2004-10-01, 07:49   #16
xilman
Bamboozled!
 
xilman's Avatar
 
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across

2A01₁₆ Posts
Default

Quote:
Originally Posted by ColdFury
Quote:
Developing a matrix solver that can be run on a distributed computing platform is the answer.
A lot of people have tried.
Some people have succeeded if, by "distributed computing platform", you mean two or more self-contained computers connected together by a network. Peter Montgomery's implementation works very well in such an environment and it was used for all the NFSNET matrices until I left Microsoft Research.

There is a catch though. The network connections have to be reasonably fast and reasonably low latency. Gigabit ethernet and a dedicated switch connecting the machines together is ok for a cluster of up to a dozen or two machines. The MSR cluster consists of 16 machines, each with a pair of 1GHz PIII cpus. They would saturate the ethernet when running at full blast. The main limitation was latency. A gigabit network with much lower latency, Myrinet perhaps, would have been more productive though much more expensive.

Peter's distributed blocked Lanczos would almost certainly work on systems connected over the wider internet but the latency would cripple its performance.
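To make the latency point concrete, here is a toy per-iteration cost model. It is not from the thread: the function, the message counts, and all the numbers are illustrative assumptions, chosen only to show how latency rather than bandwidth comes to dominate over a wide-area link.

```python
# Toy per-iteration cost model for a distributed matrix step: each
# iteration does some local compute, then exchanges n_msgs messages,
# each paying one network latency plus its transfer time.
# All figures below are illustrative assumptions, not measurements.

def iteration_time(compute_s, msg_bytes, latency_s, bandwidth_bps, n_msgs):
    transfer = msg_bytes / (bandwidth_bps / 8)      # seconds per message
    return compute_s + n_msgs * (latency_s + transfer)

# Same work, same gigabit bandwidth; only the latency differs.
lan = iteration_time(0.05, 1_000_000, 50e-6, 1e9, 30)   # switched LAN, ~50 us
wan = iteration_time(0.05, 1_000_000, 50e-3, 1e9, 30)   # internet, ~50 ms RTT
print(f"LAN: {lan:.2f} s/iter, WAN: {wan:.2f} s/iter")
```

With identical bandwidth, the high-latency run comes out roughly six times slower per iteration, which is the crippling effect described above.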

Paul
xilman is offline  
Old 2004-10-01, 08:05   #17
xilman
Bamboozled!
 
xilman's Avatar
 
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across

10,753 Posts
Default

Quote:
Originally Posted by Minty
I for one hadn't realised just how long the matrix would take to run; 40-80 GHz-years is a long time!
Here's a way-point to give you some idea of what's involved. When NFSNET did 2,811-, aka M(811), we ran into problems in creating a matrix. I ran a matrix which had 14.57 million rows and columns, though it was much sparser than normal. It took 15,000 GHz-hours to run; that is 625 GHz-days, or 1.7 GHz-years. It ran on 30 CPUs, each of 1GHz, arranged as a 15-member cluster of dual-processor systems, so each system used 15,000/15 = 1,000 CPU-hours. The elapsed time was over twice that, mostly because the computation was limited by the network rather than the CPU, but partly because of crashes and scheduled downtime. I don't have precise dates to hand, but the computation ran for very close to three months.

The matrix for RSA-640 is unlikely to be smaller than the above matrix. Even if it is of comparable size it will be several times heavier. The running time of blocked Lanczos is roughly proportional to the number of set bits per row multiplied by the square of the number of rows.
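The scaling rule above can be turned into a back-of-the-envelope estimator, calibrated against the 2,811- figures quoted in this post (14.57 million rows, ~15,000 GHz-hours). The bits-per-row baseline is a hypothetical placeholder, not a number from the thread.

```python
# Back-of-the-envelope blocked-Lanczos cost: time scales as
# (set bits per row) x (rows)^2, calibrated to the 2,811- run above.
# ref_bits = 50 is a made-up baseline weight, for illustration only.

def lanczos_ghz_hours(rows, bits_per_row,
                      ref_rows=14.57e6, ref_bits=50, ref_ghz_hours=15_000):
    return ref_ghz_hours * (bits_per_row / ref_bits) * (rows / ref_rows) ** 2

# A matrix of the same size but three times as heavy costs ~3x as much:
print(lanczos_ghz_hours(14.57e6, 150))
```

On this model a same-size matrix that is three times heavier costs about 45,000 GHz-hours, i.e. roughly 5 GHz-years.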

Paul
xilman is offline  
Old 2004-10-01, 11:31   #18
R.D. Silverman
 
R.D. Silverman's Avatar
 
Nov 2003

2²×5×373 Posts
Default

Quote:
Originally Posted by xilman
Some people have succeeded if, by "distributed computing platform", you mean two or more self-contained computers connected together by a network. Peter Montgomery's implementation works very well in such an environment and it was used for all the NFSNET matrices until I left Microsoft Research.

There is a catch though. The network connections have to be reasonably fast and reasonably low latency. Gigabit ethernet and a dedicated switch connecting the machines together is ok for a cluster of up to a dozen or two machines. The MSR cluster consists of 16 machines, each with a pair of 1GHz PIII cpus. They would saturate the ethernet when running at full blast. The main limitation was latency. A gigabit network with much lower latency, Myrinet perhaps, would have been more productive though much more expensive.

Peter's distributed blocked Lanczos would almost certainly work on systems connected over the wider internet but the latency would cripple its performance.

Paul
It should be further noted that per-processor CPU utilization degrades rapidly as the number of machines increases; the algorithm does not scale well in parallel. As you add more machines, communication costs rise faster than linearly at a fixed latency. Of course, if you lower the latency you can add more machines...

While sieving scales linearly with the number of machines (up to, say, the number of sieve rows needed), throwing more machines at the LA does not help.

Haven't people noticed that as the numbers get larger, the ELAPSED time to do the matrix has increased relative to the elapsed time to do the sieving? One can always reduce the latter with more machines.....
R.D. Silverman is offline  
Old 2004-10-01, 12:15   #19
S00113
 
S00113's Avatar
 
Dec 2003

216₁₀ Posts
Default

Quote:
Originally Posted by xilman
If only it were that easy. Somehow I don't think you realise just how much work that second sentence really implies.
I guess that just finding a good polynomial and other parameters would require hundreds of CPU months. I can contribute about 300 CPUs myself, mostly P4, about 18 hrs/day. I have no idea how much time the sieving will take, but by then I'll have many more CPUs.
S00113 is offline  
Old 2004-10-01, 12:20   #20
S00113
 
S00113's Avatar
 
Dec 2003

2³×3³ Posts
Default

Quote:
Originally Posted by Minty
I for one hadn't realised just how long the matrix would take to run; 40-80 GHz-years is a long time! Would the Superdome be available for a six-month to one-year project?
Certainly not with all 64 CPUs! Also the job must be able to checkpoint and continue later during periods of high load.

How well does the matrix solver parallelize on a NUMA machine? Would it be possible to add and withdraw CPUs dynamically, possibly by a stop and a restart from checkpoint?
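A stop-and-restart-from-checkpoint scheme of the kind asked about can be sketched generically. The state dictionary, file name, and "solver step" below are stand-ins, not the real Lanczos internals.

```python
# Generic checkpoint/restart skeleton: periodically dump the job state
# so it can be stopped during periods of high load and resumed later,
# possibly with a different CPU allocation. The solver state here is
# a placeholder, not actual linear-algebra code.
import os
import pickle

CKPT = "lanczos.ckpt"   # hypothetical checkpoint file name

def run(total_iters, ckpt_every=100):
    """Run (or resume) a long iterative job, checkpointing periodically."""
    state = {"iter": 0, "vector": [0] * 8}          # placeholder solver state
    if os.path.exists(CKPT):                        # resume after a stop
        with open(CKPT, "rb") as f:
            state = pickle.load(f)
    while state["iter"] < total_iters:
        state["iter"] += 1                          # one solver step goes here
        if state["iter"] % ckpt_every == 0:
            with open(CKPT, "wb") as f:
                pickle.dump(state, f)
    return state
```

Withdrawing or adding CPUs then reduces to stopping the job and restarting `run` with a new processor count; only work since the last checkpoint is lost.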
S00113 is offline  
Old 2004-10-01, 13:54   #21
xilman
Bamboozled!
 
xilman's Avatar
 
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across

25001₈ Posts
Default

Quote:
Originally Posted by S00113
I guess that just finding a good polynomial and other parameters would require hundreds of CPU months. I can contribute about 300 CPUs myself, mostly P4, about 18 hrs/day. I have no idea how much time the sieving will take, but by then I'll have many more CPUs.
Finding polynomials is easy. It is trivially parallelizable and can be split between as many machines as desirable.

Finding parameters is also easy. We know roughly where to look by extrapolation from somewhat smaller factorizations done previously. Then we set each of a number of machines to perform trial sieving for various plausible choices of the parameters.

Performing the actual production sieving is not quite so easy. The NFSNET infrastructure is capable now of handling the client/server workload to factor RSA-640 without too much difficulty, though adding a few more servers would be desirable, for resilience reasons if nothing else. However, the sieving software isn't quite up to the job yet. The CWI siever was written for 32-bit machines (though it runs well on 64-bit architectures) and the limitations show. It needs significant rewriting to overcome them. Most other sievers in my experience are similarly limited, though Jens Franke's is less so than most. For instance, we will want to allow large primes > 2^30, which is the present CWI limit. Someone who knows what they are doing will have to do the rewrite, and that takes time and effort.

Post-sieving tasks would be rather hard, as discussed earlier in the thread.

The really difficult parts are not the computation. That's largely straightforward, if tedious at times. What is really hard is the human effort involved. There are a number of people paddling furiously underneath the NFSNET you see on the surface.

Work begins, usually by me, by choosing a number to be factored and its sieving parameters. Sometimes the parameters are easy to choose because they can be much the same as the ones for an earlier project (though with different polynomials, root and N, of course). A trial sieve is performed and the results evaluated. If they are good enough, meaning that enough relations will be collected to complete the factorization, that's the end of it. Otherwise changes are made to the params (based on a combination of experience and feedback from the trial sieve) and the process repeated. At this point a "details.txt" is produced and another person runs a sanity check on it. (This, incidentally, is where the 5_304P debacle began: I screwed up the details.txt and the sanity check didn't pick up the errors.) This first stage takes a few man-hours' labour.

The details.txt is uploaded to the servers, as are task lists for each server. A task list is basically a range of lines for which that server is responsible. We generally issue several blocks of 100K lines to each server. Someone has to divide the work between the servers and ensure that no server allocates the same work as any other server and that all the work is allocated. Someone also has to monitor the progress of the computation and ensure that more tasks are added if the initial surveys under-estimated the work needed, or removed if they over-estimated it.
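The bookkeeping just described, dealing out 100K-line blocks so that no two servers get the same work and no line goes unassigned, can be sketched as follows. The function and server names are hypothetical, not NFSNET's actual tooling.

```python
# Hypothetical sketch of task-list allocation: split a global range of
# sieve lines into fixed-size blocks and deal them out round-robin, so
# every line is covered exactly once across all servers.

def make_task_lists(first_line, last_line, servers, block=100_000):
    tasks = {s: [] for s in servers}
    start = first_line
    i = 0
    while start <= last_line:
        end = min(start + block - 1, last_line)
        tasks[servers[i % len(servers)]].append((start, end))
        start = end + 1
        i += 1
    return tasks

tasks = make_task_lists(0, 999_999, ["serverA", "serverB"])
blocks = sorted(r for ts in tasks.values() for r in ts)
assert blocks[0] == (0, 99_999) and blocks[-1] == (900_000, 999_999)
```

Monitoring then amounts to re-running the splitter with a larger or smaller `last_line` and issuing only the ranges not already allocated.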

OK, assume that all that has been done. Now the server admins have to deal with the many thousands of files which are returned by the clients. Some files will be corrupted. Some tasks will have been interrupted part way through, either cleanly or not. The garbage has to be recovered as far as possible. The missing tasks, or parts of tasks, have to be identified and converted into additional task lists to be added to the server.

A lot of this work can be automated, but it still requires a great deal of human effort. Remember that the servers are scattered around the world, in different timezones, running different operating systems, located in different organizations with different access and security regimes. Remember that all the data has to be preserved carefully, preferably with multiple independent backups. Would you fancy dealing with, say, half a million files held in fifty thousand directories and totalling ten gigabytes of data? Assume a few percent of the files are corrupt in some way and it's your job to find out which they are and to take corrective action. Oh, and the original data is five thousand miles away from your system. Historically, Richard Wackerbarth has performed most of this labour, ably assisted by the other server admins.
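A minimal sketch of the kind of automated sanity pass described here: walk the result tree and flag empty or truncated files so the corresponding ranges can be re-issued. The directory layout and the "last record must end with a newline" heuristic are assumptions for illustration, not the admins' actual procedure.

```python
# Flag candidate corrupt relation files: empty files, or files whose
# final record was cut off mid-line when a client was interrupted.
# The truncation heuristic is an illustrative assumption.
import os

def find_suspect_files(root):
    suspects = []
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                data = f.read()
            if not data or not data.endswith(b"\n"):
                suspects.append(path)
    return sorted(suspects)
```

Anything this pass flags would be mapped back to its task range and appended to a fresh task list for re-sieving.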

So far I've not even considered the post-sieving phase. Someone has to boil down all those files and gigabytes. The human labour required is distinctly non-trivial.

I have also not considered the labour involved in answering questions, giving advice, asking people to fix problems, maintaining mailing list memberships, contributing to mailing lists and on-line fora, writing documentation, translating docs and so on from English into other languages and a host of other such non-computational efforts. Neither have I discussed the care and feeding of web sites, including the production of performance statistics and fixing the inevitable errors. Then there is DNS maintenance ...

That's part of what I meant by "If only it were that easy".


Paul
xilman is offline  
Old 2004-10-01, 14:03   #22
xilman
Bamboozled!
 
xilman's Avatar
 
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across

10753₁₀ Posts
Default

Quote:
Originally Posted by Bob Silverman
It should be further noted that per-processor CPU utilization degrades
rapidly as the number of machines increases; The algorithm does not
scale well in parallel. As you add more machines, communications costs
rise faster than than linearly, at a fixed latency. Of course if you lower the
latency you can add more machines...
For a reasonably small number of machines, the overall performance seems to scale roughly as the square root of their number. Beyond the first few, doubling the performance starts getting very expensive.
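Taken at face value, square-root scaling means the machine count grows with the square of the desired speedup. A two-line illustration (the model itself is just the rough empirical rule stated above, not a measured fit):

```python
# If matrix throughput grows ~ sqrt(n) with n machines, then reaching
# a speedup of s needs about s**2 machines: doubling performance costs
# roughly 4x the hardware. Illustrative rule of thumb only.
import math

def machines_for_speedup(target):
    return math.ceil(target ** 2)

for s in (2, 4, 8):
    print(f"{s}x throughput -> ~{machines_for_speedup(s)} machines")
```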

As Bob and I have noted, it is network latency that generally limits performance rather than bandwidth or cpu speed. That said, bandwidth is important. I found that a PII-300 can easily use all the bandwidth of a 100M ethernet, but a 1GHz PIII is latency limited on gigabit ethernet --- it only gets about 25% of the bandwidth and 40% of the cpu, because the rest of the time it is waiting for data to travel from cpu to cpu.

Paul
xilman is offline  
 
