#12
(loop (#_fork))
Feb 2006
Cambridge, England
14423₈ Posts
The script as I shipped it will never recommend [1,x] because I had no figures for B1=1e6 in the file, and its timings for 140 digits will be extrapolations from data for 150..250. Did you replace all the timings with ones you measured yourself, or just add the ones for B1=1e6?

I would be quite inclined to remeasure ECM timings with various B1 on inputs 10^140+13 and 10^130+1113, if you're wanting to draw any conclusions for those smaller inputs.
#13
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2³·3·5·7² Posts
Quote:
It probably would be worth running 1e6 on smaller numbers. Ideally, the more levels of ECM we add the better. Once the initial timings are run, it is all automated.

I think it would be interesting to show the aliquot sequence people how inefficient their ECM is. Would it be possible to show the chance that the number goes to NFS after all the ECM? It would be interesting to compare this to the value for the standard yafu ECM; I reckon there could be a fairly large difference. There is more to be gained per number with larger numbers, but many more small numbers are done.

One feature that would be useful is automatically increasing/decreasing the number of curves added per run. For 1e6 and 3e6, 100 is too many, but when you get to 11e7 it can be too few.

edit: It may be worth making a script that people can run that generates all the timing and probability data. This would allow full flexibility in the optimization of ECM effort. There is no reason that we need to be restrained by the traditional bounds; the optimal solution would have a different bound for each curve.

Last fiddled with by henryzz on 2016-07-13 at 18:44
#14
Oct 2006
Berlin, Germany
1001101001₂ Posts
The current measurements for B1=29e8, without memory limitation and with -maxmem 1800:

Learnings from this:
- It should fit a BOINC user's system, even more so if I split it into phase 1 jobs and phase 2 jobs.
- Phase 2 runtimes doubled with -maxmem 1800. This will be even worse for larger composites and larger B1, so I will test larger B1 with -maxmem 12G.
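For concreteness, the two configurations being compared look roughly like this on the command line (a sketch only: the composite is a placeholder, `-maxmem` takes its limit in MB, and the script prints the commands rather than running them, since a single curve at this B1 runs for hours):

```shell
# Sketch only: placeholder composite, the thread's B1 bound.
N='10^140+13'     # placeholder input, not the project's actual composite
B1=29e8

capped="echo '$N' | ecm -maxmem 1800 -v $B1"      # 1.8 GB cap: fits 2 GB/core hosts
uncapped="echo '$N' | ecm -maxmem 12000 -v $B1"   # ~12 GB: roughly uncapped on 16 GB

printf '%s\n%s\n' "$capped" "$uncapped"
```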
#15
Feb 2012
3²×7 Posts
Quote:
It sounds like, from previous posts, that the machinery used for this project is not "state of the art", i.e. perhaps 3-5 years old and limited to around 12GB RAM per machine. It would be nice to know what kind of CPUs these machines have (and whether they have GPUs of any sort too), as you could probably 'divide' the workload among this hardware to better speed up your project.

For me personally, I wouldn't even attempt B1 = 2.9G until I had run something like 70K curves at B1 = 850M, which is a hell of a lot!! But let's assume you have done this level of work on some composites. I wouldn't be so concerned with the time difference between 1 thread using 16GB of RAM on stage 2 vs 1.8GB of RAM per thread on stage 2. The reason is that you will need to run so many curves that it is better to think of maximizing parallelization of your problem than speed per curve.

To illustrate, I'll assume that your 5-year-old computers have 16GB of RAM and Intel dual-core processors with hyper-threading (which is pretty common CPU technology from 5 years ago) and 64-bit architecture. If we take a 250-digit composite, stage 2 using 12GB of RAM (assume 11GB for ecm and another 1GB for transient CPU activity) will take 1.73hr, while it will take 4.66hr at 1.8GB. Now, in reality you can use up to 4GB per thread, so this time might go down. However, these numbers assume a single thread, which is all you can run using your 16GB machine at the default 11GB per curve on stage 2. But if you use only 4GB per thread, you can now run 4 curves at once (which is possible on a hyper-threaded dual-core processor). Running a single curve vs 4 curves at once will be about 50% faster per thread (based on my empirical observations), so the effective speedup of using less RAM per curve is 4/1.5 = 2.67X.

Now, when you take into account that reducing the RAM per curve from 11GB to 2.75GB should take about 2 times longer (I noticed this when comparing stage 2 run-times at B1 = 3M to B1 = 260M on a C200 using the default memory and when reducing the memory by a factor of 4), this results in a net speedup of about 35%. Not bad for 'rationing' your RAM usage.

Of course, if you have a GPU you can realize even higher speedups (compared to just using CPUs), as you can let the GPUs do hundreds or thousands of stage 1 curves at high B1 values, save the files, and let your CPUs do stage 2. I think that is the way to go, but ultimately you know what hardware is available to you, and hopefully you can find the optimal strategy to get your desired factors. Best of luck to you!

Last fiddled with by cgy606 on 2016-07-16 at 07:58
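The throughput arithmetic in the last two paragraphs can be re-checked in a few lines (all input figures are the poster's estimates, not measurements of mine):

```shell
# 4 concurrent low-memory curves, each ~1.5x slower than a lone curve:
threads=$(awk 'BEGIN{printf "%.2f", 4/1.5}')
echo "effective throughput from 4 threads: ${threads}x"   # 2.67x

# each low-memory curve additionally takes ~2x as long as an 11 GB curve:
net=$(awk 'BEGIN{printf "%.2f", (4/1.5)/2}')
echo "net gain from rationing RAM: ${net}x"               # 1.33x, i.e. the ~35% cited
```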
#16
Oct 2006
Berlin, Germany
617 Posts
Thank you for your hints.

Yes, it is an older system with 16GB RAM (this one: http://universeathome.pl/universe/sh...p?hostid=41792). The main purpose of my tests is to learn how runtime and memory increase with composite length and larger B1. Afterwards I will derive formulas runtime = f(C, B1) and memory = f(C, B1), which I need as predictions to send BOINC workunits to my users and to avoid sending workunits to systems which are not able to handle them.

I checked the active hosts: there are more than 200 which have more than 20GB RAM. I currently run ECM up to B1=850M with -maxmem 1800 (1.8G because many users have 2G per core, and 2G is the maximum a 32-bit system can handle).

yoyo
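The measurement step behind those formulas can be sketched as a small harness (the composites, B1 ladder and output file name are placeholders; it assumes GNU time at `/usr/bin/time` and gmp-ecm on the PATH, and quietly skips the runs if either is missing):

```shell
# Log elapsed seconds and peak RSS for one curve per (composite, B1) pair;
# the resulting CSV is what runtime=f(C,B1) and memory=f(C,B1) get fitted to.
outfile=timings.csv
echo "composite,B1,seconds,maxrss_kb" > "$outfile"

if command -v ecm >/dev/null 2>&1 && [ -x /usr/bin/time ]; then
  for B1 in 1e6 3e6 11e6; do                        # placeholder B1 ladder
    for comp in '10^130+1113' '10^140+13'; do       # placeholder composites
      row=$(/usr/bin/time -f '%e,%M' sh -c \
            "echo '$comp' | ecm -maxmem 1800 $B1" 2>&1 >/dev/null | tail -n 1)
      echo "$comp,$B1,$row" >> "$outfile"
    done
  done
fi
```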
#17
"Carlos Pinho"
Oct 2011
Milton Keynes, UK
3×17×97 Posts |
yoyo,

Do you think yoyo@home could take on a large ECM effort on the Cunningham base-2 numbers (with B1 ~ 2^32)? Perhaps we might get Bruce to indicate the total ECM work he has performed on these numbers before making a decision. EPFL ran 19000 curves with B1 = 10^9 on all of the 2^n+1, n odd, composites.

Carlos and Silverman
#18
Oct 2006
Berlin, Germany
269₁₆ Posts
Currently I tend towards making two BOINC apps, one for phase 1 and one for phase 2.

I want to limit phase 2 to 10G RAM usage, and not much less than that, because a lower limit increases the runtime and phase 2 seems to have no checkpoints. With this, at least B1=2.9e9 up to C400 is possible. Tests for larger B1 are still running.
#19
Oct 2006
Berlin, Germany
617 Posts
Which command-line options do I have to use to run only phase 1, or only phase 2, for a given composite and B1?
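If I remember the GMP-ECM conventions correctly, stage 1 only is requested by passing B2 = 1 together with `-save`, and stage 2 only by `-resume` on the saved file. A sketch (placeholder composite and bounds; the commands are printed rather than executed):

```shell
# stage 1 only: B2=1 skips stage 2, -save writes the residue for later
stage1="echo '10^140+13' | ecm -save stage1.sav 29e8 1"

# stage 2 only: -resume reads the composite and residue from the save file;
# repeat the same B1 and let ecm choose its default B2
stage2="ecm -resume stage1.sav 29e8"

printf '%s\n%s\n' "$stage1" "$stage2"
```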
#20
"Dave"
Sep 2005
UK
23×347 Posts |
#21
Oct 2006
Berlin, Germany
617 Posts
If stage 1 is done on the GPU of a single computer and stage 2 on multiple computers, can the stage 1 save file then also be used on the many computers which run stage 2?
#22
(loop (#_fork))
Feb 2006
Cambridge, England
7²·131 Posts
Yes, you can in theory run each line from the stage 1 save file on a separate computer; I have often sliced up the file and run stage 2 on 48 cores.

Last fiddled with by fivemack on 2016-12-05 at 11:56
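The slicing itself is just line surgery on the save file, something like the following (file names and curve counts are made up, and the stand-in residue lines are obviously not real ones):

```shell
# A real save file from 'ecm -save' holds one residue line per curve.
seq 1 96 | sed 's/^/RESIDUE_/' > stage1.sav   # stand-in for 96 stage-1 residues
split -l 2 -d stage1.sav part_                # 48 slices of two curves each
echo "slices: $(ls part_* | wc -l)"
# each core/machine then runs:  ecm -resume part_NN 29e8
```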
Similar Threads
| Thread | Thread Starter | Forum | Replies | Last Post |
| P73 found by a yoyo@home user | yoyo | GMP-ECM | 3 | 2017-11-08 15:20 |
| yoyo@home and wikipedia | Dubslow | Factoring | 1 | 2015-12-06 15:56 |
| Yoyo factors | 10metreh | ElevenSmooth | 20 | 2013-05-14 03:27 |
| Anyone want to compile an OS X ecm for yoyo? | jasong | GMP-ECM | 1 | 2009-03-14 11:22 |
| Second yoyo-BOINC factor | wblipp | ElevenSmooth | 0 | 2009-02-20 00:25 |