#12
Nov 2010
2×7 Posts
Mini-geek and Jason, thank you so much to you both; you've been incredibly helpful. :)

As it is impossible for me to access the computers over the weekend (the room is locked) and I anticipate the relations to be at 30% by Monday morning, it is unlikely that changing the polynomial will make the sieving stage any faster, but I will do the math and find out. From what you have both said, I think splitting the linear algebra step is a bit beyond me, but if I am nearing the deadline then I will run square roots for different dependency numbers on different machines.

I just have one more short question for now. I'm not sure how familiar you are with the factmsieve Python script that switches between msieve and GGNFS for different stages, but that is what I am using. This script has built-in support for multiple clients (hence how I am using multiple computers), but when you start each client you have to specify which number client that one is and also the total number of clients. My question is this: now that I have started the sieving with 10 clients (by entering the command 'factmsieve.py 1 10', '2 10', '3 10', etc. on each machine), can I add an 11th or 12th client by just putting '11 12' or '12 12'? Or will that not work / screw up the whole thing, seeing as the other 10 think the maximum is 10? Sorry if that isn't very clear, but I hope you get what I am saying.
#13
Account Deleted
"Tim Sorbera"
Aug 2006
San Antonio, TX USA
1000010101011₂ Posts
From the factmsieve.py file, this is how the splitting works: Code:
# For multiple clients, the q search space is divided
# into major blocks of length num_clients * fact_p['qstep'] so
# that major block i starts and ends at:
#
# QSTART + i * num_clients * fact_p['qstep']
# QSTART + (i + 1) * num_clients * fact_p['qstep'].
#
# Within each such major block, client k sieves the
# (k - 1)'th fact_p['qstep'] block. It then proceeds to its
# fact_p['qstep'] block within the next major block.
Last fiddled with by Mini-Geek on 2010-11-20 at 17:55
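That comment can be turned into a small sketch (the function name and example values below are illustrative, not from the script) that computes which q-range a given client sieves. It also shows why starting an 11th client with a new total of 11 goes wrong: changing num_clients changes the major-block stride, so the new client's blocks overlap ranges the original 10 are already covering.

```python
def client_block(qstart, qstep, num_clients, client_k, major_i):
    """Return (start, end) of the q-range client_k sieves in major block
    major_i, per the factmsieve.py comment. Clients are numbered
    1..num_clients, as on the command line."""
    major_start = qstart + major_i * num_clients * qstep
    start = major_start + (client_k - 1) * qstep
    return start, start + qstep

# With 10 clients, client 3's range in major block 0:
print(client_block(1_000_000, 10_000, 10, 3, 0))   # (1020000, 1030000)

# An 11th client told "11 of 11" computes its block with an 11-wide
# stride, landing exactly on the range client 1 of 10 sieves in major
# block 1 -- duplicated work, not new coverage:
print(client_block(1_000_000, 10_000, 11, 11, 0))  # (1100000, 1110000)
print(client_block(1_000_000, 10_000, 10, 1, 1))   # (1100000, 1110000)
```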
#14
"Frank <^>"
Dec 2004
CDP Janesville
2×1,061 Posts
Last fiddled with by schickel on 2010-11-20 at 18:25 Reason: Minor edit for clarity
#15
(loop (#_fork))
Feb 2006
Cambridge, England
7²×131 Posts
Last fiddled with by fivemack on 2010-11-20 at 23:24
#16
Nov 2010
2·7 Posts
Ah, OK, so fivemack, I guess I'll be very lucky if I manage to do it in 4 days with my i7-920.

So if I set a new group of computers to do a different range, when the first group reach that range will they ignore it automatically (they are all writing to the same dat file)? Or will I need to stop them and reset them at a new range?

I think my best bet is to stop all the machines when I get access to them on Monday morning and then just restart them all along with a few extra ones. My only concern with this is that the ones I stop will all create resume files. How will that affect things if I then pick up with more than 10 machines?
#17
(loop (#_fork))
Feb 2006
Cambridge, England
7²×131 Posts
Aargh! Having all the machines writing to the same dat file is in general a bad idea; it increases the opportunities for data corruption while not gaining anything.
I would strongly recommend that you write your own scheduling tools for this problem, if only because it's a useful exercise to write scheduling tools, and because it's profitable to be able to blame nobody but yourself when you accidentally set two rooms full of machines to sieving the same region. Not that I've done that more than twice.

The i7/920 is (because of the three memory buses) an almost ideal machine for doing large GNFS jobs: the matrix work on a C160 that I did three weeks ago (7124576×7124803) took only 70 hours using -t4 on my i7/920. 55603893 relations, 44226266 unique; gnfs-lasieve4I15e for the sieving, 29-bit large primes, 3LP on the algebraic side, alim=rlim=25e6.
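In the spirit of fivemack's suggestion, here is a minimal sketch of what such a scheduling tool might look like: a log file records every q-range handed out, and each machine asks for the next free range before sieving. Everything here (the log format, the starting q, the function name) is a made-up illustration, not part of factmsieve.py or GGNFS.

```python
def next_range(log_path, qstep):
    """Reserve the next qstep-sized special-q range and return (start, end).

    The log file holds one "start end" pair per line; the next range
    begins where the highest previously reserved range ended, so two
    machines reading the same log never sieve the same region.
    """
    try:
        with open(log_path) as f:
            last_end = max(int(line.split()[1]) for line in f if line.strip())
    except (FileNotFoundError, ValueError):
        last_end = 10_000_000  # hypothetical starting special-q

    start, end = last_end, last_end + qstep
    with open(log_path, "a") as f:
        f.write(f"{start} {end}\n")  # record the reservation
    return start, end
```

A real version would need file locking if several machines share the log over a network filesystem; the sketch only shows the bookkeeping idea.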
#18
"Frank <^>"
Dec 2004
CDP Janesville
2·1,061 Posts
#19
Nov 2010
2×7 Posts
As schickel rightly points out, the Python script makes all the clients write to spairs files, which the master machine periodically transfers to the main dat file. So I don't need to worry about any scheduling tools, right..?

fivemack, out of interest, why did you use -t4 with both your dual Opteron and i7 post-processings? I mean, in both cases you had 8 threads available, so were you just leaving some threads for general tasks, or is there a problem with using more than 4? I only ask because my i7s are currently sieving using 8 threads; would 4 actually make the process quicker?

It sounds like adding a second cluster into the mix is going to cause more trouble than it's worth. Would my idea of stopping the first cluster and then restarting with a larger cluster work fine? I'm still worried about how there will be 10 resume files but more than 10 machines working on it.

Thanks for your time guys :)
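The relay Anthony describes (each client writes relations to its own spairs file; the master appends them to the main dat file) can be sketched like this. The file names ("spairs.out.*") and the truncate-after-copy behaviour are illustrative assumptions, not the script's actual implementation.

```python
import glob

def collect_spairs(dat_path, spairs_glob="spairs.out.*"):
    """Append each client's spairs file to the main .dat file, then
    truncate the spairs file so its relations are not copied twice.
    Returns the number of relation lines appended."""
    appended = 0
    with open(dat_path, "a") as dat:
        for path in sorted(glob.glob(spairs_glob)):
            with open(path) as f:
                for line in f:
                    dat.write(line)
                    appended += 1
            open(path, "w").close()  # truncate after a successful copy
    return appended
```

Because only the master ever appends to the dat file in this scheme, the clients never contend for it, which is why the per-client spairs files remove the corruption worry fivemack raised about many machines writing one file directly.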
#20
(loop (#_fork))
Feb 2006
Cambridge, England
6419₁₀ Posts
I think you should probably be testing this sort of thing yourself rather than asking us here; it uses a little compute time, of which you have a reasonable amount, and the result tends to stick with you better.

Just out of curiosity, is it Kevin Buzzard lecturing for the class asking you to factor a C160?

Tom
#21
Nov 2010
2×7 Posts
Tom, no, actually it is not; I don't think he lectures for any first-year courses (at least not the ones I have started so far). Why do you ask?

I will look into the Python script's code, but to be honest I have very little programming experience, so I will struggle to have any idea what is going on... Thanks for the advice about the number of threads to use; it makes sense and I will bear it in mind for the different stages.

You raise a fair point about me testing this stuff myself... and believe me, I am trying. It's just a bit of a struggle when barely a week ago I didn't even know what modular arithmetic was...

Anthony
#22
Nov 2010
2×7 Posts
OK, I have tested out my idea of stopping the 10 clients and then restarting with an 11th, specifying the maximum number of clients as 11, but the 11th just starts sieving right from the beginning again. So it is looking like the only way I will be able to use more machines is if I specify different ranges for them. Can I do this through the Python script?

Also, how would I merge the relations this second cluster would find with those from the first cluster? Is it simply a case of copying and pasting the contents of one dat file into the other?
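On the merging question: since relation files are plain text with one relation per line, combining two dat files really is just concatenation, optionally dropping exact duplicate lines along the way. A hedged sketch (function name and the in-memory dedup approach are illustrative; for the tens of millions of relations in a real C160 job, msieve's own filtering stage removes duplicates anyway, so plain concatenation is usually enough):

```python
def merge_relations(paths, out_path):
    """Concatenate relation files into out_path, skipping lines already
    seen. Returns the number of unique relation lines written.

    Note: keeps every line in memory, so this only scales to modest
    files; for a full-size sieving job, concatenate and let msieve's
    duplicate removal do the work.
    """
    seen = set()
    kept = 0
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as f:
                for line in f:
                    if line not in seen:
                        seen.add(line)
                        out.write(line)
                        kept += 1
    return kept
```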
Similar Threads

| Thread | Thread Starter | Forum | Replies | Last Post |
| A couple newbie questions | evanmiyakawa | Information & Answers | 4 | 2017-11-07 01:37 |
| new here with a couple questions | theshark | Information & Answers | 21 | 2014-08-30 17:36 |
| 2^877-1 polynomial selection | fivemack | Factoring | 47 | 2009-06-16 00:24 |
| Polynomial selection | CRGreathouse | Factoring | 2 | 2009-05-25 07:55 |
| A couple questions from a new guy | Optics | Information & Answers | 8 | 2009-04-25 18:23 |