Question about ggnfs. I am currently in the sieving stage of a 154 digit number. Poly selection took about 4 days and I have a decent one selected.
Now as I run the sieve I am curious about the .dat file that is created. The job has been running for about 26 or 27 hours on 21 Intel Xeon CPUs and the job.dat file is already 5.3 GB. How much space do you figure I will need for a factoring job this big? Also, I see this in the output in the ggnfs.log file and am trying to figure out what it means: [code]Wed Mar 17 13:40:22 2010 Msieve v. 1.45
Wed Mar 17 13:40:22 2010 random seeds: 9e01d9ec 1d44eb7b
Wed Mar 17 13:40:22 2010 factoring 6813377766757638164918650305665391545877815056634620577957683139030334314048355246578767633356280078928552022932140281258043983076447823479268400293856367 (154 digits)
Wed Mar 17 13:40:25 2010 no P-1/P+1/ECM available, skipping
Wed Mar 17 13:40:25 2010 commencing number field sieve (154-digit input)
Wed Mar 17 13:40:25 2010 R0: -1156038696359091884229749817581
Wed Mar 17 13:40:25 2010 R1: 193522735996815187
Wed Mar 17 13:40:25 2010 A0: -77511246416842652782947827243153982720
Wed Mar 17 13:40:25 2010 A1: 52845910196937376909035045081456
Wed Mar 17 13:40:25 2010 A2: 64530037137566250198775404
Wed Mar 17 13:40:25 2010 A3: -6266879667492455540
Wed Mar 17 13:40:25 2010 A4: -604994336743
Wed Mar 17 13:40:25 2010 A5: 3300
Wed Mar 17 13:40:25 2010 skew 10062842.74, size 5.537956e-15, alpha -6.828280, combined = 3.008750e-12[/code]This looks like the poly selection output but keeps showing up during sieving. After that it commences relation filtering and duplicate removal. All is good but I am just curious I guess. --
[QUOTE=sleigher;208692]This looks like the poly selection output but keeps showing up during sieving. After that it commences relation filtering and duplicate removal. All is good but I am just curious I guess.
--[/QUOTE]A little more info is needed. The poly information is output each time the post-processing is tried. Unfortunately, the perl script is broken with regard to how often to try post-processing. Post the following lines and we can figure out how far along you are: [INDENT]o The FRMAX/FAMAX from the msieve.fb file,
o The line from msieve.log that says "found xxx duplicates and xxxx unique relations",
o The line from the bottom of msieve.log that says "reduce to xxx relations and xxx ideals in xx passes"[/INDENT](You can disregard the part about "need 1000000 more relations"....it says 1M more until you get very close....)
Here it is. Not sure about FRMAX/FAMAX though. I don't see it.
Here is the entire fb file from the poly stage. [code]N 6813377766757638164918650305665391545877815056634620577957683139030334314048355246578767633356280078928552022932140281258043983076447823479268400293856367
SKEW 10062842.74
R0 -1156038696359091884229749817581
R1 193522735996815187
A0 -77511246416842652782947827243153982720
A1 52845910196937376909035045081456
A2 64530037137566250198775404
A3 -6266879667492455540
A4 -604994336743
A5 3300
skew: 10062842.74
type: gnfs[/code]I cannot be that far along but I do want to understand the output. [code]found 86819723 duplicates and 11200677 unique relations
reduce to 33 relations and 0 ideals in 3 passes[/code]Thanks!
[QUOTE=sleigher;208749]Here it is. Not sure about FRMAX/FAMAX though. I don't see it.[/quote]
It's only used for line sieving with msieve. It doesn't matter that it isn't there. [quote][code]found 86819723 duplicates and 11200677 unique relations
reduce to 33 relations and 0 ideals in 3 passes[/code]Thanks![/QUOTE] That is a massive number of duplicates. Which siever did you use?
I am using ggnfs and the perl script that calls it. Using the perl script to spread it across hosts. Why do you think there are so many duplicates?
[code]gnfs-lasieve4I14e -k -o spairs.out3.T1 -v -n3 -a bignum.job.3.T1[/code]
[QUOTE=sleigher;208774]I am using ggnfs and the perl script that calls it. Using the perl script to spread it across hosts. Why do you think there are so many duplicates?
[code]gnfs-lasieve4I14e -k -o spairs.out3.T1 -v -n3 -a bignum.job.3.T1[/code][/QUOTE] That's [B]86 million[/B] duplicates and [B]11 million[/B] unique relations, i.e. ~8 times as many duplicates as unique relations. That indicates something has gone wrong; maybe you have duplicated (or quintuplicated) the work on some subranges? (No problem for the further work on the factorization, but the duplicates indicate that you have spent ~7-8 times more time than necessary.) A normal amount of duplicates is roughly 10% (for some factorizations even 20%), i.e. ~1-2M duplicates, but a duplication rate of ~700% is *very* unusual.

Edit: [quote]reduce to 33 relations and 0 ideals in 3 passes[/quote]This is a "normal" step in relation filtering, which failed because you don't have enough relations yet. For a C154 I guess that you need approx. 30-40 million unique relations (please correct me if I'm wrong - I am currently not at home to check my Excel file). Make sure that you don't duplicate your subranges; then the rest of the job should take less time than you have already spent. Using the 14e siever for a C154 seems to be OK.

Edit2: Did you start the perl script on multiple computers (i.e. invoke the perl script separately on each computer)? In that case every computer would start sieving at the same point and thus get exactly the same relations.
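The ~700% figure quoted above can be reproduced straight from the msieve.log line. This is just a hedged one-liner sketch (the log-line wording is taken from the post above; field positions in the awk calls assume exactly that wording):

```shell
#!/bin/sh
# Sketch: pull the two counts out of the msieve.log line quoted above
# and print the duplicate-to-unique ratio. Anything much above ~0.1-0.2
# suggests overlapping sieving ranges.
line="found 86819723 duplicates and 11200677 unique relations"
dups=$(echo "$line" | awk '{print $2}')
uniq=$(echo "$line" | awk '{print $5}')
awk -v d="$dups" -v u="$uniq" 'BEGIN { printf "duplicate ratio: %.2f\n", d/u }'
# prints "duplicate ratio: 7.75"
```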
Yes, I did start each job separately on each computer. According to what I read that is what I was supposed to do.
Like this: [code]host 1: /usr/bin/perl factmsieve.pl bignum 1 3
host 2: /usr/bin/perl factmsieve.pl bignum 2 3
host 3: /usr/bin/perl factmsieve.pl bignum 3 3[/code]All from the same NFS mounted directory. Was that not the right way to go? Should I stop and start over? I wanted to distribute the load so it would take less time. :(
[QUOTE=sleigher;208795]Yes, I did start each job separately on each computer. According to what I read that is what I was supposed to do.
Like this: [code]host 1: /usr/bin/perl factmsieve.pl bignum 1 3
host 2: /usr/bin/perl factmsieve.pl bignum 2 3
host 3: /usr/bin/perl factmsieve.pl bignum 3 3[/code]All from the same NFS mounted directory. Was that not the right way to go? Should I stop and start over? I wanted to distribute the load so it would take less time. :([/QUOTE] I don't think that you have to start over - you just have to figure out which subranges have actually been done, and then continue from that point. To do this, I would suggest that you look in your NFS mounted directory for files ending in ".job.T1", ".job.T2", etc. These files should look like this (of course with your number, poly and parameters - the file I post here is just from a small example factorization): [code]n: 1082459403348015097680203453036340709782567234943890172824341268095745764903671381
m:
Y0: -28591131987968371775
Y1: 8664281557
c0: -1863912027185931608438644
c1: 42170438213405082472
c2: 24042628797369
c3: -354483612
c4: 1620
skew: 262928.24
rlim: 350000
alim: 400000
lpbr: 24
lpba: 24
mfbr: 37
mfba: 37
rlambda: 1.7
alambda: 1.7
[B]q0: 400001[/B]
qintsize: 4999
[B]#q1:405000[/B][/code]Look for the lines beginning with "q0:" and "#q1:" - these are the subranges which have been sieved most recently. Look for the biggest #q1 number you find - this is most likely the upper end of the range you have already sieved, so you have to continue from this point. I am not very familiar with using factmsieve.pl for distributed sieving, so maybe someone else could help out please, but it seems that you have indeed duplicated your work instead of distributing the workload. Usually I distribute the jobs with manual inputs (and coordinate the subranges with an Excel file to avoid duplication).
These inputs might look like this: [CODE]Host 1: ./gnfs-lasieve4I14e -a bignum.job -o outputfile.20M.25M.out -f 20000000 -c 5000000
Host 2: ./gnfs-lasieve4I14e -a bignum.job -o outputfile.25M.30M.out -f 25000000 -c 5000000
etc....[/CODE](note: if a host has, for example, 2 CPUs, then open 2 bash windows and start one of these jobs in each bash) where:[INDENT]o bignum.job is one of your ".job.T1" files WITHOUT the lines beginning with "q0", "qintsize" and "#q1",
o outputfile is an individual output file for each job - I usually add the subrange to the filename, to see what the file contains when I look at it later on,
o the number after [B]-f[/B] is the start of your subrange,
o the number after [B]-c[/B] is the length of your subrange.[/INDENT]The range of your first job should start at the highest q1 which you find in your .job.T1, .job.T2, etc. files. Note: Avoid overlapping (i.e. duplication) of your subranges.
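The per-host commands above follow a simple arithmetic pattern (each host's -f starts where the previous host's range ends). A throwaway script like this can print one non-overlapping command line per host; the start value, range size, and filenames here are examples, not values taken from this job:

```shell
#!/bin/sh
# Sketch: print one non-overlapping gnfs-lasieve4I14e invocation per host.
# start should be the highest #q1 already found in the .job.T* files;
# the numbers below are placeholders for illustration only.
start=20000000   # first special-q not yet sieved (an assumed example value)
size=5000000     # length of each host's subrange (-c)
hosts=3
i=0
while [ "$i" -lt "$hosts" ]; do
    f=$((start + i * size))
    # Each command only gets printed here; run it on the matching host.
    echo "host $((i + 1)): ./gnfs-lasieve4I14e -a bignum.job" \
         "-o out.$f.$((f + size)).out -f $f -c $size"
    i=$((i + 1))
done
```

Because each range begins exactly where the previous one ends, no special-q is sieved twice, which is the whole point of the manual scheme.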
I am looking at relations though and I see the following from each job file.
These are in order from first job to last and it seems they are in the right order and not overlapping. [code]q0: 20950000 #q1:20964285
q0: 20964285 #q1:20978570
q0: 20978570 #q1:20992855
q0: 20992855 #q1:21007140
q0: 21007140 #q1:21021425
q0: 21021425 #q1:21035710
q0: 21035710 #q1:21049995
q0: 21650000 #q1:21664285
q0: 21664285 #q1:21678570
q0: 21678570 #q1:21692855
q0: 21692855 #q1:21707140
q0: 21707140 #q1:21721425
q0: 21721425 #q1:21735710
q0: 21735710 #q1:21749995
q0: 21750000 #q1:21764285
q0: 21764285 #q1:21778570
q0: 21778570 #q1:21792855
q0: 21792855 #q1:21807140
q0: 21807140 #q1:21821425
q0: 21821425 #q1:21835710
q0: 21835710 #q1:21849995[/code]I also see this in the ggnfs.log file: [code]-> makeJobFile(): Adjusted to q0=21250000, q1=21350000.
-> client 1 q0: 21250000
-> makeJobFile(): Adjusted to q0=22050000, q1=22150000.
-> client 3 q0: 22050000[/code]So it looks like it is adjusting certain clients to different ranges as it needs to.
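Eyeballing a long q0/#q1 list is error-prone; sorting the pairs and comparing each start against the previous end catches overlaps mechanically. This is only a sketch, using a few of the pairs from the post above as sample input (in practice you would feed it pairs extracted from the .job.T* files):

```shell
#!/bin/sh
# Sketch: read "q0 q1" pairs (one subrange per line), sort by q0, and flag
# any range that starts before the previous one ended. Adjacent ranges where
# q0 equals the previous #q1 are fine (ranges are half-open).
printf '%s\n' \
  '20950000 20964285' \
  '20964285 20978570' \
  '21650000 21664285' \
  '21664285 21678570' \
| sort -n \
| awk 'NR > 1 && $1 < prev { print "overlap at q0=" $1; bad = 1 }
       { prev = $2 }
       END { if (!bad) print "no overlaps found" }'
# prints "no overlaps found"
```

Note this only proves the ranges within one directory are disjoint; it says nothing about two hosts that each started the perl script from scratch and wrote identical ranges to differently named files.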
[QUOTE=sleigher;208795]So it looks like it is adjusting certain clients to different ranges as it needs to.[/QUOTE]Quick question: over on the msieve thread, you posted this:[QUOTE=sleigher;208570]okay, using the perl script seems to be working. I have 3 hosts with 7 cores apiece all running from the same NFS mounted directory.

Jobs are numbered properly and all is well. One of the hosts has CPUs that are a little faster than the other 2. Think that will matter?[/QUOTE]With multi-core PCs running, did you modify $NUM_CPUS in the configuration section at the top of the script? My first thought on seeing the number of duplicates you've got is that maybe the "multiple threads" code (using $NUM_CPUS) is stomping on the "multiple clients" code (using $CLIENT_ID). (The theory behind this advice is that AFAIK the two different methods were put in at different times by different people, and this re-working by Brian is the first major effort to rework the entire script; everything prior to this has been tinkering at the edges...) If you did set $NUM_CPUS to something higher, then set it back to 1 and, as Andi47 said, use a different command window for each separate thread you want to enable on each PC. You can still run everything out of the central directory, since the files are all named using different extensions as things progress.
I did in fact change $NUM_CPUS to 7 for each host. Each host is a dual quad core. So it isn't doing it properly then. Darn....
I will do what was suggested above and start each job in its own window. If I am going to track the ranges manually, what is a good range size for each job and how high do I go? 20 mil? 30 mil? It seems currently that the ranges are 100000. Stay with that?
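How high to go can be estimated with back-of-envelope arithmetic from the numbers already in this thread, but only roughly. Both figures below are assumptions: the 40M target is the upper end of Andi47's 30-40M guess for a C154, and the relations-per-special-q yield should be measured from your own siever output rather than taken from this sketch:

```shell
#!/bin/sh
# Back-of-envelope sketch: how wide a special-q range is still needed.
# Every number here is an assumption or taken from the thread above;
# measure your actual yield (relations found / special-q sieved) from
# the siever's own output before trusting the result.
target=40000000   # assumed unique-relation target for a C154
have=11200677     # unique relations so far, from msieve.log in this thread
yield=2           # ASSUMED relations per special-q; measure your own
awk -v t="$target" -v h="$have" -v y="$yield" \
    'BEGIN { printf "roughly %d more special-q to sieve\n", (t - h) / y }'
```

Duplicates and filtering losses mean the real requirement is somewhat higher, so treat the output as a lower bound and keep sieving until the filtering step stops complaining about missing relations.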