mersenneforum.org Some CADO-NFS Work At Around 175-180 Decimal Digits

2020-04-06, 20:55   #1
EdH

"Ed Hall"
Dec 2009

5×13×53 Posts
Some CADO-NFS Work At Around 175-180 Decimal Digits

This thread will be the new home for some posts from the Comparison of GNFS/SNFS With Quartic (Why not to use SNFS with a Quartic) thread. This new thread (with moved posts) has been created to continue the discussion outside the blog area.

There may be slight overlap, with the possibility of a couple of duplicate posts, but all new posts should be made in this thread.

Quote:
 Originally Posted by VBCurtis Once the matrix sizes exceed 10M, I think it's pretty important to get off the default matrix density of 70. If you retained the data, I suggest you explore this by setting the flag target_density=100 in your msieve filtering invocation. I've found positive results (measured by matrix ETA) up to density 120, while the NFS@home solvers often use 130 or even 140. I think most of the gains come going from 70 to 90 or 100. A pleasant (and possibly more important side effect) is that it's harder to build a matrix with higher density, which acts as a nice measure of whether you have "oversieved enough". Your second msieve run will almost surely build at TD 100, and I bet the matrix will be smaller by 1M dimensions or so. That might only take a day off the matrix ETA, but saving a day for "free" is still worthy!
I should be able to give this a try. I haven't cleaned up anything yet and will be interrupting all the SNFS jobs in favor of letting the GNFS job run to completion.
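For reference, a hedged sketch of what that second filtering run might look like (file names here are placeholders; `-nc1` re-runs only the filtering stage against the relations already in the save file):

```
msieve -v -i c168.n -s c168.dat -l c168.log -nc1 "target_density=100"
```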

BTW, are you working on Improved params for the 165-175 digit range at all?

Last fiddled with by EdH on 2020-04-20 at 16:22 Reason: Add initial explanation of thread creation.

2020-04-06, 23:12   #2
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

2·7·11·29 Posts

I've just started back on building params files, with an eye toward extending the patterns of what-increases-when from the C100-140 files up into C160. Let me know what size you'd like, I'll be happy to put one together for you!

I'm running trials personally on C130-140 presently to refine those files and test A=26 vs I=13 and I=14; I think 165-170 will be a spot to test I=14 vs A=28, and I would appreciate some testing of that setting on one of my params files.
2020-04-07, 00:15   #3
EdH

"Ed Hall"
Dec 2009

110101110101₂ Posts

Quote:
 Originally Posted by VBCurtis I've just started back on building params files, with an eye toward extending the patterns of what-increases-when from the C100-140 files up into C160. Let me know what size you'd like, I'll be happy to put one together for you! I'm running trials personally on C130-140 presently to refine those files and test A=26 vs I=13 and I=14; I think 165-170 will be a spot to test I=14 vs A=28, and I would appreciate some testing of that setting on one of my params files.
The one I've got running LA right now is a 168 dd HCN, for which I used the default 170 params file. The trouble I foresee is that my CADO-NFS server crashed with memory woes. My hybrid CADO/msieve setup would not give you complete data, and that is what I would probably need to use for the 175-176 HCNs that are next in line.

However, if you'd like to just toss something together roughly for me at the 175 level, I could see how a 175/176 compares to the 168 I'm currently finishing. You could maybe slant it toward my large siever count vs. single LA machine.

I wish I kept better notes! I'm pretty sure I found that I could still use mpi to run msieve LA across two machines, if one didn't have enough memory. I only have gigabit currently and am pretty sure that caused a bit of slow-down, but it wasn't excessive for just two nodes. That may very well only be "wishful thinking," though.

2020-04-07, 02:55   #4
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

1000101110010₂ Posts

Quote:
 Originally Posted by EdH However, if you'd like to just toss something together roughly for me at the 175 level, I could see how a 175/176 compares to the 168 I'm currently finishing. You could maybe slant it toward my large siever count vs. single LA machine.
This is just the sort of thing I'd be happy to do for you.

175 is a size where CADO chooses I=14, but I think A=28 or maybe I=15 are better choices. However, A=28 uses twice the memory of I=14, and I=15 is double again. Do you have enough RAM to run I=15? I think it's around 2.5GB per process, and you can choose the number of threads per process on the client command line with "--override t 4" to run, e.g. 4-threaded.

A=28 should be under 1.5GB per process; but I think you recently mentioned you are running an older CADO that doesn't recognise A... so we're choosing between I=14 and 15?
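A rough sketch of the doubling, assuming (as I understand CADO's sievers) that I=n covers a region of 2^(2n-1) points while A=m covers 2^m, so I=14 corresponds to A=27 and I=15 to A=29; memory scales roughly with region size:

```shell
# Relative sieve-region sizes; memory use scales roughly in proportion.
area_I14=$((1 << 27))   # I=14 is equivalent to A=27
area_A28=$((1 << 28))   # A=28: twice the I=14 region
area_I15=$((1 << 29))   # I=15 is equivalent to A=29: twice again
echo $((area_A28 / area_I14))   # prints 2
echo $((area_I15 / area_A28))   # prints 2
```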

2020-04-07, 03:11   #5
EdH

"Ed Hall"
Dec 2009

5×13×53 Posts

Quote:
 Originally Posted by VBCurtis This is just the sort of thing I'd be happy to do for you. 175 is a size where CADO chooses I=14, but I think A=28 or maybe I=15 are better choices. However, A=28 uses twice the memory of I=14, and I=15 is double again. Do you have enough RAM to run I=15? I think it's around 2.5GB per process, and you can choose the number of threads per process on the client command line with "--override t 4" to run, e.g. 4-threaded. A=28 should be under 1.5GB per process; but I think you recently mentioned you are running an older CADO that doesn't recognise A... so we're choosing between I=14 and 15?
I think almost all of my machines are at least 6GB, with the largest three maxed out at 16GB. I'm currently having each machine run one instance using all of that machine's threads. I have a client script that gets the CPU count and uses that to determine the override.
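A minimal sketch of such a wrapper, assuming the `--override t` syntax mentioned above (the server URL is a placeholder; the script only prints the command here so it can be checked before running it for real):

```shell
# Determine the local CPU count and build the client invocation with a
# matching thread override; echo it rather than executing it directly.
NCPU=$(nproc)
CMD="./cado-nfs-client.py --server=https://server.lan:41000 --override t $NCPU"
echo "$CMD"
```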

2020-04-07, 17:17   #6
EdH

"Ed Hall"
Dec 2009
Adirondack Mtns

5·13·53 Posts

The 176 digit job is running with the modified params. I always find the first ETA returned entertaining:
Code:
Info:Lattice Sieving: Marking workunit c175_sieving_650000-660000 as ok (0.0% => ETA Mon Oct 19 20:46:32 2020)
2020-04-07, 18:08   #7
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

10562₈ Posts

For a job big enough to produce a 20M+ matrix, failing to build at TD100 is an indication that more relations are needed. It's not that density 100 will shrink the matrix some magical amount; rather, having enough relations to build at 100 or 110 will also be enough relations to shrink the matrix another 10% or so. As a guess, 5% more relations would shrink the matrix 10%. Not usually a great tradeoff, but when you have a sieving farm it's surely a plus for you!

A note on the GNFS175 file: Qmin is so small that the ETA will start out at about 2/3rds of what it will actually take to gather the relations. The sievers are really efficient at small Q, at the cost of some extra duplicates and CADO making empty promises about a fast job.
2020-04-08, 14:39   #8
EdH

"Ed Hall"
Dec 2009
Adirondack Mtns

D75₁₆ Posts

Some CADO-NFS questions and experiment points I'm pondering; any thoughts are quite welcome:

1. CADO-NFS provides a summary when it completes. I would typically abort the CADO-NFS LA stage if the msieve LA is successfully running with a substantially earlier ETA. Should I let this one finish via CADO-NFS to provide a full dataset for you?

2. For the previous SNFS job, I was able to invoke a duplicate server on a Colab instance to sieve the area from 100k-150k and add those relations to the set provided to msieve. But that used a provided polynomial. For this GNFS job, what CADO-NFS files would I need to invoke a parallel CADO-NFS server in a Colab instance in a similar manner to before? Would I need more than the snapshot file (modified for the qmin)?

3. I am toying with the idea of using an RPi as a proxy for a CADO-NFS client. Is there a way I can have a machine (an RPi, or another incapable of meeting the memory requirements) act as a go-between with a Colab instance? Basically, I want the RPi to be seen as the client, which picks up the WUs and reassigns them to the Colab instance, then retrieves the WU results and uploads them to the server. I can copy files between the RPi and the Colab instance, but I can't run the Colab instance as a client to my server. (I actually don't want to open the server to machines outside my LAN.)
2020-04-08, 17:37   #9
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

2·7·11·29 Posts

1. This only matters if we plan to iterate multiple factorizations of similar size to compare various params settings; otherwise, your timing data doesn't tell us much, since there is little to compare to. If you have some elapsed (e.g. wall clock) time for the C168ish you did with the default CADO file, we can see if my C175 file did better than the observed double-every-5.5-digits typical of CADO. So, I wouldn't bother letting it finish, but I would try to record CADO's claim of sieve time from right before it enters filtering.

2. I believe you need to give it the poly also; either the .poly file in the same folder (which the snapshot should reference), or by explicitly declaring the poly the same way you did for the SNFS poly (tasks.poly = {polyfilename}, if I recall). Either way, you'll need to copy the poly file to the colab instance.

3. Far beyond my pay-grade in both networking and CADO knowledge, sorry.
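As a hedged sketch of point 2 (file names are placeholders, and the `tasks.poly` spelling follows the recollection above rather than verified documentation), resuming on the Colab instance might look like:

```
./cado-nfs.py c175.parameters_snapshot.0 tasks.poly=c175.poly
```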
2020-04-08, 18:52   #10
charybdis

Apr 2020

161₈ Posts

Quote:
 Originally Posted by VBCurtis If quotes, no hyphen. EDIT: Actually, no hyphen at all for that setting. The biggest mystery for a C175 file is the number of relations to target; but since you have an army of sievers and not much matrix power, I put the relations count a fair bit higher than I think strictly necessary. I think you could get away with 250M relations, but 270 should make a much nicer matrix.
I've been following this discussion a bit and I'd like to do one of the homogeneous Cunningham c177s with your parameters (and ~160 cores for sieving), but to give a bit of variety I'm thinking of doing it with A=28. Are there any other changes I ought to make to compensate for the smaller sieve region?

2020-04-08, 19:18   #11
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

1000101110010₂ Posts

Quote:
 Originally Posted by charybdis I've been following this discussion a bit and I'd like to do one of the homogeneous Cunningham c177s with your parameters (and ~160 cores for sieving), but to give a bit of variety I'm thinking of doing it with A=28. Are there any other changes I ought to make to compensate for the smaller sieve region?
I think A=28 is optimal for this size, but this is really just a guess, so I'm happy to hear you'll try it!
Here's what I would change, and why:
The duplicate rate is often a bit higher when using a smaller siever, so you may need more than 270M relations. I estimate a matrix would build for Ed on I=15 with 250M, and I added 20M because he uses a farm to sieve but his "main" machine isn't very fast, so he is willing to sacrifice some sieve time to reduce matrix time. Our experience with ggnfs sievers is that 10-15% more relations are needed on 14e than on 15e; since A=28 is halfway in between, we can guess 5-8% more relations will be needed. 8% more than 250M is 270M, so if you don't mind a long-ish matrix you could leave it at 270M. I would personally choose 285M for A=28, and see what happens.
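The arithmetic above as a quick sketch (figures in millions of relations, all taken from the estimates in this post):

```shell
base=250                       # relations (M) estimated to build a matrix at I=15
upper=$((base * 108 / 100))    # +8% duplicate-rate penalty guessed for A=28
echo "${upper}M"               # prints 270M
```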

If yield isn't very good, you can relax the lambda settings a bit, like 0.01 each. This will increase the relations required, though; those complicated interactions between lambda/sieve speed/relations needed are why I do 8-10 factorizations at a given size before publishing parameters.

I would also increase each lim by 15% or so, say to lim0=105M and lim1= 160M. I don't have a good reason for this, other than that ggnfs sievers see yield fall off markedly when Q > 2 * lim0. Even with CADO, I have found that choosing lim's such that Q sieved does not exceed lim1 is always faster than otherwise (where "always" is for all tests below 160 digits). I believe Ed's I=15 job should finish when Q is in the 100-130M range. Using A=28 will need roughly 50% more Q, 150-190M as final Q. So I'm suggesting lim1 equal to my guess at final Q; note that since you're doing C177 rather than C175, you might add another 10% to both lim's, to e.g. 115M and 175M.

Larger lim's improve yield (relations per Q-range) at the expense of a little speed.

Finally, 2 extra digits of difficulty is about 25% harder, so I'd add 25% to poly select: Change admax from 12e5 to 15e5.
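Collecting the suggested changes into a params-file fragment (a sketch only; the parameter names follow CADO-NFS conventions, and every value here is an estimate from this post, not a tuned setting):

```
tasks.A = 28
tasks.lim0 = 115000000
tasks.lim1 = 175000000
tasks.polyselect.admax = 15e5
tasks.sieve.rels_wanted = 285000000
```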

If you'd like to contribute to refining these parameters going forward, I'd like to know the final Q sieved, the number of relations you generated (that is, the rels_wanted you chose), the number of unique relations, and the matrix size (total weight is a proxy for size, but it's nice to have both row count and total weight). Timing info is only useful if you plan to factor multiple numbers with your setup; obviously, if you do a second one of similar size, say within 3 digits, we can compare the timings and conclude which params were better.

Good luck!
