mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Factoring (https://www.mersenneforum.org/forumdisplay.php?f=19)
-   -   TARGET_DENSITY: some results (https://www.mersenneforum.org/showthread.php?t=13493)

fivemack 2010-06-05 11:16

TARGET_DENSITY: some results
 
This is using the 89999_243 dataset, which is an LP-31 SNFS job sieved by RSALS

There is a 'full dataset' of 226.9 million relations, and a reduced one of 200 million.

I ran filtering with target density 70, 80, 90, 100 and 110 (the -nc1 runs took between 12 and 17 kiloseconds). For each run, I let the linear algebra run for two hours on four threads of an otherwise-idle 8G Phenom 9850, then stopped it and extrapolated the total runtime from the rate at which dimensions were being finished. 'matrixW' is the width of the final matrix, i.e. the number of dimensions the linear algebra has to complete.

[code]
|T_D|matrixW | sparsewgt| est time|
| 70|13782379| 898135569|540 hours|
| 80|13269095| 971597713|504 hours|
| 90|12763909|1039891026|494 hours|
|100|12347739|1105886801|491 hours|
|110|Matrix production failed: repeated 'matrix is too sparse, restarting'|
[/code]
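The extrapolation step described above can be sketched as follows; this is a minimal illustration of the arithmetic (not msieve code), assuming you can read off how many dimensions the solver finished during the two-hour probe run. The probe numbers in the example are made up to reproduce the first row's ~540-hour figure:

```python
def estimate_la_hours(matrix_width, dims_done, probe_seconds):
    """Linearly extrapolate total linear-algebra time from a short probe run.

    matrix_width:  total number of dimensions the solver must complete
    dims_done:     dimensions finished during the probe run
    probe_seconds: length of the probe run in seconds
    """
    rate = dims_done / probe_seconds     # dimensions completed per second
    return matrix_width / rate / 3600.0  # extrapolated total time in hours

# Hypothetical probe: a 2-hour (7200 s) run finishing 51,000 of the
# 13,782,379 dimensions of the density-70 matrix above.
print(round(estimate_la_hours(13_782_379, 51_000, 7_200)))  # roughly 540
```

This assumes the per-iteration cost is constant over the whole solve, which is roughly true for block Lanczos on a fixed matrix.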

With the smaller dataset, I used densities 70, 80 and 90:

[code]
|T_D|matrixW | sparsewgt| est time|
| 70|15136972| 990865204|671 hours|
| 80|14405806|1069523450|646 hours|
| 90|13822424|1146559127|616 hours|
[/code]

So it sounds as if, for large jobs with msieve linear algebra on multi-threaded machines, the largest density at which a matrix can still be built is the one to go for, even though the width*weight runtime estimate would suggest otherwise.
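That observation can be checked directly against the full-dataset table above: the width*weight product rises monotonically with density even as the measured time estimates fall. A small sanity check in Python, using the numbers from the table:

```python
# (target density, matrix width, sparse weight, estimated hours),
# taken from the full-dataset table above
runs = [
    (70,  13_782_379,   898_135_569, 540),
    (80,  13_269_095,   971_597_713, 504),
    (90,  12_763_909, 1_039_891_026, 494),
    (100, 12_347_739, 1_105_886_801, 491),
]

for td, width, weight, hours in runs:
    print(f"TD {td:3d}: width*weight = {width * weight:.3e}, est {hours} h")

# width*weight grows with density...
assert all(a[1] * a[2] < b[1] * b[2] for a, b in zip(runs, runs[1:]))
# ...while the measured time estimates shrink
assert all(a[3] > b[3] for a, b in zip(runs, runs[1:]))
```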

For large/100, small/80 and small/90, the matrix-building step ran the 8G machine out of memory, so I ran -nc2 on a 32G machine and was then able to continue with -ncr on the 8G machine.

[b]a smaller case[/b]

9282.773: C130 gnfs, 27-bit large primes, about 25% oversieved (15.4M relations, 13.2M unique). Linear algebra with -t4 on an i7, run for half an hour and then killed:
[code]
|T_D|matrixW | sparsewgt| est time|
| 70|1084593|70721494|4901s|
| 80|1036739|76577238|5025s|
| 90| 998788|82190443|5026s|
|100| 966867|87749244|5126s|
|110| 940926|92911103|5265s|
|120|too few cycles, matrix probably cannot build|
[/code]

So for that case 70 is the right answer.
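For this smaller matrix the two orderings agree, which is consistent with picking the lowest workable density: both the width*weight product and the measured estimates grow with density. A quick check with the numbers above:

```python
# (target density, matrix width, sparse weight, estimated seconds),
# taken from the C130 table above
runs = [
    (70,  1_084_593, 70_721_494, 4901),
    (80,  1_036_739, 76_577_238, 5025),
    (90,    998_788, 82_190_443, 5026),
    (100,   966_867, 87_749_244, 5126),
    (110,   940_926, 92_911_103, 5265),
]

# Unlike the large job, width*weight and the measured time both rise with
# density here, so the lowest density that builds a matrix wins.
assert all(a[1] * a[2] < b[1] * b[2] and a[3] <= b[3]
           for a, b in zip(runs, runs[1:]))
```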

[b]Further research[/b]

I'm working on getting some more results using the M941 dataset, though it's a lot bigger (420.6 million unique relations), so the runs will take a good deal longer; I can run two (but not three) -nc1 jobs at a time on the 32G machine. It's much more oversieved, so it should be possible to push the target density higher. I will be doing the linear algebra on the i7, though that restricts the matrix to fitting in 12G.

fivemack 2010-06-17 09:18

M941 TARGET_DENSITY results
 
This thoroughly oversieved dataset could be pushed up to density 130. The running matrices still fit in 12GB, though not in 8GB; the construction of the matrices takes less than 16GB but not much less. Density 150 doesn't produce a matrix.

(density / final matrix size / sparse weight / estimated hours to do job on i7 -t4)

[code]
| 70|23116648 x 23116873|1506911489|991|
| 90|21207898 x 21208124|1748363936|956|
|110|19922444 x 19922669|1978439421|936|
|130|18982850 x 18983075|2197143316|923|
[/code]
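As a side note, dividing sparse weight by matrix width gives the achieved average column weight of the sparse part; on the assumption (mine, not stated in the thread) that target density means average nonzeros per column, the achieved figure comes in somewhat below each requested target:

```python
# (target density, matrix width, sparse weight) from the M941 table above
runs = [
    (70,  23_116_648, 1_506_911_489),
    (90,  21_207_898, 1_748_363_936),
    (110, 19_922_444, 1_978_439_421),
    (130, 18_982_850, 2_197_143_316),
]

for td, width, weight in runs:
    achieved = weight / width  # average nonzeros per column, sparse part
    print(f"target {td:3d} -> achieved average weight {achieved:.1f}")

# each achieved density lands within about 20 of its requested target,
# always on the low side
assert all(td - 20 < weight / width < td for td, width, weight in runs)
```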

Nothing very new here; it would seem to make sense, on projects at this sort of scale, to sieve until a density-110 matrix can be produced and then run that. Estimating the minimum number of relations needed to make density 110 work will be a bit tiresome, since each filtering run takes 48 hours.

jasonp 2010-06-17 12:09

This is really interesting. The failures at the higher target densities are probably because there is a hard maximum of 28 relations that can appear in one matrix column; more than that and the column is deleted during the merge phase. I suppose with a high enough density most of the matrix winds up getting deleted.
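A toy model of that mechanism (purely illustrative; this is not msieve's merge code, and the Poisson assumption on column weights is mine): as the mean number of relations per column approaches the hard cap, the fraction of columns exceeding it, and hence deleted, grows rapidly.

```python
import math

CAP = 28  # the hard per-column relation limit mentioned above

def fraction_deleted(mean_weight, cap=CAP):
    """P(X > cap) for X ~ Poisson(mean_weight): a crude model of the
    fraction of columns that would exceed the cap and be deleted."""
    p_le = sum(math.exp(-mean_weight) * mean_weight**k / math.factorial(k)
               for k in range(cap + 1))
    return 1.0 - p_le

for mw in (10, 20, 28, 40):
    print(f"mean weight {mw:2d}: ~{fraction_deleted(mw):.1%} of columns over the cap")
```

Under this model the deleted fraction is negligible well below the cap and dominant well above it, which matches the observation that the matrix "mostly gets deleted" once the density is pushed too high.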

Batalov 2010-06-17 16:09

Great stuff! Thank you for examining really high densities and discovering the thresholds beyond which this effect starts to matter.
I've been using density 90 for the last several 15M+ matrices, but have not looked higher or compared against the baseline.

For conventional matrices (especially matrices exported from other people, .dat-less, density 70) that fit in e.g. 4.4GB, I also use another accelerator -- a block size of 172032; both tricks increase the memory footprint, so they cannot be used together. It appears that both achieve a similar effect: from the processor's point of view, more hot data is within reach in the cache, so less latency penalty is incurred and iterations run faster. Am I interpreting this right?

