mersenneforum.org

CADO NFS (https://www.mersenneforum.org/showthread.php?t=11948)

joral 2009-06-02 19:49

No, I thought the working directory needed to previously exist, so it was there, but empty. This then showed up when running bwc.pl with :complete.

Thanks for the information. As I get further along I may have more errors to ask about.

fivemack 2009-06-02 22:11

I don't quite understand the benchmark output
 
OK: I'm using the command line

[code]
% /home/nfsslave2/cado/cado-nfs-20090528-r2167/build/cow/linalg/bwc/u128_bench snfs.small -impl bucket
[/code]

and getting plausible diagnostics rather than error messages. Trying all the *_bench tools, deleting snfs.small-bucket.bin between runs:

u128
T0 snfs.small: 5582216 rows 5582056 cols 259518490 coeffs
22 iterations in 101s, 4.61/1, 17.77 ns/coeff

u64
T0 snfs.small: 5582216 rows 5582056 cols 259518490 coeffs
38 iterations in 102s, 2.67/1, 10.29 ns/coeff

u64k says 'T0 : Check failed Aborted'

u64n also says this.

I assume u128 would want to do exactly half as many iterations as u64, so would be quicker in total; should I be getting a 'k' or 'n' parameter to u64k or u64n in some way?

If u128 does 5582216/128 iterations, the total runtime would be ~200k seconds, which seems pretty good since msieve lanczos took 108242 wall-time seconds with four threads - but I'm not sure whether there's not another factor two hiding somewhere in the block Wiedemann algorithm.
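The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch using the figures quoted in this post; the variable names are illustrative, and it assumes, as the post does, that N/128 iterations are all that is needed.

```python
import math

# Figures quoted in the u128 benchmark output above.
rows = 5582216          # matrix rows (N)
coeffs = 259518490      # non-zero coefficients
sec_per_iter = 4.61     # seconds per iteration reported by the bench
block_width = 128       # u128 works on 128 bits per row at once

# Per-coefficient cost, as a sanity check on the "ns/coeff" column.
ns_per_coeff = sec_per_iter / coeffs * 1e9

# Iteration count and total time under the N/128 assumption.
iterations = math.ceil(rows / block_width)
total_seconds = iterations * sec_per_iter

print(f"{ns_per_coeff:.2f} ns/coeff")
print(f"{iterations} iterations, about {total_seconds:.0f} s in total")
```

The result lands close to the ~200k seconds mentioned above; whether N/128 is the right iteration count for block Wiedemann is a separate question.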

So, time to try threading.

[code]
/home/nfsslave2/cado/cado-nfs-20090528-r2167/build/cow/linalg/balance --in snfs.small --out cabbage --nslices 2x2 --ramlimit 8G
[/code]

gives me a message 'Matrix has more rows than columns \n Perhaps the matrix should have been transposed first', and produces cabbage.row_perm, cabbage.col_perm and cabbage.h[01].v[01]. Then

[code]
taskset 0f /home/nfsslave2/cado/cado-nfs-20090528-r2167/build/cow/linalg/bwc/u128_bench -impl bucket -nthreads 4 -- cabbage.h0.v0 cabbage.h1.v0 cabbage.h0.v1 cabbage.h1.v1
[/code]

runs with occasionally 400% CPU and says

19 iterations in 102s, 5.35/1, 20.62 ns/coeff

Does this mean that threads are treading on one another's toes and four threads are slower than one, or that each thread has done 19 iterations in 102 seconds for a total speed of effectively 5.16 ns/coeff ?

joral 2009-06-02 23:46

Ok. A little farther.

Now I have had the following:

Computing trsp(x)*M^100
..........Warning: Doing many iterations with bad code
Warning: Doing many iterations with bad code
Warning: Doing many iterations with bad code
Warning: Doing many iterations with bad code

Then a little later...

Failed check at iteration 100
/cado-nfs/linalg/bwc/u64_krylov: exited with status 1

Tried with a different seed, and it failed at iteration 1900.

I recall having trouble with the msieve version of block Lanczos when the matrix was too sparse. Is there a similar condition here that could cause it to fail?

thome 2009-06-03 08:45

[quote=fivemack;175686]OK: I'm using the command line

[code]
% /home/nfsslave2/cado/cado-nfs-20090528-r2167/build/cow/linalg/bwc/u128_bench snfs.small -impl bucket
[/code]and getting plausible diagnostics rather than error messages. Trying all the *_bench tools, deleting snfs.small-bucket.bin between runs:

u128
T0 snfs.small: 5582216 rows 5582056 cols 259518490 coeffs
22 iterations in 101s, 4.61/1, 17.77 ns/coeff

u64
T0 snfs.small: 5582216 rows 5582056 cols 259518490 coeffs
38 iterations in 102s, 2.67/1, 10.29 ns/coeff
[/quote]

OK, which kind of CPU is this? These figures seem a bit large.

[quote]
u64k says 'T0 : Check failed Aborted'

u64n also says this.
[/quote]That's a bug. u64k and u64n make little sense for benchmarks, but the failure is definitely a bug. I'll try to reproduce it.

[quote]
I assume u128 would want to do exactly half as many iterations as u64, so would be quicker in total; should I be getting a 'k' or 'n' parameter to u64k or u64n in some way?
[/quote]For information, the k in u64k is hard-coded (anyway this code is never used). Setting n for u64n_bench is done with --nbys=128 (for n=2).

[quote]
If u128 does 5582216/128 iterations, the total runtime would be ~200k seconds, which seems pretty good since msieve lanczos took 108242 wall-time seconds with four threads - but I'm not sure whether there's not another factor two hiding somewhere in the block Wiedemann algorithm.
[/quote]N/m+N/n+N/n iterations, so three times as much (taking m = n). But I wonder: your timings exceed what I normally get, so perhaps there's something wrong somewhere. Was your matrix transposed? If not, i.e. if relation-sets are rows and ideals are columns, then you should pass the -t option to the bench program; otherwise the matrix gets organized the wrong way around.
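As a rough illustration of the count N/m+N/n+N/n, the sketch below evaluates it for the matrix size quoted above. Taking m = n = 128 is an assumption made here to match the u128 bench, the function name is hypothetical, and the per-iteration time is the single-thread bench figure, which may not carry over to every phase of the computation.

```python
def bw_iterations(N, m, n):
    """Iteration count quoted in the post: N/m + N/n + N/n."""
    return N / m + N / n + N / n


N = 5582216            # matrix rows, from the bench output
m = n = 128            # assumed block sizes (not stated in the thread)
sec_per_iter = 4.61    # single-thread u128 bench figure

naive = N / 128        # the N/128 estimate from the question
total = bw_iterations(N, m, n)

print(f"naive estimate: {naive:.0f} iterations")
print(f"N/m+N/n+N/n:    {total:.0f} iterations ({total / naive:.1f}x)")
print(f"at {sec_per_iter} s/iteration: about {total * sec_per_iter:.0f} s")
```

With m = n the ratio is exactly three; unequal block sizes would change it.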

[quote]
So, time to try threading.

[code]
/home/nfsslave2/cado/cado-nfs-20090528-r2167/build/cow/linalg/balance --in snfs.small --out cabbage --nslices 2x2 --ramlimit 8G
[/code]gives me a message 'Matrix has more rows than columns \n Perhaps the matrix should have been transposed first',
[/quote]This warning is innocuous cruft, since the bwc tools now properly handle matrices in both directions -- although it does hint that the arguments you've tried don't direct them to do so.

[quote]
and produces cabbage.row_perm, cabbage.col_perm and cabbage.h[01].v[01]. Then

[code]
taskset 0f /home/nfsslave2/cado/cado-nfs-20090528-r2167/build/cow/linalg/bwc/u128_bench -impl bucket -nthreads 4 -- cabbage.h0.v0 cabbage.h1.v0 cabbage.h0.v1 cabbage.h1.v1
[/code]runs with occasionally 400% CPU and says

19 iterations in 102s, 5.35/1, 20.62 ns/coeff

Does this mean that threads are treading on one another's toes and four threads are slower than one, or that each thread has done 19 iterations in 102 seconds for a total speed of effectively 5.16 ns/coeff ?[/quote]The number of seconds here (102, 5.35) is CPU time, not wall-clock time (wct). So four threads effectively do one iteration every 1.34s wct, which isn't exactly 4 times better than 1 thread, but is relatively acceptable. Threads do indeed tread on one another's toes, because of the memory access penalties. Since the penalty is not large here, I suppose you may have Opterons.

E.
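The CPU-time versus wall-clock reading above can be checked numerically. This is a sketch only: it assumes the reported CPU time is spread evenly over the four threads and that the single-thread figure is close to wall-clock time; the variable names are illustrative.

```python
threads = 4
cpu_sec_per_iter_4t = 5.35   # 4-thread run, CPU seconds per iteration
cpu_sec_per_iter_1t = 4.61   # 1-thread run, seconds per iteration

# Wall-clock seconds per iteration when the CPU time is shared by 4 threads.
wct_per_iter_4t = cpu_sec_per_iter_4t / threads

# Speedup over one thread, and the fraction of the ideal 4x achieved.
speedup = cpu_sec_per_iter_1t / wct_per_iter_4t
efficiency = speedup / threads

print(f"{wct_per_iter_4t:.2f} s wct per iteration")
print(f"speedup {speedup:.2f}x, parallel efficiency {efficiency:.0%}")
```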

thome 2009-06-03 08:49

[quote=joral;175689]Ok. A little farther.

Now I have had the following:

Computing trsp(x)*M^100
..........Warning: Doing many iterations with bad code
Warning: Doing many iterations with bad code
Warning: Doing many iterations with bad code
Warning: Doing many iterations with bad code
[/quote]

Normal for the u64_secure program. It effectively does transposed multiplications, which are somewhat slower.

[quote]
Then a little later...

Failed check at iteration 100
/cado-nfs/linalg/bwc/u64_krylov: exited with status 1

Tried with a different seed, and it failed at iteration 1900.
[/quote]That's a problem. The fact that it doesn't even fail deterministically suggests that perhaps your RAM is at fault, but I wouldn't conclude that too soon.

Care to share your matrix?

[quote]
I know I had trouble with the msieve version of block lanczos if the matrix was too sparse, I believe it was. Is there a similar condition here which could cause it to fail?[/quote]If it's very sparse, and if I got padding coeffs wrong in some corner case, maybe, but I doubt it.

E.

thome 2009-06-03 10:52

[quote=fivemack;175686]
u64k says 'T0 : Check failed Aborted'

u64n also says this.
[/quote]

Now fixed. Thanks.

thome 2009-06-03 10:53

[quote=joral;175689]Computing trsp(x)*M^100
..........Warning: Doing many iterations with bad code
Warning: Doing many iterations with bad code
Warning: Doing many iterations with bad code
Warning: Doing many iterations with bad code
[/quote]

This warning no longer appears (yes, there's a new tarball).

E.

joral 2009-06-03 10:56

[QUOTE]The fact that it doesn't even fail deterministically suggests that perhaps your RAM is at fault, but I wouldn't conclude that too soon.[/QUOTE]

I'm going to run some more tests to be sure, but as I recall it is deterministic in this sense: if I leave the seed parameter unchanged, it always fails at the same iteration.

It's about a 280 Mb matrix file ungzipped, so I'll see what it compresses to and where I can put it.

fivemack 2009-06-03 11:12

The machine I'm doing the benchmarks on is a single-socket 2.66GHz Core i7 (256k 10-cycle L2 cache per core + 8192k 19-cycle L3 cache per four cores + 12G DDR3/8500). I am a little surprised that I don't have to give a load of cache parameters to bench; if it's running one thread blocking for the 256k cache rather than the 8192k one, then I could understand it being a bit slow.

Will try more sensible benchmarks (correct transpose parameters, trying 1x4 2x2 4x1 decompositions on four cores and 1x8 2x4 4x2 8x1 decompositions on eight-threads-on-four-cores) with new tarball tonight; I've left a make-matrix-from-relations job running today on a set of relations from a very large SNFS job, and will mention if that falls over in interesting ways. It's using an awful lot of memory (17G vsize, 10G rsize), but I have an awful lot of memory and a fast swap disc.
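For reference, the decompositions listed above are simply the ordered factor pairs of the thread count. The helper below is a hypothetical convenience for generating the strings in the form used with --nslices earlier in the thread; whether balance accepts every such shape is not confirmed here.

```python
def decompositions(nthreads):
    """All 'HxV' splits with H * V == nthreads, in increasing H."""
    return [f"{h}x{nthreads // h}"
            for h in range(1, nthreads + 1)
            if nthreads % h == 0]


for count in (4, 8):
    print(count, decompositions(count))
```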

jasonp 2009-06-03 13:28

In case it becomes an issue: the latest GGNFS lattice sievers do not print all the factors of relations; they skip multiplicity beyond 1 and skip printing factors smaller than 1000, so that both of these have to be rediscovered by any relation-reading code.
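A minimal sketch of what such relation-reading code might do, assuming the norm of the relation has already been computed and that the siever printed each prime of 1000 or more once. The function names are hypothetical and do not come from GGNFS or CADO-NFS.

```python
def small_primes(limit):
    """Primes below `limit`, by a simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * limit
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            # Clear every multiple of p starting at p*p.
            sieve[p * p::p] = bytearray(len(range(p * p, limit, p)))
    return [p for p in range(limit) if sieve[p]]


def complete_factorization(norm, listed_primes, limit=1000):
    """Return {prime: exponent} for `norm`.

    Primes below `limit` are rediscovered by trial division, and the
    multiplicity of every prime is recovered by repeated division.
    """
    factors = {}
    remaining = abs(norm)
    for p in small_primes(limit) + sorted(set(listed_primes)):
        while remaining % p == 0:
            remaining //= p
            factors[p] = factors.get(p, 0) + 1
    if remaining != 1:
        raise ValueError(f"unfactored cofactor {remaining}")
    return factors


# Made-up example: the siever would list only 1009 and 10007, once each.
example_norm = 2**3 * 7 * 1009**2 * 10007
print(complete_factorization(example_norm, [1009, 10007]))
```

Raising an error on a leftover cofactor is a design choice for this sketch; real code might instead discard or log the relation.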

fivemack 2009-06-03 17:13

Benchmark with -t fails entirely
 
I issue the command

[code]
nfsslave2@cow:/scratch/fib1039/with-cado$ /home/nfsslave2/cado/cado-nfs-20090603-r2189/build/cow/linalg/bwc/u128_bench -t --impl bucket snfs.small
[/code]

and it produces a lot of output at the 'large' level before failing with

[code]
Lsl 56 cols 3634827..3699734 w=778884, avg dj=7.2, max dj=34365, bucket hit=1/1834.7-> too sparse
Switching to huge slices. Lsl 56 to be redone
Flushing 56 large slices
Hsl 0 cols 3634827..5582056 (30*64908) ..............................
w=16383453, avg dj=0.3, max dj=29376, bucket block hit=1/10.2
u128_bench: /home/nfsslave2/cado/cado-nfs-20090603-r2189/linalg/bwc/matmul-bucket.cpp:610: void split_huge_slice_in_vblocks(builder*, huge_slice_t*, huge_slice_raw_t*, unsigned int): Assertion `(n+np)*2 == (size_t) (spc - sp0)' failed.
Aborted
[/code]

The enormous filtering run got terminated by something that kills SSH sessions that have produced no output for ages; I will try that again.


All times are UTC.