mersenneforum.org > Factoring Projects > CADO-NFS
2009-06-05, 17:21   #56
henryzz
Just call me Henry

"David"
Sep 2007
Cambridge (GMT/BST)

2³·3·5·7² Posts

Quote:
Originally Posted by joral
Ok... Either I take my computer back to 1GB or I go buy a new memory stick. Ran memtest overnight, and it picked up about 808 bit errors right around the 1.5GB mark in 6 passes.

Good call, though bad news for me...
Worthwhile knowing, though.

2009-06-05, 17:40   #57
joral

Mar 2008

5·11 Posts

Makes me wonder how long this has been going on.

Jason, maybe you can comment on this. Is there a good chance that a single bit error of this type would lead to 'submatrix not invertible' errors when running the msieve block Lanczos code?

2009-06-05, 17:56   #58
jasonp
Tribal Bullet

Oct 2004

110111010101₂ Posts

Quote:
Originally Posted by joral
Jason, maybe you can comment on this. Is there a good chance that a single bit error of this type would lead to 'submatrix not invertible' errors when running the msieve block Lanczos code?
Serge has run into bad memory causing these errors, but they only seem to appear for really big jobs. I don't know if a single bit getting flipped is enough to ruin the entire run (instead of only ruining one dependency), but my guess is that a big linear algebra run pushes the bus really hard and causes memory access with marginal timing to behave incorrectly. The more efficient the code the harder the bus gets pushed, so this says nice things about the level of optimization in the CADO code :)
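A toy illustration of why one flipped bit is so destructive here (a sketch of generic GF(2) linear algebra, not msieve's actual block Lanczos code): the matrix-vector products are just chains of ANDs and XORs over packed words, so a single corrupted input bit immediately smears across the whole output vector, and the damage compounds every iteration.

```python
import random

def matvec(rows, v):
    # GF(2) matrix-vector product: vectors are packed into 64-bit ints,
    # and result bit i is the parity of popcount(rows[i] & v).
    out = 0
    for i, r in enumerate(rows):
        out |= (bin(r & v).count("1") & 1) << i
    return out

random.seed(1)
# A random dense 64x64 GF(2) matrix (diagonal forced on, so no column is zero).
rows = [(1 << i) | random.getrandbits(64) for i in range(64)]
v = random.getrandbits(64)

clean = matvec(rows, v)
corrupt = matvec(rows, v ^ (1 << 58))  # one flipped bit in the input vector

# By linearity, clean ^ corrupt = matvec(rows, 1 << 58): one bad bit
# typically scrambles around half of the 64 output bits.
print(bin(clean ^ corrupt).count("1"), "output bits changed")
```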

2009-06-05, 18:24   #59
joral

Mar 2008

37₁₆ Posts

Hrmm. I wonder if my motherboard is one that allows me to control RAM timing. I could try stepping it down a touch.

It wasn't just a single bit, but they also weren't evenly spaced.

All of them were 'expected FBFFFFFFFFFFFFFF Got FFFFFFFFFFFFFFFF'
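XORing those two words shows it's the same single bit every time (a quick sanity check, not part of memtest):

```python
expected = 0xFBFFFFFFFFFFFFFF
got      = 0xFFFFFFFFFFFFFFFF

diff = expected ^ got
assert diff & (diff - 1) == 0          # exactly one bit differs
print(f"bit {diff.bit_length() - 1}")  # bit 58: a 0 read back as a 1
```

The same bit position failing across many addresses would be consistent with one bad cell column on the DIMM rather than random scattered flips.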

2009-06-06, 12:11   #60
fivemack
(loop (#_fork))

Feb 2006
Cambridge, England

7²·131 Posts

Wow, much better performance

I don't know what you did, but it's now twice as fast single-threaded, and faster multi-threaded: I'm now getting

Code:
 /home/nfsslave2/cado/cado-nfs-20090605-r2202/build/cow/linalg/bwc/u128_bench -t -impl bucket snfs.small

40 iters in 102s, 2.55/1, 9.81 ns/c (last 10 : 2.48/1, 9.57 ns/c)
and, running

Code:
for u in 1x2 2x1 1x4 2x2 4x1 1x8 2x4 4x2 8x1; do /home/nfsslave2/cado/cado-nfs-20090605-r2202/build/cow/linalg/balance --in snfs.small.T --out slice$u --nslices $u --ramlimit 8G; done

for u in 1x2 2x1; do taskset 03 /home/nfsslave2/cado/cado-nfs-20090605-r2202/build/cow/linalg/bwc/u128_bench --impl bucket -nthreads 2 -- slice$u.[hv]*; done

for u in 1x4 2x2 4x1; do taskset 0f /home/nfsslave2/cado/cado-nfs-20090605-r2202/build/cow/linalg/bwc/u128_bench --impl bucket -nthreads 4 -- slice$u.[hv]*; done

for u in 1x8 2x4 4x2 8x1; do taskset ff /home/nfsslave2/cado/cado-nfs-20090605-r2202/build/cow/linalg/bwc/u128_bench --impl bucket -nthreads 8 -- slice$u.[hv]* 2>&1 | tee $u.b; done
the timings are (4 threads distributed over 4 physical cores for 4-thread totals, 8 threads over 4 physical cores for 8-thread totals)

Code:
1x2 38 iters in 103s, 2.70/1, 10.40 ns/c (last 10 : 2.63/1, 10.13 ns/c)        
2x1 35 iters in 100s, 2.86/1, 11.03 ns/c (last 10 : 2.78/1, 10.72 ns/c)        

1x4 23 iters in 102s, 4.45/1, 17.14 ns/c (last 10 : 4.26/1, 16.42 ns/c)        
2x2 23 iters in 104s, 4.52/1, 17.41 ns/c (last 10 : 4.33/1, 16.68 ns/c)        
4x1 21 iters in 104s, 4.96/1, 19.11 ns/c (last 10 : 4.73/1, 18.24 ns/c)        

1x8 10 iters in 108s, 10.84/1, 41.77 ns/c (last 10 : 9.86/1, 37.98 ns/c)
2x4 10 iters in 105s, 10.47/1, 40.36 ns/c (last 10 : 9.52/1, 36.69 ns/c)
4x2 10 iters in 110s, 10.96/1, 42.23 ns/c (last 10 : 9.96/1, 38.39 ns/c)
8x1 8 iters in 102s, 12.69/1, 48.91 ns/c (last 10 : 50.32/1, 193.89 ns/c)

2009-06-06, 20:52   #61
joral

Mar 2008

5·11 Posts

So I am now constrained to 1GB of RAM until I can find a reasonable price for DDR DIMMs. But it now works. At least one of the 1GB DIMMs I had was unreliable.

2009-06-06, 22:25   #62
joral

Mar 2008

5·11 Posts

Finally made it to the sqrt phase, but I'm getting this one now when running allsqrt.

Odd valuation! At rational prime 731261

I'm almost tempted to completely trash this run and restart it with known good RAM. It's a 167 digit SNFS from the near repdigit lists. That should still fit in 1GB.
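For what it's worth, my understanding of that message (inferred from general NFS structure, not from reading the CADO sqrt source): the square root step needs every prime to divide the product of the norms over the dependency to an even power, so one corrupted relation can leave a prime with an odd total valuation. Schematically, with made-up factorizations:

```python
from collections import Counter

# Hypothetical (prime, exponent) factorizations of the rational norms
# contributed by the relations in one dependency.
relations = [
    [(2, 3), (5, 1), (731261, 1)],
    [(2, 1), (5, 1)],
    [(2, 2), (731261, 2)],   # a corrupted exponent here breaks the parity
]

# Sum the exponent of each prime over the whole dependency.
total = Counter()
for rel in relations:
    for p, e in rel:
        total[p] += e

for p, e in sorted(total.items()):
    if e % 2:
        print(f"Odd valuation! At rational prime {p}")
```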

2009-06-08, 03:36   #63
frmky

Jul 2003
So Cal

2×3⁴×13 Posts

Following Tom's lead, here are the benchmarks using Tom's matrix on the 8-CPU quad-core 2GHz Opteron Barcelona machine. Another program was running, so I only had 16 cores to work with.

Code:
u64_bench:
16 iters in 106s, 6.61/1, 25.47 ns/c (last 10 : 6.24/1, 24.04 ns/c)

u128_bench:
7 iters in 101s, 14.47/1, 55.76 ns/c (last 10 : 54.58/1, 210.32 ns/c)

Using u64:

1x2: 10 iters in 106s, 10.63/1, 40.97 ns/c (last 10 : 9.65/1, 37.20 ns/c)        
2x1: 10 iters in 101s, 10.12/1, 38.98 ns/c (last 10 : 9.21/1, 35.50 ns/c)        

1x4: 6 iters in 101s, 16.89/1, 65.08 ns/c (last 10 : 61.01/1, 235.09 ns/c)        
2x2: 7 iters in 112s, 16.02/1, 61.72 ns/c (last 10 : 60.34/1, 232.50 ns/c)        
4x1: 7 iters in 104s, 14.90/1, 57.41 ns/c (last 10 : 57.48/1, 221.47 ns/c)        

1x8: 7 iters in 103s, 14.78/1, 56.95 ns/c (last 10 : 60.20/1, 231.98 ns/c)        
2x4: 6 iters in 106s, 17.61/1, 67.87 ns/c (last 10 : 58.28/1, 224.57 ns/c)        
4x2: 6 iters in 105s, 17.44/1, 67.21 ns/c (last 10 : 59.67/1, 229.94 ns/c)        
8x1: 6 iters in 106s, 17.69/1, 68.16 ns/c (last 10 : 74.09/1, 285.51 ns/c)        

2x8: 3 iters in 111s, 36.84/1, 141.97 ns/c (last 10 : 80.81/1, 311.40 ns/c)        
4x4: 3 iters in 103s, 34.24/1, 131.95 ns/c (last 10 : 75.74/1, 291.83 ns/c)        
8x2: 4 iters in 115s, 28.82/1, 111.03 ns/c (last 10 : 85.86/1, 330.84 ns/c)  


Using u128:

1x2: 4 iters in 107s, 26.70/1, 102.90 ns/c (last 10 : 67.94/1, 261.80 ns/c)        
2x1: 4 iters in 113s, 28.20/1, 108.68 ns/c (last 10 : 67.73/1, 260.98 ns/c)        

1x4: 3 iters in 110s, 36.54/1, 140.79 ns/c (last 10 : 78.34/1, 301.85 ns/c)        
2x2: 3 iters in 116s, 38.65/1, 148.93 ns/c (last 10 : 76.51/1, 294.82 ns/c)        
4x1: 4 iters in 113s, 28.29/1, 109.02 ns/c (last 10 : 75.04/1, 289.15 ns/c)        

1x8: 3 iters in 125s, 41.63/1, 160.41 ns/c (last 10 : 88.00/1, 339.08 ns/c)        
2x4: 3 iters in 118s, 39.45/1, 152.00 ns/c (last 10 : 77.86/1, 300.03 ns/c)        
4x2: 3 iters in 118s, 39.18/1, 150.96 ns/c (last 10 : 77.40/1, 298.25 ns/c)        
8x1: 3 iters in 109s, 36.48/1, 140.56 ns/c (last 10 : 84.80/1, 326.74 ns/c)        

2x8: 2 iters in 123s, 61.31/1, 236.26 ns/c (last 10 : 111.50/1, 429.66 ns/c)        
4x4: 2 iters in 118s, 59.12/1, 227.79 ns/c (last 10 : 99.00/1, 381.47 ns/c)        
8x2: 2 iters in 118s, 58.88/1, 226.90 ns/c (last 10 : 102.75/1, 395.91 ns/c)
As you can see, the times that I am seeing are significantly slower than those on Tom's i7. Also, I'm getting decent scaling up to 8 threads. The jump from 8 to 16 threads, however, doesn't scale as well.
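In case anyone else wants to compare runs, here is a throwaway parser for these bench lines (assuming the exact `N iters in Ss, ..., X ns/c` format shown above):

```python
import re

# Sample lines in the format printed by u64_bench/u128_bench.
LOG = """\
1x2: 10 iters in 106s, 10.63/1, 40.97 ns/c (last 10 : 9.65/1, 37.20 ns/c)
2x1: 10 iters in 101s, 10.12/1, 38.98 ns/c (last 10 : 9.21/1, 35.50 ns/c)
8x1: 6 iters in 106s, 17.69/1, 68.16 ns/c (last 10 : 74.09/1, 285.51 ns/c)
"""

pat = re.compile(r"(\d+x\d+):\s+(\d+) iters in (\d+)s, [\d.]+/1, ([\d.]+) ns/c")
rows = [(m[1], int(m[2]), int(m[3]), float(m[4])) for m in pat.finditer(LOG)]

# Rank slicing configurations by overall ns per coefficient.
for cfg, iters, secs, nspc in sorted(rows, key=lambda r: r[3]):
    print(f"{cfg}: {nspc:7.2f} ns/c, {iters / secs:.3f} iters/s")
```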

In other news, before today I haven't been able to get a successful factorization using relations from the GGNFS sievers. Today, I rewrote verify.c to output all factors of the norms, including those below 1000 and multiplicity, and even sorted them for good measure. This worked! I'm now going to see what I can remove (starting with sorting) and still get a valid factorization.

2009-06-08, 06:48   #64
frmky

Jul 2003
So Cal

4072₈ Posts

Encouraged by the successful run today, I thought I might try it on the cluster using MPI. But, no. This is an example that ran fine with MPI off.

Code:
#############################################################################
../bin/linalg/bwc/./lingen nullspace=left wdir=bwc mm_impl=bucket thr=1x1 interval=100 mpi=4x4 seed=1 mn=64 interleaving=0 --lingen-threshold 64
# (exported) ../bin/linalg/bwc/./lingen nullspace=left wdir=bwc mm_impl=bucket thr=1x1 interval=100 mpi=4x4 seed=1 mn=64 interleaving=0 --lingen-threshold 64
# Compiled with gcc 4.3.2
# Compilation flags -O3 -funroll-loops -DNDEBUG -std=c99 -g -W -Wall
Reading scalar data in polynomial ``a''
Using A(X) div X in order to consider Y as starting point
../bin/linalg/bwc/./lingen: died with signal 11, without coredump

2009-06-08, 21:57   #65
thome

May 2009

2×11 Posts

Quote:
Originally Posted by frmky
Encouraged by the successful run today, I thought I might try it on the cluster using MPI. But, no. This is an example that ran fine with MPI off.

Code:
[...]
../bin/linalg/bwc/./lingen: died with signal 11, without coredump
Strange.

Could you please try recompiling with -O0 -g and provide a backtrace? To do so, create or modify a file named local.sh at the root of the cado tree, containing:
CFLAGS="-O0 -g"
CXXFLAGS="-O0 -g"

(The -O0 is there because some MPI versions have a tendency to add -O2 no matter what.)

Then do ``make cmake'', then ``make -j8''.

Then just gdb --args <complete lingen command line>, and wait until it catches the signal. Type ``bt'' at the gdb prompt.

I'm also interested in the ls -l output of your directory. You can send it by e-mail if you prefer.

E.

2009-06-10, 02:54   #66
frmky

Jul 2003
So Cal

83A₁₆ Posts

Perhaps somewhat annoyingly, it got further when compiled with "-O0 -g":

Code:
../bin/linalg/bwc/./lingen nullspace=left wdir=bwc mm_impl=bucket thr=1x1 interval=100 mpi=4x4 seed=1 mn=64 interleaving=0 --lingen-threshold 64
# (exported) ../bin/linalg/bwc/./lingen nullspace=left wdir=bwc mm_impl=bucket thr=1x1 interval=100 mpi=4x4 seed=1 mn=64 interleaving=0 --lingen-threshold 64
# Compiled with gcc 4.3.2
# Compilation flags -g -O0 -std=c99 -g -W -Wall
Reading scalar data in polynomial ``a''
Using A(X) div X in order to consider Y as starting point
Computing t0
[X^0] A, col 63 increases rank to 64 (head row 62)
Found satisfying init data for t0=1
written F_INIT_QUICK to disk
t0 = 1
Computing value of E(X)=A(X)F(X) (degree 7098) [ +O(X^7099) ]
Throwing out a(X)
E: 7098 coeffs, t=1
57      [7]: 5.8/748.6                      [7+]: 5.8/748.6 (1%)       
112     [7]: 11.5/737.6                     [7+]: 11.5/737.6 (2%)      
112     [6,6+]: 0.7/42.2,779.8              [6+]: 12.2/779.8 (2%)      
168     [7]: 17.4/740.7                     [6+]: 18.0/783.0 (2%)      
223     [7]: 23.0/737.1                     [6+]: 23.7/779.3 (3%)      
223     [6,6+]: 1.3/42.2,779.3              [6+]: 24.4/779.3 (3%)      
223     [5,5+]: 1.4/43.7,823.0              [5+]: 25.7/823.0 (3%)      
279     [7]: 28.9/739.5                     [5+]: 31.6/825.4 (4%)      

...  editing out a bunch of lines ...

6323    [7]: 656.4/737.0                    [1+]: 905.7/1065.0 (85%)   
6323    [6,6+]: 37.8/42.5,779.5             [1+]: 906.4/1065.0 (85%)   
6379    [7]: 662.2/737.1                    [1+]: 912.2/1065.1 (86%)   
6434    [7]: 667.9/737.0                    [1+]: 917.9/1065.0 (86%)   
6434    [6,6+]: 38.5/42.5,779.5             [1+]: 918.5/1065.0 (86%)   
6434    [5,5+]: 39.2/43.2,822.7             [1+]: 919.9/1065.0 (86%)   
6490    [7]: 673.7/737.0                    [1+]: 925.7/1065.0 (87%)   
6545    [7]: 679.4/736.9                    [1+]: 931.4/1064.9 (87%)   
6545    [6,6+]: 39.2/42.5,779.4             [1+]: 932.0/1064.9 (88%)   
6601    [7]: 685.2/737.0                    [1+]: 937.9/1065.0 (88%)   
6656    [7]: 690.9/736.9                    [1+]: 943.5/1064.9 (89%)   
6656    [6,6+]: 39.8/42.5,779.4             [1+]: 944.2/1064.9 (89%)   
6656    [5,5+]: 40.5/43.2,822.6             [1+]: 945.5/1064.9 (89%)   
6656    [4,4+]: 29.7/31.7,854.3             [1+]: 947.5/1064.9 (89%)   
6712    [7]: 696.7/737.0                    [1+]: 953.4/1065.0 (90%)   
6767    [7]: 702.4/736.9                    [1+]: 959.0/1064.9 (90%)   
6767    [6,6+]: 40.5/42.5,779.4             [1+]: 959.7/1064.9 (90%)   
6823    [7]: 708.2/737.0                    [1+]: 965.5/1065.0 (91%)   
6878    [7]: 713.9/736.9                    [1+]: 971.2/1064.9 (91%)   
6878    [6,6+]: 41.2/42.5,779.4             [1+]: 971.9/1064.9 (91%)   
6878    [5,5+]: 41.9/43.2,822.6             [1+]: 973.2/1064.9 (91%)   
6934    [7]: 719.7/737.0                    [1+]: 979.0/1065.0 (92%)   
6985    6cols=0: [0..5]
6986    63cols=0: [0..62] [0..5]*2
6987    64cols=0: [0..63] [0..62]*2 [0..5]*3
6988    64cols=0: [0..63]*2 [0..62]*3 [0..5]*4
6989    [7]: 725.4/736.9                    [1+]: 984.7/1064.8 (92%)   
6989    [6,6+]: 41.8/42.5,779.3             [1+]: 985.3/1064.8 (93%)   
6989    64cols=0: [0..63]*3 [0..62]*4 [0..5]*5
6990    64cols=0: [0..63]*4 [0..62]*5 [0..5]*6
6991    64cols=0: [0..63]*5 [0..62]*6 [0..5]*7
6992    64cols=0: [0..63]*6 [0..62]*7 [0..5]*8
6993    64cols=0: [0..63]*7 [0..62]*8 [0..5]*9
6994    64cols=0: [0..63]*8 [0..62]*9 [0..5]*10
6995    64cols=0: [0..63]*9 [0..62]*10 [0..5]*11
6996    64cols=0: [0..63]*10 [0..62]*11 [0..5]*12
6997    64cols=0: [0..63]*11 [0..62]*12 [0..5]*13
6998    64cols=0: [0..63]*12 [0..62]*13 [0..5]*14
6999    64cols=0: [0..63]*13 [0..62]*14 [0..5]*15
7000    64cols=0: [0..63]*14 [0..62]*15 [0..5]*16
7001    64cols=0: [0..63]*15 [0..62]*16 [0..5]*17
7002    64cols=0: [0..63]*16 [0..62]*17 [0..5]*18
7003    64cols=0: [0..63]*17 [0..62]*18 [0..5]*19
7004    64cols=0: [0..63]*18 [0..62]*19 [0..5]*20
7005    64cols=0: [0..63]*19 [0..62]*20 [0..5]*21
7006    64cols=0: [0..63]*20 [0..62]*21 [0..5]*22
7007    64cols=0: [0..63]*21 [0..62]*22 [0..5]*23
7008    64cols=0: [0..63]*22 [0..62]*23 [0..5]*24
7009    64cols=0: [0..63]*23 [0..62]*24 [0..5]*25
7010    64cols=0: [0..63]*24 [0..62]*25 [0..5]*26
7011    64cols=0: [0..63]*25 [0..62]*26 [0..5]*27
7012    64cols=0: [0..63]*26 [0..62]*27 [0..5]*28
7013    64cols=0: [0..63]*27 [0..62]*28 [0..5]*29
7014    64cols=0: [0..63]*28 [0..62]*29 [0..5]*30
7015    64cols=0: [0..63]*29 [0..62]*30 [0..5]*31
7016    64cols=0: [0..63]*30 [0..62]*31 [0..5]*32
7017    64cols=0: [0..63]*31 [0..62]*32 [0..5]*33
7018    64cols=0: [0..63]*32 [0..62]*33 [0..5]*34
7019    64cols=0: [0..63]*33 [0..62]*34 [0..5]*35
7020    64cols=0: [0..63]*34 [0..62]*35 [0..5]*36
7021    64cols=0: [0..63]*35 [0..62]*36 [0..5]*37
7022    64cols=0: [0..63]*36 [0..62]*37 [0..5]*38
7023    64cols=0: [0..63]*37 [0..62]*38 [0..5]*39
7024    64cols=0: [0..63]*38 [0..62]*39 [0..5]*40
7025    64cols=0: [0..63]*39 [0..62]*40 [0..5]*41
../bin/linalg/bwc/./lingen: died with signal 11, without coredump
I'm running it with gdb now...

Last fiddled with by frmky on 2009-06-10 at 02:56