mersenneforum.org > Factoring Projects > Msieve
Old 2009-11-17, 20:38   #34
jrk
 

Quote:
Originally Posted by Jeff Gilchrist
As Batalov suggested it was probably a transmission or decompression error somewhere along the line.
Or using "ASCII" mode FTP transfer.
Old 2009-11-17, 20:54   #35
Batalov
 

If RSALS uses a quorum of 2, that's a huge unnecessary loss of productivity. The nature of sieving tolerates many failures (usually several percent?), which are easily removed at the filtering stage. No single relation is precious enough to be unduly fought for; any sufficiently large pile of them will do fine.

(This is different from many other DC problems, where a WU cannot be closed until every corner of the search space has been visited and confirmed.)

In fact, there's another possible problem with a quorum of 2: the sievers may produce perfectly correct yet slightly different output, so a pair of outputs will not match -- the factorizations (fields 2 and 3 in the ":"-separated files) may list their factors in a different order within the comma-separated lists. The server will then discard some perfectly fine WUs, and waste more time trying to get them to match down to the bit.
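A validator that tolerates this could canonicalize each relation before comparing. A minimal Python sketch (it assumes the ":"-separated relation line format described above, with comma-separated factor lists; the function names are illustrative, not part of any BOINC or siever code):

```python
def canonical(relation_line):
    """Canonicalize one relation line ("a,b:rat_factors:alg_factors")
    by sorting each comma-separated factor list, so that two correct
    outputs that merely list factors in a different order compare equal."""
    fields = relation_line.strip().split(':')
    ab, factor_lists = fields[0], fields[1:]
    sorted_lists = [','.join(sorted(f.split(','))) for f in factor_lists]
    return ':'.join([ab] + sorted_lists)

def relations_match(line1, line2):
    """Order-insensitive comparison of two relation lines."""
    return canonical(line1) == canonical(line2)
```

With this, "3,5:1f,2b:a3,c1" and "3,5:2b,1f:c1,a3" validate as the same relation instead of being discarded.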

Consider running the BOINC client sievers with -z turned ON by default! (Less traffic, and transmission errors are already detected when the server receives the file.)

My 2 cents.

P.S. Jeff, transferring gzipped files has more than one benefit: 1) it's faster, 2) you get the error already when gunzipping. Then you start checking: was binary transfer ON, was the client's or the server's memory flaky...?

(I once had a few gunzipping errors while receiving rels from Bruce. And what do you think? A memory stick had a few bits burned, on my side. Memtest confirmed it. It was just a month old, and the memory problem had apparently appeared within the previous day... RMA'd it. So, consider that any of your client machines may have memory problems; if they are gamers' machines, their owners will never know, but the rels files will show it.)
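The "you get the error already when gunzipping" benefit comes from the CRC32 of the uncompressed data that gzip stores in each file's trailer. A sketch of the integrity check a receiver could run on arrival (the helper name is ours):

```python
import gzip
import zlib

def gzip_ok(path):
    """Decompress the whole file and discard the output. gzip verifies
    the stored CRC32 at end of stream, so a transmission error or a
    flipped memory bit almost certainly surfaces here as an exception."""
    try:
        with gzip.open(path, 'rb') as f:
            while f.read(1 << 20):  # read in 1 MB chunks
                pass
        return True
    except (OSError, EOFError, zlib.error):
        return False
```

A plain-text transfer gives no such end-to-end check; with gzip, corruption anywhere along the line is caught before the relations ever reach filtering.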
Old 2009-11-17, 21:27   #36
frmky
 

Quote:
Originally Posted by Batalov
If RSALS uses a quorum of 2, that's a huge unnecessary loss of productivity.
Actually, they are using a quorum of 1 now, but still an initial replication of 2. I'm not sure why the latter, since BOINC reissues invalid work units anyway, and with a 3.5-day expiration NFS@Home gets nearly all work units back in reasonable time. I don't worry about a few stragglers.

See here, for example.
Quote:
Originally Posted by Batalov
Consider running BOINC client sievers running with -z turned ON by default!
Actually, no. This causes -R not to work. Restarting is very important in BOINC because the client frequently, for many reasons, interrupts a workunit in progress. Instead, the BOINC client can be (and, in the case of NFS@Home, is) instructed to gzip the output file just before uploading. Really old BOINC clients ignore this instruction, so I get a few files that are not gzipped before transfer, but most are. Fortunately, "gzip -dcf" simply cats a non-gzipped file, so I can treat them all the same in validation and processing.
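The "gzip -dcf" trick works because every gzip file starts with the two magic bytes 0x1f 0x8b; with -f, anything that doesn't start with them is passed through unchanged. The same transparent-open behavior can be sketched in Python (the function name is ours):

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"

def open_maybe_gzipped(path):
    """Open a file for text reading, decompressing only if it starts
    with the gzip magic bytes -- so gzipped and plain uploads can be
    fed through the same validation and processing code."""
    with open(path, "rb") as f:
        magic = f.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt")
    return open(path, "r")
```

The downstream code then never needs to know which clients compressed their output and which didn't.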
Old 2009-11-17, 21:41   #37
jasonp

Quote:
Originally Posted by Jeff Gilchrist
For some reason it goes up to 64 dependencies. I'm going to double check that all my files on the Windows and Linux systems are identical in case something got messed up somewhere.
It always goes up to 64 dependencies, because I was lazy and didn't implement a bitwise-or of the dependency vector. I should fix that.
Old 2009-11-17, 22:18   #38
Batalov
 

Quote:
Originally Posted by frmky
Actually, no. This causes -R not to work.
True, I wrote that initial hack. I'll write the necessary part of the code then. Will you be willing to statically link with -lz? For both the writing and the reading part [for -R, we have to read the file back]; in fact, wait a minute -- the -z option won't work on many existing Win clients because, as written now, it pipes into a "gzip --best --stdout" pipe, and they may not have an external gzip program.

Another thought: there's some suppressed old code that keeps track of the last special q in a .last_spq file in the working directory. This could be carefully revived/refactored for faster restarts. Gzipped files are concatenable, except if they are truncated (hmm, not really wanting to debug that). Maybe your solution via the BOINC manager is what the doctor ordered, after all.
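The restart bookkeeping could look something like this: record the last completed special q after each range, and read it back on startup. A hedged sketch (the file name follows the .last_spq mentioned above, but the layout and function names are guesses, not the actual suppressed code); the atomic rename guards against exactly the truncation worry:

```python
import os

def save_last_spq(workdir, q):
    """Record the last completed special q. Writing a temp file and
    renaming it is atomic on POSIX, so a client killed mid-write can
    never leave a truncated .last_spq behind."""
    tmp = os.path.join(workdir, ".last_spq.tmp")
    with open(tmp, "w") as f:
        f.write("%d\n" % q)
    os.replace(tmp, os.path.join(workdir, ".last_spq"))

def load_last_spq(workdir, default_q):
    """Return the saved restart point, or default_q on a fresh start
    (or if the checkpoint is unreadable)."""
    try:
        with open(os.path.join(workdir, ".last_spq")) as f:
            return int(f.read())
    except (OSError, ValueError):
        return default_q
```

On restart the siever would resume from load_last_spq() instead of re-reading (or re-sieving) the whole output file.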

P.S. It may be redundant with BOINC's capabilities (you do have that set up -- good!), but it won't hurt individual sievers. I'll do that over a few weekends.

Last fiddled with by Batalov on 2009-11-17 at 22:39 Reason: .last_spq
Old 2009-11-18, 00:31   #39
Jeff Gilchrist
 

Quote:
Originally Posted by Batalov
P.S. Jeff, transferring g/zipped files has more than one benefit - 1) faster, 2) you get the error when gunzipping already. Then you start checking - was binary transfer ON, was the client or server's memory crank...?

(I once had a few gunzipping errors, while receiving rels from Bruce. And what do you think? a memory stick had a few bits burned, on my side. Memtest confirmed it. It was just a month old, and it the memory problem apparently happened within the previous day... RMA'd it. So, consider that any of your client machines may have some memory problems; if they are gamers, they will never know; but the rels files will show it.)
The files were bzip2'd and didn't produce any errors during decompression, and I use sftp, so I think there are no ASCII/binary issues, since sftp has only one (binary) mode, right? Either way, if a .bz2 file had been sent in ASCII mode, it would have given me an error when decompressing.

My system recently became unstable; for some reason, after working solidly for over a year, it needed the vcore bumped up one more level in the BIOS. I think I might have processed the file before I fixed that, which would explain how the file became corrupted. I don't think it is a memory issue, because after I raised the vcore the system ran the Prime95 torture test solidly for 18 hours plus some IntelBurnTest, whereas before the adjustment it was crashing fairly quickly.

And, I do understand the benefits of data compression well.

Jeff.
Old 2009-11-18, 01:15   #40
Batalov
 

Prime95's torture test may or may not crawl all over the memory (gotta have a look at the source). When I tried the replacement memory, I ran a custom torture test with a setting for how much memory to use (e.g. 3600MB or 7600MB, to leave some for the OS's kitchen) to make it use more. Even then, the monitors clearly showed that Prime95 does allocate that much, but does it crawl all over it? Will read the source tonight.

Memtest will tell you for sure. They complement each other: memtest doesn't really stress the CPU, but it does crawl all over the memory. The latest version tries to use the cores one after another, but not all of them at once; not really a torture. Dell's diagnostics CD tools have tests that bang on all the memory from all CPUs, at once and/or in different strokes.

My favorite torture test is of course "msieve -nc2".
It is because of msieve that I run 8GB of 1066MHz-rated memory at 1000MHz. Winter is here, so last week I tried to bump it to 1040MHz... "bzzzzt, wrong decision", reported msieve ("matrix is not invertible" and such; all the other torturers were happy!). At 1000MHz it is stable for weeks (or as long as it takes), running block Lanczos.

I didn't imply that you... you know.
Old 2009-11-18, 01:49   #41
Jeff Gilchrist
 

Quote:
Originally Posted by Batalov
Prime95 in torture test may or may not crawl all over the memory (gotta have a look at the source). To tell it to use more memory, when I tried the replacement memory, I've run it with custom torture with setting how much memory to use (e.g. 3600Mb or 7600Mb, to leave some for the OS's kitchen). Even then, it was easily seen under monitors that Prime95 does alloc as much, but does it crawl all over it? will read the source right tonight.
You are right, I usually run Prime95, which doesn't cover all the RAM. I also use IntelBurnTest, which you can tell to use all available RAM, and it heats up the CPU cores quite a bit more than Prime95 as well. So between the two of them, usually everything is caught. I probably should run memtest just to be extra sure...

But yes, msieve -nc2 is a great torture test if you have a big enough set of relations lying around.
Old 2009-11-18, 04:29   #42
jasonp

It actually doesn't even take a very big Lanczos problem to trigger hardware errors in heavily overclocked machines. Even a pre-generated matrix that takes up much less than 1GB worth of disk files makes a rather sizable torture test :)
Old 2009-11-23, 20:22   #43
Andi47
 

Never seen this before - it seems I have hit exactly the point where the filtering stage thinks it has enough relations, but it still can't build a matrix:

Code:
Mon Nov 23 20:33:22 2009  
Mon Nov 23 20:33:22 2009  Msieve v. 1.43
Mon Nov 23 20:33:22 2009  random seeds: 7a03d6bc 07484868
Mon Nov 23 20:33:22 2009  factoring 86241193609892168785428877715507386455095742442904296823089444056870910801429442199122976408625759612781268230001767314137735454949478943181896880936801 (152 digits)
Mon Nov 23 20:33:24 2009  no P-1/P+1/ECM available, skipping
Mon Nov 23 20:33:24 2009  commencing number field sieve (152-digit input)
Mon Nov 23 20:33:24 2009  R0: -9816673457144609450
Mon Nov 23 20:33:24 2009  R1:  3133157107
Mon Nov 23 20:33:24 2009  A0:  1
Mon Nov 23 20:33:24 2009  A1: -4
Mon Nov 23 20:33:24 2009  A2: -10
Mon Nov 23 20:33:24 2009  A3:  10
Mon Nov 23 20:33:24 2009  A4:  15
Mon Nov 23 20:33:24 2009  A5: -6
Mon Nov 23 20:33:24 2009  A6: -7
Mon Nov 23 20:33:24 2009  A7:  1
Mon Nov 23 20:33:24 2009  A8:  1
Mon Nov 23 20:33:24 2009  skew 1.00, size 1.807668e-04, alpha 3.204558, combined = 2.547850e-12
Mon Nov 23 20:33:24 2009  
Mon Nov 23 20:33:24 2009  commencing relation filtering
Mon Nov 23 20:33:24 2009  estimated available RAM is 1985.8 MB
Mon Nov 23 20:33:24 2009  commencing duplicate removal, pass 1
Mon Nov 23 20:33:34 2009  error -15 reading relation 1005504
Mon Nov 23 20:36:08 2009  error -9 reading relation 16273920
Mon Nov 23 20:36:40 2009  error -15 reading relation 19332113
Mon Nov 23 20:37:18 2009  found 6070455 hash collisions in 23266847 relations
Mon Nov 23 20:37:46 2009  added 1828725 free relations
Mon Nov 23 20:37:46 2009  commencing duplicate removal, pass 2
Mon Nov 23 20:39:01 2009  found 6918430 duplicates and 18177141 unique relations
Mon Nov 23 20:39:01 2009  memory use: 165.2 MB
Mon Nov 23 20:39:01 2009  reading ideals above 15138816
Mon Nov 23 20:39:14 2009  commencing singleton removal, initial pass
Mon Nov 23 20:45:52 2009  memory use: 376.5 MB
Mon Nov 23 20:45:52 2009  reading all ideals from disk
Mon Nov 23 20:46:02 2009  memory use: 390.9 MB
Mon Nov 23 20:46:05 2009  commencing in-memory singleton removal
Mon Nov 23 20:46:08 2009  begin with 18177141 relations and 19827024 unique ideals
Mon Nov 23 20:46:32 2009  reduce to 9015192 relations and 7025069 ideals in 17 passes
Mon Nov 23 20:46:32 2009  max relations containing the same ideal: 29
Mon Nov 23 20:46:35 2009  reading ideals above 100000
Mon Nov 23 20:46:35 2009  commencing singleton removal, initial pass
Mon Nov 23 20:52:40 2009  memory use: 376.5 MB
Mon Nov 23 20:52:40 2009  reading all ideals from disk
Mon Nov 23 20:52:49 2009  memory use: 389.1 MB
Mon Nov 23 20:52:52 2009  keeping 15701256 ideals with weight <= 200, target excess is 45481
Mon Nov 23 20:52:55 2009  commencing in-memory singleton removal
Mon Nov 23 20:52:58 2009  begin with 9893826 relations and 15701256 unique ideals
Mon Nov 23 20:53:28 2009  reduce to 8961975 relations and 8879883 ideals in 12 passes
Mon Nov 23 20:53:28 2009  max relations containing the same ideal: 200
Mon Nov 23 20:53:39 2009  removing 244628 relations and 229961 ideals in 14667 cliques
Mon Nov 23 20:53:39 2009  commencing in-memory singleton removal
Mon Nov 23 20:53:42 2009  begin with 8717347 relations and 8879883 unique ideals
Mon Nov 23 20:54:01 2009  reduce to 8710738 relations and 8643286 ideals in 8 passes
Mon Nov 23 20:54:01 2009  max relations containing the same ideal: 200
Mon Nov 23 20:54:11 2009  removing 170533 relations and 155866 ideals in 14667 cliques
Mon Nov 23 20:54:12 2009  commencing in-memory singleton removal
Mon Nov 23 20:54:14 2009  begin with 8540205 relations and 8643286 unique ideals
Mon Nov 23 20:54:31 2009  reduce to 8537162 relations and 8484373 ideals in 7 passes
Mon Nov 23 20:54:31 2009  max relations containing the same ideal: 197
Mon Nov 23 20:54:37 2009  relations with 0 large ideals: 1953
Mon Nov 23 20:54:37 2009  relations with 1 large ideals: 1579
Mon Nov 23 20:54:37 2009  relations with 2 large ideals: 4259
Mon Nov 23 20:54:37 2009  relations with 3 large ideals: 38099
Mon Nov 23 20:54:37 2009  relations with 4 large ideals: 198885
Mon Nov 23 20:54:37 2009  relations with 5 large ideals: 665650
Mon Nov 23 20:54:37 2009  relations with 6 large ideals: 1465164
Mon Nov 23 20:54:37 2009  relations with 7+ large ideals: 6161573
Mon Nov 23 20:54:37 2009  commencing 2-way merge
Mon Nov 23 20:54:52 2009  reduce to 5455675 relation sets and 5402886 unique ideals
Mon Nov 23 20:54:52 2009  commencing full merge
Mon Nov 23 21:02:37 2009  memory use: 249.3 MB
Mon Nov 23 21:02:37 2009  found 37827 cycles, need 1117181
Mon Nov 23 21:02:37 2009  too few cycles, matrix probably cannot build
Mon Nov 23 21:17:42 2009
I will sieve for some more relations...
Old 2009-11-23, 20:55   #44
10metreh
 

Quote:
Originally Posted by Andi47
Never seen this before - seems I have exactly hit the point where the filtering stage thinks that it has enough relations, but it can't even build a matrix:

<snip>

I will sieve for some more relations...
This used to produce the "matrix must have more columns than rows" error, but I think Jason added this check, which detects the problem beforehand, in 1.43.

Last fiddled with by jasonp on 2009-11-24 at 00:42 Reason: yup, correct