#34
|
May 2008
3·5·73 Posts
#35
|
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
3⁶·13 Posts
If RSALS uses a quorum of 2, that's a huge unnecessary loss of productivity. The nature of sieving tolerates many failures (usually several percent?), which are removed easily at the filtering stage. No single relation is precious or should be unduly fought for; any sufficiently large batch of them will do fine. (This is different from many other DC problems, where a WU cannot be closed until every corner of the search space has been visited and confirmed.)

In fact, there's another possible problem with a quorum of 2: the sievers may produce perfectly correct yet slightly different output, and a pair of outputs will not match -- the factorizations (fields 2 and 3 in the ":"-separated files) may list their comma-separated factors in different orders. The server will then discard some perfectly fine WUs, and waste more time trying to get them to match bit for bit.

Consider running the BOINC client sievers with -z turned ON by default! (Less traffic, and transmission errors are detected already at reception by the server.) My 2 cents.

P.S. Jeff, transferring gzipped files has more than one benefit: 1) it's faster, and 2) you get an error already when gunzipping. Then you start checking -- was binary transfer ON, is the client's or server's memory flaky...? (I once had a few gunzipping errors while receiving rels from Bruce. And what do you think? A memory stick had a few bits burned, on my side. Memtest confirmed it. It was just a month old, and the memory problem apparently happened within the previous day... RMA'd it. So consider that any of your client machines may have memory problems; if they are gamers' machines, the owners will never know -- but the rels files will show it.)
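The ordering problem has a simple fix on the validator side: canonicalize each relation line before comparing. A minimal sketch, assuming the "a,b:factors:factors" layout of the ":"-separated files described above; this is an illustration, not the project's actual validator code:

```python
# Hypothetical sketch: canonicalize GGNFS-style relation lines so that two
# correct outputs listing the same factors in different orders compare equal.
# The "a,b:f1,f2,...:g1,g2,..." layout follows the post; nothing here is
# taken from RSALS/NFS@Home's real code.

def canonicalize(line):
    parts = line.strip().split(':')
    if len(parts) != 3:          # comments, malformed lines, etc.: leave as-is
        return line.strip()
    ab, rat, alg = parts
    # Sort each comma-separated (hex) factor list so ordering differences vanish.
    rat = ','.join(sorted(rat.split(','), key=lambda f: int(f, 16)))
    alg = ','.join(sorted(alg.split(','), key=lambda f: int(f, 16)))
    return '{}:{}:{}'.format(ab, rat, alg)

# Two sievers emitting the same relation, factors in different order:
assert canonicalize('123,45:1f,3,b:7,2d') == canonicalize('123,45:3,b,1f:2d,7')
```

Comparing canonical forms (or hashes of them) would let a quorum of 2 accept both outputs instead of discarding good work.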
#36
|
Jul 2003
So Cal
83A₁₆ Posts
Quote:
See here, for example. Actually, no. This causes -R not to work. Restarting is very important in BOINC because the client frequently, for many reasons, interrupts a workunit in progress. Instead, the BOINC client can be (is, in the case of NFS@Home) instructed to gzip the output file just before uploading. Really old BOINC clients ignore this instruction, so I'm getting a few that are not gzipped before transfer, but most are. Fortunately, "gzip -dcf" simply cat's a non-gzipped file, so I can treat them all the same in validation and processing.
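The same "treat them all alike" trick as `gzip -dcf` is easy to reproduce in code: peek at the first two bytes and open as gzip only if they are the gzip magic (0x1f 0x8b), otherwise fall back to plain text. A sketch under that assumption -- this mirrors the behavior described above, not NFS@Home's actual processing scripts:

```python
# Open a relations file transparently, gzipped or not, like `gzip -dcf` does.
import gzip

def open_maybe_gzipped(path):
    """Return a text-mode handle whether or not `path` is gzip-compressed."""
    with open(path, 'rb') as f:
        magic = f.read(2)
    if magic == b'\x1f\x8b':     # gzip magic bytes (RFC 1952)
        return gzip.open(path, 'rt')
    return open(path, 'r')
```

A gzip header check also doubles as the cheap corruption test mentioned earlier in the thread: a truncated or bit-flipped upload fails to decompress, flagging the bad WU immediately.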
#37
|
Tribal Bullet
Oct 2004
DD5₁₆ Posts
It always goes up to 64 dependencies, because I was lazy and didn't implement a bitwise-or of the dependency vector. I should fix that.
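For context, a small sketch of what a bitwise OR of the dependency words buys: block Lanczos over GF(2) with 64-bit words yields up to 64 dependencies at once, one per bit position, and OR-ing every row's word together shows how many bit positions are actually used instead of always reporting 64. The helper below is purely illustrative, not msieve's code:

```python
# Count non-trivial dependencies by OR-ing per-row 64-bit dependency words.
def nontrivial_dependencies(dep_words):
    """dep_words: iterable of 64-bit ints, one per nullspace row."""
    mask = 0
    for w in dep_words:
        mask |= w          # a bit stays set iff some row participates there
    return bin(mask).count('1')

# Rows that only ever touch bits 0, 1, and 5 -> only 3 real dependencies:
assert nontrivial_dependencies([0b000001, 0b100010, 0b000010]) == 3
```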
#38
|
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
2505₁₆ Posts
True, I wrote that initial hack. I'll write the necessary part of the code then. Will you be willing to statically link with -lz? For both the writing and reading parts [for -R, we have to read the file back]. In fact, wait a minute -- the -z option won't work on many existing Win clients, because as it is written now it pipes into a "gzip --best --stdout" pipe, and they may not have an external gzip program.

Another thought: there's some suppressed old code that keeps track of the last special q in a .last_spq file in the working directory. This could be carefully revived/refactored for faster restarts. Gzipped files are concatenable, except when they are truncated (hmm, not really wanting to debug that). Maybe your solution via the BOINC manager is what the doctor ordered, after all.

P.S. It may be redundant with BOINC's capabilities (you do have that set up -- good!), but it won't hurt individual sievers. I'll do that over a few weekends.

Last fiddled with by Batalov on 2009-11-17 at 22:39 Reason: .last_spq
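A minimal sketch of the .last_spq restart idea: after each finished special-q, record it via an atomic rename so a crash mid-write cannot leave a truncated checkpoint, and a restarted siever resumes from the next q instead of re-sieving the whole range. The file name .last_spq comes from the post above; the helper names and the atomic-rename detail are assumptions, not the suppressed siever code:

```python
# Crash-safe special-q checkpointing for faster siever restarts (illustrative).
import os

def save_last_spq(workdir, q):
    """Record the last completed special-q atomically."""
    tmp = os.path.join(workdir, '.last_spq.tmp')
    with open(tmp, 'w') as f:
        f.write(str(q))
    os.replace(tmp, os.path.join(workdir, '.last_spq'))  # atomic rename

def resume_from(workdir, q_start):
    """Return the special-q to resume at, falling back to the range start."""
    try:
        with open(os.path.join(workdir, '.last_spq')) as f:
            return int(f.read()) + 1   # next q after the last completed one
    except (FileNotFoundError, ValueError):
        return q_start                 # no (or unreadable) checkpoint
```

The atomic rename matters for exactly the truncation worry raised above: the checkpoint file is either the complete old value or the complete new one, never a partial write.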
#39
|
Jun 2003
Ottawa, Canada
10010010101₂ Posts
Quote:
My system recently became unstable; for some reason, after working solidly for over a year, it needed a vcore adjustment one more level up in the BIOS. I think I might have processed the file before I had fixed it, which would explain how the file became corrupted. I don't think it is a memory issue, because the system ran solidly with the Prime95 torture test for 18 hours plus some IntelBurnTest after I adjusted the vcore higher; it was crashing fairly quickly before that. And I do understand the benefits of data compression well.

Jeff.
#40
|
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
3⁶×13 Posts
Prime95 in torture test may or may not crawl all over the memory (gotta have a look at the source). To make it use more memory when I tried the replacement memory, I ran a custom torture test with a setting for how much memory to use (e.g. 3600MB or 7600MB, leaving some for the OS's kitchen). Even then, the monitors made it easy to see that Prime95 does allocate that much -- but does it crawl all over it? I'll read the source tonight.

Memtest will tell you for sure. They complement each other, because memtest doesn't really stress the CPU, but it does crawl all over the memory. The latest version tries to use the cores one after another, but not all of them; not really a torture. Dell's diagnostics CD tools have tests that bang on all the memory from all CPUs, at once and/or in different strokes.

My favorite torture test is of course "msieve -nc2". It is because of msieve that I run 8GB of 1066MHz-rated memory at 1000MHz. Winter is here, so last week I tried to bump it to 1040MHz... "bzzzzt, wrong decision", reported msieve (all the other torturers were happy!) -- "matrix is not invertible" and such. At 1000MHz, it's stable for weeks (or as long as it takes), running B/Lanczos.

I didn't imply that you... you know.
#41
|
Jun 2003
Ottawa, Canada
3·17·23 Posts
Quote:
But yes, msieve -nc2 is a great torture test if you have a big enough set of relations lying around.
#42
|
Tribal Bullet
Oct 2004
3,541 Posts
It actually doesn't even take a very big Lanczos problem to trigger hardware errors in heavily overclocked machines. But even a pre-generated matrix that takes up much less than 1GB worth of disk files makes a rather sizable torture test :)
#43
|
Oct 2004
Austria
4662₈ Posts
Never seen this before - seems I have exactly hit the point where the filtering stage thinks that it has enough relations, but it can't even build a matrix:
Code:
Mon Nov 23 20:33:22 2009  
Mon Nov 23 20:33:22 2009  Msieve v. 1.43
Mon Nov 23 20:33:22 2009  random seeds: 7a03d6bc 07484868
Mon Nov 23 20:33:22 2009  factoring 86241193609892168785428877715507386455095742442904296823089444056870910801429442199122976408625759612781268230001767314137735454949478943181896880936801 (152 digits)
Mon Nov 23 20:33:24 2009  no P-1/P+1/ECM available, skipping
Mon Nov 23 20:33:24 2009  commencing number field sieve (152-digit input)
Mon Nov 23 20:33:24 2009  R0: -9816673457144609450
Mon Nov 23 20:33:24 2009  R1: 3133157107
Mon Nov 23 20:33:24 2009  A0: 1
Mon Nov 23 20:33:24 2009  A1: -4
Mon Nov 23 20:33:24 2009  A2: -10
Mon Nov 23 20:33:24 2009  A3: 10
Mon Nov 23 20:33:24 2009  A4: 15
Mon Nov 23 20:33:24 2009  A5: -6
Mon Nov 23 20:33:24 2009  A6: -7
Mon Nov 23 20:33:24 2009  A7: 1
Mon Nov 23 20:33:24 2009  A8: 1
Mon Nov 23 20:33:24 2009  skew 1.00, size 1.807668e-04, alpha 3.204558, combined = 2.547850e-12
Mon Nov 23 20:33:24 2009  
Mon Nov 23 20:33:24 2009  commencing relation filtering
Mon Nov 23 20:33:24 2009  estimated available RAM is 1985.8 MB
Mon Nov 23 20:33:24 2009  commencing duplicate removal, pass 1
Mon Nov 23 20:33:34 2009  error -15 reading relation 1005504
Mon Nov 23 20:36:08 2009  error -9 reading relation 16273920
Mon Nov 23 20:36:40 2009  error -15 reading relation 19332113
Mon Nov 23 20:37:18 2009  found 6070455 hash collisions in 23266847 relations
Mon Nov 23 20:37:46 2009  added 1828725 free relations
Mon Nov 23 20:37:46 2009  commencing duplicate removal, pass 2
Mon Nov 23 20:39:01 2009  found 6918430 duplicates and 18177141 unique relations
Mon Nov 23 20:39:01 2009  memory use: 165.2 MB
Mon Nov 23 20:39:01 2009  reading ideals above 15138816
Mon Nov 23 20:39:14 2009  commencing singleton removal, initial pass
Mon Nov 23 20:45:52 2009  memory use: 376.5 MB
Mon Nov 23 20:45:52 2009  reading all ideals from disk
Mon Nov 23 20:46:02 2009  memory use: 390.9 MB
Mon Nov 23 20:46:05 2009  commencing in-memory singleton removal
Mon Nov 23 20:46:08 2009  begin with 18177141 relations and 19827024 unique ideals
Mon Nov 23 20:46:32 2009  reduce to 9015192 relations and 7025069 ideals in 17 passes
Mon Nov 23 20:46:32 2009  max relations containing the same ideal: 29
Mon Nov 23 20:46:35 2009  reading ideals above 100000
Mon Nov 23 20:46:35 2009  commencing singleton removal, initial pass
Mon Nov 23 20:52:40 2009  memory use: 376.5 MB
Mon Nov 23 20:52:40 2009  reading all ideals from disk
Mon Nov 23 20:52:49 2009  memory use: 389.1 MB
Mon Nov 23 20:52:52 2009  keeping 15701256 ideals with weight <= 200, target excess is 45481
Mon Nov 23 20:52:55 2009  commencing in-memory singleton removal
Mon Nov 23 20:52:58 2009  begin with 9893826 relations and 15701256 unique ideals
Mon Nov 23 20:53:28 2009  reduce to 8961975 relations and 8879883 ideals in 12 passes
Mon Nov 23 20:53:28 2009  max relations containing the same ideal: 200
Mon Nov 23 20:53:39 2009  removing 244628 relations and 229961 ideals in 14667 cliques
Mon Nov 23 20:53:39 2009  commencing in-memory singleton removal
Mon Nov 23 20:53:42 2009  begin with 8717347 relations and 8879883 unique ideals
Mon Nov 23 20:54:01 2009  reduce to 8710738 relations and 8643286 ideals in 8 passes
Mon Nov 23 20:54:01 2009  max relations containing the same ideal: 200
Mon Nov 23 20:54:11 2009  removing 170533 relations and 155866 ideals in 14667 cliques
Mon Nov 23 20:54:12 2009  commencing in-memory singleton removal
Mon Nov 23 20:54:14 2009  begin with 8540205 relations and 8643286 unique ideals
Mon Nov 23 20:54:31 2009  reduce to 8537162 relations and 8484373 ideals in 7 passes
Mon Nov 23 20:54:31 2009  max relations containing the same ideal: 197
Mon Nov 23 20:54:37 2009  relations with 0 large ideals: 1953
Mon Nov 23 20:54:37 2009  relations with 1 large ideals: 1579
Mon Nov 23 20:54:37 2009  relations with 2 large ideals: 4259
Mon Nov 23 20:54:37 2009  relations with 3 large ideals: 38099
Mon Nov 23 20:54:37 2009  relations with 4 large ideals: 198885
Mon Nov 23 20:54:37 2009  relations with 5 large ideals: 665650
Mon Nov 23 20:54:37 2009  relations with 6 large ideals: 1465164
Mon Nov 23 20:54:37 2009  relations with 7+ large ideals: 6161573
Mon Nov 23 20:54:37 2009  commencing 2-way merge
Mon Nov 23 20:54:52 2009  reduce to 5455675 relation sets and 5402886 unique ideals
Mon Nov 23 20:54:52 2009  commencing full merge
Mon Nov 23 21:02:37 2009  memory use: 249.3 MB
Mon Nov 23 21:02:37 2009  found 37827 cycles, need 1117181
Mon Nov 23 21:02:37 2009  too few cycles, matrix probably cannot build
Mon Nov 23 21:17:42 2009  
#44
|
Nov 2008
100100010010₂ Posts
This used to be the "matrix must have more columns than rows" error, but I think Jason added a check in 1.43 that detects it beforehand.

Last fiddled with by jasonp on 2009-11-24 at 00:42 Reason: yup, correct
Similar Threads
| Thread | Thread Starter | Forum | Replies | Last Post |
| Msieve 1.53 feedback | xilman | Msieve | 149 | 2018-11-12 06:37 |
| Msieve 1.50 feedback | firejuggler | Msieve | 99 | 2013-02-17 11:53 |
| Msieve v1.48 feedback | Jeff Gilchrist | Msieve | 48 | 2011-06-10 18:18 |
| Msieve 1.42 feedback | Andi47 | Msieve | 167 | 2009-10-18 19:37 |
| Msieve 1.41 Feedback | Batalov | Msieve | 130 | 2009-06-09 16:01 |