#1 |
"Bob Silverman"
Nov 2003
North of Boston
2²·1,877 Posts |
The linear algebra for 5,423+ finished last night.
It failed. The result was "too many orthogonal vectors". I need to rebuild the matrix and try again. Meanwhile, 5,423- is about 2/3 sieved. |
#2 | |
"Bob Silverman"
Nov 2003
North of Boston
1D54₁₆ Posts |
Quote:
I had rebuilt the matrix. This one was only 3.6M rows. I am going to try recompiling all the code and try again. I suspect memory problems, but diags turned up nothing. I am going to try a different diagnostic suite. I am also going to re-seat the memory DIMMs. 5,423- has finished sieving, and 2,1794M is in progress. |
#3 | |
"Bob Silverman"
Nov 2003
North of Boston
2²×1,877 Posts |
Quote:
matrix on a different machine. Even if the matrix somehow has the "wrong" bits lit, it still has a null space and hence the LA should still produce a solution, even if it is wrong. If anyone else has some ideas, I would love to hear them. |
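To illustrate the point that a matrix with more columns than rows always has a nonzero null vector over GF(2), whatever bits happen to be set, here is a minimal Python sketch. It is not the linear algebra code discussed in this thread: it uses plain Gaussian elimination on a small random matrix, and the names `nullspace_vector` and `matvec`, the 40x48 size, and the row-as-bitmask layout are all assumptions made for illustration.

```python
import random


def nullspace_vector(rows, ncols):
    """Return a nonzero x (bitmask over columns) with A x = 0 over GF(2), or None.

    rows: list of ints; bit j of rows[i] is the entry A[i][j].
    """
    rows = list(rows)
    pivot_row_of = {}  # pivot column -> index of its row after elimination
    r = 0
    for c in range(ncols):
        # Look for a row at or below r that has a 1 in column c.
        p = next((i for i in range(r, len(rows)) if (rows[i] >> c) & 1), None)
        if p is None:
            continue  # no pivot here, so column c is free
        rows[r], rows[p] = rows[p], rows[r]
        # Clear column c from every other row (reduced row echelon form).
        for i in range(len(rows)):
            if i != r and (rows[i] >> c) & 1:
                rows[i] ^= rows[r]
        pivot_row_of[c] = r
        r += 1
    free = [c for c in range(ncols) if c not in pivot_row_of]
    if not free:
        return None
    f = free[0]
    x = 1 << f  # set one free variable to 1, the others to 0
    for c, i in pivot_row_of.items():
        if (rows[i] >> f) & 1:
            x |= 1 << c  # pivot variable must cancel the free one
    return x


def matvec(rows, x):
    """Compute A x over GF(2); each entry is the parity of row AND x."""
    return [bin(row & x).count("1") & 1 for row in rows]


rng = random.Random(1)
nrows, ncols = 40, 48  # hypothetical sizes; more columns than rows
A = [rng.getrandbits(ncols) for _ in range(nrows)]
x = nullspace_vector(A, ncols)
print("null vector found:", x is not None and not any(matvec(A, x)))

A[3] ^= 1 << 7  # flip one "wrong" bit
y = nullspace_vector(A, ncols)
print("still found after corruption:", y is not None and not any(matvec(A, y)))
```

The corrupted matrix yields a different null vector, which would give a wrong dependency downstream, but the solve itself still has something to return.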
#4 |
"William"
May 2003
Near Grandkid
945₁₆ Posts |
Many people have found memory problems using the Prime95 torture test that did not show up on any diagnostic suite.
#5 |
∂²ω=0
Sep 2002
República de California
10110111101011₂ Posts |
Bob, if your code doesn't already have such an expedient, is there any kind of simple checksum you can incorporate into the matrix-building step, to make sure that all the 1s and 0s that go in come out right in the final matrix? I know diddly-squat about NFS LA, but have found before-and-after checksums invaluable in my own work, when it comes to tracking down weird memory-corruption and compiler bugs.
Also, is there anything further that can be gleaned from the "too many orthogonal vectors" diagnostic? That would seem (assuming the diagnostic itself is working as intended) to say something about the actual contents (viewed at large) of the final matrix, wouldn't it? Could diagnosing which vectors are orthogonal to which -- or adding a diagnostic to flag ones which are orthogonal to some greater-than-expected fraction of their brethren (if there is some applicable threshold to be applied in this regard) -- be helpful? |
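As a rough illustration of the before-and-after checksum idea, the Python sketch below hashes a sparse GF(2) matrix in a canonical form, so the digest taken when the matrix is built can be compared with one taken after it is read back. The storage format (each row as a list of the column indices holding a 1), the function name `matrix_digest`, and the choice of SHA-256 are assumptions for the example, not details of the code discussed in this thread.

```python
import hashlib
import struct


def matrix_digest(rows):
    """SHA-256 digest of a sparse GF(2) matrix.

    rows: iterable of iterables of column indices (positions of the 1 bits).
    Column indices are sorted within each row, so the digest does not depend
    on the order in which a row's entries happen to be stored.
    """
    h = hashlib.sha256()
    for cols in rows:
        cols = sorted(cols)
        h.update(struct.pack("<I", len(cols)))  # row weight
        for c in cols:
            h.update(struct.pack("<I", c))      # each column index
    return h.hexdigest()


# Hypothetical toy matrix: digest at build time ...
built = [[0, 5, 9], [2, 5], [1, 7, 9, 11]]
before = matrix_digest(built)

# ... and again after a (simulated) round trip that drops one entry.
reloaded = [list(r) for r in built]
reloaded[1].remove(5)
after = matrix_digest(reloaded)

print("matrix intact:", before == after)
```

A per-row digest stored alongside the matrix would additionally show which row changed, at the cost of more disk space.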
#6 | |
Bamboozled!
"๐บ๐๐ท๐ท๐ญ"
May 2003
Down not across
11658₁₀ Posts |
Quote:
I realise that this only gives you factors and does not solve your underlying problem, but at least you get the factors. Paul |
#7 | |
"Bob Silverman"
Nov 2003
North of Boston
2²·1,877 Posts |
Quote:
It reported that the 'left' matrix somehow contained a row of all 0's (this is within a 64x64 block). A second try with the same exact matrix did not yield this error!!!!! (which, more than anything else, is what makes me suspect a memory problem)

Running along OK, then, WHAM:

Smask[64400] = FFFFFFFFFFFFFFFF (64 entries out of 64).
Writing checkpoint 322 to checkfile.even after 64400 iterations.
Checkpoint successfully written at Tue Apr 24 08:52:36 2007
Smask[64429] = FFFFFFFFFFFFFFF2 (61 entries out of 64).
Smask[64434] = FFFFFFFFFFFFFFD5 (61 entries out of 64).
Smask[64445] = FFFFFFFFFFFFFF79 (61 entries out of 64).
Smask[64472] = FFFFFFFFFFFFFFEC (61 entries out of 64).

leftmat row 0 = 0000000000000000 (0000000000000000000000000000000000000000000000000000000000000000)
leftmat row 1 = 0000000000000003 (1100000000000000000000000000000000000000000000000000000000000000)
leftmat row 2 = 0000000000000004 (0010000000000000000000000000000000000000000000000000000000000000)
leftmat row 3 = 0000000000000009 (1001000000000000000000000000000000000000000000000000000000000000)
leftmat row 4 = 0000000000000010 (0000100000000000000000000000000000000000000000000000000000000000)
leftmat row 5 = 0000000000000021 (1000010000000000000000000000000000000000000000000000000000000000)
leftmat row 6 = 0000000000000040 (0000001000000000000000000000000000000000000000000000000000000000)
leftmat row 7 = 0000000000000080 (0000000100000000000000000000000000000000000000000000000000000000)
leftmat row 8 = 0000000000000101 (1000000010000000000000000000000000000000000000000000000000000000)
leftmat row 9 = 0000000000000200 (0000000001000000000000000000000000000000000000000000000000000000)
leftmat row 10 = 0000000000000400 (0000000000100000000000000000000000000000000000000000000000000000)
leftmat row 11 = 0000000000000801 (1000000000010000000000000000000000000000000000000000000000000000)
leftmat row 12 = 0000000000001001 (1000000000001000000000000000000000000000000000000000000000000000)
leftmat row 13 = 0000000000002000 (0000000000000100000000000000000000000000000000000000000000000000)
leftmat row 14 = 0000000000004000 (0000000000000010000000000000000000000000000000000000000000000000)
leftmat row 15 = 0000000000008001 (1000000000000001000000000000000000000000000000000000000000000000)
leftmat row 16 = 0000000000010000 (0000000000000000100000000000000000000000000000000000000000000000)
leftmat row 17 = 0000000000020001 (1000000000000000010000000000000000000000000000000000000000000000)
leftmat row 18 = 0000000000040001 (1000000000000000001000000000000000000000000000000000000000000000)
leftmat row 19 = 0000000000080001 (1000000000000000000100000000000000000000000000000000000000000000)
leftmat row 20 = 0000000000100001 (1000000000000000000010000000000000000000000000000000000000000000)
leftmat row 21 = 0000000000200001 (1000000000000000000001000000000000000000000000000000000000000000)
leftmat row 22 = 0000000000400001 (1000000000000000000000100000000000000000000000000000000000000000)
leftmat row 23 = 0000000000800001 (1000000000000000000000010000000000000000000000000000000000000000)
leftmat row 24 = 0000000001000000 (0000000000000000000000001000000000000000000000000000000000000000)
leftmat row 25 = 0000000002000000 (0000000000000000000000000100000000000000000000000000000000000000)
leftmat row 26 = 0000000004000000 (0000000000000000000000000010000000000000000000000000000000000000)
leftmat row 27 = 0000000008000001 (1000000000000000000000000001000000000000000000000000000000000000)
leftmat row 28 = 0000000010000000 (0000000000000000000000000000100000000000000000000000000000000000)
leftmat row 29 = 0000000020000001 (1000000000000000000000000000010000000000000000000000000000000000)
leftmat row 30 = 0000000040000001 (1000000000000000000000000000001000000000000000000000000000000000)
leftmat row 31 = 0000000080000000 (0000000000000000000000000000000100000000000000000000000000000000)
leftmat row 32 = 0000000100000000 (0000000000000000000000000000000010000000000000000000000000000000)
leftmat row 33 = 0000000200000001 (1000000000000000000000000000000001000000000000000000000000000000)
leftmat row 34 = 0000000400000000 (0000000000000000000000000000000000100000000000000000000000000000)
leftmat row 35 = 0000000800000001 (1000000000000000000000000000000000010000000000000000000000000000)
leftmat row 36 = 0000001000000001 (1000000000000000000000000000000000001000000000000000000000000000)
leftmat row 37 = 0000002000000001 (1000000000000000000000000000000000000100000000000000000000000000)
leftmat row 38 = 0000004000000001 (1000000000000000000000000000000000000010000000000000000000000000)
leftmat row 39 = 0000008000000000 (0000000000000000000000000000000000000001000000000000000000000000)
leftmat row 40 = 0000010000000001 (1000000000000000000000000000000000000000100000000000000000000000)
leftmat row 41 = 0000020000000001 (1000000000000000000000000000000000000000010000000000000000000000)
leftmat row 42 = 0000040000000001 (1000000000000000000000000000000000000000001000000000000000000000)
leftmat row 43 = 0000080000000000 (0000000000000000000000000000000000000000000100000000000000000000)
leftmat row 44 = 0000100000000001 (1000000000000000000000000000000000000000000010000000000000000000)
leftmat row 45 = 0000200000000000 (0000000000000000000000000000000000000000000001000000000000000000)
leftmat row 46 = 0000400000000000 (0000000000000000000000000000000000000000000000100000000000000000)
leftmat row 47 = 0000800000000000 (0000000000000000000000000000000000000000000000010000000000000000)
leftmat row 48 = 0001000000000001 (1000000000000000000000000000000000000000000000001000000000000000)
leftmat row 49 = 0002000000000001 (1000000000000000000000000000000000000000000000000100000000000000)
leftmat row 50 = 0004000000000000 (0000000000000000000000000000000000000000000000000010000000000000)
leftmat row 51 = 0008000000000001 (1000000000000000000000000000000000000000000000000001000000000000)
leftmat row 52 = 0010000000000000 (0000000000000000000000000000000000000000000000000000100000000000)
leftmat row 53 = 0020000000000001 (1000000000000000000000000000000000000000000000000000010000000000)
leftmat row 54 = 0040000000000001 (1000000000000000000000000000000000000000000000000000001000000000)
leftmat row 55 = 0080000000000000 (0000000000000000000000000000000000000000000000000000000100000000)
leftmat row 56 = 0100000000000000 (0000000000000000000000000000000000000000000000000000000010000000)
leftmat row 57 = 0200000000000000 (0000000000000000000000000000000000000000000000000000000001000000)
leftmat row 58 = 0400000000000000 (0000000000000000000000000000000000000000000000000000000000100000)
leftmat row 59 = 0800000000000000 (0000000000000000000000000000000000000000000000000000000000010000)
leftmat row 60 = 1000000000000000 (0000000000000000000000000000000000000000000000000000000000001000)
leftmat row 61 = 2000000000000001 (1000000000000000000000000000000000000000000000000000000000000100)
leftmat row 62 = 4000000000000000 (0000000000000000000000000000000000000000000000000000000000000010)
leftmat row 63 = 8000000000000001 (1000000000000000000000000000000000000000000000000000000000000001)

rightmat row 0 = B551DFBCC1FC724E (0111001001001110001111111000001100111101111110111000101010101101)
rightmat row 1 = 2E86BDF45F01DFE2 (0100011111111011100000001111101000101111101111010110000101110100)
rightmat row 2 = 4A00E8614598E587 (1110000110100111000110011010001010000110000101110000000001010010)
rightmat row 3 = E73FC9093A15F99A (0101100110011111101010000101110010010000100100111111110011100111)
rightmat row 4 = 4EAEF0619ED59A43 (1100001001011001101010110111100110000110000011110111010101110010)
rightmat row 5 = D090E6779FD0A416 (0110100000100101000010111111100111101110011001110000100100001011)
rightmat row 6 = 70F689B23572365A (0101101001101100010011101010110001001101100100010110111100001110)
rightmat row 7 = F01EA383991F50C0 (0000001100001010111110001001100111000001110001010111100000001111)
rightmat row 8 = 3854053E0E0F5E7F (1111111001111010111100000111000001111100101000000010101000011100)
rightmat row 9 = 496B69E987F563BF (1111110111000110101011111110000110010111100101101101011010010010)
rightmat row 10 = 748F98575E5DC603 (1100000001100011101110100111101011101010000110011111000100101110)
rightmat row 11 = A17B78D36ED29C40 (0000001000111001010010110111011011001011000111101101111010000101)
rightmat row 12 = 268C210564FF386D (1011011000011100111111110010011010100000100001000011000101100100)
rightmat row 13 = 78F2818FE5464FAA (0101010111110010011000101010011111110001100000010100111100011110)
rightmat row 14 = C5D138FD1D93D12D (1011010010001011110010011011100010111111000111001000101110100011)
rightmat row 15 = 143C239A8079704F (1111001000001110100111100000000101011001110001000011110000101000)
rightmat row 16 = 3F08E64960EDD79A (0101100111101011101101110000011010010010011001110001000011111100)
rightmat row 17 = DC867D3BFE6E37FF (1111111111101100011101100111111111011100101111100110000100111011)
rightmat row 18 = 00547643F5CBA461 (1000011000100101110100111010111111000010011011100010101000000000)
rightmat row 19 = 077021E4A5D19F56 (0110101011111001100010111010010100100111100001000000111011100000)
rightmat row 20 = 2EFDE2104EAC3F4B (1101001011111100001101010111001000001000010001111011111101110100)
rightmat row 21 = 55EA41179DD59892 (0100100100011001101010111011100111101000100000100101011110101010)
rightmat row 22 = 0FAB496F4369B4A2 (0100010100101101100101101100001011110110100100101101010111110000)
rightmat row 23 = F3EFEA60533B50E6 (0110011100001010110111001100101000000110010101111111011111001111)
rightmat row 24 = FCEBB778B702260B (1101000001100100010000001110110100011110111011011101011100111111)
rightmat row 25 = 86FD3C5DA02E7D74 (0010111010111110011101000000010110111010001111001011111101100001)
rightmat row 26 = 3518376EF0C20D38 (0001110010110000010000110000111101110110111011000001100010101100)
rightmat row 27 = CB8167E6F82671CB (1101001110001110011001000001111101100111111001101000000111010011)
rightmat row 28 = 785F28F2B558DDD1 (1000101110111011000110101010110101001111000101001111101000011110)
rightmat row 29 = 6343D68EEEE77677 (1110111001101110111001110111011101110001011010111100001011000110)
rightmat row 30 = AA69B789732FAFFF (1111111111110101111101001100111010010001111011011001011001010101)
rightmat row 31 = 2022FFA859C0E65D (1011101001100111000000111001101000010101111111110100010000000100)
rightmat row 32 = 6C52693A42677EBC (0011110101111110111001100100001001011100100101100100101000110110)
rightmat row 33 = 6921EA456C729191 (1000100110001001010011100011011010100010010101111000010010010110)
rightmat row 34 = 026ED92E488631CF (1111001110001100011000010001001001110100100110110111011001000000)
rightmat row 35 = F7CF70711045E9DA (0101101110010111101000100000100010001110000011101111001111101111)
rightmat row 36 = 6E4AF86DE434C7B0 (0000110111100011001011000010011110110110000111110101001001110110)
rightmat row 37 = 74ACE4C30232D08F (1111000100001011010011000100000011000011001001110011010100101110)
rightmat row 38 = CAE48D5CE6DB9922 (0100010010011001110110110110011100111010101100010010011101010011)
rightmat row 39 = 4240B4E45F1ADC61 (1000011000111011010110001111101000100111001011010000001001000010)
rightmat row 40 = 3CA6B8DDFB90CB56 (0110101011010011000010011101111110111011000111010110010100111100)
rightmat row 41 = 43F3945C726D1359 (1001101011001000101101100100111000111010001010011100111111000010)
rightmat row 42 = 23477D02B103E095 (1010100100000111110000001000110101000000101111101110001011000100)
rightmat row 43 = 0C7CF01335D078ED (1011011100011110000010111010110011001000000011110011111000110000)
rightmat row 44 = 8B4B1CB8100046C0 (0000001101100010000000000000100000011101001110001101001011010001)
rightmat row 45 = A628DAC1B761439D (1011100111000010100001101110110110000011010110110001010001100101)
rightmat row 46 = 7EC876C7CE194ED1 (1000101101110010100110000111001111100011011011100001001101111110)
rightmat row 47 = B1986EB24683125D (1011101001001000110000010110001001001101011101100001100110001101)
rightmat row 48 = C06BBDE825D4AFBF (1111110111110101001010111010010000010111101111011101011000000011)
rightmat row 49 = 0159EEC1200A60E5 (1010011100000110010100000000010010000011011101111001101010000000)
rightmat row 50 = DFF2C5AABB947EBF (1111110101111110001010011101110101010101101000110100111111111011)
rightmat row 51 = 4E0C169AAEE7498C (0011000110010010111001110111010101011001011010000011000001110010)
rightmat row 52 = A6E30783180C342E (0111010000101100001100000001100011000001111000001100011101100101)
rightmat row 53 = F7B2C5C03AEE7D4C (0011001010111110011101110101110000000011101000110100110111101111)
rightmat row 54 = A1553DF984BA6192 (0100100110000110010111010010000110011111101111001010101010000101)
rightmat row 55 = 79E51CD4CA0E063C (0011110001100000011100000101001100101011001110001010011110011110)
rightmat row 56 = 52080CF24A070EE5 (1010011101110000111000000101001001001111001100000001000001001010)
rightmat row 57 = E33C76DC6AD9101E (0111100000001000100110110101011000111011011011100011110011000111)
rightmat row 58 = 7DE83B07C8977B92 (0100100111011110111010010001001111100000110111000001011110111110)
rightmat row 59 = 61B69195F051C873 (1100111000010011100010100000111110101001100010010110110110000110)
rightmat row 60 = AC1B846CB2B19343 (1100001011001001100011010100110100110110001000011101100000110101)
rightmat row 61 = 1B1991656A69AE33 (1100110001110101100101100101011010100110100010011001100011011000)
rightmat row 62 = 1A978A3D90A08D99 (1001100110110001000001010000100110111100010100011110100101011000)
rightmat row 63 = 179CC416D47ADB51 (1000101011011011010111100010101101101000001000110011100111101000) |
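For anyone wanting to sanity-check a dump like the one above by hand, here is a small Python sketch. It makes two assumptions that the printed values are consistent with but that the log does not state: the count shown after each Smask word is the number of set bits, and the bit string after each row is the same 64-bit word written least significant bit first. The helper names `popcount`, `lsb_first_bits` and `gf2_rank` are made up for the example, and only the first eight leftmat rows are used, to keep it short.

```python
def popcount(word):
    """Number of 1 bits in a nonnegative integer."""
    return bin(word).count("1")


def lsb_first_bits(word, width=64):
    """Render a word as a bit string with bit 0 printed first."""
    return "".join("1" if (word >> i) & 1 else "0" for i in range(width))


def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are given as integer bitmasks."""
    basis = {}  # highest set bit -> reduced row owning that bit
    for r in rows:
        while r:
            top = r.bit_length() - 1
            if top not in basis:
                basis[top] = r
                break
            r ^= basis[top]  # cancel the leading bit and keep reducing
    return len(basis)


# Smask words and the entry counts printed next to them in the log.
smask_log = {
    64400: (0xFFFFFFFFFFFFFFFF, 64),
    64429: (0xFFFFFFFFFFFFFFF2, 61),
    64434: (0xFFFFFFFFFFFFFFD5, 61),
    64445: (0xFFFFFFFFFFFFFF79, 61),
    64472: (0xFFFFFFFFFFFFFFEC, 61),
}
for it, (mask, printed) in smask_log.items():
    print(it, "count matches:", popcount(mask) == printed)

# First eight leftmat rows from the log, treated as a small matrix.
leftmat_head = [0x0, 0x3, 0x4, 0x9, 0x10, 0x21, 0x40, 0x80]
print("bit string for row 3:", lsb_first_bits(leftmat_head[3], 8))
print("zero rows:", [i for i, r in enumerate(leftmat_head) if r == 0])
print("rank:", gf2_rank(leftmat_head), "of", len(leftmat_head))
```

A square block containing an all-zero row cannot have full rank, so it cannot be inverted; the rank check makes that explicit for the excerpt.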
#8 |
∂²ω=0
Sep 2002
República de California
10110111101011₂ Posts |
So the question is, is this a hardware or software memory corruption problem?
I'd suggest running some program like Purify on your code, but if the problem is not in any way reproducible from one run to the next, this could be a waste of time. I assume you have carefully scanned your build-time output for compile warnings about uninitialized memory? Have you built on multiple platforms/compilers? Some are better at catching uninitialized-memory issues at build time (obviously preferable to doing a time-consuming Purify run with a nonpredictably reproducible issue like you seem to have) than others. But wait - your second post of the thread says it failed 3 times in a row -- so is the error reproducible in this case, or not? |
#9 | |
Bamboozled!
"๐บ๐๐ท๐ท๐ญ"
May 2003
Down not across
2D8A₁₆ Posts |
Quote:
Is the temperature ok? Are all the fans spinning properly? Remember to check the chipset fan(s) if any. Can you pull the memory from that system and put it into another one so that the latter has enough to be able to complete the matrix? Sometimes re-seating any add-on cards cures strange problems. For example, my Athlon Linux box hangs every now and again. It's not done so for weeks but previously it would sometimes hang several times a day. I've long suspected the ethernet card but have never been able to prove it. On the other hand, no hangs have occurred after reseating all the cards. Paul |
#10 | ||
Aug 2002
2×7×13×47 Posts |
Quote:
Quote:
#11 |
Sep 2002
Database er0rr
2³·3·11·17 Posts |
Basic hardware checks:
BIOS: all voltage readings
BIOS: all temperature readings (idle CPU at 45C and no hotter than 15C over the motherboard)
BIOS: CPU timings
BIOS: all memory timings
BIOS: miscellaneous settings such as "spread spectrum"
H/W: Clean fans, heatsink
H/W: connections for mainboard, cards, disks and memory

Software checks:
memtest86+
prime95 torture tests

Operating System checks:
Filing system: "chkdsk" (Win); or "fsck" and "badblocks" (Linux; RTFM)
Defrag (Win)
Update operating system
Virus
Firewall
Spyware
Other malware
Network device security

Application checks:
Re-install
Re-compile
Try on another box
Try to shield from cosmic rays

Last fiddled with by paulunderwood on 2007-05-23 at 22:23 |