mersenneforum.org CUDA - Class problems. Factor divisible by 2, 3, 5, 7, or 11

2019-01-13, 11:51   #1
ET_
Banned

"Luigi"
Aug 2002
Team Italia

1001011101011_2 Posts

CUDA - Class problems. Factor divisible by 2, 3, 5, 7, or 11

I did an advanced search on the forum and found that this error often points to an improper CUDA toolkit/CC/driver/executable configuration: recompiling with the appropriate CC in the makefile, or reinstalling the toolkit/driver, usually solved the issue with both mfaktc and mmff.

Unfortunately, a friend of mine ran into this same error with mmff.exe on Windows 10, but at first look his configuration is correct. He has a Pascal card (either GTX 1050 or GTX 1060) and just installed the toolkit and the driver from Nvidia.

Here is the screenshot of the issue:

Code:
mmff v0.28 (64bit built)

Compiletime options
  THREADS_PER_BLOCK         256
  MORE_CLASSES              enabled

Runtime options
  GPU Sieving               enabled
  WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486)
  GPUSievePrimes            depends on worktodo entry
  GPUSieveSize              16M bits
  WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8)
  GPUSieveProcessSize       8K bits
  WorkFile                  worktodo.txt
  Checkpoints               enabled
  CheckpointDelay           300s
  StopAfterFactor           disabled
  PrintMode                 compact
  V5UserID                  (none)
  ComputerID                (none)
  WARNING, no ProgressFormat specified in mmff.ini, using default
  TimeStampInResults        no

CUDA version info
  binary compiled for CUDA  10.0
  CUDA runtime version      10.0
  CUDA driver version       10.0

CUDA device info
  name                      GeForce GTX 1050 Ti with Max-Q Design
  compute capability        6.1
  maximum threads per block 1024
  number of mutliprocessors 6 (unknown number of shader cores)
  clock rate                1417MHz

got assignment: k*2^167+1, k range 1835000000 to 1836000000 (198-bit factors)
Starting trial factoring of k*2^167+1 in k range: 1835M to 1836M (198-bit factors)
 k_min = 1835000000
 k_max = 1836000000
Using GPU kernel "mfaktc_barrett204_F160_191gs"
ERROR: Class problems. Factor divisible by 2, 3, 5, 7, or 11

Now, apart from the lack of the configuration file mmff.ini, I can't see errors. The makefile "Makefile.win" is set to produce code for CC 3.0 and above (including 6.1, which covers Pascal cards) through the command

Code:
--generate-code arch=compute_61,code=sm_61

The card and the CC are recognized, the kernel is correct...

In red the parts I have never seen before (but my newest GPU card is a GTX 980...)

What might have gone wrong?

Luigi

Last fiddled with by ET_ on 2019-01-13 at 11:53
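
For readers unfamiliar with the message: going only by its wording, the check rejects a candidate factor k*2^n+1 that is divisible by one of the primes 2, 3, 5, 7, or 11. The sketch below is not taken from mmff's source; the function names are hypothetical and it only illustrates, on the CPU, what such a divisibility test computes for the assignment in the log (n = 167, k starting at 1835000000).

```python
SMALL_PRIMES = (2, 3, 5, 7, 11)


def candidate(k: int, n: int) -> int:
    """Return the candidate factor k*2^n + 1 as an exact Python integer."""
    return k * (1 << n) + 1


def small_prime_divisors(k: int, n: int) -> list:
    """Return the primes in SMALL_PRIMES that divide k*2^n + 1.

    Works with residues only, so the full 198-bit number is never built:
    (k*2^n + 1) mod p == ((k mod p) * (2^n mod p) + 1) mod p.
    """
    hits = []
    for p in SMALL_PRIMES:
        if ((k % p) * pow(2, n, p) + 1) % p == 0:
            hits.append(p)
    return hits


if __name__ == "__main__":
    n = 167  # exponent from the assignment in the log above
    for k in range(1835000000, 1835000005):
        size = candidate(k, n).bit_length()
        print(k, size, small_prime_divisors(k, n))
```

A candidate for which this list is non-empty cannot be a prime factor, so a healthy run should never hand one to the final check. Seeing the error on every run therefore suggests that the program computed the candidate incorrectly, not that the input range is at fault, though that is an inference and not something the log proves.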
2019-01-13, 14:12   #2
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2^6×107 Posts

I don't run mmff. But when I run mfaktx or CUDAPm1 and get a bad factor, it seems to indicate a problem with the GPU hardware. Running a thorough memory test (multiple patterns, multiple repeats, full memory range) has indicated bad GPU memory in that case.

2019-01-13, 14:44   #3
ET_
Banned

"Luigi"
Aug 2002
Team Italia

29·167 Posts

Quote:
 Originally Posted by kriesel I don't run mmff. But when I run mfaktx or CUDAPm1, and get a bad factor, it seems to indicate problem gpu hardware. Running a thorough memory test (multiple patterns, multiple repeats, full memory range) has indicated bad gpu memory in that case.
Thanks Ken, but it does not look like a GPU problem, as the GPU worked fine before the transition to CUDA 10...

2019-01-13, 15:00   #4
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2^6·107 Posts

It still may be worth running, since it could rule out memory issues, or the hardware health may have declined recently. I had a GPU whose memory error rate increased drastically within a year. (Probably faster; I only tested memory a year apart.)

2019-01-13, 19:19   #5
Dylan14

"Dylan"
Mar 2017

2×3^3×11 Posts

Just finished a memory test (memtestg80.exe) on the GTX 1050 Ti mentioned in the first post of this thread, with 3 GB used and 1000 test iterations. There were errors, but only in the random blocks part of the test, and not every iteration had them. I am not sure whether errors in this part imply a failure with mmff. If need be, I can redo the test with fewer iterations and upload the output.

2019-01-14, 02:09   #6
tServo

"Marv"
May 2009
near the Tannhäuser Gate

17×47 Posts

Luigi,
The stuff in red from your post doesn't matter. I suspect the number he is trying to factor is out of range. mmff comes with a "test" worktodo.txt file that has a number of known Fermat factors. Has this test file been run?

2019-01-14, 09:15   #7
ET_
Banned

"Luigi"
Aug 2002
Team Italia

29×167 Posts

Quote:
 Originally Posted by tServo Luigi, The stuff in red from your post doesn't matter. I suspect the number he is trying to factor is out-of-range. mmff comes with a "test" worktodo.txt file that has a number of known fermat factors. Has this test file been run?
It is part of the worktodo.txt file distributed with the executable.
I tested it on a GTX 680, and it worked nicely and did not throw the error, as the k is very small relative to the actual k sizes.

Last fiddled with by ET_ on 2019-01-14 at 09:15

2019-01-15, 15:11   #8
Dylan14

"Dylan"
Mar 2017

2·3^3·11 Posts

To further debug the issue, I've made the following changes to the file tf_validate.h, which verifies that a candidate factor has no small factors of its own:

in line 271, comment out the exit(1);
in line 274, comment out the exit(1);

The effect of these changes is to let the code carry on after encountering the class problems error. I then recompiled the code without issue and ran the sample worktodo-test256.txt file included in the source (renamed to worktodo.txt, of course). When I run mmff this time, it yields a bunch of errors and then appears to hang. I've included the output from this run; see the attachment.
Attached Files
 output_immediateexitdisabled.txt (13.9 KB, 230 views)

2019-01-15, 17:03   #9
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

15300_8 Posts

Quote:
 Originally Posted by Dylan14 Just finished a memory test (memtestg80.exe) with the GTX 1050 ti mentioned in the first post of this thread with 3 GB used and 1000 test iterations. There were errors, but only with the random blocks part of the test, and not every iteration had these errors. Not sure if errors in this part implies a failure with mmff. If need be, I can redo the test with fewer iterations and upload the output.
I suspect your card has some hardware issue. It may not be reliable enough for mmff.
If I understand you correctly, you are seeing issues with it in both mmff and memtestg80.

Try the CUDALucas 2.06 May 2017 beta with the -memtest option, with as much memory coverage as it will let you run (nearly all of the 4 GB on the GPU card, except for about 100 MB for the program to sit in), and follow with a full double check run. Standard operating procedure is to duplicate a run of a known Mersenne prime, such as M6972593. If it can't pass that test, repeatedly, it's probably not reliable enough to use in mmff or other number theory software either.
In your memtestg80 testing, what did it show you in terms of error counts and location? For a limited time I had good results running trial factoring on a card that had become unusable for P-1 and then for LL testing. As memory cells failed over time, the bad region came to cover more of the address space. P-1 is the most memory hungry, primality testing is intermediate, and trial factoring has a small footprint. Eventually the card became unusably unreliable even for TF and was retired.
How old is your card? (Warranty expired?)
Do you see any visual artifacts if you use it to drive a display?

EDIT:
Several choices for gpu testing are listed at https://www.raymond.cc/blog/having-p...st-its-memory/

Last fiddled with by kriesel on 2019-01-15 at 17:16

2019-01-15, 21:08   #10
Dylan14

"Dylan"
Mar 2017

594_10 Posts

Quote:
 Originally Posted by kriesel I suspect your card has some hardware issue. It may be not reliable enough for mmff. If I understand you correctly, you are seeing issues with it in mmff and in memtestg80. Try CUDALucas 2.06 May 2017 beta, -memtest option, with as much memory coverage as it will let you run (nearly all the 4GB on the gpu card, except for a bit ~100MB for the program to sit in), and follow with a full double check run. Standard operating procedure is to duplicate a run of a known Mersenne prime, such as M6972593. If it can't pass that test, repeatedly, it's probably not reliable enough to use in mmff or other number theory software either. In your memtestg80 testing, what did it show you in terms of error counts and location? I've had good results for limited times, running trial factoring on a card that became unusable for P-1 and then LL testing. As the memory cells failed over time, it came to cover more of the address space. P-1 is the most memory hungry, primality testing is intermediate, trial factoring has a small footprint. Eventually the card became unusably unreliable even for TF and was retired. How old is your card? (Warranty expired?) Do you see any visual artifacts if you use it to drive a display? EDIT: Several choices for gpu testing are listed at https://www.raymond.cc/blog/having-p...st-its-memory/

CUDALucas -memtest - ran with 125 chunks of memory, 1 iteration. No errors found.
CUDALucas double check - used M36 and M37 (exponents 2976221 and 3021377, respectively). Both came up prime, as expected.
memtestg80 - As stated before, the errors only occur in the random block phase. It does not tell me where the errors occur, just how many. See the file memtestg80-output.txt, which is included in the zip file attached to this post.
The GTX 1050 Ti that I am testing is in a laptop, which I received in late June 2018. The warranty is active until 6/10/19.
No artifacts are present.
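
For context, the double check mentioned above re-runs the Lucas-Lehmer (LL) test on an exponent whose outcome is already known and compares the result. The sketch below is only an illustration in plain Python big integers. It is not how CUDALucas works internally (as far as I know, it uses FFT-based multiplication on the GPU), and it is practical only for small exponents, nowhere near 2976221.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if the Mersenne number 2**p - 1 is prime.

    Assumes p itself is prime. Iterates s -> s*s - 2 (mod 2**p - 1)
    for p - 2 steps starting from s = 4; the number is prime exactly
    when the final residue is zero.
    """
    if p == 2:
        return True  # 2**2 - 1 = 3; the iteration below needs an odd p
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


if __name__ == "__main__":
    # Known small cases: these exponents give Mersenne primes,
    # while 11, 23 and 29 do not.
    for p in (3, 5, 7, 11, 13, 17, 19, 23, 29, 31):
        print(p, lucas_lehmer(p))
```

A single wrong bit anywhere in the p - 2 squarings changes the final residue, which is why a card that reproduces a known prime repeatedly is usually taken as reasonably healthy for this kind of arithmetic.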
Attached Files
 gputesting.zip (10.1 KB, 232 views)

2019-01-15, 23:35   #11
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

29·277 Posts

Try a PM to TheJudger. He maintains mfaktc and is very familiar with all the CUDA changes over the years. mmff is a derivative of mfaktc.
