[QUOTE=Ethan (EO);277845]I'm still failing about half of the selftests with that kernel file.
... edit: No luck -- unzipped your src file directly, put the updated cl file in src, built Release/x64, and ran. No runtime cl compilation errors, but -- aha -- just noticed that it is passing the first test in each test case, and then failing the rest: [CODE]
########## testcase 6/1558 ##########
tf(53134687, 68, 69, ...);
 k_min = 2999999998380 - k_max = 3300000000000
Using GPU kernel "mfakto_cl_71_8"
class | candidates | time | avg. rate | SievePrimes | ETA | avg. wait
3120/4620 | 14.16M | 0.468s | 30.25M/s | 25000 | n.a. | 10798us
Result[00]: M53134687 has a factor: 337073926433410950601
found 1 factor(s) for M53134687 from 2^68 to 2^69 [mfakto 0.09-Win mfakto_cl_71_8]
selftest for M53134687 passed (mfakto_cl_71_8)!
tf(): total time spent: 0.487s
tf(53134687, 68, 69, ...);
 k_min = 2999999998380 - k_max = 3300000000000
Using GPU kernel "mfakto_cl_71_4"
class | candidates | time | avg. rate | SievePrimes | ETA | avg. wait
3120/4620 | 14.16M | 0.215s | 65.84M/s | 25000 | n.a. | 0us
no factor for M53134687 from 2^68 to 2^69 [mfakto 0.09-Win mfakto_cl_71_4]
ERROR: selftest failed for M53134687 (mfakto_cl_71_4)
  no factor found
tf(): total time spent: 0.234s
tf(53134687, 68, 69, ...);
 k_min = 2999999998380 - k_max = 3300000000000
Using GPU kernel "mfakto_cl_barrett79"
class | candidates | time | avg. rate | SievePrimes | ETA | avg. wait
3120/4620 | 14.16M | 0.214s | 66.15M/s | 25000 | n.a. | 0us
no factor for M53134687 from 2^68 to 2^69 [mfakto 0.09-Win mfakto_cl_barrett79]
ERROR: selftest failed for M53134687 (mfakto_cl_barrett79)
  no factor found
tf(): total time spent: 0.233s
tf(53134687, 68, 69, ...);
 k_min = 2999999998380 - k_max = 3300000000000
Using GPU kernel "mfakto_cl_barrett92"
class | candidates | time | avg. rate | SievePrimes | ETA | avg. wait
3120/4620 | 14.16M | 0.214s | 66.15M/s | 25000 | n.a. | 0us
no factor for M53134687 from 2^68 to 2^69 [mfakto 0.09-Win mfakto_cl_barrett92]
ERROR: selftest failed for M53134687 (mfakto_cl_barrett92)
  no factor found
tf(): total time spent: 0.232s
[/CODE]
And that's consistent across the test cases.[/QUOTE]
There should be no need to rebuild anything - just replace the kernel file next to the original 0.09 binary. On the other hand, rebuilding should not hurt. I'll update my home PC over the weekend; maybe I can reproduce the error there. The symptom looks a bit like something not being initialized in the right place, so the outcome now depends on memory layout or other fairly random factors. |
[QUOTE=Bdot;277904]There should be no need to rebuild anything - just replace the kernel file next to the original 0.09 binary.[/QUOTE]
Yeah -- I just rebuilt from your unaltered project to make sure I hadn't messed something up in the executable I had built previously :) Ethan |
I reordered the test cases to see if the failure pattern was the same, and it turns out that the order of the kernels within a testcase is irrelevant -- mfakto_cl_71_4, mfakto_cl_barrett79, and mfakto_cl_barrett92 are failing, but mfakto_cl_71_8 is working.
|
Hello,
just a shot in the dark: the average wait is 0 when the known factor is not found. Does the GPU kernel run at all? Oliver |
[QUOTE=TheJudger;278000]Hello,
just a shot in the dark: the average wait is 0 when the known factor is not found. Does the GPU kernel run at all? Oliver[/QUOTE] Yep -- they are running. I just turned on the kernel tracing in the OpenCL kernels, and I've found a difference:
32 bit build cl_71_4: [CODE]
########## testcase 1/1558 ##########
tf(50804297, 67, 68, ...);
 k_min = 1599999998520 - k_max = 1900000000000
Using GPU kernel "mfakto_cl_71_4"
class | candidates | time | avg. rate | SievePrimes | ETA | avg. wait
mfakto_cl_71: tid=0: p=3073649, *2 =6:e6c92, k=0, 0, 0, 0:17487, 17487, 17487, 17487:6e8773, 6ef3bb, 6f05c7, 6f3beb, f=8d029, 8d029, 8d029, 8d02a:fccff8, ff5fc2, ffcd0e, 114f3:77c397, 53e4a7, a33f7f, 915007, shift=19, b=0, 0, 0, 0:1, 1, 1, 1:0, 0, 0, 0:0, 0, 0, 0:0, 0, 0, 0:0, 0, 0, 0
mod_144_72#1: qf=3.51844E+013, nf=6.15105E-021, *=2.16421E-007, qi=0
mod_144_72#1: q=0:1:0:0:0:0, n=8d029:fccff8:77c397, qi=0
mod_144_72#1.1: nn=0:0:0:0:0:0
mod_144_72#1.2: nn=0:0:0:0:0:0
mod_144_72#1.3: nn=0:0:0:0:0:0
Error: The arguments don't match the printf format string. printf(mod_144_72#1.3: nn=%x:%x:%x:%x:%x:%x
[/CODE]
64 bit build cl_71_4: [CODE]
########## testcase 1/1558 ##########
tf(50804297, 67, 68, ...);
 k_min = 1599999998520 - k_max = 1900000000000
Using GPU kernel "mfakto_cl_71_4"
class | candidates | time | avg. rate | SievePrimes | ETA | avg. wait
mfakto_cl_71: tid=0: p=3073649, *2 =6:e6c92, k=0, 0, 0, 0:17487, 17487, 17487, 17487:6e8773, 6ef3bb, 6f05c7, 6f3beb, f=8d029, 8d029, 8d029, 8d02a:fccff8, ff5fc2, ffcd0e, 114f3:77c397, 53e4a7, a33f7f, 915007, shift=19, b=0, 0, 0, 0:0, 0, 0, 0:0, 0, 0, 0:0, 0, 0, 0:0, 0, 0, 0:0, 0, 0, 0
mod_144_72#1: qf=0.000000, nf=6.15105E-021, *=0.000000, qi=0
mod_144_72#1: q=0:0:0:0:0:0, n=8d029:fccff8:77c397, qi=0
mod_144_72#1.1: nn=0:0:0:0:0:0
mod_144_72#1.2: nn=0:0:0:0:0:0
mod_144_72#1.3: nn=0:0:0:0:0:0
Error: The arguments don't match the printf format string. printf(mod_144_72#1.3: nn=%x:%x:%x:%x:%x:%x
[/CODE]
64 bit build cl_71_8: [CODE]
########## testcase 1/1558 ##########
tf(50804297, 67, 68, ...);
 k_min = 1599999998520 - k_max = 1900000000000
Using GPU kernel "mfakto_cl_71_8"
class | candidates | time | avg. rate | SievePrimes | ETA | avg. wait
mfakto_cl_71: tid=0: p=3073649, *2 =6:e6c92, k=0, 0, 0, 0, 0, 0, 0, 0:17487, 17487, 17487, 17487, 17487, 17487, 17487, 17487:6e8773, 6ef3bb, 6f05c7, 6f3beb, 6fde57, 70147b, 706eb7, 71c59b, f=8d029, 8d029, 8d029, 8d02a, 8d02a, 8d02a, 8d02a, 8d02a:fccff8, ff5fc2, ffcd0e, 114f3, 4eca2, 63487, 85704, 1073ae:77c397, 53e4a7, a33f7f, 915007, 5b819f, 499227, d6585f, ba1667, shift=19, b=0, 0, 0, 0, 0, 0, 0, 0:1, 1, 1, 1, 1, 1, 1, 1:0, 0, 0, 0, 0, 0, 0, 0:0, 0, 0, 0, 0, 0, 0, 0:0, 0, 0, 0, 0, 0, 0, 0:0, 0, 0, 0, 0, 0, 0, 0
mod_144_72#1: qf=3.51844E+013, nf=6.15105E-021, *=2.16421E-007, qi=0
mod_144_72#1: q=0:1:0:0:0:0, n=8d029:fccff8:77c397, qi=0
mod_144_72#1.1: nn=0:0:0:0:0:0
mod_144_72#1.2: nn=0:0:0:0:0:0
mod_144_72#1.3: nn=0:0:0:0:0:0
Error: The arguments don't match the printf format string. printf(mod_144_72#1.3: nn=%x:%x:%x:%x:%x:%x
[/CODE]
I haven't yet worked back to see whether b is correct in the caller in the 64-bit/v4 case... and, as you can see, I haven't finished changing the kernels' printf format strings to vector types :) |
qf = 0.00000 doesn't look good.[LIST][*]precomputation failed[*]floating-point conversion failed (unlikely?)[*]data transfer doesn't work / isn't finished[*]something else[/LIST]
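Taking the trace values at face value, the first step of mod_144_72 looks like a single-precision quotient estimate: nf = 6.15105E-021 is 1/n when the limbs of n are read most significant first, and qf = 3.51844E+013 is exactly 2^45, i.e. the top three limbs of q (here only d4 = 1) combined and scaled by 2^21. The C sketch below reproduces those numbers. The struct and function names and the 2^21 scale are assumptions reconstructed from the trace, not mfakto's actual kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical limb structs: d0 is the least significant 24-bit limb. */
typedef struct { uint32_t d0, d1, d2; } int72_sketch;
typedef struct { uint32_t d0, d1, d2, d3, d4, d5; } int144_sketch;

/* nf ~ 1/n in single precision, built from the three 24-bit limbs of n. */
float inverse_72_sketch(int72_sketch n)
{
    assert(n.d0 <= 0xFFFFFFu && n.d1 <= 0xFFFFFFu && n.d2 <= 0xFFFFFFu);
    float nf = (float)n.d2;
    nf = nf * 16777216.0f + (float)n.d1;   /* 16777216 = 2^24 */
    nf = nf * 16777216.0f + (float)n.d0;
    return 1.0f / nf;
}

/* qf from the three most significant limbs of q. The 2^21 scale is an
   assumption, chosen only because it reproduces the traced qf. */
float top_limbs_144_sketch(int144_sketch q)
{
    float qf = (float)q.d5;
    qf = qf * 16777216.0f + (float)q.d4;
    qf = qf * 16777216.0f + (float)q.d3;
    return qf * 2097152.0f;                /* 2097152 = 2^21 */
}

/* Quotient digit estimate, truncated toward zero: qi = (uint)(qf * nf). */
uint32_t quotient_estimate_sketch(int144_sketch q, int72_sketch n)
{
    return (uint32_t)(top_limbs_144_sketch(q) * inverse_72_sketch(n));
}
```

With n = {0x77c397, 0xfccff8, 0x8d029} and a q whose only nonzero limb is d4 = 1, this gives nf ≈ 6.15105e-21, qf = 2^45 and qf*nf ≈ 2.16421e-7 with qi = 0, as in the 32-bit trace. Under these assumptions an all-zero q gives qf = 0 exactly, matching the 64-bit vector-4 trace, where b already arrives as all zeros. That would point at the precomputation or data-transfer items rather than at the float conversion, keeping in mind that the trace is not yet reliable for vector kernels.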
|
[QUOTE=TheJudger;278049]qf = 0.00000 doesn't look good.[LIST][*]precomputation failed[*]floating-point conversion failed (unlikely?)[*]data transfer doesn't work / isn't finished[*]something else[/LIST][/QUOTE]
I have not yet changed all the trace statements to work with vectors. The kernel trace is only accurate when tracing non-vector kernels. That's also the reason for the "arguments don't match" message. Looks like some work to do ... Edit: The average wait can also be zero for mfakto because the wait time needed for the last block of a class is not included in the calculation (one of the differences from the earlier mfaktc versions, made to work better with small classes). |
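A hypothetical way to picture why "0us" shows up in the selftest output: if the average is taken only over the blocks before the last one of a class, then a class that fits in a single block, or one where the GPU only had to wait on its last block, reports an average wait of 0. The function name and the array of per-block waits are illustrative assumptions, not mfakto's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: average GPU wait per block of one class, in microseconds,
   leaving out the wait measured for the class's last block. */
unsigned long avg_wait_sketch(const unsigned long *wait_us, size_t nblocks)
{
    assert(wait_us != NULL || nblocks == 0);
    if (nblocks < 2)
        return 0;  /* no block left to average over: reported as 0us */
    unsigned long sum = 0;
    for (size_t i = 0; i + 1 < nblocks; ++i)  /* skip the last block */
        sum += wait_us[i];
    return sum / (unsigned long)(nblocks - 1);
}
```

For example, waits of {100, 300, 5000} us average to 200 us, because the 5000 us of the last block are ignored, and {0, 0, 7000} averages to 0.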
The kernels do not receive the input parameter that holds the pre-processing information; they get a zero there instead.
With the kernel tracing fixed and set to at least level 3, the mfakto_cl_71_4 kernel receives the correct parameters and finds the factors. So far I have not gotten the barrett kernels to receive all input parameters. My guess is that the optimizer removed them because it did not consider them necessary. But building the kernel without optimization crashes the kernel compiler. In light of this, the fact that the barrett kernels are ~4% faster with 11.10 probably does not help; the speedup most likely comes from crucial parts having been optimized away. I guess we just need to skip Catalyst 11.10 :-( |
I recently got an HD 6770 and picked up mfakto 0.09. When I try to run the 64-bit Windows exe, I get multiple errors about too many instances of mad24, then a message saying there were 27 errors, and the program shuts down (paraphrasing, as I am not sitting AT the machine atm). If I run the 32-bit exe, everything appears to run normally. I have the 11.9 drivers installed because I read here about the problems with 11.10.
|
[QUOTE=bcp19;278121]I recently got an HD 6770 and picked up mfakto 0.09. When I try to run the 64-bit Windows exe, I get multiple errors about too many instances of mad24, then a message saying there were 27 errors, and the program shuts down (paraphrasing, as I am not sitting AT the machine atm). If I run the 32-bit exe, everything appears to run normally. I have the 11.9 drivers installed because I read here about the problems with 11.10.[/QUOTE]
[code]
Select device - Get device info - Compiling kernels.
BUILD OUTPUT
C:\Users\root\AppData\Local\Temp\OCLCEF5.tmp.cl(2192): error: more than one instance of overloaded function "mad24" matches the argument list:
    function "mad24(int, int, int) C++"
    function "mad24(uint, uint, uint) C++"
    argument types are: (uint, int, uint)
  *res_hi = mad24(mul_hi(a,b), 256, (*res_lo >> 24));
            ^
...
C:\Users\root\AppData\Local\Temp\OCLCEF5.tmp.cl(2726): error: more than one instance of overloaded function "mad24" matches the argument list:
    function "mad24(int, int, int) C++"
    function "mad24(uint, uint, uint) C++"
    argument types are: (uint, int, uint)
  nn.d2 = mad24(mul_hi(n.d1, qi), 256, tmp >> 24);
          ^
27 errors detected in the compilation of "C:\Users\root\AppData\Local\Temp\OCLCEF5.tmp.cl".
Internal error: clc compiler invocation failed.
END OF BUILD OUTPUT
Error -11: clBuildProgram
init_CL(5, 0) failed
[/code]
This is exactly the error that started appearing on my machine after installing Catalyst 11.10. These compilation errors are easy to solve, but mfakto will still fail the selftest because there are other bugs in the compiled kernel. I tried uninstalling 11.10 and went back as far as 11.6; the errors remain. It's not the first time that the ATI drivers have not uninstalled themselves correctly. Or maybe they did, but some hardware switch remained in a bad state. Anyway, the sad result is that once in that state, I could not get out of it. (I cannot try reinstalling the machine.) I'll see if I can build an "11.10 workaround version" for trapped folks like me. There will certainly be a performance penalty. Where it still works, it's probably quickest to run the 32-bit version for now, although on my machine the 32-bit version fails as well. Strange, strange, strange. Maybe there's still a bug in the main program that just has these side effects. |
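The line the compiler rejects computes the upper 24-bit limb of a 48-bit product. The literal 256 is an int in OpenCL C, so the call has argument types (uint, int, uint), and mad24(int, int, int) and mad24(uint, uint, uint) are equally good matches; writing 256u (or casting it to uint) leaves only the uint overload. Below is a plain-C sketch of that arithmetic, with stand-ins for mul24, mad24 and mul_hi on uint operands. The helper names, and the assumption that *res_lo holds the low 32 bits of the product before it is masked, are illustrative and not taken from the mfakto source. Fixing these compile errors alone does not make the 11.10 kernels pass the selftest.

```c
#include <assert.h>
#include <stdint.h>

/* Plain-C stand-ins for the OpenCL built-ins on uint operands. */
static uint32_t mul24_u(uint32_t x, uint32_t y)
{
    assert(x <= 0xFFFFFFu && y <= 0xFFFFFFu);  /* mul24 needs 24-bit inputs */
    return (uint32_t)((uint64_t)x * y);        /* low 32 bits of the product */
}

static uint32_t mad24_u(uint32_t x, uint32_t y, uint32_t z)
{
    return mul24_u(x, y) + z;                  /* mul24(x, y) + z, 32-bit unsigned */
}

static uint32_t mul_hi_u(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32); /* high 32 bits of a*b */
}

/* Hypothetical helper: res_hi * 2^24 + res_lo == a * b for 24-bit a, b. */
void mul_24_48_sketch(uint32_t *res_hi, uint32_t *res_lo, uint32_t a, uint32_t b)
{
    *res_lo = mul24_u(a, b);                                /* bits 0..31  */
    /* OpenCL C: the literal 256 is an int, giving (uint, int, uint);
       256u makes every argument uint, so only one overload matches. */
    *res_hi = mad24_u(mul_hi_u(a, b), 256u, *res_lo >> 24); /* bits 24..47 */
    *res_lo &= 0xFFFFFFu;                                   /* bits 0..23  */
}
```

For 24-bit inputs the product is below 2^48, so mul_hi contributes bits 32..47 (shifted left by 8 via the multiply by 256) and res_lo >> 24 contributes bits 24..31; the two never overlap, so the sum is exactly the product shifted right by 24.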
Hmm, does that mean the exponents I've been doing on the 32-bit client are suspect?
|