I'm getting a failure on the selftest for an HD2500.
ERROR: selftest failed for M60004333 (mfakto_cl_63)## |
Radeon HD 7770, passed -st2.
|
[QUOTE=potonono;380490]I'm getting a failure on the selftest for an HD2500.
ERROR: selftest failed for M60004333 (mfakto_cl_63)##[/QUOTE]
Hmm, looks like cl_mg62 and mfakto_cl_63 in general are failing for me too... (HD4600) |
[QUOTE=kracker;380522]Hmm, looks like cl_mg62 and mfakto_cl_63 in general are failing for me too...(HD4600)[/QUOTE]
It looks like [URL="https://github.com/Bdot42/mfakto/commit/de4dbe2fd1f32f357dcfbc7054d2a54467769589"]this change[/URL] was premature, so I'm rolling it back. The Montgomery kernel, however, has not been changed for a while, so I'm not sure why that one would be affected. I won't release anything within the next two weeks as I have no access to my test machines ... |
[QUOTE=Bdot;380409]Can you tell a bit more about your system:
[LIST]
[*]Which Graphics drivers (AMD, Intel and/or NVIDIA, and which version)? I see a crash during compile as well when trying to run it on my Quadro FX 880M with NV drivers 334.something. It's an NV driver bug, it used to work with older drivers.
[*]Interesting detail that GPUType=INTEL makes it work ... that one skips optimization and enables a few workarounds in the code.
[*]Does mfakto -d 11 / -d 12 / -d 13 / -d 21 / 22 / 23 / ... try to use other devices? (Keep increasing the two digits separately until mfakto reports something like "Error: Only 1 platforms found. Cannot use platform 3..." or "Error: Only 1 devices found. Cannot use device 3..:") Does any of the settings select the HD4000? Is the HD4000 listed in the output of "clinfo"?
[*]How did you check the HD4000 is enabled? Does it have a monitor connected?
[/LIST][/QUOTE]
Sorry, my fault - at some point the HD4000 had become disabled; once I enabled it again everything was fine. Also sorry for the delay, I was away from my test machine for some time. |
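For anyone repeating the "-d" probing described in the quoted list, here is a small Python sketch of how the two-digit values can be enumerated and read. It assumes the first digit selects the OpenCL platform and the second the device, both counted from 1; that reading is inferred from the error messages quoted above and is not taken from mfakto's source. The function names are made up for illustration.

```python
from typing import List, Tuple


def split_device_arg(value: int) -> Tuple[int, int]:
    """Split a two-digit -d value into (platform, device).

    Assumption: first digit = platform, second digit = device, both 1-based.
    """
    if not 11 <= value <= 99 or value % 10 == 0:
        raise ValueError("expected two non-zero digits, e.g. 11 or 23")
    return divmod(value, 10)


def candidate_args(max_platform: int, max_device: int) -> List[int]:
    """List the -d values to try, platform by platform."""
    return [
        10 * platform + device
        for platform in range(1, max_platform + 1)
        for device in range(1, max_device + 1)
    ]


if __name__ == "__main__":
    for arg in candidate_args(2, 3):
        platform, device = split_device_arg(arg)
        print(f"mfakto -d {arg}  (platform {platform}, device {device})")
```

Running it only prints the command lines to try by hand; it does not call mfakto or query OpenCL.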
Sorry for the delay on this ... I was on vacation, and had a lot of other stuff to do after returning ...
Thank you for your reports:
[QUOTE=Jayder;380461]The first issue appears to be an old one (present in 0.14): SievePrimes doesn't seem to adjust after a certain point (or in some cases at all) in certain bit ranges. NumThreads is somewhat involved, but is probably not the culprit. ...
As I mentioned, it's only certain bit ranges, but it also depends on the exponent. I tested both this 4M exponent (above) as well as an 85M exponent. For the 4M, the SievePrimes has trouble adjusting up to 64 bits (64-65 adjusting fine) and the 85M exponent has trouble adjusting up to 68 bits (68-69 adjusting fine). I noticed all of this first during the selftest (st).
Attached are some files. Jayder-NS3 shows that with NumStreams 3 (or less, but not shown here) SPrimes climbs for a while but stops. Jayder-NS4 shows that with NumStreams 4 (or greater, but not shown) SPrimes doesn't change at all. +/-, s/S, and p/P seem to work as intended, even when SPrimes is stuck as above, but it does not unstick it.[/QUOTE]
Good observation, and there is an easy explanation: unless each of the NumStreams streams has sent at least 2 blocks of factor candidates to the GPU, the resulting timing information is regarded as unreliable and no SievePrimes adjustment is done. It basically means that the job is too small to tell anything about the CPU utilization during the trial factoring, because CPU and GPU were not running in parallel: each stream prepares a block of factor candidates, sends it off to the GPU and then starts preparing the second block. So only while the second block is being prepared does the GPU have a chance to run in parallel.
Lowering GridSize will help, as that reduces the block size. If you regularly run such small tasks, this may be a good thing anyway, as on average half a block of FCs is wasted per class: if you have only 2 blocks per class, that is 25% wasted. An even better approach might be to run the GPU sieve with MoreClasses=0 for such tasks.
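To make the two rules of thumb above concrete, here is a minimal Python sketch. It is not mfakto's code: the function names and the list-of-counts input are made up for illustration, and it only encodes what is stated above (at least 2 blocks per stream before timing is trusted, and half a block wasted per class on average).

```python
from typing import Sequence


def timing_is_reliable(blocks_sent_per_stream: Sequence[int], min_blocks: int = 2) -> bool:
    """True only if every stream has sent at least `min_blocks` blocks.

    Mirrors the stated rule: otherwise no SievePrimes adjustment is made.
    """
    return bool(blocks_sent_per_stream) and all(
        sent >= min_blocks for sent in blocks_sent_per_stream
    )


def wasted_fraction(blocks_per_class: float) -> float:
    """Average share of factor-candidate slots wasted per class.

    Assumes, as stated above, that half a block is wasted per class on average.
    """
    if blocks_per_class <= 0:
        raise ValueError("blocks_per_class must be positive")
    return 0.5 / blocks_per_class


if __name__ == "__main__":
    print(timing_is_reliable([2, 3, 2]))  # every stream sent at least 2 blocks
    print(timing_is_reliable([2, 1, 4]))  # one stream sent only 1 block
    for blocks in (2, 4, 8, 16):
        print(blocks, f"{wasted_fraction(blocks):.1%}")
```

The loop shows why a smaller GridSize helps for small jobs: more (smaller) blocks per class shrink the wasted share, 25% at 2 blocks per class, 12.5% at 4, and so on.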
[QUOTE=Jayder;380461]The second thing which I noticed is that time per class for my 4M exponent, 63-64 bits, has increased by at least 7%. The other two files in the archive contain brief logs showing this. There seemed to be no difference with the 85M exponent I tested. Settings all the same, computer idle.[/QUOTE]
I need to see if the same kernel is selected as before ... Maybe I did something wrong with the kernel precedence for APUs ... I'll check your logs and come back to that separately.
[QUOTE=Jayder;380461]Finally, I'm told that my "device does not support double precision operations." I don't know enough to know if this is right or not (it probably is), but I thought I'd check. I have an A4-3420 (with HD 6410D). I know the GPU does not have DP, but your description makes it sound like the DP is for the CPU. I don't know, me dumb. :cmd:[/QUOTE]
"device" in this case means the device where the OpenCL kernels are running, i.e. the GPU part of your APU. And that one is VLIW5 without DP support. No error here, and also no problem for running mfakto. I changed "WARNING" into "INFO" for the next version.
[QUOTE=Jayder;380461]I hope I have helped more than hindered. Thank you again (and kracker, and the many others who've helped).[/QUOTE]
I'm really thankful for all the feedback I can get. There's no way for me to test it on all the possible devices - I need your help with that. The same goes for unclear descriptions or behavior: it is not sufficient that something is clear to me, as I have a special view on mfakto. Please do ask, others may have the same question :smile: |
Finally I managed to create a 74-bit kernel that helps straighten out the performance of mfakto as the factor sizes increase (it moves the big drop out by one more bit). My HD7950@1100MHz now runs 100M candidates:
bits : GHz-days/day
67-68: 448
68-69: 476
69-70: 459
70-71: 416
71-72: 417
72-73: 418
[COLOR=DarkGreen]73-74: 408[/COLOR] <== the new one, was 361 before
74-82: 361
Attempts to achieve this using a new 5x16-bit kernel or an improved Montgomery kernel yielded slow results. The solution is a "4x15-bit + 1x16-bit" kernel ... |
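As an aside for readers wondering what a "4x15-bit + 1x16-bit" layout means numerically, the following Python sketch splits an integer into five limbs of those widths and puts it back together. It is only an illustration of the number format, not the kernel itself. Placing the 16-bit limb at the most significant end is an assumption (the post does not say which limb is the wide one), and `to_limbs` / `from_limbs` are hypothetical names.

```python
from typing import List, Sequence

# Limb widths, least significant first. Putting the 16-bit limb last
# (most significant) is an assumption made for this illustration.
LIMB_WIDTHS = (15, 15, 15, 15, 16)


def to_limbs(value: int, widths: Sequence[int] = LIMB_WIDTHS) -> List[int]:
    """Split a non-negative integer into limbs of the given bit widths."""
    if value < 0 or value.bit_length() > sum(widths):
        raise ValueError("value does not fit into the limb layout")
    limbs = []
    for width in widths:
        limbs.append(value & ((1 << width) - 1))  # keep the low `width` bits
        value >>= width
    return limbs


def from_limbs(limbs: Sequence[int], widths: Sequence[int] = LIMB_WIDTHS) -> int:
    """Recombine limbs produced by to_limbs into the original integer."""
    value = 0
    shift = 0
    for limb, width in zip(limbs, widths):
        value |= limb << shift
        shift += width
    return value


if __name__ == "__main__":
    factor_candidate = (1 << 74) - 1  # largest 74-bit value
    limbs = to_limbs(factor_candidate)
    print(limbs, from_limbs(limbs) == factor_candidate)
```

The five limbs give 4 x 15 + 16 = 76 bits of raw capacity; the post only states that the kernel is used for factors up to 74 bits, so how the remaining headroom is used is not covered here.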
Wonderful, thank you! :tu:
I'll be watching this channel for news of the new version's official release. Rodrigo |
[QUOTE=Bdot;382330]Finally I managed to create a 74-bit kernel that helps straighten out the performance of mfakto as the factor sizes increase (it moves the big drop out by one more bit). My HD7950@1100MHz now runs 100M candidates:
bits : GHz-days/day
67-68: 448
68-69: 476
69-70: 459
70-71: 416
71-72: 417
72-73: 418
[COLOR=DarkGreen]73-74: 408[/COLOR] <== the new one, was 361 before
74-82: 361
Attempts to achieve this using a new 5x16-bit kernel or an improved Montgomery kernel yielded slow results. The solution is a "4x15-bit + 1x16-bit" kernel ...[/QUOTE]
Niice :smile: If you need any testing/ers, I'm up for it :razz: |
Thanks for the reply and the great work, Bdot. :tu:
|
I'm available for [testing/playing with a beta] too. Eager to raise the limit of my Misfit from 73 to 74 :wink:
Very good job Bdot! (as usual) |