mersenneforum.org  

Old 2014-08-15, 23:52   #1189
potonono
 
 
Jun 2005
USA, IL

11000001₂ Posts

I'm getting a failure on the selftest for an HD2500.

ERROR: selftest failed for M60004333 (mfakto_cl_63)##
Attached Files
File Type: txt hd2500-test.txt (10.8 KB, 160 views)
Old 2014-08-16, 01:09   #1190
kracker
 
 
"Mr. Meeseeks"
Jan 2012
California, USA

2168₁₀ Posts

Radeon HD 7770, passed -st2.
Attached Files
File Type: zip R_HD7770_pi.zip (19.4 KB, 75 views)
Old 2014-08-16, 15:00   #1191
kracker
 
 
"Mr. Meeseeks"
Jan 2012
California, USA

2³·271 Posts

Quote:
Originally Posted by potonono View Post
I'm getting a failure on the selftest for an HD2500.

ERROR: selftest failed for M60004333 (mfakto_cl_63)##
Hmm, looks like cl_mg62 and mfakto_cl_63 in general are failing for me too... (HD4600)

Last fiddled with by kracker on 2014-08-16 at 15:00
Old 2014-08-19, 10:51   #1192
Bdot
 
 
Nov 2010
Germany

1125₈ Posts

Quote:
Originally Posted by kracker View Post
Hmm, looks like cl_mg62 and mfakto_cl_63 in general are failing for me too...(HD4600)
It looks like this change was premature, so I'm rolling it back. The Montgomery kernel, however, has not been changed for a while, so I'm not sure why that one would be affected.

I won't release anything within the next two weeks, as I have no access to my test machines ...
Old 2014-08-26, 07:08   #1193
Antonio
 
 
"Antonio Key"
Sep 2011
UK

3²·59 Posts

Quote:
Originally Posted by Bdot View Post
Can you tell me a bit more about your system:
  • Which graphics drivers (AMD, Intel and/or NVIDIA, and which version)?
    I see a crash during compile as well when trying to run it on my Quadro FX 880M with NV drivers 334.something. It's an NV driver bug; it used to work with older drivers.
  • Interesting detail that GPUType=INTEL makes it work ... that one skips optimization and enables a few workarounds in the code.
  • Does mfakto -d 11 / -d 12 / -d 13 / -d 21 / 22 / 23 / ... try to use other devices? (Keep increasing the two digits separately until mfakto tells you something like "Error: Only 1 platforms found. Cannot use platform 3..." or "Error: Only 1 devices found. Cannot use device 3..:") Do any of the settings select the HD4000? Is the HD4000 listed in the output of "clinfo"?
  • How did you check the HD4000 is enabled? Does it have a monitor connected?
Sorry, my fault: at some point the HD4000 had become disabled. Once I enabled it again, everything was fine.
Also, sorry for the delay; I was away from my test machine for some time.
Old 2014-09-04, 14:46   #1194
Bdot
 
 
Nov 2010
Germany

3×199 Posts

Sorry for the delay on this ... I was on vacation, and had a lot of other stuff to do after returning ...

Thank you for your reports:
Quote:
Originally Posted by Jayder View Post
The first issue appears to be an old one (present in 0.14): SievePrimes doesn't seem to adjust after a certain point (or in some cases at all) in certain bit ranges. NumThreads is somewhat involved, but is probably not the culprit.
...

As I mentioned, it's only certain bit ranges, but it also depends on the exponent. I tested both this 4M exponent (above) as well as an 85M exponent. For the 4M, the SievePrimes has trouble adjusting up to 64 bits (64-65 adjusting fine) and the 85M exponent has trouble adjusting up to 68 bits (68-69 adjusting fine).

I noticed all of this first during the selftest (st). Attached are some files. Jayder-NS3 shows that with NumStreams 3 (or less, but not shown here) SPrimes climbs for a while but stops. Jayder-NS4 shows that with NumStreams 4 (or greater, but not shown) SPrimes doesn't change at all.

+/-, s/S, and p/P seem to work as intended, even when SPrimes is stuck as above, but they do not unstick it.
Good observation, and there is an easy explanation: unless each of the NumStreams streams has sent at least 2 blocks of factor candidates to the GPU, the resulting timing information is regarded as unreliable and no SievePrimes adjustment is made. It basically means that the job is too small to tell anything about the CPU utilization during trial factoring, because CPU and GPU were not running in parallel: each stream prepares a block of factor candidates, sends it off to the GPU and then starts preparing the second block. So only while the second block is being prepared does the GPU have a chance to run in parallel.
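
The gating rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the blocks_sent_per_stream list and the proposed value are hypothetical and are not taken from the mfakto source; only the "at least 2 blocks per stream" condition comes from the explanation above.

```python
def timing_is_reliable(blocks_sent_per_stream, min_blocks=2):
    """Return True only if every stream has sent at least min_blocks blocks.

    blocks_sent_per_stream: one counter per stream (hypothetical bookkeeping).
    """
    return all(n >= min_blocks for n in blocks_sent_per_stream)


def next_sieve_primes(current, proposed, blocks_sent_per_stream):
    """Accept the proposed SievePrimes value only when the timing is reliable.

    How 'proposed' is computed is outside the scope of this sketch.
    """
    if not timing_is_reliable(blocks_sent_per_stream):
        return current  # job too small: keep the old value
    return proposed


# Four streams, but one of them has sent only a single block: no adjustment.
print(next_sieve_primes(5000, 6000, [2, 3, 1, 2]))
```

With a smaller GridSize each stream sends more (smaller) blocks for the same job, so the condition is met sooner.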

Lowering GridSize will help, as that reduces the block size. If you regularly run such small tasks, this may be a good thing anyway: on average, half a block of FCs (factor candidates) is wasted per class, so with only 2 blocks per class that is 25% wasted. An even better approach might be to run the GPU sieve with MoreClasses=0 for such tasks.
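
The 25% figure follows from simple arithmetic, sketched here in Python. The function name is made up for illustration; the only inputs taken from the post are the "half a block wasted per class on average" estimate and the 2-blocks-per-class example.

```python
def expected_waste_fraction(blocks_per_class, wasted_blocks=0.5):
    """Average share of the work sent to the GPU per class that is wasted.

    wasted_blocks=0.5 is the 'half a block per class' estimate from the post.
    """
    if blocks_per_class <= 0:
        raise ValueError("blocks_per_class must be positive")
    return wasted_blocks / blocks_per_class


for blocks in (2, 4, 10, 100):
    print(f"{blocks:3d} blocks per class: "
          f"{expected_waste_fraction(blocks):.1%} wasted on average")
```

The waste shrinks in proportion to the number of blocks per class, which is why a smaller GridSize helps for small jobs.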


Quote:
Originally Posted by Jayder View Post
The second thing which I noticed is that time per class for my 4M exponent, 63-64 bits, has increased by at least 7%. The other two files in the archive contain brief logs showing this. There seemed to be no difference with the 85M exponent I tested. Settings all the same, computer idle.
I need to see if the same kernel is selected as before ... Maybe I did something wrong with the kernel precedence for APUs ... I'll check your logs and come back to that separately.

Quote:
Originally Posted by Jayder View Post
Finally, I'm told that my "device does not support double precision operations." I don't know enough to know if this is right or not (it probably is), but I thought I'd check. I have an A4-3420 (with HD 6410D). I know the GPU does not have DP, but your description makes it sound like the DP is for the CPU. I don't know, me dumb.
"device" in this case means the device where the OpenCL kernels are running, i.e. the GPU part of your APU. That one is VLIW5 without DP support. There is no error here, and it is no problem for running mfakto. I changed "WARNING" to "INFO" for the next version.

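For reference, an OpenCL device normally advertises double-precision support through the cl_khr_fp64 extension (some older AMD drivers expose cl_amd_fp64 instead) in its CL_DEVICE_EXTENSIONS string. A minimal Python sketch of such a check, assuming the extension string has already been obtained (for example from clinfo output); whether mfakto performs its check this way is an assumption, and the function name is hypothetical:

```python
# Extension names that indicate double-precision (fp64) support in OpenCL.
FP64_EXTENSIONS = {"cl_khr_fp64", "cl_amd_fp64"}


def supports_double_precision(extensions):
    """Return True if a space-separated extension string lists an fp64 extension."""
    return not FP64_EXTENSIONS.isdisjoint(extensions.split())


# Example strings written for this sketch, not copied from a real device.
print(supports_double_precision("cl_khr_byte_addressable_store cl_khr_fp64"))
print(supports_double_precision("cl_khr_byte_addressable_store"))
```
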
Quote:
Originally Posted by Jayder View Post
I hope I have helped more than hindered. Thank you again (and kracker, and the many others who've helped).
I'm really thankful for all the feedback I can get. There's no way for me to test on all possible devices, so I need your help with that. The same goes for unclear descriptions or behavior: it is not sufficient that something is clear to me, as I have a special view of mfakto. Please do ask; others may have the same question.

Last fiddled with by Bdot on 2014-09-04 at 14:47 Reason: Why do I see typos only after submitting?
Old 2014-09-06, 22:47   #1195
Bdot
 
 
Nov 2010
Germany

3×199 Posts

Finally I managed to create a 74-bit kernel that helps straighten out the performance of mfakto as the factor sizes increase (it moves the big drop out by one more bit). My HD7950@1100MHz now runs 100M candidates:

bits : GHz-days/day
67-68: 448
68-69: 476
69-70: 459
70-71: 416
71-72: 417
72-73: 418
73-74: 408 <== the new one, was 361 before
74-82: 361

Attempts to achieve this with a new 5x16-bit kernel or an improved Montgomery kernel yielded slow results. The solution is a "4x15-bit + 1x16-bit" kernel ...
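
To illustrate what a "4x15-bit + 1x16-bit" layout can look like, here is a small Python sketch that splits a number into four 15-bit limbs plus one 16-bit limb (76 bits in total) and joins them again. It is not the mfakto kernel: the limb order (the 16-bit limb placed on top) and the helper names are assumptions made for this example.

```python
# Limb widths, least significant limb first. Putting the 16-bit limb last
# (most significant) is an assumption made for this illustration.
LIMB_BITS = (15, 15, 15, 15, 16)


def split_limbs(value, limb_bits=LIMB_BITS):
    """Split a non-negative integer into limbs of the given widths."""
    if value < 0 or value >= 1 << sum(limb_bits):
        raise ValueError("value does not fit into the limb layout")
    limbs = []
    for bits in limb_bits:
        limbs.append(value & ((1 << bits) - 1))  # take the low 'bits' bits
        value >>= bits
    return limbs


def join_limbs(limbs, limb_bits=LIMB_BITS):
    """Inverse of split_limbs: recombine limbs into one integer."""
    value, shift = 0, 0
    for limb, bits in zip(limbs, limb_bits):
        value |= limb << shift
        shift += bits
    return value


factor_candidate = (1 << 74) - 1  # largest 74-bit number, used as an example
limbs = split_limbs(factor_candidate)
print(limbs)
print(join_limbs(limbs) == factor_candidate)
```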

Last fiddled with by Bdot on 2014-09-06 at 22:47
Old 2014-09-06, 23:04   #1196
Rodrigo
 
 
Jun 2010
Pennsylvania

2·467 Posts

Wonderful, thank you!

I'll be watching this channel for news of the new version's official release.

Rodrigo
Old 2014-09-07, 03:01   #1197
kracker
 
 
"Mr. Meeseeks"
Jan 2012
California, USA

878₁₆ Posts

Quote:
Originally Posted by Bdot View Post
Finally I managed to create a 74-bit kernel that helps straighten out the performance of mfakto as the factor sizes increase (it moves the big drop out by one more bit). My HD7950@1100MHz now runs 100M candidates:

bits : GHz-days/day
67-68: 448
68-69: 476
69-70: 459
70-71: 416
71-72: 417
72-73: 418
73-74: 408 <== the new one, was 361 before
74-82: 361

Attempts to achieve this with a new 5x16-bit kernel or an improved Montgomery kernel yielded slow results. The solution is a "4x15-bit + 1x16-bit" kernel ...
Niice
If you need any testing/ers, I'm up for it
Old 2014-09-07, 03:18   #1198
Jayder
 
 
Dec 2012

278₁₀ Posts

Thanks for the reply and the great work, Bdot.
Old 2014-09-07, 04:07   #1199
LaurV
Romulan Interpreter
 
 
Jun 2011
Thailand

3·3,221 Posts

I'm available for [testing/playing with a beta] too. Eager to raise the limit of my Misfit from 73 to 74
Very good job, Bdot! (as usual)

All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.