|
|
#705 |
|
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3·29·83 Posts |
Your quote says that George is right. "If the right operand is... greater than or equal to the length in bits of the promoted left operand, the result is undefined." Which is exactly what George said.
|
|
|
|
|
|
#706 |
|
If I May
"Chris Halsall"
Sep 2002
Barbados
9,767 Posts |
|
|
|
|
|
|
#707 |
|
Romulan Interpreter
Jun 2011
Thailand
7²·197 Posts |
@Chris: Haha, no coffee? Happens to me very often when I post before my morning coffee.
|
|
|
|
|
|
#708 |
|
Nov 2010
Germany
597₁₀ Posts |
some progress at last:
After spending days working around an OpenCL compiler abort, I finally got something to work ... to get some idea about it on AMD cards. It still finds only 10% of the self-test factors, and a couple of quirks may still slow it down.
Last fiddled with by Bdot on 2013-03-19 at 23:52 Reason: wording |
|
|
|
|
|
#709 | |
|
"Mr. Meeseeks"
Jan 2012
California, USA
878₁₆ Posts |
Quote:
Very nice! As always, if you need testers...
|
|
|
|
|
|
|
#710 |
|
Nov 2010
Germany
3×199 Posts |
Thanks, I'll certainly come back to that, after I've fixed the errors I found so far ...
The GPU sieving itself delivers the correct result, so either I have some mismatch with the number of threads, or the bit counting, or shared memory synchronization. I'll find it. The GCN test on HD 7850 with the same version: mfakto-GPU: 155 GHz-days/day, James: 153 GHz-days/day, mfakto-CPU: 180 GHz-days/day (3 CPU cores) |
|
|
|
|
|
#711 | |
|
"Mr. Meeseeks"
Jan 2012
California, USA
2³·271 Posts |
Quote:
|
|
|
|
|
|
|
#712 |
|
"Mr. Meeseeks"
Jan 2012
California, USA
2³·271 Posts |
|
|
|
|
|
|
#713 | |
|
Nov 2010
Germany
3·199 Posts |
Quote:
Before that, I'd like to mention that I'm getting close to a pre-pre-version of the GPU sieve on OpenCL. Only one kernel (64-77 bit factor size) so far, fixed vector size, and barely functional (i.e. room for performance improvements).
I'm looking for AMD GPU owners who are willing to "waste" a few GHz-days by trying to rediscover a few factors in a complete run, as well as testing the available settings, finding optimal values, etc.
As of today, the GPU sieve missed only ~70 of ~15000 factors I gave it in an extended self-test. Barely enough misses to hide "the only remaining bug". I hope to fix that by the weekend, and would then send out the prototype.
If you're willing to join, please let me know the GPU and OS you need it for as well as your email address (PM accepted).
Thanks for your help,
Bdot
Last fiddled with by Bdot on 2013-04-04 at 15:43 Reason: and email |
|
|
|
|
|
|
#714 | |
|
Banned
"Luigi"
Aug 2002
Team Italia
2×3×11×73 Posts |
Quote:
Luigi |
|
|
|
|
|
|
#715 |
|
Nov 2010
Germany
1125₈ Posts |
Finally, the "very last" bug was in the GPU sieving code itself. I almost issued a warning for mfaktc, but it was also a self-made one in my attempt to imitate the CUDA 64-bit shifts, something like this:
mask = i67 > 64 ? 0 : ((ulong) 1 << i67);
So no problem for mfaktc was found during my porting efforts. I'm just happy we have enough test cases so that this one was discovered. Now that everything is working for one kernel, I'll start porting the others. And I'll check out a few alternative implementations for performance. Let's see what feedback I receive from the testers. In case it is already worth releasing, I may move the optimizations to a later version. |
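For anyone following along: if the line quoted above is the buggy form, the likely culprit is the boundary case. With i67 == 64 the `> 64` test is false, so the code still shifts a 64-bit value by 64, which is exactly the "greater than or equal to the length in bits" case quoted in #705. As far as I know, OpenCL C takes the shift count modulo the operand width, so that shift would yield 1 instead of the intended 0. Below is a minimal sketch in plain C of a guard that also rejects 64. The helper name `bit_mask64` is made up for illustration and is not taken from mfakto or mfaktc.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from mfakto or mfaktc.
 * Returns the 64-bit value with only bit `shift` set, or 0 when that
 * bit position does not exist in a 64-bit word (shift >= 64). */
static uint64_t bit_mask64(unsigned int shift)
{
    /* Valid shift counts for a 64-bit operand are 0..63, so the guard
     * has to reject 64 as well, not only values above 64. */
    return shift > 63 ? 0 : ((uint64_t)1 << shift);
}
```

With this guard, `bit_mask64(63)` sets only the top bit, while `bit_mask64(64)` and `bit_mask64(67)` both return 0 without ever executing an out-of-range shift.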
|
|
|