mersenneforum.org (https://www.mersenneforum.org/index.php)
-   And now for something completely different (https://www.mersenneforum.org/forumdisplay.php?f=119)
-   -   Generalized Cullen/Woodall Sieving Software (https://www.mersenneforum.org/showthread.php?t=19935)

rogue 2014-11-05 00:38

Generalized Cullen/Woodall Sieving Software
 
I hope that in the next two weeks I can get back to the sieve. It's been sitting for a few weeks because the kernel has a bug from the last change I made and I've been busy. I'll be traveling so when I'm gone I'll have fewer distractions in the evening which should give me an opportunity to work on it.

My personal goal is to search all bases < 10000 up to n = 10000. If I can get the sieve working on OpenCL, that would go a long way toward accomplishing it, as the new sieve will support multiple bases, whereas gcwsieve only supports one at a time and MultiSieve is too slow (IMO) compared to what the GPU can do.

rogue 2014-11-07 18:29

[QUOTE=rogue;386882]I hope that in the next two weeks I can get back to the sieve. It's been sitting for a few weeks because the kernel has a bug from the last change I made and I've been busy. I'll be traveling so when I'm gone I'll have fewer distractions in the evening which should give me an opportunity to work on it.[/QUOTE]

Unfortunately the laptop I will be taking has some issue that crashes my application on Windows. I installed a Windows patch over the summer and the application hasn't worked since. The problem is either in OpenCL itself or in Windows.

rogue 2014-11-11 18:51

In testing the code, sometimes it returns good results, sometimes it doesn't. It doesn't return invalid factors when that happens, but it misses factors. I'm at a loss regarding the behavior. One would expect an uninitialized variable, but that doesn't appear to be the problem. I suspect something is writing beyond a memory boundary, but I can't see that either. If anyone wants to take a look at the kernel, maybe you will see what I am doing wrong.

Note that when this does eventually work it will be many times faster than MultiSieve and gcwsiev-smallp for p < n. I haven't run it enough for higher p, but based upon some extrapolations it was 5x faster on my MacBook Pro.

rogue 2014-11-26 20:15

The problem is with the code to compute the modular inverse. The computed inverse is wrong. It appears to have something to do with using unsigned inputs, but when I change them to signed, the OpenCL compiler complains at compile time. Note that I used code posted in another thread on this forum. That code works fine in C. It just doesn't work in OpenCL C. I have a slower version that appears to return the correct results.
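
For readers following along, here is a minimal sketch of one way to compute a modular inverse using only unsigned arithmetic. It is not the code from the other thread and not the slower version mentioned above; the name mod_inverse is made up for illustration. It relies on the fact that the Bezout coefficients in the extended Euclidean algorithm alternate in sign, so only their magnitudes need to be stored, and it assumes p >= 2.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from gcwsievecl.
 * Returns x in [1, p-1] with (a * x) % p == 1, or 0 if gcd(a, p) != 1.
 * Assumes p >= 2. Only unsigned arithmetic is used: the coefficients of
 * the extended Euclidean algorithm alternate in sign, so we keep their
 * magnitudes and track the sign with a flag. */
uint64_t mod_inverse(uint64_t a, uint64_t p)
{
    uint64_t r0 = p, r1 = a % p;   /* remainders */
    uint64_t m0 = 0, m1 = 1;       /* magnitudes of the coefficients of a */
    int m0_negative = 1;           /* sign of m0; irrelevant while m0 == 0 */

    while (r1 != 0) {
        uint64_t q  = r0 / r1;
        uint64_t r2 = r0 % r1;
        uint64_t m2 = m0 + q * m1; /* never exceeds p, so no overflow */

        r0 = r1; r1 = r2;
        m0 = m1; m1 = m2;
        m0_negative = !m0_negative;
    }

    if (r0 != 1)
        return 0;                  /* a and p are not coprime */

    /* r0 == 1 == (+/- m0) * a (mod p); fold a negative sign into [1, p-1] */
    return m0_negative ? p - m0 : m0;
}
```

For example, mod_inverse(3, 7) gives 5, since 3 * 5 = 15 = 2 * 7 + 1. The same function body should be usable in OpenCL C with ulong in place of uint64_t, but that is an assumption I have not tested on any device.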

rogue 2014-11-26 20:47

Now that the computation of the modular inverse is correct, I can release the code as a beta. The factors that are found all appear to be valid. I have a hacked version of pfgw that can take the .log file and verify the factors in it. I need to release that soon.
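
Checking a reported factor is cheap and independent of the sieve. The sketch below is not the hacked pfgw and does not read the .log file (its format isn't shown here); it only illustrates the arithmetic, with the made-up names pow_mod and divides_gcw. It tests whether p divides n*b^n + c for c = +1 (Cullen) or c = -1 (Woodall), and assumes p < 2^32 so that intermediate products fit in 64 bits.

```c
#include <stdint.h>

/* b^e mod p by square-and-multiply.
 * Assumes p < 2^32 so that every product fits in 64 bits. */
static uint64_t pow_mod(uint64_t b, uint64_t e, uint64_t p)
{
    uint64_t result = 1 % p;
    b %= p;
    while (e > 0) {
        if (e & 1)
            result = (result * b) % p;
        b = (b * b) % p;
        e >>= 1;
    }
    return result;
}

/* Hypothetical checker: returns 1 if p divides n*b^n + c, else 0.
 * c must be +1 (Cullen form) or -1 (Woodall form). */
int divides_gcw(uint64_t p, uint64_t b, uint64_t n, int c)
{
    uint64_t t = ((n % p) * pow_mod(b, n, p)) % p;  /* n*b^n mod p */
    uint64_t want = (c > 0) ? p - 1 : 1;            /* -c mod p */
    return t == want % p;
}
```

As a small check, 2*201^2 - 1 = 80801 = 7 * 11543, so divides_gcw(7, 201, 2, -1) returns 1, while divides_gcw(5, 201, 2, -1) returns 0.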

Here is an example of what you might see when it runs, in this case on an Intel HD Graphics 4000. Note that I fixed that first string of output. I just didn't fix it in what I attached.

[code]
gcwsievecl64 -v -C -W -b200 -B201 -n2 -t2 -N1e5 -P1e6
gcwsievecl v1.0.1, a GPU program to find factors numbers of the form k*b^n+c where k, b, and n are fixed
Quick elimination of terms info (in order of check):
99998 because the term is even
63411 because the term is divisible by a prime < 100
Platform 0 is an Intel(R) Corporation Intel(R) OpenCL, version OpenCL 1.1
Device 0 is an Intel(R) Corporation Intel(R) HD Graphics 4000
workGroupSize = 51200 = 200 * 16 * 16 (blocks * workGroupSizeMultiple * deviceComputeUnits)
Running with 2 threads
Allocated memory (prior to sieving): 315 MB in CPU, 82 MB in GPU
Sieve started: (cmdline) 0 <= p < 1000000 with 36589 terms

Sieve complete: 3 <= p < 1000000 78498 primes tested
Clock time: 34.81 seconds at 2255 p/sec. Factors found: 26031
Processor time: 39.89 sec. (5.46 init + 34.43 sieve).
Seconds spent in CPU and GPU: 13.41 (cpu), 35.72 (gpu)
Percent of time spent in CPU vs. GPU: 27.29 (cpu), 72.71 (gpu)
CPU/GPU utilization: 1.15 (cores), 1.00 (devices)
Started with 36589 terms and sieved to 1000000. 10558 remaining terms written to gcw_201.pfgw
[/code]
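
The first "quick elimination" count in the log can be reproduced by hand. A term n*b^n + c with c = +1 or -1 is even exactly when n*b^n is odd, which happens only when both n and b are odd. The sketch below counts those terms; the function names are made up, and it assumes the run covers a single base with both the Cullen and Woodall forms for each n from 2 to 100000, which is my reading of the output rather than something the log states.

```c
#include <stdint.h>

/* n*b^n + c with c = +1 or -1 is even exactly when n*b^n is odd,
 * i.e. when both b and n are odd. */
int term_is_even(uint64_t b, uint64_t n)
{
    return (b & 1) && (n & 1);
}

/* Hypothetical helper: count the even terms for one base over
 * n_min..n_max, counting both the +1 and the -1 form of each n. */
uint64_t count_even_terms(uint64_t b, uint64_t n_min, uint64_t n_max)
{
    uint64_t count = 0;
    for (uint64_t n = n_min; n <= n_max; n++)
        if (term_is_even(b, n))
            count += 2;
    return count;
}
```

Under that assumption, count_even_terms(201, 2, 100000) gives 99998 (49999 odd values of n, two forms each), matching the "99998 because the term is even" line, while an even base such as 200 contributes no even terms at all.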

Unlike gcwsieve, this program doesn't require that min p > max n. In fact, I could modify gcwsieve to remove that restriction, as there is a little trick I used in this code to eliminate it, but if this program works correctly, I don't think that will be necessary.

I will be very curious to know how well this works and how fast (or slow) it is compared to gcwsieve on your systems.

rogue 2014-12-02 16:06

I've fixed various issues:

[code]
Fix factor rate calculation.
Fix writing ABC file as the wrong line was generated.
Fix reading ABC file since it always failed.
Various other code cleanup issues.[/code]

rogue 2014-12-14 19:09

The reason it doesn't work on my Mac Pro appears to be due to a broken driver for AMD GPUs.

TheCount 2014-12-22 03:29

I tried gcwsievecl_1.0.2 without success.
GPU crashes when it starts sieving:
[CODE]>gcwsievecl64.exe -v -C -W -b200 -B201 -n2 -t2 -N1e5 -P1e6
gcwsievecl v1.0.2, a GPU program to find factors of Cullen and Woodall numbers (n*b^n+c where b and n are fixed)
Quick elimination of terms info (in order of check):
99998 because the term is even
63411 because the term is divisible by a prime < 100
Platform 0 is an Advanced Micro Devices, Inc. AMD Accelerated Parallel Processing, version OpenCL 1.2 AMD-APP (1573.4)
Device 0 is an Advanced Micro Devices, Inc. Tahiti
workGroupSize = 409600 = 200 * 64 * 32 (blocks * workGroupSizeMultiple * deviceComputeUnits)
Running with 2 threads
Allocated memory (prior to sieving): 268 MB in CPU, 93 MB in GPU
Sieve started: (cmdline) 0 <= p < 1000000 with 36589 terms[/CODE]
I tried one thread and fewer blocks but hit the same issue.
Running Catalyst 14.9 with a 7970 AMD GPU.
Bug: Woodall formula in -h help incorrect.
Bug: -t1 says "Running with 1 threads", should be "Running with 1 thread".
Was able to compile with Visual Studio 2012 no problem.
Keen to use this app if it can be made to work.

TheCount 2014-12-22 14:03

Tried new Catalyst 14.12 without success. Did this app ever work with any version of Catalyst?

rogue 2014-12-22 14:17

Tahiti is probably slower than the CPU. I have another computer at home with a Tahiti, so I'll try running it on that.

1998golfer 2014-12-23 02:42

1 Attachment(s)
If I did this right, then it looks fine on Windows 7 x64 with an NVIDIA GTX 760.

[CODE]D:\...\testing>gcwsievecl64 -v -C -W -b200 -B201 -n2 -t2 -N1e5 -P1e6
gcwsievecl v1.0.2, a GPU program to find factors of Cullen and Woodall numbers (n*b^n+c where b and n are fixed)
Quick elimination of terms info (in order of check):
99998 because the term is even
63411 because the term is divisible by a prime < 100
Platform 0 is a NVIDIA Corporation NVIDIA CUDA, version OpenCL 1.1 CUDA 6.5.12
Device 0 is a NVIDIA Corporation GeForce GTX 760
workGroupSize = 38400 = 200 * 32 * 6 (blocks * workGroupSizeMultiple * deviceComputeUnits)
Running with 2 threads
Allocated memory (prior to sieving): 25 MB in CPU, 9 MB in GPU
Sieve started: (cmdline) 0 <= p < 1000000 with 36589 terms

Sieve complete: 3 <= p < 1000000 116898 primes tested
Clock time: 2.02 seconds at 57810 p/sec. Factors found: 26022
Processor time: 1.28 sec. (0.27 init + 1.01 sieve).
Seconds spent in CPU and GPU: 2.76 (cpu), 1.76 (gpu)
Percent of time spent in CPU vs. GPU: 61.10 (cpu), 38.90 (gpu)
CPU/GPU utilization: 0.63 (cores), 0.87 (devices)
Started with 36589 terms and sieved to 1000000. 10567 remaining terms written to gcw_201.pfgw[/CODE]

Attached is a zip containing gcwsieve.log and gcw_201.pfgw


Let me know if I did this right, or if any other tests I do can be of any help...


Thanks,
-1998golfer

