mersenneforum.org  

Old 2014-11-05, 00:38   #1
rogue
 
"Mark"
Apr 2003
Between here and the

19·313 Posts
Generalized Cullen/Woodall Sieving Software

I hope that in the next two weeks I can get back to the sieve. It's been sitting for a few weeks because the kernel has a bug from the last change I made and I've been busy. I'll be traveling so when I'm gone I'll have fewer distractions in the evening which should give me an opportunity to work on it.

My personal goal is to search all bases < 10000 up to n = 10000. If I can get the sieve working on OpenCL, that would go a long way toward accomplishing that, as the new sieve will support multiple bases, whereas gcwsieve only supports one at a time and MultiSieve is too slow (IMO) compared to what the GPU can do.
Old 2014-11-07, 18:29   #2
rogue
 
"Mark"
Apr 2003
Between here and the

19·313 Posts
Default

Quote:
Originally Posted by rogue
I hope that in the next two weeks I can get back to the sieve. It's been sitting for a few weeks because the kernel has a bug from the last change I made and I've been busy. I'll be traveling so when I'm gone I'll have fewer distractions in the evening which should give me an opportunity to work on it.

Unfortunately, the laptop I will be taking has some issue that crashes my application on Windows. I installed a Windows patch over the summer and it hasn't worked since. The problem is either in OpenCL itself or in Windows.
Old 2014-11-11, 18:51   #3
rogue
 
"Mark"
Apr 2003
Between here and the

19×313 Posts
Default

In testing the code, sometimes it returns good results, sometimes it doesn't. It doesn't return invalid factors when that happens, but it misses factors. I'm at a loss regarding the behavior. One would expect an uninitialized variable, but that doesn't appear to be the problem. I suspect something is writing beyond a memory boundary, but I can't see that either. If anyone wants to take a look at the kernel, maybe you will see what I am doing wrong.

Note that when this does eventually work, it will be many times faster than MultiSieve and gcwsieve-smallp for p < n. I haven't run it enough for higher p, but based on some extrapolations it was 5x faster on my MacBook Pro.

Last fiddled with by rogue on 2020-09-24 at 19:47
Old 2014-11-26, 20:15   #4
rogue
 
"Mark"
Apr 2003
Between here and the

19·313 Posts
Default

The problem is with the code to compute the inverse. The computed inverse is wrong. It appears to have something to do with using unsigned inputs, but when I change them to signed, the OpenCL compiler complains at compile time. Note that I used code posted in another thread on this forum. That code works fine in C. It just doesn't work fine in OpenCL C. I've a slower version that appears to be returning the correct results.
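
For readers following along, here is a minimal sketch of a modular inverse computed with the extended Euclidean algorithm. It is not the code from the other thread and not the gcwsievecl kernel; the function name `mod_inverse` and the 2^62 limit on the modulus are assumptions made for this illustration. It shows one place where signedness matters: the remainders can stay unsigned, but the Bezout coefficient goes negative along the way, so it is kept in a signed variable and folded back into the range [0, m) at the end.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: inverse of a modulo m by the extended Euclidean
   algorithm. Returns 0 when gcd(a, m) != 1, i.e. when no inverse exists.
   The coefficient t can be negative, so it is stored as a signed 64-bit
   value; m is limited to below 2^62 so the intermediate products fit. */
uint64_t mod_inverse(uint64_t a, uint64_t m)
{
    assert(m > 1 && m < ((uint64_t)1 << 62));

    int64_t t = 0, new_t = 1;        /* coefficients of a (mod m) */
    uint64_t r = m, new_r = a % m;   /* remainders, always non-negative */

    while (new_r != 0) {
        uint64_t q = r / new_r;

        int64_t next_t = t - (int64_t)q * new_t;
        t = new_t;
        new_t = next_t;

        uint64_t next_r = r - q * new_r;
        r = new_r;
        new_r = next_r;
    }

    if (r != 1)
        return 0;                    /* a and m share a factor */

    /* Fold a negative coefficient back into [0, m). */
    return (t < 0) ? (uint64_t)(t + (int64_t)m) : (uint64_t)t;
}
```

Calling `mod_inverse(3, 7)` gives 5, since 3 * 5 = 15 leaves remainder 1 modulo 7. Treating `t` as unsigned in the loop would make the final sign check meaningless, which is one way an unsigned-only port of this kind of routine can go wrong.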

Last fiddled with by rogue on 2014-11-26 at 20:21
Old 2014-11-26, 20:47   #5
rogue
 
"Mark"
Apr 2003
Between here and the

19×313 Posts
Default

Now that the computation of the modular inverse is correct, I can release the code as a beta. The factors that are found all appear to be valid. I have a hacked version of pfgw that can take the .log file and verify the factors in it. I need to release that soon.
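
Independently of pfgw, a reported factor can be checked with a few lines of modular arithmetic. The sketch below is an illustration only, not the hacked pfgw and not part of gcwsievecl; the names `pow_mod` and `is_factor` are hypothetical, the values are assumed to be already parsed from the log, and p is limited to below 2^32 so that plain 64-bit multiplication cannot overflow. It tests whether p divides n*b^n + c with c = +1 (Cullen) or c = -1 (Woodall).

```c
#include <assert.h>
#include <stdint.h>

/* base^exp mod p by square-and-multiply; requires p < 2^32 so that
   the products of two residues fit in 64 bits. */
static uint64_t pow_mod(uint64_t base, uint64_t exp, uint64_t p)
{
    uint64_t result = 1 % p;
    base %= p;
    while (exp > 0) {
        if (exp & 1)
            result = result * base % p;
        base = base * base % p;
        exp >>= 1;
    }
    return result;
}

/* Returns 1 if p divides n*b^n + c, where c is +1 or -1; otherwise 0. */
int is_factor(uint64_t p, uint64_t n, uint64_t b, int c)
{
    assert(p > 1 && p < ((uint64_t)1 << 32));
    assert(c == 1 || c == -1);

    uint64_t term = (n % p) * pow_mod(b, n, p) % p;   /* n*b^n mod p */
    uint64_t residue = (c == 1) ? (term + 1) % p
                                : (term + p - 1) % p; /* add c mod p */
    return residue == 0;
}
```

For example, `is_factor(19, 2, 3, 1)` returns 1 because 2*3^2 + 1 = 19, and `is_factor(7, 2, 2, -1)` returns 1 because 2*2^2 - 1 = 7.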

Here is an example of what you might see when it runs, in this case on an Intel HD Graphics 4000. Note that I have since fixed the first line of output; I just didn't fix it in what I attached.

Code:
gcwsievecl64 -v -C -W -b200 -B201 -n2 -t2 -N1e5 -P1e6
gcwsievecl v1.0.1, a GPU program to find factors numbers of the form k*b^n+c where k, b, and n are fixed
Quick elimination of terms info (in order of check):
    99998 because the term is even
    63411 because the term is divisible by a prime < 100
Platform 0 is an Intel(R) Corporation Intel(R) OpenCL, version OpenCL 1.1
Device 0 is an Intel(R) Corporation Intel(R) HD Graphics 4000
workGroupSize = 51200 = 200 * 16 * 16 (blocks * workGroupSizeMultiple * deviceComputeUnits)
Running with 2 threads
Allocated memory (prior to sieving):  315 MB in CPU, 82 MB in GPU
Sieve started: (cmdline) 0 <= p < 1000000 with 36589 terms

Sieve complete: 3 <= p < 1000000  78498 primes tested
Clock time: 34.81 seconds at 2255 p/sec.  Factors found: 26031
Processor time: 39.89 sec. (5.46 init + 34.43 sieve).
Seconds spent in CPU and GPU: 13.41 (cpu), 35.72 (gpu)
Percent of time spent in CPU vs. GPU: 27.29 (cpu), 72.71 (gpu)
CPU/GPU utilization: 1.15 (cores), 1.00 (devices)
Started with 36589 terms and sieved to 1000000.  10558 remaining terms written to gcw_201.pfgw

Unlike gcwsieve, this program doesn't restrict p (gcwsieve requires min p > max n). In fact, I could modify gcwsieve to remove that restriction, as there is a little trick I used in this code to eliminate it, but if this program works correctly, I don't think that will be necessary.

I will be very curious to know how well this works and how fast (or slow) it is compared to gcwsieve on your systems.
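
When comparing runs across GPUs, it helps to read the workGroupSize line in the log as a simple product: blocks * workGroupSizeMultiple * deviceComputeUnits. The sketch below only reproduces that arithmetic; the function name `work_group_size` is hypothetical, and where the program actually obtains the last two values is an assumption on my part (OpenCL exposes similar numbers as `CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE` and `CL_DEVICE_MAX_COMPUTE_UNITS`).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the log line
   "workGroupSize = blocks * workGroupSizeMultiple * deviceComputeUnits". */
size_t work_group_size(size_t blocks, size_t multiple, size_t compute_units)
{
    assert(blocks > 0 && multiple > 0 && compute_units > 0);
    return blocks * multiple * compute_units;
}
```

With the values from the HD Graphics 4000 run, `work_group_size(200, 16, 16)` is 51200, matching the log. A device with more compute units or a larger multiple gets a proportionally larger work size for the same number of blocks.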

Last fiddled with by rogue on 2020-09-24 at 19:47
Old 2014-12-02, 16:06   #6
rogue
 
"Mark"
Apr 2003
Between here and the

173B₁₆ Posts
Default

I've fixed various issues:

Code:
   Fix factor rate calculation.
   Fix writing ABC file as the wrong line was generated.
   Fix reading ABC file since it always failed.
   Various other code cleanup issues.

Last fiddled with by rogue on 2020-09-24 at 19:47
Old 2014-12-14, 19:09   #7
rogue
 
"Mark"
Apr 2003
Between here and the

19×313 Posts
Default

The reason it doesn't work on my Mac Pro appears to be due to a broken driver for AMD GPUs.
Old 2014-12-22, 03:29   #8
TheCount
 
Sep 2013
Perth, Au.

2×7² Posts
Default

I tried gcwsievecl_1.0.2 without success.
GPU crashes when it starts sieving:
Code:
>gcwsievecl64.exe -v -C -W -b200 -B201 -n2 -t2 -N1e5 -P1e6
gcwsievecl v1.0.2, a GPU program to find factors of Cullen and Woodall numbers (n*b^n+c where b and n are fixed)
Quick elimination of terms info (in order of check):
99998 because the term is even
63411 because the term is divisible by a prime < 100
Platform 0 is an Advanced Micro Devices, Inc. AMD Accelerated Parallel Processing, version OpenCL 1.2 AMD-APP (1573.4)
Device 0 is an Advanced Micro Devices, Inc. Tahiti
workGroupSize = 409600 = 200 * 64 * 32 (blocks * workGroupSizeMultiple * deviceComputeUnits)
Running with 2 threads
Allocated memory (prior to sieving): 268 MB in CPU, 93 MB in GPU
Sieve started: (cmdline) 0 <= p < 1000000 with 36589 terms

I tried one thread and fewer blocks, but had the same issue.
Running Catalyst 14.9 with an AMD 7970 GPU.
Bug: the Woodall formula in the -h help is incorrect.
Bug: -t1 says "Running with 1 threads"; it should be "Running with 1 thread".
I was able to compile with Visual Studio 2012 with no problem.
Keen to use this app if it can be made to work.
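
The "1 threads" message is the usual pluralization slip. A minimal sketch of one way to fix it in C follows; the function name `format_thread_message` is hypothetical and this is not taken from the gcwsievecl source, which may build the message differently.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes "Running with N thread" or
   "Running with N threads" into buf, choosing the suffix by count.
   Returns the value of snprintf (length of the full message). */
int format_thread_message(char *buf, size_t size, unsigned threads)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "Running with %u thread%s",
                    threads, threads == 1 ? "" : "s");
}
```

With a 64-byte buffer, a count of 1 yields "Running with 1 thread" and a count of 2 yields "Running with 2 threads".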
Old 2014-12-22, 14:03   #9
TheCount
 
Sep 2013
Perth, Au.

2×7² Posts
Default

Tried new Catalyst 14.12 without success. Did this app ever work with any version of Catalyst?
Old 2014-12-22, 14:17   #10
rogue
 
"Mark"
Apr 2003
Between here and the

19×313 Posts
Default

Tahiti is probably slower than the CPU. I have another computer at home with a Tahiti, so I'll try running it on that.
Old 2014-12-23, 02:42   #11
1998golfer
 
Dec 2014

116 Posts
Default

If I did this right, then it looks fine on Windows 7 x64 with an NVIDIA GTX 760.

Code:
D:\...\testing>gcwsievecl64 -v -C -W -b200 -B201 -n2 -t2 -N1e5 -P1e6
gcwsievecl v1.0.2, a GPU program to find factors of Cullen and Woodall numbers (n*b^n+c where b and n are fixed)
Quick elimination of terms info (in order of check):
    99998 because the term is even
    63411 because the term is divisible by a prime < 100
Platform 0 is a NVIDIA Corporation NVIDIA CUDA, version OpenCL 1.1 CUDA 6.5.12
Device 0 is a NVIDIA Corporation GeForce GTX 760
workGroupSize = 38400 = 200 * 32 * 6 (blocks * workGroupSizeMultiple * deviceComputeUnits)
Running with 2 threads
Allocated memory (prior to sieving):  25 MB in CPU, 9 MB in GPU
Sieve started: (cmdline) 0 <= p < 1000000 with 36589 terms

Sieve complete: 3 <= p < 1000000  116898 primes tested
Clock time: 2.02 seconds at 57810 p/sec.  Factors found: 26022
Processor time: 1.28 sec. (0.27 init + 1.01 sieve).
Seconds spent in CPU and GPU: 2.76 (cpu), 1.76 (gpu)
Percent of time spent in CPU vs. GPU: 61.10 (cpu), 38.90 (gpu)
CPU/GPU utilization: 0.63 (cores), 0.87 (devices)
Started with 36589 terms and sieved to 1000000.  10567 remaining terms written to gcw_201.pfgw

Attached is a zip containing gcwsieve.log and gcw_201.pfgw.


Let me know if I did this right, or if any other tests I do can be of any help...


Thanks,
-1998golfer
Attached Files
File Type: zip gcwsieve_results_win7x64_gtx760.zip (437.0 KB, 61 views)

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.