mersenneforum.org  

Old 2013-06-28, 23:33   #848
NormanRKN
 
Jul 2012
Saarland / Germany

2²×17 Posts
Default

I use the x64 build of 0.13 instead of the x32 one and can't find any difference, because I only sieve on the GPU. Performance with CPU sieving: I don't know.
Old 2013-06-28, 23:34   #849
Rodrigo
 
Jun 2010
Pennsylvania

2·467 Posts
Default

Quote:
Originally Posted by kracker View Post
Why not?
I was asking in case the new changes might lead to a problem/conflict of one sort or another.

But then I thought, what the heck, let's see what happens -- and the result shows up two posts above this one.

I decided not to try the one remaining idea, setting SieveOnGPU=0, because if possible I'd like to do the sieving on the GPU so as to leave Prime95 unaffected. Unless the expected throughput gain for mfakto from that setting might outweigh the loss to Prime95...

Rodrigo
Old 2013-06-28, 23:36   #850
Rodrigo
 
Jun 2010
Pennsylvania

2×467 Posts
Default

Quote:
Originally Posted by NormanRKN View Post
I use the x64 build of 0.13 instead of the x32 one and can't find any difference, because I only sieve on the GPU. Performance with CPU sieving: I don't know.
I might try that. My earlier results with x32, though, were better than with x64.

Rodrigo
Old 2013-07-01, 15:03   #851
Bdot
 
Nov 2010
Germany

255₁₆ Posts
Default

Quote:
Originally Posted by Jayder View Post
Will the "best fit" GPUSievePrimes value remain constant? Or will it change as you change bit level, exponent, or possibly kernel? For example, if a GPUSievePrimes value of 52000 works best on a 332M exponent going from 2^69 to 2^70, will 52000 also be the best for a 65M exponent and 2^73 to 2^74? What about an 8M exponent and 2^60 to 2^61? Etc.

Are there any good strategies for finding the best value? Other than intelligent trial and error.

I searched for an answer in the mfaktc thread, but it is kind of a massive thread. I also considered experimenting and finding out for myself, but it would take a very long time to find out on my slow GPU. Hopefully I am not missing an answer that is staring me in the face.
Quote:
Originally Posted by Bdot View Post
I have not tested that yet, but here's my theory:

Sieving has a constant speed for different kernels, factor sizes etc. Its speed depends only on the GPUSieve* parameters.

The optimal selection of GPUSievePrimes depends on finding the best relation between the run time of the sieve kernel and that of the trial factoring kernel. Better (i.e. longer) sieving means less (i.e. shorter) trial factoring, in a non-linear dependency. Therefore, anything that changes the speed of trial factoring will also change the optimal GPUSievePrimes. If testing takes longer (because a slower kernel needs to be used, or because the exponent has more bits), then GPUSievePrimes can be a bit higher.

However, the differences should be rather small, and so are the achievable improvements. I'll test that and add more details to my theoretical explanation soon.

I'm also working on an automatic optimizer for those and other variables, so that no more manual change-and-test cycle will be needed.
I finally tested a bit on a system with an i7 2600 @ 3.4GHz + HD7850 @ 975 MHz, Win7, 2/4 cores busy with prime95, Catalyst 13.4 (yes, the one that takes one CPU core for mfakto even though it is sieving on the GPU).

  • On this system, 32 or 64 bit makes almost no difference. If anything, 64-bit tends to be ~0.05% faster.
  • Performance peaked at GPUSievePrimes=110k for both 333M and 63M exponents, for bit levels from 73 -> 74 up to 80 -> 81
  • Going 72 -> 73 bits, the peak is at GPUSievePrimes=100k
  • The performance differences between GPUSievePrimes 100k and 110k (as an example) were consistent, but small (~0.2%).
  • Stopping prime95 adds ~1% performance to mfakto.
There are still a lot more tests to be done, but a few conclusions:
  • being a bit off the optimum for GPUSievePrimes has very little effect
  • the optimum does not measurably depend on the exponent size nor the factor size, as long as the same kernel is used.
  • the optimum does depend on the selected kernel: "smaller" kernels are faster and should be used with slightly lower GPUSievePrimes
I'm not sure to what extent these results apply to other systems. I think this should be similar on all systems ... however, LaurV reported much lower GPUSievePrimes values to be optimal, so there may be differences.
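The trade-off between sieve time and trial-factoring time can be illustrated with a toy model. This is only a sketch and not how mfakto measures or tunes anything: it assumes the sieve cost grows linearly with GPUSievePrimes, that the share of surviving candidates follows a Mertens-style 1/ln(p) estimate, and the two cost constants are made up for illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329


def nth_prime_estimate(n):
    """Rough size of the n-th prime, p_n ~ n * (ln n + ln ln n)."""
    return n * (math.log(n) + math.log(math.log(n)))


def surviving_fraction(sieve_primes):
    """Mertens-style estimate of the share of candidates left after sieving."""
    return math.exp(-EULER_GAMMA) / math.log(nth_prime_estimate(sieve_primes))


def total_cost(sieve_primes, sieve_cost_per_prime, tf_cost):
    """Toy run time: linear sieve cost plus trial-factoring cost on survivors."""
    return (sieve_cost_per_prime * sieve_primes
            + tf_cost * surviving_fraction(sieve_primes))


def best_sieve_primes(sieve_cost_per_prime, tf_cost):
    """Scan a grid of GPUSievePrimes values and return the cheapest one."""
    grid = range(10_000, 500_001, 1_000)
    return min(grid, key=lambda n: total_cost(n, sieve_cost_per_prime, tf_cost))


# Hypothetical cost constants: only their ratio matters in this model.
fast_kernel = best_sieve_primes(1.0, 3.0e7)  # cheaper trial factoring
slow_kernel = best_sieve_primes(1.0, 4.0e7)  # costlier trial factoring
print(fast_kernel, slow_kernel)
```

In this model the optimum moves up when trial factoring gets more expensive, and the cost curve is very flat around its minimum, which matches the qualitative observations above. The absolute numbers mean nothing for a real card.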


LaurV, could you do me a favor and run the following performance test for me, if you have some time:
  • switch to CPU sieving: SieveOnGPU=0, SievePrimes=2000, SievePrimesMin=2000
  • download the mfakto special versions and put them in the mfakto directory
  • On an otherwise idle machine (yes, please stop prime95 et al.), run
    mfakto-pi -st > st-pi.log
  • Send me (or post) st-pi.log
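For anyone who prefers to script the settings change in the steps above, here is a minimal sketch. It assumes the settings live in a plain KEY=value file (mfakto.ini) with '#' comment lines; the helper name apply_settings is hypothetical and not part of mfakto.

```python
def apply_settings(ini_text, overrides):
    """Return ini_text with KEY=value lines replaced per overrides.

    Keys that are not present in the text are appended at the end.
    Comment lines (assumed to start with '#') are left untouched.
    """
    remaining = dict(overrides)
    out = []
    for line in ini_text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_setting = "=" in line and not line.lstrip().startswith("#")
        if is_setting and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


# The three values requested for the CPU-sieving performance test.
cpu_sieve_test = {"SieveOnGPU": 0, "SievePrimes": 2000, "SievePrimesMin": 2000}

sample = "SieveOnGPU=1\n# a comment\nSievePrimes=25000\n"
print(apply_settings(sample, cpu_sieve_test))
```

Keep a backup of the original file so the GPU-sieving settings can be restored after the test run.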
The "big" GCN cards have a different SP:DP ratio. mfakto is not using DP, but 32-bit integer math uses the same H/W units. Maybe this is the reason why 79xx is a bit different again ...
Old 2013-07-01, 15:56   #852
zink
 
Jul 2013

12 Posts
Default

I've been trying to figure out whether it is possible to solve problems such as RSA factoring with OpenCL using readily available software. Would I be able to use this software in its current form to factor large semiprimes? I have a system with three HD 7970s that I would like to attempt to run a small RSA number on. Is there another program that will solve these problems with OpenCL? All of the factoring projects I can find seem to be CPU based.
Old 2013-07-01, 19:53   #853
Bdot
 
Nov 2010
Germany

597₁₀ Posts
Default BUGWARNING

Quote:
Originally Posted by Rodrigo View Post
Lowered GPUSieveSize from the default 64 to 48, then tested the same exponent.
This made me wonder whether non-power-of-two values for GPUSieveSize are allowed at all. Yes, they are. But thinking a bit more about the dependencies, I think I found a bug in mfakto and mfaktc:

When GPUSieveProcessSize=24 is used with a GPUSieveSize that is not divisible by 3, some FCs (factor candidates) may go untested.

(GPUSieveSize * 1024) must be divisible by GPUSieveProcessSize.

Worst case would be GPUSieveProcessSize=24 and GPUSieveSize=4, in which case about 1 in 256 FCs would go untested.

Typical settings of GPUSieveProcessSize=24 and GPUSieveSize=64 leave about 1 in 4096 FCs untested; GPUSieveProcessSize=24 and GPUSieveSize=128 leave about 1 in 16384.
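The figures above can be reproduced with a few lines of arithmetic. This is a sketch under the assumption that whatever part of (GPUSieveSize * 1024) is left over after dividing by GPUSieveProcessSize is the part that goes untested; it is not taken from the mfakto or mfaktc source, and the function name is made up.

```python
from fractions import Fraction


def untested_fraction(gpu_sieve_size, gpu_sieve_process_size):
    """Share of the sieve that is not covered by whole processing chunks."""
    total = gpu_sieve_size * 1024
    leftover = total % gpu_sieve_process_size
    return Fraction(leftover, total)


for size in (4, 64, 128, 63, 126):
    print(f"GPUSieveSize={size}: {untested_fraction(size, 24)}")
```

With GPUSieveProcessSize=24 this gives 1/256, 1/4096 and 1/16384 for GPUSieveSize 4, 64 and 128, and 0 for multiples of 3 such as 63 or 126.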

Unfortunately this is something that the selftest cannot cover without increasing the selftest runtime by a factor of 10 at least.

How many of you use GPU sieving with GPUSieveProcessSize=24? What do we want to do with those tests?
Old 2013-07-01, 20:03   #854
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

2³×271 Posts
Default

Right now, I have GPUSieveSize=64 and GPUSieveProcessSize=24.

Last fiddled with by kracker on 2013-07-01 at 20:15 Reason: Duhh.
Old 2013-07-01, 20:09   #855
NormanRKN
 
Jul 2012
Saarland / Germany

68₁₀ Posts
Default

GPUSieveProcessSize=24, yes I use that
Old 2013-07-01, 20:14   #856
Bdot
 
Nov 2010
Germany

1125₈ Posts
Default

Quote:
Originally Posted by NormanRKN View Post
GPUSieveProcessSize=24, yes I use that
Then please switch to GPUSieveSize=63 or GPUSieveSize=126 (or whatever multiple of 3 you like).
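To pick a replacement value close to the one currently in use, the divisibility rule can be checked directly. A minimal sketch, assuming the only constraint is that (GPUSieveSize * 1024) is divisible by GPUSieveProcessSize; any other limits mfakto places on these settings are not modelled, and the function name is hypothetical.

```python
def nearest_valid_sizes(wanted, process_size):
    """Return the closest valid GPUSieveSize at or below and at or above wanted.

    The lower value is None if no positive valid size exists below wanted.
    """
    def is_valid(size):
        return (size * 1024) % process_size == 0

    lower = next((s for s in range(wanted, 0, -1) if is_valid(s)), None)
    # A multiple of process_size is always valid, so this search terminates.
    upper = next(s for s in range(wanted, wanted + process_size + 1) if is_valid(s))
    return lower, upper


print(nearest_valid_sizes(64, 24))
print(nearest_valid_sizes(128, 24))
```

For GPUSieveProcessSize=24 and a current value of 64 this yields 63 and 66; for 128 it yields 126 and 129.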
Old 2013-07-02, 06:27   #857
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

22666₈ Posts
Default

Quote:
Originally Posted by Bdot View Post
LaurV, could you do me a favor and run the following performance test for me, if you have some time:
  • switch to CPU sieving: SieveOnGPU=0, SievePrimes=2000, SievePrimesMin=2000
  • download the mfakto special versions and put them in the mfakto directory
  • On an otherwise idle machine (yes, please stop prime95 et al.), run
    mfakto-pi -st > st-pi.log
  • Send me (or post) st-pi.log
Sure, this is a very well formulated request, I understood exactly what I am requested to do, hehe. I will do it and post the results as soon as I get home tonight (about 7 hours from now). I already feel indebted to the community and to you. You (and a few others here) are doing a wonderful job maintaining all these tools.
Old 2013-07-02, 08:23   #858
Bdot
 
Nov 2010
Germany

1125₈ Posts
Default

Quote:
Originally Posted by zink View Post
I've been trying to figure out whether it is possible to solve problems such as RSA factoring with OpenCL using readily available software. Would I be able to use this software in its current form to factor large semiprimes? I have a system with three HD 7970s that I would like to attempt to run a small RSA number on. Is there another program that will solve these problems with OpenCL? All of the factoring projects I can find seem to be CPU based.
mfakto is not usable as a general factorization tool. It depends on special properties of Mersenne numbers. It could be modified to allow for other special sieving tasks (like Wagstaff, Fermat, ...), but even that has not been done yet.
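To illustrate what "special properties" means here: for an odd prime p, every divisor q of 2^p - 1 has the form q = 2kp + 1 and is congruent to 1 or 7 modulo 8, so only those candidates need to be tried, and each test is a single modular exponentiation. The sketch below shows the idea on tiny exponents; it is not mfakto's kernel, and the function name is made up. A general RSA modulus has no such structure to exploit.

```python
def find_small_divisor(p, max_k):
    """Search q = 2*k*p + 1 for a divisor of 2**p - 1, for k = 1..max_k.

    p is assumed to be an odd prime. Returns (k, q) or None.
    """
    for k in range(1, max_k + 1):
        q = 2 * k * p + 1
        if q % 8 not in (1, 7):
            continue  # divisors of 2**p - 1 are always 1 or 7 mod 8
        if pow(2, p, q) == 1:  # q divides 2**p - 1
            return k, q
    return None


print(find_small_divisor(11, 10))  # 2**11 - 1 = 2047 = 23 * 89
print(find_small_divisor(29, 10))
print(find_small_divisor(13, 10))  # 2**13 - 1 = 8191 is prime
```

The first call finds q = 23 at k = 1, the second finds q = 233 at k = 4, and the third finds nothing in that range.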
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.