mersenneforum.org  

Old 2013-07-02, 12:27   #859
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

7^2×197 Posts
Default

Test if I can attach it. (I did not expect you to generate gigabytes of those test results... )
Attached Files
File Type: zip mfakto-pi_st.zip (155.1 KB, 71 views)
Old 2013-07-02, 20:06   #860
Rodrigo
 
Jun 2010
Pennsylvania

2×467 Posts
Default

Quote:
Originally Posted by Bdot View Post
This made me wonder whether non-power-of-two values for GPUSieveSize are allowed at all. Yes, they are. But thinking a bit more about the dependencies, I think I found a bug in mfakto and mfaktc:

When using GPUSieveProcessSize=24 with a GPUSieveSize that is not divisible by 3, some FCs (factor candidates) may go untested.

(GPUSieveSize * 1024) must be divisible by GPUSieveProcessSize.

Worst case would be GPUSieveProcessSize=24 and GPUSieveSize=4, in which case about 1 in 256 FCs would go untested.

Typical settings of GPUSieveProcessSize=24 and GPUSieveSize=64 leave 1 in 4096 FCs untested, GPUSieveProcessSize=24 and GPUSieveSize=128 about 1 in 16384.

Unfortunately, this is something that the selftest cannot cover without increasing the selftest runtime by at least a factor of 10.

How many of you use GPU sieving with GPUSieveProcessSize=24? What do we want to do with those tests?
I'm using GPUSieveProcessSize=16, as suggested by kracker here.

Going along with his other idea there, GPUSievePrimes is at 92486.

GPUSieveSize has been back at 64 ever since I reported the results with that value at 48.

With these settings, TF of 72xxxxxx exponents has been hovering around 144-146 GHz-d/d.

All of this work was done in mfakto x32.

Rodrigo

Last fiddled with by Rodrigo on 2013-07-02 at 20:07 Reason: add'l info
Old 2013-07-02, 20:34   #861
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

23·271 Posts
Default

Quote:
Originally Posted by Rodrigo View Post
I'm using GPUSieveProcessSize=16, as suggested by kracker here.
Actually, I had recommended 24 ("GPUSieveProcessSize to 24 from 16"); 16 is the default (I think?)

Anyway, it might be safer to stay on 16 (for now).
Old 2013-07-02, 21:13   #862
Bdot
 
Nov 2010
Germany

3·199 Posts
Default

Quote:
Originally Posted by kracker View Post
Actually, I had recommended 24 ("GPUSieveProcessSize to 24 from 16"); 16 is the default (I think?)

Anyway, it might be safer to stay on 16 (for now).
GPUSieveProcessSize=24 is safe as long as GPUSieveSize is a multiple of 3. That is also what the fix will enforce: the change is only in the parameter validation that runs when mfakto.ini is read.
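
To illustrate the kind of check described above, here is a minimal sketch in Python. It is not mfakto's actual validation code. The function name `check_sieve_settings` is made up for this example, the 8 to 32 range and the multiple-of-8 rule are taken from the mfakto.ini comments quoted in this thread, and the units (GPUSieveSize in M bits, GPUSieveProcessSize in K bits) are inferred from the divisibility rule rather than confirmed.

```python
def check_sieve_settings(sieve_size_mbits, process_size_kbits):
    """Return a list of problems; an empty list means the pair looks safe.

    Hypothetical helper, not part of mfakto or mfaktc.
    Assumes GPUSieveSize is given in M bits and GPUSieveProcessSize in K bits.
    """
    problems = []

    # Range rule as stated in the mfakto.ini comments: multiple of 8, 8..32.
    if process_size_kbits % 8 != 0 or not 8 <= process_size_kbits <= 32:
        problems.append(
            "GPUSieveProcessSize must be a multiple of 8 between 8 and 32"
        )
        return problems

    # Divisibility rule from the bug report:
    # (GPUSieveSize * 1024) must be divisible by GPUSieveProcessSize.
    if (sieve_size_mbits * 1024) % process_size_kbits != 0:
        problems.append(
            "(GPUSieveSize * 1024) is not divisible by GPUSieveProcessSize"
        )
    return problems


if __name__ == "__main__":
    for size, process in [(64, 16), (48, 24), (64, 24), (128, 8)]:
        print(size, process, check_sieve_settings(size, process) or "ok")
```

Since 1024 is divisible by 8, 16 and 32, only GPUSieveProcessSize=24 can fail the second check, and it fails exactly when GPUSieveSize is not a multiple of 3.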

What is the general opinion on whether any of the tests need to be repeated? I tried to come up with an estimate of how "bad" the bug is ...

My opinion is that if anyone used GPUSieveProcessSize=24 along with a really low GPUSieveSize (say, < 20), then the chances of missing a factor (roughly one in 256 to one in 1024) are too high and the tests should be repeated. For GPUSieveSize=32 and above, missing between one in 4096 and one in 16384 factors is probably not worth the effort of re-testing.

But I'd like to hear other opinions. If you can provide a guideline for an "acceptable miss probability", then I can compile a list of GPUSieveSize settings that would require a re-test.
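
A list like that could be produced along the following lines. This is only a sketch under one assumption: that the untested share equals the leftover K bits, (GPUSieveSize * 1024) mod GPUSieveProcessSize, divided by the total K bits. That model is not taken from the mfakto source, but it does reproduce the figures quoted in this thread (1 in 256, 1 in 1024, 1 in 4096, 1 in 16384). The function names and the 4 to 128 range for GPUSieveSize are illustrative choices, not confirmed limits.

```python
from fractions import Fraction


def untested_fraction(sieve_size_mbits, process_size_kbits=24):
    """Share of factor candidates skipped, under the leftover-bits assumption."""
    total_kbits = sieve_size_mbits * 1024
    leftover_kbits = total_kbits % process_size_kbits
    return Fraction(leftover_kbits, total_kbits)


def sizes_needing_retest(max_miss, sizes=range(4, 129), process_size_kbits=24):
    """GPUSieveSize values whose miss probability exceeds max_miss.

    The default range 4..128 is an assumption made for this example.
    """
    result = []
    for size in sizes:
        miss = untested_fraction(size, process_size_kbits)
        if miss > max_miss:
            result.append((size, miss))
    return result


if __name__ == "__main__":
    # Example threshold: re-test anything worse than one miss in 2048.
    for size, miss in sizes_needing_retest(Fraction(1, 2048)):
        print(f"GPUSieveSize={size}: about 1 in {1 / miss} FCs untested")
```

With GPUSieveProcessSize=24, multiples of 3 never show up in the list, and changing `max_miss` is all it takes to try a different "acceptable miss probability".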
Old 2013-07-02, 21:17   #863
Bdot
 
Nov 2010
Germany

1001010101_2 Posts
Default

Quote:
Originally Posted by LaurV View Post
Test if I can attach it. (I did not expect you to generate gigabytes of those test results... )
Thanks a lot for the test results. They show a performance increase that is almost exactly proportional when compared with my 7850. There is no sign of a different calculation base for the high-end GCNs.

However, I do not yet have a good performance measurement of the GPU sieving itself - something to add soon.
Old 2013-07-02, 22:10   #864
blahpy
 
Jun 2013

1101011_2 Posts
Default

GPUSieveSize=64
GPUSieveProcessSize=16

What should I do? I don't think it's possible to "redo" or "double check" TF, is it?

Edit: Or are these settings safe? Upon reading it again it seems more like GPUSieveSize=64 only needs to be divisible by 3 for GPUSieveProcessSize=24?

Edit2: Okay, I read more carefully, ignore this post (I would delete it but can't)

Last fiddled with by blahpy on 2013-07-02 at 22:15
Old 2013-07-02, 22:16   #865
Bdot
 
Nov 2010
Germany

3·199 Posts
Default

Quote:
Originally Posted by blahpy View Post
GPUSieveSize=64
GPUSieveProcessSize=16

What should I do? I don't think it's possible to "redo" or "double check" TF, is it?

Edit: Or are these settings safe? Upon reading it again it seems more like GPUSieveSize=64 only needs to be divisible by 3 for GPUSieveProcessSize=24?
Yes, these (default) settings are safe - no need to rerun anything.
Old 2013-07-03, 02:34   #866
Rodrigo
 
Jun 2010
Pennsylvania

2×467 Posts
Default

Quote:
Originally Posted by kracker View Post
Actually, I had recommended 24 ("GPUSieveProcessSize to 24 from 16"); 16 is the default (I think?)

Anyway, it might be safer to stay on 16 (for now).
Huh, you're right. I guess I confused it because (according to the info in the mfakto.ini file) the default value is 24:

Code:
 
# GPUSieveProcessSize defines how far many bits of the sieve each TF block
# processes (in K bits). Larger values may lead to less wasted cycles by
# reducing the number of times all threads in a warp are not TFing a
# candidate.  However, more shared memory is used which may reduce occupancy.
# Smaller values should lead to a more responsive system (each kernel takes
# less time to execute). GPUSieveProcessSize must be a multiple of 8.
# 
#
# Minimum: GPUSieveProcessSize=8
# Maximum: GPUSieveProcessSize=32  (requires GPUSievePrimes > 310)
#
# Default: GPUSieveProcessSize=24
GPUSieveProcessSize=16
...and so I switched it TO 16, figuring that's what you meant to say.

But, whatever the reason, I seem to have stumbled onto the safest values for the time being.

Rodrigo

Last fiddled with by Rodrigo on 2013-07-03 at 02:35
Old 2013-07-03, 11:25   #867
Bdot
 
Nov 2010
Germany

3×199 Posts
Default

Hmm, the one that you can download with the win package reads:
Code:
# GPUSieveProcessSize defines how far many bits of the sieve each TF block
# processes (in K bits). Larger values may lead to less wasted cycles by
# reducing the number of times all threads in a warp are not TFing a
# candidate.  However, more shared memory is used which may reduce occupancy.
# Smaller values should lead to a more responsive system (each kernel takes
# less time to execute). GPUSieveProcessSize must be a multiple of 8.
# 
#
# Minimum: GPUSieveProcessSize=8
# Maximum: GPUSieveProcessSize=32  (requires GPUSievePrimes > 310)
#
# Default: GPUSieveProcessSize=16

GPUSieveProcessSize=16
Old 2013-07-03, 12:00   #868
kladner
 
"Kieren"
Jul 2011
In My Own Galaxy!

2×3×1,693 Posts
Default

Quote:
How many of you use GPU sieving with GPUSieveProcessSize=24? What do we want to do with those tests?
I'm using
GPUSieveSize=128
GPUSieveProcessSize=8

I don't think I've ever used 24.
Old 2013-07-03, 15:46   #869
Rodrigo
 
Jun 2010
Pennsylvania

2·467 Posts
Default

Quote:
Originally Posted by Bdot View Post
Hmm, the one that you can download with the win package reads:
Code:
# GPUSieveProcessSize defines how far many bits of the sieve each TF block
# processes (in K bits). Larger values may lead to less wasted cycles by
# reducing the number of times all threads in a warp are not TFing a
# candidate.  However, more shared memory is used which may reduce occupancy.
# Smaller values should lead to a more responsive system (each kernel takes
# less time to execute). GPUSieveProcessSize must be a multiple of 8.
# 
#
# Minimum: GPUSieveProcessSize=8
# Maximum: GPUSieveProcessSize=32  (requires GPUSievePrimes > 310)
#
# Default: GPUSieveProcessSize=16
 
GPUSieveProcessSize=16
How weird. My backup of the original mfakto.ini shows exactly what you give there.

The only way I can think of for the value in the comment line to have changed is that I edited the comment line by hand and never actually adjusted the real setting to the 24 that @kracker had recommended. (In which case I never changed it to OR from 16.)
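
The difference between the comment line and the real setting can be shown with a small Python sketch. It assumes that lines starting with `#` are comments and that active settings are plain `key=value` lines, which is how the quoted mfakto.ini excerpts look. The function name `read_active_settings` is hypothetical, and mfakto's own parser may behave differently.

```python
def read_active_settings(ini_text):
    """Collect key=value pairs from ini-style text, skipping '#' comment lines.

    Hypothetical helper for illustration; not mfakto's real parser.
    """
    settings = {}
    for raw_line in ini_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank or comment line: has no effect on the program
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


# An excerpt like the edited file: the comment claims 24, the setting is 16.
EDITED_INI = """\
# Default: GPUSieveProcessSize=24
GPUSieveProcessSize=16
"""

if __name__ == "__main__":
    print(read_active_settings(EDITED_INI))
```

Under these assumptions, only the uncommented line counts, so editing the `# Default:` comment changes what the file says but not what the program uses.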

Although I have no memory of it, this is entirely possible: my wife and I spent several days doing some intense (physical and Web) car shopping. I can picture hurriedly making this change on the way out to yet another auto dealership. This must be what happened, there's no other sensible explanation. I plead temporary insanity.

Rodrigo

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.