20130527, 16:29  #804 
"Mr. Meeseeks"
Jan 2012
California, USA
3^{2}×241 Posts 

20130528, 02:36  #805 
Jul 2006
Calgary
110101001_{2} Posts 

20130529, 17:00  #806 
"Mr. Meeseeks"
Jan 2012
California, USA
100001111001_{2} Posts 
I just noticed this, but there is a huge penalty going to gpu sieve on my APU...

20130529, 19:10  #807 
Romulan Interpreter
"name field"
Jun 2011
Thailand
23437_{8} Posts 
I believe you need to play with that SieveSize value for the sieve (for mfaktc at least, it has to be increased a lot for GPU sieving; otherwise the sieving finishes very fast and you end up with a lot of candidates and do a lot of useless exponentiations). But confirm this first with someone who runs mfakto; my experience is limited to mfaktc. Even if the values are right, you still have the CPU free and can compensate with GHz-days of P-1.
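For reference, the GPU-sieve knobs being alluded to live in mfaktc.ini (option names as in recent mfaktc versions; mfakto uses similar ones). The numbers below are illustrative placeholders only, not recommendations from this thread:

```ini
; Illustrative mfaktc.ini fragment - tune for your own GPU.
; Defaults and allowed ranges differ between versions; check the
; comments in your own mfaktc.ini.
SieveOnGPU=1
GPUSievePrimes=82486     ; how many primes the GPU sieve removes
GPUSieveSize=64          ; sieve size; bigger = fewer, larger batches
GPUSieveProcessSize=16   ; chunk processed per work item
```

Larger sieve settings trade more sieving work for fewer candidate exponentiations, which is exactly the balance discussed above.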

20130529, 20:31  #808  
"Victor de Hollander"
Aug 2011
the Netherlands
10010011011_{2} Posts 
1st test on my 7950
Quote:
@800 MHz (AMD reference clock):
296 GHzdays/day (70-71 bit)
283 GHzdays/day (73-74 bit)
@900 MHz (factory clock):
335 GHzdays/day (70-71 bit)
318 GHzdays/day (73-74 bit) 

20130603, 19:08  #809  
Nov 2010
Germany
3×199 Posts 
Quote:
First of all, it's true that all VLIW5 and VLIW4 GPUs pay a big penalty for GPU sieving, much bigger than GCN. Therefore, the sweet spot is at a reduced GPUSievePrimes (compared to the default), whereas GCN cards honor an increase. However, I wonder if you really need to go as low as 50k for the APU in order to get the best out of it. I found ~70k for VLIW5 and ~110k for GCN to be optimal. I have just two cards, though, and it may look very different for you.

Second, I noticed quite some sensitivity to GPUSieveProcessSize, where both my cards run best with 24.

Third, the AMD drivers. Sadly, the latest Windows drivers (13.4) make mfakto consume almost one CPU core, even when GPU sieving. When other programs (like prime95) use a lot of CPU, mfakto's high CPU load goes away, but at the cost of some 10-20% throughput.

The 5770's 115 GHzdays/day are pretty close to what should be possible, as are the 7950's results. Which GPUSieve* values did you use? The APU (6550D, right?) speed, however, is too far from the expected 42.5 GHzdays/day. Again, which GPUSieve* values did you use?

Generally, most low- to mid-end GPUs will have much better throughput when CPU sieving, if you can spare enough CPU power to sustain SievePrimes of >20k (VLIW5) or >50k (GCN). I'm working on improving the sieve for the vector platforms, but v0.13 is not yet very optimized in this respect. 
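The starting points named above could be written into mfakto.ini like this (the option names exist in mfakto; the values are just the post's per-architecture suggestions, not universal defaults):

```ini
; Starting points from the post above, for a VLIW5 card.
; For GCN, try GPUSievePrimes around 110000 instead.
GPUSievePrimes=70000
GPUSieveProcessSize=24
```

From there, tune GPUSievePrimes up or down in steps while watching the reported GHzdays/day.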

20130603, 23:00  #810  
"Mr. Meeseeks"
Jan 2012
California, USA
4171_{8} Posts 
Quote:


20130604, 09:20  #811 
Jul 2012
Sweden
2×3×7 Posts 
I found a small bug...
Code:
got assignment: exp=33732341 bit_min=70 bit_max=71 (7.09 GHzdays)
Starting trial factoring M33732341 from 2^70 to 2^71 (7.09 GHzdays)
Using GPU kernel "cl_barrett15_73_gs"
No checkpoint file "M33732341.ckp" found.
Date    Time | class Pct |  time  ETA | GHzd/day Sieve Wait
Jun 04 08:32 | 1263 27.6% | 7.446 1h26m |  85.68  21813 0.00%
M33732341 has a factor: 1516555032424995693727
found 1 factor for M33732341 from 2^70 to 2^71 (partially tested) [mfakto 0.13-Win cl_barrett15_73_gs_4]
tf(): total time spent: 15m 19.113s (666.39 GHzdays / day)

In other news, I have had two reboots with a crashed video driver. After the reboot the machine starts up without any video output (it is still possible to log in via Remote Desktop). The next reboot brings back the screen. I'm writing it off as a thermal problem, but curiously both crashes happened when switching assignments, and there were no save files left. It's been a couple of days since the last crash and everything is running stable right now. I'll report back if I find out more.

/Göran 
20130604, 11:44  #812  
Nov 2010
Germany
3·199 Posts 
Quote:
If primenet accepted the factor as "F" (meaning found by TF), then you probably got credit of more than 8 GHzdays/day, within 15 minutes. Sounds rather like 800 GHzdays/day, but I cannot easily guess how much credit you'll get for a factor, so I calculate it as if you completed the whole bitlevel within 15 min (or whatever it took you to find the factor). In my experience that is about the lower limit of what you get as credit from primenet. Quote:
Then you could try to run two instances in parallel: one would keep the GPU busy while the other saves its results, and otherwise they would evenly split the GPU power. ("Evenly" meaning the number of kernel invocations, not necessarily the performance.)

Edit: your GPUSievePrimes seems very low ... why?

Last fiddled with by Bdot on 2013-06-04 at 12:41 
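A minimal sketch of the two-instance setup (paths and the launch commands are assumptions, not from the thread): each instance gets its own working directory so it keeps its own mfakto.ini, worktodo file, and checkpoints.

```shell
# Hypothetical setup for two parallel mfakto instances.
# Each directory holds its own mfakto.ini and worktodo so the
# instances do not clobber each other's checkpoints.
mkdir -p instance1 instance2
# Copy your real mfakto.ini and split your worktodo between the two
# directories (placeholder files here):
touch instance1/mfakto.ini instance2/mfakto.ini
# Then launch both, e.g.:
# (cd instance1 && ./mfakto &) ; (cd instance2 && ./mfakto &)
echo "prepared: instance1 instance2"
```

Splitting the worktodo entries between the directories keeps both instances fed independently.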

20130604, 11:54  #813  
Nov 2010
Germany
3·199 Posts 
Quote:
Please check whether VectorSize=2 or VectorSize=3 helps when GPU sieving. That would not be very fortunate, as it leaves ~25% of the vector units unused, but it may still be better than having to spill registers to global memory ... 
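If it does help, the change is a one-line edit in mfakto.ini (the option name is real; 2 is just the first value to try, per the suggestion above):

```ini
; Try 2, then 3; compare the reported GHzdays/day for each run.
VectorSize=2
```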

20130615, 06:13  #814 
Dec 2012
427_{8} Posts 
Will the "best fit" GPUSievePrimes value remain constant? Or will it change as you change bit level, exponent, or possibly kernel? For example, if a GPUSievePrimes value of 52000 works best on a 332M exponent going from 2^69 to 2^70, will 52000 also be the best for a 65M exponent and 2^73 to 2^74? What about an 8M exponent and 2^60 to 2^61? Etc.
Are there any good strategies for finding the best value, other than intelligent trial and error? I searched for an answer in the mfaktc thread, but it is kind of a massive thread. I also considered experimenting and finding out for myself, but that would take a very long time on my slow GPU. Hopefully I am not missing an answer that is staring me in the face. 
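One generic way to cut down the trial and error (not from the thread): treat throughput as a roughly unimodal function of GPUSievePrimes and narrow the range with a ternary search, timing a short, fixed amount of work at each candidate value. A minimal sketch with a stand-in throughput function; the real one would run mfakto at that setting and measure GHzdays/day:

```python
def best_sieve_primes(throughput, lo=5000, hi=200000, tol=2000):
    """Ternary search for the GPUSievePrimes value maximizing throughput.

    `throughput(n)` is assumed unimodal; in practice it would time a
    fixed number of classes with GPUSievePrimes=n (hypothetical harness).
    """
    while hi - lo > tol:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if throughput(m1) < throughput(m2):
            lo = m1   # the maximum lies to the right of m1
        else:
            hi = m2   # the maximum lies to the left of m2
    return (lo + hi) // 2

# Stand-in: a made-up unimodal curve peaking near 70000 (illustrative only).
print(best_sieve_primes(lambda n: -(n - 70000) ** 2))
```

Each iteration discards a third of the range, so even a wide search window needs only a dozen or so timed runs, far fewer than a linear sweep.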