#804 | "Mr. Meeseeks" | Jan 2012 | California, USA
#805 | Jul 2006 | Calgary
#806 | "Mr. Meeseeks" | Jan 2012 | California, USA
I just noticed this, but there is a huge penalty when switching to GPU sieving on my APU...
#807 | Romulan Interpreter | "name field" | Jun 2011 | Thailand
I believe you need to play with that SieveSize value for the sieve (for mfaktc at least, it has to be increased a lot for GPU sieving; otherwise the sieving finishes very fast, you end up with a lot of candidates, and you do a lot of useless exponentiations). But confirm this first with someone who runs mfakto; my experience is limited to mfaktc. Even if the values are right, you still have the CPU free and can compensate with GHz-days of P-1.
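To see why too little sieving leads to "a lot of useless exponentiations": the fraction of candidates that survive sieving with all primes up to P only shrinks like 1/ln P (Mertens' third theorem), so doubling the sieve depth removes relatively few extra candidates while costing more sieve time. A self-contained illustration in plain Python - this is not mfakto code, and mfakto's actual candidates (of the form 2kp+1) behave somewhat differently, but the diminishing-returns curve is the same shape:

```python
import math

def primes_up_to(n):
    """Simple sieve of Eratosthenes."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            is_prime[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if is_prime[p]]

def survivor_fraction(limit):
    """Exact fraction of candidates left after removing multiples of
    every prime <= limit: the product of (1 - 1/p)."""
    frac = 1.0
    for p in primes_up_to(limit):
        frac *= 1.0 - 1.0 / p
    return frac

# Mertens' third theorem: prod(1 - 1/p) ~ e^(-gamma) / ln(limit)
GAMMA = 0.5772156649015329
exact = survivor_fraction(100)             # ~0.120 survive sieving to 100
approx = math.exp(-GAMMA) / math.log(100)  # ~0.122, close to the exact value
```

Doubling the sieve depth from P to 2P only cuts the survivors by a factor ln(P)/ln(2P), which is why sieving "fast but shallow" leaves so many candidates for the expensive exponentiation step.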
#808 | "Victor de Hollander" | Aug 2011 | the Netherlands
-st test on my 7950:

Quote:
@800 MHz (AMD reference clock): 296 GHz-days/day (70-71 bit), 283 GHz-days/day (73-74 bit)
@900 MHz (factory clock): 335 GHz-days/day (70-71 bit), 318 GHz-days/day (73-74 bit)
#809 | Bdot | Nov 2010 | Germany
Quote:

First of all, it's true that all VLIW5 and VLIW4 GPUs pay a big penalty for GPU sieving - much bigger than GCN. Therefore the sweet spot is at a reduced GPUSievePrimes (compared to the default), whereas GCN cards reward an increase. However, I wonder if you really need to go as low as 50k for the APU in order to get the best out of it. I found ~70k for VLIW5 and ~110k for GCN to be optimal. I have just two cards, though, and it may look very different for you.

Second, I noticed quite some sensitivity to GPUSieveProcessSize; both my cards run best with 24.

Third, the AMD drivers. Sadly, the latest Windows drivers (13.4) make mfakto consume almost a full CPU core, even when GPU sieving. When other programs (like prime95) use a lot of CPU, mfakto's high CPU load goes away, but at the cost of some 10-20% throughput.

The 5770's 115 GHz-days/day are pretty close to what should be possible, as are the 7950's results. Which GPUSieve* values did you use? The APU's (6550D, right?) speed, however, is too far from the expected 42.5 GHz-days/day. Again, which GPUSieve* values did you use?

Generally, most low- to mid-end GPUs will have much better throughput when CPU sieving, if you can spare enough CPU power to sustain SievePrimes of >20k (VLIW5) or >50k (GCN). I'm working on improving the sieve for the vector platforms, but v0.13 is not yet very optimized in this respect.
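For reference, the knobs discussed here are set in mfakto.ini. A sketch with the values mentioned in this post (the numbers are the ones reported above for these two cards, not universal recommendations - tune them for your own GPU):

```ini
# mfakto.ini sketch - values from this post, for illustration only
GPUSievePrimes=110000     # ~110k reported optimal for GCN; try ~70k on VLIW5
GPUSieveProcessSize=24    # both cards discussed above ran best at 24
SievePrimes=50000         # CPU-sieving path: sustain >20k (VLIW5) / >50k (GCN)
```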
#810 | "Mr. Meeseeks" | Jan 2012 | California, USA
#811 | Jul 2012 | Sweden
I found a small bug...

Code:
got assignment: exp=33732341 bit_min=70 bit_max=71 (7.09 GHz-days)
Starting trial factoring M33732341 from 2^70 to 2^71 (7.09GHz-days)
Using GPU kernel "cl_barrett15_73_gs"
No checkpoint file "M33732341.ckp" found.
Date    Time | class Pct  | time  ETA   | GHz-d/day Sieve Wait
Jun 04 08:32 | 1263 27.6% | 7.446 1h26m |     85.68 21813 0.00%
M33732341 has a factor: 1516555032424995693727
found 1 factor for M33732341 from 2^70 to 2^71 (partially tested) [mfakto 0.13-Win cl_barrett15_73_gs_4]
tf(): total time spent: 15m 19.113s (666.39 GHz-days / day)

In other news, I have had two reboots with a crashed video driver. After the reboot the machine starts up without any video output (it is still possible to log in via remote desktop). The next reboot brings back the screen. I'm writing it off as a thermal problem, but curiously both crashes happened when switching assignments, and there were no save files left. It's been a couple of days since the last crash and everything is running stable right now. I'll report back if I find out more.

/Göran
#812 | Bdot | Nov 2010 | Germany
Quote:

If primenet accepted the factor as "F" (meaning found by TF), then you probably got credit of more than 8 GHz-days within 15 minutes. That sounds rather like 800 GHz-days/day, but I cannot easily guess how much credit you'll get for a factor, so I calculate it as if you completed the whole bit level within 15 minutes (or whatever it took you to find the factor). In my experience that is about the lower limit of what you get as credit from primenet.

Quote:

Then you could try running two instances in parallel - one would keep the GPU busy while the other saves its results, and otherwise they would evenly split the GPU power. (Evenly meaning the number of kernel invocations, not necessarily the performance.)

Edit: your GPUSievePrimes seems very low ... why?

Last fiddled with by Bdot on 2013-06-04 at 12:41
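The rate mfakto printed in Göran's log is exactly this "whole bit level in 15 minutes" arithmetic - the full bit level's credit divided by the wall time spent before the factor turned up:

```python
# Back-of-the-envelope check of the logged rate for M33732341.
bit_level_credit = 7.09            # GHz-days for the whole 2^70..2^71 level
elapsed = 15 * 60 + 19.113         # seconds ("total time spent: 15m 19.113s")
rate = bit_level_credit * 86400 / elapsed
print(f"{rate:.2f} GHz-days/day")  # ~666, matching the logged 666.39
                                   # (tiny gap: 7.09 is itself a rounded figure)
```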
#813 | Bdot | Nov 2010 | Germany
Quote:

Please check whether VectorSize=2 or VectorSize=3 helps when GPU sieving. This would not be very fortunate, as it leaves ~25% of the vector units unused, but it may still be better than having to spill registers to global memory ...
#814 | Dec 2012
Will the "best fit" GPUSievePrimes value remain constant? Or will it change as you change bit level, exponent, or possibly kernel? For example, if a GPUSievePrimes value of 52000 works best on a 332M exponent going from 2^69 to 2^70, will 52000 also be best for a 65M exponent and 2^73 to 2^74? What about an 8M exponent and 2^60 to 2^61? Etc.

Are there any good strategies for finding the best value, other than intelligent trial and error? I searched for an answer in the mfaktc thread, but it is kind of a massive thread. I also considered experimenting and finding out for myself, but it would take a very long time on my slow GPU. Hopefully I am not missing an answer that is staring me in the face.
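One way to make the trial and error less painful is a coarse grid search that repeatedly zooms in on the best point, so only a handful of timed runs are needed per round. A hypothetical sketch: here `benchmark()` is a made-up stand-in for timing a short real mfakto run at a given GPUSievePrimes (its shape and the 70000 peak are invented for illustration; nothing here is mfakto's actual behavior):

```python
def benchmark(gpu_sieve_primes):
    # Stand-in for "run mfakto briefly with this GPUSievePrimes and
    # measure GHz-days/day". Peak placed near 70000 purely for illustration.
    return 100.0 - ((gpu_sieve_primes - 70000) / 20000.0) ** 2

def coarse_to_fine(lo, hi, points=9, rounds=3):
    """Evaluate a coarse grid, then zoom in around the winner each round.
    With 9 points and 3 rounds this is 27 short benchmark runs total."""
    best = lo
    for _ in range(rounds):
        step = max((hi - lo) // (points - 1), 1)
        best = max(range(lo, hi + 1, step), key=benchmark)
        lo, hi = max(best - step, 1), best + step
    return best

best = coarse_to_fine(20000, 200000)  # homes in near the stand-in's peak
```

This assumes throughput is roughly unimodal in GPUSievePrimes (one hump), which matches the "sweet spot" description earlier in the thread; if the curve is noisy, averaging two or three short runs per point helps.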