#815 |
|
"Mr. Meeseeks"
Jan 2012
California, USA
23×271 Posts |
Sorry for the late reply. VectorSize 3 works with CPU sieving but not with GPU sieving...
|
#816 |
|
Nov 2010
Germany
255₁₆ Posts |
|
#817 | |
|
Nov 2010
Germany
3·199 Posts |
Quote:
Sieving has a constant speed for different kernels, factor sizes etc. Its speed depends only on the GPUSieve* parameters. The optimal selection of GPUSievePrimes depends on finding the best ratio between the run time of the sieve kernel and the run time of the trial factoring kernel. Better (i.e. longer) sieving means less (i.e. shorter) trial factoring, in a non-linear dependency. Therefore, anything that changes the speed of trial factoring will also change the optimal GPUSievePrimes. If testing takes longer (because a slower kernel needs to be used, or because the exponent has more bits), then GPUSievePrimes can be a bit higher. However, the differences should be rather small, and so are the achievable improvements.

I'll test that and add more details to my theoretical explanation soon. I'm also working on an automatic optimizer for those and other variables, so that the manual change-and-test cycle will no longer be needed.
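To make the trade-off above concrete, here is a minimal sketch in Python. It is a toy cost model, not mfakto's actual timing behaviour: the sieve cost is assumed to grow linearly with the number of sieve primes, and the share of candidates that survive sieving is estimated with Mertens' theorem. The function names and the cost constants are made up for illustration.

```python
import math

EULER_GAMMA = 0.5772156649

def survivor_fraction(num_primes):
    # The n-th prime is roughly n*ln(n). Mertens' theorem then estimates the
    # fraction of candidates that have no prime factor below that bound.
    bound = num_primes * math.log(num_primes)
    return math.exp(-EULER_GAMMA) / math.log(bound)

def total_time(num_primes, sieve_cost_per_prime, tf_cost):
    # Assumed toy model: sieve time is linear in the number of sieve primes,
    # trial factoring time is proportional to the candidates that survive.
    sieve = sieve_cost_per_prime * num_primes
    tf = tf_cost * survivor_fraction(num_primes)
    return sieve + tf

def best_sieve_primes(sieve_cost_per_prime, tf_cost):
    # Brute-force search over a grid of hypothetical settings.
    candidates = range(1000, 200001, 500)
    return min(candidates, key=lambda n: total_time(n, sieve_cost_per_prime, tf_cost))

# Same sieve cost, but the second case uses a 50% slower trial factoring kernel.
fast_kernel = best_sieve_primes(sieve_cost_per_prime=1e-6, tf_cost=1.0)
slow_kernel = best_sieve_primes(sieve_cost_per_prime=1e-6, tf_cost=1.5)
print(fast_kernel, slow_kernel)
```

In this model the slower kernel moves the optimum to a higher number of sieve primes, which matches the direction described above. The absolute numbers mean nothing for real hardware.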
|
#818 |
|
Jun 2010
Pennsylvania
2·467 Posts |
I've been testing the new version of mfakto on my 7770, and all I can say is -- wow!!

Single instances of TF (70 --> 71) on an M672xxxxx went from 71 minutes (71.56 GHz-days/day) with 0.12 (x64), to 36 minutes (142.23 GHz-days/day) with 0.13 (x64).

Thank you Bdot... and thank you kracker for alerting me to the new version! Oh, and Prime95 performance apparently is no longer taking a hit of any kind. Fantastic job!

One curiosity I noticed. The results for the same TF on the 0.13 x32 version were 35 minutes and 144.67 GHz-days/day, better than for the x64 variety. (Nothing else changed in the system environment.) Is that expected? This wasn't the case with the 0.12 x32 (86 minutes, 59.61 GHz-days/day), which wasn't quite as productive as the x64.

Rodrigo
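As a quick sanity check on the figures in this post, the sketch below multiplies each reported rate by its run time to get the credit implied for the assignment, and compares the rates directly. Only the numbers come from the post; the helper name is made up, and the run times are whole minutes, so the implied credits can only agree to within about a percent.

```python
MINUTES_PER_DAY = 24 * 60

def credit_ghz_days(rate_ghz_days_per_day, minutes):
    # Work done = rate * elapsed time, with the time converted to days.
    return rate_ghz_days_per_day * minutes / MINUTES_PER_DAY

# (rate in GHz-days/day, run time in minutes), as reported in the post.
runs = {
    "0.12 x64": (71.56, 71),
    "0.13 x64": (142.23, 36),
    "0.12 x32": (59.61, 86),
    "0.13 x32": (144.67, 35),
}

credits = {name: credit_ghz_days(rate, minutes) for name, (rate, minutes) in runs.items()}
speedup_x64 = runs["0.13 x64"][0] / runs["0.12 x64"][0]
x32_vs_x64 = runs["0.13 x32"][0] / runs["0.13 x64"][0]

for name, credit in credits.items():
    print(f"{name}: {credit:.2f} GHz-days")
print(f"0.13 vs 0.12 (x64): {speedup_x64:.2f}x")
print(f"0.13 x32 vs x64: {100 * (x32_vs_x64 - 1):.1f}% faster")
```

All four runs imply roughly the same credit of about 3.5 GHz-days, which is what you would expect for the same assignment. The 0.13 x32 run is under 2% ahead of the x64 run, a gap of the same order as the rounding of the run times.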
|
#819 |
|
Jun 2010
Pennsylvania
1646₈ Posts |
UPDATE:
Just finished a run of mfakto 0.13 x32 in two instances on the 7770 (CPU: Core i7-3770, Windows 7 Home Premium x64, three Prime95 workers running).

One instance completed at a rate of 73.89 GHz-days/day; the other, at 73.92. (This time it wasn't the exact same exponent that was factored, but two consecutive ones.)

Prime95 is still virtually if not completely unaffected, but there seems to be almost no benefit any longer to running more than one instance of mfakto. (With version 0.12 I could run three instances of TF and reach ~140 GHz-days/day, although not consistently, and the average was closer to 130.)

Another thing: this time (with the two instances running) there was a severe lag whenever I moved the cursor or tried to reposition a window. I have verified that this does not happen when running the single instance of 0.13.

Hope that this info is useful.

Rodrigo

Last fiddled with by Rodrigo on 2013-06-25 at 05:35 Reason: update to the update
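The "almost no benefit" observation can be put into numbers with a few lines of Python. The figures are taken from this post and the previous one by the same author; the variable names are made up, and the comparison is only approximate because the runs used consecutive rather than identical exponents.

```python
# Rates in GHz-days/day, as reported in the posts.
two_instances_013 = [73.89, 73.92]   # mfakto 0.13 x32, two instances
single_instance_013 = 144.67         # mfakto 0.13 x32, one instance
three_instances_012_avg = 130.0      # mfakto 0.12, rough average of three instances

combined_013 = sum(two_instances_013)
gain_from_second_instance = combined_013 / single_instance_013 - 1
single_013_vs_triple_012 = single_instance_013 / three_instances_012_avg

print(f"two instances combined: {combined_013:.2f} GHz-days/day")
print(f"gain from second instance: {100 * gain_from_second_instance:.1f}%")
print(f"one 0.13 instance vs three 0.12 instances: {single_013_vs_triple_012:.2f}x")
```

The second instance adds only about 2% on top of a single instance, at the price of the desktop lag described above.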
|
#820 |
|
Dec 2012
11616 Posts |
0.13 moved the sieving part to the GPU, which was previously done on the CPU. So, unless I'm hugely mistaken, when using GPU sieving you will only ever need to run one instance to reach full GPU load. The CPU will remain unused, because it's not doing the sieving anymore. This is all expected behaviour. Pretty great, right?

If it's still lagging with one instance, I think GPUSieveSize is the setting you'll want to lower. It will make everything a lot more responsive, and I didn't see any reduction in GHz-days/day when I lowered it.

As for why x32 seems better than (or the same as) x64, I am not sure. I will leave that for somebody more knowledgeable to answer. But the difference between the two was only one minute; is it abnormal for it to fluctuate that much? You weren't using the GPU for something else?

Off topic: I discovered something that maybe others don't know about: Ctrl+S will pause/unpause the worker (any command-line function, it seems). I use this now instead of Ctrl+C. Hopefully I'm not breaking anything.
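For anyone who prefers to script the GPUSieveSize change rather than edit the file by hand, here is a minimal Python sketch. It assumes the settings file consists of plain `Key=Value` lines with `#` comments; the helper name, the sample text and the values 64 and 32 are placeholders for illustration, not defaults or recommendations.

```python
def set_ini_value(ini_text, key, value):
    """Return ini_text with `key=...` set to `value`; append the key if it is absent."""
    lines = ini_text.splitlines()
    found = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # leave comment lines untouched
        name, sep, _ = stripped.partition("=")
        if sep and name.strip() == key:
            lines[i] = f"{key}={value}"
            found = True
    if not found:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"

# Hypothetical excerpt of a settings file; the values are placeholders.
sample = "# sample settings\nVectorSize=2\nGPUSieveSize=64\n"
updated = set_ini_value(sample, "GPUSieveSize", 32)
print(updated)
```

In practice you would read the real file, pass its text through the helper and write it back, keeping a backup copy first.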
|
#821 |
|
Jun 2005
12910 Posts |
If it is anything like the CUDA version, it's because with 32-bit vs. 64-bit memory addresses there's half the data to keep track of.

The CPU sieve picked up some speed from using 64-bit code to offset this in older versions, but with all that happening on the GPU in the newer versions there's no [net] benefit to 64-bit code.

See http://mersenneforum.org/showpost.ph...postcount=1981 for more detail.

Last fiddled with by kjaget on 2013-06-25 at 13:08
|
#822 | |
|
Jun 2010
Pennsylvania
2×467 Posts |
Quote:
No, there was nothing else going on with the GPU during any of the mfakto runs. But @kjaget's explanation makes sense.

Nice find about Ctrl+S, by the way. I'll start using that instead of Ctrl+C when I simply need it to pause.

Rodrigo
|
#823 | |
|
Jun 2010
Pennsylvania
2·467 Posts |
Quote:
Rodrigo |
|
#824 | ||
|
"Mr. Meeseeks"
Jan 2012
California, USA
23×271 Posts |
Quote:
Quote:
|
#825 | |
|
Jun 2010
Pennsylvania
2·467 Posts |
Quote:
My 7770 is as it came, no adjustments made to it. The only tweak I've made to mfakto 0.13 is to change the VectorSize value from the default 4 down to 2, as suggested by the program itself the first time I ran it.

Have you made any other adjustments to increase the output?

Rodrigo

Last fiddled with by Rodrigo on 2013-06-25 at 17:14