mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing

2013-05-10, 09:33   #760
Manpowre
 
"Svein Johansen"
May 2013
Norway

3·67 Posts

Quote:
Originally Posted by Bdot
Which mfakto version are you running? If it is anything before the last GPU-sieve-preview, isn't a core per mfakto instance normal?
I started 2 instances of mfakto on my 6970, and that doubled the GHz-days. I tweaked the ini file a little and now get around 90 GHz-days/day with 2 instances on this card. It could be an old version of mfakto; I'll check when I get home.
2013-05-10, 09:58   #761
Bdot
 
 
Nov 2010
Germany

3×199 Posts

Quote:
Originally Posted by Axelsson
0.13pre4 is a lot faster on small numbers: I got up to 260 GHz-days/day on numbers just above two million, and even found two new factors, which I have reported.

But then I ran the -st2 selftest on the new beta and got 3 failed self tests.
GPU : HD6970, windows 7 professional 64 bit
Hi Axelsson, breaker of all software,

Thanks a lot for this test; it shows that we should not submit "no factor" results with this version yet. I will most likely have to create a special debugging version for you, as I cannot reproduce the error on my hardware. It could be specific to Cayman or to the driver version (I will test that part next). It should not have any consequences for normal runs, as the 82-88 bit kernels will never be selected for testing 64-bit factor candidates, and the smaller kernels did find the factor. However, I need to understand what exactly went wrong.

Once again I'm really happy about the hugely extended -st2 selftest of almost 33k factors. Maybe I should keep them in the release versions... some errors only show up under very special conditions, like the one you discovered here: one of the 33k factors shows an error in 3 of 15 kernels...
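To make the idea of a known-factor selftest concrete, here is a minimal Python sketch. It assumes nothing about mfakto's real OpenCL kernels or its selftest data: each "kernel" is a hypothetical Python function that decides whether q divides 2^p - 1, and the harness collects, per kernel, the known factors it fails to confirm. The few factor pairs are well-known small Mersenne factors, not mfakto's 33k-entry list.

```python
from typing import Callable, Dict, List, Tuple

# Small, well-known factors of Mersenne numbers M(p) = 2^p - 1.
# Illustrative only; this is NOT mfakto's -st2 data set.
KNOWN_FACTORS: List[Tuple[int, int]] = [
    (11, 23), (11, 89),
    (23, 47), (23, 178481),
    (29, 233), (29, 1103), (29, 2089),
    (37, 223),
]

def kernel_builtin_pow(p: int, q: int) -> bool:
    # q divides 2^p - 1 exactly when 2^p mod q == 1.
    return pow(2, p, q) == 1

def kernel_square_and_multiply(p: int, q: int) -> bool:
    # Same test with an explicit left-to-right binary powering loop,
    # standing in for a second, independently written kernel.
    r = 1
    for bit in bin(p)[2:]:
        r = r * r % q
        if bit == "1":
            r = r * 2 % q
    return r == 1

def run_selftest(kernels: Dict[str, Callable[[int, int], bool]],
                 cases: List[Tuple[int, int]]) -> Dict[str, List[Tuple[int, int]]]:
    # For every kernel, collect the (p, q) pairs whose known factor it missed.
    failures: Dict[str, List[Tuple[int, int]]] = {name: [] for name in kernels}
    for p, q in cases:
        for name, kernel in kernels.items():
            if not kernel(p, q):
                failures[name].append((p, q))
    return failures

if __name__ == "__main__":
    kernels = {"builtin_pow": kernel_builtin_pow,
               "square_and_multiply": kernel_square_and_multiply}
    for name, missed in run_selftest(kernels, KNOWN_FACTORS).items():
        print(f"{name}: {len(KNOWN_FACTORS) - len(missed)}/{len(KNOWN_FACTORS)} passed")
```

Reporting failures per kernel, rather than a single pass/fail, is what makes a result like "one factor fails in 3 of 15 kernels" visible.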

And please, don't forget to send me the result of
"Switch to CPU-sieve (SieveOnGPU=0) and run "mfakto-0.13pre4-pi-win64 -st > st-0.13pre4-pi-win64.log" on an otherwise idle machine."
so I can optimize the kernel selection for Cayman - currently it is entirely based on assumptions.
2013-05-10, 10:09   #762
Bdot
 
 
Nov 2010
Germany

3·199 Posts

Quote:
Originally Posted by Manpowre
I started 2 instances of mfakto on my 6970, and that doubled the GHz-days. I tweaked the ini file a little and now get around 90 GHz-days/day with 2 instances on this card. It could be an old version of mfakto; I'll check when I get home.
Most likely you'll get even more out of it by running 3 or 4 instances, if you have that many CPU cores available. Check the GPU utilization (e.g. using GPU-Z) to see how much headroom it still has.

You should be running mfakto 0.12 if you want to report the results to PrimeNet and get credit for them. 0.12 still uses a lot of CPU power to prepare the GPU calculations. When 0.13 is ready, that part should be improved and you'll get well above 200 GHz (which is short for GHz-days/day) without a lot of CPU load, and with only one mfakto instance.
2013-05-10, 21:52   #763
Jayder
 
 
Dec 2012

2×139 Posts

Is there any point in having a 0.13pre4-var version at this point? When I regularly run mfakto I run mfakto-var because I get more GHzD with a SieveSizeLimit of 130 or 154. I tried mfakto-64k and it wasn't as good. Hope I haven't been doing something stupid all this time.

I have an HD6410D APU which is pretty mediocre. For the moment it looks like GPU sieve won't be worth it for me, but maybe I am jumping the gun. Without GPU sieve it typically takes 30-40% of a single core to put the GPU to max. I will play around some more to find the ideal GPU sieve settings.

I've been running st2 for the past 9 hours and it looks like it will take 20-36 hours total. Hard to tell at this point. I hope it's not for nothing.

Last fiddled with by Jayder on 2013-05-10 at 21:55
2013-05-10, 22:10   #764
Bdot
 
 
Nov 2010
Germany

1125₈ Posts

Quote:
Originally Posted by Jayder
Is there any point in having a 0.13pre4-var version at this point? When I regularly run mfakto I run mfakto-var because I get more GHzD with a SieveSizeLimit of 130 or 154. I tried mfakto-64k and it wasn't as good. Hope I haven't been doing something stupid all this time.

I have an HD6410D APU which is pretty mediocre. For the moment it looks like GPU sieve won't be worth it for me, but maybe I am jumping the gun. Without GPU sieve it typically takes 30-40% of a single core to put the GPU to max. I will play around some more to find the ideal GPU sieve settings.

I've been running st2 for the past 9 hours and it looks like it will take 20-36 hours total. Hard to tell at this point.
I will certainly provide the -var version for those of you who want to stick with high SievePrimes values on the CPU. In those cases, the higher SieveSizeLimit is indeed a good thing (much better sieve efficiency). The current GPU sieve cannot compete with that setup if the CPU is powerful enough to drive the GPU with those settings. My estimate is that total throughput will drop to about 85% when you switch to GPU sieving.
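For readers wondering what the sieve actually buys, here is a minimal Python sketch of trial factoring with a sieve. It relies on the standard facts that any factor q of M(p) = 2^p - 1 (p prime) has the form 2kp + 1 and satisfies q ≡ ±1 (mod 8). The parameter `sieve_limit` is a hypothetical stand-in that plays a role loosely analogous to SievePrimes: candidates divisible by a small prime below the limit are discarded before the expensive powering test. It does not model mfakto's actual sieve layout or how SieveSizeLimit segments the sieve.

```python
from typing import List, Tuple

def small_primes(limit: int) -> List[int]:
    # Plain sieve of Eratosthenes returning all primes <= limit.
    is_p = [True] * (limit + 1)
    is_p[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if is_p[i]:
            for j in range(i * i, limit + 1, i):
                is_p[j] = False
    return [i for i, v in enumerate(is_p) if v]

def trial_factor(p: int, k_max: int, sieve_limit: int) -> Tuple[List[int], int]:
    # Returns (factors found, number of candidates that reached the powering test).
    primes = [s for s in small_primes(sieve_limit) if s > 2]  # candidates are odd
    found: List[int] = []
    tested = 0
    for k in range(1, k_max + 1):
        q = 2 * k * p + 1
        if q % 8 not in (1, 7):
            continue  # factors of M(p) are +1 or -1 mod 8
        if any(q % s == 0 and q != s for s in primes):
            continue  # sieved out: q has a small prime factor
        tested += 1
        if pow(2, p, q) == 1:  # q divides 2^p - 1
            found.append(q)
    return found, tested

if __name__ == "__main__":
    for limit in (3, 50, 1000):
        print(limit, trial_factor(23, 4000, limit))
```

A larger sieve limit lets fewer candidates through to the powering test, which is the trade-off between sieve effort (CPU or GPU) and test effort discussed above.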

You can press Ctrl-C (once) during the -st2 selftest, and it will still print a summary. I need to track down the problem Axelsson reported anyway; until then, 0.13pre4 should not be used to submit any "no factor found" results. If I have to provide a new version, then maybe spend the day on the selftest again...

Thanks a lot for your support.
2013-05-13, 22:21   #765
Bdot
 
 
Nov 2010
Germany

3×199 Posts

Quote:
Originally Posted by Axelsson

But then I ran the -st2 selftest on the new beta and got 3 failed self tests.
Oh boy, now I see the same errors on my HD7850. Strange that I missed them earlier, but now I can debug much more easily.

Result: my old HD5770 (main dev system) rounds floats a bit differently by default. Cayman (and now my GCN card as well) sometimes gives a different result in the last digit. I'll run all my tests tomorrow (and also check the results this time), and then provide a new beta version, maybe the last one.
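The post does not describe the exact arithmetic involved, so the following Python sketch only illustrates the general effect. It assumes a kernel that estimates a quotient from a single-precision reciprocal (a common trick in GPU modular arithmetic): rounding the same float32 reciprocal to nearest versus toward zero can move the integer quotient estimate by one, and only an integer correction step makes both paths agree. The helper names are hypothetical; this is not mfakto's code or its fix.

```python
import struct

def f32_nearest(x: float) -> float:
    # Round a (double-precision) Python float to float32, round-to-nearest-even.
    return struct.unpack("<f", struct.pack("<f", x))[0]

def f32_toward_zero(x: float) -> float:
    # Round a positive float to float32 by truncation: take the nearest
    # float32 and step down one unit in the last place if it overshot.
    f = f32_nearest(x)
    if f > x:
        bits = struct.unpack("<I", struct.pack("<f", f))[0]
        f = struct.unpack("<f", struct.pack("<I", bits - 1))[0]
    return f

def quotient_estimate(a: int, b: int, rounding) -> int:
    # Estimate a // b from a float32 reciprocal; the result depends on
    # how the reciprocal was rounded and can be off by one (or more).
    return int(a * rounding(1.0 / b))

def mod_via_float(a: int, b: int, rounding) -> int:
    # Remainder from the float estimate, followed by integer correction
    # steps that make the result exact whatever the rounding mode was.
    r = a - quotient_estimate(a, b, rounding) * b
    while r < 0:
        r += b
    while r >= b:
        r -= b
    return r

if __name__ == "__main__":
    a, b = 30_000_000, 3
    print(quotient_estimate(a, b, f32_nearest), quotient_estimate(a, b, f32_toward_zero))
    print(mod_via_float(a, b, f32_nearest), mod_via_float(a, b, f32_toward_zero))
```

For a = 30,000,000 and b = 3 the two raw estimates differ in the last digit, while the corrected remainders agree; code that skips or mis-sizes such a correction would show exactly the kind of hardware-dependent discrepancy described above.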
2013-05-13, 22:43   #766
Axelsson
 
Jul 2012
Sweden

2·3·7 Posts

Great! There is nothing worse than a non-reproducible bug.

Just give me another beta and I'll break that one too!

I just wish some moderator would add that title to my profile: breaker of programs... I like it!

/Göran
2013-05-13, 22:43   #767
kracker
 
 
"Mr. Meeseeks"
Jan 2012
California, USA

2³×271 Posts

Quote:
Originally Posted by Bdot
Oh boy, now I see the same errors on my HD7850. Strange that I missed them earlier, but now I can debug much more easily.

Result: my old HD5770 (main dev system) rounds floats a bit differently by default. Cayman (and now my GCN card as well) sometimes gives a different result in the last digit. I'll run all my tests tomorrow (and also check the results this time), and then provide a new beta version, maybe the last one.
I did the full selftest (-st2) on pre4 on my 7770 and, to my knowledge, had no errors...
2013-05-15, 13:42   #768
Bdot
 
 
Nov 2010
Germany

3×199 Posts

Quote:
Originally Posted by kracker
I did the full selftest (-st2) on pre4 on my 7770 and, to my knowledge, had no errors...
Yeah, I'm not sure what else influences this... I also think I checked the earlier logs, but I've overwritten them since upgrading the drivers...

Anyway, I fixed the rounding issue and posted version 0.13pre5. This one finds all factors of the extended selftest (-st2) on both my card models, tested on Catalyst 13.1 and 13.4.

BTW, the high CPU load issue with 13.4 is Windows-specific. On Linux, mfakto stays at 0.1% CPU, but the screen becomes extremely laggy. I need to see if I can do something about that.

Thanks to Kyle, who was the first to send me performance data for a Cayman, I think I now have the proper kernel selection for that platform as well. Cayman is very interesting: for almost all tests, VectorSize=4 leads by a big margin. However, if you plan to test anything with an upper bit level of 60 or below, VectorSize=2 is about 35% faster.
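The rules of thumb in this post can be written as a tiny lookup. This is a hypothetical Python helper (the function name and the architecture strings are made up for illustration); mfakto makes its own kernel choice internally, and the only inputs taken from the thread are "VectorSize=4 for most Cayman work, VectorSize=2 when the upper bit level is 60 or below" and "VectorSize=2 for all GCN".

```python
def suggested_vector_size(arch: str, upper_bit_level: int) -> int:
    # Hypothetical helper encoding the rules of thumb from this thread;
    # it is not part of mfakto and ignores everything else that affects speed.
    arch = arch.lower()
    if arch == "gcn":
        return 2  # recommended for all GCN cards
    if arch == "cayman":
        # VectorSize=2 was about 35% faster for upper bit levels of 60 or below;
        # otherwise VectorSize=4 led by a big margin.
        return 2 if upper_bit_level <= 60 else 4
    raise ValueError(f"no rule of thumb recorded for {arch!r}")

if __name__ == "__main__":
    for arch, bits in (("Cayman", 60), ("Cayman", 72), ("GCN", 72)):
        print(arch, bits, "-> VectorSize =", suggested_vector_size(arch, bits))
```

Raising an error for unknown architectures, instead of guessing a default, reflects that the thread only reports measurements for Cayman and GCN.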

I still need the performance info for high-end GCN (Tahiti), as I assume that they don't suffer so much from 32-bit integer multiplications. So if anyone has an HD7870 XT or 79xx, I'd appreciate the results of this test:
  • set VectorSize=2 (recommended for all GCN)
  • set SieveOnGPU=0 (switch to CPU sieving just for this test)
  • run mfakto-0.13pre5-pi-win64 -st > st-0.13pre5-pi.log
In general, I'm interested in -st2 results. Additional performance-info tests are welcome too, for verification.


Let's see if Göran is again the first to break this version.

Last fiddled with by Bdot on 2013-05-15 at 13:45 Reason: typso
2013-05-15, 23:38   #769
kracker
 
 
"Mr. Meeseeks"
Jan 2012
California, USA

2³·271 Posts

pre5 is working well... Anything else you want me to test?
2013-05-15, 23:47   #770
Bdot
 
 
Nov 2010
Germany

1001010101₂ Posts

Quote:
Originally Posted by kracker
pre5 is working well... Anything else you want me to test?
Actually, no.

Once enough of you agree that this version is good (i.e. -st2 shows no errors, there are no other serious issues, and it is not slower than the previous version), I'll release it. In other words, it should be ready for production once I see a "good" message from a Cayman (to confirm my fix works there as well).

All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.