mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfakto: an OpenCL program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=15646)

tului 2014-07-17 23:48

[QUOTE=Bdot;377901]Oh-oh, things like these happen when in a hurry without checking ... even the smallest fix can introduce new bugs :gah:[/QUOTE]

Well, a build checked out as of today and compiled with VS2013 x64, full optimizations, AVX and LTO works fine on R7 260X's. I've been tempted to enable my motherboard's VirtuMVP just to add some Intel HD 4000 to the mix, but the free Virtu download from Asus isn't 8.1 compatible and I'm not buying the $30 real Virtu software.

Bdot 2014-07-21 13:05

I've put the win-64 version of mfakto-0.15pre1 on the [URL="http://www.mersenneforum.org/mfakto/mfakto-0.15pre1/"]ftp[/URL]. It is [B]NOT YET FULLY TESTED FOR PRODUCTION[/B]!

This version should have all the fixes for IntelHD as suggested by George; however, lacking such a system, I could not test that.

It comes with runtime-modifiable settings: press 'm' to see this menu:

[code]
Settings menu

Num Setting Current value (shortcut outside of the menu for de-/increasing this setting)

1 SievePrimes = 97990 (-/+)
2 SieveSize = 35 (s/S)
3 SieveProcessSize = 35 (p/P)
4 SievePrimesAdjust = 0 (a/A)
5 FlushInterval = 0 (f/F)
6 Verbosity = 1 (v/V)
7 PrintMode = 0 (r/R)
8 Kernel = cl_barrett15_73_2 (k/K)

0 Done (continue factoring)

-1 Exit mfakto (q/Q)
Change setting number:
[/code]
Factoring is paused while the menu is shown. While in the menu, select by number. Outside the menu, pressing the keys in parentheses changes the respective value in steps without pausing TF. Keypresses are evaluated only between classes. Any required reinitializations are done automatically. Changing the kernel is not yet implemented. This feature is intended to let you find the best settings much more easily. Please try to break it (and let me know what you did to break it). This includes messing up the settings while running the selftest - there must be no missed factors no matter what you try.

Let me know if you see the need for other parameters to change at runtime (e.g. VectorSize, SieveOnGPU or MoreClasses - but they would require recompilation of the kernels, which I did not yet implement).

I'm not yet convinced of the usability of this feature - let me know if you have ideas how to improve it.

And of course, this version should pass all self-tests and not be slower than 0.14 (but also not faster - the kernels are unchanged, apart from the INTEL definitions).

legendarymudkip 2014-07-21 16:37

It works for me now. I get around 18 GHz-days/day throughput. Is there any way to increase this, or does this sound about optimal?

Bdot 2014-07-21 18:52

Most likely, this is about the max you can get. You can try a different VectorSize: from George's results, I understood VectorSize=4 to be fastest.

If it is only about speed for mfakto, then switch to CPU sieving (SieveOnGPU=0) and select a high SievePrimes (e.g. 200000). This will use a portion of a CPU core to help the HD4600.

With GPU sieving, the other option is to adjust the settings while it is running; see my previous post. SievePrimes, SieveSize, SieveProcessSize are the adjustable values that affect performance, maybe also FlushInterval.

Play around with it and tell us what the optimal settings are :smile:.
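To compare settings, it helps to record the per-class time from the status lines mfakto prints. The following is only a rough sketch: it assumes the four `|`-separated columns seen in the selftest log later in this thread (date/time, class and percent, time and ETA, GHz-d/day with sieve and wait). The function name and dictionary keys are made up for illustration, and a different PrintMode or output format would need a different parser.

```python
def parse_status_line(line):
    """Split one mfakto status line into named fields.

    Assumed layout (taken from the log in this thread):
    date/time | class pct | time ETA | GHz-d/day sieve wait
    """
    def number(text):
        # "n.a." is printed when a value is not available yet
        return None if text == "n.a." else float(text)

    # Exactly four '|'-separated columns are expected; anything else raises
    when, progress, timing, rate = (part.split() for part in line.split("|"))
    return {
        "timestamp": " ".join(when),
        "class": int(progress[0]),
        "percent": float(progress[1].rstrip("%")),
        "class_time_s": number(timing[0]),
        "eta": timing[1],                      # kept as text, e.g. "n.a."
        "ghz_days_per_day": number(rate[0]),
        "sieve": int(rate[1]),
        "wait_percent": float(rate[2].rstrip("%")),
    }


sample = "Jul 22 22:31 | 3828 0.1% | 1.094 n.a. | n.a. 81205 0.00%"
fields = parse_status_line(sample)
print(fields["class_time_s"], fields["sieve"])
```

Collecting `class_time_s` for a handful of classes before and after each change gives a simple basis for comparison; lower time per class means higher throughput for the same assignment.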

potonono 2014-07-22 03:12

I have no issue running the selftest now, though I did still have to specify -d 11 for it to recognize the GPU. All tests passed, and they still passed even when I changed the various settings from the menu.

If I run with option --perftest, after a little bit of output the program generates a generic Windows error indicating that the program has stopped working. Attached are the selftest and perftest runs.

Bdot 2014-07-22 21:29

Thanks a lot for your testing!
[QUOTE=potonono;378772]I have no issue running the selftest now, though did still have to specify -d 11 for it to recognize the GPU. All tests were passed. All tests still passed even when changing the various settings from the menu.
[/QUOTE]
I'm making a few changes now, maybe -d 11 will no longer be needed with the next version.
[QUOTE=potonono;378772]
If I run with option --perftest, after a little bit of output the program generates a generic Windows error indicating that the program has stopped working. Attached are the selftest and perftest runs.[/QUOTE]
Oh, right. George already reported that but I did not yet act on this ... maybe next version :smile:

If you already tested some real trial factoring, could you please report what the best values for SievePrimes, SieveSize, SieveProcessSize and maybe VectorSize are?

kracker 2014-07-23 02:49

Started st2. :smile:

Bdot 2014-07-24 08:57

Did any of you try the new feature on a real exponent to find more efficient settings than the defaults? If you try, then you'll notice that the best SievePrimes, SieveSize, SieveProcessSize may be different for different TF jobs ... and the good thing: any improvements you find by using this version can be applied to version 0.14 by writing them to the mfakto.ini file.

I'm interested to hear about any improvements and what you changed.
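Carrying tuned values over to mfakto.ini can be scripted. The sketch below is an illustration only: it assumes the file consists of plain `Key=Value` lines with `#` comments, and it uses the setting names as written in this thread, which should be checked against the names in your own mfakto.ini. The helper `apply_settings` and the sample file contents are made up for the example.

```python
def apply_settings(ini_text, updates):
    """Return ini_text with each Key=Value from `updates` replaced or appended."""
    remaining = dict(updates)
    out = []
    for line in ini_text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_setting = "=" in line and not line.lstrip().startswith("#")
        if is_setting and key in remaining:
            # Replace the first occurrence of a known key
            out.append(f"{key}={remaining.pop(key)}")
        else:
            # Comments, blank lines and other settings pass through unchanged
            out.append(line)
    # Settings that were not in the file yet are appended at the end
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


# Made-up file contents, only to exercise the helper
old = "# tuned on 0.15pre1\nSievePrimes=81157\nVectorSize=2\n"
new = apply_settings(
    old, {"SievePrimes": 97990, "SieveSize": 35, "SieveProcessSize": 35}
)
print(new)
```

Working on the text rather than the file keeps the sketch easy to check; reading and writing the real file is left to the caller, ideally after making a backup copy.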

kracker 2014-07-24 13:39

[QUOTE=Bdot;378942]Did any of you try the new feature on a real exponent to find more efficient settings than the defaults? If you try, then you'll notice that the best SievePrimes, SieveSize, SieveProcessSize may be different for different TF jobs ... and the good thing: any improvements you find by using this version can be applied to version 0.14 by writing them to the mfakto.ini file.

I'm interested to hear about any improvements and what you changed.[/QUOTE]

I'll do/try that. :smile:

On another note...
[code]

Selftest statistics
number of tests 287351
successful tests 287350
no factor found 1

selftest FAILED!

ERROR: selftest failed, exiting.
[/code]
[code]
######### testcase 2584/32927 (M59000521[82-83]) #########
Starting trial factoring M59000521 from 2^82 to 2^83 (16600.99GHz-days)
Using GPU kernel "cl_barrett15_83_gs_4"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jul 22 22:31 | 3828 0.1% | 1.094 n.a. | n.a. 81205 0.00%
no factor for M59000521 from 2^82 to 2^83 [mfakto 0.15pre1-Win cl_barrett15_83_gs_4]
ERROR: selftest failed for M59000521 (cl_barrett15_83_gs)
no factor found
tf(): total time spent: 1.094s

Starting trial factoring M59000521 from 2^82 to 2^83 (16600.99GHz-days)
Using GPU kernel "cl_barrett15_88_gs_4"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jul 22 22:31 | 3828 0.1% | 1.221 n.a. | n.a. 81205 0.00%
M59000521 has a factor: 6190124149267876918004257

found 1 factor for M59000521 from 2^82 to 2^83 [mfakto 0.15pre1-Win cl_barrett15_88_gs_4]
selftest for M59000521 passed (cl_barrett15_88_gs)!
tf(): total time spent: 1.221s

Starting trial factoring M59000521 from 2^82 to 2^83 (16600.99GHz-days)
Using GPU kernel "cl_barrett32_87_gs_4"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jul 22 22:31 | 3828 0.1% | 0.710 n.a. | n.a. 81205 0.00%
M59000521 has a factor: 6190124149267876918004257

found 1 factor for M59000521 from 2^82 to 2^83 [mfakto 0.15pre1-Win cl_barrett32_87_gs_4]
selftest for M59000521 passed (cl_barrett32_87_gs)!
tf(): total time spent: 0.711s

Starting trial factoring M59000521 from 2^82 to 2^83 (16600.99GHz-days)
Using GPU kernel "cl_barrett32_88_gs_4"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jul 22 22:31 | 3828 0.1% | 0.730 n.a. | n.a. 81205 0.00%
M59000521 has a factor: 6190124149267876918004257

found 1 factor for M59000521 from 2^82 to 2^83 [mfakto 0.15pre1-Win cl_barrett32_88_gs_4]
selftest for M59000521 passed (cl_barrett32_88_gs)!
tf(): total time spent: 0.730s

Starting trial factoring M59000521 from 2^82 to 2^83 (16600.99GHz-days)
Using GPU kernel "cl_barrett32_92_gs_4"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jul 22 22:31 | 3828 0.1% | 0.831 n.a. | n.a. 81205 0.00%
M59000521 has a factor: 6190124149267876918004257

found 1 factor for M59000521 from 2^82 to 2^83 [mfakto 0.15pre1-Win cl_barrett32_92_gs_4]
selftest for M59000521 passed (cl_barrett32_92_gs)!
tf(): total time spent: 0.831s
[/code]
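The factor reported by the passing kernels can be checked independently of mfakto, which helps tell a kernel miss apart from a bad test case. A minimal sketch, using only Python's built-in modular exponentiation: q divides M(p) = 2^p - 1 exactly when 2^p is 1 modulo q. The function and variable names are illustrative.

```python
def divides_mersenne(p, q):
    """True if q divides M(p) = 2**p - 1, i.e. 2**p is 1 modulo q."""
    return pow(2, p, q) == 1


p = 59000521
q = 6190124149267876918004257  # factor reported in the log above

is_factor = divides_mersenne(p, q)

# The test case covers 2^82 to 2^83, so the factor should lie in that range
in_bit_range = 2**82 <= q < 2**83

# For a prime exponent p, every factor of M(p) has the form 2*k*p + 1
# and is congruent to 1 or 7 modulo 8
k, remainder = divmod(q - 1, 2 * p)
has_expected_form = remainder == 0 and q % 8 in (1, 7)

print(is_factor, in_bit_range, has_expected_form)
```

If all three checks hold, the factor is real and lies in the tested range, which would point at the cl_barrett15_83_gs kernel (or its setup) rather than at the selftest data.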

potonono 2014-07-25 23:53

I haven't had much chance to test yet, except I can confirm VectorSize=4 is best on mine too.

legendarymudkip 2014-07-26 00:00

VectorSize=4 increases throughput by about 1 GHz-d/Day on my end as well.


All times are UTC.