mersenneforum.org  

Old 2014-08-13, 20:52   #1178
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

2³×271 Posts

Thanks! Will try it out.

On another point... Intel iGPUs do not have double precision...
Old 2014-08-14, 09:13   #1179
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

3×3,221 Posts

"MSVCR110.DLL missing from your system". Didn't need it for the old one.

Quote:
Originally Posted by Bdot View Post
  • does mfakto detect the devices automatically or are switches (like -d 11) required
Yes, (after installing the redistributable thing) there is only one HD card here, successfully detected.

Quote:
  • does it correctly identify the devices and their device type
Yes. What's with the big "elf" file? Can it be deleted?

Quote:
  • is 'mfakto -st' reporting success (on fast systems, or when you have lots of time, 'mfakto -st2') - if testing is too long, you can always interrupt by pressing 'q' or Ctrl-C.
3092/3092 successful tests. Or I could say that something is odd... because all 3092 exponents picked had factors... Hm...

-st2 works fine, no fail. Good job!
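
On the "all exponents picked had factors" remark: presumably the self-test deliberately runs on exponents whose factors are already known, so each kernel can be checked against a known answer (that is an assumption, not something stated in this thread). A minimal Python sketch of the check itself, using the standard facts that a prime factor q of 2^p - 1 satisfies 2^p ≡ 1 (mod q), has the form q = 2kp + 1, and is ≡ ±1 (mod 8). The function name is made up for illustration and is not mfakto code:

```python
def is_mersenne_factor(p, q):
    """Return True if q divides the Mersenne number 2**p - 1 (p an odd prime)."""
    # Cheap filters first: a factor has the form 2*k*p + 1 and is 1 or 7 mod 8.
    if q <= 1 or (q - 1) % (2 * p) != 0 or q % 8 not in (1, 7):
        return False
    # The actual test: modular exponentiation, never forming 2**p itself.
    return pow(2, p, q) == 1


# M11 = 2047 = 23 * 89
print(is_mersenne_factor(11, 23))  # True
print(is_mersenne_factor(11, 89))  # True
print(is_mersenne_factor(11, 67))  # False: 67 = 2*3*11 + 1, but 67 % 8 == 3
```

Trial factoring programs evaluate the same congruence for very many candidates q per second, which is why the filters on the form of q matter so much.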

Quote:
  • use a normal trial factoring task and try to optimize the ini-file settings: try VectorSize=2 and =4 (1, 3, 8 and 16 are possible as well) to see which is faster, then use the +/-, s/S, p/P keys to get the best possible GHz-days: what was the TF job, and which settings (VectorSize, SievePrimes, SieveSize, SieveProcessSize) gave the best performance for the specific device?
Not so much to do here, GCN card, VS=2 still works best, still playing with it.

Quote:
  • any problems/suggestions?
Cosmetic: I ran it with "-i" switch with no file parameter (just "mfakto -i", by reflex, I was looking for card "info" hehe) and it crashes ugly.

Quote:
Additional performance-testing:

As the new division algorithm is based on double precision, I'd need to get performance results from as many different devices as possible:
  • Modify the ini-file to use the best VectorSize (see above)
  • Switch to CPU sieving: SieveOnGPU=0
  • make sure CPU and GPU are idle
  • run "mfakto-pi.exe -st > st-pi.log"
  • keep it running for one or two minutes, then press q (or Ctrl-C)
  • have a look at st-pi.log: is the detected clock speed correct? (It rarely is on AMD - please let me know the correct one.)
  • send me the log
Thanks a lot for any help you can provide - even if the complete checklist is too long for you: any partial result is also appreciated.
Tried to do that. I have the file(s) (from -pi and from --perftest). Where can I put them? [edit: solved, didn't know the quota limit for zip is larger]
Attached Files
File Type: zip MFO_01.ZIP (305.2 KB, 71 views)

Last fiddled with by LaurV on 2014-08-14 at 10:11
Old 2014-08-14, 10:08   #1180
Antonio
 
"Antonio Key"
Sep 2011
UK

531₁₀ Posts

Tried this on my system (i5, 3570k) with 2 * NVidia graphics cards and the integrated HD4000 enabled, Windows 7.
With GPUType = AUTO or CPU :- Windows reports 'mfakto.exe has stopped working' during the kernel compile.
With GPUType=INTEL :- program compiles the kernel and runs on the CPU successfully.
Old 2014-08-14, 19:10   #1181
Bdot
 
Nov 2010
Germany

1125₈ Posts

Quote:
Originally Posted by kracker View Post
Thanks! Will try it out.

On another point... Intel iGPUs do not have double precision...
Ohh ...

Good that I implemented a check for that ... you should receive a greeting and the kernels in question be skipped.
Old 2014-08-14, 21:16   #1182
Bdot
 
Nov 2010
Germany

255₁₆ Posts

Thanks for your tests, quite a few things are news to me:
Quote:
Originally Posted by LaurV View Post
"MSVCR110.DLL missing from your system". Didn't need it for the old one.
I did not remember that it was different; I thought I had moved to VS12 before 0.14 ... but git tells otherwise ... So this needs to be added to the requirements list.
Quote:
Originally Posted by LaurV View Post
Yes. What's with the big "elf" file? Can it be deleted?
It's the kernels compiled for your device. You can delete it, and mfakto will not recreate it if you set UseBinFile to empty. If mfakto finds the file during startup, it will skip kernel recompilation, improving startup time a lot.
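
The caching behaviour described above can be sketched roughly as follows. This is an illustration in Python, not mfakto's actual code: `load_kernels` and `compile_kernels` are hypothetical names, and the only behaviour taken from the description is "reuse the file if it exists, never write one if the name is empty".

```python
import os


def load_kernels(bin_file, compile_kernels):
    """Return kernel binary bytes, using bin_file as a cache when it is set."""
    if bin_file and os.path.exists(bin_file):
        # Fast path: a previous run left a compiled binary behind, reuse it.
        with open(bin_file, "rb") as f:
            return f.read()
    # Slow path: compile from source (here just a callable returning bytes).
    binary = compile_kernels()
    if bin_file:
        # An empty file name disables the cache, so nothing is written then.
        with open(bin_file, "wb") as f:
            f.write(binary)
    return binary
```

With a non-empty name the compile step runs once and later starts read the file; with an empty name every start recompiles and no file appears on disk.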
Quote:
Originally Posted by LaurV View Post
Cosmetic: I ran it with "-i" switch with no file parameter (just "mfakto -i", by reflex, I was looking for card "info" hehe) and it crashes ugly.
Very good. It's actually reports like these that I'm looking for. Fixed.
Quote:
Originally Posted by LaurV View Post
Tried to do that. I have the file(s) (from -pi and from --perftest). Where can I put them? [edit: solved, didn't know the quota limit for zip is larger]
Your card does not even spin up to full clock speed for the -pi test - your CPU is just too slow. I need to see how I can improve GPU utilisation for this test.
Also the --perftest shows that my old PhenomII is between 2 and 4 times as fast as your CPU ... did you keep prime95 running?
The GPU part of --perftest thinks that the optimal GPUSievePrimes is a little above 110k. It will depend on the TF task though. As the card has plenty of memory with relatively large caches, probably GPUSieveSize and GPUSieveProcessingSize maxed out are best as well.
Old 2014-08-14, 21:31   #1183
Bdot
 
Nov 2010
Germany

3·199 Posts

Quote:
Originally Posted by Antonio View Post
Tried this on my system (i5, 3570k) with 2 * NVidia graphics cards and the integrated HD4000 enabled, Windows 7.
With GPUType = AUTO or CPU :- Windows reports 'mfakto.exe has stopped working' during the kernel compile.
With GPUType=INTEL :- program compiles the kernel and runs on the CPU successfully.
Can you tell a bit more about your system:
  • Which Graphics drivers (AMD, Intel and/or NVIDIA, and which version)
    I see a crash during compile as well when trying to run it on my Quadro FX 880M with NV drivers 334.something. It's an NV driver bug, it used to work with older drivers.
  • Interesting detail that GPUType=INTEL makes it work ... that one skips optimization and enables a few workarounds in the code.
  • Does mfakto -d 11 / -d 12 / -d 13 / -d 21 / -d 22 / -d 23 / ... try to use other devices? (Keep increasing the two digits separately until mfakto tells you something like "Error: Only 1 platforms found. Cannot use platform 3..." or "Error: Only 1 devices found. Cannot use device 3..:".) Does any of the settings select the HD4000? Is the HD4000 listed in the output of "clinfo"?
  • How did you check the HD4000 is enabled? Does it have a monitor connected?
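
To make the "increase the two digits separately" procedure concrete, here is a small Python sketch that enumerates the selectors to try. It assumes the first digit is the platform number and the second the device number (both starting at 1), which is inferred from the two error messages quoted above rather than confirmed. The function names are hypothetical.

```python
def parse_device_selector(arg):
    """Split a two-digit '-d' value such as '21' into (platform, device)."""
    if len(arg) != 2 or any(c not in "123456789" for c in arg):
        raise ValueError(f"expected two digits 1-9, got {arg!r}")
    # Assumption: first digit = platform, second digit = device, both 1-based.
    return int(arg[0]), int(arg[1])


def candidate_selectors(max_platform=3, max_device=3):
    """Yield selectors in the order 11, 12, 13, 21, 22, 23, ..."""
    for platform in range(1, max_platform + 1):
        for device in range(1, max_device + 1):
            yield f"{platform}{device}"


for selector in candidate_selectors(2, 2):
    platform, device = parse_device_selector(selector)
    print(f"mfakto -d {selector}  (platform {platform}, device {device})")
```

Running each printed command until the "Only N platforms found" or "Only N devices found" error appears gives the full list of platform/device pairs the OpenCL runtime exposes.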
Old 2014-08-15, 01:33   #1184
Bdot
 
Nov 2010
Germany

1001010101₂ Posts

Quote:
Originally Posted by Jayder View Post
I know it's been a while since release, but if you can be bothered to, would you mind making a 64kB version if not also (optionally) a -var version? The GPU sieve is nice, but I think I am willing to switch back as the CPU sieve results in almost twice the speed on my APU. The standard 36kB sieve size limit is also quite a bit slower than 64kB.

Feel free to say no or to put it at the end of your to-do list. I can stick with the GPU sieve for a while longer. I seem to be the only one wanting it, and I don't expect you to go out of your way or anything.
I've added -64k and -var versions to the current version at the ftp. I have tested this version extensively and LaurV also reported successful tests. I'd ask you to run the -st2 selftest with the settings you intended to use, then feel free to use it for your normal TF tasks.
Old 2014-08-15, 01:38   #1185
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

25BF₁₆ Posts

Quote:
Originally Posted by Bdot View Post
It's the kernels compiled for your device. You can delete it, and mfakto will not recreate it if you set UseBinFile to empty. If mfakto finds the file during startup, it will skip kernel recompilation, improving startup time a lot.
I understood as much as this, looking into the new ini file, after I posted my previous post.

Quote:
Your card does not even spin up to full clock speed for the -pi test - your CPU is just too slow. I need to see how I can improve GPU utilisation for this test.
Indeed, I was going to say, that wheelbarrow has an old Core 2 CPU, with a 7970 on it, it took me a while to find the suitable mobo (with new PCIE and old Socket 775, haha) and it is not used for anything else except mfakto. You may remember when I was asking here about win32 and after a struggle with it, I installed win64. The monitor, till today, still shows the "black screen of death", with the "you are victim of piracy" window in the middle, which is always covered by the misfit window, haha. I don't use the computer for other things.

Performance-wise: the new mfakto seems a bit faster, but the computer is also less responsive. I decreased GPUSieveSize to 64 and the ProcessSize to 16, which seems to work best. BTW, I remember there was a bug long ago that missed some factors when the ProcessSize was 24; is that fixed? (I have only used 16 and 32 since that time, and I see that the default is now set to 24.)

Last fiddled with by LaurV on 2014-08-15 at 01:47
Old 2014-08-15, 07:36   #1186
Jayder
 
Dec 2012

2×139 Posts

Quote:
Originally Posted by Bdot View Post
I've added -64k and -var versions to the current version at the ftp. I have tested this version extensively and LaurV also reported successful tests. I'd ask you to run the -st2 selftest with the settings you intended to use, then feel free to use it for your normal TF tasks.
I can't properly express my thanks. I will definitely test it thoroughly, and I will report back for good measure.

Last fiddled with by Jayder on 2014-08-15 at 07:36
Old 2014-08-15, 12:18   #1187
Jayder
 
Dec 2012

2·139 Posts

There are a few things I've noticed already. I did a little searching, but please forgive me if they are known about or are not issues. In all of my tests, I am using the standard 64-bit version of mfakto and not one of the special versions.

The first issue appears to be an old one (present in 0.14): SievePrimes doesn't seem to adjust after a certain point (or in some cases at all) in certain bit ranges. NumStreams is somewhat involved, but is probably not the culprit. I've pasted some of my outputs below. Descriptions come before the snippets.


In the following, SievePrimes climbs from 50k and gets stuck at 182656. The CPU idle is low, but whether it gets even lower or much higher, SievePrimes stays at 182656. Note the "n.a.%".
Code:
[date    time] exponent  [TF bits]: percent   class #, seq      GHz/d     time |    ETA |    #FCs |      rate | SieveP. | CPU idle
[Aug 15 02:10] M4412033    [63-64]:  21.3%   975/4620,204/960   31.37   1.215s | 15m19s |  44.04M |  36.25M/s |  144321 |  11708us =  20.24%
[Aug 15 02:10] M4412033    [63-64]:  21.4%   976/4620,205/960   30.76   1.239s | 15m35s |  44.04M |  35.54M/s |  162361 |  10475us =  17.76%
[Aug 15 02:10] M4412033    [63-64]:  21.5%   987/4620,206/960   32.68   1.166s | 14m39s |  41.94M |  35.97M/s |  182656 |   4000us =   n.a.%
[Aug 15 02:11] M4412033    [63-64]:  21.6%   991/4620,207/960   32.71   1.165s | 14m37s |  41.94M |  36.00M/s |  182656 |   3993us =   n.a.%
[Aug 15 02:11] M4412033    [63-64]:  21.7%  1000/4620,208/960   32.71   1.165s | 14m36s |  41.94M |  36.00M/s |  182656 |   3651us =   n.a.%

If I set SievePrimes to be higher than 182656, it will not lower itself, even if it is set much higher.
Code:
[date    time] exponent  [TF bits]: percent   class #, seq      GHz/d     time |    ETA |    #FCs |      rate | SieveP. | CPU idle
[Aug 15 02:12] M4412033    [63-64]:  23.3%  1068/4620,224/960   26.39   1.444s | 17m43s |  41.94M |  29.05M/s |  300000 |    104us =   n.a.%
[Aug 15 02:12] M4412033    [63-64]:  23.4%  1071/4620,225/960   27.59   1.381s | 16m55s |  41.94M |  30.37M/s |  300000 |    105us =   n.a.%
[Aug 15 02:12] M4412033    [63-64]:  23.5%  1075/4620,226/960   27.56   1.383s | 16m55s |  41.94M |  30.33M/s |  300000 |    102us =   n.a.%
[Aug 15 02:12] M4412033    [63-64]:  23.6%  1080/4620,227/960   27.46   1.388s | 16m57s |  41.94M |  30.22M/s |  300000 |    104us =   n.a.%
[Aug 15 02:12] M4412033    [63-64]:  23.8%  1083/4620,228/960   27.54   1.384s | 16m53s |  41.94M |  30.31M/s |  300000 |     92us =   n.a.%
As I mentioned, it's only certain bit ranges, but it also depends on the exponent. I tested both this 4M exponent (above) as well as an 85M exponent. For the 4M, the SievePrimes has trouble adjusting up to 64 bits (64-65 adjusting fine) and the 85M exponent has trouble adjusting up to 68 bits (68-69 adjusting fine).
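
For anyone who wants to check their own logs for the same symptom, here is a rough Python sketch that pulls the SieveP. column out of status lines like the ones above and reports when the last few values are identical. It assumes the "|"-separated layout shown in the snippets; the function names and the threshold of three repeats are arbitrary choices for illustration.

```python
SAMPLE = """\
[date    time] exponent  [TF bits]: percent   class #, seq      GHz/d     time |    ETA |    #FCs |      rate | SieveP. | CPU idle
[Aug 15 02:10] M4412033    [63-64]:  21.3%   975/4620,204/960   31.37   1.215s | 15m19s |  44.04M |  36.25M/s |  144321 |  11708us =  20.24%
[Aug 15 02:10] M4412033    [63-64]:  21.4%   976/4620,205/960   30.76   1.239s | 15m35s |  44.04M |  35.54M/s |  162361 |  10475us =  17.76%
[Aug 15 02:10] M4412033    [63-64]:  21.5%   987/4620,206/960   32.68   1.166s | 14m39s |  41.94M |  35.97M/s |  182656 |   4000us =   n.a.%
[Aug 15 02:11] M4412033    [63-64]:  21.6%   991/4620,207/960   32.71   1.165s | 14m37s |  41.94M |  36.00M/s |  182656 |   3993us =   n.a.%
[Aug 15 02:11] M4412033    [63-64]:  21.7%  1000/4620,208/960   32.71   1.165s | 14m36s |  41.94M |  36.00M/s |  182656 |   3651us =   n.a.%
"""


def sieve_primes_column(log_text):
    """Return the SieveP. value of every status line (the header is skipped)."""
    values = []
    for line in log_text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        # Status lines have six "|"-separated fields; SieveP. is the fifth.
        if len(fields) == 6 and fields[4].isdigit():
            values.append(int(fields[4]))
    return values


def stuck_value(values, min_repeats=3):
    """Return the value if the last min_repeats entries are identical, else None."""
    tail = values[-min_repeats:]
    if len(tail) == min_repeats and len(set(tail)) == 1:
        return tail[0]
    return None


values = sieve_primes_column(SAMPLE)
print(values)               # [144321, 162361, 182656, 182656, 182656]
print(stuck_value(values))  # 182656
```

A constant value is not a bug by itself (auto-adjust may simply have converged), so the check is only useful together with the CPU idle column.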

I noticed all of this first during the selftest (st). Attached are some files. Jayder-NS3 shows that with NumStreams 3 (or less, but not shown here) SPrimes climbs for a while but stops. Jayder-NS4 shows that with NumStreams 4 (or greater, but not shown) SPrimes doesn't change at all.

+/-, s/S, and p/P seem to work as intended, even when SPrimes is stuck as above, but they do not unstick it.


The second thing which I noticed is that time per class for my 4M exponent, 63-64 bits, has increased by at least 7%. The other two files in the archive contain brief logs showing this. There seemed to be no difference with the 85M exponent I tested. Settings all the same, computer idle.
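
A side note on "time per class" and the "975/4620, 204/960" column in the logs: as far as I understand it, factor candidates q = 2kp + 1 are grouped by k mod 4620 (4620 = 4·3·5·7·11), and a class is skipped entirely when every q in it is divisible by 3, 5, 7 or 11 or is not ±1 mod 8. The Python sketch below counts the surviving classes under that assumption. It illustrates the arithmetic only and is not taken from the mfakto source.

```python
def surviving_classes(p, num_classes=4620):
    """Return the residues k mod num_classes for which q = 2*k*p + 1 can divide 2**p - 1."""
    keep = []
    for k in range(num_classes):
        q = 2 * k * p + 1
        if q % 8 not in (1, 7):
            continue  # factors of Mersenne numbers are 1 or 7 mod 8
        if any(q % small == 0 for small in (3, 5, 7, 11)):
            continue  # every q in this class has a tiny prime factor
        keep.append(k)
    return keep


classes = surviving_classes(4412033)
print(len(classes))  # 960
print(975 in classes, 976 in classes, 977 in classes)  # True True False
```

This matches the log above, where class 976 is followed directly by class 987 and the sequence counter runs up to 960.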

Finally, I'm told that my "device does not support double precision operations." I don't know enough to know if this is right or not (it probably is), but I thought I'd check. I have an A4-3420 (with HD 6410D). I know the GPU does not have DP, but your description makes it sound like the DP is for the CPU. I don't know, me dumb.

I hope I have helped more than hindered. Thank you again (and kracker, and the many others who've helped).
Attached Files
File Type: zip Jayder-0.15pre2.zip (7.1 KB, 84 views)

Last fiddled with by Jayder on 2014-08-15 at 12:33
Old 2014-08-15, 17:29   #1188
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

100001111000₂ Posts

-st2 passed on Llano APU(6550D)

Also, -pi info for it. 7770 and HD4600 coming after I finish these assignments...

Also, I still cannot get my HD4600 detected in any way other than -d 11.
(System with two AMD(7770) cards and the "integrated" one.)
Attached Files
File Type: zip st-pi-APU.zip (23.1 KB, 65 views)

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.