Thanks! Will try it out.
On another point... Intel iGPUs [URL="https://software.intel.com/en-us/forums/topic/393241"]do not have[/URL] double precision... :sad: |
"MSVCR110.DLL missing from your system". Didn't need it for the old one.
[QUOTE=Bdot;380305][LIST][*]does mfakto detect the devices automatically or are switches (like -d 11) required[/LIST][/QUOTE]
Yes (after installing the redistributable). There is only one HD card here, and it was detected successfully.
[QUOTE][LIST][*]does it correctly identify the devices and their device type[/LIST][/QUOTE]
Yes. What's with the big "elf" file? Can it be deleted?
[QUOTE][LIST][*]is 'mfakto -st' reporting success (on fast systems, or when you have lots of time, 'mfakto -st[B]2[/B]') - if testing is too long, you can always interrupt by pressing 'q' or Ctrl-C.[/LIST][/QUOTE]
3092/3092 successful tests. Or I could say that something is odd... because all 3092 exponents picked had factors... Hm... :razz: -st2 works fine too, no failures. Good job!
[QUOTE][LIST][*]use a normal trial factoring task and try to optimize the ini-file settings: try VectorSize=2 and =4 (1, 3, 8 and 16 are possible as well) to see which is faster, then use the +/-, s/S, p/P keys to get the best possible GHz-days: what was the TF job, and which settings (VectorSize, SievePrimes, SieveSize, SieveProcessSize) gave the best performance for the specific device?[/LIST][/QUOTE]
Not much to do here: it's a GCN card, and VS=2 still works best. Still playing with it.
[QUOTE][LIST][*]any problems/suggestions?[/LIST][/QUOTE]
Cosmetic: I ran it with the "-i" switch and no file parameter (just "mfakto -i"; by reflex, I was looking for card "info", hehe) and it crashed badly.
[QUOTE]Additional performance-testing: As the new division algorithm is based on double precision, I'd need to get performance results from as many different devices as possible:
[LIST]
[*]Modify the ini-file to use the best VectorSize (see above)
[*]Switch to CPU sieving: SieveOnGPU=0
[*]make sure CPU and GPU are idle
[*]run "mfakto-pi.exe -st > st-pi.log"
[*]keep it running for one or two minutes, then press q (or Ctrl-C)
[*]have a look at st-pi.log: is the detected clock speed correct? (it rarely is on AMD - please let me know the correct one)
[*]send me the log
[/LIST]Thanks a lot for any help you can provide - even if the complete checklist is too long for you: any partial result is also appreciated.[/QUOTE]
Tried to do that. I have the file(s) (from -pi and from --perftest). Where can I put them? [edit: solved, I didn't know the quota limit for zip files is larger] |
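A crash from a bare "mfakto -i" often comes from code that reads an option's value without first checking that one was supplied (in C, argv[argc] is a null pointer, so using it as a file name dereferences NULL). The thread does not show how mfakto was actually fixed; below is a minimal Python sketch of the usual guard. `parse_args` and the `OPTIONS_WITH_VALUE` table are hypothetical; only the fact that "-i" expects a file comes from the report.

```python
from typing import Dict, List, Union

# Hypothetical table of options that must be followed by a value.
# Only "-i" (which takes a file name, per the report above) comes from the thread.
OPTIONS_WITH_VALUE = {"-i"}


def parse_args(argv: List[str]) -> Dict[str, Union[str, bool]]:
    """Parse argv into {option: value}; options without a value map to True."""
    opts: Dict[str, Union[str, bool]] = {}
    i = 1  # argv[0] is the program name
    while i < len(argv):
        arg = argv[i]
        if arg in OPTIONS_WITH_VALUE:
            # The guard: make sure a value exists before reading argv[i + 1],
            # instead of blindly consuming whatever (or nothing) comes next.
            if i + 1 >= len(argv) or argv[i + 1].startswith("-"):
                raise SystemExit(f"ERROR: option {arg} requires a file name")
            opts[arg] = argv[i + 1]
            i += 2
        else:
            opts[arg] = True
            i += 1
    return opts


if __name__ == "__main__":
    print(parse_args(["mfakto", "-i", "my.ini", "-st"]))
```

Rejecting a following token that starts with "-" is a design choice: it also catches "mfakto -i -st", at the cost of not accepting file names that begin with a dash.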
Tried this on my system (i5-3570K) with 2 NVIDIA graphics cards and the integrated HD4000 enabled, Windows 7.
With GPUType=AUTO or CPU, Windows reports 'mfakto.exe has stopped working' during the kernel compile. With GPUType=INTEL, the program compiles the kernel and runs on the CPU successfully. |
[QUOTE=kracker;380308]Thanks! Will try it out.
On another point... Intel iGPUs [URL="https://software.intel.com/en-us/forums/topic/393241"]do not have[/URL] double precision... :sad:[/QUOTE] Ohh... :blush: Good thing I implemented a check for that: you should get a message, and the kernels in question will be skipped. |
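For readers wondering how such a check can work: OpenCL devices with double-precision support advertise the `cl_khr_fp64` extension in their `CL_DEVICE_EXTENSIONS` string (AMD also has a vendor extension, `cl_amd_fp64`, covering a subset of that functionality). The sketch below is a hypothetical Python illustration, not mfakto's code: it takes the extension string as input (querying it, e.g. via `clGetDeviceInfo`, is left out), and the kernel names are made up.

```python
from typing import Dict, List


def supports_fp64(extensions: str) -> bool:
    """True if a space-separated OpenCL extension string lists cl_khr_fp64.

    This sketch deliberately ignores AMD's partial cl_amd_fp64 extension;
    whether to accept it is a design decision for the real program.
    """
    return "cl_khr_fp64" in extensions.split()


def runnable_kernels(kernels: Dict[str, bool], extensions: str) -> List[str]:
    """Return the kernel names to use; `kernels` maps name -> needs double precision."""
    have_fp64 = supports_fp64(extensions)
    selected = []
    for name, needs_fp64 in kernels.items():
        if needs_fp64 and not have_fp64:
            print(f"Skipping {name}: no double precision support on this device.")
            continue
        selected.append(name)
    return selected


if __name__ == "__main__":
    # Hypothetical kernel names; True marks kernels built on double precision.
    kernels = {"int_kernel_a": False, "dp_kernel_b": True}
    print(runnable_kernels(kernels, "cl_khr_byte_addressable_store cl_khr_fp16"))
```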
Thanks for your tests; some of this is news to me:
[QUOTE=LaurV;380351]"MSVCR110.DLL missing from your system". Didn't need it for the old one. [/QUOTE]
I didn't remember that it was different; I thought I had moved to VS12 before 0.14, but [URL="https://github.com/Bdot42/mfakto/commits/master?page=2"]git[/URL] says otherwise. So this needs to be added to the requirements list.
[QUOTE=LaurV;380351] Yes. What's with the big "elf" file? Can it be deleted? [/QUOTE]
It contains the kernels compiled for your device. You can delete it; mfakto will not recreate it if you set UseBinFile to empty. If mfakto finds the file during startup, it skips kernel recompilation, which improves startup time a lot.
[QUOTE=LaurV;380351] Cosmetic: I ran it with "-i" switch with no file parameter (just "mfakto -i", by reflex, I was looking for card "info" hehe) and it crashes ugly. [/QUOTE]
Very good, it's exactly reports like these that I'm looking for. Fixed.
[QUOTE=LaurV;380351] Tried to do that. I have the file(s) (from -pi and from --perftest). Where I can put them? [edit: solved, didn't know the quota limit for zip is larger][/QUOTE]
Your card does not even spin up to full clock speed for the -pi test: your CPU is just too slow :razz: I need to see how I can improve GPU utilisation for this test. Also, the --perftest shows that my old Phenom II is between 2 and 4 times as fast as your CPU... did you keep Prime95 running?
The GPU part of --perftest suggests that the optimal GPUSievePrimes is a little above 110k, though it will depend on the TF task. As the card has plenty of memory and relatively large caches, maxing out GPUSieveSize and GPUSieveProcessingSize is probably best as well. |
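The behaviour described for the "elf" file is a compile-once cache. The sketch below illustrates that idea in Python under stated assumptions: `load_kernel_binary` and the `compile_kernels` callback are hypothetical names, the binary is treated as opaque bytes, and an empty UseBinFile value disables caching as described above. It is not mfakto's implementation.

```python
from pathlib import Path
from typing import Callable


def load_kernel_binary(use_bin_file: str, compile_kernels: Callable[[], bytes]) -> bytes:
    """Return compiled kernels, reusing a cached binary file when possible.

    use_bin_file: path of the cache file; an empty string disables caching.
    compile_kernels: hypothetical callback that performs the slow compilation.
    """
    if not use_bin_file:
        # Caching disabled: always compile and never write a file.
        return compile_kernels()
    path = Path(use_bin_file)
    if path.is_file():
        # Cache hit: skip compilation entirely, which is what speeds up startup.
        return path.read_bytes()
    # Cache miss (first start, or the file was deleted): compile once and save.
    binary = compile_kernels()
    path.write_bytes(binary)
    return binary
```

A real cache would usually also have to detect when the saved binary no longer matches the device or driver (for example after a driver update); in this sketch, deleting the file is the only way to force a rebuild.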
[QUOTE=Antonio;380352]Tried this on my system (i5, 3570k) 2 * NVidia graphics cards and integrated HD4000 enabled, Windows 7.
With GPUType = AUTO or CPU :- Windows reports 'mfakto.exe has stopped working' during the kernel compile. With GPUType=INTEL :- program compiles the kernel and runs on the CPU successfully.[/QUOTE]
Can you tell me a bit more about your system?
[LIST]
[*]Which graphics drivers (AMD, Intel and/or NVIDIA, and which versions)? I see a crash during compilation as well when trying to run it on my Quadro FX 880M with NV drivers 334.something. It's an NV driver bug; it used to work with older drivers.
[*]Interesting detail that GPUType=INTEL makes it work... that one skips optimization and enables a few workarounds in the code.
[*]Do mfakto -d 11 / -d 12 / -d 13 / -d 21 / 22 / 23 / ... try to use other devices? (Keep increasing the two digits separately until mfakto says something like "Error: Only 1 platforms found. Cannot use platform 3..." or "Error: Only 1 devices found. Cannot use device 3...".) Does any of these settings select the HD4000? Is the HD4000 listed in the output of "clinfo"?
[*]How did you check that the HD4000 is enabled? Does it have a monitor connected?
[/LIST] |
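Judging from the error messages quoted above, the two digits after -d appear to select an OpenCL platform and a device on that platform. The Python sketch below decodes such a code under that assumption (first digit = platform, second digit = device, both counted from 1). `decode_device_code` and `valid_codes` are hypothetical helpers, the error texts only imitate the quoted ones, and any single-digit form of -d is not modelled.

```python
from typing import List, Tuple


def decode_device_code(code: int, devices_per_platform: List[int]) -> Tuple[int, int]:
    """Map a two-digit code such as 11 or 23 to 0-based (platform, device) indices.

    devices_per_platform[i] is the number of devices on platform i + 1.
    """
    if not 11 <= code <= 99:
        raise ValueError(f"expected a two-digit code, got {code}")
    platform, device = divmod(code, 10)  # 23 -> platform 2, device 3
    if platform > len(devices_per_platform):
        raise ValueError(f"Only {len(devices_per_platform)} platforms found. "
                         f"Cannot use platform {platform}")
    n_devices = devices_per_platform[platform - 1]
    if device == 0 or device > n_devices:
        raise ValueError(f"Only {n_devices} devices found. Cannot use device {device}")
    return platform - 1, device - 1


def valid_codes(devices_per_platform: List[int]) -> List[int]:
    """All codes reachable by increasing the two digits separately."""
    return [10 * (p + 1) + (d + 1)
            for p, n in enumerate(devices_per_platform)
            for d in range(n)]


if __name__ == "__main__":
    # Hypothetical machine: platform 1 with 2 devices, platform 2 with 1 device.
    layout = [2, 1]
    for code in valid_codes(layout):
        print(code, decode_device_code(code, layout))
```

With one digit per field, this scheme can only address nine platforms and nine devices per platform.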
[QUOTE=Jayder;380174]I know it's been a while since release, but if you can be bothered to, would you mind making a 64kB version if not also (optionally) a -var version? The GPU sieve is nice, but I think I am willing to switch back as the CPU sieve results in almost twice the speed on my APU. The standard 36kB sieve size limit is also quite a bit slower than 64kB.
Feel free to say no or to put it at the end of your to-do list. :smile: I can stick with the GPU sieve for a while longer. I seem to be the only one wanting it, and I don't expect you to go out of your way or anything.[/QUOTE] I've added -64k and -var versions of the current version at the [URL="http://mersenneforum.org/mfakto/mfakto-0.15pre2/mfakto-0.15pre2.zip"]ftp[/URL]. I have tested this version extensively, and LaurV also reported successful tests. Please run the -st2 selftest with the settings you intend to use; then feel free to use it for your normal TF tasks. |
[QUOTE=Bdot;380408]It's the kernels compiled for your device. You can delete it, and mfakto will not recreate it if you set UseBinFile to empty. If mfakto finds the file during startup, it will skip kernel recompilation, improving startup time a lot.
[/QUOTE] I figured as much from the new ini file, after I had posted my previous post. [QUOTE]Your card does not even spin up to full clock speed for the -pi test - your CPU is just too slow :razz: I need to see how I can improve GPU utilisation for this test.[/QUOTE]Indeed, I was going to say: that wheelbarrow has an old Core 2 CPU with a 7970 in it. It took me a while to find a suitable mobo (with new PCIe and old Socket 775, haha), and it is not used for anything except mfakto. You may remember when I was asking here about win32; after a struggle with it, I installed win64. To this day, the monitor still shows the "black screen of death" with the "you are victim of piracy" window in the middle, which is always covered by the misfit window, haha. I don't use the computer for other things.
Performance-wise: the new mfakto seems a bit faster, but the computer is also less responsive. I decreased GPUSieveSize to 64 and ProcessSize to 16; that seems best. BTW, I remember there was a bug long ago that missed some factors when ProcessSize was 24. Is that fixed? (I have only used 16 and 32 since then, and I see that the default is now 24.) |
[QUOTE=Bdot;380430]I've added -64k and -var versions to the current version at the [URL="http://mersenneforum.org/mfakto/mfakto-0.15pre2/mfakto-0.15pre2.zip"]ftp[/URL]. I have tested this version extensively and LaurV also reported successful tests. I'd ask you to run the -st2 selftest with the settings you intended to use, then feel free to use it for your normal TF tasks.[/QUOTE]
I can't properly express my thanks. :smile: I will definitely test it thoroughly, and I will report back for good measure. |
There are a few things I've noticed already. I did a little searching, but please forgive me if they are already known or are not issues. In all of my tests, I am using the standard 64-bit version of mfakto and not one of the special versions.
The first issue appears to be an old one (present in 0.14): SievePrimes doesn't seem to adjust after a certain point (or in some cases at all) in certain bit ranges. NumThreads is somewhat involved, but is probably not the culprit. I've pasted some of my output below, with a description before each snippet.
In the following, SievePrimes climbs from 50k and then stops at 182656. The CPU idle is low, but whether it gets even lower or much higher, SievePrimes stays at 182656. Note the "n.a.%":
[CODE]
[date time] exponent [TF bits]: percent class #, seq GHz/d time | ETA | #FCs | rate | SieveP. | CPU idle
[Aug 15 02:10] M4412033 [63-64]: 21.3% 975/4620,204/960 31.37 1.215s | 15m19s | 44.04M | 36.25M/s | 144321 | 11708us = 20.24%
[Aug 15 02:10] M4412033 [63-64]: 21.4% 976/4620,205/960 30.76 1.239s | 15m35s | 44.04M | 35.54M/s | 162361 | 10475us = 17.76%
[Aug 15 02:10] M4412033 [63-64]: 21.5% 987/4620,206/960 32.68 1.166s | 14m39s | 41.94M | 35.97M/s | 182656 | 4000us = n.a.%
[Aug 15 02:11] M4412033 [63-64]: 21.6% 991/4620,207/960 32.71 1.165s | 14m37s | 41.94M | 36.00M/s | 182656 | 3993us = n.a.%
[Aug 15 02:11] M4412033 [63-64]: 21.7% 1000/4620,208/960 32.71 1.165s | 14m36s | 41.94M | 36.00M/s | 182656 | 3651us = n.a.%
[/CODE]
If I set SievePrimes higher than 182656, it will not lower itself, even if it is set much higher:
[code]
[date time] exponent [TF bits]: percent class #, seq GHz/d time | ETA | #FCs | rate | SieveP. | CPU idle
[Aug 15 02:12] M4412033 [63-64]: 23.3% 1068/4620,224/960 26.39 1.444s | 17m43s | 41.94M | 29.05M/s | 300000 | 104us = n.a.%
[Aug 15 02:12] M4412033 [63-64]: 23.4% 1071/4620,225/960 27.59 1.381s | 16m55s | 41.94M | 30.37M/s | 300000 | 105us = n.a.%
[Aug 15 02:12] M4412033 [63-64]: 23.5% 1075/4620,226/960 27.56 1.383s | 16m55s | 41.94M | 30.33M/s | 300000 | 102us = n.a.%
[Aug 15 02:12] M4412033 [63-64]: 23.6% 1080/4620,227/960 27.46 1.388s | 16m57s | 41.94M | 30.22M/s | 300000 | 104us = n.a.%
[Aug 15 02:12] M4412033 [63-64]: 23.8% 1083/4620,228/960 27.54 1.384s | 16m53s | 41.94M | 30.31M/s | 300000 | 92us = n.a.%
[/code]
As I mentioned, it's only certain bit ranges, but it also depends on the exponent. I tested both this 4M exponent (above) and an 85M exponent. For the 4M exponent, SievePrimes has trouble adjusting up to 64 bits (64-65 adjusts fine), and for the 85M exponent it has trouble adjusting up to 68 bits (68-69 adjusts fine). I first noticed all of this during the selftest (-st).
Attached are some files. Jayder-NS3 shows that with NumStreams 3 (or less, not shown here) SievePrimes climbs for a while but then stops. Jayder-NS4 shows that with NumStreams 4 (or greater, not shown) SievePrimes doesn't change at all. The +/-, s/S, and p/P keys seem to work as intended, even when SievePrimes is stuck as above, but using them does not unstick it.
The second thing I noticed is that the time per class for my 4M exponent at 63-64 bits has increased by at least 7%. The other two files in the archive contain brief logs showing this. There seemed to be no difference with the 85M exponent I tested. All settings were the same, and the computer was idle.
Finally, mfakto tells me that my "device does not support double precision operations." I don't know enough to tell whether this is right (it probably is), but I thought I'd check. I have an A4-3420 (with HD 6410D). I know the GPU does not have DP, but your description makes it sound like the DP is used on the CPU. I don't know, me dumb.
:cmd: I hope I have helped more than hindered. Thank you again (and kracker, and the many others who've helped). |
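The status lines in the logs above are internally consistent in a way that is easy to check mechanically: the percentage matches seq/960, the ETA matches the remaining classes times the time per class, and the rate matches #FCs divided by the time per class. These relations are inferred from the posted numbers, not taken from mfakto's source. The Python sketch below parses lines in exactly the format shown, checks those relations, and flags a SievePrimes value that has stopped moving; the regex and all function names are assumptions and may not match other versions' output. (In the first log, the values before the stall, 144321, 162361 and 182656, each grow by a factor of 1.125.)

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed format: exactly the status lines posted above.
LINE_RE = re.compile(
    r"\]\s+M(?P<exp>\d+)\s+\[(?P<bits>\d+-\d+)\]:\s+(?P<pct>[\d.]+)%\s+"
    r"\d+/\d+,(?P<seq>\d+)/(?P<nseq>\d+)\s+[\d.]+\s+(?P<time>[\d.]+)s\s+\|\s+"
    r"(?P<eta>\S+)\s+\|\s+(?P<fcs>[\d.]+)M\s+\|\s+(?P<rate>[\d.]+)M/s\s+\|\s+"
    r"(?P<sievep>\d+)\s+\|\s+\d+us\s+=\s+(?P<idle>n\.a\.|[\d.]+)%"
)


@dataclass
class Status:
    exponent: int
    bits: str
    pct: float
    seq: int                   # classes finished so far
    nseq: int                  # classes to test in total (960 above)
    time_s: float              # time per class
    eta_s: int
    fcs_m: float               # factor candidates per class, in millions
    rate_m: float              # millions of candidates per second
    sieve_primes: int
    cpu_idle: Optional[float]  # None when the log shows "n.a.%"


def parse_eta(text: str) -> int:
    """Convert an ETA such as '15m19s' to seconds."""
    units = {"d": 86400, "h": 3600, "m": 60, "s": 1}
    return sum(int(n) * units[u] for n, u in re.findall(r"(\d+)([dhms])", text))


def parse_line(line: str) -> Status:
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError(f"not a status line: {line!r}")
    idle = None if m["idle"] == "n.a." else float(m["idle"])
    return Status(int(m["exp"]), m["bits"], float(m["pct"]), int(m["seq"]),
                  int(m["nseq"]), float(m["time"]), parse_eta(m["eta"]),
                  float(m["fcs"]), float(m["rate"]), int(m["sievep"]), idle)


def inconsistent_columns(s: Status) -> List[str]:
    """Columns that disagree with the relations inferred from the posted logs."""
    bad = []
    if abs(s.pct - 100.0 * s.seq / s.nseq) > 0.06:        # printed to 0.1%
        bad.append("percent")
    if abs(s.eta_s - (s.nseq - s.seq) * s.time_s) > 1.0:  # printed to 1 s
        bad.append("ETA")
    if abs(s.rate_m - s.fcs_m / s.time_s) > 0.05:         # inputs are rounded
        bad.append("rate")
    return bad


def stuck_sieve_primes(history: List[Status], window: int = 3) -> Optional[int]:
    """Return SievePrimes if it has not changed over the last `window` lines."""
    if len(history) < window:
        return None
    recent = {s.sieve_primes for s in history[-window:]}
    return recent.pop() if len(recent) == 1 else None
```

Typical use is `[parse_line(l) for l in lines if LINE_RE.search(l)]`. A flat SievePrimes value alone does not prove a bug, since it may also sit at a configured limit; in the first log, it coincides with the CPU idle column switching to "n.a.%".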
-st2 passed on a Llano APU (6550D).
Also, the -pi info for it. The 7770 and HD4600 results are coming after I finish these assignments... :razz: Also, I still cannot get my HD4600 detected in any way except with -d 11. (The system has two AMD (7770) cards and the "integrated" one.) |