#1178
"Mr. Meeseeks"
Jan 2012
California, USA
2^3×271 Posts

#1179
Romulan Interpreter
Jun 2011
Thailand
3×3,221 Posts
"MSVCR110.DLL missing from your system". Didn't need it for the old one.
-st2 works fine, no fail. Good job!
Last fiddled with by LaurV on 2014-08-14 at 10:11

#1180
"Antonio Key"
Sep 2011
UK
531₁₀ Posts
Tried this on my system (i5-3570K, 2 × NVIDIA graphics cards plus the integrated HD 4000 enabled, Windows 7).
With GPUType = AUTO or CPU: Windows reports 'mfakto.exe has stopped working' during the kernel compile.
With GPUType = INTEL: the program compiles the kernel and runs on the CPU successfully.

#1181
Nov 2010
Germany
1125₈ Posts
Good that I implemented a check for that ... you should receive a greeting, and the kernels in question should be skipped.

#1182
Nov 2010
Germany
255₁₆ Posts
Thanks for your tests; quite a bit of this is news to me.
It's the kernels compiled for your device. You can delete the file, and mfakto will not recreate it if you set UseBinFile to empty. If mfakto finds the file during startup, it skips kernel recompilation, which improves startup time a lot.
I need to see how I can improve GPU utilisation for this test. Also, the --perftest shows that my old Phenom II is between 2 and 4 times as fast as your CPU ... did you keep prime95 running? The GPU part of --perftest suggests that the optimal GPUSievePrimes is a little above 110k, though it will depend on the TF task. As the card has plenty of memory with relatively large caches, maxing out GPUSieveSize and GPUSieveProcessingSize is probably best as well.
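
For anyone following along, the UseBinFile behaviour described here boils down to a load-or-compile cache. Below is a minimal sketch in Python, purely illustrative and not mfakto's actual code (which is C against the OpenCL API). `load_kernels` and `compile_kernels` are hypothetical names, and the compile step is just a callable you pass in.

```python
from pathlib import Path
from typing import Callable, Optional, Tuple


def load_kernels(bin_file: Optional[str],
                 compile_kernels: Callable[[], bytes]) -> Tuple[bytes, bool]:
    """Return (kernel_binary, was_compiled).

    bin_file plays the role of the UseBinFile setting: an empty value
    disables the cache, so nothing is read from or written to disk.
    compile_kernels is a hypothetical stand-in for the slow compile step.
    """
    if bin_file:
        path = Path(bin_file)
        if path.is_file():
            # Cached binary found at startup: skip recompilation.
            return path.read_bytes(), False

    binary = compile_kernels()

    if bin_file:
        # Save for the next start; skipped when the setting is empty.
        Path(bin_file).write_bytes(binary)
    return binary, True
```

With a non-empty `bin_file`, the first call compiles and writes the file, and later calls just read it back. Deleting the file forces one recompile, and passing an empty string means the file is never recreated.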

#1183
Nov 2010
Germany
3·199 Posts

#1184
Nov 2010
Germany
1001010101₂ Posts

#1185
Romulan Interpreter
Jun 2011
Thailand
25BF₁₆ Posts
Performance-wise: the new mfakto seems a bit faster, but the computer is also less responsive. I decreased GPUSieveSize to 64 and ProcessSize to 16, which seems the best. BTW, I remember there was a bug long ago that missed some factors when ProcessSize was 24; is that fixed? (I have only used 16 and 32 since then, and I see that the default is now set to 24.)
Last fiddled with by LaurV on 2014-08-15 at 01:47

#1186
Dec 2012
2×139 Posts
I will definitely test it thoroughly, and I will report back for good measure.
Last fiddled with by Jayder on 2014-08-15 at 07:36

#1187
Dec 2012
2·139 Posts
There are a few things I've noticed already. I did a little searching, but please forgive me if they are already known or are not issues. In all of my tests, I am using the standard 64-bit version of mfakto and not one of the special versions.
The first issue appears to be an old one (present in 0.14): SievePrimes doesn't seem to adjust after a certain point (or in some cases at all) in certain bit ranges. NumThreads is somewhat involved, but is probably not the culprit. I've pasted some of my output below; each description comes before its snippet.
In the following, SievePrimes climbs from 50k and gets stuck at 182656. The CPU idle is low, but whether it gets even lower or much higher, SievePrimes stays at 182656. Note the "n.a.%".
Code:
[date time] exponent [TF bits]: percent class #, seq GHz/d time | ETA | #FCs | rate | SieveP. | CPU idle
[Aug 15 02:10] M4412033 [63-64]: 21.3% 975/4620,204/960 31.37 1.215s | 15m19s | 44.04M | 36.25M/s | 144321 | 11708us = 20.24%
[Aug 15 02:10] M4412033 [63-64]: 21.4% 976/4620,205/960 30.76 1.239s | 15m35s | 44.04M | 35.54M/s | 162361 | 10475us = 17.76%
[Aug 15 02:10] M4412033 [63-64]: 21.5% 987/4620,206/960 32.68 1.166s | 14m39s | 41.94M | 35.97M/s | 182656 | 4000us = n.a.%
[Aug 15 02:11] M4412033 [63-64]: 21.6% 991/4620,207/960 32.71 1.165s | 14m37s | 41.94M | 36.00M/s | 182656 | 3993us = n.a.%
[Aug 15 02:11] M4412033 [63-64]: 21.7% 1000/4620,208/960 32.71 1.165s | 14m36s | 41.94M | 36.00M/s | 182656 | 3651us = n.a.%
If I set SievePrimes higher than 182656, it will not lower itself, even if it is set much higher.
Code:
[date time] exponent [TF bits]: percent class #, seq GHz/d time | ETA | #FCs | rate | SieveP. | CPU idle
[Aug 15 02:12] M4412033 [63-64]: 23.3% 1068/4620,224/960 26.39 1.444s | 17m43s | 41.94M | 29.05M/s | 300000 | 104us = n.a.%
[Aug 15 02:12] M4412033 [63-64]: 23.4% 1071/4620,225/960 27.59 1.381s | 16m55s | 41.94M | 30.37M/s | 300000 | 105us = n.a.%
[Aug 15 02:12] M4412033 [63-64]: 23.5% 1075/4620,226/960 27.56 1.383s | 16m55s | 41.94M | 30.33M/s | 300000 | 102us = n.a.%
[Aug 15 02:12] M4412033 [63-64]: 23.6% 1080/4620,227/960 27.46 1.388s | 16m57s | 41.94M | 30.22M/s | 300000 | 104us = n.a.%
[Aug 15 02:12] M4412033 [63-64]: 23.8% 1083/4620,228/960 27.54 1.384s | 16m53s | 41.94M | 30.31M/s | 300000 | 92us = n.a.%
I noticed all of this first during the selftest (st). Attached are some files. Jayder-NS3 shows that with NumStreams 3 (or less, not shown here) SievePrimes climbs for a while but stops. Jayder-NS4 shows that with NumStreams 4 (or greater, not shown) SievePrimes doesn't change at all. +/-, s/S, and p/P seem to work as intended, even when SievePrimes is stuck as above, but they do not unstick it.
The second thing I noticed is that the time per class for my 4M exponent, 63-64 bits, has increased by at least 7%. The other two files in the archive contain brief logs showing this. There seemed to be no difference with the 85M exponent I tested. Settings were all the same, and the computer was idle.
Finally, I'm told that my "device does not support double precision operations." I don't know enough to know if this is right or not (it probably is), but I thought I'd check. I have an A4-3420 (with HD 6410D). I know the GPU does not have DP, but your description makes it sound like the DP is for the CPU. I don't know, me dumb.
I hope I have helped more than hindered. Thank you again (and kracker, and the many others who've helped).
Last fiddled with by Jayder on 2014-08-15 at 12:33
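
A quick way to check logs like the ones in this post for a SievePrimes plateau is to pull the SieveP. column out of each status line. Below is a rough sketch in Python, assuming the six-column layout of the header line quoted in the post. `parse_status_line` and `sieve_primes_stuck` are hypothetical helper names, not part of mfakto. A repeated value only hints at a stuck setting; it does not prove one.

```python
def parse_status_line(line):
    """Extract (sieve_primes, seconds_per_class) from one status line.

    Assumes the layout: ... time | ETA | #FCs | rate | SieveP. | CPU idle
    Returns None for lines that do not match (e.g. the header itself).
    """
    fields = [f.strip() for f in line.split("|")]
    if len(fields) != 6 or not fields[4].isdigit():
        return None
    tokens = fields[0].split()
    if not tokens or not tokens[-1].endswith("s"):
        return None
    try:
        seconds = float(tokens[-1][:-1])  # "1.239s" -> 1.239
    except ValueError:
        return None
    return int(fields[4]), seconds


def sieve_primes_stuck(lines, min_repeats=3):
    """True if the last min_repeats parsed lines share one SieveP. value."""
    parsed = [p for p in map(parse_status_line, lines) if p is not None]
    tail = [sieve_primes for sieve_primes, _ in parsed[-min_repeats:]]
    return len(tail) == min_repeats and len(set(tail)) == 1


# Four status lines copied from the first log snippet in the post.
sample = [
    "[Aug 15 02:10] M4412033 [63-64]: 21.4% 976/4620,205/960 30.76 1.239s | 15m35s | 44.04M | 35.54M/s | 162361 | 10475us = 17.76%",
    "[Aug 15 02:10] M4412033 [63-64]: 21.5% 987/4620,206/960 32.68 1.166s | 14m39s | 41.94M | 35.97M/s | 182656 | 4000us = n.a.%",
    "[Aug 15 02:11] M4412033 [63-64]: 21.6% 991/4620,207/960 32.71 1.165s | 14m37s | 41.94M | 36.00M/s | 182656 | 3993us = n.a.%",
    "[Aug 15 02:11] M4412033 [63-64]: 21.7% 1000/4620,208/960 32.71 1.165s | 14m36s | 41.94M | 36.00M/s | 182656 | 3651us = n.a.%",
]
print(sieve_primes_stuck(sample))  # True
```

The same parsed seconds-per-class values could be averaged across two logs to put a number on a slowdown such as the 7% mentioned in the post.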

#1188
"Mr. Meeseeks"
Jan 2012
California, USA
100001111000₂ Posts
-st2 passed on Llano APU (6550D).
Also, -pi info for it. 7770 and HD4600 coming after I finish these assignments...
Also, I still cannot get my HD4600 detected in any way other than -d 11. (System with two AMD (7770) cards and the "integrated" one.)