#749 |
|
"Mr. Meeseeks"
Jan 2012
California, USA
2³·271 Posts |
After upgrading to 12.4, the CPU usage is back...
But it may have started when I installed my 7770; I'll take it out later and see.
Last fiddled with by kracker on 2013-05-02 at 16:32 |
|
|
|
|
|
#750 |
|
Nov 2010
Germany
1001010101₂ Posts |
This means the HD4000 cannot handle a larger number of threads, but does not return an error when the limit is exceeded ... I certainly did not expect that. If you keep GridSize=0 and run a simple TF (e.g. mfakto.hd4000.exe -d 11 -tf 51340871 69 70), what is the reported speed?
|
|
|
|
|
#751 |
|
Nov 2010
Germany
1001010101₂ Posts |
|
|
|
|
|
|
#752 | |
|
May 2013
101₂ Posts |
Quote:
maximum threads per grid 134217728 in the device info a lie?
Code:
mfakto 0.12-Win-HD4000 (64bit build)

Runtime options
  Inifile                   mfakto.ini
  SievePrimesMin            5000
  SievePrimesMax            200000
  SievePrimes               25000
  SievePrimesAdjust         1
  NumStreams                3
  GridSize                  0
  WorkFile                  worktodo.txt
  ResultsFile               results.txt
  Checkpoints               enabled
  CheckpointDelay           300s
  Stages                    enabled
  StopAfterFactor           class
  PrintMode                 full
  V5UserID                  none
  ComputerID                none
  AllowSleep                yes
  TimeStampInResults        no
  VectorSize                4
  GPUType                   AUTO
  SieveOnGPU                no
  SmallExp                  no
  SieveCPUMask              0
Compiletime options
  SIEVE_SIZE_LIMIT          36kiB
  SIEVE_SIZE                289731bits
  SIEVE_SPLIT               250
  MORE_CLASSES              enabled
Select device - Get device info - Compiling kernels ..........
WARNING: Unknown GPU name, assuming VLIW5 type. Please post the device name "Intel(R) HD Graphics 4000 (Intel(R) Corporation)" to http://www.mersenneforum.org/showthread.php?t=15646 to have it added to mfakto. Set GPUType in mfakto.ini to select a GPU type yourself and avoid this warning.

OpenCL device info
  name                      Intel(R) HD Graphics 4000 (Intel(R) Corporation)
  device (driver) version   OpenCL 1.2 (9.18.10.3071)
  maximum threads per block 512
  maximum threads per grid  134217728
  number of multiprocessors 16 (1280 compute elements)
  clock rate                350MHz

Automatic parameters
  threads per grid          131072

optimizing kernels for VLIW5

running a simple selftest ...
########## testcase 1/19 (#2597) ##########
########## testcase 2/19 (#2598) ##########
########## testcase 3/19 (#2) ##########
########## testcase 4/19 (#25) ##########
########## testcase 5/19 (#39) ##########
########## testcase 6/19 (#57) ##########
########## testcase 7/19 (#70) ##########
########## testcase 8/19 (#72) ##########
########## testcase 9/19 (#73) ##########
########## testcase 10/19 (#82) ##########
########## testcase 11/19 (#88) ##########
########## testcase 12/19 (#106) ##########
########## testcase 13/19 (#355) ##########
########## testcase 14/19 (#358) ##########
########## testcase 15/19 (#666) ##########
########## testcase 16/19 (#1547) ##########
########## testcase 17/19 (#1552) ##########
########## testcase 18/19 (#1556) ##########
########## testcase 19/19 (#1557) ##########
Selftest statistics
  number of tests           50
  successful tests          50

selftest PASSED!

got assignment: exp=51340871 bit_min=69 bit_max=70
Starting trial factoring M51340871 from 2^69 to 2^70 (2.33GHz-days)
 k_min = 5748790375020 - k_max = 11497580755081
Using GPU kernel "mfakto_cl_barrett72"
No checkpoint file "M51340871.ckp" found.
  done |    ETA |  GHz |time/class|    #FCs | avg. rate | SieveP. |CPU idle
  0.1% | 10h09m | 5.50 |  38.125s | 267.52M |   7.02M/s |   25000 |  0.00%
  0.2% | 10h15m | 5.43 |  38.578s | 270.66M |   7.02M/s |   21875 |  0.00%
  0.3% | 10h22m | 5.37 |  39.037s | 273.94M |   7.02M/s |   19140 |  0.00%
  0.4% | 10h29m | 5.31 |  39.508s | 277.22M |   7.02M/s |   16747 |  0.00%
  0.5% | 10h36m | 5.24 |  39.992s | 280.63M |   7.02M/s |   14653 |  0.00%
  0.6% | 10h43m | 5.18 |  40.496s | 284.16M |   7.02M/s |   12821 |  0.00%
  0.7% | 10h51m | 5.11 |  40.995s | 287.70M |   7.02M/s |   11218 |  0.00%
  0.8% | 10h58m | 5.05 |  41.525s | 291.37M |   7.02M/s |    9815 |  0.00%
  0.9% | 11h07m | 4.98 |  42.083s | 295.17M |   7.01M/s |    8588 |  0.00%
  1.0% | 11h15m | 4.92 |  42.634s | 299.11M |   7.02M/s |    7514 |  0.00%
  1.1% | 11h23m | 4.85 |  43.196s | 303.04M |   7.02M/s |    6574 |  0.00%
  1.3% | 11h31m | 4.79 |  43.767s | 307.10M |   7.02M/s |    5752 |  0.00%
  1.4% | 11h40m | 4.72 |  44.367s | 311.30M |   7.02M/s |    5033 |  0.00%
  1.5% | 11h40m | 4.72 |  44.405s | 311.56M |   7.02M/s |    5000 |  0.00%
  1.6% | 11h39m | 4.72 |  44.405s | 311.56M |   7.02M/s |    5000 |  0.00%
  1.7% | 11h38m | 4.72 |  44.415s | 311.56M |   7.01M/s |    5000 |  0.00%
  1.8% | 11h38m | 4.72 |  44.411s | 311.56M |   7.02M/s |    5000 |  0.00%
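For anyone wanting to sanity-check the GHz, ETA and avg. rate columns in that log: they seem to follow directly from time/class. A rough sketch, assuming that with MORE_CLASSES enabled 960 classes are actually tested per assignment (that fits the roughly 0.1% progress per class in the log, but it is not stated in this thread). The function names are made up for illustration and are not part of mfakto.

```python
SECONDS_PER_DAY = 86400
# Assumption: 960 classes get tested per assignment with MORE_CLASSES enabled.
CLASSES_TO_TEST = 960


def ghz_days_per_day(assignment_ghz_days, seconds_per_class, classes=CLASSES_TO_TEST):
    """Throughput implied by the time spent on one class."""
    total_seconds = classes * seconds_per_class
    return assignment_ghz_days * SECONDS_PER_DAY / total_seconds


def eta(seconds_per_class, classes_done, classes=CLASSES_TO_TEST):
    """Remaining time, formatted like the ETA column (e.g. 10h09m)."""
    remaining = int((classes - classes_done) * seconds_per_class)
    hours, rest = divmod(remaining, 3600)
    return f"{hours}h{rest // 60:02d}m"


def candidate_rate(factor_candidates, seconds_per_class):
    """Factor candidates tested per second for one class."""
    return factor_candidates / seconds_per_class


if __name__ == "__main__":
    # First row of the log: 38.125s per class, 267.52M FCs, 2.33 GHz-days total.
    print(round(ghz_days_per_day(2.33, 38.125), 2))
    print(eta(38.125, classes_done=1))
    print(round(candidate_rate(267.52e6, 38.125) / 1e6, 2))
```

With the first row's values this gives 5.5 GHz-days/day, 10h09m and 7.02 M/s, matching the log. Other rows can be off by one in the last digit because the 2.33 GHz-days figure is itself rounded.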
|
|
|
|
|
|
#753 | |
|
Nov 2010
Germany
3·199 Posts |
Well, even if we can get it to 10GHz-days/day, the CPU is still faster than that ... but it's good to know in any case.
Quote:
Mfakto normally starts all kernels using a 2D "grid" of "maximum threads per block" x "threads per grid / maximum threads per block". On AMD cards, that is usually 256 x 8192 (GridSize=4, i.e. 2M threads). Following that theory for the HD4000, GridSize=1, i.e. 262144 = 512 x 512, should also work without errors (which it did not). The worst thing about this is that no error is returned; the excess threads seem to be silently ignored, which makes it harder to troubleshoot.
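The launch geometry described here can be worked through numerically. A small sketch under one assumption: each GridSize step doubles the thread count, starting from 131072 at GridSize=0. That fits the three values that appear in this thread (131072, 262144 and 2M), but it is inferred from them rather than taken from the mfakto source. The function names are illustrative only.

```python
def threads_per_grid(grid_size):
    """Assumed mapping: 131072 threads at GridSize=0, doubling per step."""
    return 131072 << grid_size


def grid_dims(max_threads_per_block, total_threads):
    """Split a thread count into the 2D grid described above:
    "maximum threads per block" x "threads per grid / maximum threads per block".
    """
    if total_threads % max_threads_per_block != 0:
        raise ValueError("thread count is not a multiple of the block size")
    return (max_threads_per_block, total_threads // max_threads_per_block)


if __name__ == "__main__":
    print(grid_dims(256, threads_per_grid(4)))  # AMD case from the post
    print(grid_dims(512, threads_per_grid(1)))  # HD4000 with GridSize=1
    print(grid_dims(512, threads_per_grid(0)))  # HD4000 with GridSize=0
```

This reproduces the 256 x 8192 and 512 x 512 shapes from the post. GridSize=0 on the HD4000 comes out as 512 x 256.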
|
|
|
|
|
|
#754 | |
|
"Mr. Meeseeks"
Jan 2012
California, USA
4170₈ Posts |
Quote:
EDIT: Even with the 7770 taken out, it still takes one core on the integrated 6550D... I'm probably doing something wrong again...
Last fiddled with by kracker on 2013-05-03 at 01:43
|
|
|
|
|
|
#755 |
|
Nov 2010
Germany
3·199 Posts |
Which mfakto version are you running? If it is anything before the last GPU-sieve-preview, isn't a core per mfakto instance normal?
|
|
|
|
|
|
#756 |
|
"Mr. Meeseeks"
Jan 2012
California, USA
2³×271 Posts |
|
|
|
|
|
|
#757 | |
|
Nov 2010
Germany
3·199 Posts |
Quote:
I have now finished the next beta, 0.13pre4. I did not remove the workaround for the compiler bug in the older Catalyst versions. Therefore, this one does not require 13.4 - I tested it on 13.1. Apart from taking less CPU, 13.1 was also ~2% faster than 13.4 ...
|
|
|
|
|
|
#758 |
|
Jul 2012
Sweden
42₁₀ Posts |
The 0.13pre4 is a lot faster on small numbers. I got up to 260 GHz-days/day on exponents just above two million, and even found two new factors, which I have reported.
But then I ran the -st2 selftest on the new beta and got 3 failed self tests.
Code:
########## testcase 19172/32927 ##########
Starting trial factoring M597345241 from 2^63 to 2^64 (0.00GHz-days)
Using GPU kernel "cl_barrett15_82_gs"
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
May 10 02:21 |  1676  0.1% |  0.012    n.a. |      n.a.    82485    0.00%
no factor for M597345241 from 2^63 to 2^64 [mfakto 0.13pre4-Win cl_barrett15_82_gs_4]
ERROR: selftest failed for M597345241 (cl_barrett15_82_gs)
  no factor found
tf(): total time spent: 0.012s
Starting trial factoring M597345241 from 2^63 to 2^64 (0.00GHz-days)
Using GPU kernel "cl_barrett15_83_gs"
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
May 10 02:21 |  1676  0.1% |  0.012    n.a. |      n.a.    82485    0.00%
no factor for M597345241 from 2^63 to 2^64 [mfakto 0.13pre4-Win cl_barrett15_83_gs_4]
ERROR: selftest failed for M597345241 (cl_barrett15_83_gs)
  no factor found
tf(): total time spent: 0.012s
Starting trial factoring M597345241 from 2^63 to 2^64 (0.00GHz-days)
Using GPU kernel "cl_barrett15_88_gs"
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
May 10 02:21 |  1676  0.1% |  0.012    n.a. |      n.a.    82485    0.00%
no factor for M597345241 from 2^63 to 2^64 [mfakto 0.13pre4-Win cl_barrett15_88_gs_4]
ERROR: selftest failed for M597345241 (cl_barrett15_88_gs)
  no factor found
tf(): total time spent: 0.012s
Last fiddled with by Axelsson on 2013-05-10 at 07:00
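With 32927 testcases in the -st2 run, picking the failures out of the output by eye is tedious. A minimal sketch of a helper that collects them, assuming the failure lines always look like the "ERROR: selftest failed for M... (kernel)" lines in the log above. The helper is hypothetical and not part of mfakto.

```python
import re

# Matches lines such as:
#   ERROR: selftest failed for M597345241 (cl_barrett15_82_gs)
FAILURE = re.compile(r"ERROR: selftest failed for M(\d+) \(([^)]+)\)")


def failed_selftests(log_text):
    """Return (exponent, kernel name) for every failed selftest in the log."""
    return [(int(exp), kernel) for exp, kernel in FAILURE.findall(log_text)]


if __name__ == "__main__":
    sample = (
        "ERROR: selftest failed for M597345241 (cl_barrett15_82_gs)\n"
        "  no factor found\n"
        "ERROR: selftest failed for M597345241 (cl_barrett15_83_gs)\n"
    )
    for exponent, kernel in failed_selftests(sample):
        print(f"M{exponent} failed with kernel {kernel}")
```

Run on the log above, this would list the same exponent three times, once each for the cl_barrett15_82_gs, cl_barrett15_83_gs and cl_barrett15_88_gs kernels.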
|
|
|
|
|
#759 |
|
Jul 2012
Sweden
2·3·7 Posts |
By the way, I'm still on Catalyst 12.10
Code:
Runtime options
  Inifile                   mfakto.ini
  Verbosity                 1
  SieveOnGPU                yes
  GPUSievePrimes            82486
  GPUSieveSize              64Mi bits
  GPUSieveProcessSize       16Ki bits
  WorkFile                  worktodo.txt
  ResultsFile               results.txt
  Checkpoints               enabled
  CheckpointDelay           300s
  Stages                    enabled
  StopAfterFactor           class
  PrintMode                 compact
  V5UserID                  none
  ComputerID                none
  TimeStampInResults        yes
  VectorSize                4
  GPUType                   AUTO
  SmallExp                  no
Compiletime options
  MORE_CLASSES              enabled
Select device - Get device info - Compiling kernels .................
OpenCL device info
  name                      Cayman (Advanced Micro Devices, Inc.)
  device (driver) version   OpenCL 1.2 AMD-APP (1016.4) (1016.4 (VM))
  maximum threads per block 256
  maximum threads per grid  16777216
  number of multiprocessors 24 (1536 compute elements)
  clock rate                880MHz
Automatic parameters
  threads per grid          2097152
optimizing kernels for VLIW4
|
|
|