mfakto: an OpenCL program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=15646)

kracker 2013-05-02 16:31

On upgrade to 12.4 the cpu usage is back... :no: But it might have been since I installed my 7770, I'll take it out later and see.

Bdot 2013-05-02 23:39

[QUOTE=JoDu;338981]With GridSize=0 mfakto.hd4000.exe -d 11 -st passes all 1559 tests.[/QUOTE]
:shock:
This means the HD4000 cannot handle a larger number of threads, but does not return an error when the limit is exceeded ... I certainly did not expect that.

If you keep the GridSize=0 and run a simple TF (e.g. mfakto.hd4000.exe -d 11 -tf 51340871 69 70) what is the reported speed?

Bdot 2013-05-02 23:41

[QUOTE=kracker;339032]On upgrade to 12.4 the cpu usage is back... :no: But it might have been since I installed my 7770, I'll take it out later and see.[/QUOTE]
You mean Catalyst 13.4? Working well for me, but I just have a single GPU per machine ...

JoDu 2013-05-03 00:07

[QUOTE=Bdot;339069]:shock:
This means the HD4000 cannot handle a larger number of threads, but does not return an error when the limit is exceeded ... I certainly did not expect that.

If you keep the GridSize=0 and run a simple TF (e.g. mfakto.hd4000.exe -d 11 -tf 51340871 69 70) what is the reported speed?[/QUOTE]

Not super great. Is the "maximum threads per grid 134217728" in the device info a lie?

[CODE]
mfakto 0.12-Win-HD4000 (64bit build)


Runtime options
Inifile mfakto.ini
SievePrimesMin 5000
SievePrimesMax 200000
SievePrimes 25000
SievePrimesAdjust 1
NumStreams 3
GridSize 0
WorkFile worktodo.txt
ResultsFile results.txt
Checkpoints enabled
CheckpointDelay 300s
Stages enabled
StopAfterFactor class
PrintMode full
V5UserID none
ComputerID none
AllowSleep yes
TimeStampInResults no
VectorSize 4
GPUType AUTO
SieveOnGPU no
SmallExp no
SieveCPUMask 0
Compiletime options
SIEVE_SIZE_LIMIT 36kiB
SIEVE_SIZE 289731bits
SIEVE_SPLIT 250
MORE_CLASSES enabled
Select device - Get device info - Compiling kernels ..........
WARNING: Unknown GPU name, assuming VLIW5 type. Please post the device name "Intel(R) HD Graphics 4000 (Intel(R) Corporation)" to http://www.mersenneforum.org/showthread.php?t=15646 to have it added to mfakto. Set GPUType in mfakto.ini to select a GPU type yourself and avoid this warning.

OpenCL device info
name Intel(R) HD Graphics 4000 (Intel(R) Corporation)
device (driver) version OpenCL 1.2 (9.18.10.3071)
maximum threads per block 512
maximum threads per grid 134217728
number of multiprocessors 16 (1280 compute elements)
clock rate 350MHz

Automatic parameters
threads per grid 131072
optimizing kernels for VLIW5

running a simple selftest ...
########## testcase 1/19 (#2597) ##########
########## testcase 2/19 (#2598) ##########
########## testcase 3/19 (#2) ##########
########## testcase 4/19 (#25) ##########
########## testcase 5/19 (#39) ##########
########## testcase 6/19 (#57) ##########
########## testcase 7/19 (#70) ##########
########## testcase 8/19 (#72) ##########
########## testcase 9/19 (#73) ##########
########## testcase 10/19 (#82) ##########
########## testcase 11/19 (#88) ##########
########## testcase 12/19 (#106) ##########
########## testcase 13/19 (#355) ##########
########## testcase 14/19 (#358) ##########
########## testcase 15/19 (#666) ##########
########## testcase 16/19 (#1547) ##########
########## testcase 17/19 (#1552) ##########
########## testcase 18/19 (#1556) ##########
########## testcase 19/19 (#1557) ##########
Selftest statistics
number of tests 50
successful tests 50

selftest PASSED!

got assignment: exp=51340871 bit_min=69 bit_max=70
Starting trial factoring M51340871 from 2^69 to 2^70 (2.33GHz-days)
k_min = 5748790375020 - k_max = 11497580755081
Using GPU kernel "mfakto_cl_barrett72"
No checkpoint file "M51340871.ckp" found.
done | ETA | GHz |time/class| #FCs | avg. rate | SieveP. |CPU idle
0.1% | 10h09m | 5.50 | 38.125s | 267.52M | 7.02M/s | 25000 | 0.00%
0.2% | 10h15m | 5.43 | 38.578s | 270.66M | 7.02M/s | 21875 | 0.00%
0.3% | 10h22m | 5.37 | 39.037s | 273.94M | 7.02M/s | 19140 | 0.00%
0.4% | 10h29m | 5.31 | 39.508s | 277.22M | 7.02M/s | 16747 | 0.00%
0.5% | 10h36m | 5.24 | 39.992s | 280.63M | 7.02M/s | 14653 | 0.00%
0.6% | 10h43m | 5.18 | 40.496s | 284.16M | 7.02M/s | 12821 | 0.00%
0.7% | 10h51m | 5.11 | 40.995s | 287.70M | 7.02M/s | 11218 | 0.00%
0.8% | 10h58m | 5.05 | 41.525s | 291.37M | 7.02M/s | 9815 | 0.00%
0.9% | 11h07m | 4.98 | 42.083s | 295.17M | 7.01M/s | 8588 | 0.00%
1.0% | 11h15m | 4.92 | 42.634s | 299.11M | 7.02M/s | 7514 | 0.00%
1.1% | 11h23m | 4.85 | 43.196s | 303.04M | 7.02M/s | 6574 | 0.00%
1.3% | 11h31m | 4.79 | 43.767s | 307.10M | 7.02M/s | 5752 | 0.00%
1.4% | 11h40m | 4.72 | 44.367s | 311.30M | 7.02M/s | 5033 | 0.00%
1.5% | 11h40m | 4.72 | 44.405s | 311.56M | 7.02M/s | 5000 | 0.00%
1.6% | 11h39m | 4.72 | 44.405s | 311.56M | 7.02M/s | 5000 | 0.00%
1.7% | 11h38m | 4.72 | 44.415s | 311.56M | 7.01M/s | 5000 | 0.00%
1.8% | 11h38m | 4.72 | 44.411s | 311.56M | 7.02M/s | 5000 | 0.00%

[/CODE]

Bdot 2013-05-03 01:11

[QUOTE=JoDu;339073]Not super great,
[/QUOTE]

Well, even if we can get it to 10GHz-days/day, the CPU is still faster than that ... but it's good to know in any case.
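The "GHz" column in the log above can be cross-checked from the time per class. This is only a sketch: the figure of 960 tested classes per assignment is an assumption (it is what a MORE_CLASSES build is understood to use, and it agrees with the ETA in the log), and the function names are made up for illustration.

```python
SECONDS_PER_DAY = 86400
# Assumption: with MORE_CLASSES enabled, 960 classes are actually tested per assignment.
CLASSES_TO_TEST = 960


def ghz_days_per_day(assignment_ghz_days, seconds_per_class, classes=CLASSES_TO_TEST):
    """Throughput implied by one class time, if every class took equally long."""
    total_days = seconds_per_class * classes / SECONDS_PER_DAY
    return assignment_ghz_days / total_days


def candidates_per_second(fcs_per_class, seconds_per_class):
    """Factor candidates tested per second for a single class."""
    return fcs_per_class / seconds_per_class


# Values copied from the first and last rows of the log (2.33 GHz-days assignment).
first = ghz_days_per_day(2.33, 38.125)
last = ghz_days_per_day(2.33, 44.411)
rate = candidates_per_second(267.52e6, 38.125)
print(f"{first:.2f} GHz-days/day, {last:.2f} GHz-days/day, {rate / 1e6:.2f}M candidates/s")
```

The candidate rate stays at about 7.02M/s in every row, so the falling GHz value comes from the larger number of candidates per class as SievePrimes drops, not from the GPU slowing down.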

[QUOTE=JoDu;339073]Is the "maximum threads per grid 134217728" in the device info a lie?[/QUOTE]

Maybe, maybe not. Most likely it supports 512x512x512 threads. AMD cards report something similar (256x256x256), but they don't really care about the dimensions, as long as the total fits. Possibly HD4000 is different.
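The arithmetic behind that guess is easy to check. In this sketch the per-dimension limits (512 and 256) are taken from the paragraph above and are assumptions about the hardware, not values queried from a driver; the totals are the "maximum threads per grid" lines from the two device info dumps in this thread.

```python
# (reported "maximum threads per grid", assumed per-dimension limit)
REPORTED = {
    "Intel HD Graphics 4000": (134217728, 512),
    "AMD Cayman": (16777216, 256),
}


def is_cube_of_limit(total, per_dim_limit):
    """True if the reported total equals limit x limit x limit."""
    return per_dim_limit ** 3 == total


for name, (total, per_dim) in REPORTED.items():
    print(name, per_dim, is_cube_of_limit(total, per_dim))
```

Both totals are exact cubes of the respective "maximum threads per block" value, which fits the idea that the reported number is just the product of three per-dimension limits.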

Mfakto normally starts all kernels using a 2D "grid" of "maximum threads per block" x "threads per grid / maximum threads per block". On AMD cards, that usually is 256 x 8192 (GridSize=4, i.e. 2M threads). Following that theory for the HD4000, GridSize=1, i.e. 262144 = 512 x 512, should also work without errors (which it did not). The worst thing about this is that no error is returned; the excess threads seem to be silently ignored. That makes it harder to troubleshoot.
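The launch shapes described above can be written out as follows. This is a sketch of the arithmetic only, not mfakto's source: the function names are hypothetical, and the GridSize table holds just the three values that appear in this thread (0 from the HD4000 log, 1 and 4 from the paragraph above).

```python
# GridSize -> threads per grid; only the values mentioned in this thread.
KNOWN_THREADS_PER_GRID = {0: 131072, 1: 262144, 4: 2097152}


def launch_shape(threads_per_grid, max_threads_per_block):
    """2D shape: max threads per block x (threads per grid / max threads per block)."""
    if threads_per_grid % max_threads_per_block:
        raise ValueError("threads per grid must be a multiple of the block size")
    return (max_threads_per_block, threads_per_grid // max_threads_per_block)


def within_per_dimension_limit(shape, per_dim_limit):
    """True if no single dimension exceeds the (assumed) per-dimension limit."""
    return all(side <= per_dim_limit for side in shape)


amd = launch_shape(KNOWN_THREADS_PER_GRID[4], 256)      # AMD, GridSize=4
hd_gs1 = launch_shape(KNOWN_THREADS_PER_GRID[1], 512)   # HD4000, GridSize=1
hd_gs0 = launch_shape(KNOWN_THREADS_PER_GRID[0], 512)   # HD4000, GridSize=0
print(amd, within_per_dimension_limit(amd, 256))
print(hd_gs1, within_per_dimension_limit(hd_gs1, 512))
print(hd_gs0, within_per_dimension_limit(hd_gs0, 512))
```

The AMD shape exceeds 256 in its second dimension and still runs, while the HD4000 shape for GridSize=1 stays within 512 in both dimensions and still fails the selftest, so a per-dimension limit alone does not explain the behaviour.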

kracker 2013-05-03 01:14

[QUOTE=Bdot;339070]You mean Catalyst 13.4? Working well for me, but I just have a single GPU per machine ...[/QUOTE]

Yes, Catalyst 13.4. 13.2 worked fine, but it now uses one core... let me take out either GPU later.

EDIT: with the 7770 taken out, it still takes one core on the integrated 6550D... I'm probably doing something wrong again...

Bdot 2013-05-03 12:08

[QUOTE=kracker;339079]Yes, Catalyst 13.4. 13.2 worked fine, but it now uses one core... let me take out either GPU later.

EDIT: with the 7770 taken out, it still takes one core on the integrated 6550D... I'm probably doing something wrong again...[/QUOTE]

Which mfakto version are you running? If it is anything before the last GPU-sieve-preview, isn't a core per mfakto instance normal?

kracker 2013-05-03 17:09

[QUOTE=Bdot;339131]Which mfakto version are you running? If it is anything before the last GPU-sieve-preview, isn't a core per mfakto instance normal?[/QUOTE]

This is the GPU sieve version. One instance uses 99% GPU and one CPU core... I might try 12.2 later.

Bdot 2013-05-08 23:09

[QUOTE=kracker;339160]This is the GPU sieve version. One instance uses 99% GPU and one CPU core... I might try 12.2 later.[/QUOTE]

I can now reproduce this and have posted it on the AMD forum for help. In the meantime, I can only recommend going back to something before 13.4.

I have now finished the next beta, 0.13pre4. I did not remove the workaround for the compiler bug of the older Catalyst versions. Therefore, this one does not require 13.4 - I tested it on 13.1. Apart from taking less CPU, 13.1 was also ~2% faster than 13.4 ...

Axelsson 2013-05-10 06:58

The 0.13pre4 is a lot faster on small numbers. I got up to 260 GHz-days/day on exponents just above two million, and even found two new factors that I have reported. :smile:

But then I ran the -st2 selftest on the new beta and got 3 failed self tests. :sad:

[CODE]########## testcase 19172/32927 ##########
Starting trial factoring M597345241 from 2^63 to 2^64 (0.00GHz-days)
Using GPU kernel "cl_barrett15_82_gs"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
May 10 02:21 | 1676 0.1% | 0.012 n.a. | n.a. 82485 0.00%
no factor for M597345241 from 2^63 to 2^64 [mfakto 0.13pre4-Win cl_barrett15_82_gs_4]
ERROR: selftest failed for M597345241 (cl_barrett15_82_gs)
no factor found
tf(): total time spent: 0.012s

Starting trial factoring M597345241 from 2^63 to 2^64 (0.00GHz-days)
Using GPU kernel "cl_barrett15_83_gs"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
May 10 02:21 | 1676 0.1% | 0.012 n.a. | n.a. 82485 0.00%
no factor for M597345241 from 2^63 to 2^64 [mfakto 0.13pre4-Win cl_barrett15_83_gs_4]
ERROR: selftest failed for M597345241 (cl_barrett15_83_gs)
no factor found
tf(): total time spent: 0.012s

Starting trial factoring M597345241 from 2^63 to 2^64 (0.00GHz-days)
Using GPU kernel "cl_barrett15_88_gs"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
May 10 02:21 | 1676 0.1% | 0.012 n.a. | n.a. 82485 0.00%
no factor for M597345241 from 2^63 to 2^64 [mfakto 0.13pre4-Win cl_barrett15_88_gs_4]
ERROR: selftest failed for M597345241 (cl_barrett15_88_gs)
no factor found
tf(): total time spent: 0.012s
[/CODE]
GPU: HD6970, Windows 7 Professional 64-bit
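For readers wondering what such a selftest checks: each testcase has a known factor in the given bit range, and the kernel fails if it reports "no factor". The plain-Python sketch below shows the underlying test on a tiny exponent. It is not how the mfakto kernels work internally (their names suggest Barrett reduction, and the candidates are sieved first), and the function name is made up for illustration.

```python
def trial_factor(p, bit_min, bit_max):
    """Return divisors q = 2*k*p + 1 of 2**p - 1 with 2**bit_min <= q < 2**bit_max."""
    lo, hi = 1 << bit_min, 1 << bit_max
    # Smallest k with 2*k*p + 1 >= lo (ceiling division), but at least 1.
    k = max(1, (lo - 1 + 2 * p - 1) // (2 * p))
    found = []
    while True:
        q = 2 * k * p + 1
        if q >= hi:
            break
        # Prime factors of a Mersenne number are +-1 mod 8; q divides 2**p - 1
        # exactly when 2**p mod q == 1.
        if q % 8 in (1, 7) and pow(2, p, q) == 1:
            found.append(q)
        k += 1
    return found


# M11 = 2047 = 23 * 89, so a "selftest" for the range 2^4..2^5 must find 23.
print(trial_factor(11, 4, 5))
print(trial_factor(11, 6, 7))
```

A kernel that silently drops part of its candidate range would return an empty list for one of these ranges, which is exactly the kind of failure the selftest is there to catch.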

Axelsson 2013-05-10 07:05

By the way, I'm still on Catalyst 12.10

[CODE]Runtime options
Inifile mfakto.ini
Verbosity 1
SieveOnGPU yes
GPUSievePrimes 82486
GPUSieveSize 64Mi bits
GPUSieveProcessSize 16Ki bits
WorkFile worktodo.txt
ResultsFile results.txt
Checkpoints enabled
CheckpointDelay 300s
Stages enabled
StopAfterFactor class
PrintMode compact
V5UserID none
ComputerID none
TimeStampInResults yes
VectorSize 4
GPUType AUTO
SmallExp no
Compiletime options
MORE_CLASSES enabled
Select device - Get device info - Compiling kernels .................

OpenCL device info
name Cayman (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 1.2 AMD-APP (1016.4) (1016.4 (VM))
maximum threads per block 256
maximum threads per grid 16777216
number of multiprocessors 24 (1536 compute elements)
clock rate 880MHz

Automatic parameters
threads per grid 2097152
optimizing kernels for VLIW4[/CODE]

