mersenneforum.org  

Old 2013-05-02, 16:31   #749
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

2^3×271 Posts

After upgrading to 12.4 the CPU usage is back... But it might date from when I installed my 7770; I'll take it out later and see.

Last fiddled with by kracker on 2013-05-02 at 16:32
Old 2013-05-02, 23:39   #750
Bdot
 
Nov 2010
Germany

3·199 Posts

Quote:
Originally Posted by JoDu View Post
With GridSize=0 mfakto.hd4000.exe -d 11 -st passes all 1559 tests.

This means the HD4000 cannot handle a larger number of threads, but does not return an error when the limit is exceeded ... I certainly did not expect that.

If you keep GridSize=0 and run a simple TF (e.g. mfakto.hd4000.exe -d 11 -tf 51340871 69 70), what is the reported speed?
Old 2013-05-02, 23:41   #751
Bdot
 
Nov 2010
Germany

255_16 Posts

Quote:
Originally Posted by kracker View Post
After upgrading to 12.4 the CPU usage is back... But it might date from when I installed my 7770; I'll take it out later and see.
You mean Catalyst 13.4? Working well for me, but I just have a single GPU per machine ...
Old 2013-05-03, 00:07   #752
JoDu
 
May 2013

101_2 Posts

Quote:
Originally Posted by Bdot View Post

This means the HD4000 cannot handle a larger number of threads, but does not return an error when the limit is exceeded ... I certainly did not expect that.

If you keep GridSize=0 and run a simple TF (e.g. mfakto.hd4000.exe -d 11 -tf 51340871 69 70), what is the reported speed?
Not super great. Is the "maximum threads per grid 134217728" in the device info a lie?

Code:
mfakto 0.12-Win-HD4000 (64bit build)


Runtime options
  Inifile                   mfakto.ini
  SievePrimesMin            5000
  SievePrimesMax            200000
  SievePrimes               25000
  SievePrimesAdjust         1
  NumStreams                3
  GridSize                  0
  WorkFile                  worktodo.txt
  ResultsFile               results.txt
  Checkpoints               enabled
  CheckpointDelay           300s
  Stages                    enabled
  StopAfterFactor           class
  PrintMode                 full
  V5UserID                  none
  ComputerID                none
  AllowSleep                yes
  TimeStampInResults        no
  VectorSize                4
  GPUType                   AUTO
  SieveOnGPU                no
  SmallExp                  no
  SieveCPUMask              0
Compiletime options
  SIEVE_SIZE_LIMIT          36kiB
  SIEVE_SIZE                289731bits
  SIEVE_SPLIT               250
  MORE_CLASSES              enabled
Select device - Get device info - Compiling kernels ..........
WARNING: Unknown GPU name, assuming VLIW5 type. Please post the device name "Intel(R) HD Graphics 4000 (Intel(R) Corporation)" to http://www.mersenneforum.org/showthread.php?t=15646 to have it added to mfakto. Set GPUType in mfakto.ini to select a GPU type yourself and avoid this warning.

OpenCL device info
  name                      Intel(R) HD Graphics 4000 (Intel(R) Corporation)
  device (driver) version   OpenCL 1.2  (9.18.10.3071)
  maximum threads per block 512
  maximum threads per grid  134217728
  number of multiprocessors 16 (1280 compute elements)
  clock rate                350MHz

Automatic parameters
  threads per grid          131072
  optimizing kernels for    VLIW5

running a simple selftest ...
########## testcase 1/19 (#2597) ##########
########## testcase 2/19 (#2598) ##########
########## testcase 3/19 (#2) ##########
########## testcase 4/19 (#25) ##########
########## testcase 5/19 (#39) ##########
########## testcase 6/19 (#57) ##########
########## testcase 7/19 (#70) ##########
########## testcase 8/19 (#72) ##########
########## testcase 9/19 (#73) ##########
########## testcase 10/19 (#82) ##########
########## testcase 11/19 (#88) ##########
########## testcase 12/19 (#106) ##########
########## testcase 13/19 (#355) ##########
########## testcase 14/19 (#358) ##########
########## testcase 15/19 (#666) ##########
########## testcase 16/19 (#1547) ##########
########## testcase 17/19 (#1552) ##########
########## testcase 18/19 (#1556) ##########
########## testcase 19/19 (#1557) ##########
Selftest statistics                          
  number of tests           50
  successful tests          50

selftest PASSED!

got assignment: exp=51340871 bit_min=69 bit_max=70
Starting trial factoring M51340871 from 2^69 to 2^70 (2.33GHz-days)
  k_min = 5748790375020 - k_max = 11497580755081
Using GPU kernel "mfakto_cl_barrett72"
No checkpoint file "M51340871.ckp" found.
  done |    ETA |     GHz |time/class|    #FCs | avg. rate | SieveP. |CPU idle
  0.1% | 10h09m |    5.50 |  38.125s | 267.52M |   7.02M/s |   25000 |   0.00%
  0.2% | 10h15m |    5.43 |  38.578s | 270.66M |   7.02M/s |   21875 |   0.00%
  0.3% | 10h22m |    5.37 |  39.037s | 273.94M |   7.02M/s |   19140 |   0.00%
  0.4% | 10h29m |    5.31 |  39.508s | 277.22M |   7.02M/s |   16747 |   0.00%
  0.5% | 10h36m |    5.24 |  39.992s | 280.63M |   7.02M/s |   14653 |   0.00%
  0.6% | 10h43m |    5.18 |  40.496s | 284.16M |   7.02M/s |   12821 |   0.00%
  0.7% | 10h51m |    5.11 |  40.995s | 287.70M |   7.02M/s |   11218 |   0.00%
  0.8% | 10h58m |    5.05 |  41.525s | 291.37M |   7.02M/s |    9815 |   0.00%
  0.9% | 11h07m |    4.98 |  42.083s | 295.17M |   7.01M/s |    8588 |   0.00%
  1.0% | 11h15m |    4.92 |  42.634s | 299.11M |   7.02M/s |    7514 |   0.00%
  1.1% | 11h23m |    4.85 |  43.196s | 303.04M |   7.02M/s |    6574 |   0.00%
  1.3% | 11h31m |    4.79 |  43.767s | 307.10M |   7.02M/s |    5752 |   0.00%
  1.4% | 11h40m |    4.72 |  44.367s | 311.30M |   7.02M/s |    5033 |   0.00%
  1.5% | 11h40m |    4.72 |  44.405s | 311.56M |   7.02M/s |    5000 |   0.00%
  1.6% | 11h39m |    4.72 |  44.405s | 311.56M |   7.02M/s |    5000 |   0.00%
  1.7% | 11h38m |    4.72 |  44.415s | 311.56M |   7.01M/s |    5000 |   0.00%
  1.8% | 11h38m |    4.72 |  44.411s | 311.56M |   7.02M/s |    5000 |   0.00%
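
A side note on the SieveP. column in the log above: the printed values appear to shrink by a factor of 7/8 per class until they reach SievePrimesMin (5000). The sketch below reproduces that sequence. The 7/8 step is inferred from the printed numbers only, not taken from the mfakto source, and the function names are hypothetical.

```python
def adjust_sieve_primes(current, minimum=5000):
    """One downward step: scale by 7/8 (integer division), clamped at minimum.

    Assumption: the 7/8 factor is inferred from the log, not from mfakto's code.
    """
    return max(current * 7 // 8, minimum)


def sieve_primes_sequence(start=25000, minimum=5000):
    """Repeat the downward step until the minimum is reached."""
    values = [start]
    while values[-1] > minimum:
        values.append(adjust_sieve_primes(values[-1], minimum))
    return values


if __name__ == "__main__":
    # Compare against the SieveP. column of the log.
    print(sieve_primes_sequence())
```

If the inference holds, the sequence matches the log line by line, which suggests the auto-adjust kept lowering SievePrimes because the CPU sieve was the bottleneck (CPU idle stays at 0.00%).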
Old 2013-05-03, 01:11   #753
Bdot
 
Nov 2010
Germany

597_10 Posts

Quote:
Originally Posted by JoDu View Post
Not super great,
Well, even if we can get it to 10GHz-days/day, the CPU is still faster than that ... but it's good to know in any case.
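
For reference, the "GHz" column in the quoted log can be roughly cross-checked from the assignment size (2.33 GHz-days), the fraction already done and the ETA. This is a minimal sketch that assumes the ETA covers only the remaining work; the function names are made up for illustration and are not part of mfakto.

```python
def eta_to_hours(eta):
    """Convert an ETA string such as '10h09m' to hours."""
    hours, minutes = eta.rstrip("m").split("h")
    return int(hours) + int(minutes) / 60.0


def ghz_days_per_day(total_ghz_days, eta, done_fraction):
    """Estimate throughput in GHz-days/day.

    Assumption: `eta` is the time needed for the remaining
    (1 - done_fraction) share of the assignment.
    """
    remaining_work = total_ghz_days * (1.0 - done_fraction)
    return remaining_work * 24.0 / eta_to_hours(eta)


if __name__ == "__main__":
    # First and last log lines of the HD4000 run.
    print(round(ghz_days_per_day(2.33, "10h09m", 0.001), 2))
    print(round(ghz_days_per_day(2.33, "11h38m", 0.018), 2))
```

Both estimates land close to the 5.50 and 4.72 printed in the log, so the reported speed is consistent with the ETA.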

Quote:
Originally Posted by JoDu View Post
is the maximum threads per grid 134217728

in the device info a lie?
Maybe, maybe not. Most likely it supports 512x512x512 threads. AMD cards report something similar (256x256x256), but they don't really care about the dimensions, as long as the total fits. Possibly HD4000 is different.

Mfakto normally starts all kernels using a 2D "grid" of "maximum threads per block" x "threads per grid / maximum threads per block". On AMD cards, that is usually 256 x 8192 (GridSize=4, i.e. 2M threads). Following that theory for the HD4000, GridSize=1, i.e. 262144 = 512 x 512 threads, should also work without errors (which it did not). The worst thing about this is that no error is returned; the excess threads seem to be silently ignored, which makes it harder to troubleshoot.
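
The arithmetic behind these numbers can be written out as a small sketch. It only mirrors the description above (block size times number of blocks) and the reported per-dimension limits; `launch_grid` and `cube_capacity` are hypothetical helper names, not mfakto functions.

```python
def launch_grid(threads_per_grid, max_threads_per_block):
    """Split a total thread count into the 2D shape described above:
    (maximum threads per block) x (threads per grid / maximum threads per block).
    """
    if threads_per_grid % max_threads_per_block != 0:
        raise ValueError("threads per grid must be a multiple of the block size")
    return (max_threads_per_block, threads_per_grid // max_threads_per_block)


def cube_capacity(max_dim):
    """Total threads if a device allows max_dim in each of three dimensions."""
    return max_dim ** 3


if __name__ == "__main__":
    print(launch_grid(2097152, 256))  # AMD, GridSize=4
    print(launch_grid(262144, 512))   # HD4000, GridSize=1 (fails silently)
    print(launch_grid(131072, 512))   # HD4000, GridSize=0 (passes all tests)
    print(cube_capacity(512), cube_capacity(256))
```

The two cube values equal the "maximum threads per grid" figures reported for the HD4000 (134217728) and for the Cayman card (16777216), which fits the guess that the device info is the product of three per-dimension limits.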
Old 2013-05-03, 01:14   #754
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

2^3×271 Posts

Quote:
Originally Posted by Bdot View Post
You mean Catalyst 13.4? Working well for me, but I just have a single GPU per machine ...
Yes, Catalyst 13.4. 13.2 worked fine, but it now uses one core... let me take out either GPU later.

EDIT: taking out the 7770 still takes one core on integrated 6550D... I'm probably doing something wrong again...

Last fiddled with by kracker on 2013-05-03 at 01:43
Old 2013-05-03, 12:08   #755
Bdot
 
Nov 2010
Germany

3·199 Posts

Quote:
Originally Posted by kracker View Post
Yes, Catalyst 13.4. 13.2 worked fine, but it now uses one core... let me take out either GPU later.

EDIT: taking out the 7770 still takes one core on integrated 6550D... I'm probably doing something wrong again...
Which mfakto version are you running? If it is anything before the last GPU-sieve-preview, isn't a core per mfakto instance normal?
Old 2013-05-03, 17:09   #756
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

2168_10 Posts

Quote:
Originally Posted by Bdot View Post
Which mfakto version are you running? If it is anything before the last GPU-sieve-preview, isn't a core per mfakto instance normal?
This is the GPU sieve version. One instance uses 99% GPU and one CPU core... I might try 12.2 later.
Old 2013-05-08, 23:09   #757
Bdot
 
Nov 2010
Germany

3×199 Posts

Quote:
Originally Posted by kracker View Post
This is the GPU sieve version. One instance uses 99% GPU and one CPU core... I might try 12.2 later.
I can now reproduce this and have posted it on the AMD forum for help. In the meantime, I can only recommend going back to a Catalyst version before 13.4.

I have now finished the next beta, 0.13pre4. I did not remove the workaround for the compiler bug of the older Catalyst versions. Therefore, this one does not require 13.4; I tested it on 13.1. Apart from taking less CPU, 13.1 was also ~2% faster than 13.4 ...
Old 2013-05-10, 06:58   #758
Axelsson
 
Jul 2012
Sweden

2·3·7 Posts

0.13pre4 is a lot faster on small numbers: I got up to 260 GHz-days/day on numbers just above two million, and even found two new factors, which I have reported.

But then I ran the -st2 selftest on the new beta and got 3 failed self tests.

Code:
########## testcase 19172/32927 ##########
Starting trial factoring M597345241 from 2^63 to 2^64 (0.00GHz-days)
Using GPU kernel "cl_barrett15_82_gs"
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
May 10 02:21 | 1676   0.1% |  0.012    n.a. |      n.a.    82485    0.00%
no factor for M597345241 from 2^63 to 2^64 [mfakto 0.13pre4-Win cl_barrett15_82_gs_4]
ERROR: selftest failed for M597345241 (cl_barrett15_82_gs)
  no factor found
tf(): total time spent:  0.012s

Starting trial factoring M597345241 from 2^63 to 2^64 (0.00GHz-days)
Using GPU kernel "cl_barrett15_83_gs"
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
May 10 02:21 | 1676   0.1% |  0.012    n.a. |      n.a.    82485    0.00%
no factor for M597345241 from 2^63 to 2^64 [mfakto 0.13pre4-Win cl_barrett15_83_gs_4]
ERROR: selftest failed for M597345241 (cl_barrett15_83_gs)
  no factor found
tf(): total time spent:  0.012s

Starting trial factoring M597345241 from 2^63 to 2^64 (0.00GHz-days)
Using GPU kernel "cl_barrett15_88_gs"
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
May 10 02:21 | 1676   0.1% |  0.012    n.a. |      n.a.    82485    0.00%
no factor for M597345241 from 2^63 to 2^64 [mfakto 0.13pre4-Win cl_barrett15_88_gs_4]
ERROR: selftest failed for M597345241 (cl_barrett15_88_gs)
  no factor found
tf(): total time spent:  0.012s
GPU: HD6970, Windows 7 Professional 64-bit
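
For anyone wanting to double-check a selftest result like this on the CPU: a factor q of M_p = 2^p - 1 (p an odd prime) always has the form q = 2kp + 1 with q = 1 or 7 (mod 8), and q divides M_p exactly when 2^p mod q = 1. The sketch below is plain Python and has nothing to do with the cl_barrett15 kernels themselves; the function names are hypothetical.

```python
def is_factor(p, q):
    """True if q divides the Mersenne number 2^p - 1."""
    return pow(2, p, q) == 1


def find_factors(p, k_min, k_max):
    """Brute-force trial factoring of M_p over candidates q = 2*k*p + 1.

    Only candidates with q % 8 in (1, 7) are tested, since no other
    residue can divide a Mersenne number with odd prime exponent.
    Far too slow for real bit levels; illustration only.
    """
    found = []
    for k in range(k_min, k_max + 1):
        q = 2 * k * p + 1
        if q % 8 in (1, 7) and is_factor(p, q):
            found.append(q)
    return found


if __name__ == "__main__":
    # M11 = 2047 = 23 * 89
    print(find_factors(11, 1, 4))
```

With the known factor from the selftest data at hand, a single `is_factor(597345241, q)` call would confirm whether the failure lies in the kernel or in the test case.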

Last fiddled with by Axelsson on 2013-05-10 at 07:00
Old 2013-05-10, 07:05   #759
Axelsson
 
Jul 2012
Sweden

42_10 Posts

By the way, I'm still on Catalyst 12.10

Code:
Runtime options
  Inifile                   mfakto.ini
  Verbosity                 1
  SieveOnGPU                yes
  GPUSievePrimes            82486
  GPUSieveSize              64Mi bits
  GPUSieveProcessSize       16Ki bits
  WorkFile                  worktodo.txt
  ResultsFile               results.txt
  Checkpoints               enabled
  CheckpointDelay           300s
  Stages                    enabled
  StopAfterFactor           class
  PrintMode                 compact
  V5UserID                  none
  ComputerID                none
  TimeStampInResults        yes
  VectorSize                4
  GPUType                   AUTO
  SmallExp                  no
Compiletime options
  MORE_CLASSES              enabled
Select device - Get device info - Compiling kernels .................

OpenCL device info
  name                      Cayman (Advanced Micro Devices, Inc.)
  device (driver) version   OpenCL 1.2 AMD-APP (1016.4) (1016.4 (VM))
  maximum threads per block 256
  maximum threads per grid  16777216
  number of multiprocessors 24 (1536 compute elements)
  clock rate                880MHz

Automatic parameters
  threads per grid          2097152
  optimizing kernels for    VLIW4
All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.