mersenneforum.org  

Old 2015-05-10, 06:44   #1299
vsuite
 
Jan 2010

162₈ Posts
Default

mfakto works, albeit slowly, on a Haswell 4790K with HD 4600 graphics.

Sieve on GPU must be disabled; otherwise the process crashes the driver and mfakto enters a non-processing loop.

By itself, mfakto gives an estimated 14.5 GHz-days/day. When Prime95 runs 3 LL threads plus 1 ECM thread alongside it, mfakto does about 7.

Prime95 slows down marginally with mfakto running.
Old 2015-05-11, 16:18   #1300
emlowe
 
"Earle"
May 2015
California

1 Posts
Default

Seeing that I had a 2013 iMac with the Intel Iris Pro, I thought I would try porting the latest version to OS X. I figured the Intel Iris Pro was mostly some variation of the Intel HD graphics.

Porting was fairly easy, although gpusieve.cl won't compile and throws only the ever-helpful "parse error". I'm trying to track that down.

Let me know if you want the patches. Porting the C/C++ code pretty much involved using the correct header <OpenCL/OpenCL.h> and adding the library (-framework OpenCL).

The CL code throws off a bunch of warnings about missing prototypes (I guess it's a little pedantic), so I added those.

Turning off GPU sieving and running -st2 results in 14 failed tests:

Code:
no factor for M67094119 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M45448679 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M30568231 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M71065531 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M72067427 from 2^82 to 2^83 [mfakto 0.15pre5 cl_barrett15_83_2]
no factor for M71115521 from 2^87 to 2^88 [mfakto 0.15pre5 cl_barrett15_88_2]
no factor for M59000521 from 2^82 to 2^83 [mfakto 0.15pre5 cl_barrett15_83_2]
no factor for M67094119 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M45448679 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M18275419 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M30568231 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M71065531 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M72067427 from 2^82 to 2^83 [mfakto 0.15pre5 cl_barrett15_83_2]
no factor for M71115521 from 2^87 to 2^88 [mfakto 0.15pre5 cl_barrett15_88_2]
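
The failure list above repeats some entries, so grouping it by kernel makes it easier to see which kernels are affected. The following is a minimal sketch, not part of mfakto: the regular expression is only an assumption based on the line shape shown above, and `parse_failures` and `failures_by_kernel` are hypothetical helper names. The sample contains four of the lines from the output above.

```python
import re
from collections import Counter

# Four lines copied from the selftest output above (one is a repeat).
SAMPLE = """\
no factor for M67094119 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M72067427 from 2^82 to 2^83 [mfakto 0.15pre5 cl_barrett15_83_2]
no factor for M71115521 from 2^87 to 2^88 [mfakto 0.15pre5 cl_barrett15_88_2]
no factor for M67094119 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
"""

# Assumed line shape, inferred only from the log lines in this post.
LINE_RE = re.compile(
    r"no factor for M(?P<exp>\d+) from 2\^(?P<lo>\d+) to 2\^(?P<hi>\d+) "
    r"\[mfakto (?P<version>\S+) (?P<kernel>\S+)\]"
)


def parse_failures(text):
    """Return (exponent, bit_lo, bit_hi, kernel) for every matching line."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            rows.append((int(m["exp"]), int(m["lo"]), int(m["hi"]), m["kernel"]))
    return rows


def failures_by_kernel(rows):
    """Count distinct failing (exponent, bit range) entries per kernel."""
    unique = set(rows)  # repeated log lines collapse into one entry
    return Counter(kernel for _, _, _, kernel in unique)


rows = parse_failures(SAMPLE)
counts = failures_by_kernel(rows)
for kernel, n in sorted(counts.items()):
    print(kernel, n)
```

Feeding the full results file through `parse_failures` instead of `SAMPLE` would give the per-kernel picture for the whole run.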


Code:
Runtime options
  Inifile                   mfakto.ini
  Verbosity                 1
  SieveOnGPU                no
  MoreClasses               yes (due to CPU-sieving)
  SievePrimesMin            5000
  SievePrimesMax            200000
  SievePrimes               25000
  SievePrimesAdjust         1
  NumStreams                3
  GridSize                  4
  SieveCPUMask              0
  WorkFile                  worktodo.txt
  ResultsFile               results.txt
  Checkpoints               enabled
  CheckpointDelay           300s
  Stages                    enabled
  StopAfterFactor           class
  PrintMode                 compact
  V5UserID                  none
  ComputerID                none
  TimeStampInResults        yes
  VectorSize                2
  GPUType                   INTEL
  SmallExp                  no
  Additional compile options -I. -DVECTOR_SIZE=2 -DMORE_CLASSES -g 
  UseBinfile                mfakto_Kernels.elf
Compiletime options
  SIEVE_SIZE_LIMIT          36kiB
  SIEVE_SIZE                289731bits
  SIEVE_SPLIT               250
  DEBUG_FACTOR_FIRST        enabled (DEBUG option)

Select device - Get device info:

INFO: Device does not support out-of-order operations. Fallback to in-order queues.

OpenCL device info
  name                      Iris Pro (Intel)
  device (driver) version   OpenCL 1.2  (1.2(Mar 27 2015 01:47:22))
  maximum threads per block 512
  maximum threads per grid  134217728
  number of multiprocessors 40 (40 compute elements)
  clock rate                1150MHz

Automatic parameters
  threads per grid          0
  optimizing kernels for    INTEL
Old 2015-05-17, 02:14   #1301
vsuite
 
Jan 2010

2·3·19 Posts
Default

mfakto 0.15pre5 -st and -st2 both want to run 34062 tests. It has passed everything so far (>1200 tests).

Intel Haswell 4790K with HD 4600 graphics.
Sieve on GPU disabled.

I now run with SievePrimes=75000 and SievePrimesAdjust=0. Speed is about 18.58 GHz-days/day without Prime95 running, and 8.6 GHz-days/day with Prime95 running a 4-thread LL.

The default SievePrimes=25000 is slower than 50000 or 75000. mfakto crashes with 100000.
With SievePrimesAdjust=1, SievePrimes drops steadily during processing, as does the GHz-days/day rate.
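
To put the figures reported in this thread side by side, here is a small arithmetic sketch. It uses only the throughput numbers quoted in these posts; `retained_fraction` is a hypothetical helper name, and the two runs used different Prime95 workloads, so the comparison is only approximate.

```python
def retained_fraction(alone, shared):
    """Fraction of mfakto throughput that remains while Prime95 is running."""
    return shared / alone


# (GHz-days/day alone, GHz-days/day with Prime95), as reported in the thread.
runs = {
    "first report (3 LL + 1 ECM)": (14.5, 7.0),
    "SievePrimes=75000 (4-thread LL)": (18.58, 8.6),
}

for label, (alone, shared) in runs.items():
    kept = retained_fraction(alone, shared)
    print(f"{label}: keeps {kept:.0%} of standalone throughput")
```

In both cases mfakto keeps a little under half of its standalone rate when it shares the CPU with Prime95, which is what one might expect with CPU sieving competing for cores.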

Last fiddled with by vsuite on 2015-05-17 at 02:41 Reason: Add information
Old 2015-05-18, 08:09   #1302
Bdot
 
Bdot's Avatar
 
Nov 2010
Germany

1001010101₂ Posts
Default

Hi everyone,

I'm almost settled in my new home now. Eventually I may come back here and answer one question or another.

Like this one: the reason that 0.15 runs the same number of test cases in -st and -st2 modes is that I changed it. Both now run all test cases, but -st uses only the kernel that would be selected during normal runs for this bit level, whereas -st2 tries each kernel that covers the required bit level. So -st tries to find each factor once, and -st2 multiple times using different kernels. Therefore, -st should still be much faster than -st2, but cover a lot more "real" use cases than in 0.14.
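
The difference between the two modes can be sketched roughly as follows. This is an illustration only, not mfakto's actual selection code: the kernel names come from logs in this thread, but the bit limits are merely guessed from the name suffixes, the ordering is an assumed preference order, and `covering_kernels` and `kernels_for_selftest` are hypothetical names.

```python
# Hypothetical kernel table: (name, highest bit level covered), in an ASSUMED
# order of preference. This is not mfakto's real table.
KERNELS = [
    ("cl_barrett15_82", 82),
    ("cl_barrett15_83", 83),
    ("cl_barrett15_88", 88),
    ("cl_barrett32_88", 88),
    ("cl_barrett32_92", 92),
]


def covering_kernels(bit_max):
    """All kernels whose assumed limit covers the requested bit level."""
    return [name for name, limit in KERNELS if bit_max <= limit]


def kernels_for_selftest(bit_max, mode):
    """-st: only the kernel a normal run would pick; -st2: every covering kernel."""
    candidates = covering_kernels(bit_max)
    if mode == "st":
        return candidates[:1]
    if mode == "st2":
        return candidates
    raise ValueError(f"unknown selftest mode: {mode!r}")


print(kernels_for_selftest(83, "st"))
print(kernels_for_selftest(83, "st2"))
```

Under these assumptions, both modes visit the same test case, but -st2 runs it once per covering kernel, which is why it takes much longer.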


Or, I may even try to continue working on the 0.15 release ... the reported selftest failures show that there is still some work to do.
Old 2015-05-27, 22:25   #1303
Ethan (EO)
 
Ethan (EO)'s Avatar
 
"Ethan O'Connor"
Oct 2002
GIMPS since Jan 1996

92₁₀ Posts
Thumbs up

I finally got around to trying mfakto 0.14 on a Radeon HD6950 I've had sitting on my desk.

No time for thorough benchmarking at the moment, but:

cl_barrett_15_71_gs_4 is producing about 215GHz-d/d for an exponent in the 45.9M range (Stock Clock - 840MHz core)

This puts it around the GTX 460/560/760 class for mfaktx


-ethan
Old 2015-05-28, 02:35   #1304
LaurV
Romulan Interpreter
 
LaurV's Avatar
 
Jun 2011
Thailand

25B9₁₆ Posts
Default

Quote:
Originally Posted by Ethan (EO)
I finally got around to trying mfakto 0.14 on a Radeon HD6950 I've had sitting on my desk.

No time for thorough benchmarking at the moment, but:

cl_barrett_15_71_gs_4 is producing about 215GHz-d/d for an exponent in the 45.9M range (Stock Clock - 840MHz core)

This puts it around the GTX 460/560/760 class for mfaktx


-ethan
you should immediately report it here (yes, I looked, no result for it!)
Old 2015-05-28, 10:03   #1305
Bdot
 
Bdot's Avatar
 
Nov 2010
Germany

3·199 Posts
Default

Quote:
Originally Posted by LaurV
you should immediately report it here (yes, I looked, no result for it!)

It seems like James has thrown out the multipliers for the older (pre-GCN) cards: they all appear as if they were unable to run mfakto. The 6950/6970 have been on this list before. We have only very few samples, so the entry can be rather inaccurate. These cards use a distinct architecture (VLIW4), for which I did not really optimize mfakto. If you run the mfakto 0.15 benchmarking script, I may be able to see whether another kernel would give better results. I will probably not do more optimization than that for VLIW4.
Old 2015-06-05, 21:24   #1306
willmore
 
willmore's Avatar
 
Aug 2002

2²×3×5 Posts
Default perftestmfakto.cmd missing

I don't know what I did wrong, but I pulled a .zip from GitHub and put it on my test machine. I can't find a perftestmfakto.cmd script, so I ran mfakto with --perftest. Here are the results.

CPU is a Pentium G2020 and the GPU is an HD7770.

If there is more I should do, please let me know. I would be glad to help test.
Attached Files
File Type: txt mfakto.20150605.txt (27.8 KB, 265 views)
Old 2015-06-07, 03:24   #1307
willmore
 
willmore's Avatar
 
Aug 2002

2²×3×5 Posts
Default

It failed some selftests as well. I can send you the selftest log. Even xz compressed, it's 2.3 MB. Lots of output there.

I have installed 0.14 and it passed -st2 just fine. I'm running the performance test on it out of curiosity. I can provide any of those logs if you think they might be of help.

Cheers.
Old 2015-06-08, 19:19   #1308
Ethan (EO)
 
Ethan (EO)'s Avatar
 
"Ethan O'Connor"
Oct 2002
GIMPS since Jan 1996

2²·23 Posts
Default

Quote:
Originally Posted by Bdot
It seems like James has thrown out the multipliers for the older (pre-GCN) cards: they all appear as if they were unable to run mfakto. The 6950/6970 have been on this list before. We have only very few samples, so the entry can be rather inaccurate. These cards use a distinct architecture (VLIW4), for which I did not really optimize mfakto. If you run the mfakto 0.15 benchmarking script, I may be able to see whether another kernel would give better results. I will probably not do more optimization than that for VLIW4.

I've pulled the card now, but did manage to run the 0.15pre5 benchmarking script first; results attached.
Attached Files
File Type: zip testresults.zip (45.7 KB, 73 views)
Old 2015-07-25, 20:54   #1309
glebm
 
"Gleb Mazovetskiy"
Jul 2015
London, UK

2² Posts
Default

The latest source from GitHub, compiled with Visual Studio 2013 and APP SDK 3.0 Beta and running on Tonga (R9 380), fails both selftests (0.14 works).

An interesting warning shows up during the CLtest -- integer conversion resulted in a change of sign:

Code:
.\mfakto.exe --CLtest
mfakto 0.15pre5-Win (64bit build)


Runtime options
  Inifile                   mfakto.ini
  Verbosity                 3
  SieveOnGPU                yes
  MoreClasses               yes
  GPUSievePrimes            81157
  GPUSieveProcessSize       24Ki bits
  GPUSieveSize              96Mi bits
  FlushInterval             0
  WorkFile                  worktodo.txt
  ResultsFile               results.txt
  Checkpoints               enabled
  CheckpointDelay           300s
  Stages                    enabled
  StopAfterFactor           class
  PrintMode                 compact
  V5UserID                  none
  ComputerID                none
  ProgressHeader            "Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait"
  ProgressFormat            "%d %T | %C %p%% | %t  %e |   %g  %s  %W%%"
  TimeStampInResults        yes
  VectorSize                2
  GPUType                   GCN3
  SmallExp                  no
  UseBinfile                mfakto_Kernels.elf
OpenCL Platform 1/1: Advanced Micro Devices, Inc., Version: OpenCL 2.0 AMD-APP (1800.5)
Device 1/1: Tonga (Advanced Micro Devices, Inc.),
device version: OpenCL 2.0 AMD-APP (1800.5), driver version: 1800.5 (VM)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_khr_gl_depth_images cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images
Global memory:4294967296, Global memory cache: 16384, local memory: 32768, workgroup size: 256, Work dimensions: 3[256, 256, 256, 0, 0] , Max clock speed:980, compute units:28
Compiling kernels (build options: "-I. -DVECTOR_SIZE=2 -O3 -DMORE_CLASSES -DCL_GPU_SIEVE").
        BUILD OUTPUT
".\common.cl", line 57: warning: OpenCL extension is now part of core
  #pragma  OPENCL EXTENSION cl_khr_fp64 : enable
                            ^

".\gpusieve.cl", line 95: warning: integer conversion resulted in a change of
          sign
                                    1<<16, 1<<17, 1<<18, 1<<19, 1<<20, 1<<21, 1<<22, 1<<23, 1<<24, 1<<25, 1<<26, 1<<27, 1<<28, 1<<29, 1<<30, 1<<31,
                                                                                                                                             ^

"C:\Users\Gleb\AppData\Local\Temp\OCLFC7E.tmp.cl", line 114: warning: variable
          "carry1" was set but never used
    uint_v carry0, carry1;
                   ^


        END OF BUILD OUTPUT
Error 0 (Success): clBuildProgram
..Error -46 (Invalid kernel name): Creating Kernel mfakto_cl_71 from program. (clCreateKernel)
.Error -46 (Invalid kernel name): Creating Kernel mfakto_cl_63 from program. (clCreateKernel)
.Error -46 (Invalid kernel name): Creating Kernel cl_barrett32_79 from program. (clCreateKernel)
.Error -46 (Invalid kernel name): Creating Kernel cl_barrett32_77 from program. (clCreateKernel)
.Error -46 (Invalid kernel name): Creating Kernel cl_barrett32_76 from program. (clCreateKernel)
.Error -46 (Invalid kernel name): Creating Kernel cl_barrett32_92 from program. (clCreateKernel)
.Error -46 (Invalid kernel name): Creating Kernel cl_barrett32_88 from program. (clCreateKernel)
.Error -46 (Invalid kernel name): Creating Kernel cl_barrett32_87 from program. (clCreateKernel)
.....Error -46 (Invalid kernel name): Creating Kernel cl_barrett15_88 from program. (clCreateKernel)
.Error -46 (Invalid kernel name): Creating Kernel cl_barrett15_83 from program. (clCreateKernel)
.Error -46 (Invalid kernel name): Creating Kernel cl_barrett15_82 from program. (clCreateKernel)
..Error -46 (Invalid kernel name): Creating Kernel cl_mg62 from program. (clCreateKernel)
.Error -46 (Invalid kernel name): Creating Kernel cl_mg88 from program. (clCreateKernel)
loop 1:
1 threads: RES (32): <4 x 1> <26 x 0> ... 0 188573 4666430
loop 2:
2 threads: RES (32): 2 <3 x 1> <3 x 2> <23 x 0> ... 0 188573 4666430
loop 3:
3 threads: RES (32): 3 <3 x 1> <3 x 2> <3 x 3> <20 x 0> ... 0 188573 4666430
loop 4:
4 threads: RES (32): 4 <3 x 1> <3 x 2> <3 x 3> <3 x 4> <17 x 0> ... 0 188573 4666430
loop 5:
5 threads: RES (32): 5 <3 x 1> <3 x 2> <3 x 3> <3 x 4> <3 x 5> <14 x 0> ... 0 188573 4666430
loop 6:
6 threads: RES (32): 6 <3 x 1> <3 x 2> <3 x 3> <3 x 4> <3 x 5> <3 x 6> <11 x 0> ... 0 188573 4666430
loop 7:
7 threads: RES (32): 7 <3 x 1> <3 x 2> <3 x 3> <3 x 4> <3 x 5> <3 x 6> <3 x 7> <8 x 0> ... 0 188573 4666430
loop 8:
8 threads: RES (32): 8 <3 x 1> <3 x 2> <3 x 3> <3 x 4> <3 x 5> <3 x 6> <3 x 7> <3 x 8> <5 x 0> ... 0 188573 4666430
loop 9:
9 threads: RES (32): 9 <3 x 1> <3 x 2> <3 x 3> <3 x 4> <3 x 5> <3 x 6> <3 x 7> <3 x 8> <3 x 9> 0 0 ... 0 188573 4666430
loop 10:
10 threads: RES (32): 10 <3 x 1> <3 x 2> <3 x 3> <3 x 4> <3 x 5> <3 x 6> <3 x 7> <3 x 8> <3 x 9> <3 x 10> ... 10 10 4666430
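
The "change of sign" warning in the build output points at the `1<<31` entry. A likely explanation (an assumption, since the surrounding declaration in gpusieve.cl is not shown here) is that `1<<31` is evaluated as a signed 32-bit value before being stored in an unsigned table. The sketch below uses Python's `ctypes` purely to show how the same bit pattern reads under each interpretation; `as_int32` and `as_uint32` are hypothetical helper names.

```python
import ctypes


def as_int32(value):
    """Interpret the low 32 bits of value as a signed 32-bit integer."""
    return ctypes.c_int32(value).value


def as_uint32(value):
    """Interpret the low 32 bits of value as an unsigned 32-bit integer."""
    return ctypes.c_uint32(value).value


top_bit = 1 << 31

# Same bit pattern, two interpretations: negative when signed, positive when unsigned.
print("signed  :", as_int32(top_bit))
print("unsigned:", as_uint32(top_bit))

# A table of single-bit masks built as unsigned values has no sign change.
masks = [as_uint32(1 << i) for i in range(32)]
print(hex(masks[31]))
```

In C-like source the usual remedy is an unsigned literal such as `1u << 31`. Whether this warning has anything to do with the selftest failures or the "Invalid kernel name" errors is not established by the log above.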

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.