mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfakto: an OpenCL program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=15646)

vsuite 2015-05-10 06:44

mfakto works, albeit slowly, on a Haswell 4790K with HD 4600 graphics.

Sieve on GPU must be disabled, or else the process crashes the driver and mfakto enters a non-processing loop.

By itself, mfakto gives an estimated 14.5 GHz-days/day. When Prime95 runs 3 LL threads plus 1 ECM thread at the same time, mfakto does about 7.

Prime95 slows down marginally with mfakto running.

emlowe 2015-05-11 16:18

Seeing that I had a 2013 iMac with the Intel Iris Pro, I thought I would try porting the latest version to OS X. I figured the Intel Iris Pro was mostly some variation of the Intel HD graphics.

Porting was fairly easy, although gpusieve.cl won't compile, with the ever-helpful "parse error" thrown. I'm trying to track that down.

Let me know if you want the patches. Porting the C/C++ code pretty much involved using the correct header <OpenCL/OpenCL.h> and adding the library (-framework OpenCL).

The CL code throws off a bunch of warnings about missing prototypes (I guess it's a little pedantic), so I added those.

Turning off GPU sieving and running -st2 results in 14 failed tests:

[CODE]no factor for M67094119 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M45448679 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M30568231 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M71065531 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M72067427 from 2^82 to 2^83 [mfakto 0.15pre5 cl_barrett15_83_2]
no factor for M71115521 from 2^87 to 2^88 [mfakto 0.15pre5 cl_barrett15_88_2]
no factor for M59000521 from 2^82 to 2^83 [mfakto 0.15pre5 cl_barrett15_83_2]
no factor for M67094119 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M45448679 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M18275419 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M30568231 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M71065531 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M72067427 from 2^82 to 2^83 [mfakto 0.15pre5 cl_barrett15_83_2]
no factor for M71115521 from 2^87 to 2^88 [mfakto 0.15pre5 cl_barrett15_88_2][/CODE]
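Several of the 14 lines above are repeats of the same exponent and bit range. A minimal sketch for tallying them, assuming only the line format visible in the output above (the regular expression and the function name `tally` are illustrative and not part of mfakto):

```python
import re
from collections import Counter

# The 14 result lines reported above, copied verbatim.
RESULT_LINES = """\
no factor for M67094119 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M45448679 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M30568231 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M71065531 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M72067427 from 2^82 to 2^83 [mfakto 0.15pre5 cl_barrett15_83_2]
no factor for M71115521 from 2^87 to 2^88 [mfakto 0.15pre5 cl_barrett15_88_2]
no factor for M59000521 from 2^82 to 2^83 [mfakto 0.15pre5 cl_barrett15_83_2]
no factor for M67094119 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M45448679 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M18275419 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M30568231 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M71065531 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M72067427 from 2^82 to 2^83 [mfakto 0.15pre5 cl_barrett15_83_2]
no factor for M71115521 from 2^87 to 2^88 [mfakto 0.15pre5 cl_barrett15_88_2]
""".splitlines()

# Assumed line format: exponent, bit range, program version, kernel name.
LINE_RE = re.compile(
    r"no factor for M(\d+) from 2\^(\d+) to 2\^(\d+) \[mfakto (\S+) (\S+)\]"
)

def tally(lines):
    """Return (set of distinct (exponent, bit_min, bit_max), Counter of kernel names)."""
    cases = set()
    kernels = Counter()
    for line in lines:
        m = LINE_RE.fullmatch(line.strip())
        if m is None:
            continue  # skip anything that is not a "no factor" result line
        exponent, bit_min, bit_max, _version, kernel = m.groups()
        cases.add((int(exponent), int(bit_min), int(bit_max)))
        kernels[kernel] += 1
    return cases, kernels

cases, kernels = tally(RESULT_LINES)
print(len(RESULT_LINES), "lines,", len(cases), "distinct exponent/bit-range cases")
for kernel, count in kernels.most_common():
    print(kernel, count)
```

All of the failures involve the cl_barrett15_* kernels at vector size 2, which may help narrow down where to look.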



[CODE]Runtime options
Inifile mfakto.ini
Verbosity 1
SieveOnGPU no
MoreClasses yes (due to CPU-sieving)
SievePrimesMin 5000
SievePrimesMax 200000
SievePrimes 25000
SievePrimesAdjust 1
NumStreams 3
GridSize 4
SieveCPUMask 0
WorkFile worktodo.txt
ResultsFile results.txt
Checkpoints enabled
CheckpointDelay 300s
Stages enabled
StopAfterFactor class
PrintMode compact
V5UserID none
ComputerID none
TimeStampInResults yes
VectorSize 2
GPUType INTEL
SmallExp no
Additional compile options -I. -DVECTOR_SIZE=2 -DMORE_CLASSES -g
UseBinfile mfakto_Kernels.elf
Compiletime options
SIEVE_SIZE_LIMIT 36kiB
SIEVE_SIZE 289731bits
SIEVE_SPLIT 250
DEBUG_FACTOR_FIRST enabled (DEBUG option)

Select device - Get device info:

INFO: Device does not support out-of-order operations. Fallback to in-order queues.

OpenCL device info
name Iris Pro (Intel)
device (driver) version OpenCL 1.2 (1.2(Mar 27 2015 01:47:22))
maximum threads per block 512
maximum threads per grid 134217728
number of multiprocessors 40 (40 compute elements)
clock rate 1150MHz

Automatic parameters
threads per grid 0
optimizing kernels for INTEL
[/CODE]
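On the "MoreClasses yes" line in the dump above: as far as I understand the mfaktc/mfakto design, factor candidates q = 2kp+1 are grouped by k modulo 420 (or modulo 4620 with MoreClasses), and whole classes are skipped when every q in them is divisible by a small prime or is not 1 or 7 modulo 8. The sketch below only illustrates that counting argument; the function name is made up and this is not mfakto's code.

```python
def surviving_classes(p, num_classes):
    """Count classes k mod num_classes whose candidates q = 2*k*p + 1 cannot
    be ruled out by q mod 8 or by the small primes dividing num_classes."""
    small_primes = [r for r in (3, 5, 7, 11) if num_classes % r == 0]
    count = 0
    for k in range(num_classes):
        q = 2 * k * p + 1
        # A factor of 2^p - 1 (p an odd prime) must be 1 or 7 modulo 8.
        if q % 8 not in (1, 7):
            continue
        # If q is divisible by r for this k, it is for every k in the class.
        if any(q % r == 0 for r in small_primes):
            continue
        count += 1
    return count

p = 67094119  # an exponent taken from the selftest output in this post
print(surviving_classes(p, 420))   # 96
print(surviving_classes(p, 4620))  # 960
```

So roughly 23% of the classes remain with 420 and roughly 21% with 4620, at the cost of more, smaller work units per bit level.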

vsuite 2015-05-17 02:14

mfakto 0.15pre5: -st and -st2 both want to run 34062 tests. It has passed everything so far (>1200 tests).

Intel Haswell 4790K with HD 4600 graphics.
Sieve on GPU disabled.

I now run with SievePrimes=75000 and SievePrimesAdjust=0. Speed is about 18.58 GHzDay/day without Prime95 running, and 8.6 GHzDay/day with Prime95 running a 4 thread LL.

The default SievePrimes=25000 is slower than 50000 or 75000. mfakto crashes with 100000.
With SievePrimesAdjust=1, the number of SievePrimes drops steadily during processing, as does the GHz-days/day figure.
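To put the reported figures side by side, here is a small arithmetic sketch using only the numbers quoted in this thread. Note the two Prime95 loads differ (3 LL plus 1 ECM thread in the first report, a 4-thread LL here), so the comparison is rough at best.

```python
def retained_fraction(alone, shared):
    """Fraction of standalone throughput mfakto keeps while Prime95 is running."""
    return shared / alone

# (GHz-days/day alone, GHz-days/day with Prime95), as reported in the thread.
runs = {
    "first report (2015-05-10)": (14.5, 7.0),
    "SievePrimes=75000, SievePrimesAdjust=0": (18.58, 8.6),
}

for label, (alone, shared) in runs.items():
    print(f"{label}: {retained_fraction(alone, shared):.0%} retained")

# Gain from the changed sieve settings with the CPU otherwise idle.
print(f"standalone gain: {18.58 / 14.5 - 1:.0%}")
```

In both cases mfakto keeps a little under half of its standalone throughput once Prime95 occupies the CPU cores, which fits CPU sieving being the bottleneck.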

Bdot 2015-05-18 08:09

Hi everyone,

I'm almost settled in my new home now. Eventually I may come back here and answer one or another question :smile:

Like this one: the reason 0.15 runs the same number of test cases in -st and -st2 modes is that I changed it :wink: Both run all test cases, but -st uses only the kernel that would be selected during normal runs for this bit level, whereas -st2 tries each kernel that covers the required bit level. -st tries to find each factor once, -st2 multiple times using different kernels. Therefore, -st should still be much faster than -st2, but cover a lot more "real" use cases than in 0.14.
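The difference between the two modes can be sketched roughly as follows. The kernel names are taken from logs in this thread, but the table itself is hypothetical: the bit limits are merely read off the name suffixes, the preference order is invented, and real kernels probably have lower bit limits too.

```python
# Hypothetical table: (kernel name, highest bit level covered), in an assumed
# order of preference. This is NOT mfakto's actual kernel table.
KERNELS = [
    ("cl_barrett15_82", 82),
    ("cl_barrett15_83", 83),
    ("cl_barrett15_88", 88),
    ("cl_barrett32_87", 87),
    ("cl_barrett32_88", 88),
    ("cl_barrett32_92", 92),
]

def kernels_for_st(bit_max):
    """-st style: only the kernel a normal run would choose for this bit level
    (modelled here as the first table entry that covers bit_max)."""
    for name, limit in KERNELS:
        if limit >= bit_max:
            return [name]
    return []

def kernels_for_st2(bit_max):
    """-st2 style: every kernel that covers the required bit level."""
    return [name for name, limit in KERNELS if limit >= bit_max]

print(kernels_for_st(83))   # one kernel per test case
print(kernels_for_st2(83))  # the same test case repeated for each capable kernel
```

With the same list of test cases, -st runs each case once and -st2 runs it once per capable kernel, which is why -st2 takes much longer.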


Or, I may even try to continue working on the 0.15 release ... the reported selftest failures show that there is still some work to do.
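For readers wondering what a selftest failure means: as I understand it, each selftest case is a Mersenne number with a known factor in a given bit range, so a "no factor" result there is a miss. The sketch below shows the underlying check on tiny, well-known examples (M11 = 23 x 89, and 47 divides M23). It is a naive illustration, not how mfakto's kernels work, and the function names are made up.

```python
def divides_mersenne(p, q):
    """True if q divides 2**p - 1, checked by modular exponentiation."""
    return pow(2, p, q) == 1

def find_factors(p, bit_min, bit_max):
    """Naive search for divisors q = 2*k*p + 1 of 2**p - 1
    with 2**bit_min <= q < 2**bit_max."""
    lo, hi = 1 << bit_min, 1 << bit_max
    # Smallest k >= 1 such that 2*k*p + 1 >= lo.
    k = max(1, (lo - 1 + 2 * p - 1) // (2 * p))
    found = []
    while True:
        q = 2 * k * p + 1
        if q >= hi:
            break
        # Factors of 2^p - 1 are 1 or 7 modulo 8, so skip the rest cheaply.
        if q % 8 in (1, 7) and divides_mersenne(p, q):
            found.append(q)
        k += 1
    return found

print(find_factors(11, 4, 5))  # [23]
print(find_factors(11, 5, 6))  # [] : "no factor for M11 from 2^5 to 2^6"
print(find_factors(11, 6, 7))  # [89]
print(find_factors(23, 5, 6))  # [47]
```

A selftest in this style would compare the returned list against the known factor for the case and report a failure when it is missing.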

Ethan (EO) 2015-05-27 22:25

I finally got around to trying mfakto 0.14 on a Radeon HD6950 I've had sitting on my desk.

No time for thorough benchmarking at the moment, but:

cl_barrett_15_71_gs_4 is producing about 215GHz-d/d for an exponent in the 45.9M range (Stock Clock - 840MHz core)

This puts it around the GTX 460/560/760 class for mfaktx


-ethan

LaurV 2015-05-28 02:35

[QUOTE=Ethan (EO);403096]I finally got around to trying mfakto 0.14 on a Radeon HD6950 I've had sitting on my desk.

No time for thorough benchmarking at the moment, but:

cl_barrett_15_71_gs_4 is producing about 215GHz-d/d for an exponent in the 45.9M range (Stock Clock - 840MHz core)

This puts it around the GTX 460/560/760 class for mfaktx


-ethan[/QUOTE]
you should immediately :smile: [URL="http://www.mersenne.ca/mfaktc.php?sort=ghdpd"]report it here[/URL] (yes, I looked, no result for it!)

Bdot 2015-05-28 10:03

[QUOTE=LaurV;403113]you should immediately :smile: [URL="http://www.mersenne.ca/mfaktc.php?sort=ghdpd"]report it here[/URL] (yes, I looked, no result for it!)[/QUOTE]

It seems like James has thrown out the multipliers for the older cards (pre GCN) - they all appear as if they were not able to run mfakto. 6950/6970 have been on this list before. We have only very few samples so the entry can be rather inaccurate. These cards use a distinct architecture (VLIW4), for which I did not really optimize mfakto. If you run the mfakto 0.15 benchmarking script, then I may be able to see if using another kernel would give better results. I will probably not do more optimizations than that for VLIW4.

willmore 2015-06-05 21:24

perftestmfakto.cmd missing
 
1 Attachment(s)
I don't know what I did wrong, but I pulled a .zip from the github and put it on my test machine. I can't find a perftestmfakto.cmd script. So, I ran mfakto with --perftest. Here are the results.

CPU is a Pentium G2020 and the GPU is an HD7770.

If there is more I should do, please let me know. I would be glad to help test.

willmore 2015-06-07 03:24

It failed some selftests as well. I can send you the selftest log. Even xz compressed, it's 2.3 MB. Lots of output there.

I have installed 0.14 and it passed the -st2 just fine. I'm running the performance test on it out of curiosity. I can provide any of those logs if you think it might be of help.

Cheers.

Ethan (EO) 2015-06-08 19:19

1 Attachment(s)
[QUOTE=Bdot;403127]It seems like James has thrown out the multipliers for the older cards (pre GCN) - they all appear as if they were not able to run mfakto. 6950/6970 have been on this list before. We have only very few samples so the entry can be rather inaccurate. These cards use a distinct architecture (VLIW4), for which I did not really optimize mfakto. If you run the mfakto 0.15 benchmarking script, then I may be able to see if using another kernel would give better results. I will probably not do more optimizations than that for VLIW4.[/QUOTE]


I've pulled the card now, but did manage to run the 0.15pre5 benchmarking script first; results attached.

glebm 2015-07-25 20:54

The latest source from GitHub, compiled with Visual Studio 2013 and APP SDK 3.0 Beta and running on Tonga (R9 380), fails both selftests (0.14 works).

An interesting warning shows up during the CLtest -- [I]integer conversion resulted in a change of sign[/I]:

[CODE].\mfakto.exe --CLtest
mfakto 0.15pre5-Win (64bit build)


Runtime options
Inifile mfakto.ini
Verbosity 3
SieveOnGPU yes
MoreClasses yes
GPUSievePrimes 81157
GPUSieveProcessSize 24Ki bits
GPUSieveSize 96Mi bits
FlushInterval 0
WorkFile worktodo.txt
ResultsFile results.txt
Checkpoints enabled
CheckpointDelay 300s
Stages enabled
StopAfterFactor class
PrintMode compact
V5UserID none
ComputerID none
ProgressHeader "Date Time | class Pct | time ETA | GHz-d/day Sieve Wait"
ProgressFormat "%d %T | %C %p%% | %t %e | %g %s %W%%"
TimeStampInResults yes
VectorSize 2
GPUType GCN3
SmallExp no
UseBinfile mfakto_Kernels.elf
OpenCL Platform 1/1: Advanced Micro Devices, Inc., Version: OpenCL 2.0 AMD-APP (1800.5)
Device 1/1: Tonga (Advanced Micro Devices, Inc.),
device version: OpenCL 2.0 AMD-APP (1800.5), driver version: 1800.5 (VM)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_khr_gl_depth_images cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images
Global memory:4294967296, Global memory cache: 16384, local memory: 32768, workgroup size: 256, Work dimensions: 3[256, 256, 256, 0, 0] , Max clock speed:980, compute units:28
Compiling kernels (build options: "-I. -DVECTOR_SIZE=2 -O3 -DMORE_CLASSES -DCL_GPU_SIEVE").
BUILD OUTPUT
".\common.cl", line 57: warning: OpenCL extension is now part of core
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
^

".\gpusieve.cl", line 95: warning: integer conversion resulted in a change of
sign
1<<16, 1<<17, 1<<18, 1<<19, 1<<20, 1<<21, 1<<22, 1<<23, 1<<24, 1<<25, 1<<26, 1<<27, 1<<28, 1<<29, 1<<30, 1<<31,
^

"C:\Users\Gleb\AppData\Local\Temp\OCLFC7E.tmp.cl", line 114: warning: variable
"carry1" was set but never used
uint_v carry0, carry1;
^


END OF BUILD OUTPUT
Error 0 (Success): clBuildProgram
..Error -46 (Invalid kernel name): Creating Kernel mfakto_cl_71 from program. (clCreateKernel)
.Error -46 (Invalid kernel name): Creating Kernel mfakto_cl_63 from program. (clCreateKernel)
.Error -46 (Invalid kernel name): Creating Kernel cl_barrett32_79 from program. (clCreateKernel)
.Error -46 (Invalid kernel name): Creating Kernel cl_barrett32_77 from program. (clCreateKernel)
.Error -46 (Invalid kernel name): Creating Kernel cl_barrett32_76 from program. (clCreateKernel)
.Error -46 (Invalid kernel name): Creating Kernel cl_barrett32_92 from program. (clCreateKernel)
.Error -46 (Invalid kernel name): Creating Kernel cl_barrett32_88 from program. (clCreateKernel)
.Error -46 (Invalid kernel name): Creating Kernel cl_barrett32_87 from program. (clCreateKernel)
.....Error -46 (Invalid kernel name): Creating Kernel cl_barrett15_88 from program. (clCreateKernel)
.Error -46 (Invalid kernel name): Creating Kernel cl_barrett15_83 from program. (clCreateKernel)
.Error -46 (Invalid kernel name): Creating Kernel cl_barrett15_82 from program. (clCreateKernel)
..Error -46 (Invalid kernel name): Creating Kernel cl_mg62 from program. (clCreateKernel)
.Error -46 (Invalid kernel name): Creating Kernel cl_mg88 from program. (clCreateKernel)
loop 1:
1 threads: RES (32): <4 x 1> <26 x 0> ... 0 188573 4666430
loop 2:
2 threads: RES (32): 2 <3 x 1> <3 x 2> <23 x 0> ... 0 188573 4666430
loop 3:
3 threads: RES (32): 3 <3 x 1> <3 x 2> <3 x 3> <20 x 0> ... 0 188573 4666430
loop 4:
4 threads: RES (32): 4 <3 x 1> <3 x 2> <3 x 3> <3 x 4> <17 x 0> ... 0 188573 4666430
loop 5:
5 threads: RES (32): 5 <3 x 1> <3 x 2> <3 x 3> <3 x 4> <3 x 5> <14 x 0> ... 0 188573 4666430
loop 6:
6 threads: RES (32): 6 <3 x 1> <3 x 2> <3 x 3> <3 x 4> <3 x 5> <3 x 6> <11 x 0> ... 0 188573 4666430
loop 7:
7 threads: RES (32): 7 <3 x 1> <3 x 2> <3 x 3> <3 x 4> <3 x 5> <3 x 6> <3 x 7> <8 x 0> ... 0 188573 4666430
loop 8:
8 threads: RES (32): 8 <3 x 1> <3 x 2> <3 x 3> <3 x 4> <3 x 5> <3 x 6> <3 x 7> <3 x 8> <5 x 0> ... 0 188573 4666430
loop 9:
9 threads: RES (32): 9 <3 x 1> <3 x 2> <3 x 3> <3 x 4> <3 x 5> <3 x 6> <3 x 7> <3 x 8> <3 x 9> 0 0 ... 0 188573 4666430
loop 10:
10 threads: RES (32): 10 <3 x 1> <3 x 2> <3 x 3> <3 x 4> <3 x 5> <3 x 6> <3 x 7> <3 x 8> <3 x 9> <3 x 10> ... 10 10 4666430
[/CODE]
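On the "change of sign" warning: with 32-bit ints, 1<<31 sets the sign bit, so the value is negative as a signed int and changes to a large positive number when stored as unsigned. The bit pattern is the same either way. The sketch below reproduces that arithmetic with ctypes; whether the table in gpusieve.cl is declared unsigned is an assumption on my part, and in C-family code such a warning is commonly silenced by writing the literal as 1u<<31.

```python
import ctypes

def as_int32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    return ctypes.c_int32(value).value

def as_uint32(value):
    """Reinterpret the low 32 bits of value as an unsigned 32-bit integer."""
    return ctypes.c_uint32(value).value

signed = as_int32(1 << 31)       # what a 32-bit signed int holds for 1<<31
unsigned = as_uint32(signed)     # the same bits stored in an unsigned 32-bit slot

print(signed)         # -2147483648
print(unsigned)       # 2147483648
print(hex(unsigned))  # 0x80000000

# Only the last table entry is affected: 1<<0 through 1<<30 stay positive.
print(all(as_int32(1 << n) > 0 for n in range(31)))  # True
```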


All times are UTC.