#1299 |
|
Jan 2010
2·3·19 Posts |
mfakto works, albeit slowly, on a Haswell 4790K with HD 4600 graphics.
Sieving on the GPU must be disabled, or else the process crashes the driver and mfakto enters a non-processing loop. By itself, mfakto gives an estimated 14.5 GHz-days/day. When Prime95 is also running (3 LL threads plus 1 ECM thread), mfakto does about 7. Prime95 slows down only marginally with mfakto running. |
|
|
|
|
|
#1300 |
|
"Earle"
May 2015
California
1 Posts |
Seeing that I had a 2013 iMac with the Intel Iris Pro, I thought I would try porting the latest version to OS X. I figured the Intel Iris Pro was mostly some variation of the Intel HD graphics.
Porting was fairly easy, although gpusieve.cl won't compile, with only the ever-helpful "parse error" thrown. I'm trying to track that down. Let me know if you want the patches. Porting the C/C++ code pretty much involved using the correct header <OpenCL/OpenCL.h> and adding the library (-framework OpenCL). The CL code throws off a bunch of warnings about missing prototypes (I guess it's a little pedantic), so I added those. Turning off GPU sieving and running -st2 results in 14 failed tests: Code:
no factor for M67094119 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M45448679 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M30568231 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M71065531 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M72067427 from 2^82 to 2^83 [mfakto 0.15pre5 cl_barrett15_83_2]
no factor for M71115521 from 2^87 to 2^88 [mfakto 0.15pre5 cl_barrett15_88_2]
no factor for M59000521 from 2^82 to 2^83 [mfakto 0.15pre5 cl_barrett15_83_2]
no factor for M67094119 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M45448679 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M18275419 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M30568231 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M71065531 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M72067427 from 2^82 to 2^83 [mfakto 0.15pre5 cl_barrett15_83_2]
no factor for M71115521 from 2^87 to 2^88 [mfakto 0.15pre5 cl_barrett15_88_2]
Code:
Runtime options
Inifile mfakto.ini
Verbosity 1
SieveOnGPU no
MoreClasses yes (due to CPU-sieving)
SievePrimesMin 5000
SievePrimesMax 200000
SievePrimes 25000
SievePrimesAdjust 1
NumStreams 3
GridSize 4
SieveCPUMask 0
WorkFile worktodo.txt
ResultsFile results.txt
Checkpoints enabled
CheckpointDelay 300s
Stages enabled
StopAfterFactor class
PrintMode compact
V5UserID none
ComputerID none
TimeStampInResults yes
VectorSize 2
GPUType INTEL
SmallExp no
Additional compile options -I. -DVECTOR_SIZE=2 -DMORE_CLASSES -g
UseBinfile mfakto_Kernels.elf
Compiletime options
SIEVE_SIZE_LIMIT 36kiB
SIEVE_SIZE 289731bits
SIEVE_SPLIT 250
DEBUG_FACTOR_FIRST enabled (DEBUG option)
Select device - Get device info:
INFO: Device does not support out-of-order operations. Fallback to in-order queues.
OpenCL device info
name Iris Pro (Intel)
device (driver) version OpenCL 1.2 (1.2(Mar 27 2015 01:47:22))
maximum threads per block 512
maximum threads per grid 134217728
number of multiprocessors 40 (40 compute elements)
clock rate 1150MHz
Automatic parameters
threads per grid 0
optimizing kernels for INTEL
|
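For anyone comparing failure lists like the one above, a short Python sketch can tally the failed tests per kernel. It assumes the "no factor for M... [mfakto <version> <kernel>]" line layout shown in this post; the names `LINE_RE` and `tally_by_kernel` are made up for the illustration and are not part of mfakto.

```python
import re
from collections import Counter

# Matches failure lines in the layout quoted in the post above.
# Group names are illustrative only.
LINE_RE = re.compile(
    r"no factor for M(?P<exponent>\d+) from 2\^(?P<lo>\d+) to 2\^(?P<hi>\d+) "
    r"\[mfakto (?P<version>\S+) (?P<kernel>\S+)\]"
)


def tally_by_kernel(log_text: str) -> Counter:
    """Count how many failure lines name each kernel."""
    return Counter(m.group("kernel") for m in LINE_RE.finditer(log_text))


# A few of the lines from the post, used as sample input.
SAMPLE = """\
no factor for M67094119 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
no factor for M72067427 from 2^82 to 2^83 [mfakto 0.15pre5 cl_barrett15_83_2]
no factor for M71115521 from 2^87 to 2^88 [mfakto 0.15pre5 cl_barrett15_88_2]
no factor for M45448679 from 2^81 to 2^82 [mfakto 0.15pre5 cl_barrett15_82_2]
"""

if __name__ == "__main__":
    for kernel, count in tally_by_kernel(SAMPLE).items():
        print(kernel, count)
```

Counting the full list of 14 lines by hand gives 9 for cl_barrett15_82_2, 3 for cl_barrett15_83_2 and 2 for cl_barrett15_88_2, so every reported failure names a cl_barrett15 kernel.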
|
|
|
|
|
#1301 |
|
Jan 2010
162₈ Posts |
mfakto 0.15pre5 -st and -st2 both want to run 34062 tests. Passed everything so far (>1200 tests).
Intel Haswell 4790K with HD 4600 graphics. Sieve on GPU disabled. I now run with SievePrimes=75000 and SievePrimesAdjust=0. Speed is about 18.58 GHz-days/day without Prime95 running, and 8.6 GHz-days/day with Prime95 running a 4-thread LL. The default SievePrimes=25000 is slower than 50000 or 75000. mfakto crashes with 100000. With SievePrimesAdjust=1, the number of SievePrimes drops steadily during processing, as does the GHz-days/day.
Last fiddled with by vsuite on 2015-05-17 at 02:41 Reason: Add information |
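To put the throughput figures in perspective, the relative changes can be worked out with a few lines of Python. This is only arithmetic on the numbers quoted in the two posts on this machine (14.5, 18.58 and 8.6 GHz-days/day); the helper name `percent_change` is my own.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Figures quoted in the posts (GHz-days/day).
default_alone = 14.5      # earlier estimate, mfakto alone
tuned_alone = 18.58       # SievePrimes=75000, SievePrimesAdjust=0, no Prime95
tuned_with_p95 = 8.6      # same settings, Prime95 running a 4-thread LL

if __name__ == "__main__":
    print(f"tuning gain: {percent_change(default_alone, tuned_alone):+.1f}%")
    print(f"cost of Prime95: {percent_change(tuned_alone, tuned_with_p95):+.1f}%")
```

Roughly speaking, the tuned settings are a bit more than a quarter faster than the earlier estimate, and running Prime95 alongside costs mfakto a little over half its throughput.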
|
|
|
|
|
#1302 |
|
Nov 2010
Germany
1001010101₂ Posts |
Hi everyone,
I'm almost settled in my new home now. Eventually I may come back here and answer one or another question. Like this one:
The reason that 0.15 runs the same number of test cases in -st and -st2 modes is that I changed it. Both will run all test cases, but -st will use only the kernel that would be selected during normal runs for this bit level, whereas -st2 tries each kernel that covers the required bit level. -st will try to find each factor once, -st2 multiple times using different kernels. Therefore, -st should still be much faster than -st2, but cover a lot more "real" use cases than in 0.14.
Or, I may even try to continue working on the 0.15 release ... the reported selftest failures show that there is still some work to do. |
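The difference between the two selftest modes can be sketched in Python. This is a toy model, not mfakto's code: the kernel table is hypothetical, it assumes the number in each kernel name is the highest bit level that kernel handles, and it assumes a normal run simply picks the first covering kernel in the table. The real selection logic may well differ.

```python
from typing import List, NamedTuple


class Kernel(NamedTuple):
    name: str
    max_bits: int  # ASSUMPTION: highest bit level this kernel can handle


# Hypothetical table, sorted by max_bits. The names appear in logs in this
# thread, but the bit limits and the ordering are assumptions.
KERNELS = [
    Kernel("cl_barrett15_82", 82),
    Kernel("cl_barrett15_83", 83),
    Kernel("cl_barrett15_88", 88),
    Kernel("cl_barrett32_92", 92),
]


def covering_kernels(bit_max: int) -> List[Kernel]:
    """All kernels whose assumed limit covers the requested bit level."""
    return [k for k in KERNELS if k.max_bits >= bit_max]


def kernels_for_selftest(bit_max: int, mode: str) -> List[Kernel]:
    """Kernels a test case would be run with in each selftest mode."""
    candidates = covering_kernels(bit_max)
    if mode == "-st2":
        return candidates        # try every kernel that covers the bit level
    if mode == "-st":
        return candidates[:1]    # only the (assumed) normal-run choice
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("-st", "-st2"):
        names = [k.name for k in kernels_for_selftest(82, mode)]
        print(mode, names)
```

Under these assumptions both modes visit every test case, but -st runs each one once while -st2 runs it once per covering kernel, which is why -st stays much faster.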
|
|
|
|
|
#1303 |
|
"Ethan O'Connor"
Oct 2002
GIMPS since Jan 1996
2²×23 Posts |
I finally got around to trying mfakto 0.14 on a Radeon HD6950 I've had sitting on my desk.
No time for thorough benchmarking at the moment, but cl_barrett_15_71_gs_4 is producing about 215 GHz-d/d for an exponent in the 45.9M range (stock clock, 840 MHz core). This puts it around the GTX 460/560/760 class for mfaktx.
-ethan |
|
|
|
|
|
#1304 | |
|
Romulan Interpreter
Jun 2011
Thailand
3²×29×37 Posts |
Quote:
report it here (yes, I looked, no result for it!)
|
|
|
|
|
|
|
#1305 | |
|
Nov 2010
Germany
1125₈ Posts |
Quote:
|
|
|
|
|
|
|
#1306 |
|
Aug 2002
74₈ Posts |
I don't know what I did wrong, but I pulled a .zip from GitHub and put it on my test machine. I can't find a perftestmfakto.cmd script, so I ran mfakto with --perftest. Here are the results.
CPU is a Pentium G2020 and the GPU is an HD7770. If there is more I should do, please let me know. I would be glad to help test. |
|
|
|
|
|
#1307 |
|
Aug 2002
111100₂ Posts |
It failed some selftests as well. I can send you the selftest log. Even xz compressed, it's 2.3 MB. Lots of output there.
I have installed 0.14 and it passed the -st2 just fine. I'm running the performance test on it out of curiosity. I can provide any of those logs if you think it might be of help. Cheers. |
|
|
|
|
|
#1308 | |
|
"Ethan O'Connor"
Oct 2002
GIMPS since Jan 1996
2²×23 Posts |
Quote:
I've pulled the card now, but did manage to run the 0.15pre5 benchmarking script first; results attached. |
|
|
|
|
|
|
#1309 |
|
"Gleb Mazovetskiy"
Jul 2015
London, UK
22 Posts |
The latest source from GitHub, compiled with Visual Studio 2013 and APP SDK 3.0 Beta and running on Tonga (R9 380), fails both selftests (0.14 works).
An interesting warning shows up during the CLtest -- integer conversion resulted in a change of sign: Code:
.\mfakto.exe --CLtest
mfakto 0.15pre5-Win (64bit build)
Runtime options
Inifile mfakto.ini
Verbosity 3
SieveOnGPU yes
MoreClasses yes
GPUSievePrimes 81157
GPUSieveProcessSize 24Ki bits
GPUSieveSize 96Mi bits
FlushInterval 0
WorkFile worktodo.txt
ResultsFile results.txt
Checkpoints enabled
CheckpointDelay 300s
Stages enabled
StopAfterFactor class
PrintMode compact
V5UserID none
ComputerID none
ProgressHeader "Date Time | class Pct | time ETA | GHz-d/day Sieve Wait"
ProgressFormat "%d %T | %C %p%% | %t %e | %g %s %W%%"
TimeStampInResults yes
VectorSize 2
GPUType GCN3
SmallExp no
UseBinfile mfakto_Kernels.elf
OpenCL Platform 1/1: Advanced Micro Devices, Inc., Version: OpenCL 2.0 AMD-APP (1800.5)
Device 1/1: Tonga (Advanced Micro Devices, Inc.),
device version: OpenCL 2.0 AMD-APP (1800.5), driver version: 1800.5 (VM)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_khr_gl_depth_images cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images
Global memory:4294967296, Global memory cache: 16384, local memory: 32768, workgroup size: 256, Work dimensions: 3[256, 256, 256, 0, 0] , Max clock speed:980, compute units:28
Compiling kernels (build options: "-I. -DVECTOR_SIZE=2 -O3 -DMORE_CLASSES -DCL_GPU_SIEVE").
BUILD OUTPUT
".\common.cl", line 57: warning: OpenCL extension is now part of core
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
^
".\gpusieve.cl", line 95: warning: integer conversion resulted in a change of
sign
1<<16, 1<<17, 1<<18, 1<<19, 1<<20, 1<<21, 1<<22, 1<<23, 1<<24, 1<<25, 1<<26, 1<<27, 1<<28, 1<<29, 1<<30, 1<<31,
^
"C:\Users\Gleb\AppData\Local\Temp\OCLFC7E.tmp.cl", line 114: warning: variable
"carry1" was set but never used
uint_v carry0, carry1;
^
END OF BUILD OUTPUT
Error 0 (Success): clBuildProgram
..Error -46 (Invalid kernel name): Creating Kernel mfakto_cl_71 from program. (clCreateKernel)
.Error -46 (Invalid kernel name): Creating Kernel mfakto_cl_63 from program. (clCreateKernel)
.Error -46 (Invalid kernel name): Creating Kernel cl_barrett32_79 from program. (clCreateKernel)
.Error -46 (Invalid kernel name): Creating Kernel cl_barrett32_77 from program. (clCreateKernel)
.Error -46 (Invalid kernel name): Creating Kernel cl_barrett32_76 from program. (clCreateKernel)
.Error -46 (Invalid kernel name): Creating Kernel cl_barrett32_92 from program. (clCreateKernel)
.Error -46 (Invalid kernel name): Creating Kernel cl_barrett32_88 from program. (clCreateKernel)
.Error -46 (Invalid kernel name): Creating Kernel cl_barrett32_87 from program. (clCreateKernel)
.....Error -46 (Invalid kernel name): Creating Kernel cl_barrett15_88 from program. (clCreateKernel)
.Error -46 (Invalid kernel name): Creating Kernel cl_barrett15_83 from program. (clCreateKernel)
.Error -46 (Invalid kernel name): Creating Kernel cl_barrett15_82 from program. (clCreateKernel)
..Error -46 (Invalid kernel name): Creating Kernel cl_mg62 from program. (clCreateKernel)
.Error -46 (Invalid kernel name): Creating Kernel cl_mg88 from program. (clCreateKernel)
loop 1:
1 threads: RES (32): <4 x 1> <26 x 0> ... 0 188573 4666430
loop 2:
2 threads: RES (32): 2 <3 x 1> <3 x 2> <23 x 0> ... 0 188573 4666430
loop 3:
3 threads: RES (32): 3 <3 x 1> <3 x 2> <3 x 3> <20 x 0> ... 0 188573 4666430
loop 4:
4 threads: RES (32): 4 <3 x 1> <3 x 2> <3 x 3> <3 x 4> <17 x 0> ... 0 188573 4666430
loop 5:
5 threads: RES (32): 5 <3 x 1> <3 x 2> <3 x 3> <3 x 4> <3 x 5> <14 x 0> ... 0 188573 4666430
loop 6:
6 threads: RES (32): 6 <3 x 1> <3 x 2> <3 x 3> <3 x 4> <3 x 5> <3 x 6> <11 x 0> ... 0 188573 4666430
loop 7:
7 threads: RES (32): 7 <3 x 1> <3 x 2> <3 x 3> <3 x 4> <3 x 5> <3 x 6> <3 x 7> <8 x 0> ... 0 188573 4666430
loop 8:
8 threads: RES (32): 8 <3 x 1> <3 x 2> <3 x 3> <3 x 4> <3 x 5> <3 x 6> <3 x 7> <3 x 8> <5 x 0> ... 0 188573 4666430
loop 9:
9 threads: RES (32): 9 <3 x 1> <3 x 2> <3 x 3> <3 x 4> <3 x 5> <3 x 6> <3 x 7> <3 x 8> <3 x 9> 0 0 ... 0 188573 4666430
loop 10:
10 threads: RES (32): 10 <3 x 1> <3 x 2> <3 x 3> <3 x 4> <3 x 5> <3 x 6> <3 x 7> <3 x 8> <3 x 9> <3 x 10> ... 10 10 4666430
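The "change of sign" warning for `1<<31` in the build output above can be illustrated with a small Python sketch. It assumes a 32-bit two's-complement `int` and that the table in gpusieve.cl holds unsigned 32-bit values; I have not checked the source, so treat both as assumptions. The helpers `to_int32` and `to_uint32` are names of my own.

```python
def to_int32(value: int) -> int:
    """Interpret the low 32 bits of `value` as a signed two's-complement int."""
    low = value & 0xFFFFFFFF
    return low - (1 << 32) if low & 0x80000000 else low


def to_uint32(value: int) -> int:
    """Interpret the low 32 bits of `value` as an unsigned int."""
    return value & 0xFFFFFFFF


# What a signed 32-bit int would hold for 1<<31 (assumed two's complement).
signed_view = to_int32(1 << 31)
# What ends up in an unsigned 32-bit table entry after conversion.
unsigned_view = to_uint32(signed_view)

if __name__ == "__main__":
    print(signed_view, unsigned_view, hex(unsigned_view))
```

If those assumptions hold, the signed expression is negative but converts to the same bit pattern, 0x80000000, so the warning would be cosmetic; writing the literal as unsigned (for example `1u<<31`) is the usual way to silence it. Whether it has anything to do with the failing selftests is not something this sketch can show.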
|
|
|
|