mfakto: an OpenCL program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=15646)

M344587487 2019-01-15 22:24

Has anyone had any luck running mfakto on the Mali GPUs found in ARM devices, or am I barking up the wrong tree? After removing the x86-specific flag -m64 from the Makefile and pointing to the right include and lib directories, mfakto compiled but failed to compile the OpenCL kernels at runtime:

Compilation:
[code]gcc -Wall -O3 -funroll-loops -ffast-math -finline-functions -frerun-loop-opt -fgcse-sm -fgcse-las -flto -I/usr/rk3399-libs/include -DBUILD_OPENCL -funroll-all-loops -funsafe-loop-optimizations -fira-region=all -fsched-spec-load -fsched-stalled-insns=10 -fsched-stalled-insns-dep=10 -fno-align-labels -c sieve.c -o sieve.o
gcc -Wall -O3 -funroll-loops -ffast-math -finline-functions -frerun-loop-opt -fgcse-sm -fgcse-las -flto -I/usr/rk3399-libs/include -DBUILD_OPENCL -c timer.c -o timer.o
gcc -Wall -O3 -funroll-loops -ffast-math -finline-functions -frerun-loop-opt -fgcse-sm -fgcse-las -flto -I/usr/rk3399-libs/include -DBUILD_OPENCL -c parse.c -o parse.o
gcc -Wall -O3 -funroll-loops -ffast-math -finline-functions -frerun-loop-opt -fgcse-sm -fgcse-las -flto -I/usr/rk3399-libs/include -DBUILD_OPENCL -c read_config.c -o read_config.o
gcc -Wall -O3 -funroll-loops -ffast-math -finline-functions -frerun-loop-opt -fgcse-sm -fgcse-las -flto -I/usr/rk3399-libs/include -DBUILD_OPENCL -c mfaktc.c -o mfaktc.o
gcc -Wall -O3 -funroll-loops -ffast-math -finline-functions -frerun-loop-opt -fgcse-sm -fgcse-las -flto -I/usr/rk3399-libs/include -DBUILD_OPENCL -c checkpoint.c -o checkpoint.o
gcc -Wall -O3 -funroll-loops -ffast-math -finline-functions -frerun-loop-opt -fgcse-sm -fgcse-las -flto -I/usr/rk3399-libs/include -DBUILD_OPENCL -c signal_handler.c -o signal_handler.o
gcc -Wall -O3 -funroll-loops -ffast-math -finline-functions -frerun-loop-opt -fgcse-sm -fgcse-las -flto -I/usr/rk3399-libs/include -DBUILD_OPENCL -c filelocking.c -o filelocking.o
gcc -Wall -O3 -funroll-loops -ffast-math -finline-functions -frerun-loop-opt -fgcse-sm -fgcse-las -flto -I/usr/rk3399-libs/include -DBUILD_OPENCL -c output.c -o output.o
gcc -Wall -O3 -funroll-loops -ffast-math -finline-functions -frerun-loop-opt -fgcse-sm -fgcse-las -flto -I/usr/rk3399-libs/include -DBUILD_OPENCL -c mfakto.cpp -o mfakto.o
gcc -Wall -O3 -funroll-loops -ffast-math -finline-functions -frerun-loop-opt -fgcse-sm -fgcse-las -flto -I/usr/rk3399-libs/include -DBUILD_OPENCL -c gpusieve.cpp -o gpusieve.o
gcc -Wall -O3 -funroll-loops -ffast-math -finline-functions -frerun-loop-opt -fgcse-sm -fgcse-las -flto -I/usr/rk3399-libs/include -DBUILD_OPENCL -c perftest.cpp -o perftest.o
perftest.cpp:1737:18: warning: invalid suffix on literal; C++11 requires a space between literal and string macro [-Wliteral-suffix]
std::cerr << "\nKernel file \""KERNEL_FILE"\" not found, it needs to be in the same directory as the executable.\n";
^
perftest.cpp: In function ‘GPUKernels test_cpu_tf_kernels(cl_uint)’:
perftest.cpp:1018:88: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 3 has type ‘cl_ulong {aka long unsigned int}’ [-Wformat=]
mystuff.exponent, num_fcs >> 20, ((cl_ulong)num_loops*mystuff.threads_per_grid)>>20);
~~~~~~~~~~~~~ ^
perftest.cpp:1018:88: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 4 has type ‘cl_ulong {aka long unsigned int}’ [-Wformat=]
perftest.cpp:1021:86: warning: format ‘%llu’ expects argument of type ‘long long unsigned int’, but argument 2 has type ‘cl_ulong {aka long unsigned int}’ [-Wformat=]
printf("k=%llu, %f GHz-days (assignment), %f GHz-days (per test): ", k, ghzd, ghzdt); fflush(stdout);
^
perftest.cpp: In function ‘GPUKernels test_gpu_tf_kernels(cl_uint)’:
perftest.cpp:1153:72: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 3 has type ‘cl_ulong {aka long unsigned int}’ [-Wformat=]
printf("exponent=%u, %lldM FCs each, ", mystuff.exponent, num_fcs>>20);
~~~~~~~~~~~^
perftest.cpp:1156:86: warning: format ‘%llu’ expects argument of type ‘long long unsigned int’, but argument 2 has type ‘cl_ulong {aka long unsigned int}’ [-Wformat=]
printf("k=%llu, %f GHz-days (assignment), %f GHz-days (per test): ", k, ghzd, ghzdt); fflush(stdout);
^
perftest.cpp: In function ‘void CL_test(cl_int)’:
perftest.cpp:1767:3: warning: this ‘if’ clause does not guard... [-Wmisleading-indentation]
if (mystuff.CompileOptions[0]) // if mfakto.ini defined compile options, override the default with them
^~
perftest.cpp:1770:5: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the ‘if’
printf("Compiling kernels (build options: \"%s\").", program_options);
^~~~~~
gcc -Wall -O3 -funroll-loops -ffast-math -finline-functions -frerun-loop-opt -fgcse-sm -fgcse-las -flto -I/usr/rk3399-libs/include -DBUILD_OPENCL -c menu.cpp -o menu.o
menu.cpp: In function ‘void handle_menu(mystuff_t*)’:
menu.cpp:204:10: warning: ignoring return value of ‘char* fgets(char*, int, FILE*)’, declared with attribute warn_unused_result [-Wunused-result]
fgets(choice_string, 9, stdin); // std:cin does not allow empty input
~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
gcc -Wall -O3 -funroll-loops -ffast-math -finline-functions -frerun-loop-opt -fgcse-sm -fgcse-las -flto -I/usr/rk3399-libs/include -DBUILD_OPENCL -c kbhit.cpp -o kbhit.o
kbhit.cpp: In member function ‘int keyboard::getch()’:
kbhit.cpp:55:16: warning: ignoring return value of ‘ssize_t read(int, void*, size_t)’, declared with attribute warn_unused_result [-Wunused-result]
} else read(0,&ch,1);
~~~~^~~~~~~~~
g++ sieve.o timer.o parse.o read_config.o mfaktc.o checkpoint.o signal_handler.o filelocking.o output.o mfakto.o gpusieve.o perftest.o menu.o kbhit.o -O3 -funroll-loops -ffast-math -finline-functions -frerun-loop-opt -fgcse-sm -fgcse-las -flto -L/usr/rk3399-libs/lib64 -lOpenCL -o ../mfakto[/code]
Runtime:
[code]mfakto 0.15pre6 (64bit build)


Runtime options
Inifile mfakto.ini
Verbosity 1
SieveOnGPU yes
MoreClasses yes
GPUSievePrimes 81157
GPUSieveProcessSize 24Ki bits
GPUSieveSize 96Mi bits
FlushInterval 0
WorkFile worktodo.txt
ResultsFile results.txt
Checkpoints enabled
CheckpointDelay 300s
Stages enabled
StopAfterFactor class
PrintMode compact
V5UserID none
ComputerID none
TimeStampInResults yes
VectorSize 2
GPUType AUTO
SmallExp no
UseBinfile mfakto_Kernels.elf
Compiletime options

Select device - Get device info:
WARNING: Unknown GPU name, assuming GCN. Please post the device name "Mali-T860 (ARM)" to http://www.mersenneforum.org/showthread.php?t=15646 to have it added to mfakto. Set GPUType in mfakto.ini to select a GPU type yourself to avoid this warning.

OpenCL device info
name Mali-T860 (ARM)
device (driver) version OpenCL 1.2 v1.r13p0-00rel0-git(a4271c9).31ba04af2d3c01618138bef3aed66c2c (1.2)
maximum threads per block 256
maximum threads per grid 16777216
number of multiprocessors 4 (256 compute elements)
clock rate 200MHz

Automatic parameters
threads per grid 0
optimizing kernels for GCN

Compiling kernels.

BUILD OUTPUT
In file included from <source>:84:
./barrett15.cl:45:41: error: Ternary operator argument types do not match
tmp.d0 = (tmp.d4 > a.d4) ? a.d0 : tmp.d0;
~~~~^~

./barrett15.cl:46:41: error: Ternary operator argument types do not match
tmp.d1 = (tmp.d4 > a.d4) ? a.d1 : tmp.d1;
~~~~^~

./barrett15.cl:47:41: error: Ternary operator argument types do not match
tmp.d2 = (tmp.d4 > a.d4) ? a.d2 : tmp.d2;
~~~~^~

./barrett15.cl:48:41: error: Ternary operator argument types do not match
tmp.d3 = (tmp.d4 > a.d4) ? a.d3 : tmp.d3;
~~~~^~

./barrett15.cl:49:41: error: Ternary operator argument types do not match
tmp.d4 = (tmp.d4 > a.d4) ? a.d4 : tmp.d4; // & 0x7FFF not necessary as tmp.d4 is <= a.d4
~~~~^~

./barrett15.cl:1993:41: error: Ternary operator argument types do not match
tmp.d0 = (tmp.d5 > a.d5) ? a.d0 : tmp.d0;
~~~~^~

./barrett15.cl:1994:41: error: Ternary operator argument types do not match
tmp.d1 = (tmp.d5 > a.d5) ? a.d1 : tmp.d1;
~~~~^~

./barrett15.cl:1995:41: error: Ternary operator argument types do not match
tmp.d2 = (tmp.d5 > a.d5) ? a.d2 : tmp.d2;
~~~~^~

./barrett15.cl:1996:41: error: Ternary operator argument types do not match
tmp.d3 = (tmp.d5 > a.d5) ? a.d3 : tmp.d3;
~~~~^~

./barrett15.cl:1997:41: error: Ternary operator argument types do not match
tmp.d4 = (tmp.d5 > a.d5) ? a.d4 : tmp.d4;
~~~~^~

./barrett15.cl:1998:41: error: Ternary operator argument types do not match
tmp.d5 = (tmp.d5 > a.d5) ? a.d5 : tmp.d5; // & 0x7FFF not necessary as tmp.d5 is <= a.d5
~~~~^~

error: Compiler frontend failed (error code 59)

END OF BUILD OUTPUT
Error -11 (Build program failure): clBuildProgram
ERROR: load_kernels(0) failed[/code]
The environment can compile and successfully run a Hello World OpenCL program that uses a kernel. All I've tried so far is manually setting every GPUType in mfakto.ini. The kernel build errors at every instance of a ternary operator. I tried replacing one with an if statement, but I'm fumbling as I know zero about OpenCL kernels, and of course it didn't work. Any thoughts?
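
For anyone hitting the same [c]Ternary operator argument types do not match[/c] error: the failing lines all have the shape [c]x = (p > q) ? a : x[/c]. OpenCL C also has a built-in [c]select(a, b, c)[/c] which, for vector operands, picks [c]b[/c] in each component where the most significant bit of [c]c[/c] is set and [c]a[/c] otherwise. The plain-C sketch below models that per-component behaviour with a bit mask so it can be compiled and checked on a host CPU. The names [c]gt_mask[/c], [c]select_u32[/c] and [c]clamp_word[/c] are hypothetical, nothing here is taken from barrett15.cl, and it has not been tried against the Mali compiler.

```c
#include <stdint.h>

/* Hypothetical helper: model an OpenCL vector comparison for one component.
 * OpenCL vector relational operators yield -1 (all bits set) for true and
 * 0 for false; the scalar C operator yields 1 or 0, so we negate it. */
static int32_t gt_mask(uint32_t x, uint32_t y)
{
    return -(int32_t)(x > y);
}

/* Hypothetical helper: model select(a, b, c) for one component.
 * Returns b when the most significant bit of c is set, otherwise a. */
static uint32_t select_u32(uint32_t a, uint32_t b, int32_t c)
{
    uint32_t mask = 0u - ((uint32_t)c >> 31); /* all ones if MSB set, else 0 */
    return (a & ~mask) | (b & mask);
}

/* Same effect as: tmp_word = (tmp_hi > a_hi) ? a_word : tmp_word; */
static uint32_t clamp_word(uint32_t tmp_word, uint32_t a_word,
                           uint32_t tmp_hi, uint32_t a_hi)
{
    return select_u32(tmp_word, a_word, gt_mask(tmp_hi, a_hi));
}
```

In a kernel the mask arithmetic would presumably be replaced by the built-in itself, along the lines of [c]tmp.d0 = select(tmp.d0, a.d0, tmp.d4 > a.d4);[/c]. Whether the Mali front end accepts that form for mfakto's types is an untested assumption.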

cbug 2019-01-23 23:48

Hi,

I am playing a little with mfakto and got this message with my AMD A8 7600.
I've posted the verbose output since I didn't know if you would need it.

[code]
Select device - Get device info:
Device 1/1: Spectre (Advanced Micro Devices, Inc.),
device version: OpenCL 1.2 AMD-APP (2639.3), driver version: 2639.3
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event
Global memory:2139906048, Global memory cache: 16384, local memory: 32768, workgroup size: 256, Work dimensions: 3[1024, 1024, 1024, 0, 0] , Max clock speed:720, compute units:6
WARNING: Unknown GPU name, assuming GCN. Please post the device name "Spectre (Advanced Micro Devices, Inc.)" to http://www.mersenneforum.org/showthread.php?t=15646 to have it added to mfakto. Set GPUType in mfakto.ini to select a GPU type yourself to avoid this warning.


OpenCL device info
name Spectre (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 1.2 AMD-APP (2639.3) (2639.3)
maximum threads per block 1024
maximum threads per grid 1073741824
number of multiprocessors 6 (384 compute elements)
clock rate 720MHz
[/code]

SELROC 2019-01-28 12:31

[QUOTE=cbug;506732]I am playing a little with mfakto and got this message with my AMD A8 7600.
...[/QUOTE]

If performance is low, run this command:
[CODE]
./mfakto -d 2 --perftest 1
[/CODE]

It will adjust the parameters for your GPU.

ixfd64 2019-03-22 19:01

I might be getting access to a Mac Pro soon. Will mfakto work on a Windows virtual machine?

chalsall 2019-03-22 20:07

[QUOTE=ixfd64;511437]I might be getting access to a Mac Pro soon. Will mfakto work on a Windows virtual machine?[/QUOTE]

Probably not optimally.

But why don't you give it a go, and determine empirically? :smile:

ixfd64 2019-03-22 23:14

I just got the computer and installed mfakto on a Windows virtual machine. However, mfakto doesn't run and immediately crashes with a generic error code. I did some research and found some sources saying macOS doesn't support GPU passthrough. It seems that is still the case. :\

Currently I'm trying to compile mfakto myself on the Mac Pro. I saw a few posts from elsewhere saying that macOS doesn't need the AMD APP SDK as it contains a native OpenCL implementation, but the mfakto makefile specifically references the SDK directories. I copied the SDK from a Linux system but got a bunch of syntax errors. Do I even need the SDK to build mfakto?

I did see a post from David (airsquirrels) saying that he managed to build mfakto on macOS after modifying the OpenCL kernels, but he hasn't been on this forum in a while.

kriesel 2019-03-23 17:08

[QUOTE=M344587487;506054]Has anyone had any luck running mfakto on the Mali GPUs found in ARM devices, or am I barking up the wrong tree? After removing the x86-specific flag -m64 from the Makefile and pointing to the right include and lib directories, mfakto compiled but failed to compile the OpenCL kernels at runtime:
...[code]
Select device - Get device info:
WARNING: Unknown GPU name, assuming GCN. Please post the device name "Mali-T860 (ARM)" to [URL]http://www.mersenneforum.org/showthread.php?t=15646[/URL] to have it added to mfakto. Set GPUType in mfakto.ini to select a GPU type yourself to avoid this warning.

OpenCL device info
name Mali-T860 (ARM)
device (driver) version OpenCL 1.2 v1.r13p0-00rel0-git(a4271c9).31ba04af2d3c01618138bef3aed66c2c (1.2)
maximum threads per block 256
maximum threads per grid 16777216
number of multiprocessors 4 (256 compute elements)
clock rate 200MHz

Automatic parameters
threads per grid 0
optimizing kernels for GCN

END OF BUILD OUTPUT
Error -11 (Build program failure): clBuildProgram
ERROR: load_kernels(0) failed[/code]
The environment can compile and successfully run a Hello World OpenCL program that uses a kernel. All I've tried so far is manually setting every GPUType in mfakto.ini. The kernel build errors at every instance of a ternary operator. I tried replacing one with an if statement, but I'm fumbling as I know zero about OpenCL kernels, and of course it didn't work. Any thoughts?[/QUOTE]
I think you will likely need to do something similar to what BDot did to create the Intel GPU type, which allows mfakto to run on Intel IGPs. Perhaps BDot could chime in with some guidance.
Consider independently confirming that OpenCL is present and working on the device, finding out what its capabilities are, and testing your build process with some available sample programs. Perhaps there's something here you can use: [URL="https://www.cnx-software.com/2018/05/13/how-to-get-started-with-opencl-on-odroid-xu4-board-with-arm-mali-t628mp6-gpu/"]https://www.cnx-software.com/2018/05/13/how-to-get-started-with-opencl-on-odroid-xu4-board-with-arm-mali-t628mp6-gpu/[/URL]
The mfakto header info is encouraging: it shows that OpenCL is there and is reading a bit about the Mali device.
I think you may be one of the first explorers in this new territory, as far as GIMPS goes. Good luck!

M344587487 2019-03-24 23:26

Do these seem like reasonable TF --perftest results for a Radeon VII? Just used --perftest on a single instance, don't know if that's how it should be done or how to compare them to the GPUs here: [URL]https://www.mersenne.ca/mfaktc.php?show=642[/URL]

[code]WARNING: Unknown GPU name, assuming GCN. Please post the device name "gfx906 (Advanced Micro Devices, Inc.)" to http://www.mersenneforum.org/showthread.php?t=15646 to have it added to mfakto. Set GPUType in mfakto.ini to select a GPU type yourself to avoid this warning.

5. GPU tf kernels

exponent=2000093 ... calibrating
exponent=2000093, 24575M FCs each, k=73783545359978, 29.889569 GHz-days (assignment), 0.050239 GHz-days (per test): ..............
cl_barrett32_76_gs [64-76]: 1893.28 ms ==> 13611.22M FCs/s ==> 2292.67 GHz-days/day
cl_barrett32_77_gs [64-77]: 1925.87 ms ==> 13380.87M FCs/s ==> 2253.87 GHz-days/day
cl_barrett15_69_gs [60-69]: 2082.08 ms ==> 12376.98M FCs/s ==> 2084.77 GHz-days/day
cl_barrett15_70_gs [60-69]: 2084.00 ms ==> 12365.52M FCs/s ==> 2082.85 GHz-days/day
cl_barrett32_87_gs [65-87]: 2195.95 ms ==> 11735.12M FCs/s ==> 1976.66 GHz-days/day
cl_barrett32_79_gs [64-79]: 2224.91 ms ==> 11582.40M FCs/s ==> 1950.94 GHz-days/day
cl_barrett32_88_gs [65-88]: 2230.25 ms ==> 11554.69M FCs/s ==> 1946.27 GHz-days/day
cl_barrett15_71_gs [60-70]: 2418.11 ms ==> 10656.98M FCs/s ==> 1795.06 GHz-days/day
cl_barrett15_73_gs [60-73]: 2508.72 ms ==> 10272.09M FCs/s ==> 1730.23 GHz-days/day
cl_barrett32_92_gs [65-92]: 2528.90 ms ==> 10190.14M FCs/s ==> 1716.42 GHz-days/day
cl_barrett15_74_gs [60-74]: 2553.34 ms ==> 10092.60M FCs/s ==> 1700.00 GHz-days/day
cl_barrett15_82_gs [60-81]: 2810.59 ms ==> 9168.81M FCs/s ==> 1544.39 GHz-days/day
cl_barrett15_83_gs [60-82]: 3277.90 ms ==> 7861.68M FCs/s ==> 1324.22 GHz-days/day
cl_barrett15_88_gs [60-87]: 3304.08 ms ==> 7799.38M FCs/s ==> 1313.73 GHz-days/day

Resulting speed for M2000093:
bit_min - bit_max GHz-days/day kernelname
60 - 64 2084.775 cl_barrett15_69_gs
64 - 76 2292.670 cl_barrett32_76_gs
76 - 77 2253.871 cl_barrett32_77_gs
77 - 87 1976.661 cl_barrett32_87_gs
87 - 88 1946.269 cl_barrett32_88_gs
88 - 92 1716.425 cl_barrett32_92_gs

exponent=39000037 ... calibrating
exponent=39000037, 24575M FCs each, k=3783943912403, 1.532868 GHz-days (assignment), 0.050239 GHz-days (per test): ..............
cl_barrett32_76_gs [64-76]: 2383.45 ms ==> 10811.99M FCs/s ==> 1821.17 GHz-days/day
cl_barrett32_77_gs [64-77]: 2418.97 ms ==> 10653.20M FCs/s ==> 1794.42 GHz-days/day
cl_barrett15_69_gs [60-69]: 2648.55 ms ==> 9729.77M FCs/s ==> 1638.88 GHz-days/day
cl_barrett15_70_gs [60-69]: 2649.91 ms ==> 9724.80M FCs/s ==> 1638.04 GHz-days/day
cl_barrett32_87_gs [65-87]: 2789.28 ms ==> 9238.87M FCs/s ==> 1556.19 GHz-days/day
cl_barrett32_79_gs [64-79]: 2832.18 ms ==> 9098.94M FCs/s ==> 1532.62 GHz-days/day
cl_barrett32_88_gs [65-88]: 2844.32 ms ==> 9060.11M FCs/s ==> 1526.08 GHz-days/day
cl_barrett15_71_gs [60-70]: 3122.32 ms ==> 8253.41M FCs/s ==> 1390.20 GHz-days/day
cl_barrett15_73_gs [60-73]: 3237.86 ms ==> 7958.90M FCs/s ==> 1340.60 GHz-days/day
cl_barrett32_92_gs [65-92]: 3242.44 ms ==> 7947.66M FCs/s ==> 1338.70 GHz-days/day
cl_barrett15_74_gs [60-74]: 3298.67 ms ==> 7812.17M FCs/s ==> 1315.88 GHz-days/day
cl_barrett15_82_gs [60-81]: 3606.80 ms ==> 7144.78M FCs/s ==> 1203.46 GHz-days/day
cl_barrett15_83_gs [60-82]: 4262.84 ms ==> 6045.22M FCs/s ==> 1018.26 GHz-days/day
cl_barrett15_88_gs [60-87]: 4287.53 ms ==> 6010.40M FCs/s ==> 1012.39 GHz-days/day

Resulting speed for M39000037:
bit_min - bit_max GHz-days/day kernelname
60 - 64 1638.880 cl_barrett15_69_gs
64 - 76 1821.170 cl_barrett32_76_gs
76 - 77 1794.423 cl_barrett32_77_gs
77 - 87 1556.193 cl_barrett32_87_gs
87 - 88 1526.083 cl_barrett32_88_gs
88 - 92 1338.702 cl_barrett32_92_gs

exponent=66362159 ... calibrating
exponent=66362159, 24575M FCs each, k=2223766598517, 0.900843 GHz-days (assignment), 0.050239 GHz-days (per test): ..............
cl_barrett32_76_gs [64-76]: 2383.68 ms ==> 10810.92M FCs/s ==> 1820.99 GHz-days/day
cl_barrett32_77_gs [64-77]: 2419.53 ms ==> 10650.73M FCs/s ==> 1794.01 GHz-days/day
cl_barrett15_69_gs [60-69]: 2649.23 ms ==> 9727.28M FCs/s ==> 1638.46 GHz-days/day
cl_barrett15_70_gs [60-69]: 2650.28 ms ==> 9723.44M FCs/s ==> 1637.81 GHz-days/day
cl_barrett32_87_gs [65-87]: 2789.70 ms ==> 9237.50M FCs/s ==> 1555.96 GHz-days/day
cl_barrett32_79_gs [64-79]: 2832.34 ms ==> 9098.41M FCs/s ==> 1532.53 GHz-days/day
cl_barrett32_88_gs [65-88]: 2844.57 ms ==> 9059.31M FCs/s ==> 1525.95 GHz-days/day
cl_barrett15_71_gs [60-70]: 3122.16 ms ==> 8253.83M FCs/s ==> 1390.27 GHz-days/day
cl_barrett15_73_gs [60-73]: 3238.22 ms ==> 7958.02M FCs/s ==> 1340.45 GHz-days/day
cl_barrett32_92_gs [65-92]: 3242.91 ms ==> 7946.52M FCs/s ==> 1338.51 GHz-days/day
cl_barrett15_74_gs [60-74]: 3298.96 ms ==> 7811.48M FCs/s ==> 1315.76 GHz-days/day
cl_barrett15_82_gs [60-81]: 3606.81 ms ==> 7144.77M FCs/s ==> 1203.46 GHz-days/day
cl_barrett15_83_gs [60-82]: 4263.03 ms ==> 6044.94M FCs/s ==> 1018.21 GHz-days/day
cl_barrett15_88_gs [60-87]: 4287.52 ms ==> 6010.43M FCs/s ==> 1012.39 GHz-days/day

Resulting speed for M66362159:
bit_min - bit_max GHz-days/day kernelname
60 - 64 1638.461 cl_barrett15_69_gs
64 - 76 1820.989 cl_barrett32_76_gs
76 - 77 1794.006 cl_barrett32_77_gs
77 - 87 1555.962 cl_barrett32_87_gs
87 - 88 1525.948 cl_barrett32_88_gs
88 - 92 1338.509 cl_barrett32_92_gs

exponent=74000077 ... calibrating
exponent=74000077, 24575M FCs each, k=1994240527475, 0.807863 GHz-days (assignment), 0.050239 GHz-days (per test): ..............
cl_barrett32_76_gs [64-76]: 2475.67 ms ==> 10409.22M FCs/s ==> 1753.33 GHz-days/day
cl_barrett32_77_gs [64-77]: 2507.11 ms ==> 10278.71M FCs/s ==> 1731.34 GHz-days/day
cl_barrett15_69_gs [60-69]: 2743.33 ms ==> 9393.64M FCs/s ==> 1582.26 GHz-days/day
cl_barrett15_70_gs [60-69]: 2744.54 ms ==> 9389.47M FCs/s ==> 1581.56 GHz-days/day
cl_barrett32_87_gs [65-87]: 2902.57 ms ==> 8878.27M FCs/s ==> 1495.45 GHz-days/day
cl_barrett32_79_gs [64-79]: 2947.52 ms ==> 8742.89M FCs/s ==> 1472.65 GHz-days/day
cl_barrett32_88_gs [65-88]: 2951.63 ms ==> 8730.70M FCs/s ==> 1470.60 GHz-days/day
cl_barrett15_71_gs [60-70]: 3223.58 ms ==> 7994.16M FCs/s ==> 1346.53 GHz-days/day
cl_barrett15_73_gs [60-73]: 3365.31 ms ==> 7657.49M FCs/s ==> 1289.83 GHz-days/day
cl_barrett32_92_gs [65-92]: 3379.26 ms ==> 7625.87M FCs/s ==> 1284.50 GHz-days/day
cl_barrett15_74_gs [60-74]: 3428.84 ms ==> 7515.61M FCs/s ==> 1265.93 GHz-days/day
cl_barrett15_82_gs [60-81]: 3742.41 ms ==> 6885.89M FCs/s ==> 1159.86 GHz-days/day
cl_barrett15_83_gs [60-82]: 4411.98 ms ==> 5840.86M FCs/s ==> 983.83 GHz-days/day
cl_barrett15_88_gs [60-87]: 4461.24 ms ==> 5776.37M FCs/s ==> 972.97 GHz-days/day

Resulting speed for M74000077:
bit_min - bit_max GHz-days/day kernelname
60 - 64 1582.262 cl_barrett15_69_gs
64 - 76 1753.327 cl_barrett32_76_gs
76 - 77 1731.343 cl_barrett32_77_gs
77 - 87 1495.453 cl_barrett32_87_gs
87 - 88 1470.596 cl_barrett32_88_gs
88 - 92 1284.500 cl_barrett32_92_gs

exponent=78000071 ... calibrating
exponent=78000071, 24575M FCs each, k=1891972028970, 0.766434 GHz-days (assignment), 0.050239 GHz-days (per test): ..............
cl_barrett32_76_gs [64-76]: 2480.96 ms ==> 10387.04M FCs/s ==> 1749.59 GHz-days/day
cl_barrett32_77_gs [64-77]: 2520.91 ms ==> 10222.41M FCs/s ==> 1721.86 GHz-days/day
cl_barrett15_69_gs [60-69]: 2763.75 ms ==> 9324.21M FCs/s ==> 1570.57 GHz-days/day
cl_barrett15_70_gs [60-69]: 2764.86 ms ==> 9320.49M FCs/s ==> 1569.94 GHz-days/day
cl_barrett32_87_gs [65-87]: 2909.00 ms ==> 8858.66M FCs/s ==> 1492.15 GHz-days/day
cl_barrett32_79_gs [64-79]: 2953.57 ms ==> 8724.97M FCs/s ==> 1469.63 GHz-days/day
cl_barrett32_88_gs [65-88]: 2968.85 ms ==> 8680.05M FCs/s ==> 1462.07 GHz-days/day
cl_barrett15_71_gs [60-70]: 3267.20 ms ==> 7887.43M FCs/s ==> 1328.56 GHz-days/day
cl_barrett32_92_gs [65-92]: 3385.53 ms ==> 7611.74M FCs/s ==> 1282.12 GHz-days/day
cl_barrett15_73_gs [60-73]: 3386.59 ms ==> 7609.36M FCs/s ==> 1281.72 GHz-days/day
cl_barrett15_74_gs [60-74]: 3449.81 ms ==> 7469.92M FCs/s ==> 1258.23 GHz-days/day
cl_barrett15_82_gs [60-81]: 3768.71 ms ==> 6837.83M FCs/s ==> 1151.76 GHz-days/day
cl_barrett15_83_gs [60-82]: 4465.07 ms ==> 5771.43M FCs/s ==> 972.14 GHz-days/day
cl_barrett15_88_gs [60-87]: 4487.44 ms ==> 5742.66M FCs/s ==> 967.29 GHz-days/day

Resulting speed for M78000071:
bit_min - bit_max GHz-days/day kernelname
60 - 64 1570.567 cl_barrett15_69_gs
64 - 76 1749.590 cl_barrett32_76_gs
76 - 77 1721.861 cl_barrett32_77_gs
77 - 87 1492.150 cl_barrett32_87_gs
87 - 88 1462.066 cl_barrett32_88_gs
88 - 92 1282.119 cl_barrett32_92_gs

exponent=332900047 ... calibrating
exponent=332900047, 24575M FCs each, k=443298082771, 0.179579 GHz-days (assignment), 0.050239 GHz-days (per test): ..............
cl_barrett32_76_gs [64-76]: 2679.23 ms ==> 9618.36M FCs/s ==> 1620.11 GHz-days/day
cl_barrett32_77_gs [64-77]: 2723.72 ms ==> 9461.25M FCs/s ==> 1593.65 GHz-days/day
cl_barrett15_69_gs [60-69]: 2993.98 ms ==> 8607.20M FCs/s ==> 1449.79 GHz-days/day
cl_barrett15_70_gs [60-69]: 2995.49 ms ==> 8602.87M FCs/s ==> 1449.07 GHz-days/day
cl_barrett32_87_gs [65-87]: 3147.90 ms ==> 8186.34M FCs/s ==> 1378.91 GHz-days/day
cl_barrett32_79_gs [64-79]: 3196.20 ms ==> 8062.63M FCs/s ==> 1358.07 GHz-days/day
cl_barrett32_88_gs [65-88]: 3216.82 ms ==> 8010.95M FCs/s ==> 1349.36 GHz-days/day
cl_barrett15_71_gs [60-70]: 3558.08 ms ==> 7242.61M FCs/s ==> 1219.94 GHz-days/day
cl_barrett32_92_gs [65-92]: 3672.30 ms ==> 7017.35M FCs/s ==> 1182.00 GHz-days/day
cl_barrett15_73_gs [60-73]: 3683.84 ms ==> 6995.36M FCs/s ==> 1178.30 GHz-days/day
cl_barrett15_74_gs [60-74]: 3753.13 ms ==> 6866.22M FCs/s ==> 1156.54 GHz-days/day
cl_barrett15_82_gs [60-81]: 4093.19 ms ==> 6295.78M FCs/s ==> 1060.46 GHz-days/day
cl_barrett15_83_gs [60-82]: 4869.88 ms ==> 5291.67M FCs/s ==> 891.33 GHz-days/day
cl_barrett15_88_gs [60-87]: 4887.28 ms ==> 5272.83M FCs/s ==> 888.15 GHz-days/day

Resulting speed for M332900047:
bit_min - bit_max GHz-days/day kernelname
60 - 64 1449.794 cl_barrett15_69_gs
64 - 76 1620.114 cl_barrett32_76_gs
76 - 77 1593.650 cl_barrett32_77_gs
77 - 87 1378.906 cl_barrett32_87_gs
87 - 88 1349.362 cl_barrett32_88_gs
88 - 92 1182.000 cl_barrett32_92_gs

exponent=999900079 ... calibrating
exponent=999900079, 24575M FCs each, k=147588699800, 0.059788 GHz-days (assignment), 0.050239 GHz-days (per test): ..............
cl_barrett32_76_gs [64-76]: 2768.51 ms ==> 9308.17M FCs/s ==> 1567.87 GHz-days/day
cl_barrett32_77_gs [64-77]: 2804.79 ms ==> 9187.79M FCs/s ==> 1547.59 GHz-days/day
cl_barrett15_69_gs [60-69]: 3079.74 ms ==> 8367.54M FCs/s ==> 1409.43 GHz-days/day
cl_barrett15_70_gs [60-69]: 3080.84 ms ==> 8364.54M FCs/s ==> 1408.92 GHz-days/day
cl_barrett32_87_gs [65-87]: 3259.31 ms ==> 7906.51M FCs/s ==> 1331.77 GHz-days/day
cl_barrett32_79_gs [64-79]: 3307.28 ms ==> 7791.84M FCs/s ==> 1312.46 GHz-days/day
cl_barrett32_88_gs [65-88]: 3315.69 ms ==> 7772.08M FCs/s ==> 1309.13 GHz-days/day
cl_barrett15_71_gs [60-70]: 3639.54 ms ==> 7080.51M FCs/s ==> 1192.64 GHz-days/day
cl_barrett15_73_gs [60-73]: 3801.08 ms ==> 6779.60M FCs/s ==> 1141.95 GHz-days/day
cl_barrett32_92_gs [65-92]: 3806.62 ms ==> 6769.74M FCs/s ==> 1140.29 GHz-days/day
cl_barrett15_74_gs [60-74]: 3873.38 ms ==> 6653.06M FCs/s ==> 1120.64 GHz-days/day
cl_barrett15_82_gs [60-81]: 4216.73 ms ==> 6111.32M FCs/s ==> 1029.39 GHz-days/day
cl_barrett15_83_gs [60-82]: 4995.54 ms ==> 5158.56M FCs/s ==> 868.91 GHz-days/day
cl_barrett15_88_gs [60-87]: 5051.22 ms ==> 5101.70M FCs/s ==> 859.33 GHz-days/day

Resulting speed for M999900079:
bit_min - bit_max GHz-days/day kernelname
60 - 64 1409.426 cl_barrett15_69_gs
64 - 76 1567.866 cl_barrett32_76_gs
76 - 77 1547.589 cl_barrett32_77_gs
77 - 87 1331.771 cl_barrett32_87_gs
87 - 88 1309.127 cl_barrett32_88_gs
88 - 92 1140.294 cl_barrett32_92_gs

exponent=2001862367 ... calibrating
exponent=2001862367, 24575M FCs each, k=73718331001, 0.029863 GHz-days (assignment), 0.050239 GHz-days (per test): ..............
cl_barrett32_76_gs [64-76]: 2879.20 ms ==> 8950.32M FCs/s ==> 1507.59 GHz-days/day
cl_barrett32_77_gs [64-77]: 2934.62 ms ==> 8781.31M FCs/s ==> 1479.12 GHz-days/day
cl_barrett15_69_gs [60-69]: 3235.08 ms ==> 7965.74M FCs/s ==> 1341.75 GHz-days/day
cl_barrett15_70_gs [60-69]: 3237.09 ms ==> 7960.79M FCs/s ==> 1340.91 GHz-days/day
cl_barrett32_87_gs [65-87]: 3391.29 ms ==> 7598.82M FCs/s ==> 1279.94 GHz-days/day
cl_barrett32_79_gs [64-79]: 3441.93 ms ==> 7487.02M FCs/s ==> 1261.11 GHz-days/day
cl_barrett32_88_gs [65-88]: 3474.05 ms ==> 7417.80M FCs/s ==> 1249.45 GHz-days/day
cl_barrett15_71_gs [60-70]: 3871.34 ms ==> 6656.57M FCs/s ==> 1121.23 GHz-days/day
cl_barrett32_92_gs [65-92]: 3961.01 ms ==> 6505.87M FCs/s ==> 1095.85 GHz-days/day
cl_barrett15_73_gs [60-73]: 3991.74 ms ==> 6455.79M FCs/s ==> 1087.41 GHz-days/day
cl_barrett15_74_gs [60-74]: 4067.63 ms ==> 6335.34M FCs/s ==> 1067.12 GHz-days/day
cl_barrett15_82_gs [60-81]: 4430.71 ms ==> 5816.18M FCs/s ==> 979.68 GHz-days/day
cl_barrett15_88_gs [60-87]: 5301.90 ms ==> 4860.48M FCs/s ==> 818.70 GHz-days/day
cl_barrett15_83_gs [60-82]: 5302.55 ms ==> 4859.89M FCs/s ==> 818.60 GHz-days/day

Resulting speed for M2001862367:
bit_min - bit_max GHz-days/day kernelname
60 - 64 1341.748 cl_barrett15_69_gs
64 - 76 1507.590 cl_barrett32_76_gs
76 - 77 1479.122 cl_barrett32_77_gs
77 - 87 1279.943 cl_barrett32_87_gs
87 - 88 1249.452 cl_barrett32_88_gs
88 - 92 1095.847 cl_barrett32_92_gs

exponent=4201971233 ... calibrating
exponent=4201971233, 24575M FCs each, k=35120172035, 0.014227 GHz-days (assignment), 0.050239 GHz-days (per test): ..............
cl_barrett32_76_gs [64-76]: 2961.94 ms ==> 8700.30M FCs/s ==> 1465.48 GHz-days/day
cl_barrett32_77_gs [64-77]: 3001.67 ms ==> 8585.16M FCs/s ==> 1446.08 GHz-days/day
cl_barrett15_69_gs [60-69]: 3300.24 ms ==> 7808.48M FCs/s ==> 1315.26 GHz-days/day
cl_barrett15_70_gs [60-69]: 3301.97 ms ==> 7804.37M FCs/s ==> 1314.57 GHz-days/day
cl_barrett32_87_gs [65-87]: 3495.23 ms ==> 7372.85M FCs/s ==> 1241.88 GHz-days/day
cl_barrett32_79_gs [64-79]: 3548.02 ms ==> 7263.14M FCs/s ==> 1223.40 GHz-days/day
cl_barrett32_88_gs [65-88]: 3555.39 ms ==> 7248.09M FCs/s ==> 1220.87 GHz-days/day
cl_barrett15_71_gs [60-70]: 3909.34 ms ==> 6591.86M FCs/s ==> 1110.33 GHz-days/day
cl_barrett32_92_gs [65-92]: 4087.63 ms ==> 6304.34M FCs/s ==> 1061.90 GHz-days/day
cl_barrett15_73_gs [60-73]: 4088.04 ms ==> 6303.70M FCs/s ==> 1061.79 GHz-days/day
cl_barrett15_74_gs [60-74]: 4167.08 ms ==> 6184.14M FCs/s ==> 1041.65 GHz-days/day
cl_barrett15_82_gs [60-81]: 4528.72 ms ==> 5690.31M FCs/s ==> 958.47 GHz-days/day
cl_barrett15_83_gs [60-82]: 5375.41 ms ==> 4794.02M FCs/s ==> 807.50 GHz-days/day
cl_barrett15_88_gs [60-87]: 5438.51 ms ==> 4738.39M FCs/s ==> 798.13 GHz-days/day

Resulting speed for M4201971233:
bit_min - bit_max GHz-days/day kernelname
60 - 64 1315.258 cl_barrett15_69_gs
64 - 76 1465.477 cl_barrett32_76_gs
76 - 77 1446.082 cl_barrett32_77_gs
77 - 87 1241.881 cl_barrett32_87_gs
87 - 88 1220.867 cl_barrett32_88_gs
88 - 92 1061.902 cl_barrett32_92_gs [/code]
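
The figures on each kernel line appear to be related by simple arithmetic: the GHz-days/day column matches the per-test credit (0.050239 GHz-days) scaled from the measured run time to a full day. The sketch below reproduces that calculation. The function name [c]ghz_days_per_day[/c] is made up for illustration, and the formula is inferred from the numbers above, not taken from the mfakto source.

```c
#include <assert.h>

/* Hypothetical helper: scale the credit earned by one timed test to a
 * 24-hour rate. Formula inferred from the perftest output, not from mfakto. */
static double ghz_days_per_day(double ghz_days_per_test, double elapsed_ms)
{
    const double ms_per_day = 86400.0 * 1000.0; /* 86400 seconds in a day */
    assert(elapsed_ms > 0.0);
    return ghz_days_per_test * ms_per_day / elapsed_ms;
}
```

For example, [c]ghz_days_per_day(0.050239, 1893.28)[/c] comes to roughly 2292.7, in line with the 2292.67 GHz-days/day reported for cl_barrett32_76_gs on M2000093. The small difference is consistent with the per-test credit being printed to only six decimal places.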

ixfd64 2019-03-25 16:43

1 Attachment(s)
I got past the "unknown argument" errors by telling [c]make[/c] to use GCC instead of Clang. However, I'm getting a [I]ton[/I] of errors. Any idea how to resolve this?

I'm trying to compile on a Mac Pro.

ixfd64 2019-03-25 22:50

1 Attachment(s)
Some progress: I was able to compile mfakto for macOS after making an OS-specific makefile and adding macros to detect macOS systems. However, mfakto crashes with an error:

[CODE]OpenCL device info
name AMD Radeon HD - FirePro D700 Compute Engine (AMD)
device (driver) version OpenCL 1.2 (1.2 (Jun 29 2018 18:33:51))
maximum threads per block 256
maximum threads per grid 16777216
number of multiprocessors 32 (2048 compute elements)
clock rate 150MHz

Automatic parameters
threads per grid 0
optimizing kernels for GCN

Compiling kernels (build options: "-I. -DVECTOR_SIZE=2 -DGCN -O3 -DMORE_CLASSES -DCL_GPU_SIEVE").
BUILD OUTPUT

END OF BUILD OUTPUT
Error -43 (Invalid build options): clBuildProgram
ERROR: load_kernels(0) failed[/CODE]

Also, the program does not correctly detect the clock speed, which is supposed to be 850 MHz.

Any ideas?

[B]Update:[/B] I think I got the program to run. Hell yeah!

It turns out the [c]-O3[/c] flag isn't supported in this environment either. Disabling it resolved the [c]clBuildProgram[/c] error. mfakto still shows the wrong clock rate, but this doesn't seem to affect performance.
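
The option in question is visible in the log above: the build options string passed to [c]clBuildProgram[/c] contains [c]-O3[/c]. As an illustration of the kind of change involved, the sketch below drops one space-delimited option from such a string. [c]strip_option[/c] is a hypothetical helper, not a function from mfakto, and the actual change in this build may have been made differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `options` into `out`, skipping every
 * space-delimited token equal to `unwanted` (for example "-O3").
 * Returns 0 on success, -1 if `out` is too small. */
static int strip_option(const char *options, const char *unwanted,
                        char *out, size_t out_size)
{
    size_t used = 0;
    size_t unwanted_len = strlen(unwanted);
    const char *p = options;

    if (out_size == 0)
        return -1;

    while (*p != '\0') {
        size_t len;

        while (*p == ' ')            /* skip separators */
            p++;
        len = strcspn(p, " ");       /* length of the next token */
        if (len == 0)
            break;

        if (!(len == unwanted_len && strncmp(p, unwanted, len) == 0)) {
            size_t need = len + (used > 0 ? 1 : 0);
            if (used + need + 1 > out_size)
                return -1;           /* no room for token plus terminator */
            if (used > 0)
                out[used++] = ' ';   /* single space between kept tokens */
            memcpy(out + used, p, len);
            used += len;
        }
        p += len;
    }
    out[used] = '\0';
    return 0;
}
```

Applied to the build options shown in the log with [c]"-O3"[/c] as the unwanted token, this leaves [c]-I. -DVECTOR_SIZE=2 -DGCN -DMORE_CLASSES -DCL_GPU_SIEVE[/c]. Only exact token matches are removed, so an option such as [c]-O30[/c] would be kept.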

At any rate, attached is my macOS build. Please test it and let me know if it works. If there are no issues, I'll post the build instructions.

kriesel 2019-03-26 00:27

[QUOTE=ixfd64;511785]
It turns out the [c]-O3[/c] flag isn't supported in this environment either. Disabling it resolved the [c]clBuildProgram[/c] error. mfakto still shows the wrong clock rate, but this doesn't seem to affect performance.[/QUOTE]
No Mac here to test with.
The good news is -O3 would only affect the CPU side, a small fraction of the overall performance, since mfaktx is primarily a GPU application.


All times are UTC.