mersenneforum.org

mfakto: an OpenCL program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=15646)

kriesel 2018-01-21 19:38

NVIDIA GTX1070 OpenCL data as reported by GPU-Z
 
That certainly seems to explain the VectorSize=1 requirement that Bdot gave for mfakto on NVIDIA: GPU-Z reports a native and preferred vector width of 1 for every data type (N/A for HALF).
[CODE]General
Platform Name NVIDIA CUDA
Platform Vendor NVIDIA Corporation
Platform Profile FULL_PROFILE
Platform Version OpenCL 1.2 CUDA 8.0.0
Vendor NVIDIA Corporation
Device Name GeForce GTX 1070
Version OpenCL 1.2 CUDA
Driver Version 378.66
C Version OpenCL C 1.2
Profile FULL_PROFILE
Global Memory Size 8192 MB
Clock Frequency 1708 MHz
Compute Units 15
Device Available Yes
Compiler Available Yes
Linker Available Yes
Preferred Synchronization Device
CMD Queue Properties Out of Order, Profiling
SVM Capabilities Coarse
DP Capability Denorm, INF NAN, Round Nearest, Round Zero, Round INF, FMA
SP Capability Denorm, INF NAN, Round Nearest, Round Zero, Round INF, FMA
Half FP Capability None
Address Bits 64
Preferred On-Device Queue 256 KB
Global Memory Cache 240 KB (RW Cache)
Global Memory Cacheline 0 KB
Local Memory Local (48 KB)
Memory Alignment 4096 bits
Built-in Kernels
Little Endian Yes
Error Correction No
Execution Capability Kernel
Unified Memory No
Image Support Yes

Limits
Max Device Events 2048
Max Device Queues 4
Max On-Device Queue 256 KB
Max Memory Allocation 2048 MB
Max Constant Buffer 64 KB
Max Constant Args 9
Max Read Image Args 256
Max Write Image Args 16
Max Samplers 32
Max Work Item Dims 3

Native Vectors
Native Vector Width (CHAR) 1
Native Vector Width (SHORT) 1
Native Vector Width (INT) 1
Native Vector Width (LONG) 1
Native Vector Width (FLOAT) 1
Native Vector Width (DOUBLE) 1
Native Vector Width (HALF) N/A
Preferred Vector Width (CHAR) 1
Preferred Vector Width (SHORT) 1
Preferred Vector Width (INT) 1
Preferred Vector Width (LONG) 1
Preferred Vector Width (FLOAT) 1
Preferred Vector Width (DOUBLE) 1
Preferred Vector Width (HALF) N/A

Extensions
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_fp64
cl_khr_byte_addressable_store
cl_khr_icd
cl_khr_gl_sharing
cl_nv_compiler_options
cl_nv_device_attribute_query
cl_nv_pragma_unroll
cl_nv_d3d9_sharing
cl_nv_d3d10_sharing
cl_khr_d3d10_sharing
cl_nv_d3d11_sharing
cl_nv_copy_opts[/CODE]
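To check that reading against a dump like the one above, here is a minimal Python sketch that pulls the Native/Preferred Vector Width lines out of GPU-Z style text. The helper names `parse_vector_widths` and `all_scalar` are hypothetical, and the line layout is assumed to be exactly as pasted above; in a host program the same numbers would come from clGetDeviceInfo with CL_DEVICE_NATIVE_VECTOR_WIDTH_INT, CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT and the corresponding queries for the other types.

```python
import re

# Matches lines such as "Native Vector Width (INT) 1" or
# "Preferred Vector Width (HALF) N/A" (layout assumed from the GPU-Z paste).
_WIDTH_LINE = re.compile(r"^(Native|Preferred) Vector Width \((\w+)\)\s+(\S+)$")


def parse_vector_widths(dump: str) -> dict:
    """Return {(kind, type): width}, using None for types reported as N/A."""
    widths = {}
    for line in dump.splitlines():
        match = _WIDTH_LINE.match(line.strip())
        if match:
            kind, type_name, value = match.groups()
            widths[(kind, type_name)] = None if value == "N/A" else int(value)
    return widths


def all_scalar(widths: dict) -> bool:
    """True if at least one width was reported and every reported width is 1."""
    reported = [w for w in widths.values() if w is not None]
    return bool(reported) and all(w == 1 for w in reported)


if __name__ == "__main__":
    sample = """Native Vector Width (INT) 1
Native Vector Width (HALF) N/A
Preferred Vector Width (INT) 1
Preferred Vector Width (LONG) 1"""
    widths = parse_vector_widths(sample)
    print(widths[("Native", "INT")], all_scalar(widths))  # prints: 1 True
```

A device where `all_scalar` is true has no wider native integer vectors to map a VectorSize above 1 onto, which fits the observation above; whether wider settings actually run slower is a separate, empirical question.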

kriesel 2018-01-21 20:54

[QUOTE=xx005fs;478001]Don't really think DP performance matters that much for GpuOwL, because for my GPU, overclocking the HBM memory makes it a lot faster and more efficient. Not 100% sure why, though.[/QUOTE]

Interesting. I read the GpuOwL author's post on the new features in v1.5 as saying that DP performance is very important for GpuOwL; it sounds to me like the DP transform is the best of the four transforms implemented, being both fast enough and providing enough bits of precision to be worth using at 4M FFT length and above: [URL]http://www.mersenneforum.org/showpost.php?p=471318&postcount=224[/URL]

xx005fs 2018-01-22 01:40

[QUOTE=kriesel;478058]Interesting. I interpret the GpuOwL author's post regarding new features in v1.5 to say that DP performance is very important for GpuOwL; sounds to me like it's the best one of the four transforms implemented. That's both sufficiently fast and provides sufficient bits of precision to be worth using at 4M length and above: [URL]http://www.mersenneforum.org/showpost.php?p=471318&postcount=224[/URL]
[/QUOTE]

Double precision is definitely important, but memory speed (at least on a Vega card) is just as important. With the HBM at 1190 MHz, raising the core clock from 1400 to 1700 MHz cut the iteration time from 3.6 ms/it to 3.2 ms/it; however, with the core at 1700 MHz and the HBM at 800 MHz, it rose to 4 ms/it. So I guess they are about equally important.
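One way to put those two measurements on the same footing is to ask what fraction of each clock change shows up as a change in speed. The Python sketch below uses a log-log ratio for that; the metric is my choice rather than something from the post, and it assumes the 3.2 ms/it figure (1700 MHz core, 1190 MHz HBM) is the baseline for the HBM comparison.

```python
import math


def scaling_exponent(old_clock, new_clock, old_ms_per_it, new_ms_per_it):
    """Fraction of a clock change that appears as a speed change, on a log scale.

    1.0 means speed tracks the clock exactly; 0.0 means the clock has no effect.
    Speed is taken as the inverse of the time per iteration.
    """
    speed_ratio = old_ms_per_it / new_ms_per_it
    clock_ratio = new_clock / old_clock
    return math.log(speed_ratio) / math.log(clock_ratio)


# Core clock 1400 -> 1700 MHz with the HBM at 1190 MHz: 3.6 -> 3.2 ms/it.
core = scaling_exponent(1400, 1700, 3.6, 3.2)
# HBM 1190 -> 800 MHz with the core at 1700 MHz: 3.2 -> 4.0 ms/it.
hbm = scaling_exponent(1190, 800, 3.2, 4.0)

print(f"core: {core:.2f}, HBM: {hbm:.2f}")  # prints: core: 0.61, HBM: 0.56
```

Both come out near 0.6, which is consistent with the conclusion that core and memory clocks matter about equally on this card, although two data points are thin evidence.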

SELROC 2018-02-28 19:50

mfakto compilation on debian
 
Hello, I am trying to compile mfakto on Debian Stretch. It gives a great many errors. I can post the compiler output if necessary, but I would like to know whether you have any suggestions to try first.

SELROC

ixfd64 2018-03-05 21:45

Has anyone else tried using WINE to run mfakto on macOS?

I'm getting the following error:

[QUOTE]Compiling kernels.
Error 002a:fixme:msvcp:_Locinfo__Locinfo_ctor_cat_cstr (0x22f2a8 1 C) semi-stub
002a:fixme:msvcp:_Locinfo__Locinfo_ctor_cat_cstr (0x22eeb8 1 C) semi-stub
-43 (Invalid build options): clBuildProgram
ERROR: load_kernels(0) failed[/QUOTE]
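In that output, the "Error" on the first line and the "-43 (Invalid build options): clBuildProgram" two lines later appear to be a single mfakto message split by Wine's fixme lines. The number is a standard OpenCL status code: -43 is CL_INVALID_BUILD_OPTIONS, which clBuildProgram returns when the build options passed to it are invalid. A small Python lookup like the one below can decode such lines; the table holds only a few codes from the OpenCL headers, and the message format is assumed from the quote above.

```python
import re

# A few status codes from the OpenCL headers (CL/cl.h); not a complete list.
OPENCL_STATUS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -11: "CL_BUILD_PROGRAM_FAILURE",
    -30: "CL_INVALID_VALUE",
    -33: "CL_INVALID_DEVICE",
    -42: "CL_INVALID_BINARY",
    -43: "CL_INVALID_BUILD_OPTIONS",
    -44: "CL_INVALID_PROGRAM",
}

# Assumed message shape, e.g. "-43 (Invalid build options): clBuildProgram".
_ERROR_LINE = re.compile(r"(-?\d+) \(([^)]*)\): (\w+)")


def decode_error_line(line: str):
    """Return (code, symbolic name, failing API call), or None if no match."""
    match = _ERROR_LINE.search(line)
    if not match:
        return None
    code = int(match.group(1))
    return code, OPENCL_STATUS.get(code, "unknown"), match.group(3)


print(decode_error_line("-43 (Invalid build options): clBuildProgram"))
# prints: (-43, 'CL_INVALID_BUILD_OPTIONS', 'clBuildProgram')
```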

henryzz 2018-03-05 23:22

Does OpenCL ever work in WINE?

kriesel 2018-04-27 03:24

mfakto downshift upon finding a factor
 
Does anyone have an idea why mfakto's indicated throughput drops immediately upon finding a factor? It's 3 for 3 on a new RX550 running mfakto 0.15pre6, 64-bit, on Win7, which passed the full selftest: 183 GHz-d/day before, 89 after. The drop is persistent, continuing for several hours, and is more than 2:1. The ETA seems unaffected, so maybe it's only a cosmetic effect. (In the first example below, which is the most recent, the exponent, factor and bits were changed, since the result hasn't been submitted yet; the bit level hasn't completed.)

[CODE]Apr 26 12:34 | 1785 38.8% | 512.54 3d11h | 183.47 38299 0.00%
Apr 26 12:42 | 1792 38.9% | 513.40 3d11h | 183.16 38299 0.00%
M1234567 has a factor: 123456789012134567 (72.843305 bits, 504.966873 GHz-d)
Apr 26 12:51 | 1801 39.0% | 511.66 3d11h | 88.82 38299 0.00%
Apr 26 12:59 | 1809 39.1% | 513.38 3d11h | 88.52 38299 0.00%[/CODE]
It also seems to clear up on completion of a worktodo line, on a restart, or perhaps on completion of a bit level.
Ctrl-C and a restart cleared it up; the stop/start and short selftest cost about 15 minutes of throughput.
[CODE]Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Apr 26 22:28 | 2116 45.9% | 513.02 3d01h | 183.29 38299 0.00%[/CODE]
I do not recall seeing such an effect in mfaktc, which I've run much more than mfakto, and a search through a 60 MB sample of mfaktc screen output logged to file shows nothing like it.

Here's another mfakto example, with an indicated drop of more than 15:1:
[CODE]Apr 13 21:01 | 264 5.8% | 456.00 4d18h | 212.08 10045 0.00%
Apr 13 21:08 | 267 5.9% | 453.66 4d17h | 213.18 10045 0.00%
Apr 13 21:16 | 271 6.0% | 453.81 4d17h | 213.11 10045 0.00%
M111269 has a factor: 617778664352573195639 (69.065652 bits, 70.546139 GHz-d)
Apr 13 21:24 | 276 6.1% | 457.79 4d18h | 13.87 10045 0.00%
Apr 13 21:31 | 280 6.3% | 459.59 4d18h | 13.81 10045 0.00%
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Apr 13 21:39 | 291 6.4% | 459.27 4d18h | 13.82 10045 0.00%
[/CODE]
Another example, which persisted until this exponent's bit level completed and cleared up when mfakto went on to the next worktodo entry:
[CODE]Apr 13 11:11 | 3555 77.2% | 42.900 2h36m | 110.71 63018 0.00%
Apr 13 11:12 | 3564 77.3% | 44.406 2h41m | 106.96 63018 0.00%
Apr 13 11:13 | 3567 77.4% | 43.091 2h35m | 110.22 63018 0.00%
M290001377 has a factor: 96303240212210144213599 (76.350002 bits, 18.470592 GHz-d)
Apr 13 11:14 | 3568 77.5% | 44.628 2h40m | 37.25 63018 0.00%
Apr 13 11:14 | 3579 77.6% | 44.089 2h37m | 37.70 63018 0.00%
Apr 13 11:15 | 3583 77.7% | 43.331 2h34m | 38.36 63018 0.00%
[/CODE]
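A quick way to test the "cosmetic" idea is to compare, for each example, the last status line before the factor with the first line after it: if the time column (which appears to be seconds per class, judging by the ETAs) barely moves while GHz-d/day collapses, the work rate is unchanged and only the displayed figure is off. Below is a minimal Python sketch of that comparison, assuming the column layout of the "Date Time | class Pct | time ETA | GHz-d/day Sieve Wait" header above; `parse_status` and `compare` are hypothetical helper names.

```python
def parse_status(line: str):
    """Return (time per class in seconds, GHz-d/day) from one mfakto status line."""
    fields = line.split("|")
    seconds_per_class = float(fields[2].split()[0])
    ghzd_per_day = float(fields[3].split()[0])
    return seconds_per_class, ghzd_per_day


def compare(before: str, after: str):
    """Return (time per class after/before, GHz-d/day before/after)."""
    t0, r0 = parse_status(before)
    t1, r1 = parse_status(after)
    return t1 / t0, r0 / r1


# Last status line before, and first line after, each "has a factor" line above.
examples = [
    ("Apr 26 12:42 | 1792 38.9% | 513.40 3d11h | 183.16 38299 0.00%",
     "Apr 26 12:51 | 1801 39.0% | 511.66 3d11h | 88.82 38299 0.00%"),
    ("Apr 13 21:16 | 271 6.0% | 453.81 4d17h | 213.11 10045 0.00%",
     "Apr 13 21:24 | 276 6.1% | 457.79 4d18h | 13.87 10045 0.00%"),
    ("Apr 13 11:13 | 3567 77.4% | 43.091 2h35m | 110.22 63018 0.00%",
     "Apr 13 11:14 | 3568 77.5% | 44.628 2h40m | 37.25 63018 0.00%"),
]

for before, after in examples:
    time_ratio, rate_drop = compare(before, after)
    print(f"time/class x{time_ratio:.3f}, GHz-d/day down {rate_drop:.1f}:1")
```

For these three pairs the time per class changes by less than 4% while the indicated rate falls roughly 2:1, 15:1 and 3:1, which fits the impression that only the displayed rate is affected.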

kriesel 2018-05-31 04:27

Reference material
 
I was offered "a blog area to consolidate all of your pdfs and guides and stuff" and accepted.
Feel free to have a look and suggest content. (G-rated only;)
General-interest GPU-related reference material: [URL]http://www.mersenneforum.org/showthread.php?t=23371[/URL]
Mfakto: OpenCL-based factoring on GPUs: [URL]http://www.mersenneforum.org/showthread.php?t=23394[/URL]

Future updates to material previously posted in this thread will probably be made in the blog threads, not here. Being able to update posts in place, with no time limit, makes it more manageable there.

James Heinrich 2018-06-23 23:04

Just as a matter of curiosity:
Both mfakto and mfaktc limit the exponent to less than 2[sup]32[/sup]. That's an obvious limit point, but how arbitrary or absolute is that limit? In the (probably distant) future, how easy or hard would it be to extend mfakto to higher exponents?
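For context on where the exponent enters the work: trial factoring tests candidates q = 2kp + 1 (which must also be ±1 mod 8), and q divides M(p) = 2^p - 1 exactly when 2^p mod q = 1. The Python sketch below shows that test with a square-and-multiply loop over the bits of p. It is not mfakto's code and the function names are hypothetical; Python's integers are unbounded, so the sketch says nothing about the fixed-width integer arithmetic the GPU kernels depend on.

```python
def pow2_mod(p: int, q: int) -> int:
    """Compute 2**p mod q by left-to-right square-and-multiply over the bits of p."""
    result = 1
    for bit in bin(p)[2:]:
        result = result * result % q
        if bit == "1":
            result = result * 2 % q
    return result


def trial_factor(p: int, k_max: int) -> list:
    """Return the factors q = 2kp + 1 of 2**p - 1 for k = 1..k_max."""
    factors = []
    for k in range(1, k_max + 1):
        q = 2 * k * p + 1
        if q % 8 not in (1, 7):  # factors of Mersenne numbers are +-1 mod 8
            continue
        if pow2_mod(p, q) == 1:
            factors.append(q)
    return factors


print(trial_factor(11, 10))  # prints: [23, 89]
```

Every step of the loop runs once per bit of p, and every candidate q is built from p, so widening p beyond 32 bits touches both the exponentiation and the candidate generation.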

kriesel 2018-06-24 21:52

[QUOTE=James Heinrich;490385]Just as a matter of curiosity:
Both mfakto and mfaktc have a limit of exponent < 2[sup]32[/sup]. That's an obvious limit point, but how arbitrary or absolute is that limit? In the (probably distant) future, how easy or hard is it to extend the capabilities of mfakto to higher exponents?[/QUOTE]

I looked only briefly; in the main routines it didn't seem too bad. But a small sample of the CUDA interface code shows various u32 instructions, so it probably has to be gone through from one end to the other, routine by routine and kernel by kernel, by someone who knows what they're doing. I nominate not-me.
While looking at prime95's P-1 code for other reasons, I noticed code for handling bound values larger than 2^32; you'll find it in ecm.c (which contains both the ECM and P-1 code).
There's certainly plenty of GPU trial factoring to do within the mersenne.org exponent cap of 10^9, let alone up to 2^32-5, which is more than 4.2 times higher.
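The 2^32-5 figure is the largest prime below 2^32, i.e. the largest prime exponent that fits in an unsigned 32-bit integer. A small Python check of that and of the "more than 4.2 times" ratio follows; the masking line only mimics what would happen if a larger exponent were forced into a 32-bit field, and is not taken from mfaktc or mfakto.

```python
def is_prime(n: int) -> bool:
    """Deterministic trial division; fast enough for n around 2**32."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def largest_prime_below(limit: int) -> int:
    """Return the largest prime strictly less than limit (limit > 2)."""
    n = limit - 1
    while not is_prime(n):
        n -= 1
    return n


U32_LIMIT = 2**32
p_max = largest_prime_below(U32_LIMIT)
print(p_max, U32_LIMIT - p_max)  # prints: 4294967291 5
print(p_max / 10**9)             # ratio to the mersenne.org cap of 10^9

# An exponent just above 2**32 silently wraps if stored in 32 bits.
p_big = U32_LIMIT + 15
print(p_big & 0xFFFFFFFF)        # prints: 15
```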

James Heinrich 2018-06-24 22:05

[QUOTE=kriesel;490452]I looked only briefly, and in the main routines it seemed not too bad. But a small sample of CUDA interface code shows various u32 instructions. So probably it has to be gone through from one end to the other by someone who knows what they're doing, routine by routine, kernel by kernel. I nominate not-me.[/QUOTE]Thanks for looking into it. That's kind of what I suspected. I also assume that rewritten code for larger exponents has the potential to be at least slightly slower.

Fortunately there are still a couple hundred million exponents below 2[sup]32[/sup] that need more TF'ing first, so it's not really a high-priority problem. :smile:


All times are UTC.