mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Factoring (https://www.mersenneforum.org/forumdisplay.php?f=19)
-   -   Faster GPU-ECM with CGBN (https://www.mersenneforum.org/showthread.php?t=27103)

WraithX 2021-08-28 22:06

[QUOTE=henryzz;586760]
Although with B1=20000 I still get:
[CODE]echo "(2^499-1)/20959" | ./ecm -gpu -cgbn -gpucurves 3584 -sigma 3:1000 20000
GMP-ECM 7.0.5-dev [configured with GMP 6.2.99, --enable-asm-redc, --enable-gpu, --enable-assert, --enable-openmp] [ECM]
Input number is (2^499-1)/20959 (146 digits)
Using B1=20000, B2=3804582, sigma=3:1000-3:4583 (3584 curves)
CUDA error (702) occurred: the launch timed out and was terminated
While running cudaDeviceSynchronize() (file cgbn_stage1.cu, line 731)[/CODE][/QUOTE]

What happens if you specify 0 for B2? Like this:
[CODE]echo "(2^499-1)/20959" | ./ecm -gpu -cgbn -gpucurves 3584 -sigma 3:1000 20000 0[/CODE]

henryzz 2021-08-28 22:40

[QUOTE=WraithX;586761]What happens if you specify 0 for B2? Like this:
[CODE]echo "(2^499-1)/20959" | ./ecm -gpu -cgbn -gpucurves 3584 -sigma 3:1000 20000 0[/CODE][/QUOTE]
The same thing.


If I run fewer curves at once it works. Possibly it's just that my GPU is pathetic (750 Ti):
[CODE]echo "(2^499-1)/20959" | ./ecm -gpu -cgbn -sigma 3:1000 20000
GMP-ECM 7.0.5-dev [configured with GMP 6.2.99, --enable-asm-redc, --enable-gpu, --enable-assert, --enable-openmp] [ECM]
Input number is (2^499-1)/20959 (146 digits)
Using B1=20000, B2=3804582, sigma=3:1000-3:1319 (320 curves)
Computing 320 Step 1 took 756ms of CPU time / 1269ms of GPU time
Computing 320 Step 2 on CPU took 7488ms[/CODE]

SethTro 2021-08-28 22:44

You might try changing this in cgbn_stage1.cu:

-#define S_BITS_PER_CALL 10000
+#define S_BITS_PER_CALL 1000

then running with -v, which might tell you when the GPU died (and might also prevent timeouts):

[CODE]
$ echo "(2^499-1)/20959" | ./ecm -v -gpu -cgbn -gpucurves 3584 -sigma 3:1000 20000 0
GMP-ECM 7.0.5-dev [configured with GMP 6.2.99, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
Input number is (2^499-1)/20959 (146 digits)
GPU: will use device 0: GeForce GTX 1080 Ti, compute capability 6.1, 28 MPs.
Using B1=20000, B2=0, sigma=3:1000-3:4583 (3584 curves)
Running CGBN<512,4> kernel<112,128> at bit 0/28820 (0.0%)...
Running CGBN<512,4> kernel<112,128> at bit 1000/28820 (3.5%)...
...
Running CGBN<512,4> kernel<112,128> at bit 27000/28820 (93.7%)...
Running CGBN<512,4> kernel<112,128> at bit 28000/28820 (97.2%)...
Copying results back to CPU ...
Computing 3584 Step 1 took 15ms of CPU time / 1105ms of GPU time
Throughput: 3244.848 curves per second (on average 0.31ms per Step 1)

[/CODE]
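
A minimal sketch of why a smaller S_BITS_PER_CALL can help, assuming (as the -v output suggests) that stage 1 is split into one kernel launch per S_BITS_PER_CALL bits. The helper names below are hypothetical and are not taken from cgbn_stage1.cu; the point is only that more, shorter launches each stay further away from the launch timeout reported as CUDA error 702.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from cgbn_stage1.cu): starting bit of each kernel
// launch when total_bits are processed bits_per_call at a time.
std::vector<uint32_t> chunk_starts(uint32_t total_bits, uint32_t bits_per_call) {
    assert(bits_per_call > 0);  // a zero chunk size would never make progress
    std::vector<uint32_t> starts;
    for (uint32_t bit = 0; bit < total_bits; bit += bits_per_call) {
        starts.push_back(bit);
    }
    return starts;
}

// Progress percentage in the style of the "at bit X/Y (Z%)" lines.
double percent_done(uint32_t bit, uint32_t total_bits) {
    assert(total_bits > 0);
    return 100.0 * static_cast<double>(bit) / static_cast<double>(total_bits);
}
```

With the 28820 bits shown in the log, `chunk_starts(28820, 10000)` gives 3 launches and `chunk_starts(28820, 1000)` gives 29, so each launch does roughly a tenth of the work.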

frmky 2021-08-28 22:50

[QUOTE=SethTro;586711]Glad you got a working binary! Would you mind measuring the speedup of echo "2^997-1" with -gpu vs -cgbn?[/QUOTE]
[CODE]$ echo "(2^997-1)" | ./ecm -gpu -sigma 3:1000 20000 0
GMP-ECM 7.0.5-dev [configured with GMP 6.2.1, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
Input number is (2^997-1) (301 digits)
Using B1=20000, B2=0, sigma=3:1000-3:6119 (5120 curves)
GPU: Block: 32x32x1 Grid: 160x1x1 (5120 parallel curves)
Computing 5120 Step 1 took 183ms of CPU time / 5364ms of GPU time

$ echo "(2^997-1)" | ./ecm -gpu -cgbn -sigma 3:1000 20000 0
GMP-ECM 7.0.5-dev [configured with GMP 6.2.1, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
Input number is (2^997-1) (301 digits)
Using B1=20000, B2=0, sigma=3:1000-3:6119 (5120 curves)
Computing 5120 Step 1 took 1284ms of CPU time / 3057ms of GPU time
[/CODE]
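
For reference, the GPU-time figures in the two runs above work out as follows (plain arithmetic on the reported numbers, nothing measured separately):

```latex
\[
\text{speedup} = \frac{5364\ \text{ms}}{3057\ \text{ms}} \approx 1.75,
\qquad
\frac{5120\ \text{curves}}{5.364\ \text{s}} \approx 955\ \text{curves/s}
\;\longrightarrow\;
\frac{5120\ \text{curves}}{3.057\ \text{s}} \approx 1675\ \text{curves/s}
\]
```

Note that the reported CPU time moves the other way, from 183 ms with -gpu to 1284 ms with -gpu -cgbn.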

I'll try the configure changes later. Overnight I ran 2560 stage-1 curves on the C201 blocking the aliquot sequence starting at 3366 using B1=85e7. I'm working through stage 2 on those now.

frmky 2021-08-28 23:23

Those changes to acinclude.m4 aren't enough. It still can't find gmp.h during the test compile. We need to add a -I for the gmp include directory. And that breaks the build since it's trying to include libgmp.a during compile.

henryzz 2021-08-29 07:08

Reducing S_BITS_PER_CALL has fixed it for me. Thank you 😀

Gimarel 2021-08-29 12:53

Current git fails for inputs near 512 bits. It seems that a condition is the wrong way round:[CODE]diff --git a/cgbn_stage1.cu b/cgbn_stage1.cu
index 1b512ecd..f67f8715 100644
--- a/cgbn_stage1.cu
+++ b/cgbn_stage1.cu
@@ -653,7 +653,7 @@ int run_cgbn(mpz_t *factors, int *array_stage_found,
#endif /* IS_DEV_BUILD */
for (int k_i = 0; k_i < available_kernels.size(); k_i++) {
uint32_t kernel_bits = available_kernels[k_i];
- if (kernel_bits + 6 >= mpz_sizeinbase(N, 2)) {
+ if (kernel_bits >= mpz_sizeinbase(N, 2) + 6) {
BITS = kernel_bits;
assert( BITS % 32 == 0 );
TPI = (BITS <= 512) ? 4 : (BITS <= 2048) ? 8 : (BITS <= 8192) ? 16 : 32;[/CODE]
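
A small sketch of what the two conditions do, assuming the loop picks the first kernel size that passes the test and that the kernel sizes are sorted in ascending order. The function names, the `fixed` flag, and the example kernel list {512, 1024, 2048} are illustrative assumptions, not the actual contents of cgbn_stage1.cu; only the two comparisons and the TPI expression are taken from the diff.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Extra bits of headroom the kernel needs beyond the size of N.
// The value 6 is from the diff; the name is made up for this sketch.
constexpr uint32_t kHeadroomBits = 6;

// Returns the first kernel size that passes the check, or 0 if none does.
// fixed == false uses the original comparison, fixed == true the patched one.
uint32_t pick_kernel_bits(const std::vector<uint32_t>& available_kernels,
                          uint32_t n_bits, bool fixed) {
    for (uint32_t kernel_bits : available_kernels) {
        const bool fits = fixed ? (kernel_bits >= n_bits + kHeadroomBits)
                                : (kernel_bits + kHeadroomBits >= n_bits);
        if (fits) {
            return kernel_bits;
        }
    }
    return 0;
}

// Threads per instance, following the TPI expression shown in the diff.
uint32_t threads_per_instance(uint32_t bits) {
    return (bits <= 512) ? 4 : (bits <= 2048) ? 8 : (bits <= 8192) ? 16 : 32;
}
```

For a 510-bit input the original comparison accepts the 512-bit kernel (512 + 6 >= 510), which leaves no headroom, while the patched one skips to 1024 bits. A 485-bit input such as (2^499-1)/20959 gets the 512-bit kernel either way, which matches the CGBN<512,4> lines in the log.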

SethTro 2021-08-29 22:55

[QUOTE=Gimarel;586805]Current git fails for inputs near 512 bits. It seems that a condition is the wrong way round:[CODE]diff --git a/cgbn_stage1.cu b/cgbn_stage1.cu
index 1b512ecd..f67f8715 100644
--- a/cgbn_stage1.cu
+++ b/cgbn_stage1.cu
@@ -653,7 +653,7 @@ int run_cgbn(mpz_t *factors, int *array_stage_found,
#endif /* IS_DEV_BUILD */
for (int k_i = 0; k_i < available_kernels.size(); k_i++) {
uint32_t kernel_bits = available_kernels[k_i];
- if (kernel_bits + 6 >= mpz_sizeinbase(N, 2)) {
+ if (kernel_bits >= mpz_sizeinbase(N, 2) + 6) {
BITS = kernel_bits;
assert( BITS % 32 == 0 );
TPI = (BITS <= 512) ? 4 : (BITS <= 2048) ? 8 : (BITS <= 8192) ? 16 : 32;[/CODE][/QUOTE]

Whoops, totally backwards, coding is hard :p I'll fix it tonight.
Thanks for testing.

chris2be8 2021-08-30 16:06

Has anyone checked ecm-cgbn can find factors? On my system with a sm_30 GPU I updated test.gpuecm to pass -cgbn to ecm. But it failed to find any factors when the test cases expected them to be found!

It is *probably* because sm_30 is too low for CGBN.

It will be a while before I can test my newer GPU. The system it's on is running an old version of Linux which doesn't support CUDA 9.0. (I've been working on an "if it works, don't fix it" basis since it's only used for computations.) Upgrading Linux will probably need a complete re-install, which I'll need to plan for a time when I don't need it for a few hours/days. And I'd be happier if I was sure CGBN would work once I got it installed.

SethTro 2021-08-30 18:51

[QUOTE=chris2be8;586868]Has anyone checked ecm-cgbn can find factors? On my system with a sm_30 GPU I updated test.gpuecm to pass -cgbn to ecm. But it failed to find any factors when the test cases expected them to be found!

It is *probably* because sm_30 is too low for CGBN.

It will be a while before I can test my newer GPU. The system it's on is running an old version of Linux which doesn't support CUDA 9.0. (I've been working on an "if it works, don't fix it" basis since it's only used for computations.) Upgrading Linux will probably need a complete re-install, which I'll need to plan for a time when I don't need it for a few hours/days. And I'd be happier if I was sure CGBN would work once I got it installed.[/QUOTE]

Yes, many of us have found the same test factor for (2^499-1)/20959, and I've verified several times that the residuals exactly match those produced by `-gpu`. I've also tested with `$ sage check_gpuecm.sage "./ecm -cgbn"`

frmky 2021-08-30 18:52

[QUOTE=chris2be8;586868]Has anyone checked ecm-cgbn can find factors?[/QUOTE]

Yes, test.gpuecm completes successfully both with and without -cgbn. I'm using a V100 with CUDA 11.3.


All times are UTC.