mersenneforum.org  

2021-08-28, 22:06   #23
WraithX
Mar 2006
2×3^5 Posts

Quote:
Originally Posted by henryzz
Although with B1=20000 I still get:
Code:
echo "(2^499-1)/20959" | ./ecm -gpu -cgbn -gpucurves 3584 -sigma 3:1000 20000
GMP-ECM 7.0.5-dev [configured with GMP 6.2.99, --enable-asm-redc, --enable-gpu, --enable-assert, --enable-openmp] [ECM]
Input number is (2^499-1)/20959 (146 digits)
Using B1=20000, B2=3804582, sigma=3:1000-3:4583 (3584 curves)
CUDA error (702) occurred: the launch timed out and was terminated
While running cudaDeviceSynchronize()   (file cgbn_stage1.cu, line 731)

What happens if you specify 0 for B2? Like this:
Code:
echo "(2^499-1)/20959" | ./ecm -gpu -cgbn -gpucurves 3584 -sigma 3:1000 20000 0

2021-08-28, 22:40   #24
henryzz
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
13441_8 Posts

Quote:
Originally Posted by WraithX
What happens if you specify 0 for B2? Like this:
Code:
echo "(2^499-1)/20959" | ./ecm -gpu -cgbn -gpucurves 3584 -sigma 3:1000 20000 0

The same thing.

If I run fewer curves at once, it works. Possibly it's just that my GPU is pathetic (750 Ti):
Code:
echo "(2^499-1)/20959" | ./ecm -gpu -cgbn -sigma 3:1000 20000
GMP-ECM 7.0.5-dev [configured with GMP 6.2.99, --enable-asm-redc, --enable-gpu, --enable-assert, --enable-openmp] [ECM]
Input number is (2^499-1)/20959 (146 digits)
Using B1=20000, B2=3804582, sigma=3:1000-3:1319 (320 curves)
Computing 320 Step 1 took 756ms of CPU time / 1269ms of GPU time
Computing 320 Step 2 on CPU took 7488ms

Last fiddled with by henryzz on 2021-08-28 at 22:42

2021-08-28, 22:44   #25
SethTro
"Seth"
Apr 2019
2^4·23 Posts

You might try making this change in cgbn_stage1.cu:

-#define S_BITS_PER_CALL 10000
+#define S_BITS_PER_CALL 1000

and then running with -v, which might tell you when the GPU died (and might also prevent timeouts, presumably because each kernel call then processes fewer bits).

Code:
$ echo "(2^499-1)/20959" | ./ecm -v -gpu -cgbn -gpucurves 3584 -sigma 3:1000 20000 0
GMP-ECM 7.0.5-dev [configured with GMP 6.2.99, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
Input number is (2^499-1)/20959 (146 digits)
GPU: will use device 0: GeForce GTX 1080 Ti, compute capability 6.1, 28 MPs.
Using B1=20000, B2=0, sigma=3:1000-3:4583 (3584 curves)
Running CGBN<512,4> kernel<112,128> at bit 0/28820 (0.0%)...
Running CGBN<512,4> kernel<112,128> at bit 1000/28820 (3.5%)...
...
Running CGBN<512,4> kernel<112,128> at bit 27000/28820 (93.7%)...
Running CGBN<512,4> kernel<112,128> at bit 28000/28820 (97.2%)...
Copying results back to CPU ...
Computing 3584 Step 1 took 15ms of CPU time / 1105ms of GPU time
Throughput: 3244.848 curves per second (on average 0.31ms per Step 1)
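
To make the effect of S_BITS_PER_CALL concrete, here is a minimal host-side sketch of splitting the stage-1 bit string into batches, one per kernel call. It is not the actual GMP-ECM code: the `Batch` struct and `plan_batches` function are hypothetical names, and only the 28820-bit total and the 1000-bit step are taken from the -v output above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical illustration only: one entry per kernel call.
struct Batch {
    uint64_t start_bit;  // first bit handled by this call
    uint64_t num_bits;   // number of bits handled by this call
};

// Split total_bits into consecutive batches of at most bits_per_call bits.
// Smaller batches mean more, but shorter, kernel calls.
std::vector<Batch> plan_batches(uint64_t total_bits, uint64_t bits_per_call) {
    assert(bits_per_call > 0);
    std::vector<Batch> batches;
    for (uint64_t start = 0; start < total_bits; start += bits_per_call) {
        uint64_t n = std::min(bits_per_call, total_bits - start);
        batches.push_back({start, n});
    }
    return batches;
}
```

With 28820 bits, a batch size of 1000 gives 29 calls (the last one starting at bit 28000 and covering 820 bits), while a batch size of 10000 gives only 3 much longer calls. A shorter call is less likely to hit a display watchdog limit, which is one plausible reading of the "launch timed out" error.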

2021-08-28, 22:50   #26
frmky
Jul 2003
So Cal
892_16 Posts

Quote:
Originally Posted by SethTro
Glad you got a working binary! Would you mind measuring the speedup of echo "2^997-1" with -gpu vs -cgbn?
Code:
$ echo "(2^997-1)" | ./ecm -gpu -sigma 3:1000 20000 0
GMP-ECM 7.0.5-dev [configured with GMP 6.2.1, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
Input number is (2^997-1) (301 digits)
Using B1=20000, B2=0, sigma=3:1000-3:6119 (5120 curves)
GPU: Block: 32x32x1 Grid: 160x1x1 (5120 parallel curves)
Computing 5120 Step 1 took 183ms of CPU time / 5364ms of GPU time

$ echo "(2^997-1)" | ./ecm -gpu -cgbn -sigma 3:1000 20000 0
GMP-ECM 7.0.5-dev [configured with GMP 6.2.1, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
Input number is (2^997-1) (301 digits)
Using B1=20000, B2=0, sigma=3:1000-3:6119 (5120 curves)
Computing 5120 Step 1 took 1284ms of CPU time / 3057ms of GPU time
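
As a rough reading of the two runs above (same input, same 5120 curves, B1=20000, B2=0), the ratio of the reported GPU times is:

$$
\frac{5364\ \text{ms}}{3057\ \text{ms}} \approx 1.75
$$

So on this 301-digit input -cgbn is roughly 1.75 times faster in GPU time. The reported CPU time moves the other way (183 ms without -cgbn, 1284 ms with it).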

I'll try the configure changes later. Overnight I ran 2560 stage-1 curves on the C201 blocking the aliquot sequence starting at 3366 using B1=85e7. I'm working through stage 2 on those now.

2021-08-28, 23:23   #27
frmky
Jul 2003
So Cal
4222_8 Posts

Those changes to acinclude.m4 aren't enough: it still can't find gmp.h during the test compile. We need to add a -I for the GMP include directory, and that breaks the build since it's trying to include libgmp.a during compilation.

Last fiddled with by frmky on 2021-08-28 at 23:27

2021-08-29, 07:08   #28
henryzz
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
31·191 Posts

Reducing S_BITS_PER_CALL has fixed it for me. Thank you 😀

2021-08-29, 12:53   #29
Gimarel
Apr 2010
2^3·23 Posts

Current git fails for inputs near 512 bits. It seems that a condition is the wrong way round:
Code:
diff --git a/cgbn_stage1.cu b/cgbn_stage1.cu
index 1b512ecd..f67f8715 100644
--- a/cgbn_stage1.cu
+++ b/cgbn_stage1.cu
@@ -653,7 +653,7 @@ int run_cgbn(mpz_t *factors, int *array_stage_found,
 #endif /* IS_DEV_BUILD */
   for (int k_i = 0; k_i < available_kernels.size(); k_i++) {
     uint32_t kernel_bits = available_kernels[k_i];
-    if (kernel_bits + 6 >=  mpz_sizeinbase(N, 2)) {
+    if (kernel_bits >=  mpz_sizeinbase(N, 2) + 6) {
       BITS = kernel_bits;
       assert( BITS % 32 == 0 );
       TPI = (BITS <= 512) ? 4 : (BITS <= 2048) ? 8 : (BITS <= 8192) ? 16 : 32;
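
To see why the original comparison misbehaves near 512 bits, here is a small stand-alone sketch of the selection loop from the diff. Several things are assumptions for illustration: the kernel size list {512, 1024, 2048} (the real list is in cgbn_stage1.cu), the function names, and that the loop stops at the first kernel passing the check. The input size is passed as a plain bit count instead of calling mpz_sizeinbase(N, 2).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical kernel sizes, smallest first; the real list is in cgbn_stage1.cu.
static const std::vector<uint32_t> kKernels = {512, 1024, 2048};

// Original condition: accepts a kernel even when the input is up to
// 6 bits LARGER than the kernel.
uint32_t pick_kernel_old(uint32_t n_bits) {
    for (uint32_t kernel_bits : kKernels)
        if (kernel_bits + 6 >= n_bits) return kernel_bits;
    return 0;  // no kernel accepted
}

// Patched condition: requires 6 spare bits on top of the input size.
uint32_t pick_kernel_new(uint32_t n_bits) {
    for (uint32_t kernel_bits : kKernels)
        if (kernel_bits >= n_bits + 6) return kernel_bits;
    return 0;  // no kernel accepted
}

// Threads per instance, following the TPI line shown in the diff.
uint32_t threads_per_instance(uint32_t bits) {
    assert(bits % 32 == 0);
    return (bits <= 512) ? 4 : (bits <= 2048) ? 8 : (bits <= 8192) ? 16 : 32;
}
```

Under these assumptions a 510-bit input gets the 512-bit kernel from the old check (only 2 spare bits) but the 1024-bit kernel from the patched one, and a 518-bit input, which does not even fit in 512 bits, would still have been handed the 512-bit kernel by the old check. Inputs of 506 bits or fewer pick 512 bits with TPI 4 under the patched check, which matches the CGBN<512,4> lines in the earlier -v output for the roughly 485-bit (2^499-1)/20959.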

2021-08-29, 22:55   #30
SethTro
"Seth"
Apr 2019
2^4×23 Posts

Quote:
Originally Posted by Gimarel
Current git fails for inputs near 512 bits. It seems that a condition is the wrong way round:
Code:
diff --git a/cgbn_stage1.cu b/cgbn_stage1.cu
index 1b512ecd..f67f8715 100644
--- a/cgbn_stage1.cu
+++ b/cgbn_stage1.cu
@@ -653,7 +653,7 @@ int run_cgbn(mpz_t *factors, int *array_stage_found,
 #endif /* IS_DEV_BUILD */
   for (int k_i = 0; k_i < available_kernels.size(); k_i++) {
     uint32_t kernel_bits = available_kernels[k_i];
-    if (kernel_bits + 6 >=  mpz_sizeinbase(N, 2)) {
+    if (kernel_bits >=  mpz_sizeinbase(N, 2) + 6) {
       BITS = kernel_bits;
       assert( BITS % 32 == 0 );
       TPI = (BITS <= 512) ? 4 : (BITS <= 2048) ? 8 : (BITS <= 8192) ? 16 : 32;

Whoops, totally backwards, coding is hard :p I'll fix it tonight.
Thanks for testing.

2021-08-30, 16:06   #31
chris2be8
Sep 2009
2·3^2·11^2 Posts

Has anyone checked that ecm-cgbn can find factors? On my system with a sm_30 GPU I updated test.gpuecm to pass -cgbn to ecm, but it failed to find any factors when the test cases expected them to be found!

It is *probably* because sm_30 is too low for CGBN.

It will be a while before I can test my newer GPU. The system it's on is running an old version of Linux which doesn't support CUDA 9.0. (I've been working on an "if it works, don't fix it" basis since it's only used for computations.) Upgrading Linux will probably need a complete re-install, which I'll need to plan for a time when I don't need it for a few hours/days. And I'd be happier if I was sure CGBN would work once I got it installed.

2021-08-30, 18:51   #32
SethTro
"Seth"
Apr 2019
170_16 Posts

Quote:
Originally Posted by chris2be8
Has anyone checked that ecm-cgbn can find factors? On my system with a sm_30 GPU I updated test.gpuecm to pass -cgbn to ecm, but it failed to find any factors when the test cases expected them to be found!

It is *probably* because sm_30 is too low for CGBN.

It will be a while before I can test my newer GPU. The system it's on is running an old version of Linux which doesn't support CUDA 9.0. (I've been working on an "if it works, don't fix it" basis since it's only used for computations.) Upgrading Linux will probably need a complete re-install, which I'll need to plan for a time when I don't need it for a few hours/days. And I'd be happier if I was sure CGBN would work once I got it installed.

Yes, many of us have found the same test factor for (2^499-1)/20959, and I've verified several times that the residuals exactly match those produced by `-gpu`. I've also tested with `$ sage check_gpuecm.sage "./ecm -cgbn"`

Last fiddled with by SethTro on 2021-08-30 at 18:51

2021-08-30, 18:52   #33
frmky
Jul 2003
So Cal
100010010010_2 Posts

Quote:
Originally Posted by chris2be8
Has anyone checked ecm-cgbn can find factors?

Yes, test.gpuecm completes successfully both with and without -cgbn. I'm using a V100 with CUDA 11.3.
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.