mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GMP-ECM (https://www.mersenneforum.org/forumdisplay.php?f=55)
-   -   ECM for CUDA GPUs in latest GMP-ECM ? (https://www.mersenneforum.org/showthread.php?t=16480)

xilman 2015-04-24 16:50

[QUOTE=chris2be8;400801]Has anyone tested if ecm-gpu works correctly if you reduce ECM_GPU_SIZE_DIGIT to 16 bits (the default's 32)? Or got any performance figures for it?[/QUOTE]I haven't, though I've experimented with 512 and 2048 bit arithmetic.

Why do you think it may be useful?

chris2be8 2015-04-25 16:03

[QUOTE=xilman;400805] Why do you think it may be useful?[/QUOTE]

I'm using my GPU for ECM pre-testing of GNFS targets around 100 digits. At that size it takes the GPU nearly as long to do stage 1 as it takes 1 core to do stage 2. So when I have done T30 and start T35 I end up with the CPU waiting for the GPU to finish stage 1 to the higher B1.

I've been trying to avoid that by only increasing B1 by about 40% between steps, but even that is too large a jump, and it makes the script messy. Speeding up stage 1 on the GPU would be a much better option. Saving electricity would be nice too.
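(Editorial aside: a minimal sketch of the kind of B1 ladder described here. The 25e4 → 1e6 bounds and the ~30% cap come from Chris's follow-up post in this thread; the script itself is hypothetical, not anything actually posted.)

```python
# Hypothetical sketch of a B1 ladder that grows by at most 30% per batch,
# so the CPU doing stage 2 never waits long on the GPU doing stage 1.
# Bounds are the T30 -> T35 jump (B1 = 25e4 up to 1e6) from this thread.
start, target, growth = 25e4, 1e6, 1.3

schedule = [start]
while schedule[-1] < target:
    schedule.append(min(schedule[-1] * growth, target))

# Several small rungs instead of one 4x jump straight to 1e6.
print([int(b1) for b1 in schedule])
```

This is exactly the "messy to code" part: the ladder itself is trivial, but the bookkeeping around feeding each rung's stage 1 residues to the CPU is what clutters the script.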

Chris

wombatman 2015-04-25 20:40

I just tried building ECM_GPU with the change you requested. It compiled without issue, but it fails the basic check I was given a while back. So, at the very least, doing a 512 bit version isn't a simple change.

lorgix 2015-04-25 21:58

[QUOTE=chris2be8;400899]I'm using my GPU for ECM pre-testing of GNFS targets around 100 digits. At that size it takes the GPU nearly as long to do stage 1 as it takes 1 core to do stage 2. So when I have done T30 and start T35 I end up with the CPU waiting for the GPU to finish stage 1 to the higher B1.

I've been trying to avoid that by only increasing B1 by about 40% between steps, but that's still too large and makes the script messy. Speeding up stage 1 on the GPU would be a much better option. Saving electricity would be nice too.

Chris[/QUOTE]
If stage1 & stage2 take the same amount of time you should just increase B1, or am I missing something?

chris2be8 2015-04-26 16:04

It's time doing stage 1 on my GPU vs time doing stage 2 on my CPU that's the issue. The GPU takes the same time for any size of number while the CPU is faster for smaller numbers. So for smallish GNFS targets it's difficult to ensure the CPU never has to wait for the GPU to finish a set of stage 1 curves before it can start stage 2.

So if I run T30 curves with B1=25e4 and then start T35 with B1=1e6 the CPU will finish stage 2 on the last set from T30 long before the first set for T35 is ready. I can get round it by increasing B1 by no more than 30% at a time, but that's rather messy to code.

A version that runs stage 1 faster on smaller numbers would be a lot easier to handle. But if it won't work I'll have to live with what I've got.

Thanks for the information that it needs more than a simple change, Wombatman.

Chris

fivemack 2015-04-26 16:07

[QUOTE=chris2be8;400899]I'm using my GPU for ECM pre-testing of GNFS targets around 100 digits. At that size it takes the GPU nearly as long to do stage 1 as it takes 1 core to do stage 2. So when I have done T30 and start T35 I end up with the CPU waiting for the GPU to finish stage 1 to the higher B1.[/quote]

If you get the GNFS targets in batches, it's worth multiplying them together into larger chunks before doing GPU-ECM with the current version; at 100 digits you might well be able to take a product of three.
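(Editorial aside: a quick back-of-the-envelope check of why "a product of three" works at 100 digits. The 1024-bit figure is the fixed GPU word size discussed earlier in the thread; the real kernel may reserve a few bits for carries, so treat this as an upper bound.)

```python
import math

# How many ~100-digit composites fit in one 1024-bit GPU-ECM slot
# if we multiply them together first? (Assumes the GPU build's fixed
# 1024-bit arithmetic; illustrative only.)
GPU_BITS = 1024                      # assumed size of the GPU kernel's modulus
DIGITS = 100                         # size of each GNFS target

bits_each = DIGITS * math.log2(10)   # ~332.2 bits per 100-digit number
batch = int(GPU_BITS // bits_each)   # numbers whose product still fits
print(batch)                         # -> 3
```

A factor found by ECM then divides the product, and a GCD against each original target tells you which number it came from.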

lorgix 2015-04-26 16:34

[QUOTE=chris2be8;400971]It's time doing stage 1 on my GPU vs time doing stage 2 on my CPU that's the issue. The GPU takes the same time for any size of number while the CPU is faster for smaller numbers. So for smallish GNFS targets it's difficult to ensure the CPU never has to wait for the GPU to finish a set of stage 1 curves before it can start stage 2.

So if I run T30 curves with B1=25e4 and then start T35 with B1=1e6 the CPU will finish stage 2 on the last set from T30 long before the first set for T35 is ready. I can get round it by increasing B1 by no more than 30% at a time, but that's rather messy to code.

A version that runs stage 1 faster on smaller numbers would be a lot easier to handle. But if it won't work I'll have to live with what I've got.

Thanks for the information that it needs more than a simple change Wombatman.

Chris[/QUOTE]
Aha, OK. You want to minimize wall-clock-time rather than optimize the efficiency of hardware utilization. That makes sense if work/wall-time is your main concern, and you have no other work for your CPU. I was thinking in terms of work/hardware-time.
I think many people would like a version with other than 1024 bit arithmetic. I sure would.

xilman 2015-04-26 18:08

[QUOTE=fivemack;400972]If you get the GNFS targets in batches, it's worth multiplying them together into larger chunks before doing GPU-ECM with the current version; at 100 digits you might well be able to take a product of three.[/QUOTE]That's exactly what I have been doing, though my GNFS targets were around 150 digits so I was running ECM on them in pairs.

Paul

chris2be8 2015-04-27 15:51

[QUOTE=fivemack;400972]If you get the GNFS targets in batches, it's worth multiplying them together into larger chunks before doing GPU-ECM with the current version; at 100 digits you might well be able to take a product of three.[/QUOTE]

Will that make stage 2 slower overall? I'm trying to get the most work out of the CPU, GPU time is nearly free for me. And it'll be a fair job to update my scripts to handle numbers in parallel.

Chris

Singularity 2015-05-23 22:05

GPU test fails with "Error cuda : too many resources requested for launch."
 
Good evening,

I have just downloaded the latest version of ecm and compiled it with --enable-gpu=sm_21 in order to try to run stage1 on my Nvidia GeForce GT 525M.

It compiles without issues and all the ECM tests pass during "make check". However when testing the GPU with "./test.gpuecm ./ecm" I get the following CUDA error:

[QUOTE]$./test.gpuecm ./ecm
GMP-ECM 7.0-dev [configured with GMP 6.0.0, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
Input number is 458903930815802071188998938170281707063809443792768383215233 (60 digits)
Using B1=125, B2=0, sigma=3:227-3:258 (32 curves)
cudakernel.cu(216) : Error cuda : too many resources requested for launch.[/QUOTE]When I launch a manual test with "./ecm -v -gpu 125" I get the following more verbose output ending with the same error message:

[QUOTE]$ ./ecm -v -gpu 125
GMP-ECM 7.0-dev [configured with GMP 6.0.0, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
Running on blackbox
458903930815802071188998938170281707063809443792768383215233
Input number is 458903930815802071188998938170281707063809443792768383215233 (60 digits)
Using MODMULN [mulredc:0, sqrredc:1]
Computing batch product (of 176 bits) of primes below B1=125 took 0ms
GPU: compiled for a NVIDIA GPU with compute capability 2.1.
GPU: will use device 0: GeForce GT 525M, compute capability 2.1, 2 MPs.
GPU: Selection and initialization of the device took 14ms
Using B1=125, B2=2706, sigma=3:2586393407-3:2586393470 (64 curves)
dF=8, k=6, d=60, d2=7, i0=-4
Expected number of curves to find a factor of n digits:
35 40 45 50 55 60 65 70 75 80
Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf
cudakernel.cu(216) : Error cuda : too many resources requested for launch.[/QUOTE]After some googling it seems the issue is that the CUDA kernel is using too many registers/variables/resources for my card to handle. Unfortunately I don't know if this is the expected behaviour on my card or how to fix it. Any help will be greatly appreciated.

I tried to reduce the number of simultaneous curves with the -gpucurves parameter, hoping it would reduce the resources to something acceptable. It turns out that the minimum value is 32, and I still get the same error.

In case it's helpful here is my version of nvcc:

[QUOTE]$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Mon_Feb_16_22:59:02_CST_2015
Cuda compilation tools, release 7.0, V7.0.27[/QUOTE]

lorgix 2015-05-24 13:26

I can't help you with that problem, but I think you should be using either -gpucurves 48 or 96.
Use 48 if you want the system to stay somewhat responsive.

