mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Msieve (https://www.mersenneforum.org/forumdisplay.php?f=83)
-   -   Msieve with GPU support (https://www.mersenneforum.org/showthread.php?t=12562)

Jeff Gilchrist 2010-01-19 21:05

[QUOTE=jasonp;202473]The same time limits are used for the CPU and GPU versions; whether it will finish on schedule depends on how much work gets poured into stage 2 over the course of the search (if stage 1 finds a hit, stage 2 is not interruptible while the optimization takes place). For relatively small problems like yours it should be close to the specified limit, as stage 2 will never run for very long.[/QUOTE]

OK, what about something like 158 digits? It says the limit is 300.0 hours; how long should I let that run before I see something decent to use?

Jeff.

jasonp 2010-01-20 00:06

The experience of others suggests that you won't get a good hit for a day or so. Maybe we should be starting with A5 above 1 in that case...

Joshua2 2010-01-20 07:52

Does the current GPU msieve search more of the space instead of taking less time, and are there plans to change this?

jasonp 2010-01-20 13:35

I don't have the time to fiddle with the time limits on the poly search now. If you specify the range to be searched, the time limit is 10-15 minutes per A5 coefficient and there is no overall time limit; if you don't limit the A5 range, the overall time limit is also applied.

Jeff Gilchrist 2010-01-21 11:01

Anyone out there have experience running msieve.gpu on an NVIDIA Tesla S1070 GPU server? We have a cluster with a number of servers; each pair of CPU servers is connected to a GPU server containing 4 GPUs.

When I submit a job, it can be assigned to any CPU server that is free. In my case, two poly selection jobs were sent to the same CPU server, which means they are both pointing at the same 4-GPU server. Msieve itself says both are using GPU 0.

On a "normal" system you can easily specify which GPU to use with -g, but does anyone know how this kind of system works? Are they both really using GPU 0 of the GPU server? Is there any way to tell whether both my poly selection jobs are running on the same GPU?

jasonp 2010-01-21 13:40

Other than twice the runtime, I doubt there's a way to find out. The code asks CUDA to list the available GPUs, and believes whatever CUDA says. In an ordinary HPC environment, you'd use a linux utility to lock a GPU so that nobody else will use it, then applications can try all the GPUs until they find an unlocked one.

If there's also a batch submission system in front of your Tesla machines, then without locking GPUs the most you can hope for is that two msieve instances with the same value of -g are not assigned to the same server. Maybe you could rename the binary for each -g value and forbid multiple identically-named binaries from running on the same server...

frmky 2010-01-22 01:27

The driver can be instructed to allow or forbid multiple processes on a single GPU. If it allows them, then there's no way (in CUDA 2.x, CUDA 3 may change this) to know if another process is also running on the GPU. If it forbids multiple processes, then the second process will return an error.

You state that "two CPU servers are connected to a single GPU server." From the user's point of view, this is equivalent to a server with two GPUs installed, numbered 0 and 1. If both msieves report that they are running on GPU 0, then both really are and the driver is set to allow that. (Our S1070 has both ports connected to a single server, so the user sees four GPUs, 0-3.)

So, to avoid oversubscribing a GPU, the only thing you can do is restrict it to only one "-g 0" and one "-g 1" process per server, and hope no one else is using the GPU as well.

Jeff Gilchrist 2010-01-22 20:47

[QUOTE=frmky;202777]You state that "two CPU servers are connected to a single GPU server." From the user's point of view, this is equivalent to a server with two GPUs installed, numbered 0 and 1. If both msieves report that they are running on GPU 0, then both really are and the driver is set to allow that. (Our S1070 has both ports connected to a single server, so the user sees four GPUs, 0-3.)

So, to avoid oversubscribing a GPU, the only thing you can do is restrict it to only one "-g 0" and one "-g 1" process per server, and hope no one else is using the GPU as well.[/QUOTE]

That seems to be the case: if I use -g 2 or -g 3 I get an error saying there is no GPU with that number, but it lets me connect with -g 0 or -g 1. So if 4 jobs get assigned to the same server, they are sharing the 2 GPUs; ideally I want to use -g 0 and -g 1 on two different CPU servers.

Jeff.

frmky 2010-01-22 21:41

Yes, you want one -g 0 and one -g 1 on each of the two servers. This will use all four of the GPUs.

Jeff Gilchrist 2010-01-24 10:52

[QUOTE=frmky;202884]Yes, you want one -g 0 and one -g 1 on each of the two servers. This will use all four of the GPUs.[/QUOTE]

Thanks. It is a little more complicated, since you can't specify the server in the submission tool, so I actually have to submit 10 jobs: 4 real and 6 fake. Each server has 8 CPUs and will allow up to 8 jobs to run, so this way the first 2 real jobs stay on one server and the last 2 land on a second server that is hopefully not already busy with someone else's GPU tasks.

If only it supported OpenCL so I could run it at home... :bow:

Jeff.

Joshua2 2010-01-26 03:24

[code]C:\Users\Mark\Downloads\aliqueit108>msieve144_gpu -np 68053628937360040314784806
51829214338493080836379804529992054517869989193736891609130750957354816012909862
83981501471235470599890467656324475746758172995360464501323830394587222020759753
1
deadline: 800 seconds per coefficient
coeff 60-2400 7423459523 8165805475 8165805476 8982386023
------- 7423459523-8165805475 8165805476-8982386023
error (line 195): CUDA_ERROR_UNKNOWN[/code]
I tried twice and got this error both times. I was running folding@home concurrently the first time, but I exited it for the second try.
