
mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   GPU LL Questions (https://www.mersenneforum.org/showthread.php?t=18450)

Primeinator 2013-08-09 02:53

GPU LL Questions
 
I am interested in using my GPU for first time, WR LL testing. I have an NVIDIA Geforce GT 620. I scanned the FAQ thread but I didn't read all 12 pages and did not see the answers to my questions there and so I thought I would ask them in a new thread. I apologize in advance for my ignorance regarding GPUs. They are not exactly my most knowledgeable area!

I have a liquid-cooled system and a 3.6 GHz i7 3820 overclocked to 3.8 GHz running four LL tests in the 55.6M range (will be WR once they finish current assignments). CPU temps are running at about 51 to 54 Celsius consistently but sometimes get to 56-58.

How many exponents can I test on the GPU simultaneously? Am I reading the information correctly to assume that a 60M exponent will take about 6 days on a low-medium end GPU?

Will my system see much harm in running P95 on the CPU and also using the GPU?

Thanks!

Kyle

Primeinator 2013-08-09 03:44

I have downloaded both CUDALucas and the cudart64.42.9.dll. I also had to download 7-zip to open the latter program. I extracted cudart; however, I still cannot open CUDALucas. When I try, I get an error message saying that cudart64.42.9.dll doesn't exist on my computer.

What am I doing incorrectly?

sdbardwick 2013-08-09 03:54

[QUOTE=Primeinator;348792] Am I reading the information correctly to assume that a 60M exponent will take about 6 days on a low-medium end GPU?

Thanks!

Kyle[/QUOTE]
I think your estimate is off; it might take 5-10 times longer than you estimate, depending on what flavor 620 you have.
Look [URL="http://www.mersenne.ca/cudalucas.php"]this page[/URL] over at mersenne.ca for details.

TheMawn 2013-08-09 03:58

I haven't found this forum to be too hostile toward repeated questions. Something I am rather grateful for myself. I will be following this thread quite closely as I may be using a GPU for LL tests soon also.

I would have thought six days would be right. Wasn't Curtisc's last prime checked out with an Nvidia GPU in three days, allegedly?

kracker 2013-08-09 04:04

[QUOTE=TheMawn;348799]I haven't found this forum to be too hostile toward repeated questions. Something I am rather grateful for myself. I will be following this thread quite closely as I may be using a GPU for LL tests soon also.

I would have thought six days would be right. Wasn't Curtisc's last prime checked out with an Nvidia GPU in three days, allegedly?[/QUOTE]
The people who double-checked it included someone who checked it on a GPU (a 580).

EDIT: flashjh

sdbardwick 2013-08-09 04:04

[Quote]I would have thought six days would be right. Wasn't Curtisc's last prime checked out with an Nvidia GPU in three days, allegedly?[/QUOTE]

Depends on the GPU; the fastest is like 50x faster than the slowest.

Primeinator 2013-08-09 04:20

[QUOTE=sdbardwick;348798]I think your estimate is off; it might take 5-10 times longer than you estimate, depending on what flavor 620 you have.
Look [URL="http://www.mersenne.ca/cudalucas.php"]this page[/URL] over at mersenne.ca for details.[/QUOTE]

Thank you for your reply. I assume that because my drivers/devices listing does not show a "version 2" next to my GPU, it is version one. Regardless of the version, LL testing appears to take MUCH longer on the GPU than on my CPU. Guess I should have gone with a better model.

LaurV 2013-08-09 04:42

On a 60M+ exponent:

- On a 580, depending on your card, clock, system, expect a time of 80 to 120 hours.

- On a 620 v2, depending on your card, clock, system, expect a time of 600 to 900 hours.

- On a 620 original, depending on your card, clock, system, expect a time of 1200 to 1800 hours.

Edit: this is not a joke. Keplers have a 24:1 ratio of single- to double-precision float throughput; they might be better at gaming or TF (depending on the model, your mileage may vary), but they are lousy at LL. You can check for yourself how long each iteration takes and see the ETA. If you do, please post it here so we know for sure in the future.
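
LaurV's "check the time per iteration" suggestion is simple arithmetic: an LL test of M(p) = 2^p - 1 needs p - 2 squaring iterations, so the ETA is just iterations times time per iteration. A minimal sketch (the 5 ms/iter figure is only an example, not a measured value):

```python
# Estimate an LL test's total runtime from the per-iteration timing
# CUDALucas reports. An LL test of M(p) = 2^p - 1 needs p - 2 squarings.

def ll_eta_hours(exponent: int, ms_per_iter: float) -> float:
    """Rough ETA in hours for a full Lucas-Lehmer test."""
    iterations = exponent - 2
    return iterations * ms_per_iter / 1000 / 3600

# Example: a 60M exponent at 5 ms/iteration (an assumed mid-range figure)
hours = ll_eta_hours(60_000_000, 5.0)
print(f"{hours:.0f} hours (~{hours / 24:.1f} days)")  # 83 hours (~3.5 days)
```

At 5 ms/iter a 60M exponent works out to roughly 83 hours, consistent with the 580 estimate above; a card 15x slower at DP lands in the 1200+ hour range quoted for the original 620.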

Primeinator 2013-08-09 05:05

[QUOTE=LaurV;348818]On a 60M+ exponent:

- On a 580, depending on your card, clock, system, expect a time of 80 to 120 hours.

- On a 620 v2, depending on your card, clock, system, expect a time of 600 to 900 hours.

- On a 620 original, depending on your card, clock, system, expect a time of 1200 to 1800 hours.

Edit: this is not a joke, Keplers have 24 to 1 ratio of single vs double precision floats, they might be better at gaming or TF (depending on model, your mileage may vary), but they are lousy at LL. You can check for yourself how long time per iteration it needs, and see the ETA. If you do, you may post here so we know for sure in the future.[/QUOTE]

This is pretty much what I gleaned from the chart (only slightly worse). Too bad. I would have loved to include my GPU in the effort. I am thinking about getting a box with multiple CPUs or GPUs at some point in the future (nothing high-end, but something that would be cost-effective for testing lots of exponents at the LL wavefront). I need to do more research to see what would be best.

Thank you for your input and assistance.

Manpowre 2013-08-09 09:02

Note that 6xx/7xx boards run at 1/24 speed when using double precision, which is what CUDALucas uses.

For single precision (trial factoring with mfaktc) they are great, so better to use them for trial factoring instead.

My 2x Titans run DP at 1/3 of SP speed, and with 2688 CUDA cores each, the latest 55M exponents are done in 60-65 hours. (Believe it or not, it depends on the weather outside: when the room is warm, the core clock goes down.)

The 580 and 590 run DP at 1/8 speed. Even with fewer CUDA cores, this architecture can easily compete with the 680 and 690, which have more CUDA cores but only 1/24 DP speed.

Therefore I am purchasing second-hand 590 boards right now on the local market, as one 590 board (two Fermi GPUs on the same board) can do the same work over time as one Titan board.

Primeinator 2013-08-09 13:51

Pardon my ignorance... but why is this better, if 1/24 is smaller than 1/8? Wouldn't a "1/8" speed reduction on the 590 be worse than the "1/24" reduction?

axn 2013-08-09 14:01

[QUOTE=Primeinator;348868]Pardon my ignorance... but why is this better if 1/24 is smaller than 1/8. Wouldn't a "1/8" speed reduction in the 590 be worse than the "1/24" reduction?[/QUOTE]

DP speed is reduced [B]to[/B] 1/8th of SP. Not "speed is reduced [B]by[/B] 1/8th of ..."
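
In other words, the ratio is a multiplier on throughput, not a small deduction from it. A sketch with illustrative (not official) SP figures shows why a Fermi card with lower SP throughput can still win at DP:

```python
# "Reduced TO 1/8" means DP throughput = SP throughput * (1/8),
# i.e. 8x slower -- not "SP minus one eighth".

def dp_throughput(sp_gflops: float, dp_ratio: float) -> float:
    """Effective double-precision throughput given an SP figure and ratio."""
    return sp_gflops * dp_ratio

# Hypothetical SP numbers, for illustration only:
fermi_590 = dp_throughput(2488.0, 1 / 8)    # ~311 DP GFLOPS
kepler_680 = dp_throughput(3090.0, 1 / 24)  # ~129 DP GFLOPS

# A Fermi card with lower SP throughput still beats the Kepler at DP:
print(fermi_590 > kepler_680)  # True
```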

Primeinator 2013-08-09 14:57

[QUOTE=axn;348871]DP speed is reduced [B]to[/B] 1/8th of SP. Not "speed is reduced [B]by[/B] 1/8th of ..."[/QUOTE]

Ah. The mystery is solved! Thank you.

Manpowre 2013-08-09 20:08

With the 1/8 ratio on the 580/590 chips and their reduced number of CUDA cores, they still perform great compared to the 6xx/7xx Kepler chips, which have a huge number of CUDA cores but DP at 1/24 of SP speed.

Just got 2x 590 boards here today; I fired up one in the machine that had a 6970 board doing TF, and 58M exponents are going to be done in 4.5 days, with the card running two of them simultaneously. One Titan card does two 58M exponents in 6 days.

Great bang for the bucks.

kladner 2013-08-09 20:20

[QUOTE=Manpowre;348901][SNIP].....great bang for the bucks.[/QUOTE]

And in the winter, great space heaters! Good thing your electricity is cheap.

Manpowre 2013-08-09 20:26

[QUOTE=kladner;348905]And in the winter, great space heaters! Good thing your electricity is cheap.[/QUOTE]

Yeah... unfortunately I have to move the machines out of the small office and into the living room at some point. It's still 14 degrees C during the night, so it's OK for now, but once it drops to 5-9 degrees C they will heat the place up pretty well.

Manpowre 2013-08-16 12:23

[QUOTE=Primeinator;348868]Pardon my ignorance... but why is this better if 1/24 is smaller than 1/8. Wouldn't a "1/8" speed reduction in the 590 be worse than the "1/24" reduction?[/QUOTE]

No: 1/24 means 24 times slower when doing double precision, which is exactly what CUDALucas uses for its calculations.
1/8 means 8 times slower. The 580/590 cards have fewer CUDA cores, but double precision is not as slow on that architecture.

Still, 680/690 cards with their 1/24 speed reduction and more CUDA cores compete with the 580/590 with fewer, but you can now purchase used 580/590 cards for less than $250, which is great.

The 620 has the 1/24 speed reduction with fewer CUDA cores than the 680, so naturally it's going to be very slow.

Therefore the 620 should be used for single-precision integer work, which is what mfaktc does.

Manpowre 2013-08-16 12:26

I'm running 62M-63M exponents on the 590 card now, and each takes 125 hours.
Since this is a multi-GPU card, I can run two instances, so in reality I get two exponents done in 125 hours.

One Titan card does the same exponent in about 85 hours, which makes the 590 the winner, as a Titan does two exponents in 170 hours.

LaurV 2013-08-16 13:03

There is[URL="http://www.mersenne.ca/cudalucas.php?model=12"] nothing new[/URL] in what you say. :smile:

Primeinator 2013-08-16 14:00

Do 590s consume a lot of electricity compared to a semi-upper-end i7?

kracker 2013-08-16 14:42

590=[URL="http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-590/specifications"]365W[/URL]
4770K=[URL="http://www.tomshardware.com/reviews/core-i7-4770k-haswell-review,3521.html"]84W[/URL]

Manpowre 2013-08-16 18:56

[QUOTE=kracker;349802]590=[URL="http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-590/specifications"]365W[/URL]
4770K=[URL="http://www.tomshardware.com/reviews/core-i7-4770k-haswell-review,3521.html"]84W[/URL][/QUOTE]

Please correct me, but the 590 doesn't use 365 W with CUDA, as there is overhead here: there is a memcopy back to host memory after the FFT operation, then a memcopy to the card again, then normalization, etc. So that doesn't put the card at 365 W; more like 190-200 W.

kracker 2013-08-16 20:25

[QUOTE=Manpowre;349834]please correct me, but the 590 doesnt use 365w with cuda. as there is overhead here. there is memcopy back to memory after the FFT operation, then there is memcopy to card again, then normalization etc.. so that doesnt put the card at 365w. more like 190w-200w.[/QUOTE]
Sorry, I put the TDP in there.
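
For anyone weighing the electricity question: running cost follows from the actual draw, not the TDP (which is a design ceiling). A hedged sketch using the ~200 W estimate above and an assumed $0.12/kWh price (your rate will differ):

```python
# Electricity cost of a GPU run: watts * hours * price per kWh.
# The 200 W draw and $0.12/kWh price are assumptions for illustration.

def run_cost_usd(watts: float, hours: float, usd_per_kwh: float) -> float:
    """Energy cost of running a device at a given draw for a given time."""
    return watts / 1000 * hours * usd_per_kwh

# A 125-hour LL test at 200 W:
print(f"${run_cost_usd(200, 125, 0.12):.2f}")  # $3.00
```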

Nipal 2013-09-11 16:56

Can anybody explain what I did wrong here:
[URL="http://radikal.ru/fp/29f87ab6d36d4525a26be03132825a7d"][img]http://s005.radikal.ru/i210/1309/7e/f4dc98c7be0a.jpg[/img][/URL]
System: Win7 x64, 6Gb RAM, CUDA 5.5, Nvidia GeForce 780 with 980Mhz Core and 3Gb Memory.
This error appears sometimes, not often, but it means I can't be sure about leaving my computer on this task for a long time.
Is the "-t" parameter wrong? And which parameters should I set in the ini file?

Mini-Geek 2013-09-11 17:17

[QUOTE=Nipal;352736]Can anybody explain what I did wrong here:
[URL="http://radikal.ru/fp/29f87ab6d36d4525a26be03132825a7d"][img]http://s005.radikal.ru/i210/1309/7e/f4dc98c7be0a.jpg[/img][/URL]
System: Win7 x64, 6Gb RAM, CUDA 5.5, Nvidia GeForce 780 with 980Mhz Core and 3Gb Memory.
This error appears sometimes, not often, but I can't be sure to leave my computer doing this task for a long time.
Is it wrong "-t" parameter? And which parapeters should I set in the ini-file?[/QUOTE]

Either your FFT is too small or your GPU is a little bit unstable. You might try choosing a larger FFT or underclocking your core GPU speed.

Nipal 2013-09-11 19:06

My GPU [U]must[/U] be stable. It was bought about a week ago, and it did many TF assignments while I was at work.
First I moved the save files away to get fresh results.
I tried to enlarge the FFT (to 5242880 and even 10485760), but I get the same error almost immediately after the program starts. It is always like this: "iteration = 1001>=1000 && err = ........". I tried changing the number of threads to 1024 with FFT=4194304, but there was no change in the ETA.
Could it be because the program is compiled for CUDA 4.2 and I have CUDA 5.5 installed?
Could it be because I have no parameters set in the ini file?
If it's neither the 1st nor the 2nd, what's wrong?

owftheevil 2013-09-11 19:17

Probably the FFT is too big. Try something closer to 3072000 for those exponents.
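
A common rule of thumb behind suggestions like this: each FFT word can safely carry roughly 17-18 bits of the number, so the minimum FFT length is about p/18, rounded up to a supported size. A sketch under that assumption (the list of lengths is an illustrative subset, and this is not CUDALucas's exact selection logic):

```python
# Rough FFT-length estimate for an LL test of M(p): length ~ p / 18,
# rounded up to a supported size. The 18 bits/word figure is a rule of
# thumb, not the program's actual selection algorithm.

SUPPORTED_FFTS = [2097152, 2359296, 2621440, 3145728, 3276800,
                  3538944, 4194304, 5242880]  # illustrative subset

def suggest_fft(exponent: int, bits_per_word: float = 18.0) -> int:
    """Smallest supported FFT length meeting the bits-per-word limit."""
    min_len = exponent / bits_per_word
    return next(n for n in SUPPORTED_FFTS if n >= min_len)

print(suggest_fft(53_063_159))  # 3145728, matching the choice reported below
```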

Nipal 2013-09-11 19:49

I let CuLu decide which FFT to use. CuLu chose 3145728, and the [U]whole[/U] ETA is now less than the one FFT=4194304 showed after 6.5M+ completed iterations.
I'll leave them (CuLu and the GPU) alone. ))
Thanks a lot for all your answers.

Mini-Geek 2013-09-11 20:47

[QUOTE=Nipal;352759]My GPU [U]must[/U] be stable. It was bought about a week ago, and it did many TF assignments when I was at work.[/QUOTE]

Even when a GPU is perfectly stable for TF, it can be too unstable for LL. My GPU was like this when it was new, too. I had [URL="http://www.mersenneforum.org/showthread.php?t=17598"]similar questions[/URL] when I was new to GPUs. Read there for a lot of details about GPU stability, from the perspective of a newbie (me) and those more familiar with it.
I underclocked my memory and overclocked my core, and ended up with an LL-stable GPU. (I misremembered earlier and stated something incorrect: it's the memory that most likely needs to be underclocked, not the core.) LL is highly sensitive to memory instability, unlike TF and graphics rendering. Graphics card memory is clocked/tested so that it is stable enough for graphics display, but that's not stable enough for LL.

LaurV 2013-09-12 03:11

First of all (sorry for repeating what others said, but this way it will be better stressed!): doing LL tests uses the card totally differently from doing TF. It can happen that a card works perfectly for one type of task but doesn't work for the other. If you have the Prime95 package (not only the exe), please read the last passage of the "stress.txt" file. It applies to GPU testing too.

Now, the best way to make use of your card is to tune the FFT yourself. I have had lots of cards in my hands, and no two are alike. To tune the FFT, you just launch the "cudaLucas -cufftbench from upto step" variation. A step of 8k or so would be enough, but if you are a maniac you can try a step of 1k (caution: "big" cards with lots of memory and threads don't work in steps of 1k, but that doesn't really matter for the tuning part). Then you copy the results into an Excel table, sort them by the ratio between the length and the time spent, and delete those which "waste the time" (where the ratio is 10-20 times bigger).

Then, for each range of exponents you test, the "error" is the best indicator for choosing the size of the FFT. You should choose the size which gives you the best time per iteration while keeping the error between 0.01 and 0.2. Higher than this interval and you will get rounding errors; lower, and you will get wrong sums. In the photo you linked, the FFT is way too big.

Alternatively, you can have less headache and let CuLu choose the FFT length for you, for a penalty between 2% and 15% of the time, depending on the range of exponents you crunch. There is more information in the CuLu thread (with tables and so on) if you are patient enough to go through those 160 pages... (chronologically it should be somewhere around the end of 2012 / beginning of 2013, when CuLu went through many changes from v1.66 to v2.04).
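
The spreadsheet step LaurV describes, ranking benchmark results by time per length and discarding sizes that "waste the time", can also be scripted. A sketch with invented sample data (the waste threshold is an arbitrary choice, not a value from the post):

```python
# Sketch of the tuning step: given (fft_length, ms_per_iter) benchmark
# pairs, rank by time-per-length and drop lengths that waste time.
# The sample data is invented for illustration.

bench = [(2359296, 4.1), (2621440, 4.3), (3145728, 4.9),
         (3276800, 9.8),   # an outlier that "wastes the time"
         (3538944, 5.6)]

def efficient_ffts(results, waste_factor=1.5):
    """Keep lengths whose ms-per-million-points is within waste_factor
    of the best ratio seen."""
    ratios = [(n, ms / (n / 1e6)) for n, ms in results]
    best = min(r for _, r in ratios)
    return [n for n, r in ratios if r <= best * waste_factor]

print(efficient_ffts(bench))  # drops only the 3276800 outlier
```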

nucleon 2013-09-14 02:35

Nipal,

Sounds like you're hitting the same error titan owners hit early on.

Do you have the latest nvidia beta drivers installed?

The default RAM speed seems to cause problems predominantly for double-precision floating point (DPFP). DPFP is what CUDALucas and CUDAPm1 use; TF uses predominantly integers.

The latest beta drivers fix this. I'm now running my Titans at the default RAM speed. Prior to the latest beta drivers, the only fix was to decrease the RAM speed on the GPU. I suggest using MSI Afterburner to modify clock speeds.

In the NVIDIA driver panel, there's an option to allow the use of beta drivers. Select this, then ask the app to check for the latest updates, and it should give you the option to use driver version 326.80 or greater.

-- Craig

Nipal 2013-09-14 15:34

[QUOTE=nucleon;352966]Nipal,

Sounds like you're hitting the same error titan owners hit early on.

Do you have the latest nvidia beta drivers installed?

The default ram speed seems to cause problems for predominately dual precision floating point (DPFP). DPFP is what cuda lucas and cudapm1 use. TF uses predominately integers.

The latest beta drivers fix this. I'm now currently running my titans with the default ram speed. Prior to the latest beta drivers, the only fix was to decrease ram speed on the GPU. I sugget to use MSI afterburner to modify clock speeds.

In the nvidia driver panel, there's an option to allow to use beta drivers. select this, then ask the app to check for latest updates, and it should give you the option to use driver version 326.80 or greater.

-- Craig[/QUOTE]

Thanks. I will try tomorrow morning (MSK), when my first LL test is completed (the one with 3145728). For now everything is working absolutely correctly. It has a 5.0 ms/iter average speed and is running almost round-the-clock (I sometimes stop it manually to play a couple of games :smile:).

Nipal 2013-09-15 07:55

"1 of your Lucas-Lehmer assignments have been tested by another worker. These are highlighted in red below with a dagger (†).Please understand when you turn in the results these will be credited as a Double Check by PrimeNet"
What does that mean? My English is not very good, and even translators didn't help me.
1. Does it mean that this exponent was checked by someone else before me? (If so, how did I get it?)
2. Or does it mean that this exponent was given to someone else for a DC test? (Then I will be calm :smile:)

Upd:
53063159 |No factors below 2^72
P-1 |B1=630000, B2=18585000
Unverified LL|CE05D844E88621__ by "Nipal" on 2013-09-15
[B]Assigned |LL testing to "ANONYMOUS" on 2013-08-31[/B]
History |no factor to 2^64 by "ANONYMOUS" on 2008-08-04
...
History |no factor for M53063159 from 2^71 to 2^72 [mfaktc 0.18 barrett79_mul32] by "Chuck" on 2012-02-06
History |ce05d844e88621__ by "Nipal" on 2013-09-15

As I can see, this exponent was given to "ANONYMOUS" [U]before[/U] me for an LL test (it was assigned to me on 2013-09-09).
How can that be?

kracker 2013-09-15 16:34

You're going to have to ask chalsall about that... I got that on one of my DCs as well a while ago.

Mini-Geek 2013-09-15 18:46

[QUOTE=Nipal;353038]"1 of your Lucas-Lehmer assignments have been tested by another worker. These are highlighted in red below with a dagger (†).Please understand when you turn in the results these will be credited as a Double Check by PrimeNet"
What that means? I speak english not very good, and even translators didn't help me.
1. Does that mean that this exp. was checked before me by someone else? (So how did I get it?)
2. Does that mean that this exp. was given to someone else to do DC-test? (Then I will be calm :smile:)[/QUOTE]

I think that GPUto72 doesn't know that you are "Nipal" on PrimeNet. IIRC they will automatically detect that you are "Nipal" after a few results, or you can [URL="http://www.gpu72.com/contact/"]contact[/URL] them to make sure you'll be recognized correctly.

As for the assignment order, GPUto72 shows its assignments as "ANONYMOUS", so the order of events was this:

2013-08-31: GPUto72 reserves the number (shows "ANONYMOUS")
2013-09-09: GPUto72 assigns you the number (since you didn't report this to PrimeNet, it still shows "ANONYMOUS")
2013-09-15: you complete the number (PrimeNet doesn't know that you took "ANONYMOUS"'s reservation, so it shows your completion and "ANONYMOUS"'s reservation separately)

Nipal 2013-10-19 19:58

I sent a PM to chalsall describing my problem about two weeks ago, but I didn't receive an answer. The "red strings" are still in my assignment list.
I know that Chris is very busy, and I don't want to bother him.
So, is there anyone else here who has access to GPU72 and can solve this problem?

chalsall 2013-10-19 20:18

[QUOTE=Nipal;356763]I know that Chris is very busy. I don't want to bother him.
So, is here anyone else who has an access to GPU72 and can solve this problem?[/QUOTE]

Sorry. Like you say, I'm rather busy...

The problem is that GPU72 needs at least three examples of completed work before it will automatically figure out your PrimeNet display name. You've only completed one LL, with one outstanding.

I have manually updated your User record with this knowledge.

Please let me know if the "red line" (not lines) doesn't disappear (and you get the credit) within six hours.

(Yes, I know the system could be smarter about this. My apologies -- I actually have to do "real" paying work as well.)

chalsall 2013-10-19 21:28

[QUOTE=chalsall;356765]Please let me know if the "red line" (not lines) doesn't disappear (and you get the credit) within six hours.[/QUOTE]

With this knowledge, the [URL="https://www.gpu72.com/reports/worker/dd68e6103188b11290a8a04d288b56ce/"]problem[/URL] appears to have been resolved.

Sorry about that. This was my mistake by not training "Spidy" as well as I might have. Thanks for being a "squeaky wheel".

Please let me know if you see any other issues. This hybrid solution space (PrimeNet and GPU72) is rather complex -- many SPEs are possible....

kracker 2013-10-19 22:07

[URL="https://www.gpu72.com/reports/worker/bc909e5ca77199335f07966f10ef83f0/"]?[/URL]

chalsall 2013-10-19 22:21

[QUOTE=kracker;356784][URL="https://www.gpu72.com/reports/worker/bc909e5ca77199335f07966f10ef83f0/"]?[/URL][/QUOTE]

Ah, bloody hell...

Thanks for pointing that out.

Nipal 2013-10-20 06:20

[QUOTE=chalsall;356765]
Please let me know if the "red line" (not lines)...[/QUOTE]
As I already said [URL="http://mersenneforum.org/showthread.php?t=18555"]somewhere[/URL], I don't speak English very well. When I said "red lines" I meant the two assignments of mine marked with a dagger. ))
[QUOTE=chalsall;356765]
... within six hours.[/QUOTE]
When I posted my last message it was already midnight on my clock (GMT+4, Moscow). Sorry, but I went to bed and slept all that time. ))
[QUOTE=chalsall;356765]
...My apologies ...[/QUOTE]
I don't think that a man who is very busy doing something useful for other people has to apologize to me in particular. As one saying goes: "The man who never makes a mistake is the man who never does anything" (my attempt to translate "Не ошибается тот, кто ничего не делает").
Thank you.


All times are UTC. The time now is 16:56.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.