mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

kladner 2016-11-06 20:01

[QUOTE=henryzz;446593]Doesn't the 460 have a much better single precision/double precision ratio than the 750ti?[/QUOTE]
The 460 is CC 2.1. I considered that possibility, but perhaps did not look closely enough. I just now did some searching for specific floating-point capability, but despite lengthy charts in various places, I did not get the differences sorted out.

henryzz 2016-11-06 20:27

[QUOTE=kladner;446625]The 460 is CC 2.1. I considered that possibility, but perhaps did not look closely enough. I just now did some searching for specific floating-point capability, but despite lengthy charts in various places, I did not get the differences sorted out.[/QUOTE]

[url]https://en.wikipedia.org/wiki/Fermi_(microarchitecture)[/url] states that consumer Fermi cards have 1/8 double-precision speed.
[url]https://en.wikipedia.org/wiki/Kepler_(microarchitecture)[/url] states that consumer Kepler cards have 1/24 double-precision speed.
[url]https://en.wikipedia.org/wiki/Maxwell_(microarchitecture)[/url] states that consumer Maxwell cards have 1/32 double-precision speed.
[url]https://en.wikipedia.org/wiki/Pascal_(microarchitecture)[/url] basically states that consumer Pascal cards have 1/32 double-precision speed.
The 460 is Fermi (1/8) and the 750 Ti is Maxwell (1/32).

There are variations: some Titan and server cards allow ratios like 1/2 or 1/3, depending on the generation.
The TDP of the 460 is also around 2.5x that of the 750 Ti. The 750 Ti is only a 60-watt card, which is low for a GPU. It is almost as efficient as most of the 900-series GPUs (Maxwell gen 2).
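[Editor's sketch] The ratios above can be turned into a rough back-of-envelope comparison. The core counts, clocks, and TDPs below are approximate reference-card figures, and peak FP32 is taken as cores × clock × 2 (one FMA per cycle); this is illustrative only, not a benchmark:

```python
# Back-of-envelope peak throughput, using the DP ratios quoted above.
# Core counts, boost clocks (GHz), and TDPs (W) are approximate reference figures.
cards = {
    #             cores, clock, fp64_ratio, tdp
    "GTX 460":    (336, 1.35, 1 / 8, 160),
    "GTX 750 Ti": (640, 1.02, 1 / 32, 60),
}

for name, (cores, clock, ratio, tdp) in cards.items():
    fp32 = cores * clock * 2        # GFLOPS: one FMA (2 FLOPs) per core per cycle
    fp64 = fp32 * ratio             # apply the consumer-card DP ratio
    print(f"{name}: ~{fp32:.0f} GFLOPS FP32, ~{fp64:.0f} GFLOPS FP64, "
          f"~{fp64 / tdp:.2f} GFLOPS/W FP64")
```

On these assumed figures the 460 comes out well ahead of the 750 Ti in raw FP64 despite its older architecture, while the two land surprisingly close in FP64 per watt, which is consistent with the TDP point above.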

kladner 2016-11-06 20:29

Thanks! I looked at related sources, but obviously not carefully enough. Actually, I think I read the general Wiki on CUDA.

kladner 2016-11-07 05:22

[QUOTE=storm5510;446579]Disregard the request above. I found it with a bit more searching.

I changed the screen output options so I could see what was going on. I reserved one double-check from PrimeNet. CUDALucas reports it can complete it in a little under six days.

Now for my quandary: Prime95 indicates it can do the test in the same amount of time. Six days for an LL test is nothing to sneeze at. CUDALucas did not seem to be utilizing my GPU as much as I thought it might. It's a GTX 750 Ti. I could tell by observing the core temperature: mfaktc runs it in the upper 50s on the Celsius scale, while CUDALucas only made it into the low 40s.

Just in case anyone wonders about my setup, it all runs with CUDA 8.[/QUOTE]
[QUOTE]That card should do better. [/QUOTE]
My apologies. A more appropriate answer would have been, "That card is better suited to TF work."

henryzz 2016-11-07 15:42

[QUOTE=kladner;446641]My apologies. A more appropriate answer would have been, "That card is better suited to TF work."[/QUOTE]
As are most modern cards.

airsquirrels 2016-11-07 16:42

[QUOTE=henryzz;446658]As are most modern cards.[/QUOTE]

I see this pulled out over and over again about GPUs: "More suited to TF work."

I fail to see how this is backed up by much in terms of actual productivity (exponents cleared, not GhzDays; TF GhzDays are a joke) for many modern cards. I believe the math is: "For exponents currently factored below [card-specific crossover point], it is better to use your card to TF those exponents. For the giant piles of exponents needing double or first-time checks already factored beyond those crossover points, the card is just as productive doing LL work."

For most modern cards we are already at or nearly at those crossover points, with lots of room to spare, meaning it is equally productive for a person with such a GPU to choose LL from the already-factored pile or TF from the far-future list of exponents needing more factoring. It is true that GPUs are far better than CPUs at TF; however, we only need enough current-generation GPUs doing TF to keep ahead of the CPU/other-GPU LL demand pool.
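[Editor's sketch] The crossover logic described above can be sketched as a toy cost comparison. The 1/b factor-probability heuristic and all the numeric figures below are illustrative assumptions, not GIMPS's or GPU72's actual rules:

```python
def better_to_tf(bits_done, tf_ghzdays_per_bit, test_ghzdays):
    """Toy crossover test: TF one more bit level if the expected LL/DC work
    a found factor would eliminate exceeds the cost of that TF level.

    Heuristic assumption: the chance that a factor lies in the next bit
    level is roughly 1 / (bit level).
    """
    p_factor = 1.0 / (bits_done + 1)
    return p_factor * test_ghzdays > tf_ghzdays_per_bit

# Hypothetical figures: an exponent already factored to 2^70, one more TF
# level costing 4 GHz-days, and an LL test plus double-check costing a
# combined 700 GHz-days. Expected saving is ~700/71 ≈ 9.9 GHz-days.
print(better_to_tf(70, 4.0, 700))   # True: TF one level deeper pays off
```

A card's real crossover point then falls where its own TF and LL throughputs make the two sides of this inequality balance, which is why a high-DP card (better at LL relative to TF) has a lower crossover than a modern consumer card.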

What is especially interesting on the AMD side, and somewhat on the NVIDIA side, is that the older generations of cards did 200-300 GhzDays/day of TF, but are close to the same as modern cards at LL work due to their 1/2 or 1/3 DP units (40-50 GhzDays/day). Compared to a modern card, these cards are better suited to DC or LL work than TF; that is, they have a lower crossover point. However, even for modern cards we often have large reserves of exponents already factored to their crossover level.

My counter-argument to continuing to factor far ahead with current or aging GPUs is that it isn't particularly power-efficient. By the time we actually need those exponents factored, we will likely be 1-2 GPU generations newer, or current high-end GPUs will have become more accessible and migrated down into wide use. To henryzz's point, the direction technology is going does seem to be widening the gap between TF and LL, and thus raising the crossover point: newer GPUs will likely be only slightly better at LL but perhaps 2x as good at TF. Better to use the GPUs we have now balanced between filling the reserves and checking the exponents and save far-future TF for the next generation of cards that will be even more efficient at it.

chalsall 2016-11-07 17:17

[QUOTE=airsquirrels;446664]Better to use the GPUs we have now balanced between filling the reserves and checking the exponents and save far-future TF for the next generation of cards that will be even more efficient at it.[/QUOTE]

+1!

For a while (read: several years) the GPU TF'ing effort was playing catch-up. Oliver and Bdot (and NVidia and AMD) completely changed the game with their programs and the respective hardware. Heck, we didn't even initially know how deep the GPU TF'ing should go until James stepped in with his analysis of the TF'ing vs. LL'ing and DC'ing cross-over points.

This might sound strange coming from the GPU72 guy, but just looking at the [URL="http://www.mersenne.org/primenet/"]Primenet Exponent Status Distribution Map[/URL] it is clear that the TF'ing is *well* ahead of the LL'ing, DC'ing and even the P-1'ing.

GPU TF'ing will always be needed, but this is a resource management and optimization problem. As GIMPS' stated goal is to find Mersenne Primes (not factors), perhaps it is time for more GPUs to be directed to LL'ing or DC'ing (or Carl's P-1 GPU program).

But, as always, this is a volunteer effort. Everyone is encouraged to do whatever kind of work rocks their boat. At the end of the day all the work will be needed and useful.

kladner 2016-11-07 18:28

Thanks to everyone for the discussion and explanations. The GTX 460 may not be very efficient, but it has proven to be rock steady at DC work. I am more leery of CuLu on the 580 because of the odd glitch which affects CC 2.0 cards. I know it's not supposed to spoil the result, but I had more than one proven bad mismatch on that card. That probably means that I did not go far enough in "detuning" it for stability. :smile:

flashjh 2016-11-07 19:00

[QUOTE=kladner;446672]Thanks to everyone for the discussion and explanations. The GTX 460 may not be very efficient, but it has proven to be rock steady at DC work. I am more leery of CuLu on the 580 because of the odd glitch which affects CC 2.0 cards. I know it's not supposed to spoil the result, but I had more than one proven bad mismatch on that card. That probably means that I did not go far enough in "detuning" it for stability. :smile:[/QUOTE]

I have a couple of 580s sitting around that are very stable for all work. I can send them to anyone who would like them.

Batalov 2016-11-07 19:32

I could use one...

flashjh 2016-11-07 19:35

[QUOTE=Batalov;446675]I could use one...[/QUOTE]

Message me and let's get one to you :)


All times are UTC.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.