mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing

Old 2013-05-12, 07:06   #287
Manpowre
 
"Svein Johansen"
May 2013
Norway


Quote:
Originally Posted by Karl M Johnson
Which just proves Nvidia screwed all of us up.

The problem is not the heat; it's either lower-than-necessary voltage or defective memory chips. The former sounds more realistic.
I'm not sure they screwed us, as my two cards behave much like the others in this thread.

Both CUDALucas instances, one per card, have now been running stable for 10 hours; that's far longer than yesterday, so downclocking only the memory seems to have done the trick. In three hours I can deliver my first double check.

There is still potential in this card, though: a water cooler and a backplate. At least a backplate with a heat spreader isn't expensive, so I'm definitely considering that.

The most interesting feature of this chip is HyperQ, the ability for one kernel to spawn another without reporting back to the main thread on the CPU. That would mean no cudaMemcpy has to be done, and I assume CUDALucas has a lot of overhead doing exactly that. This is what I will look into over the summer: reading about CUDA every day, writing small programs, and learning the methodology. We'll see; maybe I'll come up with a version of CudaLucas made specifically for GK110. :)
Old 2013-05-12, 08:19   #288
Karl M Johnson
 
Mar 2010


IIRC, Tesla-board features such as TCC, ECC, HyperQ and DMA are disabled on the GeForce variants of the Tesla GPUs, even on the GTX Titan.
The only thing NV was "generous" enough not to artificially disable is the DP FP performance.
It's also worth noting that under CUDALucas the GPU core sits at around 48°C and the memory heatsinks on the back side at around 40°C (yes, I measured that with a digital multimeter).
It should be obvious by now that heat is not the problem.

As for the screw-up: NV did not do enough torture testing on the GTX Titan.
As it turns out, the problem arises with double-precision arithmetic (CUDALucas and CUDAPm1, but not mfaktc, cudamemtest, etc.).

Personally, I don't mind overvolting the memory using the pencil method; I just don't know where to "draw".

Last fiddled with by Karl M Johnson on 2013-05-12 at 08:28
Old 2013-05-12, 08:41   #289
Manpowre
 
"Svein Johansen"
May 2013
Norway


Quote:
Originally Posted by Karl M Johnson
IIRC, Tesla-board features such as TCC, ECC, HyperQ and DMA are disabled on the GeForce variants of the Tesla GPUs, even on the GTX Titan.
The only thing NV was "generous" enough not to artificially disable is the DP FP performance.
It's also worth noting that under CUDALucas the GPU core sits at around 48°C and the memory heatsinks on the back side at around 40°C (yes, I measured that with a digital multimeter).
It should be obvious by now that heat is not the problem.

As for the screw-up: NV did not do enough torture testing on the GTX Titan.
As it turns out, the problem arises with double-precision arithmetic (CUDALucas and CUDAPm1, but not mfaktc, cudamemtest, etc.).

Personally, I don't mind overvolting the memory using the pencil method; I just don't know where to "draw".
Isn't there a big difference between measuring the temperature on top of the memory chip and the temperature inside the die?
Old 2013-05-12, 08:47   #290
Manpowre
 
"Svein Johansen"
May 2013
Norway

HyperQ

I didn't know Nvidia had disabled HyperQ on the Titan board. I'm pretty sure I saw an ad listing HyperQ as one of the Titan's benefits; maybe I'm wrong.

Oh well, I'm quite happy with the setup. It gives me a huge boost in performance, and I have my dev environment set up. Money isn't the issue, as I have a good job, but I won't just go and buy a Tesla board unless I really need it. If HyperQ really is turned off on the Titan, I might find myself picking up the last Tesla generation once Maxwell is here; direct memory addressing of host memory is incredible.

At the moment I am writing my own CUDA program for an old problem I was interested in many years ago: a^2 + b^2 + c^2 + d^2 = e^2. I only know of two combinations, so I was thinking of giving it a week's try, enough to learn CUDA programming properly.
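The search above can be sketched on the CPU first. Here is a minimal Python version; the function name, the search bound, and the a ≤ b ≤ c ≤ d ordering are my own illustrative choices, not anything from this thread:

```python
# Hypothetical CPU sketch of the search for a^2 + b^2 + c^2 + d^2 = e^2.
# Everything here (names, bound) is illustrative, not code from the thread.
from itertools import combinations_with_replacement
from math import isqrt

def search(limit):
    """All (a, b, c, d, e) with 1 <= a <= b <= c <= d <= limit
    and a^2 + b^2 + c^2 + d^2 = e^2."""
    hits = []
    for a, b, c, d in combinations_with_replacement(range(1, limit + 1), 4):
        s = a * a + b * b + c * c + d * d
        e = isqrt(s)             # integer square root; exact iff s is a square
        if e * e == s:
            hits.append((a, b, c, d, e))
    return hits

# e.g. 1^2 + 2^2 + 2^2 + 4^2 = 25 = 5^2, so (1, 2, 2, 4, 5) is a hit
```

Each candidate tuple is checked independently, which is what would make this a natural fit for a CUDA port: one thread per (a, b, c, d) tuple.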
Old 2013-05-12, 08:55   #291
Manpowre
 
"Svein Johansen"
May 2013
Norway


Quote:
Originally Posted by Karl M Johnson
IIRC, Tesla-board features such as TCC, ECC, HyperQ and DMA are disabled on the GeForce variants of the Tesla GPUs, even on the GTX Titan.
The only thing NV was "generous" enough not to artificially disable is the DP FP performance.
It's also worth noting that under CUDALucas the GPU core sits at around 48°C and the memory heatsinks on the back side at around 40°C (yes, I measured that with a digital multimeter).
It should be obvious by now that heat is not the problem.

As for the screw-up: NV did not do enough torture testing on the GTX Titan.
As it turns out, the problem arises with double-precision arithmetic (CUDALucas and CUDAPm1, but not mfaktc, cudamemtest, etc.).

Personally, I don't mind overvolting the memory using the pencil method; I just don't know where to "draw".
You are 100% right. In the article I read at AnandTech it first says the GK110 chip has HyperQ, but further down it says that on the Titan it's disabled. Damn, what a disappointment; that's exactly what I was looking forward to using in my code.

So maybe after a week more of testing the Titans will get returned (I have 14 days to return them) and I'll order one Tesla card instead. We'll see.
Old 2013-05-12, 19:09   #292
frmky
 
Jul 2003
So Cal


The Titan should support dynamic parallelism:
https://developer.nvidia.com/ultimat...evelopment-gpu
Try the example in the samples.

nVidia has done a horrible marketing job with the term HyperQ. In different contexts it has referred to:
1. Dynamic parallelism - the ability for a kernel to launch another kernel. The GTX Titan should support this.
2. Concurrent kernel execution - the ability for multiple streams in a single process to run simultaneously on the CUDA cores. This is supported in CC >= 2.0 (starting with Fermi) but HyperQ relaxes the restrictions. The GTX Titan should also support this.
3. Concurrent kernel execution from different processes - the ability for kernels launched from different processes on the computer to run simultaneously on the CUDA cores. This should be coming for the GTX Titan, but is still in development and not yet supported even for the Tesla K20.

The GTX Titan loses ECC memory, RDMA transfers, and, perhaps most importantly, the significant burn-in testing that ensures the cores and memory are stable.
Old 2013-05-12, 20:35   #293
Manpowre
 
"Svein Johansen"
May 2013
Norway


Quote:
Originally Posted by frmky
The Titan should support dynamic parallelism:
https://developer.nvidia.com/ultimat...evelopment-gpu
Try the example in the samples.

nVidia has done a horrible marketing job with the term HyperQ. In different contexts it has referred to:
1. Dynamic parallelism - the ability for a kernel to launch another kernel. The GTX Titan should support this.
2. Concurrent kernel execution - the ability for multiple streams in a single process to run simultaneously on the CUDA cores. This is supported in CC >= 2.0 (starting with Fermi) but HyperQ relaxes the restrictions. The GTX Titan should also support this.
3. Concurrent kernel execution from different processes - the ability for kernels launched from different processes on the computer to run simultaneously on the CUDA cores. This should be coming for the GTX Titan, but is still in development and not yet supported even for the Tesla K20.

The GTX Titan loses ECC memory, RDMA transfers, and perhaps most importantly significant burn-in testing to ensure the cores and memory are stable.
Exactly, I knew I had read about this somewhere (though not in this article). It suggests that, with the right programming, CudaLucas could push the Titan even further without copying data from device to host and back to device just to start new kernels.
Old 2013-05-12, 20:47   #294
owftheevil
 
"Carl Darby"
Oct 2012
Spring Mountains, Nevada


CudaLucas and CudaPm1 do no device-to-host-to-device memory transfers. At the beginning of a test, initialization data is copied from the host to the device. Occasionally the data on the device is copied back to the host to monitor progress or to check the results of a completed test.
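The pattern described here (state initialized once, a long run of iterations with no host round-trips, results inspected only at the end) can be illustrated with a plain CPU Lucas-Lehmer test in Python. This is an illustrative stand-in, not CudaLucas code:

```python
# Minimal CPU sketch (Python) of the Lucas-Lehmer loop structure:
# the residue stays in one place (here a Python int, standing in for
# the device buffer) and is only inspected after the long loop.

def lucas_lehmer(p):
    """Return True if the Mersenne number 2^p - 1 is prime (p an odd prime)."""
    m = (1 << p) - 1
    s = 4                      # initial residue, "copied in" once
    for _ in range(p - 2):     # the long iteration loop: no host round-trips
        s = (s * s - 2) % m
    return s == 0              # final "copy back" and check

# M7 = 127 is prime, M11 = 2047 = 23 * 89 is not:
# lucas_lehmer(7) -> True, lucas_lehmer(11) -> False
```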
Old 2013-05-12, 21:05   #295
Manpowre
 
"Svein Johansen"
May 2013
Norway


Quote:
Originally Posted by owftheevil
CudaLucas and CudaPm1 do no device to host to device memory transfers. At the beginning of a test, initialization data is copied from the host to the device. Occasionally the data on the device is copied back to the host to monitor progress or check the results of a completed test.
OK, I haven't read all the code yet; I just got it compiled and started indexing it. From Nvidia's GK110 documentation:

Dynamic Parallelism – adds the capability for the GPU to generate new work for itself, synchronize on results, and control the scheduling of that work via dedicated, accelerated hardware paths, all without involving the CPU. By providing the flexibility to adapt to the amount and form of parallelism through the course of a program's execution, programmers can expose more varied kinds of parallel work and make the most efficient use of the GPU as a computation evolves. This capability allows less-structured, more complex tasks to run easily and effectively, enabling larger portions of an application to run entirely on the GPU. In addition, programs are easier to create, and the CPU is freed for other tasks.

Hyper-Q – Hyper-Q enables multiple CPU cores to launch work on a single GPU simultaneously, thereby dramatically increasing GPU utilization and significantly reducing CPU idle times. Hyper-Q increases the total number of connections (work queues) between the host and the GK110 GPU by allowing 32 simultaneous, hardware-managed connections (compared to the single connection available with Fermi). Hyper-Q is a flexible solution that allows separate connections from multiple CUDA streams, from multiple Message Passing Interface (MPI) processes, or even from multiple threads within a process. Applications that previously encountered false serialization across tasks, thereby limiting achieved GPU utilization, can see a dramatic performance increase without changing any existing code.

Grid Management Unit – enabling Dynamic Parallelism requires an advanced, flexible grid management and dispatch control system. The new GK110 Grid Management Unit (GMU) manages and prioritizes grids to be executed on the GPU. The GMU can pause the dispatch of new grids and queue pending and suspended grids until they are ready to execute, providing the flexibility to enable powerful runtimes, such as Dynamic Parallelism. The GMU ensures both CPU- and GPU-generated workloads are properly managed and dispatched.

NVIDIA GPUDirect™ – NVIDIA GPUDirect™ is a capability that enables GPUs within a single computer, or GPUs in different servers located across a network, to directly exchange data without needing to go through CPU/system memory. The RDMA feature in GPUDirect allows third-party devices such as SSDs, NICs, and IB adapters to directly access memory on multiple GPUs within the same system, significantly decreasing the latency of MPI send and receive messages to/from GPU memory. It also reduces demands on system memory bandwidth and frees the GPU DMA engines for use by other CUDA tasks. Kepler GK110 also supports other GPUDirect features, including Peer-to-Peer and GPUDirect for Video.
Old 2013-05-12, 21:35   #296
Manpowre
 
"Svein Johansen"
May 2013
Norway

HyperQ works on Titan

I just compiled the simpleHyperQ sample from the Nvidia CUDA samples and got this result:

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release>simplehyperq
starting hyperQ...
GPU Device 0: "GeForce GTX TITAN" with compute capability 3.5

> Detected Compute SM 3.5 hardware with 14 multi-processors
Expected time for serial execution of 32 sets of kernels = 0.640s
Expected time for fully concurrent execution of 32 sets of kernels = 0.020s
Measured time for sample = 0.053s

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release>
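The figures the sample prints are internally consistent: each of the 32 kernel sets takes about 20 ms, so serial execution should take 32 × 0.020 s = 0.640 s, while fully concurrent execution would take roughly one kernel time. A quick sanity check in Python, using only the numbers from the output above:

```python
# Sanity-check the simpleHyperQ figures quoted above.
kernel_time = 0.020   # seconds per set of kernels, from the sample output
num_sets = 32

serial_expected = num_sets * kernel_time   # 0.640 s, as printed
concurrent_ideal = kernel_time             # 0.020 s with 32 hardware queues
measured = 0.053                           # the Titan's actual result

# The measured time sits between fully serial and fully concurrent,
# i.e. the Titan is overlapping most, but not all, of the work.
print(f"speedup over serial: {serial_expected / measured:.1f}x")
# prints: speedup over serial: 12.1x
```

So the hardware queues are clearly active on the Titan; an ideal result would be 0.020 s, but a 12x speedup over serialized launches is far from the single-queue Fermi behaviour.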
