#287
"Svein Johansen"
May 2013
Norway
3×67 Posts
Quote:
Both CudaLucas threads, one for each card, have now been running stable for 10 hours.. that's way more than yesterday, so downclocking just the memory seems to have done the trick. In 3 hours I can deliver my first double check. But there is potential in this card: getting a water cooler and backplate for it; at least a backplate with a heat absorber isn't expensive, so I am thinking about that for sure. The most interesting thing about this chip is HyperQ, the ability for one kernel to spawn another one without reporting back to the main thread on the CPU. That means no cudaMemcpy has to be done, and I assume there is a lot of overhead doing just that in CudaLucas. This is what I will look into over the summer. Reading about CUDA every day, and writing small programs and learning the methodologies... we'll see.. maybe I'll come up with a version of CudaLucas made specially for the GK110.. :)
#288
Karl M Johnson
Mar 2010
3·137 Posts
IIRC, Tesla-board features like TCC, ECC, HyperQ and DMA are disabled on the GeForce variants of Tesla GPUs, even on the GTX Titan. The only thing NV was "generous" enough not to artificially disable is the DP FP performance.

Also worth noting that under CUDALucas the GPU core sits at around 48°C and the memory heatsinks on the back side at around 40°C (yes, I measured that with a digital multimeter). It should be obvious by now that heat is not the problem.

As for the screw-up, NV did not conduct enough torture testing on the GTX Titan. As it turns out, the problem arises with double-precision arithmetic (CUDALucas and CUDAPm1, but not with mfaktc, cudamemtest etc.). Personally, I don't mind overvolting the memory using the pencil method, I just don't know where to "draw".

Last fiddled with by Karl M Johnson on 2013-05-12 at 08:28
#289
"Svein Johansen"
May 2013
Norway
201₁₀ Posts
Quote:
#290
"Svein Johansen"
May 2013
Norway
3·67 Posts
I didn't know HyperQ got disabled by Nvidia on the Titan board. I'm pretty sure I saw in one ad that HyperQ was one of the benefits of the Titan.. maybe I'm wrong..

Oh well, I'm pretty happy with the setup.. it gives me a huge boost in performance, and I've got the dev environment set up.. Money isn't the issue, as I have good work, but I don't just go and buy a Tesla board unless I really need it.. If HyperQ really is turned off on the Titan.. well, I might find myself picking up the last Tesla once Maxwell is here.. direct memory addressing of host memory.. that's incredible.

At the moment I am writing my own CUDA program for an old algorithm I was interested in many years ago: searching for solutions of a² + b² + c² + d² = e². Only 2 combinations are known, so I was thinking of giving it a week's try to learn CUDA programming well enough..
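A minimal brute-force sketch of that kind of search in CUDA could look like the following. This is purely illustrative, not the poster's actual program: the bound N, the grid layout, and all names here are arbitrary assumptions. Device-side printf needs CC >= 2.0, which the Titan has.

Code:
#include <cstdio>
#include <cmath>

const unsigned long long N = 128;   // assumed search bound, chosen arbitrarily

// Integer square root, corrected for floating-point rounding.
__device__ unsigned long long isqrt(unsigned long long s)
{
    unsigned long long r = (unsigned long long)sqrt((double)s);
    while (r * r > s) --r;
    while ((r + 1) * (r + 1) <= s) ++r;
    return r;
}

// One thread per (a, b) pair with a <= b; each thread loops over c and d.
__global__ void search(void)
{
    unsigned long long a = blockIdx.x * blockDim.x + threadIdx.x + 1;
    unsigned long long b = blockIdx.y * blockDim.y + threadIdx.y + 1;
    if (a > N || b > N || b < a) return;

    for (unsigned long long c = b; c <= N; ++c)
        for (unsigned long long d = c; d <= N; ++d) {
            unsigned long long s = a * a + b * b + c * c + d * d;
            unsigned long long e = isqrt(s);
            if (e * e == s)
                printf("%llu^2 + %llu^2 + %llu^2 + %llu^2 = %llu^2\n",
                       a, b, c, d, e);
        }
}

int main(void)
{
    dim3 threads(16, 16);
    dim3 blocks((unsigned)((N + 15) / 16), (unsigned)((N + 15) / 16));
    search<<<blocks, threads>>>();
    cudaDeviceSynchronize();
    return 0;
}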
#291
"Svein Johansen"
May 2013
Norway
3×67 Posts
Quote:
Then maybe a week more of testing with the Titans, and they might get returned (I've got 14 days to return them) and I'll order one Tesla card instead.. we'll see..
#292
Jul 2003
So Cal
2,663 Posts
The Titan should support dynamic parallelism:
https://developer.nvidia.com/ultimat...evelopment-gpu

Try the example in the samples.

nVidia has done a horrible marketing job with the term HyperQ. In different contexts it has referred to:

1. Dynamic parallelism - the ability for a kernel to launch another kernel. The GTX Titan should support this.

2. Concurrent kernel execution - the ability for multiple streams in a single process to run simultaneously on the CUDA cores. This is supported in CC >= 2.0 (starting with Fermi), but HyperQ relaxes the restrictions. The GTX Titan should also support this.

3. Concurrent kernel execution from different processes - the ability for kernels launched from different processes on the computer to run simultaneously on the CUDA cores. This should be coming for the GTX Titan, but is still in development and not yet supported even for the Tesla K20.

The GTX Titan loses ECC memory, RDMA transfers, and, perhaps most importantly, the significant burn-in testing that ensures the cores and memory are stable.
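To make point 1 concrete, a minimal dynamic-parallelism sketch looks like the toy below. It is illustrative only, not NVIDIA's sample, and it assumes CC 3.5 hardware compiled with nvcc -arch=sm_35 -rdc=true -lcudadevrt.

Code:
#include <cstdio>

// Child kernel, launched from the device without any CPU involvement.
__global__ void child(int parent_block)
{
    printf("child thread %d launched by parent block %d\n",
           threadIdx.x, parent_block);
}

// Parent kernel: thread 0 of each block launches a child grid.
__global__ void parent(void)
{
    if (threadIdx.x == 0) {
        child<<<1, 4>>>(blockIdx.x);
        cudaDeviceSynchronize();   // device-side wait for the child grid
    }
}

int main(void)
{
    parent<<<2, 32>>>();
    cudaDeviceSynchronize();       // host-side wait
    return 0;
}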
#293
"Svein Johansen"
May 2013
Norway
3×67 Posts
Quote:
#294
"Carl Darby"
Oct 2012
Spring Mountains, Nevada
473₈ Posts
CudaLucas and CudaPm1 do essentially no device-to-host-to-device memory transfers during a test. At the beginning of a test, initialization data is copied from the host to the device. Occasionally the data on the device is copied back to the host to monitor progress or to check the results of a completed test.
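In outline, that transfer pattern is roughly the following. This is an illustrative skeleton only; the kernel and all names here are made up, not the actual CudaLucas source.

Code:
#include <cuda_runtime.h>

// Hypothetical stand-in for the real FFT/squaring iteration kernels.
__global__ void iterationStep(double *x, int n)
{
    // ... all per-iteration work happens here, entirely on the device ...
}

void runTest(const double *init, double *out, int n, long iterations)
{
    double *d_x;
    cudaMalloc(&d_x, n * sizeof(double));

    // One host-to-device copy at the start of the test.
    cudaMemcpy(d_x, init, n * sizeof(double), cudaMemcpyHostToDevice);

    for (long i = 0; i < iterations; ++i) {
        iterationStep<<<(n + 255) / 256, 256>>>(d_x, n);

        // Only occasionally copy back, e.g. to report progress or checkpoint.
        if (i % 10000 == 0)
            cudaMemcpy(out, d_x, n * sizeof(double), cudaMemcpyDeviceToHost);
    }

    // Final device-to-host copy to check the result of the completed test.
    cudaMemcpy(out, d_x, n * sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d_x);
}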
#295
"Svein Johansen"
May 2013
Norway
3·67 Posts
Quote:
Dynamic Parallelism – adds the capability for the GPU to generate new work for itself, synchronize on results, and control the scheduling of that work via dedicated, accelerated hardware paths, all without involving the CPU. By providing the flexibility to adapt to the amount and form of parallelism through the course of a program's execution, programmers can expose more varied kinds of parallel work and make the most efficient use of the GPU as a computation evolves. This capability allows less-structured, more complex tasks to run easily and effectively, enabling larger portions of an application to run entirely on the GPU. In addition, programs are easier to create, and the CPU is freed for other tasks.

Hyper-Q – Hyper-Q enables multiple CPU cores to launch work on a single GPU simultaneously, thereby dramatically increasing GPU utilization and significantly reducing CPU idle times. Hyper-Q increases the total number of connections (work queues) between the host and the GK110 GPU by allowing 32 simultaneous, hardware-managed connections (compared to the single connection available with Fermi). Hyper-Q is a flexible solution that allows separate connections from multiple CUDA streams, from multiple Message Passing Interface (MPI) processes, or even from multiple threads within a process. Applications that previously encountered false serialization across tasks, thereby limiting achieved GPU utilization, can see up to a dramatic performance increase without changing any existing code.

Grid Management Unit – Enabling Dynamic Parallelism requires an advanced, flexible grid management and dispatch control system. The new GK110 Grid Management Unit (GMU) manages and prioritizes grids to be executed on the GPU. The GMU can pause the dispatch of new grids and queue pending and suspended grids until they are ready to execute, providing the flexibility to enable powerful runtimes, such as Dynamic Parallelism. The GMU ensures both CPU- and GPU-generated workloads are properly managed and dispatched.

NVIDIA GPUDirect™ – NVIDIA GPUDirect™ is a capability that enables GPUs within a single computer, or GPUs in different servers located across a network, to directly exchange data without needing to go to CPU/system memory. The RDMA feature in GPUDirect allows third-party devices such as SSDs, NICs, and IB adapters to directly access memory on multiple GPUs within the same system, significantly decreasing the latency of MPI send and receive messages to/from GPU memory. It also reduces demands on system memory bandwidth and frees the GPU DMA engines for use by other CUDA tasks. Kepler GK110 also supports other GPUDirect features including Peer-to-Peer and GPUDirect for Video.
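The Hyper-Q paragraph above is about concurrency across work queues; in code, the usual way to expose it is simply to put independent kernels into separate CUDA streams, as in this minimal sketch (illustrative only; the kernel body and sizes are placeholder assumptions):

Code:
#include <cuda_runtime.h>

// Placeholder kernel; with Hyper-Q (GK110) the grids launched below can be
// fed from 32 hardware work queues instead of falsely serializing on one.
__global__ void busyWork(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

int main(void)
{
    const int nStreams = 32, n = 1 << 16;
    cudaStream_t streams[nStreams];
    float *buf[nStreams];

    for (int s = 0; s < nStreams; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&buf[s], n * sizeof(float));
        // Each launch goes into its own stream, so independent kernels
        // are free to run concurrently.
        busyWork<<<(n + 255) / 256, 256, 0, streams[s]>>>(buf[s], n);
    }

    cudaDeviceSynchronize();
    for (int s = 0; s < nStreams; ++s) {
        cudaFree(buf[s]);
        cudaStreamDestroy(streams[s]);
    }
    return 0;
}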
#296
"Svein Johansen"
May 2013
Norway
C9₁₆ Posts
I just compiled the simpleHyperQ sample from the Nvidia CUDA samples. I got this result:
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release>simplehyperq
starting hyperQ...
GPU Device 0: "GeForce GTX TITAN" with compute capability 3.5

> Detected Compute SM 3.5 hardware with 14 multi-processors
Expected time for serial execution of 32 sets of kernels = 0.640s
Expected time for fully concurrent execution of 32 sets of kernels = 0.020s
Measured time for sample = 0.053s

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release>