mersenneforum.org  

Old 2017-08-12, 19:59   #12
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2^4·3·163 Posts
device confirmation

Quote:
Originally Posted by preda View Post
Would it be useful to include a GPU name (id) with manual results? If the server would like that, I could add it to gpuOwL result format.

The rationale is that some user can have multiple GPUs. If some results later prove bad, it may be easier to pin-point the "bad GPU" if an id is attached. (Similar to the CPU-name for mprime)
Reporting performance of a test as having been performed on a specific gpu implies ensuring some accuracy of the report.

Does gpuOwL confirm, by probing the device for identifying characteristics at program start or when the result is generated, that the GPU selected by device number is actually the intended one? I ran into the issue of changing device numbers, described below, on Windows with CUDALucas. It seems to me that some version of the issue might also occur on Linux, and in applications other than CUDALucas. I've proposed verifying that certain device characteristics match as an approach. Otherwise results may get misattributed to a different GPU name and physical GPU, or be the result of running alternately on multiple physical GPUs, possibly without the user's knowledge.

In multiple-GPU systems, an NVIDIA driver timeout, thermal limits, or a combination of the two may cause a device to disappear from the device count, even though Windows Device Manager still shows it and an already-running instance of GPU-Z lists it and can display its parameters, but not its sensor readings. If one GPU drops out, the mapping from device number to physical GPU changes without user action or knowledge. That means the specific device numbers entered in application ini files change meaning.

User action or batch wrappers may then restart a run on a different device than intended, so a test meant for a specified GPU may run for a time on another one. The logging and results land in the expected directory and file, which helps mask the occurrence. If the models or speeds are the same, the switch may go undetected.

I have observed the remap cause two applications to run on the same GPU at the same time, even though their ini files or batch files specify separate device numbers. The remap may affect the execution timing of two sessions sharing one GPU, or may cause a restarted session to fail if it requires more resources than are available on a dissimilar card, or if it requests a device number higher than the reduced Windows GPU count allows. Requesting a device number beyond the reduced count generates this error message in CUDALucas and CUDAPm1:
device_number >= device_count ... exiting
(This is probably a driver problem)
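The remapping described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the card labels and the `enumerate_devices` helper are hypothetical, and the model simply assumes that device numbers are handed out in order to whichever cards are currently visible.

```python
# Hypothetical model of driver enumeration: device numbers 0..n-1 are
# assigned, in order, to whichever GPUs are currently visible.
def enumerate_devices(visible_gpus):
    """Return {device_number: gpu_label} for the GPUs that can be seen."""
    return dict(enumerate(visible_gpus))

# Three physical cards (labels are made up for illustration).
all_gpus = ["card A", "card B", "card C"]

before = enumerate_devices(all_gpus)

# Card A drops out, e.g. after a driver timeout or thermal event.
after = enumerate_devices([g for g in all_gpus if g != "card A"])

configured_device = 1  # device number written in an application's ini file

print(before[configured_device])  # card B
print(after[configured_device])   # card C: same number, different card
print(2 in after)                 # False: device_number >= device_count
```

An instance configured for device 1 silently moves from card B to card C, while an instance configured for device 2 hits the "device_number >= device_count" case.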

Confirming unique device characteristics could allow greater confidence in execution.

Depending on system configuration, the GPU BIOS string, the model name, or the combination of the two may or may not be unique enough for device confirmation, but they are relatively permanent. The PCI Express bus and device ID combination is, I think, certain to be unique, but it only identifies the GPU at its current location, which may change as possibilities for resolving thermal issues get explored. These parameters also have the advantage that they can easily be obtained through utilities such as GPU-Z. Another identifier that has been proposed is the UUID, available at least in 64-bit Windows.
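Whether a given set of characteristics is "unique enough" can be checked per system. The following is a minimal sketch, assuming the device properties have already been collected into dictionaries; the `is_unique_key` helper and the two-card example data are hypothetical.

```python
from collections import Counter

def is_unique_key(devices, fields):
    """True if the chosen fields distinguish every device in the list."""
    keys = [tuple(dev[f] for f in fields) for dev in devices]
    return all(count == 1 for count in Counter(keys).values())

# Hypothetical system with two cards of the same model.
devices = [
    {"name": "GeForce GTX 1060 3GB", "pciBusID": 3, "pciDeviceID": 0},
    {"name": "GeForce GTX 1060 3GB", "pciBusID": 40, "pciDeviceID": 0},
]

print(is_unique_key(devices, ["name"]))                     # False
print(is_unique_key(devices, ["pciBusID", "pciDeviceID"]))  # True
```

With identical models the name alone cannot tell the cards apart, while the bus and device ID pair can, at the cost of changing whenever a card is moved to another slot.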

The CUDALucas v2.06beta 64-bit build of May 5 2017 outputs, can log, and could be modified to check, at least the following:

CUDALucas v2.06beta 64-bit build, compiled May 5 2017 @ 13:00:15

binary compiled for CUDA 6.50
CUDA runtime version 6.50
CUDA driver version 8.0

------- DEVICE 0 -------
name GeForce GTX 1060 3GB
UUID GPU-5e2c5531-4684-57ec-6393-8b762f286c70
ECC Support? Disabled
Compatibility 6.1
clockRate (MHz) 1771
memClockRate (MHz) 4004
totalGlobalMem 3221225472
totalConstMem 65536
l2CacheSize 1572864
sharedMemPerBlock 49152
regsPerBlock 65536
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsPerMP 2048
multiProcessorCount 9
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 2147483647,65535,65535
textureAlignment 512
deviceOverlap 1
pciDeviceID 0
pciBusID 40
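A listing in this format could be turned into something a program can check. The sketch below is not CUDALucas code; `parse_device_block` and `mismatches` are hypothetical helpers, and the parser only handles simple "key value" lines (entries whose key contains a space, such as "ECC Support?", would need extra handling).

```python
def parse_device_block(text):
    """Parse 'key value' lines from a device listing into a dict of strings."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("-"):
            continue  # skip blank lines and the '------- DEVICE n -------' banner
        key, _, value = line.partition(" ")
        props[key] = value.strip()
    return props

def mismatches(expected, actual):
    """Return {field: (expected, found)} for every field that differs."""
    return {k: (v, actual.get(k)) for k, v in expected.items() if actual.get(k) != v}

listing = """\
------- DEVICE 0 -------
name GeForce GTX 1060 3GB
UUID GPU-5e2c5531-4684-57ec-6393-8b762f286c70
totalGlobalMem 3221225472
pciDeviceID 0
pciBusID 40
"""

probed = parse_device_block(listing)
expected = {"name": "GeForce GTX 1060 3GB", "pciBusID": "40"}
print(mismatches(expected, probed))           # {}
print(mismatches({"pciBusID": "3"}, probed))  # {'pciBusID': ('3', '40')}
```

An empty result means the device at that number looks like the expected card; anything else is a reason to stop before work is attributed to the wrong GPU.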

The manufacturer serial number would seem ideal, but apparently on at least some cards it cannot be queried. It costs more to put that in a ROM somewhere, so presumably it is not done for consumer-grade GPUs. https://superuser.com/questions/4692...he-case#469220
Old 2017-08-13, 06:46   #13
preda
 
"Mihai Preda"
Apr 2015

2^2·3·11^2 Posts

Right now gpuOwL does not attempt to fill in the GPU name or id automatically. By default it produces *no* UID:, but if the user specifies -uid foo/bar on the command line, it will just use that string (UID: foo/bar) coming from the user without validation or transformation.

To guard against user error, the only safeguard right now is that some basic info about the card is logged at startup, e.g.
"44x1080MHz Hawaii", but that's all.

Now, I don't know exactly how to get a better ID of the card using OpenCL. If such an ID could be obtained, I would at least print that on startup as well.

The second point is: should the software generate the UID: automatically? The software still needs the user name from the user. So maybe the hardware id could be filled in automatically, but I don't know exactly what info would be good to put there. Would "Hawaii-44x1080" automatically be a better string than what the user inputs, e.g. I put "390x"?
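One way to picture the two options is sketched below. This is not gpuOwL code: `hardware_id` and `make_uid` are hypothetical helpers, and the sketch assumes the UID has the form user/hardware and that the startup string always looks like "44x1080MHz Hawaii".

```python
import re

def hardware_id(startup_info):
    """Turn a startup string like '44x1080MHz Hawaii' into 'Hawaii-44x1080'."""
    m = re.fullmatch(r"(\d+)x(\d+)MHz\s+(\S+)", startup_info.strip())
    if m is None:
        raise ValueError(f"unrecognized device info: {startup_info!r}")
    units, mhz, chip = m.groups()
    return f"{chip}-{units}x{mhz}"

def make_uid(user, startup_info, label=None):
    """Compose 'user/hardware'; a user-supplied label wins over the derived id."""
    return f"{user}/{label or hardware_id(startup_info)}"

print(make_uid("foo", "44x1080MHz Hawaii"))          # foo/Hawaii-44x1080
print(make_uid("foo", "44x1080MHz Hawaii", "390x"))  # foo/390x
```

Note that a derived string such as "Hawaii-44x1080" is identical for two cards of the same model and clock, so on its own it does not distinguish cards within one system.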

I'm open to improving things in this area (it's not difficult), but it's not yet clear to me what the solution is.
Old 2017-08-13, 13:28   #14
science_man_88
 
"Forget I exist"
Jul 2009
Dartmouth NS

20415_8 Posts

Quote:
Originally Posted by https://www.khronos.org/registry/OpenCL/sdk/1.0/docs/man/xhtml/clGetDeviceInfo.html
device
Refers to the device returned by clGetDeviceIDs.
Maybe that site will help?

Last fiddled with by science_man_88 on 2017-08-13 at 13:30
Old 2017-08-14, 04:35   #15
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2^4×3×163 Posts

The hardware ID should be unique within the user's fleet and entered in the ini file along with the user name. The ini file can also contain identifying information about the expected GPU. Querying the device properties then has something to check against, to ensure the GPU in use is the one expected at that device number. Part of error control is traceability of results to specific hardware.
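As a rough sketch of that idea, assuming an ini layout that no existing program uses: the `[gpu]` section, the `expect_` key prefix, and the `check_device` helper below are all hypothetical, and the probed properties are passed in as a plain dictionary rather than queried from a real device.

```python
import configparser

# Hypothetical ini contents: user name, fleet-unique hardware ID, and the
# characteristics the GPU at this device number is expected to have.
INI_TEXT = """\
[gpu]
user = foo
hardware_id = gtx1060-a
device_number = 0
expect_name = GeForce GTX 1060 3GB
expect_pciBusID = 40
"""

def check_device(ini_text, probed):
    """Compare every expect_* entry in the ini file with the probed properties."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case, e.g. pciBusID
    parser.read_string(ini_text)
    problems = []
    for key, want in parser["gpu"].items():
        if key.startswith("expect_"):
            prop = key[len("expect_"):]
            got = str(probed.get(prop))
            if got != want:
                problems.append(f"{prop}: expected {want}, found {got}")
    return problems

good = {"name": "GeForce GTX 1060 3GB", "pciBusID": 40}
swapped = {"name": "GeForce GTX 1070", "pciBusID": 3}

print(check_device(INI_TEXT, good))          # []
print(len(check_device(INI_TEXT, swapped)))  # 2
```

A program could run such a check at startup and again before writing a result, and refuse to continue if the list of problems is not empty.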

Last fiddled with by kriesel on 2017-08-14 at 04:51
Old 2017-08-16, 17:34   #16
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1111010010000_2 Posts

On Linux or Windows you can retrieve a UUID per GPU, which is highly likely, though not guaranteed, to be unique. It is not invariant over time for a particular piece of hardware: if, for example, a GPU is removed from one system and installed in another, that may result in a new UUID for the same GPU. OS upgrades or reinstalls, and driver upgrades or reinstalls, are other events that may create new UUIDs for the same hardware. https://en.wikipedia.org/wiki/Univer...que_identifier
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
