#3422 | |
Sep 2011
Germany
23·347 Posts |
#3423 | |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5·19·53 Posts |
BOINC use of mfaktc on multiple GPUs should be straightforward.
To use multiple GPUs in mfakto, the BOINC client needs to be able to interrogate the system, enumerate the OpenCL devices (as the lsgpu utility and gpuowl do), and create a translation table between its 0-based device numbers and the combination of OpenCL platform number and device number on that platform.
A complicating factor is that device identification is overloaded. The correct form for first platform, first device is 01, but 0, 00, 01, and 1 all select the same device in mfakto. Platform numbering is zero-based but OpenCL device numbering is not, from what I've seen. I've also seen inconsistent platform numbering between lsgpu and mfakto on a multi-platform Windows 10 test system, but consistent numbering on a single-platform Windows 7 test system. nvidia-smi numbering does not match OpenCL numbering or order on the NVIDIA platform on a Win7 test system, but the CUDA device number order matches the OpenCL device number order there. In general, numbering from one context to another is messy.
Quote:
Code:
lsgpu, derived/modified from https://gist.github.com/CptFoobar/bcb513d87e574e69c2db
2 Platforms found.
Platform 0
1 Device: GeForce GTX 1050 Ti
 1.1 Vendor: NVIDIA Corporation
 1.2 Type: CL_DEVICE_TYPE_GPU
 1.3 Hardware version: OpenCL 1.2 CUDA
 1.4 Software version: 451.67
 1.5 OpenCL version: OpenCL C 1.2
 1.6 Little Endian: Yes
 1.7 Max Clock frequency: 1620 MHz
 1.8 Image support available: Yes
 1.9 Parallel compute units: 6
 1.10 OpenCL Device Availability: Yes
 1.11 OpenCL Compiler Availability: Yes
 1.12 OpenCL Linker Availability: Yes
Platform 1
1 Device: Intel(R) UHD Graphics 630
 1.1 Vendor: Intel(R) Corporation
 1.2 Type: CL_DEVICE_TYPE_GPU
 1.3 Hardware version: OpenCL 2.1 NEO
 1.4 Software version: 23.20.16.4973
 1.5 OpenCL version: OpenCL C 2.1
 1.6 Little Endian: Yes
 1.7 Max Clock frequency: 1100 MHz
 1.8 Image support available: Yes
 1.9 Parallel compute units: 24
 1.10 OpenCL Device Availability: Yes
 1.11 OpenCL Compiler Availability: Yes
 1.12 OpenCL Linker Availability: Yes
2 Device: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
 2.1 Vendor: Intel(R) Corporation
 2.2 Type: CL_DEVICE_TYPE_CPU
 2.3 Hardware version: OpenCL 2.1 (Build 611)
 2.4 Software version: 7.6.0.611
 2.5 OpenCL version: OpenCL C 2.0
 2.6 Little Endian: Yes
 2.7 Max Clock frequency: 2200 MHz
 2.8 Image support available: Yes
 2.9 Parallel compute units: 12
 2.10 OpenCL Device Availability: Yes
 2.11 OpenCL Compiler Availability: Yes
 2.12 OpenCL Linker Availability: Yes
End.

Code:
for %%a in ( 0 1 2 3 4 ) do call mft %%a
for %%a in ( 0 1 2 ) do for %%b in ( 0 1 2 3 4 ) do call mft %%a%%b

Code:
:mft.bat
set dev=%1
echo device spec -d %dev% for following run >>mfakto-x.txt
mfakto -d %dev% >>mfakto-x.txt
:making no assumptions about app behavior, try for 2 platforms 3 total devices, 0-3 for platform, 0-4 for device on a given platform
: -d 0 gave uhd630
: -d 1 gave uhd630
: -d 2 Select device - ERROR: init_CL(3, 2) failed
: -d 3 Select device - ERROR: init_CL(3, 3) failed
: -d 4 Select device - ERROR: init_CL(3, 4) failed
: -d 00 gave uhd630
: -d 01 gave uhd630
: -d 02 Select device - ERROR: init_CL(3, 2) failed
: -d 03 Select device - ERROR: init_CL(3, 3) failed
: -d 04 Select device - ERROR: init_CL(3, 4) failed
: -d 10 gave uhd630
: -d 11 gave gtx1050ti and subsequent Error -5 (Out of resources): clEnqueueReadBuffer RES failed.
: -d 12 Select device - ERROR: init_CL(3, 2) failed
: -d 13 Select device - ERROR: init_CL(3, 3) failed
: -d 14 Select device - ERROR: init_CL(3, 4) failed
: -d 20 gave uhd630
: -d 21 gave uhd630
: -d 22 Select device - ERROR: init_CL(3, 2) failed
: -d 23 Select device - ERROR: init_CL(3, 3) failed
: -d 24 Select device - ERROR: init_CL(3, 4) failed
:
:Note: the mfakto.ini contents were left appropriate for the uhd630 throughout, so the gtx1050ti may have had reason to fail
:platform number is not matching lsgpu output
:cpu opencl not encountered
:note this is on Windows 10 Pro X64, i7-8750H with UHD 630 and GTX 1050Ti
: -d 11 was previously uhd630

Last fiddled with by kriesel on 2020-10-27 at 22:02
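The translation table described above could be sketched roughly as follows. This is only an illustration in Python, not anything BOINC or mfakto ships: the function name `build_device_table` and the hard-coded platform list (copied from the lsgpu listing) are hypothetical, and the sketch assumes the numbering convention stated in the post (0-based platform digit followed by 1-based device digit). As the test results show, mfakto's platform order did not match lsgpu on this Windows 10 system, so a real client would have to enumerate devices the same way mfakto does.

```python
from typing import Dict, List, Tuple


def build_device_table(platforms: List[List[str]]) -> Dict[int, Tuple[str, str]]:
    """Map a flat 0-based device index to (mfakto -d spec, device name).

    Assumes the convention described in the post: the platform digit is
    0-based and the device digit on that platform is 1-based.
    """
    table: Dict[int, Tuple[str, str]] = {}
    flat_index = 0
    for platform_index, devices in enumerate(platforms):
        for device_index, name in enumerate(devices, start=1):
            table[flat_index] = (f"{platform_index}{device_index}", name)
            flat_index += 1
    return table


# Hypothetical input, in the order lsgpu reported the devices.
platforms = [
    ["GeForce GTX 1050 Ti"],
    ["Intel(R) UHD Graphics 630", "Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz"],
]

table = build_device_table(platforms)
for flat_index, (spec, name) in table.items():
    print(f"BOINC device {flat_index} -> mfakto -d {spec} ({name})")
```

A table like this only helps if the enumeration order is the same one the application uses, which is exactly the inconsistency reported above.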
#3424 |
Sep 2011
Germany
23·347 Posts |
Another error, this time with an RTX 3080 (NVIDIA GeForce RTX 3080 (4095MB) driver: 456.38 OpenCL: 1.2):
http://srbase.my-firewall.org/sr5/re...ultid=22792002
ERROR: cudaGetLastError() returned 48: no kernel image is available for execution on the device

Self-compiled by the user:
CUDA version info
 binary compiled for CUDA 11.10
 CUDA runtime version 11.10
 CUDA driver version 11.10

With BOINC:
CUDA version info
 binary compiled for CUDA 10.0
 CUDA runtime version 10.0
 CUDA driver version 11.10

Neither is working. Any help is much appreciated.
#3426 |
Oct 2020
416 Posts |
I have a 3080 and am the user with the SRBase issue Rebirther posted about. I can run any benchmarks that need to be run if you can point me in the right direction on how to run them. I'm running Windows on the machine with the 3080.
#3427 |
"Oliver"
Mar 2005
Germany
21278 Posts |
Finally got access to an RTX 3090:
Code:
mfaktc v0.22-pre8 (64bit built)
[...]
CUDA version info
 binary compiled for CUDA 11.10
 CUDA runtime version 11.10
 CUDA driver version 11.10

CUDA device info
 name GeForce RTX 3090
 compute capability 8.6
 max threads per block 1024
 max shared memory per MP 102400 byte
 number of multiprocessors 82
 clock rate (CUDA cores) 1755MHz
 memory clock rate: 9751MHz
 memory bus width: 384 bit
[...]
Starting trial factoring M66362159 from 2^74 to 2^75 (57.65 GHz-days)
 k_min = 142321062303420
 k_max = 284642124610180
Using GPU kernel "barrett76_mul32_gs"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Nov 01 00:21 | 0 0.1% | 0.794 12m41s | 6535.09 82485 n.a.%
Nov 01 00:21 | 4 0.2% | 0.788 12m35s | 6584.85 82485 n.a.%
Nov 01 00:21 | 9 0.3% | 0.776 12m23s | 6686.67 82485 n.a.%
[...]
Nov 01 00:35 | 4617 100.0% | 0.849 0m00s | 6111.73 82485 n.a.%
no factor for M66362159 from 2^74 to 2^75 [mfaktc 0.22-pre8 barrett76_mul32_gs CUDA 11.10 arch 8.0]
B29A657C
tf(): total time spent: 13m 28.241s

Power consumption during the run was about 340-345 Watt as reported by nvidia-smi.

Oliver
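For anyone wanting to sanity-check the numbers in that log, here is a small Python sketch. It relies on the standard fact that any factor q of a Mersenne number M_p has the form q = 2kp + 1, so the bit range 2^74 to 2^75 corresponds to a range of k. The helper names `k_range` and `ghz_days_per_day` are made up for illustration. The k values mfaktc prints differ slightly from the raw quotients, presumably because it aligns the range to its class structure (an assumption, not something stated in the log).

```python
def k_range(p: int, bit_lo: int, bit_hi: int) -> tuple:
    """Approximate k bounds such that q = 2*k*p + 1 lies in [2^bit_lo, 2^bit_hi)."""
    two_p = 2 * p
    k_lo = (2 ** bit_lo) // two_p
    k_hi = (2 ** bit_hi) // two_p
    return k_lo, k_hi


def ghz_days_per_day(ghz_days: float, seconds: float) -> float:
    """Average throughput for an assignment worth ghz_days finished in `seconds`."""
    return ghz_days * 86400.0 / seconds


p = 66362159
k_lo, k_hi = k_range(p, 74, 75)

# Total time reported by tf(): 13m 28.241s
elapsed_seconds = 13 * 60 + 28.241
rate = ghz_days_per_day(57.65, elapsed_seconds)

print("k_lo ~", k_lo)
print("k_hi ~", k_hi)
print("average GHz-d/day ~", round(rate, 1))
```

The average over the whole run comes out a little above 6,100 GHz-days/day, which sits inside the per-class range shown in the log (about 6,112 to 6,687).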
#3428 | |
"James Heinrich"
May 2004
ex-Northern Ontario
2×32×5×37 Posts |
https://www.mersenne.ca/mfaktc.php#benchmark

For both cudalucas and gpuowl, if you could start a primality test on exponent 57885161 and let it run for a few iterations until the iteration time is stable (typically anywhere from 30k-100k iterations) and then email me the output (james@mersenne.ca).

BTW: There will likely be a much more useful benchmark mode for gpuowl in a future version, but that's likely several months away yet, so for now that simple benchmark will suffice for comparison with other results.
#3429 | |
Jun 2003
4934 Posts |
It is weird that this is not working, though. Normally, the driver would take any available PTX and generate code for the CC dynamically.
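To make the PTX point concrete, here is a rough Python model of the compatibility rules as documented for CUDA: device code built with `code=sm_XY` (SASS) runs only on GPUs with the same major compute capability and an equal or higher minor version, while embedded PTX (`code=compute_XY`) can be JIT-compiled by the driver for any GPU with an equal or higher compute capability. The function `can_run` is a hypothetical helper for illustration, not a CUDA API. Whether the failing binaries actually lacked PTX is an assumption; note that `--generate-code arch=compute_50,code=sm_50` embeds SASS only.

```python
from typing import Iterable, Tuple

ComputeCapability = Tuple[int, int]


def can_run(device_cc: ComputeCapability,
            sass_targets: Iterable[ComputeCapability],
            ptx_targets: Iterable[ComputeCapability]) -> bool:
    """Return True if a binary with these embedded targets can run on device_cc."""
    # SASS is binary compatible only within one major version,
    # and only when the device minor version is not lower.
    for major, minor in sass_targets:
        if major == device_cc[0] and minor <= device_cc[1]:
            return True
    # PTX can be JIT-compiled for any device of equal or higher capability.
    return any(ptx_cc <= device_cc for ptx_cc in ptx_targets)


rtx_30xx = (8, 6)

# SASS for CC 5.0 only, no PTX: nothing the driver can use on CC 8.6.
print(can_run(rtx_30xx, sass_targets=[(5, 0)], ptx_targets=[]))

# Same SASS plus PTX for compute_50: the driver can JIT it.
print(can_run(rtx_30xx, sass_targets=[(5, 0)], ptx_targets=[(5, 0)]))

# SASS for sm_80 runs on CC 8.6 (same major, lower minor).
print(can_run(rtx_30xx, sass_targets=[(8, 0)], ptx_targets=[]))
```

Under this model, a build containing only older `sm_XY` targets would produce exactly the "no kernel image is available" failure on an Ampere card.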
#3430 | |
Oct 2020
22 Posts |
# generate code for various compute capabilities
#NVCCFLAGS += --generate-code arch=compute_11,code=sm_11 # CC 1.1, 1.2 and 1.3 GPUs will use this code (1.0 is not possible for mfaktc)
#NVCCFLAGS += --generate-code arch=compute_20,code=sm_20 # CC 2.x GPUs will use this code, one code fits all!
#NVCCFLAGS += --generate-code arch=compute_30,code=sm_30 # all CC 3.x GPUs _COULD_ use this code
#NVCCFLAGS += --generate-code arch=compute_35,code=sm_35 # but CC 3.5 (3.2?) _CAN_ use funnel shift which is useful for mfaktc
#NVCCFLAGS += --generate-code arch=compute_50,code=sm_50 # CC 5.x GPUs will use this code
NVCCFLAGS += --generate-code arch=compute_86,code=sm_86 # CC 8.6 GPUs (e.g. RTX 30xx) will use this code

Last fiddled with by Icecold on 2020-11-01 at 05:58
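A quick way to double-check which targets are actually enabled in a Makefile fragment like the one above is to parse the uncommented `--generate-code` lines. The Python sketch below does that; `active_targets` is a made-up helper name, and the sample text is a shortened copy of the fragment, not the full mfaktc Makefile.

```python
import re
from typing import List, Tuple

# Matches e.g. "arch=compute_86,code=sm_86" or "arch=compute_50,code=compute_50".
GENCODE = re.compile(r"arch=compute_(\d+),code=(sm|compute)_(\d+)")


def active_targets(makefile_text: str) -> List[Tuple[str, str]]:
    """List (kind, version) pairs from uncommented --generate-code lines.

    kind is "sm" for SASS or "compute" for embedded PTX.
    """
    targets = []
    for line in makefile_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # whole line is commented out
        code_part = stripped.split("#", 1)[0]  # drop trailing comment
        match = GENCODE.search(code_part)
        if match:
            targets.append((match.group(2), match.group(3)))
    return targets


sample = """\
#NVCCFLAGS += --generate-code arch=compute_50,code=sm_50 # CC 5.x GPUs will use this code
NVCCFLAGS += --generate-code arch=compute_86,code=sm_86 # CC 8.6 GPUs will use this code
"""

enabled = active_targets(sample)
print(enabled)
```

With the fragment as posted, only the `sm_86` line is active, so the resulting binary carries SASS for CC 8.6 and nothing for older cards.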
#3431 | |
Oct 2020
22 Posts |
#3432 | |
Jun 2003
2·2,467 Posts |