mersenneforum.org  

Old 2020-12-17, 19:44   #2630
Ethan (EO)
 
"Ethan O'Connor"
Oct 2002
GIMPS since Jan 1996

5C₁₆ Posts

P102-100. Just about the same as a 1080 Ti, as expected. Edit: about 10% slower than a 1080 Ti; core count is 3200 vs. 3584.

Code:
.\gpuowl-win.exe -device 1 -time -prp 57885161
2020-12-17 11:29:09 GpuOwl VERSION v7.2-21-g28dbf88
2020-12-17 11:29:14 P102-100-1 57885161 OK       800   0.00% 5727fe6a7225c273 2026 us/it + check 0.85s + save 0.07s; ETA 1d 08:35
2020-12-17 11:29:33 P102-100-1 57885161        10000   0.02% 91565f36715e33e3 2031 us/it
2020-12-17 11:29:54 P102-100-1 57885161        20000   0.03% f2c610087d02c3ea 2043 us/it
2020-12-17 11:30:14 P102-100-1 57885161        30000   0.05% fe1565094c7f7b47 2057 us/it
2020-12-17 11:30:19 P102-100-1 57885161 Stopping, please wait..
2020-12-17 11:30:20 P102-100-1 57885161 OK     32400   0.06% ce572a2ae80045f5 2057 us/it + check 0.86s + save 0.08s; ETA 1d 09:04
2020-12-17 11:30:20 P102-100-1 57885161 38.61% carryFused     :    784 us/call x 31197 calls
2020-12-17 11:30:20 P102-100-1 57885161 34.01% tailFusedSquare :    691 us/call x 31200 calls
2020-12-17 11:30:20 P102-100-1 57885161 13.61% fftMiddleOut   :    276 us/call x 31277 calls
2020-12-17 11:30:20 P102-100-1 57885161 13.54% fftMiddleIn    :    274 us/call x 31281 calls
2020-12-17 11:30:20 P102-100-1 57885161  0.12% tailFusedMul   :    970 us/call x    77 calls
2020-12-17 11:30:20 P102-100-1 57885161  0.05% fftP           :    386 us/call x    84 calls
2020-12-17 11:30:20 P102-100-1 57885161  0.04% fftW           :    340 us/call x    80 calls
2020-12-17 11:30:20 P102-100-1 57885161  0.01% carryA         :    110 us/call x    80 calls
These cards officially have 5 GB of GDDR5X, but they actually have 10 × 8 Gb chips on board and can be BIOS-flashed to 10 GB. There have been some great prices on them (<$100) on eBay recently, as they are approaching EOL for profitable crypto mining and they don't have video out.

Last fiddled with by Ethan (EO) on 2020-12-17 at 19:53
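
As a rough cross-check of the -time profile in the post above, the sketch below recomputes each kernel's share from the logged us/call and call counts, and sums the four kernels that run about once per iteration. The numbers are copied from the log; the variable names and the "more than 30000 calls" cutoff are illustrative assumptions, not part of gpuowl.

```python
# Per-kernel timings copied from the -time output above: (name, us/call, calls).
profile = [
    ("carryFused", 784, 31197),
    ("tailFusedSquare", 691, 31200),
    ("fftMiddleOut", 276, 31277),
    ("fftMiddleIn", 274, 31281),
    ("tailFusedMul", 970, 77),
    ("fftP", 386, 84),
    ("fftW", 340, 80),
    ("carryA", 110, 80),
]

# Total profiled time in microseconds.
total_us = sum(us * calls for _, us, calls in profile)

# Share of the total per kernel, recomputed from us/call x calls.
shares = {name: 100.0 * us * calls / total_us for name, us, calls in profile}

# Assumption: kernels with more than 30000 calls run about once per iteration.
main_kernels_us = sum(us for _, us, calls in profile if calls > 30000)

for name, pct in shares.items():
    print(f"{pct:6.2f}% {name}")
print(f"main kernels: {main_kernels_us} us per iteration")
```

The four main kernels add up to 2025 us, which lines up with the 2026 to 2057 us/it reported in the same log, and the recomputed shares agree with the logged percentages to within rounding of the us/call figures.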
Old 2020-12-17, 20:12   #2631
moebius
 
Jul 2009
Germany

2×3×101 Posts

Quote:
Originally Posted by Ethan (EO)
.\gpuowl-win.exe -device 1 -time -prp 57885161
That value is one for James Heinrich's list. Could you also give me the value for -prp 77936867? Iteration times for the 1080 Ti with that exponent are about 2473 us/it.
Old 2020-12-17, 20:22   #2632
Ethan (EO)
 
"Ethan O'Connor"
Oct 2002
GIMPS since Jan 1996

2²·23 Posts

Quote:
Originally Posted by moebius
That value is one for James Heinrich's list. Could you also give me the value for -prp 77936867? Iteration times for the 1080 Ti with that exponent are about 2473 us/it.
Here you go:

Code:
2020-12-17 12:16:51 P102-100-1 77936867        10000   0.01% fc4f135f7cf4ad29 2718 us/it
2020-12-17 12:17:18 P102-100-1 77936867        20000   0.03% 3cd1bd9d5e09cbc5 2737 us/it
2020-12-17 12:17:46 P102-100-1 77936867        30000   0.04% c4e0ff35e3290d98 2736 us/it
Very close to the core count ratio (which is also essentially the memory bus width ratio):

2736/2473 = 1.11
3584/3200 = 1.12
352/320 = 1.1
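
The three ratios above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the variable names are illustrative, and the figures are the ones quoted in the posts.

```python
# Figures quoted in the posts above.
p102_us_per_it = 2736        # P102-100 at exponent 77936867
gtx1080ti_us_per_it = 2473   # 1080 Ti at the same exponent, as reported by moebius
cores = (3584, 3200)         # 1080 Ti vs. P102-100
bus_bits = (352, 320)        # memory bus width, 1080 Ti vs. P102-100

time_ratio = p102_us_per_it / gtx1080ti_us_per_it
core_ratio = cores[0] / cores[1]
bus_ratio = bus_bits[0] / bus_bits[1]

print(f"{time_ratio:.2f} {core_ratio:.2f} {bus_ratio:.2f}")  # prints "1.11 1.12 1.10"
```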
Old 2020-12-17, 20:39   #2633
moebius
 
Jul 2009
Germany

2×3×101 Posts

Quote:
Originally Posted by Ethan (EO)
Here you go:
Thx.
Here is another benchmark value for James Heinrich, for the Nvidia Tesla K80 (best value of 5 different cards), with gpuowl 6.11.380 on Ubuntu:
Code:
2020-12-17 20:15:57 config: -carry short -use CARRY32,ORIG_SLOWTRIG,IN_WG=128,IN_SIZEX=16,IN_SPACING=4,OUT_WG=128,OUT_SIZEX=16,OUT_SPACING=4 -nospin -block 100 -maxAlloc 10000 -B1 750000 -rB2 20 -prp 57885161 
2020-12-17 20:15:57 device 0, unique id ''
2020-12-17 20:15:57 Tesla K80-0 57885161 FFT: 3M 1K:6:256 (18.40 bpw)
2020-12-17 20:15:57 Tesla K80-0 Expected maximum carry32: 42500000
2020-12-17 20:15:58 Tesla K80-0 OpenCL args "-DEXP=57885161u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=6u -DPM1=0 -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0x1.07673850f37p-1 -DIWEIGHT_STEP_MINUS_1=-0x1.5bd9e39e14a3dp-2 -DCARRY32=1 -DIN_SIZEX=16 -DIN_SPACING=4 -DIN_WG=128 -DORIG_SLOWTRIG=1 -DOUT_SIZEX=16 -DOUT_SPACING=4 -DOUT_WG=128  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-12-17 20:17:10 Note: not found 'config.txt'
2020-12-17 20:17:10 config: -carry short -use CARRY32,ORIG_SLOWTRIG,IN_WG=128,IN_SIZEX=16,IN_SPACING=4,OUT_WG=128,OUT_SIZEX=16,OUT_SPACING=4 -nospin -block 100 -maxAlloc 10000 -B1 750000 -rB2 20 -prp 57885161 
2020-12-17 20:17:10 device 0, unique id ''
2020-12-17 20:17:10 Tesla K80-0 57885161 FFT: 3M 1K:6:256 (18.40 bpw)
2020-12-17 20:17:10 Tesla K80-0 Expected maximum carry32: 42500000
2020-12-17 20:17:10 Tesla K80-0 OpenCL args "-DEXP=57885161u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=6u -DPM1=0 -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0x1.07673850f37p-1 -DIWEIGHT_STEP_MINUS_1=-0x1.5bd9e39e14a3dp-2 -DCARRY32=1 -DIN_SIZEX=16 -DIN_SPACING=4 -DIN_WG=128 -DORIG_SLOWTRIG=1 -DOUT_SIZEX=16 -DOUT_SPACING=4 -DOUT_WG=128  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-12-17 20:17:13 Tesla K80-0 

2020-12-17 20:17:13 Tesla K80-0 OpenCL compilation in 2.66 s
2020-12-17 20:17:14 Tesla K80-0 57885161 OK        0 loaded: blockSize 100, 0000000000000003
2020-12-17 20:17:14 Tesla K80-0 validating proof residues for power 8
2020-12-17 20:17:14 Tesla K80-0 Proof using power 8
2020-12-17 20:17:14 Tesla K80-0 57885161 OK      200   0.00%; 2099 us/it; ETA 1d 09:45; 08e8268acbd436a3 (check 0.31s)
2020-12-17 20:24:02 Tesla K80-0 57885161 OK   200000   0.35%; 2038 us/it; ETA 1d 08:39; de62d6db1ad5092d (check 0.30s)
2020-12-17 20:30:51 Tesla K80-0 57885161 OK   400000   0.69%; 2043 us/it; ETA 1d 08:38; 45e043b36f3556e1 (check 0.31s)
2020-12-17 20:32:04 Tesla K80-0 Stopping, please wait..
2020-12-17 20:32:04 Tesla K80-0 57885161 OK   435800   0.75%; 2042 us/it; ETA 1d 08:35; 22c6f0efd61bd8b9 (check 0.31s)
2020-12-17 20:32:04 Tesla K80-0 Exiting because "stop requested"
2020-12-17 20:32:04 Tesla K80-0 Bye

Last fiddled with by moebius on 2020-12-17 at 21:26
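
The ETA column in the log above can be approximated from the iteration time alone. The sketch below assumes that a PRP test needs about as many iterations as the exponent and that the ETA is rounded to the nearest minute; both are assumptions, and the function names are hypothetical rather than taken from gpuowl.

```python
def eta_seconds(exponent: int, done: int, us_per_it: float) -> float:
    """Rough ETA in seconds, assuming about `exponent` iterations in total."""
    return (exponent - done) * us_per_it / 1e6


def format_eta(seconds: float) -> str:
    """Format as 'Nd HH:MM', rounding to the nearest minute."""
    minutes = round(seconds / 60)
    days, rem = divmod(minutes, 24 * 60)
    hours, mins = divmod(rem, 60)
    return f"{days}d {hours:02d}:{mins:02d}"


# K80 log line: iteration 200 of 57885161 at 2099 us/it.
print(format_eta(eta_seconds(57885161, 200, 2099)))  # prints "1d 09:45"
```

For the two K80 lines checked (iterations 200 and 200000) this reproduces the logged ETAs. Other logs in the thread can differ by about a minute, presumably because the program works with an unrounded us/it value.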
Old 2020-12-17, 21:55   #2634
moebius
 
Jul 2009
Germany

2·3·101 Posts

Can gpuowl run one computation across both GK210 chips of the K80, or would two instances, one per chip, each have roughly the same throughput? Each chip seems to be able to use 12 GB of memory! The benchmarks above are for one instance.
https://wccftech.com/nvidia-tesla-k8...ision-compute/

Last fiddled with by moebius on 2020-12-17 at 22:19
Old 2020-12-17, 22:54   #2635
mrh
 
"mrh"
Oct 2018
Temecula, ca

73₈ Posts

Quote:
Originally Posted by moebius
That value is one for James Heinrich's list. Could you also give me the value for -prp 77936867? Iteration times for the 1080 Ti with that exponent are about 2473 us/it.
Could I ask why we use 77936867 for benchmarks? Just curious, I must have been absent that day.

-mike

Last fiddled with by mrh on 2020-12-17 at 22:54
Old 2020-12-17, 23:06   #2636
moebius
 
Jul 2009
Germany

1001011110₂ Posts

Quote:
Originally Posted by mrh
Could I ask why we use 77936867 for benchmarks?
I adopted the exponent from another Mike: https://mersenneforum.org/showpost.p...postcount=2498
This is for my own gpuowl benchmark list, https://mersenneforum.org/showpost.p...postcount=2603, because I think that CUDALucas and gpuowl values cannot necessarily be compared. The exponent lies roughly between the DC and first-time test wavefronts.
Old 2020-12-18, 01:03   #2637
mrh
 
"mrh"
Oct 2018
Temecula, ca

73₈ Posts

Quote:
Originally Posted by moebius
I adopted the exponent from another Mike: https://mersenneforum.org/showpost.p...postcount=2498
This is for my own gpuowl benchmark list, https://mersenneforum.org/showpost.p...postcount=2603, because I think that CUDALucas and gpuowl values cannot necessarily be compared. The exponent lies roughly between the DC and first-time test wavefronts.
Thanks! That is a very useful list. Here is a short run with a 1070 Ti.

Code:
2020-12-17 17:00:22 GpuOwl VERSION v7.2-16-g1a50f11-dirty
2020-12-17 17:00:22 GpuOwl VERSION v7.2-16-g1a50f11-dirty
2020-12-17 17:00:22 config: -maxAlloc 8G
2020-12-17 17:00:22 config: -prp 77936867 
2020-12-17 17:00:22 device 0, unique id ''
2020-12-17 17:00:22 GeForce GTX 1070 Ti-0 77936867 FFT: 4M 1K:8:256 (18.58 bpw)
2020-12-17 17:00:22 GeForce GTX 1070 Ti-0 77936867 OpenCL args "-DEXP=77936867u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=8u -DMM_CHAIN=1u -DMM2_CHAIN=2u -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0.33644726404543274 -DIWEIGHT_STEP_MINUS_1=-0.25174750481886216 -DIWEIGHTS={0,-0.25174750481886216,-0.44011820345520131,-0.16213409745771243,-0.37306474779553728,-0.061788266441989627,-0.29798072935699788,-0.47471232907613115,-0.21390437908665341,-0.41180199020062258,-0.11975874301407295,-0.3413572830988989,-0.014337887291734644,-0.26247586476052853,-0.44814572555075455,-0.17414732433395128,}  -cl-std=CL2.0 -cl-finite-math-only "
2020-12-17 17:00:22 GeForce GTX 1070 Ti-0 77936867 

2020-12-17 17:00:22 GeForce GTX 1070 Ti-0 77936867 OpenCL compilation in 0.00 s
2020-12-17 17:00:22 GeForce GTX 1070 Ti-0 77936867 maxAlloc: 8.0 GB
2020-12-17 17:00:22 GeForce GTX 1070 Ti-0 77936867 P1(0) 0 bits
2020-12-17 17:00:24 GeForce GTX 1070 Ti-0 77936867 OK     72400 on-load: blockSize 400, 70200e82a481024e
2020-12-17 17:00:24 GeForce GTX 1070 Ti-0 77936867 validating proof residues for power 8
2020-12-17 17:00:24 GeForce GTX 1070 Ti-0 77936867 Proof using power 8
2020-12-17 17:00:28 GeForce GTX 1070 Ti-0 77936867 OK     73200   0.09% bcb51c9036cb04da 3438 us/it + check 1.45s + save 0.12s; ETA 3d 02:21
2020-12-17 17:00:52 GeForce GTX 1070 Ti-0 77936867        80000   0.10% 8d76071d27ee4221 3448 us/it
2020-12-17 17:01:27 GeForce GTX 1070 Ti-0 77936867        90000   0.12% 0bacff453b2f470e 3464 us/it
2020-12-17 17:02:01 GeForce GTX 1070 Ti-0 77936867       100000   0.13% 6d7296b9e2830f50 3476 us/it
2020-12-17 17:02:04 GeForce GTX 1070 Ti-0 77936867 Stopping, please wait..
2020-12-17 17:02:06 GeForce GTX 1070 Ti-0 77936867 OK    100800   0.13% 1a5ad3d1c442af96 3501 us/it + check 1.46s + save 0.11s; ETA 3d 03:42
2020-12-17 17:02:06 GeForce GTX 1070 Ti-0 Exiting because "stop requested"
2020-12-17 17:02:06 GeForce GTX 1070 Ti-0 Bye
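
The "FFT: 4M 1K:8:256 (18.58 bpw)" line in this log, and the "3M 1K:6:256 (18.40 bpw)" line in the K80 log earlier, can be checked with simple arithmetic. The sketch below assumes the FFT length in words is 2 × WIDTH × MIDDLE × SMALL_HEIGHT, which is inferred from the logged labels rather than taken from the gpuowl source; the helper names are hypothetical.

```python
def fft_words(width: int, middle: int, small_height: int) -> int:
    # Assumption: FFT length in words is 2 * WIDTH * MIDDLE * SMALL_HEIGHT,
    # which reproduces the "3M" and "4M" labels in the logs.
    return 2 * width * middle * small_height


def bits_per_word(exponent: int, words: int) -> float:
    # Average number of bits of the exponent-sized number stored per FFT word.
    return exponent / words


for exponent, (w, m, h) in [(57885161, (1024, 6, 256)), (77936867, (1024, 8, 256))]:
    n = fft_words(w, m, h)
    print(f"{exponent}: {n // 2**20}M FFT, {bits_per_word(exponent, n):.2f} bpw")
```

Under that assumption the results are 18.40 bpw for 57885161 and 18.58 bpw for 77936867, matching both logs.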
Old 2020-12-18, 20:25   #2638
Xyzzy
 
"Mike"
Aug 2002

201F₁₆ Posts

5600 XT
Code:
2020-12-18 13:37:03 gfx1010-0 OpenCL compilation in 2.58 s
2020-12-18 13:37:04 gfx1010-0 77936867 OK        0 loaded: blockSize 400, 0000000000000003
2020-12-18 13:37:06 gfx1010-0 77936867 OK      800   0.00%; 1928 us/it; ETA 1d 17:44; 1579c241dc63eca6 (check 0.82s)
2020-12-18 13:43:38 gfx1010-0 77936867 OK   200000   0.26%; 1962 us/it; ETA 1d 18:22; f0b04b45b0855bd2 (check 0.83s)
2020-12-18 13:50:11 gfx1010-0 77936867 OK   400000   0.51%; 1961 us/it; ETA 1d 18:14; c03f94396a5aa29e (check 0.82s)
2020-12-18 13:56:45 gfx1010-0 77936867 OK   600000   0.77%; 1964 us/it; ETA 1d 18:11; b9decd65ca71b629 (check 0.82s)
2020-12-18 14:03:18 gfx1010-0 77936867 OK   800000   1.03%; 1964 us/it; ETA 1d 18:05; 21ebf3636148f663 (check 0.82s)
2020-12-18 14:09:52 gfx1010-0 77936867 OK  1000000   1.28%; 1964 us/it; ETA 1d 17:59; 9bf9d9e6bff4286e (check 0.83s)
Old 2020-12-19, 00:28   #2639
Xyzzy
 
"Mike"
Aug 2002

10000000011111₂ Posts

RX 560
Code:
2020-12-18 18:13:48 Baffin-0 OpenCL compilation in 1.74 s
2020-12-18 18:13:59 Baffin-0 77936867 OK      800   0.00%; 6983 us/it; ETA 6d 07:10; 1579c241dc63eca6 (check 2.79s)
2020-12-18 18:15:05 Baffin-0 77936867 OK    10000   0.01%; 6833 us/it; ETA 6d 03:55; fc4f135f7cf4ad29 (check 2.78s)
2020-12-18 18:16:16 Baffin-0 77936867 OK    20000   0.03%; 6832 us/it; ETA 6d 03:52; 3cd1bd9d5e09cbc5 (check 2.78s)
2020-12-18 18:17:27 Baffin-0 77936867 OK    30000   0.04%; 6832 us/it; ETA 6d 03:51; c4e0ff35e3290d98 (check 2.77s)
2020-12-18 18:18:38 Baffin-0 77936867 OK    40000   0.05%; 6831 us/it; ETA 6d 03:49; dffe1b1b0d748128 (check 2.78s)
2020-12-18 18:19:49 Baffin-0 77936867 OK    50000   0.06%; 6831 us/it; ETA 6d 03:48; 52e286945371ed29 (check 2.78s)
2020-12-18 18:21:01 Baffin-0 77936867 OK    60000   0.08%; 6832 us/it; ETA 6d 03:47; 0945da4dc08bdd95 (check 2.78s)
Old 2020-12-19, 02:44   #2640
moebius
 
Jul 2009
Germany

2×3×101 Posts

Nvidia Tesla T4
Code:
2020-12-19 02:22:54 config: -carry short -use CARRY32,ORIG_SLOWTRIG,IN_WG=128,IN_SIZEX=16,IN_SPACING=4,OUT_WG=128,OUT_SIZEX=16,OUT_SPACING=4 -nospin -block 100 -maxAlloc 10000 -B1 750000 -rB2 20 -prp 57885161 
2020-12-19 02:22:54 device 0, unique id ''
2020-12-19 02:22:54 Tesla T4-0 57885161 FFT: 3M 1K:6:256 (18.40 bpw)
2020-12-19 02:22:54 Tesla T4-0 Expected maximum carry32: 42500000
2020-12-19 02:22:54 Tesla T4-0 OpenCL args "-DEXP=57885161u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=6u -DPM1=0 -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0x1.07673850f37p-1 -DIWEIGHT_STEP_MINUS_1=-0x1.5bd9e39e14a3dp-2 -DCARRY32=1 -DIN_SIZEX=16 -DIN_SPACING=4 -DIN_WG=128 -DORIG_SLOWTRIG=1 -DOUT_SIZEX=16 -DOUT_SPACING=4 -DOUT_WG=128  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-12-19 02:22:54 Tesla T4-0 

2020-12-19 02:22:54 Tesla T4-0 OpenCL compilation in 0.01 s
2020-12-19 02:22:55 Tesla T4-0 57885161 OK   106400 loaded: blockSize 100, 4df015b749d81753
2020-12-19 02:22:55 Tesla T4-0 validating proof residues for power 8
2020-12-19 02:22:55 Tesla T4-0 Proof using power 8
2020-12-19 02:22:56 Tesla T4-0 57885161 OK   106600   0.18%; 3074 us/it; ETA 2d 01:20; 09d009921e54293a (check 0.42s)
2020-12-19 02:27:59 Tesla T4-0 57885161 OK   200000   0.35%; 3243 us/it; ETA 2d 03:58; de62d6db1ad5092d (check 0.41s)
2020-12-19 02:39:03 Tesla T4-0 57885161 OK   400000   0.69%; 3319 us/it; ETA 2d 04:59; 45e043b36f3556e1 (check 0.41s)
2020-12-19 02:39:55 Tesla T4-0 Stopping, please wait..
2020-12-19 02:39:56 Tesla T4-0 57885161 OK   415600   0.72%; 3318 us/it; ETA 2d 04:59; 82ffeed94bb310b9 (check 0.42s)
2020-12-19 02:39:56 Tesla T4-0 Exiting because "stop requested"
2020-12-19 02:39:56 Tesla T4-0 Bye
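
For collecting benchmark values like the ones posted in this thread, the us/it figures can be pulled out of the logs with a regular expression. This is a minimal sketch that assumes the two line layouts seen above (v6.11 and v7.2 style); the pattern and function name are hypothetical and only tested against the sample lines copied here.

```python
import re
from statistics import mean

# A few lines copied from the logs in this thread (both output formats).
LOG = """\
2020-12-17 12:16:51 P102-100-1 77936867        10000   0.01% fc4f135f7cf4ad29 2718 us/it
2020-12-17 12:17:18 P102-100-1 77936867        20000   0.03% 3cd1bd9d5e09cbc5 2737 us/it
2020-12-17 12:17:46 P102-100-1 77936867        30000   0.04% c4e0ff35e3290d98 2736 us/it
2020-12-19 02:27:59 Tesla T4-0 57885161 OK   200000   0.35%; 3243 us/it; ETA 2d 03:58; de62d6db1ad5092d (check 0.41s)
2020-12-19 02:39:03 Tesla T4-0 57885161 OK   400000   0.69%; 3319 us/it; ETA 2d 04:59; 45e043b36f3556e1 (check 0.41s)
"""

# Timestamp, device label (may contain spaces), 8-9 digit exponent, then "<n> us/it".
LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} "
    r"(?P<device>.+?) (?P<exponent>\d{8,9}) .*?(?P<us>\d+) us/it"
)


def mean_us_per_it(log: str) -> dict:
    """Average the us/it samples per (device, exponent) pair."""
    samples = {}
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m:
            key = (m["device"], int(m["exponent"]))
            samples.setdefault(key, []).append(int(m["us"]))
    return {key: mean(vals) for key, vals in samples.items()}


results = mean_us_per_it(LOG)
for (device, exponent), us in results.items():
    print(f"{device} @ {exponent}: {us:.0f} us/it")
```

Lines without a us/it figure (config, "Stopping", "Bye") simply do not match and are skipped. Values from different gpuowl versions or exponents should not be averaged together, which is why the results are keyed by device and exponent.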
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.