mersenneforum.org  

Old 2019-08-04, 21:07   #276
Lorenzo
 
Aug 2010
Republic of Belarus

2×89 Posts
Default

CPU Intel Core i3-8100
MB Gigabyte Z370P D3 (rev. 1.0)

Memory HyperX Fury 2x4GB DDR4 PC4-23400 HX429C17FBK2/8 (CL 17, running at default 2933 MHz, 1 rank)
OS Oracle Linux 7.6



Code:
[Work thread Aug 5 00:00] Using FMA3 FFT length 18M, Pass1=1536, Pass2=12K, clm=1, 4 threads
[Work thread Aug 5 00:00] p: 332220523.  Time: 26.325 ms.
[Work thread Aug 5 00:00] p: 332220523.  Time: 27.615 ms.
[Work thread Aug 5 00:00] p: 332220523.  Time: 27.743 ms.
[Work thread Aug 5 00:00] p: 332220523.  Time: 27.452 ms.
[Work thread Aug 5 00:00] p: 332220523.  Time: 28.561 ms.
[Work thread Aug 5 00:00] p: 332220523.  Time: 28.141 ms.
[Work thread Aug 5 00:00] p: 332220523.  Time: 27.199 ms.
[Work thread Aug 5 00:00] p: 332220523.  Time: 28.502 ms.
[Work thread Aug 5 00:00] p: 332220523.  Time: 27.887 ms.
[Work thread Aug 5 00:00] p: 332220523.  Time: 27.301 ms.
[Work thread Aug 5 00:00] Iterations: 10.  Total time: 0.277 sec.
[Work thread Aug 5 00:00] Estimated time to complete this exponent: 106 days, 9 hours, 43 minutes.

Code:
[Work thread Aug 5 00:01] Timing FFTs using 4 cores.
[Work thread Aug 5 00:01] Timing 25 iterations of 18432K FFT length.  Best time: 26.265 ms., avg time: 26.328 ms.
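The completion estimate in the first log can be reproduced from the ten per-iteration timings. This is a minimal sketch, assuming a test of exponent p needs roughly p iterations and that the estimate is simply mean time per iteration times p; the function name is made up for illustration and is not part of Prime95.

```python
def eta_days_hours_minutes(ms_per_iter, iterations):
    """Convert a per-iteration time into a (days, hours, minutes) estimate."""
    total_seconds = int(ms_per_iter * iterations / 1000)
    days, rem = divmod(total_seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes = rem // 60
    return days, hours, minutes


EXPONENT = 332220523  # assumption: about one iteration per bit of the exponent

# The ten timings from the log above, in milliseconds.
timings_ms = [26.325, 27.615, 27.743, 27.452, 28.561,
              28.141, 27.199, 28.502, 27.887, 27.301]

mean_ms = sum(timings_ms) / len(timings_ms)
print(round(mean_ms, 3), eta_days_hours_minutes(mean_ms, EXPONENT))
```

With these inputs the mean is about 27.67 ms and the estimate comes out at 106 days, 9 hours, 43 minutes, the same figure the log reports.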
Old 2020-10-13, 19:44   #277
M344587487
 
"Composite as Heck"
Oct 2017

769 Posts
Ryzen 4700u, 2x16GB 3200 CL22 DR

Mobile APU, 8C8T with 2x4MiB L3 cache and a Vega iGPU with 7 compute units at 1600 MHz

8 core 1 worker CPU test, iGPU idle:
Code:
[Work thread Oct 13 18:48] Starting primality test of M332220523 using FMA3 FFT length 18M, Pass1=1536, Pass2=12K, clm=2, 8 threads
[Work thread Oct 13 18:48] Iteration: 1000 / 332220523 [0.00%], ms/iter: 26.722, ETA: 102d 17:57
[Work thread Oct 13 18:49] Iteration: 2000 / 332220523 [0.00%], ms/iter: 26.736, ETA: 102d 19:16
[Work thread Oct 13 18:49] Iteration: 3000 / 332220523 [0.00%], ms/iter: 26.742, ETA: 102d 19:49
[Work thread Oct 13 18:50] Iteration: 4000 / 332220523 [0.00%], ms/iter: 26.727, ETA: 102d 18:28
[Work thread Oct 13 18:50] Iteration: 5000 / 332220523 [0.00%], ms/iter: 26.729, ETA: 102d 18:38
iGPU test, CPU idle:
Code:
2020-10-13 19:03:16 gfx900-0 332220523 FFT: 18M 1K:9:1K (17.60 bpw)
2020-10-13 19:03:16 gfx900-0 Expected maximum carry32: 678B0000
2020-10-13 19:03:17 gfx900-0 OpenCL args "-DEXP=332220523u -DWIDTH=1024u -DSMALL_HEIGHT=1024u -DMIDDLE=9u -DPM1=0 -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0x1.459b50988a95ep-2 -DIWEIGHT_STEP_MINUS_1=-0x1.ee19f3613f2aap-3  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-10-13 19:03:20 gfx900-0 OpenCL compilation in 2.99 s
2020-10-13 19:07:59 gfx900-0 332220523 OK        0 loaded: blockSize 400, 0000000000000003
2020-10-13 19:07:59 gfx900-0 validating proof residues for power 8
2020-10-13 19:07:59 gfx900-0 Proof using power 8
2020-10-13 19:10:35 gfx900-0 332220523 OK      800   0.00%; 58117 us/it; ETA 223d 11:16; b950798999630b08 (check 108.54s)
2020-10-13 20:05:58 gfx900-0 Stopping, please wait..
2020-10-13 20:08:00 gfx900-0 332220523 OK    54400   0.02%; 62445 us/it; ETA 240d 01:43; 02b31aa308f45426 (check 97.95s)
Under gpuowl the iGPU thermal-throttles to ~1500 MHz; under mfakto it does not throttle.
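The "FFT: 18M 1K:9:1K (17.60 bpw)" line in the gpuowl log can be checked by hand. The sketch below assumes the three numbers are the WIDTH, MIDDLE and SMALL_HEIGHT values from the OpenCL args, and that the FFT length in words is twice their product; that factor of 2 is an assumption, chosen because it reproduces the reported 18M. The function names are illustrative only.

```python
def fft_words(width, middle, small_height):
    # Assumption: FFT length is 2 * WIDTH * MIDDLE * SMALL_HEIGHT words.
    return 2 * width * middle * small_height


def bits_per_word(exponent, words):
    # Average number of bits of the exponent-sized residue held in each FFT word.
    return exponent / words


words = fft_words(1024, 9, 1024)
print(words, words // 2**20, round(bits_per_word(332220523, words), 2))
```

This gives 18874368 words (18 × 2^20) and 17.6 bits per word, in line with the log.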
Old 2020-12-28, 23:11   #278
M344587487
 
"Composite as Heck"
Oct 2017

301₁₆ Posts
RX 6900XT

Some lovely person installed OpenCL with ROCm 4.0 and ran some Big Navi tests on a stock ASRock Phantom Gaming 6900XT (Ubuntu 20.04, kernel 5.4).
Code:
[332220523, ]
2020-12-28 16:57:10 GpuOwl VERSION v7.2-21-g28dbf88
2020-12-28 16:57:10 GpuOwl VERSION v7.2-21-g28dbf88
2020-12-28 16:57:10 Note: not found 'config.txt'
2020-12-28 16:57:10 config: -prp 332220523 -iters 50000 
2020-12-28 16:57:10 device 0, unique id ''
2020-12-28 16:57:10 gfx1030-0 332220523 FFT: 18M 1K:9:1K (17.60 bpw)
2020-12-28 16:57:11 gfx1030-0 332220523 OpenCL args "-DEXP=332220523u -DWIDTH=1024u -DSMALL_HEIGHT=1024u -DMIDDLE=9u -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0.31797529154814252 -DIWEIGHT_STEP_MINUS_1=-0.24126043453715812 -DIWEIGHTS={0,-0.24126043453715812,-0.42431427180125786,-0.12640892148665336,-0.337171884696568,-0.49708608381811953,-0.23683862754188784,-0.42095927188310595,-0.12131777912660047,-0.33330903355459196,-0.49415518582120899,-0.23239105099670423,-0.41758471958785059,-0.11619696644233313,-0.32942367036371434,-0.49120720704209719,}  -cl-std=CL2.0 -cl-finite-math-only "
2020-12-28 16:57:14 gfx1030-0 332220523 OpenCL compilation in 2.89 s
2020-12-28 16:57:14 gfx1030-0 332220523 maxAlloc: 0.0 GB
2020-12-28 16:57:14 gfx1030-0 332220523 You should use -maxAlloc if your GPU has more than 4GB memory. See help '-h'
2020-12-28 16:57:14 gfx1030-0 332220523 P1(0) 0 bits
2020-12-28 16:57:14 gfx1030-0 332220523 PRP starting from beginning
2020-12-28 16:57:16 gfx1030-0 332220523 OK         0 on-load: blockSize 400, 0000000000000003
2020-12-28 16:57:16 gfx1030-0 332220523 validating proof residues for power 8
2020-12-28 16:57:16 gfx1030-0 332220523 Proof using power 8
2020-12-28 16:57:21 gfx1030-0 332220523 OK       800   0.00% b950798999630b08 3954 us/it + check 1.92s + save 0.49s; ETA 15d 04:53
2020-12-28 16:57:58 gfx1030-0 332220523        10000   0.00% 503cd91d7b8e30e5 3969 us/it
2020-12-28 16:58:38 gfx1030-0 332220523        20000   0.01% f2d3ffbb3586c527 3978 us/it
2020-12-28 16:59:18 gfx1030-0 332220523        30000   0.01% e7846100baf7ce53 3977 us/it
2020-12-28 16:59:57 gfx1030-0 332220523        40000   0.01% e305c82567149969 3969 us/it
2020-12-28 17:00:37 gfx1030-0 332220523 Stopping, please wait..
2020-12-28 17:00:39 gfx1030-0 332220523 OK     50000   0.02% 72885d5ee0a11128 3974 us/it + check 1.90s + save 0.50s; ETA 15d 06:39
2020-12-28 17:00:39 gfx1030-0 Exiting because "stop requested"
2020-12-28 17:00:39 gfx1030-0 Bye

[332220523, ]
2020-12-28 17:24:57 GpuOwl VERSION v7.2-21-g28dbf88
2020-12-28 17:24:57 GpuOwl VERSION v7.2-21-g28dbf88
2020-12-28 17:24:57 Note: not found 'config.txt'
2020-12-28 17:24:57 config: -prp 332220523 -iters 50000 
2020-12-28 17:24:57 device 0, unique id ''
2020-12-28 17:24:57 gfx1030-0 332220523 FFT: 18M 1K:9:1K (17.60 bpw)
2020-12-28 17:24:57 gfx1030-0 332220523 OpenCL args "-DEXP=332220523u -DWIDTH=1024u -DSMALL_HEIGHT=1024u -DMIDDLE=9u -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0.31797529154814252 -DIWEIGHT_STEP_MINUS_1=-0.24126043453715812 -DIWEIGHTS={0,-0.24126043453715812,-0.42431427180125786,-0.12640892148665336,-0.337171884696568,-0.49708608381811953,-0.23683862754188784,-0.42095927188310595,-0.12131777912660047,-0.33330903355459196,-0.49415518582120899,-0.23239105099670423,-0.41758471958785059,-0.11619696644233313,-0.32942367036371434,-0.49120720704209719,}  -cl-std=CL2.0 -cl-finite-math-only "
2020-12-28 17:25:00 gfx1030-0 332220523 OpenCL compilation in 2.78 s
2020-12-28 17:25:00 gfx1030-0 332220523 maxAlloc: 0.0 GB
2020-12-28 17:25:00 gfx1030-0 332220523 You should use -maxAlloc if your GPU has more than 4GB memory. See help '-h'
2020-12-28 17:25:00 gfx1030-0 332220523 P1(0) 0 bits
2020-12-28 17:25:02 gfx1030-0 332220523 OK     50000 on-load: blockSize 400, 72885d5ee0a11128
2020-12-28 17:25:02 gfx1030-0 332220523 validating proof residues for power 8
2020-12-28 17:25:02 gfx1030-0 332220523 Proof using power 8
2020-12-28 17:25:08 gfx1030-0 332220523 OK     50800   0.02% fa7748403e931351 3973 us/it + check 1.89s + save 0.48s; ETA 15d 06:33
2020-12-28 17:25:44 gfx1030-0 332220523        60000   0.02% 13dc37f49383ffae 3958 us/it
2020-12-28 17:26:24 gfx1030-0 332220523        70000   0.02% 0b1cf415e07e7046 3965 us/it
2020-12-28 17:27:03 gfx1030-0 332220523        80000   0.02% 36bac23d7b4324d4 3953 us/it
2020-12-28 17:27:43 gfx1030-0 332220523        90000   0.03% aef93d7157c0804e 3954 us/it
2020-12-28 17:28:23 gfx1030-0 332220523 Stopping, please wait..
2020-12-28 17:28:25 gfx1030-0 332220523 OK    100000   0.03% 951c94f813216db9 3969 us/it + check 1.90s + save 0.51s; ETA 15d 06:12
2020-12-28 17:28:25 gfx1030-0 Exiting because "stop requested"
2020-12-28 17:28:25 gfx1030-0 Bye
Can someone redo this test with a Radeon VII, please? A lot of work has gone into gpuowl recently, so my 18-month-old result is probably no longer a good point of comparison. At first blush Big Navi is not as quick as an R7, which is no surprise, but it's not a million miles off either.
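For anyone comparing runs, the per-iteration figures can be pulled out of a gpuowl log with a few lines of script. This is only a sketch: it assumes every progress line carries a token of the form "NNNN us/it", as the lines above do, and the helper name is hypothetical.

```python
import re
from statistics import mean

# Matches the integer in front of "us/it" on a progress line.
US_PER_IT = re.compile(r"(\d+) us/it")


def us_per_iter(log_text):
    """Return every 'us/it' reading found in the log text, in order."""
    return [int(m.group(1)) for m in US_PER_IT.finditer(log_text)]


# Four progress lines copied from the first 6900XT run above.
sample = """\
2020-12-28 16:57:58 gfx1030-0 332220523        10000   0.00% 503cd91d7b8e30e5 3969 us/it
2020-12-28 16:58:38 gfx1030-0 332220523        20000   0.01% f2d3ffbb3586c527 3978 us/it
2020-12-28 16:59:18 gfx1030-0 332220523        30000   0.01% e7846100baf7ce53 3977 us/it
2020-12-28 16:59:57 gfx1030-0 332220523        40000   0.01% e305c82567149969 3969 us/it
"""

readings = us_per_iter(sample)
print(readings, mean(readings))
```

Averaging over many lines smooths out the small run-to-run jitter visible in the logs.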
Old 2020-12-29, 00:31   #279
Viliam Furik
 
"Viliam Furík"
Jul 2018
Martin, Slovakia

2⁴·5² Posts
Default

Radeon VII, core clock 1600 MHz, memory clock 1100 MHz, using gpuOwl v6.11-380, because for some reason 7.x runs slower and I haven't looked into it yet.

Code:
2020-12-29 01:25:05 gpuowl v6.11-380-g79ea0cc
2020-12-29 01:25:05 config: -device 1
2020-12-29 01:25:05 config: -proof 8
2020-12-29 01:25:05 config: -nospin
2020-12-29 01:25:05 config: -jacobi 500000
2020-12-29 01:25:05 config: -maxAlloc 12288
2020-12-29 01:25:05 config: -prp 332220523 -iters 50000
2020-12-29 01:25:05 device 1, unique id ''
2020-12-29 01:25:05 gfx906-1 332220523 FFT: 18M 1K:9:1K (17.60 bpw)
2020-12-29 01:25:05 gfx906-1 Expected maximum carry32: 678B0000
2020-12-29 01:25:07 gfx906-1 OpenCL args "-DEXP=332220523u -DWIDTH=1024u -DSMALL_HEIGHT=1024u -DMIDDLE=9u -DPM1=0 -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0xa.2cda84c454afp-5 -DIWEIGHT_STEP_MINUS_1=-0xf.70cf9b09f955p-6  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-12-29 01:25:07 gfx906-1 ASM compilation failed, retrying compilation using NO_ASM
2020-12-29 01:25:13 gfx906-1 OpenCL compilation in 6.25 s
2020-12-29 01:25:15 gfx906-1 332220523 OK        0 loaded: blockSize 400, 0000000000000003
2020-12-29 01:25:15 gfx906-1 validating proof residues for power 8
2020-12-29 01:25:15 gfx906-1 Proof using power 8
2020-12-29 01:25:20 gfx906-1 332220523 OK      800   0.00%; 3714 us/it; ETA 14d 06:45; b950798999630b08 (check 1.78s)
2020-12-29 01:28:21 gfx906-1 Stopping, please wait..
2020-12-29 01:28:24 gfx906-1 332220523 OK    50000   0.02%; 3710 us/it; ETA 14d 06:19; 72885d5ee0a11128 (check 1.80s)
2020-12-29 01:28:24 gfx906-1 Exiting because "stop requested"
2020-12-29 01:28:24 gfx906-1 Bye
Old 2020-12-29, 01:03   #280
paulunderwood
 
Sep 2002
Database er0rr

2·7·257 Posts
Default

Asus R7 defaults:

Code:
2020-12-29 00:57:04 GpuOwl VERSION v7.2-16-g1a50f11-dirty
2020-12-29 00:57:04 GpuOwl VERSION v7.2-16-g1a50f11-dirty
2020-12-29 00:57:04 Note: not found 'config.txt'
2020-12-29 00:57:04 config: -user paul -prp 332220523 
2020-12-29 00:57:04 device 0, unique id 'f582388172fd5d41'
2020-12-29 00:57:04 f582388172fd5d41 332220523 FFT: 18M 1K:9:1K (17.60 bpw)
2020-12-29 00:57:05 f582388172fd5d41 332220523 OpenCL args "-DEXP=332220523u -DWIDTH=1024u -DSMALL_HEIGHT=1024u -DMIDDLE=9u -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0.31797529154814252 -DIWEIGHT_STEP_MINUS_1=-0.24126043453715812 -DIWEIGHTS={0,-0.24126043453715812,-0.42431427180125786,-0.12640892148665336,-0.337171884696568,-0.49708608381811953,-0.23683862754188784,-0.42095927188310595,-0.12131777912660047,-0.33330903355459196,-0.49415518582120899,-0.23239105099670423,-0.41758471958785059,-0.11619696644233313,-0.32942367036371434,-0.49120720704209719,}  -cl-std=CL2.0 -cl-finite-math-only "
2020-12-29 00:57:09 f582388172fd5d41 332220523 OpenCL compilation in 3.71 s
2020-12-29 00:57:09 f582388172fd5d41 332220523 maxAlloc: 0.0 GB
2020-12-29 00:57:09 f582388172fd5d41 332220523 You should use -maxAlloc if your GPU has more than 4GB memory. See help '-h'
2020-12-29 00:57:09 f582388172fd5d41 332220523 P1(0) 0 bits
2020-12-29 00:57:09 f582388172fd5d41 332220523 PRP starting from beginning
2020-12-29 00:57:10 f582388172fd5d41 332220523 OK         0 on-load: blockSize 400, 0000000000000003
2020-12-29 00:57:10 f582388172fd5d41 332220523 validating proof residues for power 8
2020-12-29 00:57:10 f582388172fd5d41 332220523 Proof using power 8
2020-12-29 00:57:15 f582388172fd5d41 332220523 OK       800   0.00% b950798999630b08 2630 us/it + check 1.44s + save 0.88s; ETA 10d 02:41
2020-12-29 00:57:39 f582388172fd5d41 332220523        10000   0.00% 503cd91d7b8e30e5 2635 us/it
2020-12-29 00:58:05 f582388172fd5d41 332220523        20000   0.01% f2d3ffbb3586c527 2641 us/it
2020-12-29 00:58:11 f582388172fd5d41 332220523 Stopping, please wait..
2020-12-29 00:58:13 f582388172fd5d41 332220523 OK     22000   0.01% 676b1123c911c231 2643 us/it + check 1.46s + save 0.87s; ETA 10d 03:53
2020-12-29 00:58:13 f582388172fd5d41 Exiting because "stop requested"
2020-12-29 00:58:13 f582388172fd5d41 Bye
Old 2020-12-29, 01:09   #281
M344587487
 
"Composite as Heck"
Oct 2017

769 Posts
Default

Quote:
Originally Posted by Viliam Furik View Post
Radeon VII, core clock 1600 MHz, memory clock 1100 MHz, using gpuOwl v6.11-380, because for some reason 7.x runs slower and I haven't looked into it yet.

Interesting, thank you. My old tuned result is slightly quicker thanks to the 1200 MHz memory overclock, but since gpuowl has been worked on a lot I thought the timings would have been optimised further by now. Probably all the optimisation has gone into the wavefront FFTs, as it should. Going by the forum, v7 seems a little slower in general, but I haven't confirmed that first hand.

As you've validated that the old data is reasonable, and if Big Navi optimises similarly to the R7 (unlikely, but roll with it for now), a tuned 6900XT might manage timings of ~4300 us/it, ~86% of the throughput of your tuned R7.
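The percentage is just the inverse ratio of the per-iteration times, since throughput is iterations per unit time. A minimal sketch, with an illustrative function name; 4300 and 3710 us/it are the figures from this post and the v6.11 Radeon VII log, while 3970 and 2635 us/it are rounded readings from the stock 6900XT and Asus R7 v7.2 logs earlier in the thread.

```python
def relative_throughput(us_per_iter_a, us_per_iter_b):
    """Throughput of card A as a fraction of card B (throughput ~ 1 / time per iteration)."""
    return us_per_iter_b / us_per_iter_a


# Estimated tuned 6900XT against the 3710 us/it Radeon VII run.
print(round(relative_throughput(4300, 3710), 2))

# Stock 6900XT against the stock Asus R7, both on gpuowl v7.2 (rounded readings).
print(round(relative_throughput(3970, 2635), 2))
```

The first ratio comes out at about 0.86 and the second at about 0.66, so which R7 log is used as the baseline changes the picture considerably.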


Quote:
Originally Posted by paulunderwood View Post
Asus R7 defaults:
That is a massive difference between your benchmarks. Is it because Viliam's ASM compile failed and defaulted to NO_ASM?

Old 2020-12-29, 01:21   #282
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

3·5·491 Posts
Default

Quote:
Originally Posted by M344587487 View Post
That is a massive difference between your benchmarks. Is it because Viliam's ASM compile failed and defaulted to NO_ASM?
Failed ASM implies Windows. AMD's Windows OpenCL support for the Radeon VII is awful.
Old 2020-12-29, 01:43   #283
moebius
 
Jul 2009
Germany

1000100011₂ Posts
Default

Quote:
Originally Posted by Prime95 View Post
Failed ASM implies Windows. AMD's Windows OpenCL support for the Radeon VII is awful.
He can try out my gpuowl v6.11-380 binary for Ubuntu 18.04 if he wants. It is well tested and was compiled on Google Colab without errors.
So far, all the PRP and LL tests I've done with it have been correct: https://mersenneforum.org/showpost.p...1&postcount=40
Old 2020-12-29, 02:50   #284
M344587487
 
"Composite as Heck"
Oct 2017

769 Posts
Default

Quote:
Originally Posted by Prime95 View Post
Failed ASM implies Windows. AMD's Windows OpenCL support for the Radeon VII is awful.
Awesome, so gpuowl has been greatly optimised in the past year, as I thought. It does yield an interesting data point: the 6900XT has ~84% of the R7's throughput at 79M but only ~66% at 332M. I guess that means some of the following could be true:
  • The test at 79M is not memory bound thanks to the cache, or perhaps not as memory bound as it should be
  • There may be some lower exponent range that fits fully in cache and absolutely flies similar to what we see with CPUs
  • The larger memory requirement of 332M causes the cache to be overwhelmed to the point that it's much less useful
  • In scenarios where cache is king, lowering memory clock for power savings might be a counter-intuitive yet functional optimisation
  • Are cache and memory clocks in sync such that a memory OC is a cache OC? Or is cache static? Or does it have its own clock lever?
  • I'm an idiot trying to think at 3am and failing
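A back-of-envelope check on the cache theory is to compare the size of one FFT buffer with the cache. The sketch below makes three assumptions that are not stated in the thread: each FFT word is an 8-byte double, the 18M FFT is 18 × 2^20 words, and the 6900XT has a 128 MiB last-level cache. gpuowl keeps more than one such buffer, so the real working set is larger than this figure.

```python
MIB = 2**20

ASSUMED_CACHE_MIB = 128   # assumption: size of the 6900XT last-level cache
BYTES_PER_WORD = 8        # assumption: one double-precision value per FFT word


def fft_buffer_mib(words, bytes_per_word=BYTES_PER_WORD):
    """Size in MiB of a single buffer holding the whole FFT."""
    return words * bytes_per_word / MIB


words_18m = 18 * MIB      # the 18M FFT used for exponent 332220523
buffer_mib = fft_buffer_mib(words_18m)
print(buffer_mib, buffer_mib > ASSUMED_CACHE_MIB)
```

Under these assumptions a single 18M buffer is 144 MiB, already larger than the assumed cache, which would fit the idea that the cache helps much less at 332M than at smaller exponents.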
Old 2020-12-29, 03:54   #285
moebius
 
Jul 2009
Germany

547 Posts
Default

Quote:
Originally Posted by M344587487 View Post
Awesome, so gpuowl has been greatly optimised in the past year, as I thought. It does yield an interesting data point: the 6900XT has ~84% of the R7's throughput at 79M but only ~66% at 332M.
The 332M to 334M range currently sees little activity: no more than 5-10 LL/PRP or factor-found results per day are reported to the server in this range. The Big Navi generation can certainly do good service at the PRP wavefront. For the really big exponents, computing power has to increase considerably.
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.