mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfakto: an OpenCL program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=15646)

kracker 2020-10-09 23:35

My guess would be drivers. (mfakto on my i5-4670K also uses a full CPU core, while on an i3-8100 it uses almost no resources.)

James Heinrich 2020-10-11 19:20

[QUOTE=kracker;559378]mfakto on my ... i3-8100[/QUOTE]Thanks for posting that -- my "other" computer has an i3-8100 and I didn't know it could run mfakto. :redface:
Free extra 21 GHz-d/d, thanks! :smile:

Viliam Furik 2020-10-13 09:51

Radeon VII - poor performance against promised numbers
 
This was probably discussed earlier, but can anyone tell me why I can't squeeze more than 1300 GHz-d/d (2.6 TFLOPS) out of a Radeon VII, despite AMD promising 13.8 TFLOPS of SP throughput (about 10% more than an RTX 2080 Ti)?

Also, in gpuOwl I get only about 0.75 TFLOPS, despite AMD promising 3.46 TFLOPS.

It seems to always be a fifth of what it should be according to the [URL="https://www.amd.com/en/products/graphics/amd-radeon-vii"]official AMD page[/URL].

James Heinrich 2020-10-13 10:42

[QUOTE=Viliam Furik;559728]It was probably discussed earlier, but can anyone tell me, why can't I squeeze more than 1300 GHz-D/D (2.6 TFLOPS) from Radeon VII, despite it promising 13.8 TFLOPS (about 10 % more than RTX 2080Ti) of SP throughput?[/QUOTE]I'm not sure how you're calculating FLOPS performance in mfakto/gpuowl?

The mfakto performance you quote seems to be in line with expected [url=https://www.mersenne.ca/mfaktc.php?filter=Radeon%20VII|RTX%202080%20Ti]mfakto performance[/url] (noting that my chart shows stock-clock performance). Also note AMD's somewhat deceptive practice of quoting "peak performance" numbers, which means performance at the 1750 MHz boost clock instead of the 1400 MHz stock clock. The numbers on my pages are all stock-clock values.

kriesel 2020-10-13 12:58

[QUOTE=Viliam Furik;559728]It was probably discussed earlier, but can anyone tell me, why can't I squeeze more than 1300 GHz-D/D (2.6 TFLOPS) from Radeon VII, despite it promising 13.8 TFLOPS (about 10 % more than RTX 2080Ti) of SP throughput?

Also in gpuOwl, only about 0.75 TFLOPS despite AMD promising 3.46 TFLOPS.

It seems to always be a fifth of what it should be according to the [URL="https://www.amd.com/en/products/graphics/amd-radeon-vii"]official AMD page[/URL].[/QUOTE]If you're using mfakto as the measure of SP performance, note that mfakto and mfaktc use int32, not fp32.

Also, the AMD spec sheet says "peak". Per Mihai, gpuowl is memory-bandwidth constrained, which would make it difficult to sustain the peak rate.

[URL]https://www.amd.com/en/products/graphics/amd-radeon-vii[/URL]

[URL]https://www.techpowerup.com/gpu-specs/radeon-vii.c3358[/URL]

LaurV 2020-10-13 14:19

[QUOTE=kriesel;559734]If you're using mfakto as the measure of sp performance, note that mfakto & mfaktc use int32, not fp32[/QUOTE]
This!
Radeon VII has poor integer performance.

axn 2020-10-13 14:21

[QUOTE=James Heinrich;559730]I'm not sure how you're calculating FLOPS performance in mfakto/gpuowl?[/QUOTE]
He's using the standard PrimeNet conversion: 1 GHz-d/d = 2 GFLOPS.
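
For anyone who wants to redo that arithmetic, here is a minimal Python sketch. It assumes only the 1 GHz-d/d = 2 GFLOPS rule of thumb quoted above and the figures posted in this thread; the function and variable names are made up for illustration and come from neither mfakto nor PrimeNet.

```python
# Assumption: PrimeNet rule of thumb from the post above, 1 GHz-day/day ~= 2 GFLOPS.
GFLOPS_PER_GHZ_DAY_PER_DAY = 2.0


def ghz_days_per_day_to_tflops(rate):
    """Convert a GHz-days/day rate to TFLOPS using the rule of thumb."""
    return rate * GFLOPS_PER_GHZ_DAY_PER_DAY / 1000.0


# Figures quoted earlier in the thread.
mfakto_tflops = ghz_days_per_day_to_tflops(1300)  # mfakto rate on the Radeon VII
fraction_of_peak = mfakto_tflops / 13.8           # AMD's quoted peak SP TFLOPS
gpuowl_fraction = 0.75 / 3.46                     # gpuOwl estimate vs. AMD's other quoted figure

print(f"mfakto: {mfakto_tflops:.2f} TFLOPS, {fraction_of_peak:.0%} of quoted peak")
print(f"gpuOwl: {gpuowl_fraction:.0%} of quoted peak")
```

Both ratios come out close to the "a fifth" mentioned in the original question. Keep in mind that the comparison is only loosely meaningful for mfakto, since it does int32 rather than fp32 work.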

kriesel 2020-10-27 01:23

lsgpu
 
A small utility that lists the OpenCL platforms, the devices on them, and a brief description of each.
See [URL="https://www.mersenneforum.org/showpost.php?p=488474&postcount=6"]https://www.mersenneforum.org/showpo...74&postcount=6[/URL]

It could help with the occasional mfakto installation or startup problem.

DrobinsonPE 2020-12-06 16:08

After switching to Windows 10, I got mfakto to run on my ASRock Deskmini A300W, AMD A8-9600, 16GB DDR-4, SSD.

[CODE]Selftest statistics
number of tests 34026
successful tests 34026

selftest PASSED!

mfakto 0.15pre7-MGW (64bit build)
OpenCL device info
name Bristol Ridge (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 2.0 AMD-APP (2841.19) (2841.19)
maximum threads per block 1024
maximum threads per grid 1073741824
number of multiprocessors 6 (384 compute elements)
clock rate 900 MHz

Automatic parameters
threads per grid 0
optimizing kernels for GCN

Loading binary kernel file mfakto_Kernels.elf
Compiling kernels.
GPUSievePrimes (adjusted) 81206
GPUsieve minimum exponent 1037054
Started a simple selftest ...
Selftest statistics
number of tests 30
successful tests 30

selftest PASSED!

got assignment: exp=212335483 bit_min=73 bit_max=74 (9.01 GHz-days)
Starting trial factoring M212335483 from 2^73 to 2^74 (9.01 GHz-days)
Using GPU kernel "cl_barrett15_74_gs_2"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Dec 05 23:13 | 4617 100.0% | 11.685 0m00s | 69.39 81206 0.00%
no factor for M212335483 from 2^73 to 2^74 [mfakto 0.15pre7-MGW cl_barrett15_74_gs_2]
tf(): total time spent: 3h 6m 46.416s (69.46 GHz-days / day)
[/CODE]
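
As a sanity check on the last line of that log, the GHz-days/day figure can be recomputed from the assignment size and the total run time. This is only a sketch: the helper name is hypothetical, and the 9.01 GHz-days value is the rounded number printed in the log, so the result can differ slightly in the last digit.

```python
def ghz_days_per_day(work_ghz_days, hours, minutes, seconds):
    """Throughput in GHz-days/day for a job of known size and wall-clock time."""
    elapsed_days = (hours * 3600 + minutes * 60 + seconds) / 86400.0
    return work_ghz_days / elapsed_days


# Values copied from the log above: 9.01 GHz-days in 3h 6m 46.416s.
rate = ghz_days_per_day(9.01, 3, 6, 46.416)
print(f"{rate:.2f} GHz-days/day")
```

The result lands within about 0.01 of the 69.46 GHz-days/day that mfakto reports.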

Next up is a little tuning to see if it can get above 70 GHz-days / day.

DrobinsonPE 2020-12-07 09:01

Here are the results of mfakto tuning on the ASRock Deskmini A300W, AMD A8-9600, Radeon R7 IGPU, 16GB DDR-4, SSD, Windows 10, mfakto 0.15pre7, Prime95 v30.3 b6 system.

[CODE]mfakto tuning.
AMD A8-9600, Radeon R7 IGPU, 16GB ram, SSD, Windows 10, mfakto 0.15pre7
exp=103122301 bit 73 to 74
Initial settings and speed.
GPUSieveProcessSize=24, GPUSieveSize=96, GPUSievePrimes=81157, 67.97GHz-day
Final settings and speed.
GPUSieveProcessSize=32, GPUSieveSize=128, GPUSievePrimes=179766, 70.39GHz-day
Speed with prime95 also running a P-1 stage 2, 70.32GHz-day

Step 1, vary GPUSieveProcessSize
Possible values: 8, 16, 24, 32
# Also must divide GPUSieveSize * 1024
# Default: GPUSieveProcessSize=24

GPUSieveProcessSize=8 64.93GHz-day
GPUSieveProcessSize=16 67.10GHz-day
GPUSieveProcessSize=24 67.97GHz-day
GPUSieveProcessSize=32 68.68GHz-day *

Step 2: vary GPUSieveSize with GPUSieveProcessSize=32
# Minimum: GPUSieveSize=4
# Maximum: GPUSieveSize=128
# Default: GPUSieveSize=96
GPUSieveSize=32 68.36GHz-day
GPUSieveSize=64 68.54GHz-day
GPUSieveSize=96 68.68GHz-day
GPUSieveSize=128 68.70GHz-day *

Step 3: vary GPUSievePrimes with GPUSieveSize=128, GPUSieveProcessSize=32
# Minimum: GPUSievePrimes=54
# Maximum: GPUSievePrimes=1075766
# Default: GPUSievePrimes=81157
GPUSievePrimes=21814 62.88GHz-day
GPUSievePrimes=67894 68.35GHz-day
GPUSievePrimes=81157 68.70GHz-day
GPUSievePrimes=99894 69.10GHz-day
GPUSievePrimes=120374 69.52GHz-day
GPUSievePrimes=139830 69.84GHz-day
GPUSievePrimes=160310 70.14GHz-day
GPUSievePrimes=179766 70.39GHz-day*
GPUSievePrimes=200246 70.22GHz-day
[/CODE]
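
The notes in Step 1 say that GPUSieveProcessSize must be one of 8, 16, 24, 32 and must divide GPUSieveSize * 1024. A small Python sketch of that check, plus the overall gain from tuning, is below. It assumes the rule is exactly as written in the notes above; the function name is made up for illustration and is not part of mfakto.

```python
def process_size_is_valid(process_size, sieve_size):
    """Check the constraint quoted in the tuning notes above.

    Assumption: GPUSieveProcessSize must be one of 8, 16, 24, 32 and must
    divide GPUSieveSize * 1024, exactly as stated in the notes.
    """
    return process_size in (8, 16, 24, 32) and (sieve_size * 1024) % process_size == 0


# Initial and final speeds copied from the tuning notes (GHz-days/day).
initial_rate = 67.97
final_rate = 70.39
gain_percent = (final_rate - initial_rate) / initial_rate * 100

# (GPUSieveProcessSize, GPUSieveSize) pairs: the defaults, the tuned pair,
# and a combination that breaks the divisibility rule.
for process_size, sieve_size in [(24, 96), (32, 128), (24, 128)]:
    ok = process_size_is_valid(process_size, sieve_size)
    print(f"GPUSieveProcessSize={process_size}, GPUSieveSize={sieve_size}: valid={ok}")

print(f"tuning gain: {gain_percent:.1f}%")
```

Under that rule, the default pair (24, 96) and the tuned pair (32, 128) both pass, while 24 with a sieve size of 128 does not. The whole tuning exercise is worth roughly 3.6% on this iGPU.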

Dylan14 2020-12-10 06:07

I have created a PKGBUILD for this software for use on Arch. You can find it here: [url]https://aur.archlinux.org/packages/mfakto/[/url]

