mersenneforum.org

mfakto: an OpenCL program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=15646)

kriesel 2018-01-17 16:05

Unknown GPU name
 
Hi, just getting started with a foray into OpenCL GPU computing.

mfakto 0.15pre6-Win (64-bit build), with GPUType=AUTO, issued this message for an MSI Radeon RX550:

[CODE]Select device - Get device info:
WARNING: Unknown GPU name, assuming GCN. Please post the device name "gfx804 (Advanced Micro Devices, Inc.)" to http://www.mersenneforum.org/showthread.php?t=15646 to have it added to mfakto. Set GPUType in mfakto.ini to select a GPU type yourself to avoid this warning.[/CODE]

So there you have it.

After reading that message and the ini file, it's still not clear to me which GPUType I should set in the ini file. Recommendations?

Stampeder 2018-01-17 16:16

[QUOTE=Bdot;477288]But as mentioned, for this GPU I'd recommend CPU-Sieving anyway.[/QUOTE]

Hi Bdot!

Thanks for shedding some light on my situation! I eventually came to the same conclusion you recommended, and after adjusting some of the GPU values in the mfakto ini file I've been able to get it running at about 31-33 GHz-days/day (up from 26-28).

I'm happy with what I have for now. Cheers! :smile:

kriesel 2018-01-17 16:46

GPU underutilized by Mfakto according to GPU-Z
 
On an MSI Radeon RX550, in early usage, Mfakto does not fully occupy the GPU, per the GPU load readings of GPU-Z 2.5.0: GPU load bounces between near 100% and 0, averaging ~80% for typical exponents in the 70-72 bit range and improving to about 96% on average at 75 bits. This is with GPU sieving on and mostly default ini file settings. GHz-d/day values given in the output are around 112-130. Is it normal to have to run multiple instances, even with GPU sieving enabled, to get full throughput at such bit depths?

If I run clLucas at the same time, Mfakto throughput slows down of course, and GPU loading goes up, but not to near 99% as I'm accustomed to seeing with CUDA applications on NVIDIA GPUs, even with a single instance per GPU.

James Heinrich 2018-01-17 17:03

Just as an experiment, perhaps see how mfakto behaves at lower bit depths and/or with shorter-running exponents (e.g. Factor=1008717761,67,69)? I have no problem running my RX480 at 100% usage.
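For readers unfamiliar with the assignment lines quoted in this thread, here is a minimal Go sketch, assuming the worktodo form `Factor=[AID,]exponent,bitlo,bithi` used by mfaktc/mfakto (only the three-field form appears here). It parses a line and gives a rough, pre-sieving count of factor candidates q = 2kp + 1 in [2^bitlo, 2^bithi). That count explains why a larger exponent at lower bit depths finishes much sooner. The `Assignment` type and function names are hypothetical, not taken from mfakto's source, and the ratio is only a first-order proxy for run time (kernel choice and sieving also matter).

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// Assignment holds the numeric fields of a trial-factoring worktodo line
// (hypothetical type, for illustration only).
type Assignment struct {
	Exponent uint64 // p in M(p) = 2^p - 1
	BitLo    uint   // search starts at 2^BitLo
	BitHi    uint   // search ends at 2^BitHi
}

// parseFactorLine parses "Factor=exponent,bitlo,bithi", optionally with a
// leading assignment ID field ("Factor=AID,exponent,bitlo,bithi").
func parseFactorLine(line string) (Assignment, error) {
	const prefix = "Factor="
	if !strings.HasPrefix(line, prefix) {
		return Assignment{}, fmt.Errorf("not a Factor= line: %q", line)
	}
	fields := strings.Split(strings.TrimPrefix(line, prefix), ",")
	if len(fields) == 4 {
		fields = fields[1:] // drop the assignment ID
	}
	if len(fields) != 3 {
		return Assignment{}, fmt.Errorf("want 3 or 4 fields, got %d", len(fields))
	}
	exp, err := strconv.ParseUint(strings.TrimSpace(fields[0]), 10, 64)
	if err != nil {
		return Assignment{}, err
	}
	lo, err := strconv.ParseUint(strings.TrimSpace(fields[1]), 10, 8)
	if err != nil {
		return Assignment{}, err
	}
	hi, err := strconv.ParseUint(strings.TrimSpace(fields[2]), 10, 8)
	if err != nil {
		return Assignment{}, err
	}
	if hi <= lo {
		return Assignment{}, fmt.Errorf("bit range %d..%d is empty", lo, hi)
	}
	return Assignment{Exponent: exp, BitLo: uint(lo), BitHi: uint(hi)}, nil
}

// Candidates estimates how many k give q = 2kp+1 in [2^BitLo, 2^BitHi)
// before any sieving: (2^BitHi - 2^BitLo) / (2p).
func (a Assignment) Candidates() float64 {
	span := math.Exp2(float64(a.BitHi)) - math.Exp2(float64(a.BitLo))
	return span / (2 * float64(a.Exponent))
}

func main() {
	quick, err := parseFactorLine("Factor=1008717761,67,69")
	if err != nil {
		panic(err)
	}
	slow, err := parseFactorLine("Factor=162307933,73,74")
	if err != nil {
		panic(err)
	}
	// Ratio of raw candidate counts for the two assignments.
	fmt.Printf("relative work: %.4f\n", quick.Candidates()/slow.Candidates())
}
```

Under these assumptions the suggested test assignment has well under 1% of the candidates of the 162M assignment at 73-74 bits.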

kriesel 2018-01-18 01:32

1 Attachment(s)
[QUOTE=James Heinrich;477753]Just as an experiment, perhaps see how mfakto behaves at lower bit depths and/or with shorter-running exponents (e.g. Factor=1008717761,67,69)? I have no problem running my RX480 at 100% usage.[/QUOTE]

Thanks, I'll give a higher exponent a try later.

For a PrimeNet assignment such as 162307933,73,74, the attached image shows what GPU-Z's sensor tab looks like; it was produced with a 2.5-second update interval. Faster updates indicate the GPU goes idle every several seconds and then resumes. On the previous assigned exponent, run at bit depths 72-76, the 75-76 bit range was better behaved: less notching over time and a higher average utilization.

It's interesting that the latest available GPU-Z version shows zero clock rates for this GPU. It's actually running at around 125 GHz-d/day, and it is the one I recently benchmarked for your TF benchmark page.

Also, this GPU has no display duties. (Another, old, slow NVS295 card not suitable for GPU computing handles that on the rare occasions I use the console.)

kriesel 2018-01-18 01:43

1 Attachment(s)
Same exponent and bit depth; the attached image was taken with a 1-second GPU-Z update interval. The instantaneous numerical display shows GPU loading as low as 1%.

kriesel 2018-01-18 02:14

RX 550 GPU going idle briefly in mfakto
 
1 Attachment(s)
Running Mfakto on 900033181,78,79 also produces load drops in GPU-Z, but much less drastic ones. In the attachment here, compare the right portion of the graph, produced with that exponent, to the left portion, from the preceding lower exponent and bit level. The higher exponent and bit depth runs at about 97% load per GPU-Z. This is with a single instance.

Today's graphics are from GPU-Z v2.6.0. I upgraded from 2.5.0, but it did not resolve the zero clock rates seen on this GPU's sensor readings.

(Oops, the previous two image attachments were misnamed with mfakt[B]C[/B] but were produced running mfakt[B]O[/B]. I've been using mfaktc so much and mfakto so little that the c follows automatically when I type!)

kriesel 2018-01-18 21:21

Variable throughput and seemingly somewhat independent of GPU load
 
Continuing with the Radeon RX550 (low-profile 2GB model):
900033181,78,79 showed ~96-99% GPU loading in GPU-Z and 97 GHz-d/day in the screen output. GPU loading can go up while GHz-d/day goes down. I've seen it fluctuate between about 110 and 130 GHz-d/day for the 162M exponents. Periodically there is a GHz-d/day drop lasting a single class (vertical space inserted below for emphasis).
There's also a considerable difference in speed between bit levels or across restarts. It's the only thing running on the card in all cases.

[CODE]Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jan 18 13:18 | 4515 97.9% | 8.517 2m50s | 124.55 81206 0.00%
Jan 18 13:18 | 4520 98.0% | 8.516 2m42s | 124.56 81206 0.00%
Jan 18 13:18 | 4523 98.1% | 8.532 2m34s | 124.33 81206 0.00%
Jan 18 13:18 | 4527 98.2% | 8.516 2m25s | 124.56 81206 0.00%
Jan 18 13:18 | 4532 98.3% | 8.532 2m17s | 124.33 81206 0.00%
Jan 18 13:18 | 4536 98.4% | 8.579 2m09s | 123.65 81206 0.00%
Jan 18 13:18 | 4547 98.5% | 8.516 1m59s | 124.56 81206 0.00%
Jan 18 13:19 | 4548 98.6% | 8.719 1m53s | 121.66 81206 0.00%
Jan 18 13:19 | 4551 98.8% | 8.516 1m42s | 124.56 81206 0.00%

Jan 18 13:19 | 4560 98.9% | 9.218 1m41s | 115.08 81206 0.00%

Jan 18 13:19 | 4568 99.0% | 8.532 1m25s | 124.33 81206 0.00%
Jan 18 13:19 | 4571 99.1% | 8.532 1m17s | 124.33 81206 0.00%
Jan 18 13:19 | 4575 99.2% | 8.516 1m08s | 124.56 81206 0.00%
Jan 18 13:19 | 4580 99.3% | 8.517 1m00s | 124.55 81206 0.00%
Jan 18 13:20 | 4587 99.4% | 8.533 0m51s | 124.31 81206 0.00%
Jan 18 13:20 | 4592 99.5% | 8.515 0m43s | 124.58 81206 0.00%
Jan 18 13:20 | 4595 99.6% | 8.532 0m34s | 124.33 81206 0.00%
Jan 18 13:20 | 4596 99.7% | 8.516 0m26s | 124.56 81206 0.00%
Jan 18 13:20 | 4607 99.8% | 8.530 0m17s | 124.36 81206 0.00%
Jan 18 13:20 | 4608 99.9% | 8.532 0m09s | 124.33 81206 0.00%
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jan 18 13:20 | 4611 100.0% | 8.532 0m00s | 124.33 81206 0.00%
no factor for M162307933 from 2^73 to 2^74 [mfakto 0.15pre6-Win cl_barrett15_74_gs_4]
tf(): time spent since restart: 31m 33.386s
estimated total time spent: 2h 17m 4.663s (123.82 GHz-days / day)

Starting trial factoring M162307933 from 2^74 to 2^75 (23.57GHz-days)
Using GPU kernel "cl_barrett15_82_gs_4"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jan 18 13:21 | 0 0.1% | 20.214 5h23m | 104.95 81206 0.00%
Jan 18 13:21 | 3 0.2% | 20.216 5h22m | 104.94 81206 0.00%
Jan 18 13:21 | 8 0.3% | 20.216 5h22m | 104.94 81206 0.00%
Jan 18 13:22 | 11 0.4% | 20.216 5h22m | 104.94 81206 0.00%
Jan 18 13:22 | 12 0.5% | 20.216 5h21m | 104.94 81206 0.00%
Jan 18 13:23 | 15 0.6% | 20.216 5h21m | 104.94 81206 0.00%
Jan 18 13:23 | 23 0.7% | 20.215 5h21m | 104.95 81206 0.00%
Jan 18 13:23 | 32 0.8% | 20.216 5h20m | 104.94 81206 0.00%
Jan 18 13:24 | 35 0.9% | 20.216 5h20m | 104.94 81206 0.00%
Jan 18 13:24 | 36 1.0% | 20.216 5h20m | 104.94 81206 0.00%
Jan 18 13:24 | 47 1.1% | 20.216 5h19m | 104.94 81206 0.00%
Jan 18 13:25 | 56 1.3% | 20.216 5h19m | 104.94 81206 0.00%
Jan 18 13:25 | 60 1.4% | 20.215 5h19m | 104.95 81206 0.00%
Jan 18 13:25 | 63 1.5% | 20.310 5h20m | 104.46 81206 0.00%
Jan 18 13:26 | 68 1.6% | 20.216 5h18m | 104.94 81206 0.00%
Jan 18 13:26 | 71 1.7% | 20.621 5h24m | 102.88 81206 0.00%
Jan 18 13:26 | 72 1.8% | 20.217 5h17m | 104.94 81206 0.00%

Jan 18 13:27 | 75 1.9% | 21.620 5h39m | 98.13 81206 0.00%

Jan 18 13:27 | 80 2.0% | 20.215 5h17m | 104.95 81206 0.00%
Jan 18 13:27 | 87 2.1% | 20.216 5h16m | 104.94 81206 0.00%
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jan 18 13:28 | 92 2.2% | 20.216 5h16m | 104.94 81206 0.00%
Jan 18 13:28 | 96 2.3% | 20.216 5h16m | 104.94 81206 0.00%
Jan 18 13:28 | 107 2.4% | 20.216 5h15m | 104.94 81206 0.00%
Jan 18 13:29 | 108 2.5% | 20.216 5h15m | 104.94 81206 0.00%
Jan 18 13:29 | 116 2.6% | 20.214 5h15m | 104.95 81206 0.00%
Jan 18 13:29 | 120 2.7% | 20.217 5h14m | 104.94 81206 0.00%
Jan 18 13:30 | 123 2.8% | 20.216 5h14m | 104.94 81206 0.00%
Jan 18 13:30 | 131 2.9% | 20.216 5h14m | 104.94 81206 0.00%
Jan 18 13:30 | 135 3.0% | 20.216 5h13m | 104.94 81206 0.00%
Jan 18 13:31 | 140 3.1% | 20.215 5h13m | 104.95 81206 0.00%
Jan 18 13:31 | 143 3.2% | 20.215 5h13m | 104.95 81206 0.00%
Jan 18 13:31 | 147 3.3% | 20.216 5h12m | 104.94 81206 0.00%
Jan 18 13:32 | 152 3.4% | 20.216 5h12m | 104.94 81206 0.00%
Jan 18 13:32 | 155 3.5% | 20.309 5h13m | 104.46 81206 0.00%
Jan 18 13:32 | 156 3.6% | 20.216 5h11m | 104.94 81206 0.00%
Jan 18 13:33 | 168 3.8% | 20.622 5h17m | 102.88 81206 0.00%
Jan 18 13:33 | 171 3.9% | 20.215 5h10m | 104.95 81206 0.00%

Jan 18 13:33 | 176 4.0% | 21.620 5h32m | 98.13 81206 0.00%

Jan 18 13:34 | 180 4.1% | 20.216 5h10m | 104.94 81206 0.00%
Jan 18 13:34 | 191 4.2% | 20.216 5h09m | 104.94 81206 0.00%
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait[/CODE]

I'm satisfied this is not due to thermal limits: it's in a well-ventilated workstation case with the cover off, the ambient temperature is ~20°C, and its fans are not obstructed by adjacent cards.
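The GHz-d/day column in the log above appears to follow directly from the per-class time. The Go sketch below assumes mfakto tests 960 of its 4620 classes per bit level, which fits the Pct column advancing about 0.1% per class and class numbers running up to 4611. It also takes the 23.57 GHz-days credit printed for M162307933 from 2^74 to 2^75. Under those assumptions, a class that takes 21.620 s instead of 20.214 s produces exactly the dip to about 98 GHz-d/day seen in the log. The computed figures agree with the logged ones to within about 0.01, plausibly because the credit is printed rounded. The log also shows the kernel changing from cl_barrett15_74_gs_4 to cl_barrett15_82_gs_4 at the 74-bit boundary, which may account for part of the speed difference between bit levels. The function names are hypothetical.

```go
package main

import "fmt"

// Assumption: 960 of the 4620 classes are tested per bit level.
const testedClasses = 960

const secondsPerDay = 86400.0

// ghzDaysPerDay converts one class's run time into a throughput figure:
// the bit level's credit divided by the projected time for all tested
// classes, expressed in days.
func ghzDaysPerDay(creditGHzDays, secondsPerClass float64) float64 {
	days := secondsPerClass * testedClasses / secondsPerDay
	return creditGHzDays / days
}

// etaSeconds projects the remaining time from the current per-class time.
func etaSeconds(secondsPerClass float64, classesLeft int) float64 {
	return secondsPerClass * float64(classesLeft)
}

func main() {
	const credit = 23.57 // "M162307933 from 2^74 to 2^75 (23.57GHz-days)"
	for _, t := range []float64{20.214, 20.622, 21.620} {
		fmt.Printf("%6.3f s/class -> %6.2f GHz-d/day\n", t, ghzDaysPerDay(credit, t))
	}
	// After class 0 (the first of 960), 959 classes remain; the log shows ETA 5h23m.
	eta := etaSeconds(20.214, testedClasses-1)
	fmt.Printf("ETA %dh%02dm\n", int(eta)/3600, int(eta)%3600/60)
}
```

If this model holds, a single slow class shows up directly as a lower GHz-d/day figure on that line. GPU load percentages in GPU-Z do not enter the calculation at all, which may be why load and throughput do not always move together.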

kriesel 2018-01-18 22:13

[QUOTE=SnoProblem;438148]There are a few different automation programs/scripts.

I created [URL="https://github.com/Kunde21/MersenneManager"]these programs[/URL] to solve my own automation problems.

Later I found [URL="https://github.com/teknohog/primetools"]these Python scripts[/URL] and [URL="http://www.mersenneforum.org/misfit/"]this management program[/URL] that have similar functionality.[/QUOTE]

Interesting. I was not aware of "Mersenne manager", despite having searched a bit for such things last year.
I've added it to the table of available client management software at [URL="http://www.mersenneforum.org/showthread.php?t=22450"]www.mersenneforum.org/showthread.php?t=22450[/URL]

Minor suggestion: line 136 of TFmanager says,
[CODE]flag.UintVar(&sett.Devices[0].Device, "dev", sett.Devices[0].Device, "OpenCL device number for clLucas (default 0)")[/CODE]I suggest referencing Mfakto there instead of clLucas.

Also, IIRC, the OpenCL apps I've encountered use two digits to reference a device: the first digit is the platform, the second is the device within that platform, and both are one-based rather than zero-based. (The first device of the first platform is -d 11. To keep us on our toes, CUDA apps like Mfaktc, CUDAPm1, and CUDALucas use single-digit, zero-based device references: the first device is -d 0, so attempts to run -d 1 on a single-CUDA-device system fail.)
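To make the two numbering conventions concrete, here is a minimal Go sketch that follows them as recalled in the post; it has not been checked against each program's source. Both helpers are hypothetical. One converts a two-digit, one-based OpenCL `-d` value to zero-based indices. The other validates a zero-based CUDA `-d` value against the number of devices present.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseOpenCLDevice converts a two-digit, one-based "-d" value such as "11"
// (platform 1, device 1) into zero-based platform and device indices.
func parseOpenCLDevice(arg string) (platform, device int, err error) {
	if len(arg) != 2 || arg[0] < '1' || arg[0] > '9' || arg[1] < '1' || arg[1] > '9' {
		return 0, 0, fmt.Errorf("want two digits 1-9, got %q", arg)
	}
	return int(arg[0] - '1'), int(arg[1] - '1'), nil
}

// parseCUDADevice checks a zero-based CUDA "-d" value such as "0"
// against the number of devices actually present.
func parseCUDADevice(arg string, deviceCount int) (int, error) {
	n, err := strconv.Atoi(arg)
	if err != nil {
		return 0, err
	}
	if n < 0 || n >= deviceCount {
		return 0, fmt.Errorf("device %d out of range (have %d)", n, deviceCount)
	}
	return n, nil
}

func main() {
	p, d, err := parseOpenCLDevice("11")
	if err != nil {
		panic(err)
	}
	fmt.Println("OpenCL -d 11 -> platform index", p, "device index", d)
	// The failure case from the post: -d 1 on a single-CUDA-device system.
	if _, err := parseCUDADevice("1", 1); err != nil {
		fmt.Println("CUDA -d 1 with one GPU:", err)
	}
}
```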

ixfd64 2018-01-19 02:27

[QUOTE=Bdot;477289]This could indicate that the single kernel invocations take longer than some timeout that the OS is willing to wait until it can use the GPU for showing the screen. Try lowering the GPUSieve* parameters as that will shorten the time the GPU is blocked.[/QUOTE]

What parameters would you recommend?

The current settings are as follows:
[LIST][*][c]GPUSievePrimes=81157[/c][*][c]GPUSieveSize=96[/c][*][c]GPUSieveProcessSize=24[/c][/LIST]
I'm using an AMD R7 M460, which is an entry-level card.

xx005fs 2018-01-19 06:22

Tflop Conversion
 
What is the estimated Teraflop conversion to GHz-day?


All times are UTC.