[QUOTE=paulunderwood;533664]Quoting from [URL]https://github.com/preda/gpuowl[/URL]
... Is that all outdated?[/QUOTE]8 to 9 months outdated, yes; announcement of Gpuowl v6.4 by Preda: [URL]https://www.mersenneforum.org/showpost.php?p=513288&postcount=1056[/URL] |
700M P-1 on gpuowl / P100 / colab
It took ~1.74 days of run time across several Colab sessions, with a Fan Ming-provided executable. [URL]https://www.mersenne.org/report_exponent/?exp_lo=700000031&full=1[/URL] Current projections from the runtime scaling and buffer-count trend are that higher data points will take 2-4 days each, and that P-1 throughout the mersenne.org range will be possible. The run times can probably be improved upon; I'm not using any of the performance-enhancing T2_shuffle or merged-middle -use options during these runs.
|
Hello!
How do I switch gpuOwl to show the traditional "ms/it" instead of us/sq? |
[QUOTE=Lorenzo;533824]Hello!
How do I switch gpuOwl to show the traditional "ms/it" instead of us/sq?[/QUOTE] Edit the source code and recompile. |
[QUOTE=kriesel;533825]Edit source code and recompile.[/QUOTE]
Uhhhh. Too much effort for me :( |
[QUOTE=Lorenzo;533824]Hello!
How do I switch gpuOwl to show the traditional "ms/it" instead of us/sq?[/QUOTE] I don't know about a switch, but... [CODE]grep -B3 -A3 "us/it" Gpu.cpp

static string makeLogStr(u32 E, string_view status, u32 k, u64 res, float secsPerIt, u32 nIters) {
  char buf[256];
  snprintf(buf, sizeof(buf), "%u %2s %8d %6.2f%%; %4.0f us/it; ETA %s; %s",
           E, status.data(), k, k / float(nIters) * 100, secsPerIt * 1'000'000,
           getETA(k, nIters, secsPerIt).c_str(), hex(res).c_str());[/CODE]Change:
%4.0f us/it ---> %4.3f ms/it
1'000'000 ---> 1'000
And recompile. Or just divide by 1000 in your head. :crank: |
[QUOTE=paulunderwood;533827]Or just divide by 1000 in your head. :crank:[/QUOTE]
Thank you! This is what I was looking for :smile: |
[QUOTE=kriesel;533825]Edit source code and recompile.[/QUOTE]
[QUOTE]"The more you overtake the plumbing, the easier it is to stop up the drain." ~Jimmy Doohan.[/QUOTE]The version I am using has a 9/29/2019 date stamp. I have only run P-1s with it, and there have been no problems. I would be reluctant to replace it with anything newer. I feel it needs to update the screen more often, but I live with it. It produces correct results, as far as I know; that is the important part. |
CARRY32 and CARRY64
A new optimization has been contributed by George: it uses only 32 bits to store the carry-out from a word after the convolution. The theoretical analysis of whether this carry value fits in 32 bits is not very clear AFAIK, but the rough idea is that the higher the FFT size, the larger the expected value of the carry. The new CARRY32 has been tested quite a bit at the wavefront (5M FFT) and never produced an error; OTOH the situation may be different at higher FFT sizes.
The performance gain is significant, at about 3-5%. Given the above, CARRY32 is now enabled by default; to get the old behavior, supply "-use CARRY64" to gpuowl. PRP should detect a carry overflow (when using CARRY32) if one occurs: it reports the usual error, retries, and stops after three repeated errors. OTOH P-1 has no such check; it's probably safer to keep using CARRY64 when doing P-1, especially at FFT sizes larger than 5M (the size that has been tested the most so far). If anybody sees an error that seems to be caused by CARRY32 (at any FFT size), please report it. |
gpuowl-v6.11-112-gf1b00d1 Windows build
This build should have the -use CARRY32 default that Preda described above. So far I've only run -h on it. The build again produced the usual shower of warnings.
Just when I think we're at diminishing returns or at the end of optimizations, George provides another pleasant surprise. |
[QUOTE=preda;534207]A new optimization has been contributed by George: it uses only 32 bits to store the carry-out from a word after the convolution. The theoretical analysis of whether this carry value fits in 32 bits is not very clear AFAIK, but the rough idea is that the higher the FFT size, the larger the expected value of the carry. The new CARRY32 has been tested quite a bit at the wavefront (5M FFT) and never produced an error; OTOH the situation may be different at higher FFT sizes.
The performance gain is significant, at about 3-5%. Given the above, CARRY32 is now enabled by default; to get the old behavior, supply "-use CARRY64" to gpuowl. PRP should detect a carry overflow (when using CARRY32) if one occurs: it reports the usual error, retries, and stops after three repeated errors. OTOH P-1 has no such check; it's probably safer to keep using CARRY64 when doing P-1, especially at FFT sizes larger than 5M (the size that has been tested the most so far). If anybody sees an error that seems to be caused by CARRY32 (at any FFT size), please report it.[/QUOTE] I don't understand it. I git-cloned gpuowl and compiled, and it runs slower than before: 1240 us vs. 750 us. What am I doing wrong? |