mersenneforum.org  

2019-12-27, 23:34   #1673
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·29·127 Posts

Quote:
Originally Posted by paulunderwood
Quoting from https://github.com/preda/gpuowl
...
Is that all outdated?
Yes, it is 8 to 9 months out of date; see Preda's announcement of Gpuowl v6.4: https://www.mersenneforum.org/showpo...postcount=1056
2019-12-31, 02:59   #1674
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2×29×127 Posts
700M P-1 on gpuowl / P100 / colab

It took ~1.74 days of run time, spread over several Colab sessions, using an executable provided by Fan Ming: https://www.mersenne.org/report_expo...0000031&full=1
Current projections, based on run-time scaling and the trend in buffer count, are that higher data points will take 2-4 days each and that P-1 throughout the mersenne.org range will be possible. The run times can probably be improved; I'm not using any of the performance-enhancing T2_shuffle or merged-middle -use options in these runs.
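For illustration only, here is a minimal C++ sketch of extrapolating run time with a simple power law, t(p) = t0 · (p / p0)^k. The scaling exponent k = 2 is an assumption chosen for the example, not a measured value, and the sketch ignores the buffer-count trend that the projection above also relied on. `projectDays` is a hypothetical helper, not part of gpuowl.

```cpp
#include <cmath>

// Hypothetical helper: extrapolates run time from one measured point
// (exponent p0, t0Days of run time) to exponent p, assuming run time grows
// as p^k. The exponent k is an assumption, not something measured here.
double projectDays(double t0Days, double p0, double p, double k) {
  return t0Days * std::pow(p / p0, k);
}
```

With the 700M data point (1.74 days) and the assumed k = 2, `projectDays(1.74, 700e6, 1e9, 2.0)` gives 1.74 × (10/7)² ≈ 3.55 days for an exponent near 1000M. A different assumed k changes the answer, so a fit over several measured exponents would be more trustworthy than one point.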

Last fiddled with by kriesel on 2019-12-31 at 03:01
2019-12-31, 12:10   #1675
Lorenzo
 
Aug 2010
Republic of Belarus

2·89 Posts

Hello!
How can I switch gpuOwl to show the traditional "ms/it" instead of "us/sq"?
2019-12-31, 12:32   #1676
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

16306₈ Posts

Quote:
Originally Posted by Lorenzo
Hello!
How can I switch gpuOwl to show the traditional "ms/it" instead of "us/sq"?
Edit the source code and recompile.

Last fiddled with by kriesel on 2019-12-31 at 12:33
2019-12-31, 12:44   #1677
Lorenzo
 
Aug 2010
Republic of Belarus

2·89 Posts

Quote:
Originally Posted by kriesel
Edit the source code and recompile.
Uhhh. That's too much effort for me :(
2019-12-31, 12:50   #1678
paulunderwood
 
Sep 2002
Database er0rr

1188₁₆ Posts

Quote:
Originally Posted by Lorenzo
Hello!
How can I switch gpuOwl to show the traditional "ms/it" instead of "us/sq"?
I don't know of a switch, but...


Code:
grep -B3 -A3 "us/it"  Gpu.cpp 
static string makeLogStr(u32 E, string_view status, u32 k, u64 res, float secsPerIt, u32 nIters) {
  char buf[256];
  
  snprintf(buf, sizeof(buf), "%u %2s %8d %6.2f%%; %4.0f us/it; ETA %s; %s",
           E, status.data(), k, k / float(nIters) * 100,
           secsPerIt * 1'000'000, getETA(k, nIters, secsPerIt).c_str(),
           hex(res).c_str());
Change:

%4.0f us/it ---> %4.3f ms/it
1'000'000 ---> 1'000

And recompile.

Or just divide by 1000 in your head.
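A minimal, self-contained C++ sketch of the change described above. `makeLogStrMs` is a hypothetical, simplified stand-in for gpuowl's `makeLogStr`: it keeps the exponent, status, iteration and percentage fields from the Gpu.cpp excerpt, but drops the ETA and residue fields (which depend on gpuowl's own `getETA` and `hex` helpers) and reports ms/it.

```cpp
#include <cstdio>
#include <string>

using u32 = unsigned;

// Hypothetical, simplified version of gpuowl's makeLogStr (not the real one):
// same leading fields, but milliseconds per iteration instead of microseconds,
// and no ETA or residue fields.
std::string makeLogStrMs(u32 E, const std::string& status, u32 k,
                         float secsPerIt, u32 nIters) {
  char buf[256];
  // "%4.0f us/it" with secsPerIt * 1'000'000 becomes "%4.3f ms/it" with secsPerIt * 1'000.
  std::snprintf(buf, sizeof(buf), "%u %2s %8u %6.2f%%; %4.3f ms/it",
                E, status.c_str(), k, k / float(nIters) * 100, secsPerIt * 1'000);
  return std::string(buf);
}
```

For example, `makeLogStrMs(100000007, "OK", 5000, 0.00075f, 100000007)` contains "0.750 ms/it" where the original format string would report about 750 us/it. Using `%u` for the unsigned iteration count also avoids the signed `%8d` in the excerpt.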

Last fiddled with by paulunderwood on 2019-12-31 at 12:52
2020-01-03, 09:03   #1679
Lorenzo
 
Aug 2010
Republic of Belarus

262₈ Posts

Quote:
Originally Posted by paulunderwood
Or just divide by 1000 in your head.
Thank you! This is what I was looking for.
2020-01-03, 23:45   #1680
storm5510
Random Account
 
Aug 2009
Not U. + S.A.

7·19² Posts

Quote:
Originally Posted by kriesel
Edit the source code and recompile.
Quote:
"The more they overthink the plumbing, the easier it is to stop up the drain." ~Jimmy Doohan.
The version I am using has a 9/29/2019 date stamp. I have only run P-1s with it, and there have been no problems. I would be reluctant to replace it with anything newer. I feel it should update the screen more often, but I live with it. It produces correct results, as far as I know. That is the important part.
2020-01-04, 10:51   #1681
preda
 
"Mihai Preda"
Apr 2015

2²·19² Posts
CARRY32 and CARRY64

George has contributed a new optimization: it uses only 32 bits to store the carry-out from a word after the convolution. The theoretical analysis of whether this carry value fits in 32 bits is not very clear AFAIK, but the rough idea is that the larger the FFT size, the larger the expected value of the carry. The new CARRY32 has been tested quite a bit at the wavefront (5M FFT) and has never produced an error; OTOH the situation may be different at larger FFT sizes.
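A rough C++ illustration of the question behind CARRY32: does the carry leaving a word still fit in a signed 32-bit integer? This is a minimal sketch under simplifying assumptions (a fixed number of bits per word and a non-negative digit), not gpuowl's OpenCL kernel code, which uses its own word layout. `splitWord` and `carryFits32` are hypothetical names.

```cpp
#include <cstdint>
#include <limits>

struct WordAndCarry {
  std::int64_t word;   // low digit, in [0, 2^bits)
  std::int64_t carry;  // amount propagated to the next word
};

// Hypothetical helper: splits a rounded convolution output x into a digit of
// `bits` bits and a carry, so that x == carry * 2^bits + word.
// Assumes 0 < bits < 63 and |x| well below 2^63 (no overflow in x - word).
WordAndCarry splitWord(std::int64_t x, int bits) {
  const std::int64_t base = std::int64_t(1) << bits;
  const std::int64_t word = ((x % base) + base) % base;  // non-negative remainder
  const std::int64_t carry = (x - word) / base;          // exact division
  return {word, carry};
}

// True if the carry could be stored in 32 bits without overflow.
bool carryFits32(std::int64_t carry) {
  return carry >= std::numeric_limits<std::int32_t>::min() &&
         carry <= std::numeric_limits<std::int32_t>::max();
}
```

In this toy model, `splitWord(std::int64_t(1) << 62, 18)` yields a carry of 2^44, which fails `carryFits32`, while small carries pass. Whether real convolution outputs at a given FFT size ever reach such magnitudes is exactly the open question the post describes.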

The performance gain is significant, about 3-5%. Given the above, CARRY32 is now enabled by default. To get the old behavior, supply "-use CARRY64" to gpuowl.

PRP should detect a carry overflow if one occurs when using CARRY32: it reports the usual error and retries, and if the error repeats 3 times it stops.
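The retry behaviour described above can be sketched as a small loop. This is a hypothetical illustration, not gpuowl's code: `runWithRetries` and the `attempt` callback (assumed to rerun a block of iterations from the last good state and return false when the error check fails) are names invented for the example.

```cpp
#include <cstdio>
#include <functional>

// Hypothetical sketch: reruns a block until its error check passes, giving up
// after maxConsecutive failed attempts in a row (3, as described in the post).
bool runWithRetries(const std::function<bool()>& attempt, int maxConsecutive = 3) {
  for (int failures = 0; failures < maxConsecutive; ++failures) {
    if (attempt()) return true;  // check passed: go on to the next block
    std::fprintf(stderr, "error check failed (attempt %d of %d)\n",
                 failures + 1, maxConsecutive);
  }
  return false;  // the error repeated maxConsecutive times: stop
}
```

Because the computation is deterministic, a genuine overflow would recur on every retry and end in a stop rather than a silently wrong result; that is the protection PRP has and, per the post, P-1 lacks.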

OTOH, P-1 has no such check, so it is probably safer to keep using CARRY64 for P-1, especially with FFT sizes larger than 5M (the FFT size that has been tested extensively so far).

If anybody sees an error which seems to be caused by CARRY32 (at any FFT size), please report it.
2020-01-04, 15:01   #1682
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2×29×127 Posts
gpuowl-v6.11-112-gf1b00d1 Windows build

This build should have the CARRY32 default that Preda described above. So far I've only gone as far as running -h on it. The build again produced the usual shower of warnings.

Just when I think we're at diminishing returns or at the end of optimizations, George provides another pleasant surprise.
Attached Files
gpuowl-v6.11-112-gf1b00d1.7z (7z, 436.7 KB)
make-warnings.txt (txt, 5.5 KB)

Last fiddled with by kriesel on 2020-01-04 at 15:02
2020-01-04, 16:08   #1683
paulunderwood
 
Sep 2002
Database er0rr

2³×3×11×17 Posts

Quote:
Originally Posted by preda
George has contributed a new optimization: it uses only 32 bits to store the carry-out from a word after the convolution. ...

The performance gain is significant, about 3-5%. Given the above, CARRY32 is now enabled by default. To get the old behavior, supply "-use CARRY64" to gpuowl.
...
I don't understand it. I git-cloned gpuowl and compiled it, and it runs slower than before: 1240 us vs. 750 us. What am I doing wrong?

Last fiddled with by paulunderwood on 2020-01-04 at 16:11
All times are UTC.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
