mersenneforum.org  

2020-01-04, 16:55   #1684
PhilF
"6800 descendent"
Feb 2005
Colorado

2E2₁₆ Posts

Quote:
Originally Posted by paulunderwood
I don't understand it. I git cloned gpuowl and compiled, and it runs slower than before 1240 us. vs. 750 us. What am I doing wrong?
I can confirm it works for me on a Radeon VII. I pulled this version before this was even posted, so I was using CARRY32 during my tuning without realizing it. With a 5632K FFT, I was getting 888 us/it. When I added -use CARRY64 to the command line, the timing slowed to 910 us/it.

It just keeps getting better all the time!

NOTE: I installed AMD's ROCm drivers with the --opencl=pal and --headless options, which install the lightest-weight drivers possible. I am using an i7 CPU on a motherboard with built-in video, and that is what drives the console; there's no monitor connected to the Radeon VII at all. Like George said, these Linux drivers are light years ahead of the Windows drivers.
2020-01-04, 20:04   #1685
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

41·199 Posts

Quote:
Originally Posted by PhilF
With a 5632K FFT, I was getting 888us/it.
I think you should be getting under 800us. Are you overclocking memory yet?

Thanks for the --headless idea -- I'll try that soon.
2020-01-04, 20:42   #1686
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

41·199 Posts

To expand on the new CARRY32 feature: the size of the carry increases as FFTs get larger and as the exponent approaches the limit of the current FFT size. I tested 2000 iterations of an exponent over 1 billion near the upper end of a 56M FFT. The maximum carry I saw was 80% of the value that would cause a fatal overflow. Thus, I think the new code is safe for some time to come, though we really should do some more research.
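To make the "80% of a fatal overflow value" figure concrete, here is a minimal Python sketch of measuring carry headroom. It assumes, for illustration only, that a CARRY32 carry is a signed 32-bit integer that overflows outside [-2^31, 2^31); the helper names are hypothetical and are not part of gpuowl.

```python
# Hypothetical helpers, not gpuowl code. Assumption: a CARRY32 carry is a
# signed 32-bit integer, so it overflows outside [-2**31, 2**31).


def fits_in_signed(value: int, bits: int = 32) -> bool:
    """Return True if value is representable as a signed integer of `bits` bits."""
    limit = 1 << (bits - 1)
    return -limit <= value < limit


def carry_headroom(carries: list[int], bits: int = 32) -> float:
    """Largest carry magnitude as a fraction of 2**(bits - 1).

    0.8 corresponds to "80% of a fatal overflow value"; values close to 1.0
    mean very little headroom is left.
    """
    if not carries:
        return 0.0
    return max(abs(c) for c in carries) / (1 << (bits - 1))


if __name__ == "__main__":
    sample = [12, -3_000, 1_717_986_918]  # made-up carry values
    print(f"headroom used: {carry_headroom(sample):.1%}")
    print("all fit in 32 bits:", all(fits_in_signed(c) for c in sample))
```

Expressing headroom as a fraction rather than a raw maximum makes it easy to compare runs at different FFT sizes and exponents against the same overflow limit.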

Also, the new code stores carries in a different order to be more AMD-friendly. One can get the old memory layout with "-use OLD_CARRY_LAYOUT". That layout might be better on nVidia or it might be irrelevant. CARRY32 and CARRY64 both work with the new and old memory layout.

To get the old code back, use "-use CARRY64,OLD_CARRY_LAYOUT".
2020-01-04, 21:41   #1687
preda
"Mihai Preda"
Apr 2015

2644₈ Posts

Quote:
Originally Posted by paulunderwood
I don't understand it. I git cloned gpuowl and compiled, and it runs slower than before 1240 us. vs. 750 us. What am I doing wrong?
I don't know, but the ROCm compiler can generate surprising results sometimes. What version of ROCm are you using, and what FFT size?

One way to attempt to debug this is:
- run with CARRY64: do you recover the normal performance you had before?
- produce an ISA dump with CARRY64 (using -dump <folder>)
- produce another dump with CARRY32
- compare the .s files from the two dumps. The delta.sh script in gpuowl/tools/ can help here; it produces partially aggregated instruction counts.
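The dump comparison in the last step can be sketched in Python. This is not delta.sh; it is a hypothetical stand-in that assumes an instruction is any line whose first token is the mnemonic, with directives starting with ".", comments starting with ";" and labels ending with ":".

```python
import sys
from collections import Counter
from pathlib import Path


def count_mnemonics(lines):
    """Count instruction mnemonics in AMDGPU-style assembly text (heuristic)."""
    counts = Counter()
    for raw in lines:
        line = raw.split(";", 1)[0].strip()  # drop ';' comments
        if not line or line.startswith(".") or line.endswith(":"):
            continue  # skip blank lines, assembler directives and labels
        counts[line.split()[0]] += 1
    return counts


def diff_counts(before, after):
    """Return {mnemonic: after - before} for every mnemonic whose count changed."""
    changed = {}
    for key in sorted(set(before) | set(after)):
        delta = after.get(key, 0) - before.get(key, 0)
        if delta:
            changed[key] = delta
    return changed


def main(path_before, path_after):
    before = count_mnemonics(Path(path_before).read_text().splitlines())
    after = count_mnemonics(Path(path_after).read_text().splitlines())
    # Largest absolute changes first.
    for mnemonic, delta in sorted(diff_counts(before, after).items(),
                                  key=lambda kv: -abs(kv[1])):
        print(f"{delta:+6d}  {mnemonic}")


if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```

Pass the matching .s file from the CARRY64 dump first and from the CARRY32 dump second; a large positive delta on memory or 64-bit integer instructions would be a natural place to start looking.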

Another interesting bit of information: run with -time in both the before and after cases, and see which kernel has a massive slowdown.
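Once per-kernel times from the two -time runs are in hand, spotting the outlier is a simple ratio. The sketch below assumes the timings have already been copied into two dicts by hand; it does not parse gpuowl's -time output, and the kernel names are placeholders.

```python
def rank_slowdowns(before, after):
    """Return (kernel, after/before ratio) pairs, worst slowdown first.

    Only kernels present in both runs with a positive "before" time are compared.
    """
    common = [k for k in before if k in after and before[k] > 0]
    ratios = [(k, after[k] / before[k]) for k in common]
    return sorted(ratios, key=lambda kr: kr[1], reverse=True)


if __name__ == "__main__":
    # Placeholder kernel names and made-up timings in microseconds.
    before = {"kernel_a": 100.0, "kernel_b": 250.0, "kernel_c": 40.0}
    after = {"kernel_a": 102.0, "kernel_b": 610.0, "kernel_c": 41.0}
    for kernel, ratio in rank_slowdowns(before, after):
        print(f"{kernel:10s} x{ratio:.2f}")
```

Ranking by ratio rather than absolute difference keeps a small kernel that got much slower from hiding behind a large kernel that changed only slightly.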

One more thing to keep an eye on is thermal throttling by the GPU. If you keep the hottest temperature (the hotspot) under 98C (e.g. 90 or 95C), there should be little or no thermal throttling.
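On Linux, one way to watch the hotspot is the amdgpu hwmon interface, which on Vega20-class cards such as the Radeon VII exposes tempN_label files (for example "junction") alongside tempN_input readings in millidegrees Celsius. The sketch assumes that layout and a hwmon directory you locate yourself (for example under /sys/class/drm/card0/device/hwmon/; the card and hwmon indices vary). The 98C default simply mirrors the advice above.

```python
import sys
from pathlib import Path


def read_temp_by_label(hwmon_dir, label="junction"):
    """Temperature in degrees C of the hwmon channel named `label`, or None.

    Assumes the usual hwmon layout: tempN_label holds the channel name and
    tempN_input holds the reading in millidegrees Celsius.
    """
    for label_file in sorted(Path(hwmon_dir).glob("temp*_label")):
        if label_file.read_text().strip() == label:
            input_file = label_file.with_name(label_file.name.replace("_label", "_input"))
            return int(input_file.read_text().strip()) / 1000.0
    return None


def throttle_risk(temp_c, limit_c=98.0):
    """True when a reading is at or above the suggested hotspot limit."""
    return temp_c is not None and temp_c >= limit_c


if __name__ == "__main__":
    temp = read_temp_by_label(sys.argv[1])
    if temp is None:
        print("no 'junction' channel found")
    else:
        note = "  (at or above 98 C: expect throttling)" if throttle_risk(temp) else ""
        print(f"junction: {temp:.1f} C{note}")
```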

Last fiddled with by preda on 2020-01-04 at 21:44
2020-01-05, 19:44   #1688
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

41×199 Posts

@preda: Feature request:

It seems that Ben Delo's big increase in PRP firepower makes it impossible for P-1'ers to stay ahead of the PRP wavefronts. This means we may get assigned an exponent that hasn't had any P-1 done.

Can we change the default behavior of gpuowl to do a P-1 test on the exponent if needed? For the first implementation, don't worry about optimal bounds; we can add that later. P-1 has about a 5% chance of finding a factor. For me a PRP test takes 18 hours, so investing up to 54 minutes of P-1 makes sense. Looking at recent P-1 results turned in to PrimeNet, prime95 chose bounds around B1=745000, B2=14713750 for a 96M exponent. I've no idea how long that takes on my GPU -- maybe I'll go test that now.
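The "up to 54 minutes" figure is an expected-value break-even: P-1 is worth at most its factor probability times the PRP time a factor would save (5% of 18 hours is 54 minutes). A minimal sketch of that rule of thumb, counting only the one PRP test a factor would make unnecessary; the function names are hypothetical.

```python
def p1_break_even_minutes(prp_hours: float, factor_probability: float) -> float:
    """Largest P-1 time (minutes) that pays off on average.

    Rule of thumb: spend at most probability * PRP time, since a factor
    found by P-1 makes the PRP test unnecessary.
    """
    if not 0.0 <= factor_probability <= 1.0:
        raise ValueError("factor_probability must be between 0 and 1")
    return prp_hours * 60.0 * factor_probability


def p1_is_worthwhile(p1_minutes: float, prp_hours: float, factor_probability: float) -> bool:
    """True if a P-1 run of p1_minutes is within the break-even budget."""
    return p1_minutes <= p1_break_even_minutes(prp_hours, factor_probability)


if __name__ == "__main__":
    print(p1_break_even_minutes(18, 0.05))  # 18-hour PRP, ~5% factor chance
    print(p1_is_worthwhile(26, 18, 0.05))   # the 26-minute P-1 run reported in the thread
```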
2020-01-05, 20:28   #1689
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

41·199 Posts

Quote:
Originally Posted by Prime95
I've no idea how long that takes on my GPU -- maybe I'll go test that now.
I tested B1=750000, B2=20*B1 on a 5M FFT expo and it took 26 minutes. Clearly a worthwhile investment if no P-1 has been done before (PRP lines in worktodo that do not end in ",0").
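Following that rule, a PRP line whose last comma-separated field is not 0 is a candidate for P-1. A hedged sketch of scanning a worktodo file for such lines; it checks only the last field as described here, and the default file name worktodo.txt is an assumption.

```python
import sys
from pathlib import Path


def needs_p1(line: str) -> bool:
    """True for a PRP worktodo line whose last comma-separated field is not "0"."""
    line = line.strip()
    if not line.startswith("PRP="):
        return False
    return line.rsplit(",", 1)[-1].strip() != "0"


def lines_needing_p1(text: str) -> list[str]:
    """All PRP lines in a worktodo text that still look like they need P-1."""
    return [ln.strip() for ln in text.splitlines() if needs_p1(ln)]


if __name__ == "__main__":
    # Assumed default file name; pass a path to override.
    path = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("worktodo.txt")
    for entry in lines_needing_p1(path.read_text()):
        print(entry)
```

Comparing the whole last field (rather than testing whether the line ends in "0") avoids misreading a value such as 10 as 0.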

Bonus. My test found a factor! So the P-1 code still works and another exponent bites the dust.
2020-01-05, 21:04   #1690
PhilF
"6800 descendent"
Feb 2005
Colorado

2×3²×41 Posts

Quote:
Originally Posted by Prime95
I tested B1=750000, B2=20*B1 on a 5M FFT expo and it took 26 minutes. Clearly a worthwhile investment if no P-1 has been done before (PRP lines in worktodo that do not end in ",0").

Bonus. My test found a factor! So the P-1 code still works and another exponent bites the dust.
Cool!

I was just assigned a few Cat 4 exponents in the 103M range, TF'ed to 74 bits with no P-1 at all. With a Radeon VII, should I TF them higher first, skip that and do some P-1 first, or both?
2020-01-05, 21:31   #1691
preda
"Mihai Preda"
Apr 2015

2²×19² Posts

Quote:
Originally Posted by Prime95
@preda: Feature request:

It seems that Ben Delo's big increase in PRP firepower makes it impossible for P-1'ers to stay ahead of the PRP wavefronts. This means we may get assigned an exponent that hasn't had any P-1 done.

Can we change the default behavior of gpuowl to do a P-1 test on the exponent if needed? For the first implementation, don't worry about optimal bounds; we can add that later. P-1 has about a 5% chance of finding a factor. For me a PRP test takes 18 hours, so investing up to 54 minutes of P-1 makes sense. Looking at recent P-1 results turned in to PrimeNet, prime95 chose bounds around B1=745000, B2=14713750 for a 96M exponent. I've no idea how long that takes on my GPU -- maybe I'll go test that now.
Understood; I'm looking into this, estimated 1-2 days.
2020-01-05, 21:49   #1692
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

41·199 Posts

Quote:
Originally Posted by PhilF
Cool!

I was just assigned a few Cat 4 exponents in the 103M range, TF'ed to 74 bits with no P-1 at all. With a Radeon VII, should I TF them higher first, skip that and do some P-1 first, or both?
Skip the TF, just P-1.
2020-01-05, 22:03   #1693
PhilF
"6800 descendent"
Feb 2005
Colorado

2·3²·41 Posts

Quote:
Originally Posted by Prime95
Skip the TF, just P-1.
OK, thanks.

BTW, regarding my memory timing: I had a chance to play with it today, without success. Even an overclock to just 1050 produced errors.
2020-01-05, 22:06   #1694
preda
"Mihai Preda"
Apr 2015

10110100100₂ Posts

Quote:
Originally Posted by PhilF
OK, thanks.

BTW, regarding my memory timing: I had a chance to play with it today, without success. Even an overclock to just 1050 produced errors.
Did you undervolt? That could also be the reason for the errors.

All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
