#1684 | |
"6800 descendent"
Feb 2005
Colorado
2E2₁₆ Posts |
Quote:
It just keeps getting better all the time!

NOTE: I installed AMD's ROCm drivers with the --opencl=pal and --headless options, which installs the lightest-weight drivers possible. I am using an i7 CPU and a motherboard with built-in video, so that is what I'm using for the console. There's no monitor connected to the Radeon VII at all. Like George said, these Linux drivers are light years ahead of the Windows drivers.
#1685 |
P90 years forever!
Aug 2002
Yeehaw, FL
41·199 Posts |
#1686 |
P90 years forever!
Aug 2002
Yeehaw, FL
41·199 Posts |
To expand on the new CARRY32 feature: the size of the carry increases as FFTs get larger and as the exponent approaches the limit of the current FFT size. I did test 2000 iterations of an exponent over 1 billion near the upper end of a 56M FFT. The maximum carry I saw was 80% of a fatal overflow value. Thus, I think the new code is safe for some time to come, though we really should do some more research.
Also, the new code stores carries in a different order to be more AMD-friendly. One can get the old memory layout with "-use OLD_CARRY_LAYOUT". That layout might be better on nVidia or it might be irrelevant. CARRY32 and CARRY64 both work with the new and old memory layouts. To activate the old code, use "-use CARRY64,OLD_CARRY_LAYOUT".
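As a rough way to read the 80% figure above, here is a small sketch that converts "observed maximum carry as a fraction of the fatal value" into remaining headroom. Only the 80% number comes from the post; expressing the margin as a growth factor and in bits is an assumed framing, and the function name is hypothetical.

```python
import math

def carry_headroom(observed_fraction):
    """Return (growth_factor, headroom_bits) for a carry that peaked at
    observed_fraction of the fatal overflow value.

    growth_factor: how much larger the carry could get before overflowing.
    headroom_bits: the same margin expressed in bits (log2 of the factor).
    """
    if not 0.0 < observed_fraction <= 1.0:
        raise ValueError("observed_fraction must be in (0, 1]")
    growth = 1.0 / observed_fraction
    return growth, math.log2(growth)

# 0.80 is the figure reported for the >1 billion exponent in a 56M FFT.
growth, bits = carry_headroom(0.80)
print(f"carry can grow by a factor of {growth:.2f} (about {bits:.2f} bits)")
```

Seen this way, the test case had roughly a third of a bit to spare, which fits the suggestion that more research is warranted before relying on CARRY32 at the very largest sizes.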
#1687 | |
"Mihai Preda"
Apr 2015
2644₈ Posts |
Quote:
One way to attempt to debug this is:
- Run with CARRY64: do you recover the normal performance you had before?
- Produce an ISA dump with CARRY64 (using -dump <folder>).
- Produce another dump with CARRY32.
- Compare the .s files from the two dumps. The delta.sh script in gpuowl/tools/ can help here; it produces partially aggregated instruction counts.

Another interesting bit of information is to run with -time in the before/after cases and see which kernel has a massive slowdown.

One more thing to keep an eye on is thermal throttling by the GPU. If you keep the hottest (spot) temperature under 98C (e.g. 90 or 95), there should be little or no thermal throttling.

Last fiddled with by preda on 2020-01-04 at 21:44
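For anyone who wants to see what the dump comparison amounts to, here is a minimal Python sketch. It is not the delta.sh script and makes no claim about how that script works; it simply counts instruction mnemonics in the .s files of two dump folders and reports the differences. The parsing rules (comments start with `//` or `;`, labels end with `:`, directives start with `.`) are assumptions about the dump format, and all function names are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Leading mnemonic of an instruction line, e.g. "v_add_u32" or "s_endpgm".
MNEMONIC = re.compile(r"^([a-z][a-z0-9_]*)\b")

def count_instructions(text):
    """Count instruction mnemonics in one assembly listing."""
    counts = Counter()
    for line in text.splitlines():
        # Drop trailing comments (assumed to start with "//" or ";").
        code = line.split("//", 1)[0].split(";", 1)[0].strip()
        # Skip blanks, assembler directives, and labels.
        if not code or code.startswith(".") or code.endswith(":"):
            continue
        match = MNEMONIC.match(code)
        if match:
            counts[match.group(1)] += 1
    return counts

def count_dump(folder):
    """Sum mnemonic counts over every .s file in a dump folder."""
    total = Counter()
    for path in sorted(Path(folder).glob("*.s")):
        total += count_instructions(path.read_text(errors="replace"))
    return total

def delta(before, after):
    """Return {mnemonic: after - before} for mnemonics whose count changed."""
    return {name: after[name] - before[name]
            for name in set(before) | set(after)
            if after[name] != before[name]}
```

With two dumps written to, say, folders named dump64 and dump32 (names chosen here for illustration), `delta(count_dump("dump64"), count_dump("dump32"))` lists which instructions became more or less frequent with CARRY32.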
#1688 |
P90 years forever!
Aug 2002
Yeehaw, FL
41×199 Posts |
@preda: Feature request:
It seems that Ben Delo's big increase in PRP firepower makes it impossible for P-1'ers to stay ahead of the PRP wavefronts. This means we may get assigned an exponent that hasn't had any P-1 done. Can we change the default behavior of gpuowl to do a P-1 test on the exponent if needed? For a first implementation, don't worry about optimal bounds; we can add that later.
P-1 has about a 5% chance of finding a factor. For me a PRP test takes 18 hours, so investing up to 54 minutes (5% of 18 hours) in P-1 makes sense. Looking at recent P-1 results turned in to PrimeNet, prime95 chose bounds around B1=745000, B2=14713750 for a 96M exponent. I have no idea how long that takes on my GPU; maybe I'll go test that now.
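The 54-minute figure is a simple break-even estimate: P-1 is worth running as long as it costs less than the PRP time it is expected to save. A minimal sketch of that arithmetic, using only the 5% and 18-hour numbers from the post (the function name is hypothetical, and the model deliberately ignores everything except the one PRP test):

```python
def p1_break_even_hours(prp_hours, factor_probability):
    """Expected PRP time saved by P-1, i.e. the most P-1 time worth spending.

    A factor makes the PRP test unnecessary, so the expected saving is
    (chance of finding a factor) * (length of the PRP test).
    """
    return prp_hours * factor_probability

budget_hours = p1_break_even_hours(prp_hours=18, factor_probability=0.05)
print(f"P-1 budget: {budget_hours * 60:.0f} minutes")
```

The same function gives a budget for other cards: plug in that card's PRP time per exponent and the expected factor probability for the chosen bounds.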
#1689 | |
P90 years forever!
Aug 2002
Yeehaw, FL
41·199 Posts |
Quote:
Bonus. My test found a factor! So the P-1 code still works and another exponent bites the dust.
#1690 | |
"6800 descendent"
Feb 2005
Colorado
2×3²×41 Posts |
Quote:
I was just assigned a few Cat 4 exponents in the 103M range, TF'ed to 74 bits with no P-1 at all. With a Radeon VII, should I TF them higher first, skip that and do some P-1 first, or do both?
#1691 | |
"Mihai Preda"
Apr 2015
2²×19² Posts |
#1692 |
P90 years forever!
Aug 2002
Yeehaw, FL
41·199 Posts |
#1693 |
"6800 descendent"
Feb 2005
Colorado
2·3²·41 Posts |
#1694 |
"Mihai Preda"
Apr 2015
10110100100₂ Posts |