mersenneforum.org  

2020-01-18, 18:29   #1794
preda
"Mihai Preda"
Apr 2015
2·23·29 Posts

Quote:
Originally Posted by kriesel View Post
Commit changes seem to be rocm-specific.
Yes, as I said, I tested (i.e. measured) with ROCm 2.10. It should not regress on other platforms, but I'm looking for feedback on this. If a regression is detected (e.g. on Nvidia), I'll switch the change on or off as appropriate.

2020-01-18, 18:55   #1795
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3²·7²·11 Posts

Quote:
Originally Posted by preda View Post
Yes as I said, I tested (i.e. measured) with ROCm 2.10. Should not regress on other platforms, but I'm looking for feedback on this. If a regression is detected (e.g. on Nvidia) I'll switch the change on/off as appropriate.
"Coded for" and "tested on" are two separate things. Thanks for all you do.
This is timely: I was just considering running a slew of GPU models through the latest minor gpuowl updates, with an updated -use options timing script, on PRP.

2020-01-18, 20:28   #1796
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
11363₈ Posts
gpuowl-v6.11-134-g1e0ce1d Windows x64 build

I have only run -h so far, but here it is. This is the latest commit at the moment, the one with Preda's P-1 stage 2 tweak.
Attached Files
File Type: txt build-log.txt (6.0 KB, 45 views)
File Type: 7z gpuowl-v6.11-134-g1e0ce1d.7z (446.6 KB, 72 views)

2020-01-20, 04:10   #1797
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3²×7²×11 Posts
RX550 -use testing

Code:
gpuowl v6.11-134
RX550 4GB
Win7 x64
exponent 92400689 PRP
5M fft
-iters 10000 -time

NO_ASM 14491
NO_ASM,UNROLL_ALL 14492
NO_ASM,UNROLL_NONE 14364
NO_ASM,UNROLL_WIDTH 14363
NO_ASM,UNROLL_HEIGHT 14360 *
NO_ASM,UNROLL_MIDDLEMUL1 14412
NO_ASM,UNROLL_MIDDLEMUL2 14363

NO_ASM,UNROLL_WIDTH,UNROLL_HEIGHT 14369
NO_ASM,UNROLL_WIDTH,UNROLL_MIDDLEMUL2 14363
NO_ASM,UNROLL_HEIGHT,UNROLL_MIDDLEMUL2 14361
NO_ASM,UNROLL_WIDTH,UNROLL_HEIGHT,UNROLL_MIDDLEMUL2 14362

NO_ASM,MERGED_MIDDLE,WORKINGIN 19729
NO_ASM,MERGED_MIDDLE,WORKINGIN 19730
NO_ASM,MERGED_MIDDLE,WORKINGIN1 14683
NO_ASM,MERGED_MIDDLE,WORKINGIN1A 14573
NO_ASM,MERGED_MIDDLE,WORKINGIN2 14849
NO_ASM,MERGED_MIDDLE,WORKINGIN3 15175
NO_ASM,MERGED_MIDDLE,WORKINGIN4 19404
NO_ASM,MERGED_MIDDLE,WORKINGIN5 14487 *

NO_ASM,MERGED_MIDDLE,WORKINGOUT 32143
NO_ASM,MERGED_MIDDLE,WORKINGOUT0 17920
NO_ASM,MERGED_MIDDLE,WORKINGOUT1 14866
NO_ASM,MERGED_MIDDLE,WORKINGOUT1A 14825
NO_ASM,MERGED_MIDDLE,WORKINGOUT2 14395 *
NO_ASM,MERGED_MIDDLE,WORKINGOUT3 14496
NO_ASM,MERGED_MIDDLE,WORKINGOUT4 15450
NO_ASM,MERGED_MIDDLE,WORKINGOUT5 15736


NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_WIDTH 14554
NO_ASM,MERGED_MIDDLE,%wgkin%,%wkgout%,T2_SHUFFLE_MIDDLE 14319 *
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_HEIGHT 14364
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_REVERSELINE 14394
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE 14483

NO_ASM,MERGED_MIDDLE,%wgkin%,%wkgout%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE 14309 *
NO_ASM,MERGED_MIDDLE,%wgkin%,%wkgout%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE,T2_SHUFFLE_MIDDLE 18362

NO_ASM,MERGED_MIDDLE,%wgkin%,%wkgout%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,CARRY32 14326 *
NO_ASM,MERGED_MIDDLE,%wgkin%,%wkgout%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE,T2_SHUFFLE_MIDDLE,CARRY64 14965

%allotheroptions%,FANCY_MIDDLEMUL1 14320
%allotheroptions%,MORE_SQUARES_MIDDLEMUL1 14398
%allotheroptions%,CHEBYSHEV_METHOD EE on load
%allotheroptions%,CHEBYSHEV_METHOD_FMA EE on load
%allotheroptions%,ORIGINAL_METHOD 14318  *
%allotheroptions%,ORIGINAL_TWEAKED 14321

%allotheroptions%,ORIG_MIDDLEMUL2 14315
%allotheroptions%,CHEBYSHEV_MIDDLEMUL2 14309 *

%allotheroptions%,ORIG_SLOWTRIG 14772
%allotheroptions%,NEW_SLOWTRIG 14306 
%allotheroptions%,MORE_ACCURATE 14309
%allotheroptions%,LESS_ACCURATE 14184 *

NO_ASM,UNROLL_HEIGHT,MERGED_MIDDLE,WORKINGIN5,WORKINGOUT2,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,CARRY32,ORIGINAL_METHOD,LESS_ACCURATE 14152 *
14491/14152= 1.024
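
The selection step marked with * in the listing above (the fastest timing in each group) can be sketched in Python. This is not the timing script used for the post: the function name `best_per_group`, the regular expression, and the rule that blank lines separate groups are assumptions made for illustration. The sample is a short excerpt of the RX550 numbers.

```python
import re

# Excerpt of the RX550 listing; "EE on load" marks a run that produced no timing.
LISTING = """\
NO_ASM,MERGED_MIDDLE,WORKINGOUT1A 14825
NO_ASM,MERGED_MIDDLE,WORKINGOUT2 14395 *
NO_ASM,MERGED_MIDDLE,WORKINGOUT3 14496

%allotheroptions%,CHEBYSHEV_METHOD EE on load
%allotheroptions%,ORIGINAL_METHOD 14318  *
%allotheroptions%,ORIGINAL_TWEAKED 14321
"""

# One option string, one integer timing, and an optional trailing "*".
LINE = re.compile(r"^(\S+)\s+(\d+)\s*\*?\s*$")


def best_per_group(listing):
    """Return (options, timing) for the lowest timing in each blank-line-separated group."""
    winners = []
    for group in listing.split("\n\n"):
        timed = []
        for line in group.splitlines():
            match = LINE.match(line.strip())
            if match:  # lines without a single numeric timing are skipped
                timed.append((match.group(1), int(match.group(2))))
        if timed:
            winners.append(min(timed, key=lambda pair: pair[1]))
    return winners


winners = best_per_group(LISTING)
for options, timing in winners:
    print(options, timing)
```

The winners it reports for this excerpt are the same two rows that carry a * in the post.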

2020-01-21, 00:38   #1798
paulunderwood
Sep 2002
Database er0rr
2·3·19·31 Posts
ending in 2

I have just downloaded a bunch of world record PRPs; some end in 0 and others end in 2. Will "program":{"name":"gpuowl", "version":"v6.11-124-g267cc60"} perform P-1 automatically on those ending in "2", or do I need to upgrade gpuowl?

2020-01-21, 00:46   #1799
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
11363₈ Posts
gpuowl v6.11-134-g1e0ce1d RX480 -use option timings

Code:
gpuowl v6.11-134-g1e0ce1d
RX480 8GB
Win7 x64
exponent 92162731 PRP
5M fft
-iters 10000 -time

NO_ASM 3372, 3374
NO_ASM,UNROLL_ALL 3375
NO_ASM,UNROLL_NONE 3349
NO_ASM,UNROLL_WIDTH 3351
NO_ASM,UNROLL_HEIGHT 3344 *
NO_ASM,UNROLL_MIDDLEMUL1 3352
NO_ASM,UNROLL_MIDDLEMUL2 3373

NO_ASM,UNROLL_WIDTH,UNROLL_HEIGHT 3337 *
NO_ASM,UNROLL_WIDTH,UNROLL_MIDDLEMUL2 3374
NO_ASM,UNROLL_HEIGHT,UNROLL_MIDDLEMUL2 3365
NO_ASM,UNROLL_WIDTH,UNROLL_HEIGHT,UNROLL_MIDDLEMUL2 3370

NO_ASM,MERGED_MIDDLE,WORKINGIN 5991
NO_ASM,MERGED_MIDDLE,WORKINGIN 6011
NO_ASM,MERGED_MIDDLE,WORKINGIN1 3397 *
NO_ASM,MERGED_MIDDLE,WORKINGIN1A 3426
NO_ASM,MERGED_MIDDLE,WORKINGIN2 3478
NO_ASM,MERGED_MIDDLE,WORKINGIN3 3473
NO_ASM,MERGED_MIDDLE,WORKINGIN4 3821
NO_ASM,MERGED_MIDDLE,WORKINGIN5 3365

NO_ASM,MERGED_MIDDLE,WORKINGOUT 5835
NO_ASM,MERGED_MIDDLE,WORKINGOUT0 4543
NO_ASM,MERGED_MIDDLE,WORKINGOUT1 3352 *
NO_ASM,MERGED_MIDDLE,WORKINGOUT1A 3384
NO_ASM,MERGED_MIDDLE,WORKINGOUT2 3739
NO_ASM,MERGED_MIDDLE,WORKINGOUT3 3365
NO_ASM,MERGED_MIDDLE,WORKINGOUT4 3468
NO_ASM,MERGED_MIDDLE,WORKINGOUT5 3427


NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_WIDTH 3383 *
NO_ASM,MERGED_MIDDLE,%wgkin%,%wkgout%,T2_SHUFFLE_MIDDLE 3394
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_HEIGHT 3390
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_REVERSELINE 3395
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE 3436

set allotheroptions=NO_ASM,UNROLL_HEIGHT,UNROLL_WIDTH,WORKINGIN1,WORKINGOUT1
%allotheroptions%,T2_SHUFFLE_WIDTH,T2_SHUFFLE_HEIGHT 3341 *
%allotheroptions%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_WIDTH 3353
%allotheroptions%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_WIDTH,T2_REVERSELINE 3351

set allotheroptions=NO_ASM,MERGED_MIDDLE,UNROLL_HEIGHT,UNROLL_WIDTH,WORKINGIN1,WORKINGOUT1,T2_SHUFFLE_WIDTH,T2_SHUFFLE_HEIGHT
%allotheroptions%,CARRY32 3356 *
%allotheroptions%,CARRY64 3479

set allotheroptions=NO_ASM,MERGED_MIDDLE,UNROLL_HEIGHT,UNROLL_WIDTH,WORKINGIN1,WORKINGOUT1,T2_SHUFFLE_WIDTH,T2_SHUFFLE_HEIGHT,CARRY32
%allotheroptions%,FANCY_MIDDLEMUL1 3349
%allotheroptions%,MORE_SQUARES_MIDDLEMUL1 3341 *
%allotheroptions%,CHEBYSHEV_METHOD EE on load
%allotheroptions%,CHEBYSHEV_METHOD_FMA 3350
%allotheroptions%,ORIGINAL_METHOD 3356
%allotheroptions%,ORIGINAL_TWEAKED 3348

%allotheroptions%,ORIG_MIDDLEMUL2 3434
%allotheroptions%,CHEBYSHEV_MIDDLEMUL2 3357 *

%allotheroptions%,ORIG_SLOWTRIG EE
%allotheroptions%,NEW_SLOWTRIG  3360 *
%allotheroptions%,MORE_ACCURATE 3362
%allotheroptions%,LESS_ACCURATE EE

NO_ASM,UNROLL_HEIGHT,UNROLL_WIDTH,MERGED_MIDDLE,WORKINGIN1,WORKINGOUT1,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_WIDTH,CARRY32,MORE_SQUARES_MIDDLEMUL1,CHEBYSHEV_MIDDLEMUL2,NEW_SLOWTRIG
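
The `set allotheroptions=...` lines in the listing follow a simple pattern: append each alternative to a shared base, time it, then fold the fastest alternative into the base for the next group. Here is a minimal Python sketch of one such step, replayed with the CARRY32/CARRY64 timings posted above. The helper names are hypothetical, and this is not the batch script that produced the numbers.

```python
def use_candidates(base, alternatives):
    """Build one comma-separated -use string per alternative, all sharing the same base."""
    return {alt: ",".join(base + [alt]) for alt in alternatives}


def extend_with_fastest(base, timings):
    """Return a new base with the lowest-timing alternative appended."""
    fastest = min(timings, key=timings.get)
    return base + [fastest]


base = [
    "NO_ASM", "MERGED_MIDDLE", "UNROLL_HEIGHT", "UNROLL_WIDTH",
    "WORKINGIN1", "WORKINGOUT1", "T2_SHUFFLE_WIDTH", "T2_SHUFFLE_HEIGHT",
]

candidates = use_candidates(base, ["CARRY32", "CARRY64"])
for use_string in candidates.values():
    print("-use", use_string)

# Timings copied from the RX480 listing for this group.
posted = {"CARRY32": 3356, "CARRY64": 3479}
new_base = extend_with_fastest(base, posted)
print(",".join(new_base))
```

The final line printed matches the next `set allotheroptions=` value in the listing, which ends in CARRY32.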

2020-01-21, 01:09   #1800
PhilF
Feb 2005
Colorado
1103₈ Posts

Quote:
Originally Posted by paulunderwood View Post
I have just downloaded a bunch of world record PRPs and some end in 0 and others end in 2. Will "program":{"name":"gpuowl", "version":"v6.11-124-g267cc60"} perform P-1 automatically on those ending with "2" or do I need to upgrade gpuOwl?
Yes, that version is after the automatic P-1 commit, so you should be fine. However, the latest commit from 2 days ago implements a change that results in a 33% speed improvement in P-1 stage 2.

2020-01-21, 01:21   #1801
paulunderwood
Sep 2002
Database er0rr
2×3×19×31 Posts

Quote:
Originally Posted by PhilF View Post
Yes, that version is after the automatic P-1 commit, so you should be fine. However, the latest commit from 2 days ago implements a change that results in a 33% speed improvement in P-1 stage 2.
Got it!

2020-01-21, 07:55   #1802
preda
"Mihai Preda"
Apr 2015
2·23·29 Posts

Quote:
Originally Posted by PhilF View Post
Yes, that version is after the automatic P-1 commit, so you should be fine. However, the latest commit from 2 days ago implements a change that results in a 33% speed improvement in P-1 stage 2.
Correction: it's a 33% speed-up of one kernel (tailFusedMulDelta) that was taking up 45% of stage-2 time before. So it's more like a 12% speed-up of stage 2 overall (I hope).
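
That estimate is an instance of Amdahl's law. A small Python check, assuming "33% speed-up" means the kernel runs 1.33 times faster (the post does not say which reading is meant); the function name is made up for this sketch.

```python
def overall_speedup(fraction, kernel_speedup):
    """Amdahl's law: overall speed-up when `fraction` of the run time becomes `kernel_speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


# Reading 1 (assumed): the kernel is 1.33x faster.
as_throughput = overall_speedup(0.45, 1.33)
print(f"{as_throughput:.3f}")

# Reading 2 (alternative): the kernel's run time drops by 33%, i.e. to 0.67 of what it was.
as_time_cut = overall_speedup(0.45, 1.0 / 0.67)
print(f"{as_time_cut:.3f}")
```

Under the first reading the result is about 1.126, in line with the roughly 12% quoted above. The second reading would give a larger figure, about 1.174.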

2020-01-21, 10:02   #1803
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3²×7²×11 Posts
gpuowl v6.11-134-g1e0ce1d -use options on GTX1080

Code:
gpuowl v6.11-134-g1e0ce1d
GTX1080 8GB
Win7 x64
exponent 91996859 PRP
5M fft
-iters 10000 -time

NO_ASM 4541, 4560
NO_ASM,UNROLL_ALL 4542
NO_ASM,UNROLL_NONE BUILD_PROGRAM_FAILURE
NO_ASM,UNROLL_WIDTH BUILD_PROGRAM_FAILURE
NO_ASM,UNROLL_HEIGHT BUILD_PROGRAM_FAILURE
NO_ASM,UNROLL_MIDDLEMUL1 BUILD_PROGRAM_FAILURE
NO_ASM,UNROLL_MIDDLEMUL2 BUILD_PROGRAM_FAILURE

NO_ASM,UNROLL_WIDTH,UNROLL_HEIGHT BUILD_PROGRAM_FAILURE
NO_ASM,UNROLL_WIDTH,UNROLL_MIDDLEMUL2 BUILD_PROGRAM_FAILURE
NO_ASM,UNROLL_HEIGHT,UNROLL_MIDDLEMUL2 BUILD_PROGRAM_FAILURE
NO_ASM,UNROLL_WIDTH,UNROLL_HEIGHT,UNROLL_MIDDLEMUL2 4554

NO_ASM,MERGED_MIDDLE,WORKINGIN 4590
NO_ASM,MERGED_MIDDLE,WORKINGIN 5006
NO_ASM,MERGED_MIDDLE,WORKINGIN1 4574
NO_ASM,MERGED_MIDDLE,WORKINGIN1A 4666
NO_ASM,MERGED_MIDDLE,WORKINGIN2 4541
NO_ASM,MERGED_MIDDLE,WORKINGIN3 4548
NO_ASM,MERGED_MIDDLE,WORKINGIN4 4539 *
NO_ASM,MERGED_MIDDLE,WORKINGIN5 4594

NO_ASM,MERGED_MIDDLE,WORKINGOUT 4615
NO_ASM,MERGED_MIDDLE,WORKINGOUT0 4622
NO_ASM,MERGED_MIDDLE,WORKINGOUT1 4587
NO_ASM,MERGED_MIDDLE,WORKINGOUT1A 4654
NO_ASM,MERGED_MIDDLE,WORKINGOUT2 4614
NO_ASM,MERGED_MIDDLE,WORKINGOUT3 4587
NO_ASM,MERGED_MIDDLE,WORKINGOUT4 4555 *
NO_ASM,MERGED_MIDDLE,WORKINGOUT5 4602

NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4 4533 *
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_WIDTH 4599
NO_ASM,MERGED_MIDDLE,%wgkin%,%wkgout%,T2_SHUFFLE_MIDDLE 4646
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_HEIGHT 4591 
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_REVERSELINE 4605
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE 4548

set allotheroptions=NO_ASM,UNROLL_ALL,WORKINGIN4,WORKINGOUT4 
%allotheroptions%,T2_SHUFFLE_WIDTH,T2_SHUFFLE_HEIGHT 4537 
%allotheroptions%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_WIDTH 4558
%allotheroptions%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_WIDTH,T2_REVERSELINE 4517 *

set allotheroptions=NO_ASM,MERGED_MIDDLE,UNROLL_ALL,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_WIDTH,T2_REVERSELINE
%allotheroptions%,CARRY32 4698
%allotheroptions%,CARRY64 4559 *

set allotheroptions=NO_ASM,MERGED_MIDDLE,UNROLL_HEIGHT,UNROLL_WIDTH,WORKINGIN1,WORKINGOUT1,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_WIDTH,T2_REVERSELINE,CARRY32
%allotheroptions%,FANCY_MIDDLEMUL1 4531
%allotheroptions%,MORE_SQUARES_MIDDLEMUL1 4542
%allotheroptions%,CHEBYSHEV_METHOD 4492
%allotheroptions%,CHEBYSHEV_METHOD_FMA 4483 *
%allotheroptions%,ORIGINAL_METHOD 4546
%allotheroptions%,ORIGINAL_TWEAKED 4579

%allotheroptions%,ORIG_MIDDLEMUL2 4518
%allotheroptions%,CHEBYSHEV_MIDDLEMUL2 4445 *

%allotheroptions%,ORIG_SLOWTRIG 4552
%allotheroptions%,NEW_SLOWTRIG  4447
%allotheroptions%,MORE_ACCURATE 4438
%allotheroptions%,LESS_ACCURATE 4428 *

-use NO_ASM,UNROLL_ALL,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_WIDTH,T2_REVERSELINE,CARRY64,CHEBYSHEV_METHOD_FMA,CHEBYSHEV_MIDDLEMUL2,LESS_ACCURATE

4550/4428 =~ 1.0276
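
To put the ~2.8% in wall-clock terms, here is a rough Python estimate. It rests on two assumptions that the post does not state: that the timings are microseconds per iteration, and that a PRP test of 2^p - 1 takes about p iterations (error-check and save overhead are ignored). The function name is hypothetical.

```python
def prp_days(exponent, us_per_iteration):
    """Estimated days for one PRP test: about `exponent` iterations at the given microseconds each."""
    seconds = exponent * us_per_iteration * 1e-6
    return seconds / 86400.0


EXPONENT = 91996859  # exponent used in the GTX1080 listing

default_days = prp_days(EXPONENT, 4550)  # baseline used in the ratio above
tuned_days = prp_days(EXPONENT, 4428)    # best tuned timing in the listing

print(f"default {default_days:.2f} days, tuned {tuned_days:.2f} days")
print(f"saved   {(default_days - tuned_days) * 24:.1f} hours per test")
```

Under those assumptions the tuning saves roughly three hours on a test that takes a little under five days.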

2020-01-24, 03:28   #1804
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
4851₁₀ Posts
gpuowl v6.11-134 -use options testing on Radeon VII

Extensive tuning of the -use options gained about 3% over the program defaults.
Code:
gpuowl v6.11-134-g1e0ce1d
Radeon VII 16GB at 1244 MHz GPU clock, 880 MHz memory clock
Win10 x64
exponent 92561231 PRP
5M fft
-iters 10000 -time

NO_ASM 1017
NO_ASM 1014
NO_ASM,UNROLL_ALL 1015
NO_ASM,UNROLL_NONE 1001
NO_ASM,UNROLL_WIDTH 1002
NO_ASM,UNROLL_HEIGHT 1002
NO_ASM,UNROLL_MIDDLEMUL1 1013
NO_ASM,UNROLL_MIDDLEMUL2 989 *

NO_ASM,MERGED_MIDDLE,WORKINGIN 1393
NO_ASM,MERGED_MIDDLE,WORKINGIN 1391
NO_ASM,MERGED_MIDDLE,WORKINGIN1 1035
NO_ASM,MERGED_MIDDLE,WORKINGIN1A 1032
NO_ASM,MERGED_MIDDLE,WORKINGIN2 1038
NO_ASM,MERGED_MIDDLE,WORKINGIN3 1023 
NO_ASM,MERGED_MIDDLE,WORKINGIN4 1081
NO_ASM,MERGED_MIDDLE,WORKINGIN5 1010 *

NO_ASM,MERGED_MIDDLE,WORKINGOUT 1177
NO_ASM,MERGED_MIDDLE,WORKINGOUT0 1133
NO_ASM,MERGED_MIDDLE,WORKINGOUT1 1028 
NO_ASM,MERGED_MIDDLE,WORKINGOUT1A 1058
NO_ASM,MERGED_MIDDLE,WORKINGOUT2 1117
NO_ASM,MERGED_MIDDLE,WORKINGOUT3 1011 *
NO_ASM,MERGED_MIDDLE,WORKINGOUT4 1042
NO_ASM,MERGED_MIDDLE,WORKINGOUT5 1026

set wkgin=WORKINGIN5
set wkgout=WORKINGOUT3

NO_ASM,UNROLL_WIDTH,UNROLL_HEIGHT 1000
NO_ASM,UNROLL_WIDTH,UNROLL_MIDDLEMUL2 987
NO_ASM,UNROLL_HEIGHT,UNROLL_MIDDLEMUL2 989
NO_ASM,UNROLL_WIDTH,UNROLL_HEIGHT,UNROLL_MIDDLEMUL2 986 *

NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_WIDTH 1017
NO_ASM,MERGED_MIDDLE,%wgkin%,%wkgout%,T2_SHUFFLE_MIDDLE 1022
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_HEIGHT 1012
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_REVERSELINE 1011 *
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE 1029

set allotheroptions=NO_ASM,MERGED_MIDDLE,UNROLL_HEIGHT,UNROLL_WIDTH,WORKINGIN5,WORKINGOUT3
%allotheroptions%,T2_SHUFFLE_REVERSELINE,T2_SHUFFLE_HEIGHT 1014 *
%allotheroptions%,T2_SHUFFLE_REVERSELINE,T2_SHUFFLE_WIDTH 1016
%allotheroptions%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_WIDTH,T2_SHUFFLE_REVERSELINE 1018
%allotheroptions%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_REVERSELINE 1020
%allotheroptions%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_WIDTH,T2_SHUFFLE_REVERSELINE 1028

set allotheroptions=NO_ASM,MERGED_MIDDLE,UNROLL_HEIGHT,UNROLL_WIDTH,WORKINGIN1,WORKINGOUT1,T2_SHUFFLE_WIDTH,T2_SHUFFLE_HEIGHT
%allotheroptions%,CARRY32 989 *
%allotheroptions%,CARRY64 1022

set allotheroptions=NO_ASM,MERGED_MIDDLE,UNROLL_HEIGHT,UNROLL_WIDTH,WORKINGIN1,WORKINGOUT1,T2_SHUFFLE_WIDTH,T2_SHUFFLE_HEIGHT,CARRY32
%allotheroptions%,FANCY_MIDDLEMUL1 1011
%allotheroptions%,MORE_SQUARES_MIDDLEMUL1 991
%allotheroptions%,CHEBYSHEV_METHOD 989 *
%allotheroptions%,CHEBYSHEV_METHOD_FMA 989 *
%allotheroptions%,ORIGINAL_METHOD 991
%allotheroptions%,ORIGINAL_TWEAKED 990

%allotheroptions%,ORIG_MIDDLEMUL2 987 *
%allotheroptions%,CHEBYSHEV_MIDDLEMUL2  988

%allotheroptions%,ORIG_SLOWTRIG 1022
%allotheroptions%,NEW_SLOWTRIG  988
%allotheroptions%,MORE_ACCURATE 988
%allotheroptions%,LESS_ACCURATE 986 *

NO_ASM,UNROLL_MIDDLEMUL2,MERGED_MIDDLE,WORKINGIN5,WORKINGOUT3,T2_SHUFFLE_REVERSELINE,T2_SHUFFLE_HEIGHT,CARRY32,CHEBYSHEV_METHOD,ORIG_MIDDLEMUL2,LESS_ACCURATE

repeatability +-1.5/1015.5 = 0.148%
base 1015.5
final 986
ratio 1015.5/986 = 1.030
timing overhead ~986/974-1 =~ .0123
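
The repeatability and ratio figures at the end of the listing can be reproduced in a few lines of Python. The `within_noise` rule (treat any difference no larger than the spread of the two baseline runs as noise) is an assumption added here for illustration, not something stated in the post.

```python
baseline_runs = [1017, 1014]  # the two NO_ASM runs at the top of the listing
tuned = 986                   # final tuned timing

base = sum(baseline_runs) / len(baseline_runs)           # mean of the baseline runs
half_range = (max(baseline_runs) - min(baseline_runs)) / 2
repeatability = half_range / base                        # relative run-to-run spread
ratio = base / tuned                                     # gain from tuning

print(f"base {base}, repeatability +-{repeatability:.3%}, ratio {ratio:.3f}")


def within_noise(a, b, spread=max(baseline_runs) - min(baseline_runs)):
    """Crude check (assumed rule): differences no larger than the baseline spread count as noise."""
    return abs(a - b) <= spread


print(within_noise(989, 990))   # CHEBYSHEV_METHOD vs ORIGINAL_TWEAKED
print(within_noise(986, 1022))  # LESS_ACCURATE vs ORIG_SLOWTRIG
```

By this rule many of the starred choices differ from their neighbours by less than the run-to-run spread, so only the larger gaps (for example ORIG_SLOWTRIG at 1022) are clearly real.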
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.