mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   The P-1 factoring CUDA program (https://www.mersenneforum.org/showthread.php?t=17835)

kriesel 2018-11-16 05:38

[QUOTE=aaronhaviland;500309]I seem to recall making some modifications to the memory allocations prior to my first git commit, but I cannot recall what they are.

We have to remember that it checks the available RAM before stage 1, as part of the bounds calculations:
...
But this memory is not actually allocated until much later, and the amount could have changed in that time. [/QUOTE]Much later, indeed. Even on fast GPUs, a stage may take days for high exponents. Recalculating the available memory right before stage 2 setup could help.[QUOTE]We have to be very careful not to exceed it (available memory) because therein lie fatal errors, and we do not have control over other applications that may also be using the same memory.

One reason I find the code uses less memory than what is available is that it (based on my understanding, at least):
[LIST=1][*]Determines the value of nrp based on the available memory and fft size (and for some reason restricts it to 4GiB on Windows. Possibly a 32-bit issue, or something from older CUDA versions?)[/LIST][/QUOTE]Could be left over from old compiler version limitations. I think it is more likely a consequence of using the same code base for 64-bit and 32-bit application builds. 32-bit builds were possible up to CUDA 7.5. I don't think 32-bit builds are necessary any more. I'd be interested in other people's thoughts on that. There were some speed advantages in 32-bit builds in older CUDA versions for CUDALucas, but they were not dramatic and perhaps not highly reproducible in benchmarking.
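
To make the two ideas above concrete (re-querying free memory at stage 2 setup, and an optional cap such as the 4GiB Windows limit), here is a minimal Python sketch. It is not CUDAPm1's actual code: the function names, the 8 bytes per FFT word, the safety margin, and the `query_free_bytes` callback are all assumptions made for illustration, and the real relationship between buffer count and nrp is not shown.

```python
GIB = 1024 ** 3

def usable_bytes(free_bytes, cap_bytes=None, margin_bytes=0):
    """Memory budget: free memory minus a safety margin, optionally capped."""
    budget = max(free_bytes - margin_bytes, 0)
    if cap_bytes is not None:
        budget = min(budget, cap_bytes)
    return budget

def max_buffers(budget_bytes, fft_length, bytes_per_word=8):
    """How many FFT-sized buffers fit in the budget (assumes 8-byte doubles)."""
    return budget_bytes // (fft_length * bytes_per_word)

def plan_stage2(query_free_bytes, fft_length, cap_bytes=None):
    """Query free memory at stage 2 setup instead of reusing the stage 1 value.

    query_free_bytes is a hypothetical callback standing in for whatever
    the program uses to ask the GPU how much memory is free right now.
    """
    free_now = query_free_bytes()
    return max_buffers(usable_bytes(free_now, cap_bytes), fft_length)

if __name__ == "__main__":
    # Numbers in the style of the logs in this thread: 6723M free, fft 448K.
    free = 6723 * 1024 ** 2
    print(plan_stage2(lambda: free, 448 * 1024, cap_bytes=4 * GIB))
    print(plan_stage2(lambda: free, 448 * 1024))
```

The point of passing a callback rather than a number is that the query happens when stage 2 is planned, so memory taken by other applications during a long stage 1 is accounted for.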

VictordeHolland 2018-11-16 11:50

1 Attachment(s)
[QUOTE=aaronhaviland;500255]Success compiling with MPIR.

64-bit binary attached
Requires CUDA 10, and a GPU with Compute Capability >= 3.5. Unsure of other requirements, I'm not too familiar with Windows dependencies.

[CODE]Microsoft Windows [Version 10.0.17134.407]
C:\Users\Aaron\Documents\Visual Studio 2017\Projects\CUDAPm1\x64\Release>CUDAPm1.exe 7990427 -b1 986 -b2 124000
CUDAPm1 v0.21
Assuming exponent is trial factored to 63 bits
------- DEVICE 0 -------
name GeForce RTX 2070
Compatibility 7.5
clockRate (MHz) 1710
memClockRate (MHz) 7001
totalGlobalMem 8589934592
totalConstMem 65536
l2CacheSize 4194304
sharedMemPerBlock 49152
regsPerBlock 65536
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsPerMP 1024
multiProcessorCount 36
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 2147483647,65535,65535
textureAlignment 512
deviceOverlap 1

No GeForceRTX2070_fft.txt file found. Using default fft lengths.
For optimal fft selection, please run
./CUDAPm1 -cufftbench 1 8192 r
for some small r, 0 < r < 6 e.g.
CUDA reports 6723M of 8192M GPU memory free.
No GeForceRTX2070_threads.txt file found. Running benchmark.
CUDA bench, testing various thread sizes for fft 448K, doing 15 passes.
fft size = 448K, square time = 0.0436 msec, threads 32
fft size = 448K, square time = 0.0449 msec, threads 64
fft size = 448K, square time = 0.0336 msec, threads 128
fft size = 448K, square time = 0.0335 msec, threads 256
fft size = 448K, square time = 0.0356 msec, threads 512
fft size = 448K, square time = 0.0438 msec, threads 1024

Best square time for fft = 448K, time: 0.0335, t = 256

fft size = 448K, ave time = 0.0408 msec, Norm1 threads 32, Norm2 threads 32
fft size = 448K, ave time = 0.0407 msec, Norm1 threads 32, Norm2 threads 64
fft size = 448K, ave time = 0.0408 msec, Norm1 threads 32, Norm2 threads 128
fft size = 448K, ave time = 0.0412 msec, Norm1 threads 32, Norm2 threads 256
fft size = 448K, ave time = 0.0419 msec, Norm1 threads 32, Norm2 threads 512
fft size = 448K, ave time = 0.0433 msec, Norm1 threads 32, Norm2 threads 1024
fft size = 448K, ave time = 0.0402 msec, Norm1 threads 64, Norm2 threads 32
fft size = 448K, ave time = 0.0402 msec, Norm1 threads 64, Norm2 threads 64
fft size = 448K, ave time = 0.0405 msec, Norm1 threads 64, Norm2 threads 128
fft size = 448K, ave time = 0.0406 msec, Norm1 threads 64, Norm2 threads 256
fft size = 448K, ave time = 0.0408 msec, Norm1 threads 64, Norm2 threads 512
fft size = 448K, ave time = 0.0428 msec, Norm1 threads 64, Norm2 threads 1024
fft size = 448K, ave time = 0.0394 msec, Norm1 threads 128, Norm2 threads 32
fft size = 448K, ave time = 0.0394 msec, Norm1 threads 128, Norm2 threads 64
fft size = 448K, ave time = 0.0397 msec, Norm1 threads 128, Norm2 threads 128
fft size = 448K, ave time = 0.0400 msec, Norm1 threads 128, Norm2 threads 256
fft size = 448K, ave time = 0.0411 msec, Norm1 threads 128, Norm2 threads 512
fft size = 448K, ave time = 0.0423 msec, Norm1 threads 128, Norm2 threads 1024
fft size = 448K, ave time = 0.0401 msec, Norm1 threads 256, Norm2 threads 32
fft size = 448K, ave time = 0.0394 msec, Norm1 threads 256, Norm2 threads 64
fft size = 448K, ave time = 0.0395 msec, Norm1 threads 256, Norm2 threads 128
fft size = 448K, ave time = 0.0403 msec, Norm1 threads 256, Norm2 threads 256
fft size = 448K, ave time = 0.0408 msec, Norm1 threads 256, Norm2 threads 512
fft size = 448K, ave time = 0.0423 msec, Norm1 threads 256, Norm2 threads 1024
fft size = 448K, ave time = 0.0417 msec, Norm1 threads 512, Norm2 threads 32
fft size = 448K, ave time = 0.0416 msec, Norm1 threads 512, Norm2 threads 64
fft size = 448K, ave time = 0.0417 msec, Norm1 threads 512, Norm2 threads 128
fft size = 448K, ave time = 0.0424 msec, Norm1 threads 512, Norm2 threads 256
fft size = 448K, ave time = 0.0428 msec, Norm1 threads 512, Norm2 threads 512
fft size = 448K, ave time = 0.0425 msec, Norm1 threads 512, Norm2 threads 1024

Best time for fft = 448K, time: 0.0394, t1 = 128, t2 = 256, t3 = 64
Using threads: norm1 256, mult 128, norm2 128.
Using up to 4119M GPU memory.
Starting stage 1 P-1, M7990427, B1 = 986, B2 = 124000, fft length = 448K
Doing 1452 iterations
M7990427, 0x32318b15f9d83ab6, n = 448K, CUDAPm1 v0.21
Stage 1 complete, estimated total time = 0:01
Starting stage 1 gcd.
M7990427 Stage 1 found no factor (P-1, B1=986, B2=124000, e=0, n=448K CUDAPm1 v0.21)
Starting stage 2.
Using b1 = 986, b2 = 124000, d = 420, e = 4, nrp = 96
Zeros: 4430, Ones: 8530, Pairs: 2981
Processing 1 - 96 of 96 relative primes.
Initializing pass... done. transforms: 1987, err = 0.02539, (0.71 real, 0.3550 ms/tran, ETA NA)
Transforms: 9204 M7990427, 0x456fdf3be182449c, n = 448K, CUDAPm1 v0.21 err = 0.02734 (0:03 real, 0.2873 ms/tran, ETA 0:02)
Transforms: 8928 M7990427, 0x2acd8bf807caa816, n = 448K, CUDAPm1 v0.21 err = 0.02734 (0:02 real, 0.2912 ms/tran, ETA 0:00)

Stage 2 complete, 20119 transforms, estimated total time = 0:05
Starting stage 2 gcd.
M7990427 has a factor: 10509037975912491881 (P-1, B1=986, B2=124000, e=4, n=448K CUDAPm1 v0.21)


C:\Users\Aaron\Documents\Visual Studio 2017\Projects\CUDAPm1\x64\Release>[/CODE][/QUOTE]
Looks like it works here!
W10 (1803) x64

CUDA10.0.130 (driver version 411.70)


GTX1080Ti
[code]
C:\CUDAPm1-CUDA10>CUDAPm1-CUDA10.exe 7990427 -b1 986 -b2 124000
CUDAPm1 v0.21
Assuming exponent is trial factored to 63 bits
Warning: Couldn't find .ini file. Using defaults for non-specified options.
CUDA reports 9312M of 11264M GPU memory free.
No GeForceGTX1080Ti_threads.txt file found. Running benchmark.
CUDA bench, testing various thread sizes for fft 512K, doing 15 passes.
fft size = 512K, square time = 0.0346 msec, threads 32
fft size = 512K, square time = 0.0360 msec, threads 64
fft size = 512K, square time = 0.0362 msec, threads 128
fft size = 512K, square time = 0.0363 msec, threads 256
fft size = 512K, square time = 0.0372 msec, threads 512
fft size = 512K, square time = 0.0379 msec, threads 1024

Best square time for fft = 512K, time: 0.0346, t = 32

fft size = 512K, ave time = 0.0454 msec, Norm1 threads 32, Norm2 threads 32
fft size = 512K, ave time = 0.0452 msec, Norm1 threads 32, Norm2 threads 64
fft size = 512K, ave time = 0.0450 msec, Norm1 threads 32, Norm2 threads 128
fft size = 512K, ave time = 0.0453 msec, Norm1 threads 32, Norm2 threads 256
fft size = 512K, ave time = 0.0452 msec, Norm1 threads 32, Norm2 threads 512
fft size = 512K, ave time = 0.0460 msec, Norm1 threads 32, Norm2 threads 1024
fft size = 512K, ave time = 0.0445 msec, Norm1 threads 64, Norm2 threads 32
fft size = 512K, ave time = 0.0445 msec, Norm1 threads 64, Norm2 threads 64
fft size = 512K, ave time = 0.0449 msec, Norm1 threads 64, Norm2 threads 128
fft size = 512K, ave time = 0.0451 msec, Norm1 threads 64, Norm2 threads 256
fft size = 512K, ave time = 0.0456 msec, Norm1 threads 64, Norm2 threads 512
fft size = 512K, ave time = 0.0465 msec, Norm1 threads 64, Norm2 threads 1024
fft size = 512K, ave time = 0.0452 msec, Norm1 threads 128, Norm2 threads 32
fft size = 512K, ave time = 0.0452 msec, Norm1 threads 128, Norm2 threads 64
fft size = 512K, ave time = 0.0453 msec, Norm1 threads 128, Norm2 threads 128
fft size = 512K, ave time = 0.0453 msec, Norm1 threads 128, Norm2 threads 256
fft size = 512K, ave time = 0.0461 msec, Norm1 threads 128, Norm2 threads 512
fft size = 512K, ave time = 0.0475 msec, Norm1 threads 128, Norm2 threads 1024
fft size = 512K, ave time = 0.0455 msec, Norm1 threads 256, Norm2 threads 32
fft size = 512K, ave time = 0.0455 msec, Norm1 threads 256, Norm2 threads 64
fft size = 512K, ave time = 0.0456 msec, Norm1 threads 256, Norm2 threads 128
fft size = 512K, ave time = 0.0456 msec, Norm1 threads 256, Norm2 threads 256
fft size = 512K, ave time = 0.0470 msec, Norm1 threads 256, Norm2 threads 512
fft size = 512K, ave time = 0.0477 msec, Norm1 threads 256, Norm2 threads 1024
fft size = 512K, ave time = 0.0459 msec, Norm1 threads 512, Norm2 threads 32
fft size = 512K, ave time = 0.0462 msec, Norm1 threads 512, Norm2 threads 64
fft size = 512K, ave time = 0.0463 msec, Norm1 threads 512, Norm2 threads 128
fft size = 512K, ave time = 0.0464 msec, Norm1 threads 512, Norm2 threads 256
fft size = 512K, ave time = 0.0474 msec, Norm1 threads 512, Norm2 threads 512
fft size = 512K, ave time = 0.0475 msec, Norm1 threads 512, Norm2 threads 1024

Best time for fft = 512K, time: 0.0445, t1 = 64, t2 = 32, t3 = 32
Using threads: norm1 256, mult 128, norm2 128.
Using up to 4124M GPU memory.
Starting stage 1 P-1, M7990427, B1 = 986, B2 = 124000, fft length = 512K
Doing 1452 iterations
M7990427, 0x32318b15f9d83ab6, n = 512K, CUDAPm1 v0.21
Stage 1 complete, estimated total time = 0:01
Starting stage 1 gcd.
M7990427 Stage 1 found no factor (P-1, B1=986, B2=124000, e=0, n=512K CUDAPm1 v0.21)
Starting stage 2.
Using b1 = 986, b2 = 124000, d = 420, e = 4, nrp = 96
Zeros: 4430, Ones: 8530, Pairs: 2981
Processing 1 - 96 of 96 relative primes.
Initializing pass... done. transforms: 1987, err = 0.00134, (0.53 real, 0.2650 ms/tran, ETA NA)
Transforms: 18132 M7990427, 0x2acd8bf807caa816, n = 512K, CUDAPm1 v0.21 err = 0.00146 (0:05 real, 0.3128 ms/tran, ETA 0:00)

Stage 2 complete, 20119 transforms, estimated total time = 0:05
Starting stage 2 gcd.
M7990427 has a factor: 10509037975912491881 (P-1, B1=986, B2=124000, e=4, n=512K CUDAPm1 v0.21)
[/code]Any more test cases that I should run?
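
As a side note on the stage 2 lines in both logs (d = 420, "Processing 1 - 96 of 96 relative primes"): the total of 96 matches the number of integers below 420 that are coprime to 420. The Python sketch below only checks that count; how CUDAPm1 stores or groups these residues internally is not shown, and the function name is made up for the example.

```python
from math import gcd

def relative_primes(d):
    """Return the integers in [1, d) that share no factor with d."""
    return [r for r in range(1, d) if gcd(r, d) == 1]

if __name__ == "__main__":
    rp = relative_primes(420)
    print(len(rp), rp[:5])
```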

kriesel 2018-11-16 14:46

[QUOTE=VictordeHolland;500342]Looks like it works here!
...Any more test cases that I should run?[/QUOTE]
You could try some run-of-the-mill manual P-1 assignments.
Or get adventurous and try some larger ones. Note, run time can be quite long, and some might fail to complete. If you hit a case that fails, please share the details.

If you want some verification candidates, here's an excerpt from the draft rewrite of the CUDAPm1 readme file.
[CODE] Run CUDAPm1 on some exponents with known factors that should be found, and
see whether you find them. Easiest way is to select from the following list,
exponents at or near the size you plan to run, and put them in the worktodo
file. The bounds necessary to find factors vary by exponent. CUDAPm1's
automatic parameter selection will be enough to find most but not all.

Exponent    Min B1    Min B2       fft length  notes
4444091     7         2,557        256K
50001781    94,709    4,067,587    2688K
51558151    5,953     2,034,041    2880K
54447193    1,181     682,009      3072K
58610467    70,843    694,201      3200K
61012769    10,273    1,572,097    3360K
81229789    6,709     11,282,221   4704K
100000081   1,289     7,554,653    5600K
120002191   1,563     3,109,391    7168K
150000713   15,131    2,294,519    8640K
200000183   953       1,138,061    11200K
200001187   204,983   207,821      11200K
200003173   4,651     229,813      11200K
249500221   4         2.58951e+9   14336K      big bounds, much memory & time
249500501   307       167,381      14336K
290001377   2,551     34,354,769   16384K      takes days

PFactor=1,2,4444091,-1,70,2
PFactor=1,2,50001781,-1,74,2
PFactor=1,2,51558151,-1,74,2
PFactor=1,2,54447193,-1,74,2
PFactor=1,2,58610467,-1,74,2
PFactor=1,2,61012769,-1,74,2
PFactor=1,2,81229789,-1,75,2
PFactor=1,2,100000081,-1,76,2
Pfactor=1,2,120002191,-1,75,2
Pfactor=1,2,150000713,-1,75,2
Pfactor=1,2,200001187,-1,75,2
PFactor=1,2,249500501,-1,75,2
PFactor=1,2,290001377,-1,75,2

Exponent    Factor (may be composite) = Prime factors
4444091     1809798096458971047321927127 = 8888183 x 319974553 x 636358278473
50001781    4392938042637898431087689 = 3 x 182851 x 8008229
51558151    755277543419074012358186647
54447193    17261184235049628259201
58610467    69057033982979789260999
61012769    2018028590362685212673
81229789    355078783674010195200030259699844128700274440385857
            = 488121804389130135740149369 x 727438890213848757119753
100000081   3441393510714285782119
120002191   100835659918276033441
150000713   1447762785107694357647
200000183   849003842550205126847
200001187   3050161780881530584679
200003173   14652109287435525414352647642348599
            = 4320552944485007 x 3391257895852957657
249500221   5168661482381201657
249500501   3571511465549660434777661921959439
            = 11607130072256471 x 307699788260867209
290001377   10645243382592701071676802590718709559
            = 1436135993277492383 x 7412420155488583273
            or 90944796249039267769901814723364335322839708522092302667497
            = 170370076089478747961 x 371696926552024067119 x 1436135993277492383

Feel free to pick your own.
Evaluate them at their equivalent of
http://www.mersenne.ca/exponent/249500501[/CODE]
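
For anyone picking their own candidates, the bounds in the table can be related to a known factor q of M(p) by writing q = 2kp + 1 and factoring k. The Python sketch below is a rough illustration under simplifying assumptions: it ignores prime powers (real implementations handle those in their own way), it uses plain trial division so it is only practical for small k, and the function names are invented for the example.

```python
def prime_factors(n):
    """Trial division; returns the sorted list of distinct prime factors of n."""
    primes = []
    f = 2
    while f * f <= n:
        if n % f == 0:
            primes.append(f)
            while n % f == 0:
                n //= f
        f += 1
    if n > 1:
        primes.append(n)
    return primes

def k_of(p, q):
    """For a factor q of 2**p - 1, return k such that q = 2*k*p + 1."""
    assert (q - 1) % (2 * p) == 0, "q is not of the form 2kp + 1"
    return (q - 1) // (2 * p)

def rough_min_bounds(p, q):
    """Rough (B1, B2): B2 covers the largest prime of k, B1 the next largest."""
    primes = prime_factors(k_of(p, q))
    if not primes:
        return (1, 1)          # k = 1: nothing beyond 2p is needed
    b2 = primes[-1]
    b1 = primes[-2] if len(primes) > 1 else 1
    return (b1, b2)

if __name__ == "__main__":
    p, q = 4444091, 636358278473   # largest prime factor listed for 4444091
    print(k_of(p, q), prime_factors(k_of(p, q)), rough_min_bounds(p, q))
```

For the 4444091 entry this gives k = 71596 = 2^2 x 7 x 2557, which lines up with the Min B1 = 7 and Min B2 = 2,557 shown in the table.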

aaronhaviland 2018-11-16 22:00

[QUOTE=kriesel;500347]If you want some verification candidates, here's an excerpt from the draft rewrite of the CUDAPm1 readme file.[/QUOTE]
This is a great list. I want to include some more "quick" candidates as tests as part of the build process, beyond what I already have. (And I want to find out if Visual Studio can run tests post-compile... right now I just have Makefile rules for that on *nix)

VictordeHolland 2018-11-17 08:54

1 Attachment(s)
I ran the ones that take an hour at most:
[code] 4,444,091 7 2,557
50,001,781 94,709 4,067,587
51,558,151 5,953 2,034,041
54,447,193 1,181 682,009
58,610,467 70,843 694,201
61,012,769 10,273 1,572,097
81,229,789 6,709 11,282,221
100,000,081 1,289 7,554,653
120,002,191 1,563 3,109,391
150,000,713 15,131 2,294,519
200,000,183 953 1,138,061
200,001,187 204,983 207,821
200,003,173 4,651 229,813


Pminus1=1,2,4444091,-1,7,2557
Pminus1=1,2,50001781,-1,94709,4067587
Pminus1=1,2,51558151,-1,5953,2034041
Pminus1=1,2,54447193,-1,1181,682009
Pminus1=1,2,58610467,-1,70843,694201
Pminus1=1,2,61012769,-1,10273,1572097
Pminus1=1,2,81229789,-1,6709,11282221
Pminus1=1,2,100000081,-1,1289,7554653
Pminus1=1,2,120002191,-1,1563,3109391
Pminus1=1,2,150000713,-1,15131,2294519
Pminus1=1,2,200000183,-1,953,1138061
Pminus1=1,2,200001187,-1,204983,207821
Pminus1=1,2,200003173,-1,4651,229813[/code]and they completed successfully:
[code]
M4444091 has a factor: 2843992382407199 (P-1, B1=7, B2=7, e=0, n=256K CUDAPm1 v0.21)
M50001781 has a factor: 4392938042637898431087689 (P-1, B1=94709, B2=4067587, e=12, n=2816K CUDAPm1 v0.21)
M51558151 has a factor: 755277543419074012358186647 (P-1, B1=5953, B2=2034041, e=12, n=2816K CUDAPm1 v0.21)
M54447193 has a factor: 17261184235049628259201 (P-1, B1=1181, B2=682009, e=12, n=3200K CUDAPm1 v0.21)
M58610467 has a factor: 69057033982979789260999 (P-1, B1=70843, B2=694201, e=12, n=3200K CUDAPm1 v0.21)
M61012769 has a factor: 2018028590362685212673 (P-1, B1=10273, B2=1572097, e=12, n=3456K CUDAPm1 v0.21)
M81229789 has a factor: 727438890213848757119753 (P-1, B1=6709, B2=11282221, e=12, n=4480K CUDAPm1 v0.21)
M100000081 has a factor: 3441393510714285782119 (P-1, B1=1289, B2=7554653, e=12, n=5760K CUDAPm1 v0.21)
M120002191 has a factor: 100835659918276033441 (P-1, B1=1563, B2=3109391, e=12, n=6912K CUDAPm1 v0.21)
M150000713 has a factor: 1447762785107694357647 (P-1, B1=15131, B2=2294519, e=12, n=8640K CUDAPm1 v0.21)
M200000183 has a factor: 849003842550205126847 (P-1, B1=953, B2=1138061, e=12, n=11200K CUDAPm1 v0.21)
M200001187 has a factor: 3050161780881530584679 (P-1, B1=204983, B2=207821, e=12, n=11200K CUDAPm1 v0.21)
M200003173 has a factor: 14652109287435525414352647642348599 (P-1, B1=4651, B2=229813, e=12, n=11200K CUDAPm1 v0.21)
[/code]
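
Reported factors like the ones above can be double-checked independently of CUDAPm1: f divides M(p) = 2^p - 1 exactly when 2^p mod f equals 1. A minimal Python sketch (the function name is made up for the example; the three sample pairs are copied from the results above):

```python
def divides_mersenne(p, f):
    """True if f divides 2**p - 1, using fast modular exponentiation."""
    return pow(2, p, f) == 1

# (exponent, reported factor) pairs copied from the results in this post.
REPORTED = [
    (4444091, 2843992382407199),
    (120002191, 100835659918276033441),
    (200000183, 849003842550205126847),
]

if __name__ == "__main__":
    for p, f in REPORTED:
        print(f"M{p} / {f}: {divides_mersenne(p, f)}")
```

The check works for composite factors too, such as the M4444091 result, which is the product 8888183 x 319974553 of two of the prime factors in kriesel's list.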

aaronhaviland 2018-11-18 00:58

[QUOTE=aaronhaviland;500367]I want to include some more "quick" candidates as tests as part of the build process, beyond what I already have. [/QUOTE]

Aaaand on that note, I've added some built-in self-tests into the code itself, instead of relying on the build process.
[CODE]-selftest Run a quick selftest (ETA: 0:16)
-selftest2 Run a longer selftest (ETA: 17:22)[/CODE]So far I have 5 "quick" self tests (< 10s each on my hardware), and 2 "slow" self tests (~ 10m each on my hardware).
Checkpoints, worktodo.txt, and results.txt I/O are completely disabled for these tests.

aaronhaviland 2018-11-18 20:34

[QUOTE=kriesel;500133]Yes. See for example [URL]https://www.mersenneforum.org/showpost.php?p=456324&postcount=2591[/URL] where 1024 squaring threads is bad, gives timings half what others do, in CUDALucas. There are also cases where 32 threads is bad. Compute capability 2.0 I think. CUDAPm1 issue #16.

There are also cases where certain fft lengths give bad results. As I recall these were found for old CUDA levels.[/QUOTE] Check for anomalous thread timings: Commit 36ceb29
Check for anomalous fft timings: Commit 538118a
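
For readers wondering what a check for anomalous timings might look like, here is a minimal Python sketch. It is not the logic of commits 36ceb29 or 538118a: the median-based rule, the 0.6 ratio, and the function names are assumptions chosen for illustration. The sample data is the square-time benchmark from the RTX 2070 log earlier in the thread.

```python
from statistics import median

# Square times (msec) per thread count, copied from the RTX 2070 log above.
RTX2070_SQUARE_MS = {
    32: 0.0436, 64: 0.0449, 128: 0.0336,
    256: 0.0335, 512: 0.0356, 1024: 0.0438,
}

def anomalous_timings(timings, low_ratio=0.6):
    """Thread counts whose timing is suspiciously low compared to the median.

    low_ratio is a hypothetical threshold, not a value taken from CUDAPm1.
    """
    mid = median(timings.values())
    return [t for t, ms in timings.items() if ms < low_ratio * mid]

def best_threads(timings, low_ratio=0.6):
    """Fastest thread count after discarding suspiciously fast entries."""
    bad = set(anomalous_timings(timings, low_ratio))
    ok = {t: ms for t, ms in timings.items() if t not in bad}
    return min(ok, key=ok.get)

if __name__ == "__main__":
    print(anomalous_timings(RTX2070_SQUARE_MS), best_threads(RTX2070_SQUARE_MS))
    # A doctored run where 1024 threads reports about half the usual time.
    doctored = {**RTX2070_SQUARE_MS, 1024: 0.0170}
    print(anomalous_timings(doctored), best_threads(doctored))
```

The reasoning is that a timing far below its neighbours more likely means the kernel did not do the work than that it found a much faster configuration, so such entries are excluded before picking the minimum.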

[QUOTE]CUDALucas was modified to trap for a select few bad-residue cases: 0x02, 0x00, and 0xfffffffffffffffd. The CUDALucas v2.06beta traps for its known bad residues. Since CUDAPm1 was derived from CUDALucas years before, it has some of the same issues as well as some of its own. CUDAPm1's list of bad residues is longer.[/QUOTE]Added a check for this. Commit a2c7f50
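
A known-bad-residue trap can be as simple as a set lookup. The Python sketch below uses only the three values named for CUDALucas in the quote; CUDAPm1's longer list is not given in this thread and is not reproduced here, and the function name is an assumption. It is not the code from commit a2c7f50.

```python
# The three bad residues named in the quote above (CUDALucas cases only).
KNOWN_BAD_RESIDUES = {0x00, 0x02, 0xfffffffffffffffd}

def is_known_bad(residue_hex):
    """True if a 64-bit residue, given as a hex string, is on the bad list."""
    return int(residue_hex, 16) in KNOWN_BAD_RESIDUES

if __name__ == "__main__":
    # The second value is the stage 1 residue from the logs in this thread.
    for res in ("0x02", "0x32318b15f9d83ab6", "0xfffffffffffffffd"):
        print(res, is_known_bad(res))
```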

aaronhaviland 2018-11-19 00:53

Releasing all the above as v0.22
(Binaries uploaded: [URL]https://github.com/ah42/cuda-p1/releases/tag/0.22[/URL])
[LIST][*]First proper release since forking[*]Originally based on code from [URL]https://sourceforge.net/projects/cudapm1/[/URL] (r52)[*]Compute Dickman's function live, instead of using incorrect precomputed values[*]Fix memory leaks in stage2[*]Fix fencepost error causing invalid results[*]Fix potential overflows[*]Use smaller data types when possible[*]Reduce kernel branching[*]Update build for CUDA 10.0 / Compute Capability 7.5[*]Split kernel code into individual files[*]Replace GMP with MPIR for easier cross-platform builds.[*]Automatically run threadbench if required[*]Add VS2017 and Eclipse build files.[*]Implement internal self-test system[*]Allow full memory allocation on 64-bit Windows builds[*]Contributions from kriesel:[LIST][*]Add test for known invalid residues[*]Comment & code formatting/cleanup[*]Add test for abnormally low threadbench timings[*]Add test for abnormally low fftbench timings[/LIST] [/LIST]

LaurV 2018-11-19 03:44

Now, that is a very good job, after such a long time, sir! Hats off and a bow. :bow:
We will give it a spin tonight when we reach home.

VictordeHolland 2018-11-19 11:48

Wow, great job!

kriesel 2018-11-19 19:50

[QUOTE=aaronhaviland;500475]Releasing all the above as v0.22
(Binaries uploaded: [URL]https://github.com/ah42/cuda-p1/releases/tag/0.22[/URL])
[LIST][*]First proper release since forking[*]Originally based on code from [URL]https://sourceforge.net/projects/cudapm1/[/URL] (r52)[*]Compute Dickman's function live, instead of using incorrect precomputed values[*]Fix memory leaks in stage2[*]Fix fencepost error causing invalid results[*]Fix potential overflows[*]Use smaller data types when possible[*]Reduce kernel branching[*]Update build for CUDA 10.0 / Compute Capability 7.5[*]Split kernel code into individual files[*]Replace GMP with MPIR for easier cross-platform builds.[*]Automatically run threadbench if required[*]Add VS2017 and Eclipse build files.[*]Implement internal self-test system[*]Allow full memory allocation on 64-bit Windows builds[*]Contributions from kriesel:[LIST][*]Add test for known invalid residues[*]Comment & code formatting/cleanup[*]Add test for abnormally low threadbench timings[*]Add test for abnormally low fftbench timings[/LIST] [/LIST][/QUOTE]
Outstanding!

I've updated my reference material to point to this (Aaron's post), and emailed James Heinrich with a link for updating his mirror.
What's next, Aaron? Logging extensions, adding date/time stamps, and removing the "CUDAPm1 v0.2x" string from every iteration or transforms progress record?
What would other users like to see, assuming Aaron is open to suggestions?
I'll test this in my production running and check for changes in limits, after finishing some v0.20 limits testing that is still ongoing.


All times are UTC.