#364 |
|
Oct 2019
5·19 Posts |
Same problems for MM107, but MM89 is normal:
Code:
/content/drive/My Drive/mmff-test
mmff v0.28 (64bit built)
Compiletime options
THREADS_PER_BLOCK 256
MORE_CLASSES enabled
Runtime options
GPU Sieving enabled
WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486)
GPUSievePrimes depends on worktodo entry
GPUSieveSize 128M bits
WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8)
GPUSieveProcessSize 8K bits
WorkFile worktodo.txt
Checkpoints enabled
CheckpointDelay 30s
StopAfterFactor class
PrintMode full
V5UserID (none)
ComputerID (none)
GPUProgressHeader " class | candidates | time | ETA | raw rate | SievePrimes | CPU wait"
GPUProgressFormat "%C/4620 | %n | %ts | %e | %rM/s | %s | %W%%"
TimeStampInResults no
CUDA version info
binary compiled for CUDA 10.10
CUDA runtime version 10.10
CUDA driver version 10.10
CUDA device info
name Tesla P100-PCIE-16GB
compute capability 6.0
maximum threads per block 1024
number of mutliprocessors 56 (unknown number of shader cores)
clock rate 1328MHz
got assignment: MM107, k range 41400000000000 to 41500000000000 (154-bit factors)
Starting trial factoring of MM107 in k range: 41400G to 41500G (154-bit factors)
k_min = 41400000000000
k_max = 41500000000000
Using GPU kernel "mfaktc_barrett160_M107gs"
Verifying (2^(2^107)) % 13435069371854815219033511685499715361952762321 = 974520303404695347505301237807931102140431668099
ERROR: Verifying on CPU failed. Remainder didn't match. Possible problems exist.
Code:
/content/drive/My Drive/mmff-test
mmff v0.28 (64bit built)
Compiletime options
THREADS_PER_BLOCK 256
MORE_CLASSES enabled
Runtime options
GPU Sieving enabled
WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486)
GPUSievePrimes depends on worktodo entry
GPUSieveSize 128M bits
WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8)
GPUSieveProcessSize 8K bits
WorkFile worktodo.txt
Checkpoints enabled
CheckpointDelay 30s
StopAfterFactor class
PrintMode full
V5UserID (none)
ComputerID (none)
GPUProgressHeader " class | candidates | time | ETA | raw rate | SievePrimes | CPU wait"
GPUProgressFormat "%C/4620 | %n | %ts | %e | %rM/s | %s | %W%%"
TimeStampInResults no
CUDA version info
binary compiled for CUDA 10.10
CUDA runtime version 10.10
CUDA driver version 10.10
CUDA device info
name Tesla P100-PCIE-16GB
compute capability 6.0
maximum threads per block 1024
number of mutliprocessors 56 (unknown number of shader cores)
clock rate 1328MHz
got assignment: MM89, k range 41400000000000 to 41500000000000 (136-bit factors)
Starting trial factoring of MM89 in k range: 41400G to 41500G (136-bit factors)
k_min = 41400000000000
k_max = 41500000000000
Using GPU kernel "mfaktc_barrett140_M89gs"
Verifying (2^(2^89)) % 51250722476366711691515168579592911982721 = 37671549122511752130292866601915335328068
class | candidates | time | ETA | raw rate | SievePrimes | CPU wait
0/4620 | 21.65M | 0.029s | n.a. | 746.60M/s | 649781 | n.a.%
Verifying (2^(2^89)) % 51250720280168236496304157387929107838071 = 35746096159163930640949829473693574340078
5/4620 | 21.65M | 0.029s | n.a. | 746.60M/s | 649781 | n.a.%
Verifying (2^(2^89)) % 51250719954174058311049257317093331713479 = 22759295645343611258946139802672470959760
9/4620 | 21.65M | 0.029s | n.a. | 746.60M/s | 649781 | n.a.%
Verifying (2^(2^89)) % 51250720852115103746739125095447985265401 = 41842644712508723081556126612950349320116
20/4620 | 21.65M | 0.028s | n.a. | 773.27M/s | 649781 | n.a.%
Verifying (2^(2^89)) % 51250721744324486800537682201019693669463 = 13062456361537928045073778273891658192745
21/4620 | 21.65M | 0.028s | n.a. | 773.27M/s | 649781 | n.a.%
Verifying (2^(2^89)) % 51250722110368501136753204925391936624199 = 11766302253559315831356912138896967481965
29/4620 | 21.65M | 0.028s | n.a. | 773.27M/s | 649781 | n.a.%
Verifying (2^(2^89)) % 51250720709149122429788413288172826239287 = 14816860850408810792926573186880149802296
33/4620 | 21.65M | 0.028s | n.a. | 773.27M/s | 649781 | n.a.%
Verifying (2^(2^89)) % 51250721052309815139813681631034757950353 = 41152310359413274585223328516751757168125
36/4620 | 21.65M | 0.028s | n.a. | 773.27M/s | 649781 | n.a.%
Verifying (2^(2^89)) % 51250721263933188975570868864490245452809 = 44183317763900802218115380969512121058940
44/4620 | 21.65M | 0.027s | n.a. | 801.91M/s | 649781 | n.a.%
Verifying (2^(2^89)) % 51250721378323800365697147786268920062497 = 18692344536121868666837048177982998180467
48/4620 | 21.65M | 0.027s | n.a. | 801.91M/s | 649781 | n.a.%
Verifying (2^(2^89)) % 51250721395487839010388945297745277400527 = 3166578919721146857552725561773689514712
53/4620 | 21.65M | 0.026s | n.a. | 832.75M/s | 649781 | n.a.%
Verifying (2^(2^89)) % 51250722287699697944266073163866784053033 = 37430319078903975242289720426417282202568
56/4620 | 21.65M | 0.027s | n.a. | 801.91M/s | 649781 | n.a.%
Verifying (2^(2^89)) % 51250721481285749313140796010178879854681 = 13236591153213340344689881456839734478969
60/4620 | 21.65M | 0.026s | n.a. | 832.75M/s | 649781 | n.a.%
Verifying (2^(2^89)) % 51250721086661413289943698879210555986631 = 29373416315097083858424261053021555658515
65/4620 | 21.65M | 0.026s | n.a. | 832.75M/s | 649781 | n.a.%
Verifying (2^(2^89)) % 51250720960840901517095503879288267435217 = 28658988341202110234172669662839524833844
68/4620 | 21.65M | 0.026s | n.a. | 832.75M/s | 649781 | n.a.%
...
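The bit sizes printed in the two logs (154-bit for MM107, 136-bit for MM89) are consistent with candidates of the form f = 2·k·M_p + 1 with M_p = 2^p − 1, the usual form for factors of a Mersenne number with a prime exponent. The following is only a rough Python sketch of that arithmetic; the helper names `candidate` and `divides_mm` are made up for illustration and are not taken from the mmff source.

```python
def candidate(p, k):
    """Candidate factor 2*k*M_p + 1 of MM_p = 2**M_p - 1, with M_p = 2**p - 1."""
    return 2 * k * (2**p - 1) + 1


def divides_mm(p, f):
    """True if odd f divides MM_p, i.e. 2**(2**p) is congruent to 2 modulo f."""
    return pow(2, 2**p, f) == 2 % f


# k range taken from the logs above
for p in (89, 107):
    for k in (41_400_000_000_000, 41_500_000_000_000):
        bits = candidate(p, k).bit_length()
        print(f"MM{p}, k={k}: {bits}-bit candidate")
```

Python's three-argument `pow` does the modular exponentiation by repeated squaring, so even the exponent 2^107 is handled quickly on the CPU.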
#365 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
31×173 Posts |
Various builds of mmff v0.28 have been posted. Do any of them support GPUSieveSize above 128, up to 2047, like the recent increase in mfaktc? On a GTX1650 there seems to be an advantage all the way up to 128, with a bit of underutilization still left at that setting, and the same is likely true on other fast GPUs.
win 7 x64 gtx1650 mmff tune mm127, 120000T to 120500T
GPUSievePrimes 810549 GPUSieveSize 16 GpuSieveProcessSize 32 367.75 66W 95% utilization
GPUSievePrimes 810549 GPUSieveSize 32 GpuSieveProcessSize 32 380.41
GPUSievePrimes 810549 GPUSieveSize 64 GpuSieveProcessSize 32 387.10 99%
GPUSievePrimes 810549 GPUSieveSize 128 GpuSieveProcessSize 32 389.59 * 66W 99%
GPUSievePrimes 810549 GPUSieveSize 256 GpuSieveProcessSize 32 GPUSieveSize capped at 128 |
#366 | |
|
Oct 2019
5×19 Posts |
Quote:
#367 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
31·173 Posts |
Quote:
After graphing the GTX1650 data I've collected, the larger limit appears to offer about 0.6% additional throughput on that GPU model, or 2 to 2.5 days per year, depending on whether the revised limit is 2047 or 4095. Based on mfaktc experience, the effect is likely larger for faster GPUs, and there are GPUs considerably faster than the GTX1650, such as the RTX2080 and similar, or the Tesla T4.
Last fiddled with by kriesel on 2020-02-26 at 13:59 |
#368 | |
|
Oct 2019
5F₁₆ Posts |
Quote:
Last fiddled with by Fan Ming on 2020-02-26 at 14:23 |
#369 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
12363₈ Posts |
Please make and post a Windows 7 x64 through Windows 10 x64 CUDA 10.x compatible build allowing GPUSieveSize up to 2047. Switching to unsigned int for 4095 would be more work.
#370 |
|
Oct 2019
5·19 Posts |
I compiled the fixed mmff 0.28 (from this post: https://www.mersenneforum.org/showpo...&postcount=360) as a CUDA 10.1 build for 64-bit Windows using Microsoft Visual Studio 2012. All test cases should pass this time (though the exponentiation failure described in this post: https://www.mersenneforum.org/showpo...&postcount=362 remains unsolved for specific cards). The 2047 version will be posted later.
Last fiddled with by Fan Ming on 2020-02-27 at 09:46 |
#371 |
|
Oct 2019
5×19 Posts |
I compiled the fixed mmff 0.28 for 64-bit Windows with GPUSievesizemax enlarged to 2047. It seems some code in gpusieve.cu needs to negate the GPUSieveSize and involves signed 32-bit integer arithmetic, so I did not go further to 4095. Only the 2047 version is here.
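The 2047 and 4095 limits line up with 32-bit integer ranges if GPUSieveSize counts units of 2^20 bits. That unit is an assumption here, based on the "128M bits" wording in the logs, not something checked against the mmff source. Under that assumption, a small Python sketch of the arithmetic (all names are hypothetical):

```python
# Assumption: one GPUSieveSize unit is 2**20 bits ("M bits" in the log output).
BITS_PER_UNIT = 2**20
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1
UINT32_MAX = 2**32 - 1


def sieve_bits(size_m):
    """Total number of sieve bits for a given GPUSieveSize setting."""
    return size_m * BITS_PER_UNIT


def fits_signed(size_m):
    """True if the bit count and its negation both fit in a signed 32-bit int."""
    bits = sieve_bits(size_m)
    return bits <= INT32_MAX and -bits >= INT32_MIN


def fits_unsigned(size_m):
    """True if the bit count fits in an unsigned 32-bit int (no negation possible)."""
    return sieve_bits(size_m) <= UINT32_MAX


for size in (128, 2047, 2048, 4095, 4096):
    print(size, sieve_bits(size), fits_signed(size), fits_unsigned(size))
```

With this unit, 2047 is the largest setting whose bit count stays below 2^31, and 4095 is the largest that stays below 2^32, which would explain why going past 2047 means moving code that negates the value away from signed arithmetic.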
#372 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
12363₈ Posts |
Thanks for the builds, Fan Ming!
As before, Win7x64, GTX1650, etc 128-2047 variation tune feb 28:
Code:
GPUSievePrimes 810549 GPUSieveSize 128 GpuSieveProcessSize 32 384.14 62W/75 99%
GPUSievePrimes 810549 GPUSieveSize 256 GpuSieveProcessSize 32 386.14 66w 100%
GPUSievePrimes 810549 GPUSieveSize 512 GpuSieveProcessSize 32 386.24 65w 100%
GPUSievePrimes 810549 GPUSieveSize 1024 GpuSieveProcessSize 32 386.65 63w 100%
GPUSievePrimes 810549 GPUSieveSize 2047 GpuSieveProcessSize 32 386.66 *
386.66/384.14= 1.00656 gain from 2047 over 128 GPUSieveSize |
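For anyone repeating the tune on another card, the ratio quoted above can be recomputed, and turned into days per year, with a few lines of Python. The throughput numbers are copied from the table; their unit is not stated in the post, and the function name is only illustrative.

```python
# Throughput per GPUSieveSize setting, copied from the tuning table above.
rates = {128: 384.14, 256: 386.14, 512: 386.24, 1024: 386.65, 2047: 386.66}


def relative_gain(rate, baseline):
    """Fractional throughput gain of `rate` over `baseline`."""
    return rate / baseline - 1.0


baseline = rates[128]
for size, rate in rates.items():
    gain = relative_gain(rate, baseline)
    # A fractional gain g saves roughly g * 365 days of work per year.
    print(f"GPUSieveSize {size:>4}: {gain * 100:.3f}% gain, ~{gain * 365:.2f} days/year")
```

Most of the gain in this data already shows up at 256; the steps from 256 to 2047 add comparatively little on this card.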
#373 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
31·173 Posts |
For mmff v0.28, I see here,
CUDA ? OS? source only? https://mersenneforum.org/showpost.p...&postcount=317
CUDA 6 Win x86 and x64 https://mersenneforum.org/mmff/
CUDA 8.0 linux https://mersenneforum.org/showpost.p...&postcount=329
CUDA 8.0 linux https://mersenneforum.org/showpost.p...&postcount=331
CUDA 8.0 linux x64 https://mersenneforum.org/showpost.p...&postcount=333
CUDA 10. win 64 https://mersenneforum.org/showpost.p...&postcount=335
CUDA 10.1 linux https://mersenneforum.org/showpost.p...&postcount=360
CUDA 10.1 Win https://mersenneforum.org/showpost.p...&postcount=370
CUDA 10.1 GpuSieveSize 2047 max Win https://mersenneforum.org/showpost.p...&postcount=371
Could we also get a CUDA 8.0 Win 64 build with GpuSieveSize 2047 max, posted here? That would suit GTX10xx. |
#374 |
|
"Dylan"
Mar 2017
243₁₆ Posts |
Attached are two builds of mmff v0.28.1 (Gary's source), compiled on Ubuntu 20.04 with CUDA 10.1 and sm_61 (good for Pascal cards, i.e. GTX10xx). The first build uses the default max sieve size; the other has max sieve size 2047. Both run the worktodo_check file with no issues. However, MM107 still doesn't work:
Code:
dylan@dylan-G11CD:~/Desktop/mmff-0.28.1$ ./mmff.exe -v 3
mmff v0.28.1 (64bit built)
Compiletime options
THREADS_PER_BLOCK 256
MORE_CLASSES enabled
Runtime options
GPU Sieving enabled
WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486)
GPUSievePrimes depends on worktodo entry
GPUSieveSize 128M bits
WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8)
GPUSieveProcessSize 8K bits
WorkFile worktodo.txt
Checkpoints enabled
CheckpointDelay 300s
StopAfterFactor disabled
PrintMode full
V5UserID (none)
ComputerID (none)
GPUProgressHeader " class | candidates | time | ETA | raw rate | SievePrimes | CPU wait"
WARNING, no ProgressFormat specified in mmff.ini, using default
ProgressFormat "%C/4620 | %n | %ts | %e | %rM/s | %s"
TimeStampInResults no
CUDA version info
binary compiled for CUDA 10.10
CUDA runtime version 10.10
CUDA driver version 10.20
CUDA device info
name GeForce GTX 1060 6GB
compute capability 6.1
maximum threads per block 1024
number of mutliprocessors 10 (unknown number of shader cores)
clock rate 1708MHz
got assignment: MM107, k range 41400000000000 to 41500000000000 (154-bit factors)
Starting trial factoring of MM107 in k range: 41400G to 41500G (154-bit factors)
k_min = 41400000000000
k_max = 41500000000000
Using GPU kernel "mfaktc_barrett160_M107gs"
Verifying (2^(2^107)) % 13435069353863506604210333952641545581205240561 = 549163915026848401193023077053146353871994535742
ERROR: Exponentiation failure
Code:
dylan@dylan-G11CD:~/Desktop/mmff-0.28.1$ ./mmff.exe -v 3
mmff v0.28.1 (64bit built)
Compiletime options
THREADS_PER_BLOCK 256
MORE_CLASSES enabled
Runtime options
GPU Sieving enabled
WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486)
GPUSievePrimes depends on worktodo entry
GPUSieveSize 128M bits
WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8)
GPUSieveProcessSize 8K bits
WorkFile worktodo.txt
Checkpoints enabled
CheckpointDelay 300s
StopAfterFactor disabled
PrintMode full
V5UserID (none)
ComputerID (none)
GPUProgressHeader " class | candidates | time | ETA | raw rate | SievePrimes | CPU wait"
WARNING, no ProgressFormat specified in mmff.ini, using default
ProgressFormat "%C/4620 | %n | %ts | %e | %rM/s | %s"
TimeStampInResults no
CUDA version info
binary compiled for CUDA 10.10
CUDA runtime version 10.10
CUDA driver version 10.20
CUDA device info
name GeForce GTX 1060 6GB
compute capability 6.1
maximum threads per block 1024
number of mutliprocessors 10 (unknown number of shader cores)
clock rate 1708MHz
got assignment: MM107, k range 10000000000000000 to 12000000000000000 (162-bit factors)
Starting trial factoring of MM107 in k range: 10P to 12P (162-bit factors)
k_min = 10000000000000000
k_max = 12000000000000000
Using GPU kernel "mfaktc_barrett172_M107gs"
Verifying (2^(2^107)) % 3245185537408870535270390810652173364064364295271 = 249933689397060655837985681873552465902105993031524
ERROR: Exponentiation failure |
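One thing that can be checked directly from the posted logs, without a GPU: a true remainder is always smaller than its modulus. The Python sketch below applies that test to the "Verifying" lines copied from this thread (the P100 runs and the two GTX 1060 runs). It only shows whether the printed values could be valid remainders; it says nothing about the cause, and the helper name is made up for illustration.

```python
# (label, modulus, reported remainder) copied from the "Verifying" lines in this thread.
checks = [
    ("MM89, P100, 41400G",
     51250722476366711691515168579592911982721,
     37671549122511752130292866601915335328068),
    ("MM107, P100, 41400G",
     13435069371854815219033511685499715361952762321,
     974520303404695347505301237807931102140431668099),
    ("MM107, GTX 1060, 41400G",
     13435069353863506604210333952641545581205240561,
     549163915026848401193023077053146353871994535742),
    ("MM107, GTX 1060, 10P",
     3245185537408870535270390810652173364064364295271,
     249933689397060655837985681873552465902105993031524),
]


def remainder_in_range(modulus, remainder):
    """A valid remainder r of a division by f satisfies 0 <= r < f."""
    return 0 <= remainder < modulus


for label, f, r in checks:
    status = "plausible" if remainder_in_range(f, r) else "out of range (r >= f)"
    print(f"{label}: {status}")
```

In all three MM107 runs the printed value is larger than the modulus, while the MM89 value is in range, which matches the "MM89 is normal, MM107 fails" reports above.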