Same problems for [B]MM107[/B], but MM89 is normal:
[CODE]/content/drive/My Drive/mmff-test
mmff v0.28 (64bit built)
Compiletime options
THREADS_PER_BLOCK 256
MORE_CLASSES enabled
Runtime options
GPU Sieving enabled
WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486)
GPUSievePrimes depends on worktodo entry
GPUSieveSize 128M bits
WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8)
GPUSieveProcessSize 8K bits
WorkFile worktodo.txt
Checkpoints enabled
CheckpointDelay 30s
StopAfterFactor class
PrintMode full
V5UserID (none)
ComputerID (none)
GPUProgressHeader " class | candidates | time | ETA | raw rate | SievePrimes | CPU wait"
GPUProgressFormat "%C/4620 | %n | %ts | %e | %rM/s | %s | %W%%"
TimeStampInResults no
CUDA version info
binary compiled for CUDA 10.10
CUDA runtime version 10.10
CUDA driver version 10.10
CUDA device info
name Tesla P100-PCIE-16GB
compute capability 6.0
maximum threads per block 1024
number of mutliprocessors 56
(unknown number of shader cores)
clock rate 1328MHz
got assignment: MM107, k range 41400000000000 to 41500000000000 (154-bit factors)
Starting trial factoring of MM107 in k range: 41400G to 41500G (154-bit factors)
k_min = 41400000000000
k_max = 41500000000000
Using GPU kernel "mfaktc_barrett160_M107gs"
Verifying (2^(2^107)) % 13435069371854815219033511685499715361952762321 = 974520303404695347505301237807931102140431668099
ERROR: Verifying on CPU failed. Remainder didn't match.
Possible problems exist.[/CODE] MM89 works properly: [CODE]/content/drive/My Drive/mmff-test
mmff v0.28 (64bit built)
Compiletime options
THREADS_PER_BLOCK 256
MORE_CLASSES enabled
Runtime options
GPU Sieving enabled
WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486)
GPUSievePrimes depends on worktodo entry
GPUSieveSize 128M bits
WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8)
GPUSieveProcessSize 8K bits
WorkFile worktodo.txt
Checkpoints enabled
CheckpointDelay 30s
StopAfterFactor class
PrintMode full
V5UserID (none)
ComputerID (none)
GPUProgressHeader " class | candidates | time | ETA | raw rate | SievePrimes | CPU wait"
GPUProgressFormat "%C/4620 | %n | %ts | %e | %rM/s | %s | %W%%"
TimeStampInResults no
CUDA version info
binary compiled for CUDA 10.10
CUDA runtime version 10.10
CUDA driver version 10.10
CUDA device info
name Tesla P100-PCIE-16GB
compute capability 6.0
maximum threads per block 1024
number of mutliprocessors 56
(unknown number of shader cores)
clock rate 1328MHz
got assignment: MM89, k range 41400000000000 to 41500000000000 (136-bit factors)
Starting trial factoring of MM89 in k range: 41400G to 41500G (136-bit factors)
k_min = 41400000000000
k_max = 41500000000000
Using GPU kernel "mfaktc_barrett140_M89gs"
Verifying (2^(2^89)) % 51250722476366711691515168579592911982721 = 37671549122511752130292866601915335328068
 class | candidates | time | ETA | raw rate | SievePrimes | CPU wait
0/4620 | 21.65M | 0.029s | n.a. | 746.60M/s | 649781 | n.a.%
Verifying (2^(2^89)) % 51250720280168236496304157387929107838071 = 35746096159163930640949829473693574340078
5/4620 | 21.65M | 0.029s | n.a. | 746.60M/s | 649781 | n.a.%
Verifying (2^(2^89)) % 51250719954174058311049257317093331713479 = 22759295645343611258946139802672470959760
9/4620 | 21.65M | 0.029s | n.a. | 746.60M/s | 649781 | n.a.%
Verifying (2^(2^89)) % 51250720852115103746739125095447985265401 = 41842644712508723081556126612950349320116
20/4620 | 21.65M | 0.028s | n.a. | 773.27M/s | 649781 | n.a.%
Verifying (2^(2^89)) % 51250721744324486800537682201019693669463 = 13062456361537928045073778273891658192745
21/4620 | 21.65M | 0.028s | n.a. | 773.27M/s | 649781 | n.a.%
Verifying (2^(2^89)) % 51250722110368501136753204925391936624199 = 11766302253559315831356912138896967481965
29/4620 | 21.65M | 0.028s | n.a. | 773.27M/s | 649781 | n.a.%
Verifying (2^(2^89)) % 51250720709149122429788413288172826239287 = 14816860850408810792926573186880149802296
33/4620 | 21.65M | 0.028s | n.a. | 773.27M/s | 649781 | n.a.%
Verifying (2^(2^89)) % 51250721052309815139813681631034757950353 = 41152310359413274585223328516751757168125
36/4620 | 21.65M | 0.028s | n.a. | 773.27M/s | 649781 | n.a.%
Verifying (2^(2^89)) % 51250721263933188975570868864490245452809 = 44183317763900802218115380969512121058940
44/4620 | 21.65M | 0.027s | n.a. | 801.91M/s | 649781 | n.a.%
Verifying (2^(2^89)) % 51250721378323800365697147786268920062497 = 18692344536121868666837048177982998180467
48/4620 | 21.65M | 0.027s | n.a. | 801.91M/s | 649781 | n.a.%
Verifying (2^(2^89)) % 51250721395487839010388945297745277400527 = 3166578919721146857552725561773689514712
53/4620 | 21.65M | 0.026s | n.a. | 832.75M/s | 649781 | n.a.%
Verifying (2^(2^89)) % 51250722287699697944266073163866784053033 = 37430319078903975242289720426417282202568
56/4620 | 21.65M | 0.027s | n.a. | 801.91M/s | 649781 | n.a.%
Verifying (2^(2^89)) % 51250721481285749313140796010178879854681 = 13236591153213340344689881456839734478969
60/4620 | 21.65M | 0.026s | n.a. | 832.75M/s | 649781 | n.a.%
Verifying (2^(2^89)) % 51250721086661413289943698879210555986631 = 29373416315097083858424261053021555658515
65/4620 | 21.65M | 0.026s | n.a. | 832.75M/s | 649781 | n.a.%
Verifying (2^(2^89)) % 51250720960840901517095503879288267435217 = 28658988341202110234172669662839524833844
68/4620 | 21.65M | 0.026s | n.a. | 832.75M/s | 649781 | n.a.%
...[/CODE]
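Those "Verifying" lines are mmff's CPU-side self-check: for each sampled candidate factor q, the value 2^(2^p) mod q is recomputed on the CPU by repeated modular squaring and compared against the GPU kernel's result. A minimal Python sketch of that check (the function name and the value of k are illustrative, not taken from mmff or the log above):

```python
def pow2_tower_mod(p, q):
    """Compute 2^(2^p) mod q by p successive modular squarings."""
    x = 2  # x = 2^(2^0)
    for _ in range(p):
        x = x * x % q  # squaring maps 2^(2^i) to 2^(2^(i+1))
    return x

# Candidate factors of MM89 = 2^M89 - 1 have the form q = 2*k*M89 + 1
M89 = 2**89 - 1
k = 41_400_000_000_123            # illustrative k, not one from the run
q = 2 * k * M89 + 1

r = pow2_tower_mod(89, q)
assert r == pow(2, 2**89, q)      # cross-check with Python's built-in pow
# q divides MM89 iff 2^M89 ≡ 1 (mod q), i.e. iff 2^(2^89) ≡ 2 (mod q)
print("factor" if r == 2 else "not a factor")
```

Note that in the failing MM107 log, the reported remainder (48 digits) is actually larger than the modulus (47 digits), which by itself shows the GPU result cannot be a valid residue.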
GPUSieveSize limit
Various builds of mmff v0.28 have been posted. Do any of these support GPUSieveSize from 128 to 2047, like the recent increase in mfaktc? There seems to be an advantage all the way up to 128 and a bit of underutilization left yet there, on a GTX1650, and there likely is on other fast gpus also.
win 7 x64 gtx1650 mmff tune mm127, 120000T to 120500T [CODE]GPUSievePrimes 810549 GPUSieveSize 16 GpuSieveProcessSize 32 367.75 66W 95% utilization
GPUSievePrimes 810549 GPUSieveSize 32 GpuSieveProcessSize 32 380.41
GPUSievePrimes 810549 GPUSieveSize 64 GpuSieveProcessSize 32 387.10 99%
GPUSievePrimes 810549 GPUSieveSize 128 GpuSieveProcessSize 32 389.59 * 66W 99%
GPUSievePrimes 810549 GPUSieveSize 256 GpuSieveProcessSize 32 GPUSieveSize capped at 128[/CODE]
[QUOTE=kriesel;538323]Various builds of mmff v0.28 have been posted. Do any of these support GPUSieveSize from 128 to 2047, like the recent increase in mfaktc? There seems to be an advantage all the way up to 128 and a bit of underutilization left yet there, on a GTX1650, and there likely is on other fast gpus also.
[/QUOTE] I did try enlarging the upper limit to 2047; however, the speed gain was not significant. I experimented with it on a Colab T4.
1 Attachment(s)
[QUOTE=Fan Ming;538340]I've ever tried to enlarge the upper limit to 2047, however, the speed gain seems no significant. I experimented it on colab T4.[/QUOTE]
Thanks for your response. Please post any T4 throughput data versus GPUSieveSize that you have collected. After graphing the GTX1650 data I've collected, the larger limit appears to offer about 0.6% additional throughput on that gpu model, or 2 to 2.5 days of work per year, depending on a 2047 or 4095 revised limit. Based on mfaktc experience, the effect is likely larger for faster gpus, and there are gpus considerably faster than the GTX1650, such as the RTX2080 and similar, or the Tesla T4.
[QUOTE=kriesel;538364]Thanks for your response. Please post any T4 throughput data versus GPUSieveSize that you have collected.
It appears to me after graphing the GTX1650 data I've collected, to offer about 0.6% additional throughput on that gpu model, or 2 to 2.5 days per year, depending on a 2047 or 4095 revised limit. Based on mfaktc experience, the effect is likely larger for faster gpus, and there are considerably faster than the GTX1650, such as the RTX2080 and similar, or the Tesla T4.[/QUOTE] Sorry, I didn't keep the detailed data. I tested MM89: the raw rate was roughly 1340 when GPUSieveSize was 128, and still ~1340 when GPUSieveSize was 2047. Since the change was not significant, I wasn't impressed and didn't keep the data.
2047 GPUSieveSize limit Windows build requested
Please make and post a CUDA 10.x build, compatible with Windows 7 x64 through Windows 10 x64, allowing GPUSieveSize up to 2047. Switching to unsigned int to reach 4095 would be more work.
1 Attachment(s)
Compiled the fixed mmff 0.28 (from this post: [url]https://www.mersenneforum.org/showpost.php?p=535756&postcount=360[/url]), CUDA 10.1 version, for Windows 64-bit using Microsoft Visual Studio 2012. This time all test cases should pass (though the Exp failure problem described in this post: [url]https://www.mersenneforum.org/showpost.php?p=535994&postcount=362[/url] remains unsolved for specific cards). The 2047 version will be posted later.
1 Attachment(s)
Compiled the fixed mmff 0.28 for Windows 64-bit with the GPUSieveSize maximum enlarged to 2047. It seems some code in gpusieve.cu needs to negate the GPUSieveSize and relies on signed 32-bit integer arithmetic, so I didn't make the further change to 4095. Only the 2047 version is here.
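The 2047 cap is consistent with that signed arithmetic: GPUSieveSize is given in units of 2^20 bits, and 2047M bits is the largest such multiple that still fits in (and can still be safely negated within) a signed 32-bit integer, while 2048M bits is exactly 2^31 and overflows. A quick sanity check of that limit (my inference from the post above, not mmff's actual code):

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit int can hold

for mbits in (128, 2047, 2048, 4095):
    bits = mbits * 2**20  # GPUSieveSize is specified in M (2^20) bits
    print(f"GPUSieveSize {mbits:4d}M = {bits:>13,d} bits, "
          f"fits signed 32-bit: {bits <= INT32_MAX}")
```

This prints that 128M and 2047M fit, while 2048M and 4095M do not; going to 4095 would indeed require unsigned (or 64-bit) arithmetic throughout.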
Going to 2047
Thanks for the builds, Fan Ming!
As before, Win7x64, GTX1650, etc 128-2047 variation tune feb 28: [CODE]GPUSievePrimes 810549 GPUSieveSize 128 GpuSieveProcessSize 32 384.14 62W/75 99%
GPUSievePrimes 810549 GPUSieveSize 256 GpuSieveProcessSize 32 386.14 66w 100%
GPUSievePrimes 810549 GPUSieveSize 512 GpuSieveProcessSize 32 386.24 65w 100%
GPUSievePrimes 810549 GPUSieveSize 1024 GpuSieveProcessSize 32 386.65 63w 100%
GPUSievePrimes 810549 GPUSieveSize 2047 GpuSieveProcessSize 32 386.66 *
386.66/384.14 = 1.00656 gain from 2047 over 128 GPUSieveSize[/CODE] I would expect somewhat more gain than that ratio on faster gpus.
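The measured ratio converts directly into the "days per year" figure quoted earlier in the thread:

```python
rate_128, rate_2047 = 384.14, 386.66  # M/s from the GTX1650 tune above
gain = rate_2047 / rate_128 - 1

print(f"throughput gain: {gain:.3%}")                 # about 0.66%
print(f"extra work per year: {gain * 365.25:.1f} days")  # about 2.4 days
```

That is the low end of the earlier 2-to-2.5-day estimate, matching the observation that a 2047 limit (rather than 4095) captures most but not all of the available gain on this card.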
Build request
For mmff v0.28, I see here,
CUDA ? OS? source only? [URL]https://mersenneforum.org/showpost.php?p=376423&postcount=317[/URL]
CUDA 6 Win x86 and x64 [URL]https://mersenneforum.org/mmff/[/URL]
CUDA 8.0 linux [URL]https://mersenneforum.org/showpost.php?p=497116&postcount=329[/URL]
CUDA 8.0 linux [URL]https://mersenneforum.org/showpost.php?p=497151&postcount=331[/URL]
CUDA 8.0 linux x64 [URL]https://mersenneforum.org/showpost.php?p=497231&postcount=333[/URL]
CUDA 10. win 64 [URL]https://mersenneforum.org/showpost.php?p=505723&postcount=335[/URL]
CUDA 10.1 linux [URL]https://mersenneforum.org/showpost.php?p=535756&postcount=360[/URL]
CUDA 10.1 Win [URL]https://mersenneforum.org/showpost.php?p=538430&postcount=370[/URL]
CUDA 10.1 GpuSieveSize 2047 max Win [URL]https://mersenneforum.org/showpost.php?p=538431&postcount=371[/URL]
Could we also get a CUDA 8.0 Win 64 build with GpuSieveSize 2047 max, posted here? That would suit GTX10xx.
2 Attachment(s)
Attached are two builds of mmff v0.28.1 (Gary's source), compiled on Ubuntu 20.04 with CUDA 10.1 and sm_61 (good for Pascal cards, i.e. GTX10xx). The first build has the default max sieve size; the other has max sieve size 2047. These run the worktodo_check file with no issues; however, MM107 still doesn't work:
[CODE]dylan@dylan-G11CD:~/Desktop/mmff-0.28.1$ ./mmff.exe -v 3
mmff v0.28.1 (64bit built)
Compiletime options
THREADS_PER_BLOCK 256
MORE_CLASSES enabled
Runtime options
GPU Sieving enabled
WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486)
GPUSievePrimes depends on worktodo entry
GPUSieveSize 128M bits
WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8)
GPUSieveProcessSize 8K bits
WorkFile worktodo.txt
Checkpoints enabled
CheckpointDelay 300s
StopAfterFactor disabled
PrintMode full
V5UserID (none)
ComputerID (none)
GPUProgressHeader " class | candidates | time | ETA | raw rate | SievePrimes | CPU wait"
WARNING, no ProgressFormat specified in mmff.ini, using default
ProgressFormat "%C/4620 | %n | %ts | %e | %rM/s | %s"
TimeStampInResults no
CUDA version info
binary compiled for CUDA 10.10
CUDA runtime version 10.10
CUDA driver version 10.20
CUDA device info
name GeForce GTX 1060 6GB
compute capability 6.1
maximum threads per block 1024
number of mutliprocessors 10
(unknown number of shader cores)
clock rate 1708MHz
got assignment: MM107, k range 41400000000000 to 41500000000000 (154-bit factors)
Starting trial factoring of MM107 in k range: 41400G to 41500G (154-bit factors)
k_min = 41400000000000
k_max = 41500000000000
Using GPU kernel "mfaktc_barrett160_M107gs"
Verifying (2^(2^107)) % 13435069353863506604210333952641545581205240561 = 549163915026848401193023077053146353871994535742
ERROR: Exponentiation failure[/CODE]It even persists with a leading-edge range, which uses a different kernel than what Fan Ming used in [URL="https://mersenneforum.org/showpost.php?p=535997&postcount=364"]post 364[/URL]: [CODE]dylan@dylan-G11CD:~/Desktop/mmff-0.28.1$ ./mmff.exe -v 3
mmff v0.28.1 (64bit built)
Compiletime options
THREADS_PER_BLOCK 256
MORE_CLASSES enabled
Runtime options
GPU Sieving enabled
WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486)
GPUSievePrimes depends on worktodo entry
GPUSieveSize 128M bits
WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8)
GPUSieveProcessSize 8K bits
WorkFile worktodo.txt
Checkpoints enabled
CheckpointDelay 300s
StopAfterFactor disabled
PrintMode full
V5UserID (none)
ComputerID (none)
GPUProgressHeader " class | candidates | time | ETA | raw rate | SievePrimes | CPU wait"
WARNING, no ProgressFormat specified in mmff.ini, using default
ProgressFormat "%C/4620 | %n | %ts | %e | %rM/s | %s"
TimeStampInResults no
CUDA version info
binary compiled for CUDA 10.10
CUDA runtime version 10.10
CUDA driver version 10.20
CUDA device info
name GeForce GTX 1060 6GB
compute capability 6.1
maximum threads per block 1024
number of mutliprocessors 10
(unknown number of shader cores)
clock rate 1708MHz
got assignment: MM107, k range 10000000000000000 to 12000000000000000 (162-bit factors)
Starting trial factoring of MM107 in k range: 10P to 12P (162-bit factors)
k_min = 10000000000000000
k_max = 12000000000000000
Using GPU kernel "mfaktc_barrett172_M107gs"
Verifying (2^(2^107)) % 3245185537408870535270390810652173364064364295271 = 249933689397060655837985681873552465902105993031524
ERROR: Exponentiation failure[/CODE]
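The factor bit widths in these logs, and hence the kernel choices, follow from the candidate-factor form q = 2·k·(2^107 − 1) + 1: the 41400G-41500G range yields 154-bit candidates (barrett160 kernel), while the 10P-12P range yields 162-bit candidates (barrett172 kernel). A small sketch of that arithmetic (the kernel names are simply copied from the logs above):

```python
M107 = 2**107 - 1  # the Mersenne-prime exponent of MM107 = 2^M107 - 1

def candidate_bits(k):
    """Bit length of the candidate factor q = 2*k*M107 + 1 of MM107."""
    return (2 * k * M107 + 1).bit_length()

print(candidate_bits(41_400_000_000_000))      # 154 -> mfaktc_barrett160_M107gs
print(candidate_bits(12_000_000_000_000_000))  # 162 -> mfaktc_barrett172_M107gs
```

Both failing runs therefore exercise different Barrett kernels, which supports the observation that the MM107 problem is not confined to a single kernel.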
Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.