mersenneforum.org mtsieve

2018-09-19, 13:23   #78
ET_
Banned

"Luigi"
Aug 2002
Team Italia

4766₁₀ Posts

Hi folks. I have a basic Core 2 machine at home and an AVX-512 capable 4-core server on AWS. I need a copy of gfndsieve compiled for 64-bit Linux with all the AVX optimizations turned on and no OpenCL, but the AWS system only has gcc version 4.8 and I can't compile my optimized copy there.

Is there anybody out there who can provide me with such a file? Or should I use the correct -march flag and compile on my machine?

Thanks in advance.

Luigi

---

Last fiddled with by ET_ on 2018-09-19 at 13:25

2018-09-19, 13:33   #79
rogue

"Mark"
Apr 2003
Between here and the

2·11·269 Posts

Quote:
 Originally Posted by ET_ Hi folks. I have a basic core2 machine at home, and an AVX-512 capable 4-core server via AWS. Now, I need a copy of gfndsieve compiled for Linux 64-bit and all the AVX optimizations turned on and no OpenCL, but on the AWS system I only have gcc version 4.8 and can't compile my optimized copy. Is there anybody out there who can provide me with such a file? Or should I use the correct -march flag and compile on my machine?
There is no AVX512 code (yet), and the choice between AVX and SSE/FPU is made at runtime based on the capabilities of the CPU.

To disable compiling and linking with GPU code, set ENABLE_GPU to no in the makefile.

Once you do that, what issues are you getting on that box with gcc?

2018-09-19, 14:05   #80
ET_
Banned

"Luigi"
Aug 2002
Team Italia

2×2,383 Posts

Quote:
 Originally Posted by rogue There is no AVX512 code (yet), and the choice between AVX and SSE/FPU is made at runtime based on the capabilities of the CPU. To disable compiling and linking with GPU code, set ENABLE_GPU to no in the makefile. Once you do that, what issues are you getting on that box with gcc?
I knew there were distinct code paths in the executable, but I thought one had to enable the relevant processor optimizations at compile time for the code to recognize them.

In other words, if my code is compiled with -march=native on an Intel G2030 processor (a crippled Ivy Bridge with no AVX / FMA3 support), will the executable automatically run the FMA3 path once it is run on an AWS Skylake machine?

If so, then I solved the issue.
If not, then I should recompile the code on an architecture whose "native" processor recognizes the optimizations. But the AWS gcc is locked at version 4.8, and I'm afraid it wouldn't recognize FMA3 optimizations.

I'm a master at complicating my own life...

2018-09-19, 14:32   #81
rogue

"Mark"
Apr 2003
Between here and the

2·11·269 Posts

Quote:
 Originally Posted by ET_ I knew there were distinct code paths in the executable, but I thought one had to enable the relevant processor optimizations at compile time for the code to recognize them. In other words, if my code is compiled with -march=native on an Intel G2030 processor (a crippled Ivy Bridge with no AVX / FMA3 support), will the executable automatically run the FMA3 path once it is run on an AWS Skylake machine? If so, then I solved the issue. If not, then I should recompile the code on an architecture whose "native" processor recognizes the optimizations. But the AWS gcc is locked at version 4.8, and I'm afraid it wouldn't recognize FMA3 optimizations. I'm a master at complicating my own life...
It should. It calls a gcc built-in function, __builtin_cpu_supports(), when deciding whether it can use AVX code. I assume that function checks something specific to the computer on which the code is executing.
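
As a rough illustration of the runtime check described above: GCC documents the built-in as __builtin_cpu_supports(), and it queries the CPU the binary is running on, not the one it was compiled on. The sketch below is not mtsieve's code. The function name SelectCodePath and the returned labels are made up for illustration, and the x86-only guard is an assumption so that the sketch still compiles on other targets.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not taken from mtsieve): report which code path a
// runtime CPU check would select on the machine executing the binary.
std::string SelectCodePath()
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    // Initialise GCC's CPU feature data; harmless if it already ran.
    __builtin_cpu_init();

    // These checks happen at run time, so a binary built on a CPU
    // without AVX can still report "AVX" when run on a newer CPU.
    if (__builtin_cpu_supports("avx"))
        return "AVX";
    if (__builtin_cpu_supports("sse2"))
        return "SSE";
#endif
    // Fallback when no vector extension is detected (or on non-x86 targets).
    return "FPU";
}
```

Printing the returned label at startup is one way a sieve could report which path it picked on a given machine.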

2018-09-19, 14:50   #82
ET_
Banned

"Luigi"
Aug 2002
Team Italia

2·2,383 Posts

Quote:
 Originally Posted by rogue It should. It calls a gcc built-in function, __builtin_cpu_supports(), when deciding whether it can use AVX code. I assume that function checks something specific to the computer on which the code is executing.
Thank you Mark. I will test it and then report here. BTW, is there a message saying what optimizations are used at runtime?

2018-09-19, 15:47   #83
ET_
Banned

"Luigi"
Aug 2002
Team Italia

2·2,383 Posts

Quote:
 Originally Posted by ET_ Thank you Mark. I will test it and then report here. BTW, is there a message saying what optimizations are used at runtime?
With the same executable, after 50G primes tested, the AVX path on the Xeon is 35%-40% faster than the base path at the same clock speed.

2018-09-19, 16:51   #84
rogue

"Mark"
Apr 2003
Between here and the

5918₁₀ Posts

Quote:
 Originally Posted by ET_ Thank you Mark. I will test it and then report here. BTW, is there a message saying what optimizations are used at runtime?
Not at this time, but it is something I have considered adding.

2018-09-26, 01:36   #85
rogue

"Mark"
Apr 2003
Between here and the

2×11×269 Posts

I have posted mtsieve 1.8.0 at my website. Here are the changes:

Code:
Added twinsieve. This is more than 3x faster than newpgen's twin sieve.
Modified OpenCL code to change calculation for default workunits to improve GPU throughput.
Modified "start sieving" message to include expected factors, but only if -P is not the default value.
Modified all sieves to have custom "start sieving message" so that each shows more detail specific to that sieve.

The default behavior of twinsieve is to sieve such that only potential twin primes remain, but there is a -i switch that allows one to sieve the +1 and -1 sides independently.

Last fiddled with by rogue on 2018-09-26 at 01:38

2018-09-26, 07:33   #86
pepi37

Dec 2011
After milion nines:)

17×79 Posts

Quote:
 Originally Posted by rogue I have posted mtsieve 1.8.0 at my website. Here are the changes: Code:  Added twinsieve. This is more than 3x faster than newpgen's twin sieve. Modified OpenCL code to change calculation for default workunits to improve GPU throughput. Modified "start sieving" message to include expected factors, but only if -P is not the default value. Modified all sieves to have custom "start sieving message" so that each shows more detail specific to that sieve. The default behavior of twinsieve is to sieve such that only potential twin primes remain, but there is a -i switch that allows one to sieve the +1 and -1 sides independently.

In twinsieve you use the -i switch in two different places:

-i --inputterms=i input file of remaining candidates
-i --independent Sieve +1 and -1 independently
if (!ib_OnlyTwins && it_Format == FF_ABC)
FatalError("Can only support ABC format if sieving +1 and -1 independently");
If I use --independent, there is always zero output, regardless of the ABC format.

d:\MTSIEVE\TWINSIEVE>twinsieve -P100000000000 -w10000000 -i1.npg -ofact.txt -W4 -fN -r
twinsieve v1.0.0, a program to find factors of k*b^n+1/-1 numbers for fixed b and n and variable k
Sieve started: 30000000001 < p < 1e11 with 18446744073709502166 terms (261 < k < 99309, k*2^1778899) (expecting 876855490500155136 factors)

If the -r switch stays in the command line, you get this; if you remove it, all is OK.
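
A side note on the term count in the log above: 18446744073709502166 equals 2^64 - 49450, which is the value an unsigned 64-bit counter shows after being pushed 49450 below zero. Whether that is what actually happens inside twinsieve is only an assumption. The sketch below just verifies the arithmetic, and the names kReportedTerms and WrappedDifference are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// The term count printed in the "Sieve started" line of the log.
constexpr uint64_t kReportedTerms = 18446744073709502166ULL;

// Unsigned subtraction wraps modulo 2^64 instead of going negative.
constexpr uint64_t WrappedDifference(uint64_t a, uint64_t b)
{
    return a - b;
}

// 0 - 49450 wraps around to exactly the value shown in the log.
static_assert(WrappedDifference(0, 49450) == kReportedTerms,
              "reported term count is 2^64 - 49450");
```

If the count really is a wrapped value, the huge "expecting ... factors" figure on the same line would follow from it, but that too is a guess.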

2018-09-26, 12:19   #87
pepi37

Dec 2011
After milion nines:)

17·79 Posts

One last thing: if the sieve passes 54105949591 (or a value very close to it), there is no output and the program just terminates. If the sieve depth is lower than that value, the program gives output as it should.

2018-09-26, 13:08   #88
rogue

"Mark"
Apr 2003
Between here and the

2×11×269 Posts

Quote:
 Originally Posted by pepi37 In twinsieve you use the -i switch in two different places: -i --inputterms=i input file of remaining candidates -i --independent Sieve +1 and -1 independently if (!ib_OnlyTwins && it_Format == FF_ABC) FatalError("Can only support ABC format if sieving +1 and -1 independently"); If I use --independent, there is always zero output, regardless of the ABC format. d:\MTSIEVE\TWINSIEVE>twinsieve -P100000000000 -w10000000 -i1.npg -ofact.txt -W4 -fN -r twinsieve v1.0.0, a program to find factors of k*b^n+1/-1 numbers for fixed b and n and variable k Sieve started: 30000000001 < p < 1e11 with 18446744073709502166 terms (261 < k < 99309, k*2^1778899) (expecting 876855490500155136 factors) If the -r switch stays in the command line, you get this; if you remove it, all is OK.
I'll switch it to use a different character as -i is reserved for the underlying framework.
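
A clash like this can be caught mechanically. The following is a minimal sketch, not mtsieve's actual option handling: OptionSpec and FindShortOptionClashes are hypothetical names, and the idea is simply to collect every short option character that more than one option claims before the list is handed to the parser.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical description of one command-line option.
struct OptionSpec
{
    char        shortName;  // e.g. 'i'
    std::string longName;   // e.g. "inputterms"
};

// Returns every short option character used by more than one option.
std::set<char> FindShortOptionClashes(const std::vector<OptionSpec>& options)
{
    std::set<char> seen;
    std::set<char> clashes;

    for (const OptionSpec& opt : options)
    {
        // insert() reports false when the character was already present.
        if (!seen.insert(opt.shortName).second)
            clashes.insert(opt.shortName);
    }

    return clashes;
}
```

Given the two entries from the help text quoted above, ('i', "inputterms") and ('i', "independent"), the function returns a set containing only 'i'.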


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.