mersenneforum.org  

Old 2018-01-01, 20:32   #78
heliosh
 
Oct 2017
++41

7D₁₆ Posts

I've installed debian:arm64, recompiled mlucas and ran the tests again (which now went through without the memory allocation problem), still with 80% background load on one core.
On average the tests were sped up by a factor of 1.5 compared to 32-bit Raspbian.
Possibly some thermal throttling was involved, since I don't have a heatsink yet. The clock was always at 1.2GHz when I checked, although the temperature was around 75-77°C. I've read that thermal throttling starts at 80°C.

64-Bit mlucas.cfg:
https://pastebin.com/raw/H2H9dkWH
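
Both numbers mentioned above (clock and SoC temperature) can be logged while the benchmark runs, to see whether the clock drops as the temperature approaches 80°C. A minimal sketch, assuming a Linux kernel that exposes the usual sysfs files `/sys/class/thermal/thermal_zone0/temp` (millidegrees C) and `/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq` (kHz); the function names and the 5°C margin are made up for illustration:

```python
from pathlib import Path

# Standard Linux sysfs locations; assumed to exist on the board in question.
TEMP_PATH = Path("/sys/class/thermal/thermal_zone0/temp")
FREQ_PATH = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq")


def read_temp_c(path=TEMP_PATH):
    """Return the SoC temperature in degrees C (the file holds millidegrees)."""
    return int(Path(path).read_text().strip()) / 1000.0


def read_freq_mhz(path=FREQ_PATH):
    """Return the current CPU clock in MHz (the file holds kHz)."""
    return int(Path(path).read_text().strip()) / 1000.0


def near_throttle(temp_c, limit_c=80.0, margin_c=5.0):
    """True if temp_c is within margin_c of the limit (margin is hypothetical)."""
    return temp_c >= limit_c - margin_c


if __name__ == "__main__":
    if TEMP_PATH.exists() and FREQ_PATH.exists():
        temp, freq = read_temp_c(), read_freq_mhz()
        print(f"{freq:.0f} MHz at {temp:.1f} C, near limit: {near_throttle(temp)}")
    else:
        print("sysfs files not found on this system")
```

Sampling this every few seconds during a self-test would show whether the 1.2GHz reading holds under sustained load.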
Old 2018-01-02, 01:10   #79
ewmayer
2ω=0
 
 
Sep 2002
República de California

10110111101100₂ Posts

Quote:
Originally Posted by heliosh
I've installed debian:arm64, recompiled mlucas and ran the tests again (which now went through without the memory allocation problem), still with 80% background load on one core.
On average the tests were sped up by a factor of 1.5 compared to 32-bit Raspbian.
Possibly some thermal throttling was involved, since I don't have a heatsink yet. The clock was always at 1.2GHz when I checked, although the temperature was around 75-77°C. I've read that thermal throttling starts at 80°C.

64-Bit mlucas.cfg:
https://pastebin.com/raw/H2H9dkWH
Thanks! I've not looked much at the Raspberry Pi series of micro-PCs ... what do you think accounts for the large speed difference between your Pi3 and my Odroid C2? The clock speed difference is only 1.2 vs 1.5 GHz - does the Pi3 use a substantially different implementation-in-silicon of the A53 processor?

Or perhaps more pertinently, compare your timings vs ET_'s on his Pi3 - his timings are only modestly slower than my C2's, at roughly 70% of the throughput.

Last fiddled with by ewmayer on 2018-01-02 at 01:14
Old 2018-01-02, 07:35   #80
ET_
Banned
 
 
"Luigi"
Aug 2002
Team Italia

1001100000001₂ Posts

Quote:
Originally Posted by ewmayer
Thanks! I've not looked much at the Raspberry Pi series of micro-PCs ... what do you think accounts for the large speed difference between your Pi3 and my Odroid C2? The clock speed difference is only 1.2 vs 1.5 GHz - does the Pi3 use a substantially different implementation-in-silicon of the A53 processor?

Or perhaps more pertinently, compare your timings vs ET_'s on his Pi3 - his timings are only modestly slower than my C2's, at roughly 70% of the throughput.
AFAIK, the Odroid-C2 has 2GB of RAM instead of just 1GB (but I have no idea about the controller), and faster access to the "disk".
Old 2018-01-02, 09:25   #81
heliosh
 
Oct 2017
++41

5³ Posts

I have rerun the 4096k-Test without any background load and it is still a lot slower than ET_'s:
Code:
4096  msec/iter =  683.82  ROE[avg,max] = [0.254464286, 0.312500000]  radices = 256  8  8  8 16  0  0  0  0  0
I can only think of two things that could explain the significant discrepancy:
- my compiler flags were horribly wrong (plus I use GCC-6, while ET_ used GCC-5)
- a thermal issue

Neither of those sounds like a plausible explanation to me, but who knows.

Edit: I was checking pi64-config for the CPU-frequency and it said: "Throttling occured [under-voltage throttled], your RPI doesn't perform well under load. This usually happens because of a suboptimal power supply cable." I'll check that.
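
The under-voltage report can also be cross-checked against the Raspberry Pi firmware's throttle bitmask. A minimal sketch, assuming `vcgencmd get_throttled` is available on the system (the thread does not say whether it is on this Debian arm64 install) and prints a line such as `throttled=0x50005`; the function names are made up for illustration, and the sample value is hypothetical:

```python
# Bit positions reported by the firmware's get_throttled value.
THROTTLE_FLAGS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def parse_throttled(line):
    """Turn a 'throttled=0x...' line into an integer bitmask."""
    return int(line.strip().split("=", 1)[1], 16)


def decode_throttled(value):
    """Return the descriptions of all flags set in the bitmask."""
    return [desc for bit, desc in sorted(THROTTLE_FLAGS.items())
            if value & (1 << bit)]


if __name__ == "__main__":
    sample = "throttled=0x50005"  # hypothetical output, not from the thread
    for flag in decode_throttled(parse_throttled(sample)):
        print(flag)
```

Bits 16 and above are sticky since boot, so a non-zero value after a benchmark run would indicate that throttling happened at some point even if the clock looks normal afterwards.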

Last fiddled with by heliosh on 2018-01-02 at 09:36
Old 2018-01-02, 09:35   #82
ET_
Banned
 
 
"Luigi"
Aug 2002
Team Italia

11401₈ Posts

Quote:
Originally Posted by heliosh
I have rerun the 4096k-Test without any background load and it is still a lot slower than ET_'s:
Code:
4096  msec/iter =  683.82  ROE[avg,max] = [0.254464286, 0.312500000]  radices = 256  8  8  8 16  0  0  0  0  0
I can only think of two things that could explain the significant discrepancy:
- my compiler flags were horribly wrong (plus I use GCC-6, while ET_ used GCC-5)
- a thermal issue

Neither of those sounds like a plausible explanation to me, but who knows.
Did you use the following command?
Code:
gcc -c -O3 -DUSE_ARM_V8_SIMD -DUSE_THREADS ../src/*.c >& build.log
Did you start from a clean environment, with no precompiled object files left over from previous build tests?

Luigi
Old 2018-01-02, 12:01   #83
heliosh
 
Oct 2017
++41

125₁₀ Posts

I had used "-O3 -mcpu=cortex-a53".
I've now compiled it with -DUSE_ARM_V8_SIMD, but I instantly get a segfault when running the tests. And yes, the object files were deleted.

Last fiddled with by heliosh on 2018-01-02 at 12:02
Old 2018-01-02, 13:58   #84
M344587487
 
 
"Composite as Heck"
Oct 2017

2×5²×19 Posts

If I were you I'd use the Gentoo distro that ET and I use. I initially tried a 64-bit Debian distro and ran into problems very similar to yours, down to the segfaults (check the other thread). The timings you posted are probably slower because the build is scalar, but they are also much slower than my scalar timings, so the undervolting is a big part of the issue.

Definitely replace the power supply, it could be a cheap cable or an insufficient transformer. If you're using an old phone charger that's probably the issue, I have one that's only rated for 5V @ 300mA, which fails at ~1000mA. Fine for light use, but under full load my pi 3 oscillates around 1000mA +-250mA. Any modern phone transformer is probably fine, most seem to be rated for 2000mA or 2400mA.
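
The current figures above amount to a simple budget check. A rough sketch using the numbers from this post (about 1000mA ±250mA for a fully loaded Pi 3, and supplies rated 300mA, 2000mA or 2400mA); the USB draw is a hypothetical placeholder to be replaced with a measured value:

```python
def headroom_ma(supply_ma, board_peak_ma, peripherals_ma=0):
    """Spare current in mA; a negative result means the supply is overcommitted."""
    return supply_ma - board_peak_ma - peripherals_ma


PI3_PEAK_MA = 1000 + 250   # upper end of the range quoted in the post
USB_DRAW_MA = 400          # hypothetical draw of attached USB devices

if __name__ == "__main__":
    for supply in (300, 2000, 2400):
        spare = headroom_ma(supply, PI3_PEAK_MA, USB_DRAW_MA)
        verdict = "ok" if spare >= 0 else "insufficient"
        print(f"{supply:>5} mA supply: {spare:+} mA headroom ({verdict})")
```

A rated figure is only an upper bound; a thin or long cable can drop enough voltage to trigger the under-voltage flag even when the budget looks fine on paper.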

Go for a heatsink on the SoC, I have an aluminium one which I think is just about keeping up but would get copper if doing it again. You don't need to put a heatsink on the io chip, but I would recommend one on the RAM chip on the underside, or at least drilling a hole in the case (if you use one), as otherwise it's getting nearly no airflow.
Old 2018-01-02, 14:31   #85
heliosh
 
Oct 2017
++41

5³ Posts

I'm using a 1.5A psu that came with a Raspi2 I've had earlier. I've now ordered a 5.1V 2.5A "official Raspberry Pi 3 Power supply" and a copper heatsink.
I have USB devices attached which also draw a significant amount of power, so 1.5A might be a bit tight.
Thermal imaging shows that the RAM isn't getting very hot, just the SoC is glowing.

But this is getting off-topic here. I'll post an update if I get a significant improvement.
Old 2018-02-18, 01:05   #86
VictordeHolland
 
 
"Victor de Hollander"
Aug 2011
the Netherlands

49B₁₆ Posts

Code:
Victor@PCVICTOR MINGW64 ~
$ pacman -S mingw-w64-x86_64-gcc
resolving dependencies...
looking for conflicting packages...

Packages (2) mingw-w64-x86_64-gcc-libs-7.3.0-1  mingw-w64-x86_64-gcc-7.3.0-1

Total Installed Size:  116.40 MiB
Net Upgrade Size:       14.16 MiB

:: Proceed with installation? [Y/n] y
(2/2) checking keys in keyring                     [#####################] 100%
(2/2) checking package integrity                   [#####################] 100%
(2/2) loading package files                        [#####################] 100%
(2/2) checking for file conflicts                  [#####################] 100%
(2/2) checking available disk space                [#####################] 100%
(1/2) upgrading mingw-w64-x86_64-gcc-libs          [#####################] 100%
(2/2) upgrading mingw-w64-x86_64-gcc               [#####################] 100%

Victor@PCVICTOR MINGW64 ~
$ which gcc
/mingw64/bin/gcc

Victor@PCVICTOR MINGW64 ~
$ gcc -v
Using built-in specs.
COLLECT_GCC=C:\msys64\mingw64\bin\gcc.exe
COLLECT_LTO_WRAPPER=C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/7.3.0/lto-wrapper.exe
Target: x86_64-w64-mingw32
Configured with: ../gcc-7.3.0/configure --prefix=/mingw64 --with-local-prefix=/mingw64/local --build=x86_64-w64-mingw32 --host=x86_64-w64-mingw32 --target=x86_64-w64-mingw32 --with-native-system-header-dir=/mingw64/x86_64-w64-mingw32/include --libexecdir=/mingw64/lib --enable-bootstrap --with-arch=x86-64 --with-tune=generic --enable-languages=c,lto,c++,objc,obj-c++,fortran,ada --enable-shared --enable-static --enable-libatomic --enable-threads=posix --enable-graphite --enable-fully-dynamic-string --enable-libstdcxx-time=yes --enable-libstdcxx-filesystem-ts=yes --disable-libstdcxx-pch --disable-libstdcxx-debug --disable-isl-version-check --enable-lto --enable-libgomp --disable-multilib --enable-checking=release --disable-rpath --disable-win32-registry --disable-nls --disable-werror --disable-symvers --with-libiconv --with-system-zlib --with-gmp=/mingw64 --with-mpfr=/mingw64 --with-mpc=/mingw64 --with-isl=/mingw64 --with-pkgversion='Rev1, Built by MSYS2 project' --with-bugurl=https://sourceforge.net/projects/msys2 --with-gnu-as --with-gnu-ld
Thread model: posix
gcc version 7.3.0 (Rev1, Built by MSYS2 project)

Victor@PCVICTOR MINGW64 ~
$ cd ..

Victor@PCVICTOR MINGW64 /home
$ cd mlucas_v17.1-20180123/

Victor@PCVICTOR MINGW64 /home/mlucas_v17.1-20180123
$ cd AVX

Victor@PCVICTOR MINGW64 /home/mlucas_v17.1-20180123/AVX
$ gcc -c -O3 -DUSE_AVX *.c>& build.log

Victor@PCVICTOR MINGW64 /home/mlucas_v17.1-20180123/AVX
$ grep -i error build.log

Victor@PCVICTOR MINGW64 /home/mlucas_v17.1-20180123/AVX
$ gcc -o Mlucas *.o -lm -lpthread -lrt
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/7.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find -lrt
collect2.exe: error: ld returned 1 exit status

Victor@PCVICTOR MINGW64 /home/mlucas_v17.1-20180123/AVX
$ gcc -o Mlucas *.o -lm -lpthread -lrt

Victor@PCVICTOR MINGW64 /home/mlucas_v17.1-20180123/AVX
$ cd ..

Victor@PCVICTOR MINGW64 /home/mlucas_v17.1-20180123
$ cd SSE2/

Victor@PCVICTOR MINGW64 /home/mlucas_v17.1-20180123/SSE2
$ gcc -c -O3 -DUSE_SSE2 *.c>& build.log

Victor@PCVICTOR MINGW64 /home/mlucas_v17.1-20180123/SSE2
$ grep -i error build.log

Victor@PCVICTOR MINGW64 /home/mlucas_v17.1-20180123/SSE2
$ gcc -o Mlucas *.o -lm -lpthread -lrt
Compiled without errors (only lots of warnings) with your excellent guide (http://www.mersenneforum.org/mayer/README.html#windows)
MSYS2 + MINGW64
GCC version 7.3.0 (Rev1, Built by MSYS2 project)
Win7 64bit
Intel Core i5 2500k @4.0GHz

Somehow librt wasn't in the folder where gcc was looking for it, but that was a simple copy-paste fix: I copied it from a (separate) mingw64-6.3.0 installation.

SSE2 (1 core)
Code:
17.1
       128  msec/iter =    1.80  ROE[avg,max] = [0.243858937, 0.312500000]  radices =  16 16 16 16  0  0  0  0  0  0
       160  msec/iter =    2.30  ROE[avg,max] = [0.275809152, 0.312500000]  radices =  20 16 16 16  0  0  0  0  0  0
       192  msec/iter =    2.70  ROE[avg,max] = [0.255859375, 0.304687500]  radices =  24 16 16 16  0  0  0  0  0  0
       208  msec/iter =    3.20  ROE[avg,max] = [0.287562779, 0.343750000]  radices = 208 16 32  0  0  0  0  0  0  0
       224  msec/iter =    3.50  ROE[avg,max] = [0.302427455, 0.375000000]  radices =  28 16 16 16  0  0  0  0  0  0
       240  msec/iter =    3.90  ROE[avg,max] = [0.259737723, 0.312500000]  radices =  60  8 16 16  0  0  0  0  0  0
       256  msec/iter =    3.70  ROE[avg,max] = [0.303571429, 0.375000000]  radices =  32 16 16 16  0  0  0  0  0  0
       288  msec/iter =    4.60  ROE[avg,max] = [0.246065848, 0.312500000]  radices = 144 32 32  0  0  0  0  0  0  0
       320  msec/iter =    4.80  ROE[avg,max] = [0.275948661, 0.375000000]  radices =  40 16 16 16  0  0  0  0  0  0
       352  msec/iter =    5.70  ROE[avg,max] = [0.292622811, 0.375000000]  radices =  44 16 16 16  0  0  0  0  0  0
       384  msec/iter =    5.70  ROE[avg,max] = [0.260909598, 0.312500000]  radices =  24 16 16 32  0  0  0  0  0  0
       416  msec/iter =    6.70  ROE[avg,max] = [0.264285714, 0.296875000]  radices =  52 16 16 16  0  0  0  0  0  0
       448  msec/iter =    6.90  ROE[avg,max] = [0.290206473, 0.343750000]  radices =  28 16 16 32  0  0  0  0  0  0
       480  msec/iter =    7.60  ROE[avg,max] = [0.280245536, 0.375000000]  radices =  60 16 16 16  0  0  0  0  0  0
       512  msec/iter =    7.60  ROE[avg,max] = [0.248214286, 0.312500000]  radices =  16 16 32 32  0  0  0  0  0  0
       576  msec/iter =    9.10  ROE[avg,max] = [0.263337054, 0.375000000]  radices =  36 16 16 32  0  0  0  0  0  0
       640  msec/iter =    9.70  ROE[avg,max] = [0.261049107, 0.312500000]  radices =  20 16 32 32  0  0  0  0  0  0
       704  msec/iter =   11.50  ROE[avg,max] = [0.299386161, 0.359375000]  radices =  44 16 16 32  0  0  0  0  0  0
       768  msec/iter =   11.70  ROE[avg,max] = [0.285895647, 0.375000000]  radices =  48 16 16 32  0  0  0  0  0  0
       832  msec/iter =   13.40  ROE[avg,max] = [0.267006138, 0.328125000]  radices =  52 16 16 32  0  0  0  0  0  0
       896  msec/iter =   14.20  ROE[avg,max] = [0.291106306, 0.343750000]  radices =  56 16 16 32  0  0  0  0  0  0
       960  msec/iter =   15.40  ROE[avg,max] = [0.285044643, 0.375000000]  radices =  60 16 16 32  0  0  0  0  0  0
      1024  msec/iter =   15.80  ROE[avg,max] = [0.271428571, 0.375000000]  radices =  32 16 32 32  0  0  0  0  0  0
      1152  msec/iter =   18.80  ROE[avg,max] = [0.259458705, 0.312500000]  radices =  36 16 32 32  0  0  0  0  0  0
      1280  msec/iter =   20.50  ROE[avg,max] = [0.265569196, 0.328125000]  radices =  40 16 32 32  0  0  0  0  0  0
      1408  msec/iter =   25.10  ROE[avg,max] = [0.302511161, 0.375000000]  radices =  44 32 32 16  0  0  0  0  0  0
      1536  msec/iter =   24.40  ROE[avg,max] = [0.287500000, 0.343750000]  radices =  48 16 32 32  0  0  0  0  0  0
      1664  msec/iter =   28.70  ROE[avg,max] = [0.254003906, 0.281250000]  radices =  52 16 32 32  0  0  0  0  0  0
      1792  msec/iter =   29.80  ROE[avg,max] = [0.288364955, 0.343750000]  radices =  56 16 32 32  0  0  0  0  0  0
      1920  msec/iter =   33.20  ROE[avg,max] = [0.258398438, 0.312500000]  radices =  60 16 32 32  0  0  0  0  0  0
      2048  msec/iter =   34.00  ROE[avg,max] = [0.246616908, 0.312500000]  radices =  32 32 32 32  0  0  0  0  0  0
      2304  msec/iter =   40.00  ROE[avg,max] = [0.298158482, 0.375000000]  radices = 144 16 16 32  0  0  0  0  0  0
      2560  msec/iter =   43.51  ROE[avg,max] = [0.264843750, 0.312500000]  radices =  40 32 32 32  0  0  0  0  0  0
      2816  msec/iter =   51.00  ROE[avg,max] = [0.317815290, 0.375000000]  radices = 176 16 16 32  0  0  0  0  0  0
      3072  msec/iter =   51.90  ROE[avg,max] = [0.243532017, 0.296875000]  radices =  48 32 32 32  0  0  0  0  0  0
      3328  msec/iter =   60.11  ROE[avg,max] = [0.252553013, 0.312500000]  radices =  52 32 32 32  0  0  0  0  0  0
      3584  msec/iter =   63.40  ROE[avg,max] = [0.292243304, 0.375000000]  radices =  56 32 32 32  0  0  0  0  0  0
      3840  msec/iter =   69.50  ROE[avg,max] = [0.267271205, 0.375000000]  radices = 240 16 16 32  0  0  0  0  0  0
      4096  msec/iter =   69.00  ROE[avg,max] = [0.244712612, 0.281250000]  radices =  16 16 16 16 32  0  0  0  0  0
      4608  msec/iter =   81.20  ROE[avg,max] = [0.268268694, 0.343750000]  radices = 288 16 16 32  0  0  0  0  0  0
      5120  msec/iter =   90.60  ROE[avg,max] = [0.344419643, 0.375000000]  radices =  20 16 16 16 32  0  0  0  0  0
      5632  msec/iter =  105.40  ROE[avg,max] = [0.324665179, 0.375000000]  radices = 176 16 32 32  0  0  0  0  0  0
      6144  msec/iter =  106.60  ROE[avg,max] = [0.252887835, 0.289062500]  radices =  24 16 16 16 32  0  0  0  0  0
      6656  msec/iter =  122.59  ROE[avg,max] = [0.281138393, 0.312500000]  radices = 208 16 32 32  0  0  0  0  0  0
      7168  msec/iter =  132.00  ROE[avg,max] = [0.289226423, 0.343750000]  radices =  28 16 16 16 32  0  0  0  0  0
      7680  msec/iter =  145.20  ROE[avg,max] = [0.260156250, 0.312500000]  radices = 240 16 32 32  0  0  0  0  0  0
      8192  msec/iter =  146.80  ROE[avg,max] = [0.244656808, 0.281250000]  radices =  16 16 16 32 32  0  0  0  0  0
      9216  msec/iter =  170.30  ROE[avg,max] = [0.254994420, 0.316406250]  radices =  36 16 16 16 32  0  0  0  0  0
     10240  msec/iter =  185.20  ROE[avg,max] = [0.284905134, 0.343750000]  radices =  40 16 16 16 32  0  0  0  0  0
     11264  msec/iter =  215.90  ROE[avg,max] = [0.284776088, 0.328125000]  radices =  44 16 16 16 32  0  0  0  0  0
     12288  msec/iter =  222.20  ROE[avg,max] = [0.247877720, 0.312500000]  radices =  48 16 16 16 32  0  0  0  0  0
     13312  msec/iter =  251.81  ROE[avg,max] = [0.304910714, 0.343750000]  radices =  52 16 16 16 32  0  0  0  0  0
     14336  msec/iter =  279.70  ROE[avg,max] = [0.275892857, 0.312500000]  radices =  28 16 16 32 32  0  0  0  0  0
     15360  msec/iter =  295.60  ROE[avg,max] = [0.288839286, 0.343750000]  radices =  60 16 16 16 32  0  0  0  0  0
     16384  msec/iter =  306.90  ROE[avg,max] = [0.253431920, 0.312500000]  radices =  32 16 16 32 32  0  0  0  0  0
     18432  msec/iter =  365.40  ROE[avg,max] = [0.261997768, 0.296875000]  radices =  36 16 16 32 32  0  0  0  0  0
     20480  msec/iter =  397.50  ROE[avg,max] = [0.269196429, 0.312500000]  radices =  40 16 16 32 32  0  0  0  0  0
     22528  msec/iter =  465.71  ROE[avg,max] = [0.284232003, 0.343750000]  radices =  44 16 16 32 32  0  0  0  0  0
     24576  msec/iter =  479.50  ROE[avg,max] = [0.293080357, 0.343750000]  radices =  48 16 16 32 32  0  0  0  0  0
     26624  msec/iter =  547.20  ROE[avg,max] = [0.267187500, 0.312500000]  radices =  52 16 16 32 32  0  0  0  0  0
     28672  msec/iter =  576.00  ROE[avg,max] = [0.309486607, 0.343750000]  radices =  56 16 16 32 32  0  0  0  0  0
     30720  msec/iter =  633.21  ROE[avg,max] = [0.264899554, 0.312500000]  radices =  60 16 16 32 32  0  0  0  0  0
     32768  msec/iter =  666.00  ROE[avg,max] = [0.255109515, 0.312500000]  radices =  32 32 32 16 32  0  0  0  0  0
     36864  msec/iter =  757.10  ROE[avg,max] = [0.273688616, 0.312500000]  radices = 144 16 16 16 32  0  0  0  0  0
     40960  msec/iter =  857.64  ROE[avg,max] = [0.262755476, 0.296875000]  radices = 160 16 16 16 32  0  0  0  0  0
     45056  msec/iter =  960.50  ROE[avg,max] = [0.295835658, 0.343750000]  radices = 176 16 16 16 32  0  0  0  0  0
     49152  msec/iter = 1057.21  ROE[avg,max] = [0.280859375, 0.312500000]  radices =  48 16 32 32 32  0  0  0  0  0
     53248  msec/iter = 1205.40  ROE[avg,max] = [0.258314732, 0.312500000]  radices =  52 16 32 32 32  0  0  0  0  0
     57344  msec/iter = 1231.01  ROE[avg,max] = [0.282653373, 0.312500000]  radices = 224 16 16 16 32  0  0  0  0  0
     61440  msec/iter = 1322.60  ROE[avg,max] = [0.264676339, 0.343750000]  radices = 240 16 16 16 32  0  0  0  0  0
AVX (1 core)
Code:
17.1
       128  msec/iter =    1.40  ROE[avg,max] = [0.278125000, 0.375000000]  radices =  16 16 16 16  0  0  0  0  0  0
       144  msec/iter =    1.60  ROE[avg,max] = [0.257686942, 0.328125000]  radices = 144 16 32  0  0  0  0  0  0  0
       160  msec/iter =    1.80  ROE[avg,max] = [0.283258929, 0.343750000]  radices = 160 32 16  0  0  0  0  0  0  0
       192  msec/iter =    2.10  ROE[avg,max] = [0.276339286, 0.343750000]  radices =  48  8 16 16  0  0  0  0  0  0
       224  msec/iter =    2.50  ROE[avg,max] = [0.285142299, 0.343750000]  radices =  28 16 16 16  0  0  0  0  0  0
       240  msec/iter =    2.80  ROE[avg,max] = [0.259054129, 0.312500000]  radices = 240 16 32  0  0  0  0  0  0  0
       256  msec/iter =    2.80  ROE[avg,max] = [0.247427150, 0.281250000]  radices =  32 16 16 16  0  0  0  0  0  0
       288  msec/iter =    3.30  ROE[avg,max] = [0.294754464, 0.375000000]  radices =  36 16 16 16  0  0  0  0  0  0
       320  msec/iter =    3.50  ROE[avg,max] = [0.256869071, 0.312500000]  radices =  20 16 16 32  0  0  0  0  0  0
       384  msec/iter =    4.20  ROE[avg,max] = [0.259472656, 0.312500000]  radices =  24 16 16 32  0  0  0  0  0  0
       416  msec/iter =    5.00  ROE[avg,max] = [0.258949498, 0.312500000]  radices = 208 32 32  0  0  0  0  0  0  0
       448  msec/iter =    5.00  ROE[avg,max] = [0.279471261, 0.328125000]  radices =  28 16 16 32  0  0  0  0  0  0
       480  msec/iter =    5.70  ROE[avg,max] = [0.268457031, 0.312500000]  radices =  60 16 16 16  0  0  0  0  0  0
       512  msec/iter =    5.80  ROE[avg,max] = [0.243409947, 0.312500000]  radices =  32 16 16 32  0  0  0  0  0  0
       576  msec/iter =    6.70  ROE[avg,max] = [0.302343750, 0.375000000]  radices =  36 16 16 32  0  0  0  0  0  0
       640  msec/iter =    7.20  ROE[avg,max] = [0.281138393, 0.375000000]  radices =  40 16 16 32  0  0  0  0  0  0
       768  msec/iter =    8.70  ROE[avg,max] = [0.252845982, 0.296875000]  radices =  48 16 16 32  0  0  0  0  0  0
       832  msec/iter =   10.20  ROE[avg,max] = [0.299107143, 0.375000000]  radices =  52 16 16 32  0  0  0  0  0  0
       896  msec/iter =   10.70  ROE[avg,max] = [0.280482701, 0.375000000]  radices =  28 16 32 32  0  0  0  0  0  0
       960  msec/iter =   11.60  ROE[avg,max] = [0.266210938, 0.312500000]  radices =  60 16 16 32  0  0  0  0  0  0
      1024  msec/iter =   12.13  ROE[avg,max] = [0.237806920, 0.312500000]  radices =  32 16 32 32  0  0  0  0  0  0
      1152  msec/iter =   14.00  ROE[avg,max] = [0.277790179, 0.312500000]  radices =  36 16 32 32  0  0  0  0  0  0
      1280  msec/iter =   15.41  ROE[avg,max] = [0.286830357, 0.343750000]  radices =  40 16 32 32  0  0  0  0  0  0
      1408  msec/iter =   18.15  ROE[avg,max] = [0.308140346, 0.390625000]  radices = 176 16 16 16  0  0  0  0  0  0
      1536  msec/iter =   18.38  ROE[avg,max] = [0.254910714, 0.343750000]  radices =  48 16 32 32  0  0  0  0  0  0
      1664  msec/iter =   21.22  ROE[avg,max] = [0.282310268, 0.343750000]  radices = 208 16 16 16  0  0  0  0  0  0
      1792  msec/iter =   22.84  ROE[avg,max] = [0.271777344, 0.312500000]  radices =  28 32 32 32  0  0  0  0  0  0
      1920  msec/iter =   24.30  ROE[avg,max] = [0.296428571, 0.375000000]  radices =  60 16 32 32  0  0  0  0  0  0
      2048  msec/iter =   25.74  ROE[avg,max] = [0.247865513, 0.312500000]  radices =  32 32 32 32  0  0  0  0  0  0
      2304  msec/iter =   28.84  ROE[avg,max] = [0.275669643, 0.312500000]  radices = 144 16 16 32  0  0  0  0  0  0
      2560  msec/iter =   32.86  ROE[avg,max] = [0.300000000, 0.375000000]  radices =  40 32 32 32  0  0  0  0  0  0
      2816  msec/iter =   36.79  ROE[avg,max] = [0.291238839, 0.343750000]  radices = 176 16 16 32  0  0  0  0  0  0
      3072  msec/iter =   38.61  ROE[avg,max] = [0.245962960, 0.281250000]  radices =  48 32 32 32  0  0  0  0  0  0
      3328  msec/iter =   43.28  ROE[avg,max] = [0.284221540, 0.343750000]  radices = 208 16 16 32  0  0  0  0  0  0
      3584  msec/iter =   46.51  ROE[avg,max] = [0.290764509, 0.343750000]  radices = 224 16 16 32  0  0  0  0  0  0
      3840  msec/iter =   49.49  ROE[avg,max] = [0.258475167, 0.296875000]  radices = 240 16 16 32  0  0  0  0  0  0
      4096  msec/iter =   53.52  ROE[avg,max] = [0.284402902, 0.312500000]  radices =  16 16 16 16 32  0  0  0  0  0
      4608  msec/iter =   59.81  ROE[avg,max] = [0.249079241, 0.281250000]  radices = 144 16 32 32  0  0  0  0  0  0
      5120  msec/iter =   66.84  ROE[avg,max] = [0.257080078, 0.312500000]  radices =  20 16 16 16 32  0  0  0  0  0
      5632  msec/iter =   76.25  ROE[avg,max] = [0.282209124, 0.343750000]  radices = 176 16 32 32  0  0  0  0  0  0
      6144  msec/iter =   79.79  ROE[avg,max] = [0.277678571, 0.312500000]  radices =  24 16 16 16 32  0  0  0  0  0
      6656  msec/iter =   88.71  ROE[avg,max] = [0.264644950, 0.312500000]  radices = 208 16 32 32  0  0  0  0  0  0
      7168  msec/iter =   96.36  ROE[avg,max] = [0.294782366, 0.375000000]  radices = 224 16 32 32  0  0  0  0  0  0
      7680  msec/iter =  103.47  ROE[avg,max] = [0.267142160, 0.312500000]  radices = 240 16 32 32  0  0  0  0  0  0
      8192  msec/iter =  112.06  ROE[avg,max] = [0.257198661, 0.312500000]  radices = 256 16 32 32  0  0  0  0  0  0
      9216  msec/iter =  124.63  ROE[avg,max] = [0.293973214, 0.343750000]  radices =  36 16 16 16 32  0  0  0  0  0
     10240  msec/iter =  136.20  ROE[avg,max] = [0.280468750, 0.375000000]  radices =  40 16 16 16 32  0  0  0  0  0
     11264  msec/iter =  160.42  ROE[avg,max] = [0.283238002, 0.328125000]  radices =  44 16 16 16 32  0  0  0  0  0
     12288  msec/iter =  162.44  ROE[avg,max] = [0.261104911, 0.312500000]  radices =  48 16 16 16 32  0  0  0  0  0
     13312  msec/iter =  188.98  ROE[avg,max] = [0.289564732, 0.343750000]  radices = 208 32 32 32  0  0  0  0  0  0
     14336  msec/iter =  197.28  ROE[avg,max] = [0.287133789, 0.343750000]  radices =  56 16 16 16 32  0  0  0  0  0
     15360  msec/iter =  218.04  ROE[avg,max] = [0.262025670, 0.296875000]  radices =  60 16 16 16 32  0  0  0  0  0
     16384  msec/iter =  232.43  ROE[avg,max] = [0.239365932, 0.281250000]  radices =  32 16 16 32 32  0  0  0  0  0
     18432  msec/iter =  269.14  ROE[avg,max] = [0.246674456, 0.281250000]  radices = 288 32 32 32  0  0  0  0  0  0
     20480  msec/iter =  297.32  ROE[avg,max] = [0.325000000, 0.375000000]  radices =  40 16 16 32 32  0  0  0  0  0
     22528  msec/iter =  345.23  ROE[avg,max] = [0.304185268, 0.367187500]  radices = 176 16 16 16 16  0  0  0  0  0
     24576  msec/iter =  352.60  ROE[avg,max] = [0.257749721, 0.312500000]  radices =  48 16 16 32 32  0  0  0  0  0
     26624  msec/iter =  414.03  ROE[avg,max] = [0.284179688, 0.343750000]  radices =  52 16 16 32 32  0  0  0  0  0
     28672  msec/iter =  420.70  ROE[avg,max] = [0.302594866, 0.343750000]  radices =  56 16 16 32 32  0  0  0  0  0
     30720  msec/iter =  466.00  ROE[avg,max] = [0.291629464, 0.375000000]  radices = 240 16 16 16 16  0  0  0  0  0
     32768  msec/iter =  498.50  ROE[avg,max] = [0.267689732, 0.343750000]  radices = 128 16 16 16 32  0  0  0  0  0
     36864  msec/iter =  535.63  ROE[avg,max] = [0.254352679, 0.312500000]  radices = 144 16 16 16 32  0  0  0  0  0
     40960  msec/iter =  594.90  ROE[avg,max] = [0.297098214, 0.343750000]  radices = 160 16 16 16 32  0  0  0  0  0
     45056  msec/iter =  674.78  ROE[avg,max] = [0.299944196, 0.343750000]  radices = 176 16 16 16 32  0  0  0  0  0
     49152  msec/iter =  784.09  ROE[avg,max] = [0.254603795, 0.281250000]  radices = 192 16 16 16 32  0  0  0  0  0
     53248  msec/iter =  904.71  ROE[avg,max] = [0.271316964, 0.312500000]  radices =  52 16 32 32 32  0  0  0  0  0
     57344  msec/iter =  857.03  ROE[avg,max] = [0.319642857, 0.375000000]  radices = 224 16 16 16 32  0  0  0  0  0
     61440  msec/iter =  897.44  ROE[avg,max] = [0.276255580, 0.312500000]  radices = 240 16 16 16 32  0  0  0  0  0
Strangely, the scalar build (without AVX/SSE2) compiles without errors (lots of warnings, though), but in the self-test the round-off errors at every single iteration are HUGE, like in the millions (for all the tests run with -s m).
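
The two tables above can be compared directly by parsing the `msec/iter` column. A small sketch using three lines copied from each table (FFT lengths 1024K, 2048K and 4096K); the function names are made up for illustration:

```python
import re

# Matches the FFT length and the msec/iter value at the start of a cfg line.
LINE_RE = re.compile(r"^\s*(\d+)\s+msec/iter\s*=\s*([\d.]+)")


def parse_timings(text):
    """Map FFT length (in K) to msec/iter for every matching line."""
    timings = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            timings[int(match.group(1))] = float(match.group(2))
    return timings


def speedups(slow, fast):
    """Ratio slow/fast for each FFT length present in both tables."""
    return {n: slow[n] / fast[n] for n in sorted(slow.keys() & fast.keys())}


SSE2 = """
      1024  msec/iter =   15.80  ROE[avg,max] = [0.271428571, 0.375000000]
      2048  msec/iter =   34.00  ROE[avg,max] = [0.246616908, 0.312500000]
      4096  msec/iter =   69.00  ROE[avg,max] = [0.244712612, 0.281250000]
"""
AVX = """
      1024  msec/iter =   12.13  ROE[avg,max] = [0.237806920, 0.312500000]
      2048  msec/iter =   25.74  ROE[avg,max] = [0.247865513, 0.312500000]
      4096  msec/iter =   53.52  ROE[avg,max] = [0.284402902, 0.312500000]
"""

if __name__ == "__main__":
    ratios = speedups(parse_timings(SSE2), parse_timings(AVX))
    for n, ratio in ratios.items():
        print(f"{n:>6}K  AVX is {ratio:.2f}x faster than SSE2")
```

For these three FFT lengths the AVX build comes out roughly 1.3 times faster than the SSE2 build on this CPU.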
Old 2018-02-19, 01:40   #87
ewmayer
2ω=0
 
 
Sep 2002
República de California

2²·2,939 Posts

Thanks, Victor - you clearly spent a lot of time running the full self-test ranges, is this an otherwise-idle AVX system of yours?

The huge-roundoff-errors-in-scalar-build sound like a bad nearest-int emulation ... if you recompile just a single small file (say br.c) and add -DVERBOSE_HEADERS to the compile command, that will tell you which version of gcc's rint() is being used, e.g. on my Core macbook:
Code:
In file included from ../br.c:23:
In file included from ../Mlucas.h:29:
In file included from ../align.h:29:
../types.h:225:3: warning: #warning Using lrint() for DNINT [-W#warnings]
        #warning Using lrint() for DNINT
If you're interested, I can guide you in adding some simple printf's to the scalar-double-mode carry macro which could help pinpoint the issue. Do all radix combos at all FFT lengths in your build suffer the huge ROEs, or do some run OK?
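
For context on why ROEs in the millions point at the rounding step: the round-off error is the distance between a floating-point output and the nearest integer, so with a correct nearest-int it can never exceed 0.5 (the tables above top out at 0.375). One common way to emulate nearest-int in double precision is to add and subtract the constant 3·2^51. The sketch below illustrates that general technique; it is not taken from the Mlucas source, and the names `dnint_magic` and `roe` are made up:

```python
RND = 3.0 * 2.0**51  # doubles near this magnitude are spaced exactly 1 apart


def dnint_magic(x):
    """Round x to the nearest integer (ties to even), valid for |x| < 2**51.

    Adding RND pushes x into a range where fractional bits cannot be
    represented, so the hardware rounds them away; subtracting RND
    restores the magnitude.
    """
    return (x + RND) - RND


def roe(x):
    """Round-off error: distance from x to its nearest integer."""
    return abs(x - dnint_magic(x))


if __name__ == "__main__":
    for value in (1234567.3125, -1.6, 2.5, 3.49):
        print(f"{value:>14}  nint = {dnint_magic(value):>10}  roe = {roe(value)}")
```

If a build's nearest-int returns something other than the nearest integer, the computed difference is no longer bounded by 0.5, which is consistent with the symptom described.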

Last fiddled with by ewmayer on 2018-02-19 at 01:48
Old 2018-02-23, 16:17   #88
VictordeHolland
 
 
"Victor de Hollander"
Aug 2011
the Netherlands

3²·131 Posts

I haven't tried all the FFT sizes with the scalar build, but all the ones I tried failed. This is when compiling with MINGW64 for Windows, which could introduce some strange behaviour. I wouldn't put it high on the priority list, as the AVX and SSE2 builds compile successfully and even an old Pentium 4 has SSE2.

Anyway I ran the verbose header function on br.c:
Code:
Victor@PCVICTOR MINGW64 /home/mlucas_v17.1-20180123/SCALAR
$ gcc -c -DVERBOSE_HEADERS br.c >& build.log
This is the resulting build.log
Code:
In file included from types.h:30:0,
                 from align.h:29,
                 from Mlucas.h:29,
                 from br.c:23:
platform.h:1518:3: warning: #warning platform.h: Defining both X64_ASM and X32_ASM [-Wcpp]
  #warning platform.h: Defining both X64_ASM and X32_ASM
   ^~~~~~~
In file included from align.h:29:0,
                 from Mlucas.h:29,
                 from br.c:23:
types.h:225:3: warning: #warning Using lrint() for DNINT [-Wcpp]
  #warning Using lrint() for DNINT
   ^~~~~~~
In file included from imul_macro.h:29:0,
                 from mi64.h:30,
                 from Mdata.h:31,
                 from carry.h:29,
                 from Mlucas.h:30,
                 from br.c:23:
imul_macro0.h:309:3: warning: #warning X86_64-type CPU detected [-Wcpp]
  #warning X86_64-type CPU detected
   ^~~~~~~
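
The log shows that DNINT maps to lrint() in this build. One thing that may be worth checking (an assumption, not something the thread confirms): lrint() returns a C `long`, and `long` is 32 bits on 64-bit Windows toolchains such as MinGW-w64, but 64 bits on 64-bit Linux and macOS. If the values being rounded can exceed 2^31 in magnitude, the result of lrint() is unspecified on this platform. A small sketch of the range check; `lrint_in_range` is a made-up helper and the sample value is hypothetical:

```python
import ctypes


def lrint_in_range(x, long_bits):
    """True if round(x) fits in a signed integer of long_bits bits."""
    n = round(x)
    return -(2 ** (long_bits - 1)) <= n <= 2 ** (long_bits - 1) - 1


if __name__ == "__main__":
    local_bits = 8 * ctypes.sizeof(ctypes.c_long)
    print(f"C long on this machine: {local_bits} bits")
    sample = 3.0e9  # hypothetical value, larger than 2**31
    for bits in (32, 64):
        print(f"{bits}-bit long can hold {sample:.0f}: {lrint_in_range(sample, bits)}")
```

Whether the scalar carry step actually produces values of that size is not established here; printing a few inputs to the rounding call, as suggested earlier in the thread, would settle it.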
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
