#78 |
|
Oct 2017
++41
53 Posts |
I've installed debian:arm64, recompiled mlucas and ran the tests again (which now went through without the memory allocation problem), still with 80% background load on one core.
On average the tests ran 1.5 times faster than on 32-bit Raspbian. Some thermal throttling may have been involved, since I don't have a heatsink yet. The clock was always at 1.2GHz when I checked, but with temperatures around 75-77°C. I've read that thermal throttling starts at 80°C. 64-bit mlucas.cfg: https://pastebin.com/raw/H2H9dkWH
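To tell thermal throttling apart from a slow build, it can help to log temperature and clock while the self-test runs instead of spot-checking. A minimal sketch, assuming the usual Linux sysfs nodes exist on this image (temperature in millidegrees Celsius, frequency in kHz); the 80°C limit is the figure quoted above, and the function names and the 5°C margin are made up for illustration:

```python
from pathlib import Path

# Assumed sysfs locations; they may differ or be absent on some kernels.
TEMP_NODE = Path("/sys/class/thermal/thermal_zone0/temp")
FREQ_NODE = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq")


def millideg_to_celsius(raw: str) -> float:
    """Convert a sysfs temperature string (millidegrees C) to degrees C."""
    return int(raw.strip()) / 1000.0


def khz_to_ghz(raw: str) -> float:
    """Convert a sysfs frequency string (kHz) to GHz."""
    return int(raw.strip()) / 1_000_000.0


def near_throttle(temp_c: float, limit_c: float = 80.0, margin_c: float = 5.0) -> bool:
    """True when the temperature is within margin_c of the throttle limit."""
    return temp_c >= limit_c - margin_c


if __name__ == "__main__":
    # Only read the nodes if they are actually present on this machine.
    if TEMP_NODE.exists() and FREQ_NODE.exists():
        temp_c = millideg_to_celsius(TEMP_NODE.read_text())
        ghz = khz_to_ghz(FREQ_NODE.read_text())
        flag = " (close to throttling)" if near_throttle(temp_c) else ""
        print(f"{temp_c:.1f} C at {ghz:.2f} GHz{flag}")
    else:
        print("sysfs nodes not found on this system")
```

With the readings reported above (75-77°C at 1.2GHz), `near_throttle` would already be true, so short dips below 1.2GHz between manual checks are plausible.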
#79 | |
|
∂2ω=0
Sep 2002
República de California
2²·2,939 Posts |
Quote:
Or perhaps more pertinently, compare your timings vs ET_'s on his Pi3 - his timings are only modestly slower - roughly 70% of the throughput - than my C2.
Last fiddled with by ewmayer on 2018-01-02 at 01:14
#80 | |
|
Banned
"Luigi"
Aug 2002
Team Italia
5×7×139 Posts |
Quote:
#81 |
|
Oct 2017
++41
12510 Posts |
I have rerun the 4096K test without any background load and it is still a lot slower than ET_'s:
Code:
4096 msec/iter = 683.82 ROE[avg,max] = [0.254464286, 0.312500000] radices = 256 8 8 8 16 0 0 0 0 0
- my compiler flags were horribly wrong (plus I use GCC-6, while ET_ used GCC-5)
- Thermal issue
None of those sound like a plausible explanation to me, but who knows.
Edit: I was checking pi64-config for the CPU frequency and it said: "Throttling occured [under-voltage throttled], your RPI doesn't perform well under load. This usually happens because of a suboptimal power supply cable." I'll check that.
Last fiddled with by heliosh on 2018-01-02 at 09:36
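The under-voltage message can be cross-checked against the firmware's own throttle flags. The sketch below decodes the bitmask printed by the Raspberry Pi firmware tool `vcgencmd get_throttled`; whether that tool is installed on this particular Debian arm64 image is an assumption, and the sample value in the usage note is invented, not taken from this thread.

```python
from typing import List

# Bit meanings as documented for `vcgencmd get_throttled`.
FLAGS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(line: str) -> List[str]:
    """Decode a line such as 'throttled=0x50005' into readable flag names."""
    # Take the part after '=' and parse it as hexadecimal (the 0x prefix is accepted).
    value = int(line.strip().split("=", 1)[1], 16)
    return [name for bit, name in sorted(FLAGS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    # Hypothetical reading, for illustration only.
    for flag in decode_throttled("throttled=0x50005"):
        print(flag)
```

A value of `0x0` decodes to an empty list; any of the bits 16-19 being set means the condition occurred at some point since boot, even if the current state looks fine.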
#82 | |
|
Banned
"Luigi"
Aug 2002
Team Italia
5×7×139 Posts |
Quote:
Code:
gcc -c -O3 -DUSE_ARM_V8_SIMD -DUSE_THREADS ../src/*.c >& build.log
Luigi
#83 |
|
Oct 2017
++41
53 Posts |
I've had "-O3 -mcpu=cortex-a53".
I've now compiled it with -DUSE_ARM_V8_SIMD, but I instantly get a segfault when running the tests. And yes, the object files were deleted.
Last fiddled with by heliosh on 2018-01-02 at 12:02
#84 |
|
"Composite as Heck"
Oct 2017
2×5²×19 Posts |
If I were you I'd use the Gentoo distro that ET_ and I use. I initially tried a Debian 64-bit distro and ran into problems very similar to yours, down to the segfaults (check the other thread). The timings you posted are probably slower because the build is scalar, but they are also much worse than my scalar timings, so the undervolting is a big part of the issue.
Definitely replace the power supply; the culprit could be a cheap cable or an underpowered transformer. If you're using an old phone charger, that's probably the issue: I have one that's only rated for 5V @ 300mA, which fails at ~1000mA. That's fine for light use, but under full load my Pi 3 oscillates around 1000mA ±250mA. Any modern phone charger is probably fine; most seem to be rated for 2000mA or 2400mA.
Go for a heatsink on the SoC. I have an aluminium one which I think is just about keeping up, but I would get copper if doing it again. You don't need to put a heatsink on the I/O chip, but I would recommend one on the RAM chip on the underside, or at least drilling a hole in the case (if you use one), as otherwise it gets nearly no airflow.
#85 |
|
Oct 2017
++41
12510 Posts |
I'm using a 1.5A PSU that came with a Raspi2 I had earlier. I've now ordered a 5.1V 2.5A "official Raspberry Pi 3 Power supply" and a copper heatsink.
I have USB devices attached which also draw a significant amount of power, so 1.5A might be a bit tight. Thermal imaging shows that the RAM isn't getting very hot, just the SoC is glowing. But it's getting off-topic here. I'll post an update if I get a significant improvement.
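The "a bit tight" estimate can be put into numbers with a simple current budget. This is a rough sketch: the Pi 3 peak uses the ~1000mA ±250mA full-load figure quoted earlier in the thread, while the 500mA USB draw is a purely hypothetical placeholder, since the actual devices are not described.

```python
from typing import Dict


def headroom_ma(supply_ma: int, loads_ma: Dict[str, int]) -> int:
    """Rated supply current minus the sum of all loads, in mA (negative = overloaded)."""
    return supply_ma - sum(loads_ma.values())


# Peak draw under full load, from the 1000 mA +-250 mA figure quoted in the thread.
PI3_PEAK_MA = 1000 + 250
# Hypothetical value: the real draw of the attached USB devices is unknown.
USB_DEVICES_MA = 500

loads = {"pi3_peak": PI3_PEAK_MA, "usb_devices": USB_DEVICES_MA}

if __name__ == "__main__":
    for label, supply in (("old 1.5A PSU", 1500), ("new 2.5A PSU", 2500)):
        print(f"{label}: headroom {headroom_ma(supply, loads)} mA")
```

Under these assumptions the 1.5A supply is 250mA short at peak, while the 2.5A supply leaves 750mA spare; cable resistance is not modelled and can make the real margin smaller.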
#86 |
|
"Victor de Hollander"
Aug 2011
the Netherlands
3²×131 Posts |
Code:
Victor@PCVICTOR MINGW64 ~
$ pacman -S mingw-w64-x86_64-gcc
resolving dependencies...
looking for conflicting packages...

Packages (2) mingw-w64-x86_64-gcc-libs-7.3.0-1  mingw-w64-x86_64-gcc-7.3.0-1

Total Installed Size:  116.40 MiB
Net Upgrade Size:       14.16 MiB

:: Proceed with installation? [Y/n] y
(2/2) checking keys in keyring                 [#####################] 100%
(2/2) checking package integrity               [#####################] 100%
(2/2) loading package files                    [#####################] 100%
(2/2) checking for file conflicts              [#####################] 100%
(2/2) checking available disk space            [#####################] 100%
(1/2) upgrading mingw-w64-x86_64-gcc-libs      [#####################] 100%
(2/2) upgrading mingw-w64-x86_64-gcc           [#####################] 100%

Victor@PCVICTOR MINGW64 ~
$ which gcc
/mingw64/bin/gcc

Victor@PCVICTOR MINGW64 ~
$ gcc -v
Using built-in specs.
COLLECT_GCC=C:\msys64\mingw64\bin\gcc.exe
COLLECT_LTO_WRAPPER=C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/7.3.0/lto-wrapper.exe
Target: x86_64-w64-mingw32
Configured with: ../gcc-7.3.0/configure --prefix=/mingw64 --with-local-prefix=/mingw64/local --build=x86_64-w64-mingw32 --host=x86_64-w64-mingw32 --target=x86_64-w64-mingw32 --with-native-system-header-dir=/mingw64/x86_64-w64-mingw32/include --libexecdir=/mingw64/lib --enable-bootstrap --with-arch=x86-64 --with-tune=generic --enable-languages=c,lto,c++,objc,obj-c++,fortran,ada --enable-shared --enable-static --enable-libatomic --enable-threads=posix --enable-graphite --enable-fully-dynamic-string --enable-libstdcxx-time=yes --enable-libstdcxx-filesystem-ts=yes --disable-libstdcxx-pch --disable-libstdcxx-debug --disable-isl-version-check --enable-lto --enable-libgomp --disable-multilib --enable-checking=release --disable-rpath --disable-win32-registry --disable-nls --disable-werror --disable-symvers --with-libiconv --with-system-zlib --with-gmp=/mingw64 --with-mpfr=/mingw64 --with-mpc=/mingw64 --with-isl=/mingw64 --with-pkgversion='Rev1, Built by MSYS2 project' --with-bugurl=https://sourceforge.net/projects/msys2 --with-gnu-as --with-gnu-ld
Thread model: posix
gcc version 7.3.0 (Rev1, Built by MSYS2 project)

Victor@PCVICTOR MINGW64 ~
$ cd ..

Victor@PCVICTOR MINGW64 /home
$ cd mlucas_v17.1-20180123/

Victor@PCVICTOR MINGW64 /home/mlucas_v17.1-20180123
$ cd AVX

Victor@PCVICTOR MINGW64 /home/mlucas_v17.1-20180123/AVX
$ gcc -c -O3 -DUSE_AVX *.c>& build.log

Victor@PCVICTOR MINGW64 /home/mlucas_v17.1-20180123/AVX
$ grep -i error build.log

Victor@PCVICTOR MINGW64 /home/mlucas_v17.1-20180123/AVX
$ gcc -o Mlucas *.o -lm -lpthread -lrt
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/7.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find -lrt
collect2.exe: error: ld returned 1 exit status

Victor@PCVICTOR MINGW64 /home/mlucas_v17.1-20180123/AVX
$ gcc -o Mlucas *.o -lm -lpthread -lrt

Victor@PCVICTOR MINGW64 /home/mlucas_v17.1-20180123/AVX
$ cd ..

Victor@PCVICTOR MINGW64 /home/mlucas_v17.1-20180123
$ cd SSE2/

Victor@PCVICTOR MINGW64 /home/mlucas_v17.1-20180123/SSE2
$ gcc -c -O3 -DUSE_SSE2 *.c>& build.log

Victor@PCVICTOR MINGW64 /home/mlucas_v17.1-20180123/SSE2
$ grep -i error build.log

Victor@PCVICTOR MINGW64 /home/mlucas_v17.1-20180123/SSE2
$ gcc -o Mlucas *.o -lm -lpthread -lrt

MSYS2 + MINGW64
GCC version 7.3.0 (Rev1, Built by MSYS2 project)
Win7 64bit
Intel Core i5 2500k @4.0GHz
Somehow librt wasn't in the folder where gcc was looking for it, but that was a simple copy-paste fix: I copied it over from a (separate) mingw64-6.3.0 installation.
SSE2 (1 core)
Code:
17.1
128 msec/iter = 1.80 ROE[avg,max] = [0.243858937, 0.312500000] radices = 16 16 16 16 0 0 0 0 0 0
160 msec/iter = 2.30 ROE[avg,max] = [0.275809152, 0.312500000] radices = 20 16 16 16 0 0 0 0 0 0
192 msec/iter = 2.70 ROE[avg,max] = [0.255859375, 0.304687500] radices = 24 16 16 16 0 0 0 0 0 0
208 msec/iter = 3.20 ROE[avg,max] = [0.287562779, 0.343750000] radices = 208 16 32 0 0 0 0 0 0 0
224 msec/iter = 3.50 ROE[avg,max] = [0.302427455, 0.375000000] radices = 28 16 16 16 0 0 0 0 0 0
240 msec/iter = 3.90 ROE[avg,max] = [0.259737723, 0.312500000] radices = 60 8 16 16 0 0 0 0 0 0
256 msec/iter = 3.70 ROE[avg,max] = [0.303571429, 0.375000000] radices = 32 16 16 16 0 0 0 0 0 0
288 msec/iter = 4.60 ROE[avg,max] = [0.246065848, 0.312500000] radices = 144 32 32 0 0 0 0 0 0 0
320 msec/iter = 4.80 ROE[avg,max] = [0.275948661, 0.375000000] radices = 40 16 16 16 0 0 0 0 0 0
352 msec/iter = 5.70 ROE[avg,max] = [0.292622811, 0.375000000] radices = 44 16 16 16 0 0 0 0 0 0
384 msec/iter = 5.70 ROE[avg,max] = [0.260909598, 0.312500000] radices = 24 16 16 32 0 0 0 0 0 0
416 msec/iter = 6.70 ROE[avg,max] = [0.264285714, 0.296875000] radices = 52 16 16 16 0 0 0 0 0 0
448 msec/iter = 6.90 ROE[avg,max] = [0.290206473, 0.343750000] radices = 28 16 16 32 0 0 0 0 0 0
480 msec/iter = 7.60 ROE[avg,max] = [0.280245536, 0.375000000] radices = 60 16 16 16 0 0 0 0 0 0
512 msec/iter = 7.60 ROE[avg,max] = [0.248214286, 0.312500000] radices = 16 16 32 32 0 0 0 0 0 0
576 msec/iter = 9.10 ROE[avg,max] = [0.263337054, 0.375000000] radices = 36 16 16 32 0 0 0 0 0 0
640 msec/iter = 9.70 ROE[avg,max] = [0.261049107, 0.312500000] radices = 20 16 32 32 0 0 0 0 0 0
704 msec/iter = 11.50 ROE[avg,max] = [0.299386161, 0.359375000] radices = 44 16 16 32 0 0 0 0 0 0
768 msec/iter = 11.70 ROE[avg,max] = [0.285895647, 0.375000000] radices = 48 16 16 32 0 0 0 0 0 0
832 msec/iter = 13.40 ROE[avg,max] = [0.267006138, 0.328125000] radices = 52 16 16 32 0 0 0 0 0 0
896 msec/iter = 14.20 ROE[avg,max] = [0.291106306, 0.343750000] radices = 56 16 16 32 0 0 0 0 0 0
960 msec/iter = 15.40 ROE[avg,max] = [0.285044643, 0.375000000] radices = 60 16 16 32 0 0 0 0 0 0
1024 msec/iter = 15.80 ROE[avg,max] = [0.271428571, 0.375000000] radices = 32 16 32 32 0 0 0 0 0 0
1152 msec/iter = 18.80 ROE[avg,max] = [0.259458705, 0.312500000] radices = 36 16 32 32 0 0 0 0 0 0
1280 msec/iter = 20.50 ROE[avg,max] = [0.265569196, 0.328125000] radices = 40 16 32 32 0 0 0 0 0 0
1408 msec/iter = 25.10 ROE[avg,max] = [0.302511161, 0.375000000] radices = 44 32 32 16 0 0 0 0 0 0
1536 msec/iter = 24.40 ROE[avg,max] = [0.287500000, 0.343750000] radices = 48 16 32 32 0 0 0 0 0 0
1664 msec/iter = 28.70 ROE[avg,max] = [0.254003906, 0.281250000] radices = 52 16 32 32 0 0 0 0 0 0
1792 msec/iter = 29.80 ROE[avg,max] = [0.288364955, 0.343750000] radices = 56 16 32 32 0 0 0 0 0 0
1920 msec/iter = 33.20 ROE[avg,max] = [0.258398438, 0.312500000] radices = 60 16 32 32 0 0 0 0 0 0
2048 msec/iter = 34.00 ROE[avg,max] = [0.246616908, 0.312500000] radices = 32 32 32 32 0 0 0 0 0 0
2304 msec/iter = 40.00 ROE[avg,max] = [0.298158482, 0.375000000] radices = 144 16 16 32 0 0 0 0 0 0
2560 msec/iter = 43.51 ROE[avg,max] = [0.264843750, 0.312500000] radices = 40 32 32 32 0 0 0 0 0 0
2816 msec/iter = 51.00 ROE[avg,max] = [0.317815290, 0.375000000] radices = 176 16 16 32 0 0 0 0 0 0
3072 msec/iter = 51.90 ROE[avg,max] = [0.243532017, 0.296875000] radices = 48 32 32 32 0 0 0 0 0 0
3328 msec/iter = 60.11 ROE[avg,max] = [0.252553013, 0.312500000] radices = 52 32 32 32 0 0 0 0 0 0
3584 msec/iter = 63.40 ROE[avg,max] = [0.292243304, 0.375000000] radices = 56 32 32 32 0 0 0 0 0 0
3840 msec/iter = 69.50 ROE[avg,max] = [0.267271205, 0.375000000] radices = 240 16 16 32 0 0 0 0 0 0
4096 msec/iter = 69.00 ROE[avg,max] = [0.244712612, 0.281250000] radices = 16 16 16 16 32 0 0 0 0 0
4608 msec/iter = 81.20 ROE[avg,max] = [0.268268694, 0.343750000] radices = 288 16 16 32 0 0 0 0 0 0
5120 msec/iter = 90.60 ROE[avg,max] = [0.344419643, 0.375000000] radices = 20 16 16 16 32 0 0 0 0 0
5632 msec/iter = 105.40 ROE[avg,max] = [0.324665179, 0.375000000] radices = 176 16 32 32 0 0 0 0 0 0
6144 msec/iter = 106.60 ROE[avg,max] = [0.252887835, 0.289062500] radices = 24 16 16 16 32 0 0 0 0 0
6656 msec/iter = 122.59 ROE[avg,max] = [0.281138393, 0.312500000] radices = 208 16 32 32 0 0 0 0 0 0
7168 msec/iter = 132.00 ROE[avg,max] = [0.289226423, 0.343750000] radices = 28 16 16 16 32 0 0 0 0 0
7680 msec/iter = 145.20 ROE[avg,max] = [0.260156250, 0.312500000] radices = 240 16 32 32 0 0 0 0 0 0
8192 msec/iter = 146.80 ROE[avg,max] = [0.244656808, 0.281250000] radices = 16 16 16 32 32 0 0 0 0 0
9216 msec/iter = 170.30 ROE[avg,max] = [0.254994420, 0.316406250] radices = 36 16 16 16 32 0 0 0 0 0
10240 msec/iter = 185.20 ROE[avg,max] = [0.284905134, 0.343750000] radices = 40 16 16 16 32 0 0 0 0 0
11264 msec/iter = 215.90 ROE[avg,max] = [0.284776088, 0.328125000] radices = 44 16 16 16 32 0 0 0 0 0
12288 msec/iter = 222.20 ROE[avg,max] = [0.247877720, 0.312500000] radices = 48 16 16 16 32 0 0 0 0 0
13312 msec/iter = 251.81 ROE[avg,max] = [0.304910714, 0.343750000] radices = 52 16 16 16 32 0 0 0 0 0
14336 msec/iter = 279.70 ROE[avg,max] = [0.275892857, 0.312500000] radices = 28 16 16 32 32 0 0 0 0 0
15360 msec/iter = 295.60 ROE[avg,max] = [0.288839286, 0.343750000] radices = 60 16 16 16 32 0 0 0 0 0
16384 msec/iter = 306.90 ROE[avg,max] = [0.253431920, 0.312500000] radices = 32 16 16 32 32 0 0 0 0 0
18432 msec/iter = 365.40 ROE[avg,max] = [0.261997768, 0.296875000] radices = 36 16 16 32 32 0 0 0 0 0
20480 msec/iter = 397.50 ROE[avg,max] = [0.269196429, 0.312500000] radices = 40 16 16 32 32 0 0 0 0 0
22528 msec/iter = 465.71 ROE[avg,max] = [0.284232003, 0.343750000] radices = 44 16 16 32 32 0 0 0 0 0
24576 msec/iter = 479.50 ROE[avg,max] = [0.293080357, 0.343750000] radices = 48 16 16 32 32 0 0 0 0 0
26624 msec/iter = 547.20 ROE[avg,max] = [0.267187500, 0.312500000] radices = 52 16 16 32 32 0 0 0 0 0
28672 msec/iter = 576.00 ROE[avg,max] = [0.309486607, 0.343750000] radices = 56 16 16 32 32 0 0 0 0 0
30720 msec/iter = 633.21 ROE[avg,max] = [0.264899554, 0.312500000] radices = 60 16 16 32 32 0 0 0 0 0
32768 msec/iter = 666.00 ROE[avg,max] = [0.255109515, 0.312500000] radices = 32 32 32 16 32 0 0 0 0 0
36864 msec/iter = 757.10 ROE[avg,max] = [0.273688616, 0.312500000] radices = 144 16 16 16 32 0 0 0 0 0
40960 msec/iter = 857.64 ROE[avg,max] = [0.262755476, 0.296875000] radices = 160 16 16 16 32 0 0 0 0 0
45056 msec/iter = 960.50 ROE[avg,max] = [0.295835658, 0.343750000] radices = 176 16 16 16 32 0 0 0 0 0
49152 msec/iter = 1057.21 ROE[avg,max] = [0.280859375, 0.312500000] radices = 48 16 32 32 32 0 0 0 0 0
53248 msec/iter = 1205.40 ROE[avg,max] = [0.258314732, 0.312500000] radices = 52 16 32 32 32 0 0 0 0 0
57344 msec/iter = 1231.01 ROE[avg,max] = [0.282653373, 0.312500000] radices = 224 16 16 16 32 0 0 0 0 0
61440 msec/iter = 1322.60 ROE[avg,max] = [0.264676339, 0.343750000] radices = 240 16 16 16 32 0 0 0 0 0
Code:
17.1
128 msec/iter = 1.40 ROE[avg,max] = [0.278125000, 0.375000000] radices = 16 16 16 16 0 0 0 0 0 0
144 msec/iter = 1.60 ROE[avg,max] = [0.257686942, 0.328125000] radices = 144 16 32 0 0 0 0 0 0 0
160 msec/iter = 1.80 ROE[avg,max] = [0.283258929, 0.343750000] radices = 160 32 16 0 0 0 0 0 0 0
192 msec/iter = 2.10 ROE[avg,max] = [0.276339286, 0.343750000] radices = 48 8 16 16 0 0 0 0 0 0
224 msec/iter = 2.50 ROE[avg,max] = [0.285142299, 0.343750000] radices = 28 16 16 16 0 0 0 0 0 0
240 msec/iter = 2.80 ROE[avg,max] = [0.259054129, 0.312500000] radices = 240 16 32 0 0 0 0 0 0 0
256 msec/iter = 2.80 ROE[avg,max] = [0.247427150, 0.281250000] radices = 32 16 16 16 0 0 0 0 0 0
288 msec/iter = 3.30 ROE[avg,max] = [0.294754464, 0.375000000] radices = 36 16 16 16 0 0 0 0 0 0
320 msec/iter = 3.50 ROE[avg,max] = [0.256869071, 0.312500000] radices = 20 16 16 32 0 0 0 0 0 0
384 msec/iter = 4.20 ROE[avg,max] = [0.259472656, 0.312500000] radices = 24 16 16 32 0 0 0 0 0 0
416 msec/iter = 5.00 ROE[avg,max] = [0.258949498, 0.312500000] radices = 208 32 32 0 0 0 0 0 0 0
448 msec/iter = 5.00 ROE[avg,max] = [0.279471261, 0.328125000] radices = 28 16 16 32 0 0 0 0 0 0
480 msec/iter = 5.70 ROE[avg,max] = [0.268457031, 0.312500000] radices = 60 16 16 16 0 0 0 0 0 0
512 msec/iter = 5.80 ROE[avg,max] = [0.243409947, 0.312500000] radices = 32 16 16 32 0 0 0 0 0 0
576 msec/iter = 6.70 ROE[avg,max] = [0.302343750, 0.375000000] radices = 36 16 16 32 0 0 0 0 0 0
640 msec/iter = 7.20 ROE[avg,max] = [0.281138393, 0.375000000] radices = 40 16 16 32 0 0 0 0 0 0
768 msec/iter = 8.70 ROE[avg,max] = [0.252845982, 0.296875000] radices = 48 16 16 32 0 0 0 0 0 0
832 msec/iter = 10.20 ROE[avg,max] = [0.299107143, 0.375000000] radices = 52 16 16 32 0 0 0 0 0 0
896 msec/iter = 10.70 ROE[avg,max] = [0.280482701, 0.375000000] radices = 28 16 32 32 0 0 0 0 0 0
960 msec/iter = 11.60 ROE[avg,max] = [0.266210938, 0.312500000] radices = 60 16 16 32 0 0 0 0 0 0
1024 msec/iter = 12.13 ROE[avg,max] = [0.237806920, 0.312500000] radices = 32 16 32 32 0 0 0 0 0 0
1152 msec/iter = 14.00 ROE[avg,max] = [0.277790179, 0.312500000] radices = 36 16 32 32 0 0 0 0 0 0
1280 msec/iter = 15.41 ROE[avg,max] = [0.286830357, 0.343750000] radices = 40 16 32 32 0 0 0 0 0 0
1408 msec/iter = 18.15 ROE[avg,max] = [0.308140346, 0.390625000] radices = 176 16 16 16 0 0 0 0 0 0
1536 msec/iter = 18.38 ROE[avg,max] = [0.254910714, 0.343750000] radices = 48 16 32 32 0 0 0 0 0 0
1664 msec/iter = 21.22 ROE[avg,max] = [0.282310268, 0.343750000] radices = 208 16 16 16 0 0 0 0 0 0
1792 msec/iter = 22.84 ROE[avg,max] = [0.271777344, 0.312500000] radices = 28 32 32 32 0 0 0 0 0 0
1920 msec/iter = 24.30 ROE[avg,max] = [0.296428571, 0.375000000] radices = 60 16 32 32 0 0 0 0 0 0
2048 msec/iter = 25.74 ROE[avg,max] = [0.247865513, 0.312500000] radices = 32 32 32 32 0 0 0 0 0 0
2304 msec/iter = 28.84 ROE[avg,max] = [0.275669643, 0.312500000] radices = 144 16 16 32 0 0 0 0 0 0
2560 msec/iter = 32.86 ROE[avg,max] = [0.300000000, 0.375000000] radices = 40 32 32 32 0 0 0 0 0 0
2816 msec/iter = 36.79 ROE[avg,max] = [0.291238839, 0.343750000] radices = 176 16 16 32 0 0 0 0 0 0
3072 msec/iter = 38.61 ROE[avg,max] = [0.245962960, 0.281250000] radices = 48 32 32 32 0 0 0 0 0 0
3328 msec/iter = 43.28 ROE[avg,max] = [0.284221540, 0.343750000] radices = 208 16 16 32 0 0 0 0 0 0
3584 msec/iter = 46.51 ROE[avg,max] = [0.290764509, 0.343750000] radices = 224 16 16 32 0 0 0 0 0 0
3840 msec/iter = 49.49 ROE[avg,max] = [0.258475167, 0.296875000] radices = 240 16 16 32 0 0 0 0 0 0
4096 msec/iter = 53.52 ROE[avg,max] = [0.284402902, 0.312500000] radices = 16 16 16 16 32 0 0 0 0 0
4608 msec/iter = 59.81 ROE[avg,max] = [0.249079241, 0.281250000] radices = 144 16 32 32 0 0 0 0 0 0
5120 msec/iter = 66.84 ROE[avg,max] = [0.257080078, 0.312500000] radices = 20 16 16 16 32 0 0 0 0 0
5632 msec/iter = 76.25 ROE[avg,max] = [0.282209124, 0.343750000] radices = 176 16 32 32 0 0 0 0 0 0
6144 msec/iter = 79.79 ROE[avg,max] = [0.277678571, 0.312500000] radices = 24 16 16 16 32 0 0 0 0 0
6656 msec/iter = 88.71 ROE[avg,max] = [0.264644950, 0.312500000] radices = 208 16 32 32 0 0 0 0 0 0
7168 msec/iter = 96.36 ROE[avg,max] = [0.294782366, 0.375000000] radices = 224 16 32 32 0 0 0 0 0 0
7680 msec/iter = 103.47 ROE[avg,max] = [0.267142160, 0.312500000] radices = 240 16 32 32 0 0 0 0 0 0
8192 msec/iter = 112.06 ROE[avg,max] = [0.257198661, 0.312500000] radices = 256 16 32 32 0 0 0 0 0 0
9216 msec/iter = 124.63 ROE[avg,max] = [0.293973214, 0.343750000] radices = 36 16 16 16 32 0 0 0 0 0
10240 msec/iter = 136.20 ROE[avg,max] = [0.280468750, 0.375000000] radices = 40 16 16 16 32 0 0 0 0 0
11264 msec/iter = 160.42 ROE[avg,max] = [0.283238002, 0.328125000] radices = 44 16 16 16 32 0 0 0 0 0
12288 msec/iter = 162.44 ROE[avg,max] = [0.261104911, 0.312500000] radices = 48 16 16 16 32 0 0 0 0 0
13312 msec/iter = 188.98 ROE[avg,max] = [0.289564732, 0.343750000] radices = 208 32 32 32 0 0 0 0 0 0
14336 msec/iter = 197.28 ROE[avg,max] = [0.287133789, 0.343750000] radices = 56 16 16 16 32 0 0 0 0 0
15360 msec/iter = 218.04 ROE[avg,max] = [0.262025670, 0.296875000] radices = 60 16 16 16 32 0 0 0 0 0
16384 msec/iter = 232.43 ROE[avg,max] = [0.239365932, 0.281250000] radices = 32 16 16 32 32 0 0 0 0 0
18432 msec/iter = 269.14 ROE[avg,max] = [0.246674456, 0.281250000] radices = 288 32 32 32 0 0 0 0 0 0
20480 msec/iter = 297.32 ROE[avg,max] = [0.325000000, 0.375000000] radices = 40 16 16 32 32 0 0 0 0 0
22528 msec/iter = 345.23 ROE[avg,max] = [0.304185268, 0.367187500] radices = 176 16 16 16 16 0 0 0 0 0
24576 msec/iter = 352.60 ROE[avg,max] = [0.257749721, 0.312500000] radices = 48 16 16 32 32 0 0 0 0 0
26624 msec/iter = 414.03 ROE[avg,max] = [0.284179688, 0.343750000] radices = 52 16 16 32 32 0 0 0 0 0
28672 msec/iter = 420.70 ROE[avg,max] = [0.302594866, 0.343750000] radices = 56 16 16 32 32 0 0 0 0 0
30720 msec/iter = 466.00 ROE[avg,max] = [0.291629464, 0.375000000] radices = 240 16 16 16 16 0 0 0 0 0
32768 msec/iter = 498.50 ROE[avg,max] = [0.267689732, 0.343750000] radices = 128 16 16 16 32 0 0 0 0 0
36864 msec/iter = 535.63 ROE[avg,max] = [0.254352679, 0.312500000] radices = 144 16 16 16 32 0 0 0 0 0
40960 msec/iter = 594.90 ROE[avg,max] = [0.297098214, 0.343750000] radices = 160 16 16 16 32 0 0 0 0 0
45056 msec/iter = 674.78 ROE[avg,max] = [0.299944196, 0.343750000] radices = 176 16 16 16 32 0 0 0 0 0
49152 msec/iter = 784.09 ROE[avg,max] = [0.254603795, 0.281250000] radices = 192 16 16 16 32 0 0 0 0 0
53248 msec/iter = 904.71 ROE[avg,max] = [0.271316964, 0.312500000] radices = 52 16 32 32 32 0 0 0 0 0
57344 msec/iter = 857.03 ROE[avg,max] = [0.319642857, 0.375000000] radices = 224 16 16 16 32 0 0 0 0 0
61440 msec/iter = 897.44 ROE[avg,max] = [0.276255580, 0.312500000] radices = 240 16 16 16 32 0 0 0 0 0
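The two tables are easier to compare as per-FFT-length ratios. A minimal sketch that parses mlucas.cfg-style lines and divides the timings; it assumes the first table is the SSE2 build (as labelled) and the second, unlabelled one is the AVX build, which the post does not state explicitly. Only two rows from each table are embedded here to keep the example short, and the function names are invented for illustration.

```python
import re
from typing import Dict

# Matches "<FFT length in K> msec/iter = <time>" at the start of a cfg line.
CFG_LINE = re.compile(r"^\s*(\d+)\s+msec/iter\s*=\s*([\d.]+)")


def parse_cfg(text: str) -> Dict[int, float]:
    """Map FFT length (in K) to msec/iter; lines without a timing are skipped."""
    timings = {}
    for line in text.splitlines():
        m = CFG_LINE.match(line)
        if m:
            timings[int(m.group(1))] = float(m.group(2))
    return timings


def speedups(slow: Dict[int, float], fast: Dict[int, float]) -> Dict[int, float]:
    """Ratio slow/fast for every FFT length present in both tables."""
    return {k: slow[k] / fast[k] for k in sorted(slow.keys() & fast.keys())}


# Rows copied from the first table above (SSE2, 1 core).
SSE2_SAMPLE = """17.1
2048 msec/iter = 34.00 ROE[avg,max] = [0.246616908, 0.312500000] radices = 32 32 32 32 0 0 0 0 0 0
4096 msec/iter = 69.00 ROE[avg,max] = [0.244712612, 0.281250000] radices = 16 16 16 16 32 0 0 0 0 0
"""

# Rows copied from the second table above (assumed to be the AVX build).
SECOND_SAMPLE = """17.1
2048 msec/iter = 25.74 ROE[avg,max] = [0.247865513, 0.312500000] radices = 32 32 32 32 0 0 0 0 0 0
4096 msec/iter = 53.52 ROE[avg,max] = [0.284402902, 0.312500000] radices = 16 16 16 16 32 0 0 0 0 0
"""

if __name__ == "__main__":
    ratios = speedups(parse_cfg(SSE2_SAMPLE), parse_cfg(SECOND_SAMPLE))
    for fft_k, ratio in ratios.items():
        print(f"{fft_k}K: {ratio:.2f}x")
```

FFT lengths that appear in only one table (for example 144K or 208K) are dropped by the key intersection, so the ratios always compare like with like.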
#87 |
|
∂2ω=0
Sep 2002
República de California
10110111101100₂ Posts |
Thanks, Victor - you clearly spent a lot of time running the full self-test ranges, is this an otherwise-idle AVX system of yours?
The huge-roundoff-errors-in-scalar-build sounds like a bad nearest-int emulation ... if you recompile just a single small file (say br.c) and add -DVERBOSE_HEADERS to the compile command, that will tell you which version of gcc's rint() is being used, e.g. on my Core macbook:
Code:
In file included from ../br.c:23:
In file included from ../Mlucas.h:29:
In file included from ../align.h:29:
../types.h:225:3: warning: #warning Using lrint() for DNINT [-W#warnings]
#warning Using lrint() for DNINT
Last fiddled with by ewmayer on 2018-02-19 at 01:48 |
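For readers unfamiliar with why a nearest-int emulation can go wrong: a common way to round a double without calling rint() is to add and subtract a large constant, which only works if every intermediate value is held in strict IEEE-754 double precision. The sketch below illustrates that general trick and the round-off error it feeds into; it is not claimed to be the code Mlucas uses, and the names are made up.

```python
# 1.5 * 2**52: at this magnitude a double has no fractional bits, so adding
# the constant rounds x to an integer (ties to even) and subtracting it again
# recovers that integer. Extended-precision intermediates or an optimizer
# that cancels the two operations would break the trick.
MAGIC = 1.5 * 2.0**52


def nint_magic(x: float) -> float:
    """Nearest integer via add/subtract of MAGIC; valid for |x| < 2**51."""
    return (x + MAGIC) - MAGIC


def roundoff_error(x: float) -> float:
    """Distance from x to its nearest integer."""
    return abs(x - nint_magic(x))


if __name__ == "__main__":
    for value in (2.5, 3.5, -1.3, 3.3125):
        print(value, nint_magic(value), roundoff_error(value))
```

If the rounding step silently returns its input unchanged, the computed error no longer reflects the true distance to an integer, which is one way a scalar build could report nonsensical round-off values.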
#88 |
|
"Victor de Hollander"
Aug 2011
the Netherlands
3²×131 Posts |
I haven't tried all the FFT sizes with the scalar build, but every one I tried failed. This is when compiling with MINGW64 for Windows, which could introduce some strange behaviour. I wouldn't put it high on the priority list, as the AVX and SSE2 builds work and even an old Pentium 4 has SSE2.
Anyway, I ran the verbose header option on br.c:
Code:
Victor@PCVICTOR MINGW64 /home/mlucas_v17.1-20180123/SCALAR
$ gcc -c -DVERBOSE_HEADERS br.c >& build.log
Code:
In file included from types.h:30:0,
from align.h:29,
from Mlucas.h:29,
from br.c:23:
platform.h:1518:3: warning: #warning platform.h: Defining both X64_ASM and X32_ASM [-Wcpp]
#warning platform.h: Defining both X64_ASM and X32_ASM
^~~~~~~
In file included from align.h:29:0,
from Mlucas.h:29,
from br.c:23:
types.h:225:3: warning: #warning Using lrint() for DNINT [-Wcpp]
#warning Using lrint() for DNINT
^~~~~~~
In file included from imul_macro.h:29:0,
from mi64.h:30,
from Mdata.h:31,
from carry.h:29,
from Mlucas.h:30,
from br.c:23:
imul_macro0.h:309:3: warning: #warning X86_64-type CPU detected [-Wcpp]
#warning X86_64-type CPU detected
^~~~~~~
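With many source files, picking the relevant `#warning` lines out of build.log by eye gets tedious. A small sketch that extracts them from gcc- or clang-style output like the listing above; the regular expression is written against the lines shown in this thread and is an assumption about the log format in general, and the function names are hypothetical.

```python
import re
from typing import List, Tuple

# file:line:col: warning: #warning <message> [-Wcpp]   (the -W tag is optional)
WARNING = re.compile(
    r"^(?P<file>[^:\s]+):(?P<line>\d+):\d+: warning: #warning (?P<msg>.*?)"
    r"(?: \[-W[^\]]*\])?$"
)


def header_warnings(log: str) -> List[Tuple[str, int, str]]:
    """Return (file, line, message) for every '#warning' diagnostic in the log."""
    found = []
    for raw in log.splitlines():
        m = WARNING.match(raw.strip())
        if m:
            found.append((m.group("file"), int(m.group("line")), m.group("msg")))
    return found


def dnint_messages(log: str) -> List[str]:
    """Only the messages that mention the DNINT rounding macro."""
    return [msg for _, _, msg in header_warnings(log) if "DNINT" in msg]


# Lines copied from the build.log excerpt above.
SAMPLE = """In file included from align.h:29:0,
from Mlucas.h:29,
from br.c:23:
types.h:225:3: warning: #warning Using lrint() for DNINT [-Wcpp]
#warning Using lrint() for DNINT
^~~~~~~
imul_macro0.h:309:3: warning: #warning X86_64-type CPU detected [-Wcpp]
"""

if __name__ == "__main__":
    for message in dnint_messages(SAMPLE):
        print(message)
```

On the excerpt above this reports the lrint() line, i.e. the same DNINT choice as on ewmayer's machine, so the rounding macro selection alone does not distinguish the two builds.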