#133
∂²ω=0
Sep 2002
República de California
2²×2,939 Posts
I sent e-mail to Hardkernel last night:
Quote:
Might be useful to post a link to this thread and the performance data (SIMD-versus-not, A53-versus-A57, etc.) folks have posted here. Any suggestions as to which subforum at the above site would be most appropriate for such a thread are welcome; perhaps the Projects subforum?
My little C2 continues to steadily crunch away, currently ~25% through its first GIMPS double-check assignment.

#134
"Composite as Heck"
Oct 2017
2×5²×19 Posts
There's an A53 board currently looking for crowdfunding on indiegogo, which looks interesting compared to similar boards because of the RAM: https://www.indiegogo.com/projects/r...android-linux/
DDR4-2133 in the Renegade vs. LPDDR2-900 in a Pi 3: could this yield an appreciable difference in Mlucas? Failing that, is there some other prime-related use for a low-cost device with faster RAM? There are 3 options for RAM: 1, 2 and 4 GB.
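A back-of-envelope way to compare the two memory systems is theoretical peak bandwidth = transfer rate × bus width. The sketch below is only illustrative: it assumes a 32-bit memory bus on both boards and reads "900" as 900 MT/s, neither of which is confirmed above. It also bounds the time for one streaming pass over a 1024K FFT array of 8-byte doubles (8 MiB).

```c
#include <assert.h>
#include <stdio.h>

/* Theoretical peak bandwidth in GB/s (1e9 bytes per second):
 * transfers per second times bytes moved per transfer.
 *   mts      : data rate in mega-transfers/s (2133 for DDR4-2133)
 *   bus_bits : memory bus width in bits (ASSUMED, not verified for either board) */
double peak_gbps(double mts, unsigned bus_bits)
{
    assert(mts > 0.0 && bus_bits > 0 && bus_bits % 8 == 0);
    return mts * 1e6 * (double)(bus_bits / 8) / 1e9;
}

/* Lower bound, in ms, on the time to stream `bytes` once at `gbps` GB/s. */
double stream_ms(double bytes, double gbps)
{
    assert(gbps > 0.0);
    return bytes / (gbps * 1e9) * 1e3;
}

void compare_boards(void)
{
    const double fft_bytes = 1048576.0 * 8.0;    /* 1024K doubles = 8 MiB */
    const double ddr4   = peak_gbps(2133.0, 32); /* hypothetical 32-bit bus */
    const double lpddr2 = peak_gbps(900.0, 32);  /* hypothetical 32-bit bus, 900 MT/s */

    printf("DDR4-2133 : %5.2f GB/s peak, >= %.2f ms per 8 MiB pass\n",
           ddr4, stream_ms(fft_bytes, ddr4));
    printf("LPDDR2-900: %5.2f GB/s peak, >= %.2f ms per 8 MiB pass\n",
           lpddr2, stream_ms(fft_bytes, lpddr2));
}
```

Under those assumptions the paper ratio is about 2.4x. An LL iteration reads and writes the residue array several times and sustained bandwidth sits well below peak, so how much of that ratio Mlucas would actually see is exactly the open question.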

#135
∂²ω=0
Sep 2002
República de California
2²×2,939 Posts
Quote:

#136
"Composite as Heck"
Oct 2017
2×5²×19 Posts
Ok, worth a shot. I pledged for a 1GB Renegade, and am now prepping to bench a pi3 in armv8 mode (64 bit kernel).
Compiled on the Pi 3 with:
Code:
gcc -march=armv8-a -mtune=cortex-a53 -mcpu=cortex-a53 -c -O3 -DUSE_ARM_V8_SIMD -DUSE_THREADS ../src/*.c >& build.log
Code:
Mlucas 17.1
http://hogranch.com/mayer/README.html
INFO: using 53-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
INFO: testing FFT radix tables...
Mlucas selftest running.....
/****************************************************************************/
INFO: Unable to find/open mlucas.cfg file in r+ mode ... creating from scratch.
NTHREADS = 4
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 1024 16 32
radix16_dif_dit_pass pfetch_dist = 32
radix16_wrapper_square: pfetch_dist = 1024
Code:
-march=armv8-a -mtune=cortex-a53 -mcpu=cortex-a53
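When a SIMD build is suspect, one cheap sanity check is to have the preprocessor confirm the target before any inline asm gets compiled. The sketch below relies on GCC's predefined `__aarch64__` and the Arm C Language Extensions feature macro `__ARM_NEON`; `USE_ARM_V8_SIMD` is the flag from the build line above, but the guard itself is hypothetical and not something Mlucas is known to contain.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical guard: refuse to compile an ARMv8 SIMD build for any other target. */
#if defined(USE_ARM_V8_SIMD) && !(defined(__aarch64__) && defined(__ARM_NEON))
#error "USE_ARM_V8_SIMD requires an AArch64 target with Advanced SIMD (NEON)"
#endif

/* Report whether this translation unit was compiled with -DUSE_ARM_V8_SIMD. */
int simd_build_requested(void)
{
#ifdef USE_ARM_V8_SIMD
    return 1;
#else
    return 0;
#endif
}

/* Print the target the compiler was generating code for. */
void print_build_target(void)
{
    const char *arch = "not AArch64";
#if defined(__aarch64__) && defined(__ARM_NEON)
    arch = "AArch64 with Advanced SIMD";
#elif defined(__aarch64__)
    arch = "AArch64 without Advanced SIMD";
#endif
    printf("target: %s, USE_ARM_V8_SIMD %s\n",
           arch, simd_build_requested() ? "defined" : "not defined");
}
```

Compiling this file with `-DUSE_ARM_V8_SIMD` on x86-64 stops at the `#error`; on an AArch64 GCC it should pass, since Advanced SIMD is part of the default armv8-a target.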

#137
∂²ω=0
Sep 2002
República de California
2²·2,939 Posts
@M344587487:
I also tried builds on my Odroid with and without the extra -march flags; using them actually gave a consistently 1-2% slower binary, so I reverted to the basic -O3.

Suggest firing up the SIMD-enabled binary and running the self-test under gdb ('run -s m') to try to localize the crash. If, on crashing under gdb, 'where' gives a specific function, you can rebuild the file containing it with some debugging enabled to further localize things. E.g. if 'where' indicates (say) the radix1024_ditN_cy_dif1 function, here is what I would do if the crash were on my system:
Code:
ctrl-z                                                  (pause gdb)
gcc -c -O0 -g3 -ggdb ../src/radix1024_ditN_cy_dif1.c   (rebuild just that file with no opts and debug symbols enabled)
gcc -o Mlucas *.o -lm -lpthread -lrt && fg             (relink and go back into gdb)
run -s m
...and if the crash again occurs in the same function, it should now allow you to zero in on a specific line number, probably one with a SIMD-asm macro invocation. If it does not recur, you'd need to redo the above with -O1 to see if that minimal opt level allows you to reproduce the crash.

Two further questions:
1. What version of gcc are you using?
2. What does cat /proc/cpuinfo say re. the CPU(s) on the system in question?
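When running under gdb is inconvenient, a common alternative (not part of Mlucas, and not what is suggested above) is to have the program print its own backtrace when it takes SIGSEGV. The sketch below uses glibc's `backtrace()` and `backtrace_symbols_fd()` from `<execinfo.h>`, so it is Linux/glibc-specific; function names appear only for symbols visible to the dynamic linker (link with `-rdynamic`), otherwise you get raw addresses.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* On SIGSEGV, print the raw call stack to stderr, then re-raise the signal so the
 * process still terminates (SA_RESETHAND below has restored the default action).
 * backtrace() is not formally async-signal-safe: a debugging aid, not production code. */
static void segv_backtrace_handler(int sig)
{
    static const char msg[] = "SIGSEGV caught, backtrace follows:\n";
    void *frames[64];
    int n = backtrace(frames, 64);

    if (write(STDERR_FILENO, msg, sizeof msg - 1) < 0) {
        /* nothing useful to do if stderr is gone */
    }
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    raise(sig);   /* pending until the handler returns, then default action */
}

/* Install the handler for the whole process (all threads share signal dispositions).
 * Returns 0 on success, -1 on failure, like sigaction(). */
int install_segv_backtrace(void)
{
    struct sigaction sa;
    void *warmup[1];

    /* The first backtrace() call may load libgcc (and malloc); do it now,
     * not inside the handler after memory may already be corrupted. */
    backtrace(warmup, 1);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = segv_backtrace_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;   /* one-shot: default action restored on entry */
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Call `install_segv_backtrace()` early in `main()`. On a `-g` build of a position-dependent executable, `addr2line -f -e <binary> <address>` maps the printed addresses to function and line; PIE builds need the load offset subtracted first.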

#138
"Composite as Heck"
Oct 2017
2·5²·19 Posts
Running this distribution: https://github.com/bamarni/pi64
/proc/cpuinfo:
Code:
processor : 0
BogoMIPS : 38.40
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4

processor : 1
BogoMIPS : 38.40
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4

processor : 2
BogoMIPS : 38.40
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4

processor : 3
BogoMIPS : 38.40
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4
Code:
gcc (Debian 6.3.0-18) 6.3.0 20170516
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
'where' indicated radix32_wrapper_square:
Code:
(gdb) run -s m
Starting program: /home/pi/mlucas/mlucas_v17.1/mlucas -s m
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
Mlucas 17.1
http://hogranch.com/mayer/README.html
INFO: testing qfloat routines...
CPU Family = ARM Embedded ABI, OS = Linux, 64-bit Version, compiled with Gnu C [or other compatible], Version 6.3.0 20170516.
INFO: Build uses ARMv8 advanced-SIMD instruction set.
INFO: Using inline-macro form of MUL_LOHI64.
INFO: MLUCAS_PATH is set to ""
INFO: using 53-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
Setting DAT_BITS = 10, PAD_BITS = 2
INFO: testing IMUL routines...
INFO: System has 4 available processor cores.
INFO: testing FFT radix tables...
Mlucas selftest running.....
/****************************************************************************/
INFO: Unable to find/open mlucas.cfg file in r+ mode ... creating from scratch.
No CPU set or threadcount specified ... running single-threaded.
Set affinity for the following 1 cores: 0.
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 1024 16 32
[New Thread 0x7fb3c97200 (LWP 1224)]
mers_mod_square: Init threadpool of 1 threads
radix16_dif_dit_pass pfetch_dist = 32
radix16_wrapper_square: pfetch_dist = 1024
Thread 2 "mlucas" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fb3c97200 (LWP 1224)]
0x000000010016bf68 in radix32_wrapper_square ()
(gdb) where
#0 0x000000010016bf68 in radix32_wrapper_square ()
#1 0x000000010004e8c0 in mers_process_chunk ()
#2 0x000000010022b4e4 in worker_thr_routine ()
#3 0x0000007fb7f020a0 in start_thread (arg=0x10022b288 <worker_thr_routine>) at pthread_create.c:335
#4 0x0000007fb7e61edc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:77
(gdb)
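The `/proc/cpuinfo` dump above can also be checked programmatically. In the sketch below, `has_feature()` looks for a whole word such as `asimd` in a `Features` line, and `arm_part_name()` maps the `CPU part` field for implementer 0x41 (Arm Ltd) to a core name. Both helper names are ours, and the part table lists only a few cores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if `flag` appears as a whole word in a /proc/cpuinfo "Features" line
 * (e.g. "Features : fp asimd evtstrm crc32 cpuid"), else 0. */
int has_feature(const char *features_line, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = strchr(features_line, ':');

    assert(len > 0);
    p = p ? p + 1 : features_line;           /* skip the "Features :" label if present */
    while ((p = strstr(p, flag)) != NULL) {
        int starts = p == features_line || p[-1] == ' ' || p[-1] == '\t' || p[-1] == ':';
        int ends = p[len] == '\0' || p[len] == ' ' || p[len] == '\t' || p[len] == '\n';
        if (starts && ends)
            return 1;
        p += len;                             /* partial match, e.g. "simd" inside "asimd" */
    }
    return 0;
}

/* Map the "CPU part" field for implementer 0x41 (Arm Ltd) to a core name.
 * Only a few parts are listed; unknown values return NULL. */
const char *arm_part_name(unsigned part)
{
    switch (part) {
    case 0xd03: return "Cortex-A53";
    case 0xd05: return "Cortex-A55";
    case 0xd07: return "Cortex-A57";
    case 0xd08: return "Cortex-A72";
    default:    return NULL;
    }
}
```

Applied to the dump above, all four processors report part 0xd03 (Cortex-A53) with `asimd` in `Features`, so the hardware does advertise the Advanced SIMD instructions the SIMD build needs.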
Code:
M20000047 Roundoff warning on iteration 5, maxerr = 0.406358212233
M20000047 Roundoff warning on iteration 6, maxerr = 0.484222412109
FATAL ERROR...Halting test of exponent 20000047
***** Excessive level of roundoff error detected - this radix set will not be used. *****
NTHREADS = 1
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 1024 32 16
[New Thread 0x7fb27e6200 (LWP 1389)]
mers_mod_square: Init threadpool of 1 threads
Thread 4 "mlucas" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fb27e6200 (LWP 1389)]
0x00000001000f2aac in radix16_wrapper_square ()
Code:
-march=armv8-a -mtune=cortex-a53 -mcpu=cortex-a53 -O3
Code:
Starting program: /home/pi/mlucas/mlucas_v17.1/obj/mlucas -s m
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
Mlucas 17.1
http://hogranch.com/mayer/README.html
INFO: testing qfloat routines...
CPU Family = ARM Embedded ABI, OS = Linux, 64-bit Version, compiled with Gnu C [or other compatible], Version 6.3.0 20170516.
INFO: Build uses ARMv8 advanced-SIMD instruction set.
INFO: Using inline-macro form of MUL_LOHI64.
INFO: MLUCAS_PATH is set to ""
INFO: using 53-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
Setting DAT_BITS = 10, PAD_BITS = 2
INFO: testing IMUL routines...
INFO: System has 4 available processor cores.
INFO: testing FFT radix tables...
Mlucas selftest running.....
/****************************************************************************/
INFO: Unable to find/open mlucas.cfg file in r+ mode ... creating from scratch.
No CPU set or threadcount specified ... running single-threaded.
Set affinity for the following 1 cores: 0.
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 1024 16 32
[New Thread 0x7fb3c97200 (LWP 1645)]
mers_mod_square: Init threadpool of 1 threads
radix16_dif_dit_pass pfetch_dist = 32
[New Thread 0x7fb3080200 (LWP 1646)]
Using 1 threads in carry step
M20000047 Roundoff warning on iteration 5, maxerr = 0.406358203385
M20000047 Roundoff warning on iteration 6, maxerr = 0.484222412109
FATAL ERROR...Halting test of exponent 20000047
***** Excessive level of roundoff error detected - this radix set will not be used. *****
NTHREADS = 1
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 1024 32 16
[New Thread 0x7fb285e200 (LWP 1647)]
mers_mod_square: Init threadpool of 1 threads
M20000047 Roundoff warning on iteration 5, maxerr = 0.406358212698
M20000047 Roundoff warning on iteration 6, maxerr = 0.484222412109
FATAL ERROR...Halting test of exponent 20000047
***** Excessive level of roundoff error detected - this radix set will not be used. *****
NTHREADS = 1
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 256 8 16 16
[New Thread 0x7fb205e200 (LWP 1648)]
mers_mod_square: Init threadpool of 1 threads
[New Thread 0x7fb185e200 (LWP 1649)]
Using 1 threads in carry step
M20000047 Roundoff warning on iteration 5, maxerr = 0.406358212698
M20000047 Roundoff warning on iteration 6, maxerr = 0.484222412109
FATAL ERROR...Halting test of exponent 20000047
***** Excessive level of roundoff error detected - this radix set will not be used. *****
NTHREADS = 1
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 128 16 16 16
[New Thread 0x7fb105e200 (LWP 1650)]
mers_mod_square: Init threadpool of 1 threads
[New Thread 0x7fb085e200 (LWP 1651)]
Using 1 threads in carry step
M20000047 Roundoff warning on iteration 5, maxerr = 0.406358212698
M20000047 Roundoff warning on iteration 6, maxerr = 0.484222412109
FATAL ERROR...Halting test of exponent 20000047
***** Excessive level of roundoff error detected - this radix set will not be used. *****
NTHREADS = 1
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 64 16 16 32
[New Thread 0x7fb005e200 (LWP 1652)]
mers_mod_square: Init threadpool of 1 threads
[New Thread 0x7faf85e200 (LWP 1653)]
Using 1 threads in carry step
M20000047 Roundoff warning on iteration 5, maxerr = 0.406358203385
M20000047 Roundoff warning on iteration 6, maxerr = 0.484222412109
FATAL ERROR...Halting test of exponent 20000047
***** Excessive level of roundoff error detected - this radix set will not be used. *****
NTHREADS = 1
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 64 32 16 16
[New Thread 0x7faf05e200 (LWP 1654)]
mers_mod_square: Init threadpool of 1 threads
M20000047 Roundoff warning on iteration 5, maxerr = 0.406358212698
M20000047 Roundoff warning on iteration 6, maxerr = 0.484222412109
FATAL ERROR...Halting test of exponent 20000047
***** Excessive level of roundoff error detected - this radix set will not be used. *****
NTHREADS = 1
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 64 8 8 8 16
[New Thread 0x7fae85e200 (LWP 1655)]
mers_mod_square: Init threadpool of 1 threads
M20000047 Roundoff warning on iteration 5, maxerr = 0.406358212698
M20000047 Roundoff warning on iteration 6, maxerr = 0.484222412109
FATAL ERROR...Halting test of exponent 20000047
***** Excessive level of roundoff error detected - this radix set will not be used. *****
NTHREADS = 1
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 32 16 32 32
[New Thread 0x7fae05e200 (LWP 1656)]
mers_mod_square: Init threadpool of 1 threads
[New Thread 0x7fad85e200 (LWP 1657)]
Using 1 threads in carry step
M20000047 Roundoff warning on iteration 5, maxerr = 0.406358203385
M20000047 Roundoff warning on iteration 6, maxerr = 0.484222412109
FATAL ERROR...Halting test of exponent 20000047
***** Excessive level of roundoff error detected - this radix set will not be used. *****
NTHREADS = 1
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 32 32 32 16
[New Thread 0x7fad05e200 (LWP 1658)]
mers_mod_square: Init threadpool of 1 threads
M20000047 Roundoff warning on iteration 5, maxerr = 0.406358212698
M20000047 Roundoff warning on iteration 6, maxerr = 0.484222412109
FATAL ERROR...Halting test of exponent 20000047
***** Excessive level of roundoff error detected - this radix set will not be used. *****
NTHREADS = 1
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 32 8 8 16 16
[New Thread 0x7fac85e200 (LWP 1660)]
mers_mod_square: Init threadpool of 1 threads
M20000047 Roundoff warning on iteration 5, maxerr = 0.406358212698
M20000047 Roundoff warning on iteration 6, maxerr = 0.484222412109
FATAL ERROR...Halting test of exponent 20000047
***** Excessive level of roundoff error detected - this radix set will not be used. *****
NTHREADS = 1
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 16 32 32 32
[New Thread 0x7fac05e200 (LWP 1661)]
mers_mod_square: Init threadpool of 1 threads
[New Thread 0x7fab85e200 (LWP 1662)]
Using 1 threads in carry step
M20000047 Roundoff warning on iteration 5, maxerr = 0.406358203385
M20000047 Roundoff warning on iteration 6, maxerr = 0.484222412109
FATAL ERROR...Halting test of exponent 20000047
***** Excessive level of roundoff error detected - this radix set will not be used. *****
...
I'm now rebuilding from scratch without the -march, -mtune or -mcpu flags, just in case that's the problem. Could it be that the distribution I'm using is unsuitable? If this clean build fails, I guess the next step is to try a different distro. Thanks for walking me through the debugging steps; I've barely used gdb before, so it helped a lot.

edit: The clean build segfaults at radix32 as before:
Code:
pi@raspberrypi:~/mlucas/mlucas_v17.1/clean$ gdb mlucas
GNU gdb (Debian 7.12-6) 7.12.0.20161007-git
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "aarch64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from mlucas...(no debugging symbols found)...done.
(gdb) run -s m
Starting program: /home/pi/mlucas/mlucas_v17.1/clean/mlucas -s m
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
Mlucas 17.1
http://hogranch.com/mayer/README.html
INFO: testing qfloat routines...
CPU Family = ARM Embedded ABI, OS = Linux, 64-bit Version, compiled with Gnu C [or other compatible], Version 6.3.0 20170516.
INFO: Build uses ARMv8 advanced-SIMD instruction set.
INFO: Using inline-macro form of MUL_LOHI64.
INFO: MLUCAS_PATH is set to ""
INFO: using 53-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
Setting DAT_BITS = 10, PAD_BITS = 2
INFO: testing IMUL routines...
INFO: System has 4 available processor cores.
INFO: testing FFT radix tables...
Mlucas selftest running.....
/****************************************************************************/
INFO: Unable to find/open mlucas.cfg file in r+ mode ... creating from scratch.
No CPU set or threadcount specified ... running single-threaded.
Set affinity for the following 1 cores: 0.
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 1024 16 32
[New Thread 0x7fb3c97200 (LWP 2237)]
mers_mod_square: Init threadpool of 1 threads
radix16_dif_dit_pass pfetch_dist = 32
radix16_wrapper_square: pfetch_dist = 1024
Thread 2 "mlucas" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fb3c97200 (LWP 2237)]
0x000000010016ca4c in radix32_wrapper_square ()
(gdb) where
#0 0x000000010016ca4c in radix32_wrapper_square ()
#1 0x000000010004f7d8 in mers_process_chunk ()
#2 0x000000010022ac34 in worker_thr_routine ()
#3 0x0000007fb7f020a0 in start_thread (arg=0x10022a9d8 <worker_thr_routine>) at pthread_create.c:335
#4 0x0000007fb7e61edc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:77
(gdb) quit
Last fiddled with by M344587487 on 2017-12-12 at 11:33
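The "Roundoff warning ... maxerr" lines and the "DNINT emulation" message both concern rounding the floating-point convolution outputs back to integers. A standard library-free way to do that, and what the "53-bit-significand form of floating-double rounding constant" message appears to refer to, is to add and subtract 0.75 × 2^53: for |x| < 2^51 under round-to-nearest, the sum lands where adjacent doubles are exactly 1 apart, so the fraction is rounded away. The sketch uses that trick to compute the largest distance from an integer over an array, the quantity a roundoff-error (ROE) check monitors. It illustrates the idea and is not Mlucas's code.

```c
#include <assert.h>
#include <stddef.h>

/* 0.75 * 2^53 = 3 * 2^51. Adding it to |x| < 2^51 puts the sum in [2^52, 2^53),
 * where the spacing of doubles is exactly 1, so round-to-nearest discards the
 * fraction; subtracting it back is then exact.
 * Assumes IEEE-754 doubles, the default rounding mode, and no -ffast-math. */
static const double RND_CONST = 6755399441055744.0;

/* Nearest-integer rounding (ties to even) without calling libm. */
double dnint_emul(double x)
{
    volatile double t = x + RND_CONST;  /* volatile: force rounding to a 53-bit double */
    return t - RND_CONST;
}

/* Largest |x - nearest_integer(x)| over the array, i.e. the roundoff error (ROE).
 * 0 means every value was an exact integer; values approaching 0.5 mean the
 * correct integer can no longer be identified reliably. */
double max_roundoff(const double *a, size_t n)
{
    double maxerr = 0.0;

    for (size_t i = 0; i < n; i++) {
        double err = a[i] - dnint_emul(a[i]);
        if (err < 0.0)
            err = -err;
        if (err > maxerr)
            maxerr = err;
    }
    return maxerr;
}
```

This is why a radix set whose maxerr climbs toward 0.5 (0.484 in the logs above) gets rejected: at that point a rounding error could silently flip an output to the wrong integer. The exact threshold Mlucas uses is not stated here.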

#139
"Composite as Heck"
Oct 2017
3B6₁₆ Posts
pi3b scalar 4 thread stock (A53 @ 1.2 GHz) Pi64 distro: https://github.com/bamarni/pi64
Code:
17.1
1024 msec/iter = 74.14 ROE[avg,max] = [0.262276786, 0.312500000] radices = 256 8 16 16 0 0 0 0 0 0
1152 msec/iter = 90.16 ROE[avg,max] = [0.206633650, 0.250000000] radices = 288 8 16 16 0 0 0 0 0 0
1280 msec/iter = 111.18 ROE[avg,max] = [0.222712054, 0.250000000] radices = 160 16 16 16 0 0 0 0 0 0
1408 msec/iter = 118.85 ROE[avg,max] = [0.228299386, 0.250000000] radices = 176 16 16 16 0 0 0 0 0 0
1536 msec/iter = 133.57 ROE[avg,max] = [0.234375000, 0.312500000] radices = 192 16 16 16 0 0 0 0 0 0
1664 msec/iter = 137.73 ROE[avg,max] = [0.229310826, 0.281250000] radices = 208 16 16 16 0 0 0 0 0 0
1792 msec/iter = 146.71 ROE[avg,max] = [0.221177455, 0.281250000] radices = 224 16 16 16 0 0 0 0 0 0
1920 msec/iter = 164.82 ROE[avg,max] = [0.258203125, 0.312500000] radices = 240 16 16 16 0 0 0 0 0 0
2048 msec/iter = 190.78 ROE[avg,max] = [0.216552734, 0.250000000] radices = 128 32 16 16 0 0 0 0 0 0
2304 msec/iter = 199.57 ROE[avg,max] = [0.254799107, 0.312500000] radices = 288 16 16 16 0 0 0 0 0 0
2560 msec/iter = 261.56 ROE[avg,max] = [0.302678571, 0.375000000] radices = 160 16 16 32 0 0 0 0 0 0
2816 msec/iter = 276.59 ROE[avg,max] = [0.265848214, 0.312500000] radices = 176 32 16 16 0 0 0 0 0 0
3072 msec/iter = 423.48 ROE[avg,max] = [0.260714286, 0.312500000] radices = 768 8 16 16 0 0 0 0 0 0
3328 msec/iter = 576.40 ROE[avg,max] = [0.316964286, 0.375000000] radices = 208 8 8 8 16 0 0 0 0 0
3584 msec/iter = 606.78 ROE[avg,max] = [0.227008929, 0.281250000] radices = 224 8 8 8 16 0 0 0 0 0
3840 msec/iter = 522.76 ROE[avg,max] = [0.227008929, 0.281250000] radices = 60 32 32 32 0 0 0 0 0 0
4096 msec/iter = 570.49 ROE[avg,max] = [0.260937500, 0.281250000] radices = 64 32 32 32 0 0 0 0 0 0
4608 msec/iter = 629.13 ROE[avg,max] = [0.226729911, 0.265625000] radices = 144 16 32 32 0 0 0 0 0 0
5120 msec/iter = 718.28 ROE[avg,max] = [0.248325893, 0.312500000] radices = 160 16 32 32 0 0 0 0 0 0
5632 msec/iter = 775.04 ROE[avg,max] = [0.300000000, 0.343750000] radices = 44 16 16 16 16 0 0 0 0 0
6144 msec/iter = 860.31 ROE[avg,max] = [0.238113839, 0.281250000] radices = 192 16 32 32 0 0 0 0 0 0
6656 msec/iter = 867.53 ROE[avg,max] = [0.303348214, 0.375000000] radices = 208 16 32 32 0 0 0 0 0 0
7168 msec/iter = 1302.08 ROE[avg,max] = [0.310044643, 0.375000000] radices = 224 32 32 16 0 0 0 0 0 0
7680 msec/iter = 1186.41 ROE[avg,max] = [0.232700893, 0.281250000] radices = 240 8 8 16 16 0 0 0 0 0
Last fiddled with by M344587487 on 2017-12-12 at 17:38
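The msec/iter column converts directly into wall-clock time for an assignment. A Lucas-Lehmer test of M(p) needs p - 2 modular squarings, so time ≈ (p - 2) × msec/iter. The sketch also reproduces the "bits per digit" figure from the self-test logs, which is just p divided by the FFT length (20000047 / 1048576 ≈ 19.07). Which FFT length a given exponent actually uses is Mlucas's decision; here it is supplied by hand, taken from a later self-test log in this thread that runs M39397201 at 2048K.

```c
#include <assert.h>
#include <stdio.h>

/* Average number of bits of p carried by each FFT word: p / N.
 * (Matches the "bits per digit" line in the self-test logs.) */
double bits_per_digit(unsigned long p, unsigned long fft_len)
{
    assert(fft_len > 0);
    return (double)p / (double)fft_len;
}

/* Estimated wall-clock days for a Lucas-Lehmer test of M(p):
 * p - 2 modular squarings, each costing msec_per_iter milliseconds. */
double ll_days(unsigned long p, double msec_per_iter)
{
    assert(p > 2 && msec_per_iter > 0.0);
    return (double)(p - 2) * msec_per_iter / 1000.0 / 86400.0;
}

void example(void)
{
    /* M39397201 at FFT length 2048K, using the 2048K row of the scalar table above. */
    const unsigned long p = 39397201UL;

    printf("M%lu: %.3f bits/digit, about %.0f days at 190.78 msec/iter\n",
           p, bits_per_digit(p, 2048UL * 1024UL), ll_days(p, 190.78));
}
```

The estimate assumes the per-iteration time measured during the short self-test holds for the whole run and that the board is otherwise idle.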

#140
∂²ω=0
Sep 2002
República de California
2²·2,939 Posts
@M344587487: Thanks for the data. From the general symptomology, my suspicion is a bad SIMD compile. Can you get hold of a newer GCC version for your distro? Tom Womack hit a bad build using his default-installed GCC 4.6, but was able to build the SIMD code fine using GCC 7.2. Trying GCC 7 is the first thing we should do, because debugging the nonreproducible crashes you describe from your various compile attempts sounds like a nightmare.
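Since the suspicion is a miscompile by an older GCC, a build can at least report, or warn about, the compiler version that produced it. GCC defines `__GNUC__`, `__GNUC_MINOR__` and `__GNUC_PATCHLEVEL__`; clang defines `__GNUC__` too, so it is excluded explicitly. The 7.0 cutoff below is a hypothetical policy based only on the anecdotal experience described here, not a known-good boundary.

```c
#include <assert.h>
#include <stdio.h>

/* Encode major.minor.patch as one comparable integer: 6.3.0 -> 60300. */
#define VERSION_CODE(maj, min, pat) ((maj) * 10000 + (min) * 100 + (pat))

/* clang also defines __GNUC__ (as 4), so exclude it explicitly. */
#if defined(__GNUC__) && !defined(__clang__)
#  define BUILD_GCC_VERSION VERSION_CODE(__GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__)
#else
#  define BUILD_GCC_VERSION 0
#endif

/* Hypothetical policy: flag ARMv8 SIMD builds made with GCC older than 7.0. */
#if defined(USE_ARM_V8_SIMD) && BUILD_GCC_VERSION > 0 && BUILD_GCC_VERSION < VERSION_CODE(7, 0, 0)
#  warning "ARMv8 SIMD build with GCC < 7: suspect the compiler if the self-test crashes"
#endif

/* Runtime form of the same comparison, on major/minor only. */
int version_at_least(int maj, int min, int req_maj, int req_min)
{
    assert(maj >= 0 && min >= 0);
    return maj > req_maj || (maj == req_maj && min >= req_min);
}

/* Print the GCC version this file was compiled with, if any. */
void report_compiler(void)
{
    if (BUILD_GCC_VERSION > 0)
        printf("built with GCC %d.%d.%d\n", BUILD_GCC_VERSION / 10000,
               BUILD_GCC_VERSION / 100 % 100, BUILD_GCC_VERSION % 100);
    else
        printf("not built with GCC\n");
}
```

A warning rather than an `#error` seems the right strength here, since an older GCC is only a suspect, not a proven cause.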

#141
"Composite as Heck"
Oct 2017
3B6₁₆ Posts
I changed to the distro ET_ was using for his benchmarks, and the SIMD tests completed. The power consumption bounced between 4.5W and 5.5W, but seemed much more stable around 5.0W than the scalar test was:
pi3b simd 4 thread stock (A53 @ 1.2 GHz) gentoo 64 bit, gcc 6.4.0: https://github.com/sakaki-/gentoo-on-rpi3-64bit
Code:
17.1
1024 msec/iter = 55.98 ROE[avg,max] = [0.254687500, 0.312500000] radices = 256 8 16 16 0 0 0 0 0 0
1152 msec/iter = 63.23 ROE[avg,max] = [0.223256138, 0.281250000] radices = 144 16 16 16 0 0 0 0 0 0
1280 msec/iter = 68.89 ROE[avg,max] = [0.264508929, 0.343750000] radices = 160 16 16 16 0 0 0 0 0 0
1408 msec/iter = 80.33 ROE[avg,max] = [0.227343750, 0.265625000] radices = 176 16 16 16 0 0 0 0 0 0
1536 msec/iter = 88.89 ROE[avg,max] = [0.254241071, 0.312500000] radices = 192 16 16 16 0 0 0 0 0 0
1664 msec/iter = 97.32 ROE[avg,max] = [0.270758929, 0.312500000] radices = 208 16 16 16 0 0 0 0 0 0
1792 msec/iter = 105.75 ROE[avg,max] = [0.220532663, 0.250000000] radices = 224 16 16 16 0 0 0 0 0 0
1920 msec/iter = 116.11 ROE[avg,max] = [0.257756696, 0.312500000] radices = 240 16 16 16 0 0 0 0 0 0
2048 msec/iter = 123.62 ROE[avg,max] = [0.236921038, 0.281250000] radices = 256 16 16 16 0 0 0 0 0 0
2304 msec/iter = 140.70 ROE[avg,max] = [0.248751395, 0.312500000] radices = 288 16 16 16 0 0 0 0 0 0
2560 msec/iter = 162.87 ROE[avg,max] = [0.236908831, 0.312500000] radices = 160 32 16 16 0 0 0 0 0 0
2816 msec/iter = 186.90 ROE[avg,max] = [0.262500000, 0.312500000] radices = 176 32 16 16 0 0 0 0 0 0
3072 msec/iter = 205.46 ROE[avg,max] = [0.262111119, 0.312500000] radices = 192 32 16 16 0 0 0 0 0 0
3328 msec/iter = 224.56 ROE[avg,max] = [0.281250000, 0.375000000] radices = 208 32 16 16 0 0 0 0 0 0
3584 msec/iter = 248.33 ROE[avg,max] = [0.252343750, 0.312500000] radices = 224 32 16 16 0 0 0 0 0 0
3840 msec/iter = 278.88 ROE[avg,max] = [0.248437500, 0.343750000] radices = 240 32 16 16 0 0 0 0 0 0
4096 msec/iter = 305.09 ROE[avg,max] = [0.229129464, 0.281250000] radices = 256 16 16 32 0 0 0 0 0 0
4608 msec/iter = 359.20 ROE[avg,max] = [0.258928571, 0.312500000] radices = 144 32 32 16 0 0 0 0 0 0
5120 msec/iter = 389.39 ROE[avg,max] = [0.237137277, 0.281250000] radices = 160 32 32 16 0 0 0 0 0 0
5632 msec/iter = 459.98 ROE[avg,max] = [0.256919643, 0.312500000] radices = 176 32 32 16 0 0 0 0 0 0
6144 msec/iter = 499.97 ROE[avg,max] = [0.246651786, 0.281250000] radices = 192 32 32 16 0 0 0 0 0 0
6656 msec/iter = 556.30 ROE[avg,max] = [0.262500000, 0.312500000] radices = 208 32 32 16 0 0 0 0 0 0
7168 msec/iter = 594.89 ROE[avg,max] = [0.224874442, 0.281250000] radices = 224 32 32 16 0 0 0 0 0 0
7680 msec/iter = 645.15 ROE[avg,max] = [0.237053571, 0.281250000] radices = 240 32 32 16 0 0 0 0 0 0
It's not all roses, though: I tried to do a scalar self-test on this distro, and it aborted with a stack smash:
Code:
NTHREADS = 4
M39397201: using FFT length 2048K = 2097152 8-byte floats.
this gives an average 18.786049365997314 bits per digit
Using complex FFT radices 1024 32 32
mers_mod_square: Init threadpool of 4 threads
M39397201 Roundoff warning on iteration 60, maxerr = 0.500000000000
FATAL ERROR...Halting test of exponent 39397201
***** Excessive level of roundoff error detected - this radix set will not be used. *****
NTHREADS = 4
M39397201: using FFT length 2048K = 2097152 8-byte floats.
this gives an average 18.786049365997314 bits per digit
Using complex FFT radices 256 16 16 16
mers_mod_square: Init threadpool of 4 threads
*** stack smashing detected ***: ./mlucas terminated
Aborted
I'm now attempting to upgrade gcc on Gentoo. I've never used emerge/Gentoo before, so Google to the rescue, maybe.
Last fiddled with by M344587487 on 2017-12-12 at 22:47

#142
∂²ω=0
Sep 2002
República de California
2²·2,939 Posts
Quote:
No idea what's causing your stack-smash problems with the scalar-double build under that distro, but overall the scalar build should not stress the processor more than the SIMD one, unless it's a local-functional-unit-stress issue *and* the processor in question uses different parts of the silicon for, say, scalar and SIMD arithmetic hardware. While that is the case on x86, with its weird mix of legacy 80-bit FP register arithmetic and IEEE64-compliant SIMD-double math, it would be very surprising to see on an architecture designed to be lean, mean and low-power like ARM, for which both scalar and vector arithmetic are IEEE64-compliant.

Interestingly, on the same-or-different-functional-units-for-scalar-and-SIMD-math theme, only a few hours before reading your gdb-trial-and-error post above, I had sent the following PM to fivemack regarding the possible origins of the mere 1.5x SIMD speedup for ARMv8, compared to the 2.5-3x gain I get over scalar-double from using 128-bit SIMD (SSE2) on my Core2:
Quote:
[1] Half as many instructions needed to process the total dataset, i.e. better use of icache;
[2] Hand-rolled ASM better at using registers and FMA.
But since the number of arithmetic functional units is no greater for SIMD than for scalar, that limits the gains to the instruction side only, and the maximum possible arithmetic throughput is the same for both build modes.
Last fiddled with by ewmayer on 2017-12-13 at 01:00
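The SIMD-versus-scalar speedup on the Pi 3 can be read straight off the two mlucas.cfg tables posted earlier (scalar on the Pi64 distro, SIMD on gentoo, both 4 threads on a stock 1.2 GHz A53). The sketch simply divides msec/iter at matching FFT lengths; note the two runs also differ in distro and GCC version, so the ratios mix SIMD gains with toolchain differences.

```c
#include <assert.h>
#include <stdio.h>

/* msec/iter at matching FFT lengths, copied from the scalar (Pi64) and
 * SIMD (gentoo) cfg tables above; both 4 threads, stock 1.2 GHz A53. */
static const struct {
    int fft_k;
    double scalar_ms, simd_ms;
} rows[] = {
    {1024,   74.14,  55.98},
    {2048,  190.78, 123.62},
    {3072,  423.48, 205.46},
    {4096,  570.49, 305.09},
    {6144,  860.31, 499.97},
    {7680, 1186.41, 645.15},
};

/* SIMD speedup over scalar: ratio of per-iteration times. */
double speedup(double scalar_ms, double simd_ms)
{
    assert(scalar_ms > 0.0 && simd_ms > 0.0);
    return scalar_ms / simd_ms;
}

/* Print one speedup line per FFT length. */
void print_speedups(void)
{
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++)
        printf("%5dK  %.2fx\n", rows[i].fft_k,
               speedup(rows[i].scalar_ms, rows[i].simd_ms));
}
```

For these rows the ratio runs from about 1.3x at 1024K to about 2.1x at 3072K, so the "mere 1.5x" is a rough middle value rather than a constant.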

#143
Banned
"Luigi"
Aug 2002
Team Italia
5·7·139 Posts
How hard would it be to give Mlucas the PRP capabilities added to mprime in the last month? I am asking because PRP-C workunits are quite small (exponents between 3M and 6M) and would be a wonderful task for small Berries.
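For reference, a probable-prime (PRP) test in its simplest form is a Fermat test: N is a base-a PRP if a^(N-1) ≡ 1 (mod N). Base 2 is useless for Mersenne numbers, since 2^p ≡ 1 (mod M(p)) and p divides M(p) - 1 for odd prime p, so every such M(p) passes; base 3 is the usual choice. The sketch below handles only 64-bit N, using the GCC/Clang `unsigned __int128` extension for the modular multiply; a real Mersenne or cofactor PRP test performs the same kind of repeated squaring on multi-million-bit numbers with FFT multiplication.

```c
#include <assert.h>
#include <stdint.h>

/* (a * b) mod m without overflow, using the GCC/Clang unsigned __int128 extension. */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m)
{
    return (uint64_t)(((unsigned __int128)a * b) % m);
}

/* a^e mod m by right-to-left binary exponentiation (square-and-multiply). */
static uint64_t powmod(uint64_t a, uint64_t e, uint64_t m)
{
    uint64_t result = 1 % m;

    a %= m;
    while (e > 0) {
        if (e & 1)
            result = mulmod(result, a, m);
        a = mulmod(a, a, m);
        e >>= 1;
    }
    return result;
}

/* Base-3 Fermat probable-prime test: 1 if 3^(n-1) == 1 (mod n), else 0.
 * Requires odd n > 3 (for n = 3 the base itself is 0 mod n). */
int is_prp3(uint64_t n)
{
    assert(n > 3 && (n & 1));
    return powmod(3, n - 1, n) == 1;
}
```

`is_prp3(2147483647)` (M31, prime) returns 1 and `is_prp3(2047)` (M11 = 23 × 89) returns 0, but 91 = 7 × 13 also returns 1 because 3^6 ≡ 1 (mod 91), which is why a pass only means "probable" prime.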