#34 | |
|
Jan 2008
France
2×5²×11 Posts |
Quote:
#35 |
|
Jan 2008
France
1000100110₂ Posts |
For that to succeed, you need this:
Code:
$ diff platform.h~ platform.h
714a715,728
> #elif defined(__AARCH64EL__)
> #ifndef OS_BITS
> #define OS_BITS 32
> #endif
> #define CPU_TYPE
> #define CPU_IS_ARM_EABI
> #if(defined(__GNUC__) || defined(__GNUG__))
> #define COMPILER_TYPE
> #define COMPILER_TYPE_GCC
> #else
> #define COMPILER_TYPE
> #define COMPILER_TYPE_UNKNOWN
> #endif
>
Code:
$ aarch64-none-linux-gnu-gcc -Os -DUSE_THREADS -c *.c
$ aarch64-none-linux-gnu-gcc -Os -DUSE_THREADS *.o -o mlucas64 -lm -lpthread
$ file mlucas64
mlucas64: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, for GNU/Linux 3.7.0, not stripped
Last fiddled with by ldesnogu on 2017-03-13 at 20:38
#36 | ||
|
∂2ω=0
Sep 2002
República de California
103×113 Posts |
Quote:
Code:
#elif(defined(_AIX))
#define OS_TYPE
#define OS_TYPE_AIX
Quote:
You can quick-test the binary by trying some timing runs at a specific FFT length. For example, ./Mlucas -fftlen 1024 -nthread 1 will try all radix combos available @1024K and write the best-timing one to the mlucas.cfg file. You can also play with the threadcount; note that the default there is to try to use all available cores.
#37 | ||
|
Jan 2008
France
2×5²×11 Posts |
Quote:
Quote:
Code:
/work/qemu/qemu/aarch64-linux-user/qemu-aarch64 -L /work/Cross/fsf-6.169/aarch64-none-linux-gnu/libc ./mlucas64 -fftlen 1024 -nthread 1 -iters 1
Mlucas 14.1
http://hogranch.com/mayer/README.html
INFO: testing qfloat routines...
CPU Family = ARM Embedded ABI, OS = Linux, 64-bit Version, compiled with Gnu C [or other compatible], Version 6.3.1 20170118.
INFO: Using inline-macro form of MUL_LOHI64.
INFO: MLUCAS_PATH is set to ""
INFO: using 53-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
INFO: testing IMUL routines...
INFO: System has 4 available processor cores.
INFO: testing FFT radix tables...
#38 |
|
∂2ω=0
Sep 2002
República de California
2D77₁₆ Posts |
Thanks, Laurent - so I suspect a file-encoding issue, with Lorenzo's .h file downloaded from my post, or perhaps his unzip utility inserted a bunch of garbage chars.
#39 | |
|
Aug 2010
Republic of Belarus
2×89 Posts |
Quote:
It's working nicely! Without SIMD optimization the timings look like this:
Code:
ubuntu@pine64:~/Solaris2/mlucas-14.1$ cat mlucas.cfg
14.1
1024 msec/iter = 114.57 ROE[avg,max] = [0.250000000, 0.250000000] radices = 32 32 16 32 0 0 0 0 0 0
1152 msec/iter = 109.04 ROE[avg,max] = [0.206808036, 0.250000000] radices = 288 8 16 16 0 0 0 0 0 0
1280 msec/iter = 133.03 ROE[avg,max] = [0.236600167, 0.281250000] radices = 160 16 16 16 0 0 0 0 0 0
1408 msec/iter = 140.47 ROE[avg,max] = [0.273688616, 0.343750000] radices = 176 16 16 16 0 0 0 0 0 0
1536 msec/iter = 161.30 ROE[avg,max] = [0.223493304, 0.281250000] radices = 192 16 16 16 0 0 0 0 0 0
1664 msec/iter = 166.09 ROE[avg,max] = [0.246149554, 0.312500000] radices = 208 16 16 16 0 0 0 0 0 0
1792 msec/iter = 180.60 ROE[avg,max] = [0.220703125, 0.281250000] radices = 224 16 16 16 0 0 0 0 0 0
1920 msec/iter = 198.81 ROE[avg,max] = [0.222460938, 0.250000000] radices = 240 16 16 16 0 0 0 0 0 0
2048 msec/iter = 206.38 ROE[avg,max] = [0.278125000, 0.281250000] radices = 256 16 16 16 0 0 0 0 0 0
2304 msec/iter = 242.52 ROE[avg,max] = [0.208269392, 0.250000000] radices = 288 16 16 16 0 0 0 0 0 0
2560 msec/iter = 308.94 ROE[avg,max] = [0.243164062, 0.281250000] radices = 160 16 16 32 0 0 0 0 0 0
2816 msec/iter = 329.54 ROE[avg,max] = [0.272896903, 0.343750000] radices = 176 16 16 32 0 0 0 0 0 0
3072 msec/iter = 371.71 ROE[avg,max] = [0.225892857, 0.281250000] radices = 192 16 16 32 0 0 0 0 0 0
3328 msec/iter = 388.66 ROE[avg,max] = [0.241322545, 0.281250000] radices = 208 16 16 32 0 0 0 0 0 0
3584 msec/iter = 414.33 ROE[avg,max] = [0.220870536, 0.250000000] radices = 224 16 16 32 0 0 0 0 0 0
3840 msec/iter = 453.97 ROE[avg,max] = [0.213636998, 0.265625000] radices = 240 16 16 32 0 0 0 0 0 0
4096 msec/iter = 472.52 ROE[avg,max] = [0.247321429, 0.250000000] radices = 256 16 16 32 0 0 0 0 0 0
4608 msec/iter = 544.08 ROE[avg,max] = [0.201870292, 0.222656250] radices = 288 16 16 32 0 0 0 0 0 0
5120 msec/iter = 673.79 ROE[avg,max] = [0.239508929, 0.312500000] radices = 160 16 32 32 0 0 0 0 0 0
5632 msec/iter = 693.38 ROE[avg,max] = [0.278264509, 0.343750000] radices = 176 16 32 32 0 0 0 0 0 0
6144 msec/iter = 776.30 ROE[avg,max] = [0.213504464, 0.250000000] radices = 192 16 32 32 0 0 0 0 0 0
6656 msec/iter = 814.97 ROE[avg,max] = [0.242299107, 0.281250000] radices = 208 16 32 32 0 0 0 0 0 0
7168 msec/iter = 870.94 ROE[avg,max] = [0.219768415, 0.312500000] radices = 224 16 32 32 0 0 0 0 0 0
7680 msec/iter = 955.79 ROE[avg,max] = [0.222209821, 0.250000000] radices = 240 16 32 32 0 0 0 0 0 0
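Each timing line in an mlucas.cfg listing like the one above has a regular layout, so the numbers are easy to pull out for comparison. What follows is only a sketch based on the lines shown in this thread, not on Mlucas's own parser: `parse_cfg_line` and the dictionary keys are made-up names, and the older `sec/iter` / `ROE[min,max]` layout is not handled.

```python
import re

# Pattern for one timing line as it appears in the listing above.
CFG_LINE = re.compile(
    r"^\s*(\d+)\s+msec/iter\s*=\s*([\d.]+)\s+"
    r"ROE\[avg,max\]\s*=\s*\[\s*([\d.]+)\s*,\s*([\d.]+)\s*\]\s+"
    r"radices\s*=\s*(.*)$"
)


def parse_cfg_line(line):
    """Return a dict for a timing line, or None for anything else (e.g. '14.1')."""
    m = CFG_LINE.match(line)
    if m is None:
        return None
    fft_k, msec, roe_avg, roe_max, rest = m.groups()
    radices = []
    for tok in rest.split():
        # The radix list is padded with zeros; stop at the first one.
        if not tok.isdigit() or int(tok) == 0:
            break
        radices.append(int(tok))
    return {
        "fft_k": int(fft_k),
        "msec_per_iter": float(msec),
        "roe_avg": float(roe_avg),
        "roe_max": float(roe_max),
        "radices": radices,
    }


line = ("1024 msec/iter = 114.57 ROE[avg,max] = [0.250000000, 0.250000000] "
        "radices = 32 32 16 32 0 0 0 0 0 0")
rec = parse_cfg_line(line)
print(rec["fft_k"], rec["msec_per_iter"], rec["radices"])
```

For the lines listed here, the product of the nonzero radices is half the FFT length in doubles (e.g. 32·32·16·32 = 524288 = 1024·1024/2), which makes a handy sanity check on a parsed line.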
#40 | |
|
∂2ω=0
Sep 2002
República de California
103×113 Posts |
Quote:
The only timing that really pops out is the anomalously low one @1152K ... but SIMD timings will be the ones of real interest. How many threads did you run your self-test with? (Your screen output will indicate that, e.g. NTHREADS = {some value >= 1}.)
#41 |
|
Aug 2010
Republic of Belarus
2×89 Posts |
The issue was that I had not unzipped the file correctly, so in general it's OK.
I ran ./mlucas -s m, and it looks like Mlucas used all 4 cores (threads) correctly. I haven't played with the thread count yet. In general it is very slow.
Last fiddled with by Lorenzo on 2017-03-14 at 08:19
#42 |
|
∂2ω=0
Sep 2002
República de California
26567₈ Posts |
Yes - even with a 2-3x speedup from use of SIMD, the ARM will be more about performance per watt (and per hardware $) than speed-per-core.
#43 | |
|
Banned
"Luigi"
Aug 2002
Team Italia
3²×5×107 Posts |
Quote:
Code:
2048 sec/iter = 0.134 ROE[min,max] = [0.281250000, 0.343750000] radices = 32 32 32 32 0 0 0 0 0 0 [Any text offset from the list-ending 0 by whitespace is ignored]
2304 sec/iter = 0.148 ROE[min,max] = [0.242187500, 0.281250000] radices = 36 8 16 16 16 0 0 0 0 0
2560 sec/iter = 0.166 ROE[min,max] = [0.281250000, 0.312500000] radices = 40 8 16 16 16 0 0 0 0 0
2816 sec/iter = 0.188 ROE[min,max] = [0.328125000, 0.343750000] radices = 44 8 16 16 16 0 0 0 0 0
3072 sec/iter = 0.222 ROE[min,max] = [0.250000000, 0.250000000] radices = 24 16 16 16 16 0 0 0 0 0
3584 sec/iter = 0.264 ROE[min,max] = [0.281250000, 0.281250000] radices = 28 16 16 16 16 0 0 0 0 0
4096 sec/iter = 0.300 ROE[min,max] = [0.250000000, 0.312500000] radices = 16 16 16 16 32 0 0 0 0 0
Code:
2048 msec/iter = 206.38 ROE[avg,max] = [0.278125000, 0.281250000] radices = 256 16 16 16 0 0 0 0 0 0
2304 msec/iter = 242.52 ROE[avg,max] = [0.208269392, 0.250000000] radices = 288 16 16 16 0 0 0 0 0 0
2560 msec/iter = 308.94 ROE[avg,max] = [0.243164062, 0.281250000] radices = 160 16 16 32 0 0 0 0 0 0
2816 msec/iter = 329.54 ROE[avg,max] = [0.272896903, 0.343750000] radices = 176 16 16 32 0 0 0 0 0 0
3072 msec/iter = 371.71 ROE[avg,max] = [0.225892857, 0.281250000] radices = 192 16 16 32 0 0 0 0 0 0
3328 msec/iter = 388.66 ROE[avg,max] = [0.241322545, 0.281250000] radices = 208 16 16 32 0 0 0 0 0 0
3584 msec/iter = 414.33 ROE[avg,max] = [0.220870536, 0.250000000] radices = 224 16 16 32 0 0 0 0 0 0
3840 msec/iter = 453.97 ROE[avg,max] = [0.213636998, 0.265625000] radices = 240 16 16 32 0 0 0 0 0 0
4096 msec/iter = 472.52 ROE[avg,max] = [0.247321429, 0.250000000] radices = 256 16 16 32 0 0 0 0 0 0
With a 3x SIMD speedup, its efficiency would be about 0.5x that of the Opteron on a per-core comparison, and about 1:1 on a per-core-and-GHz comparison. That is to say, a minicluster of 20 ARM cores would be 20x faster on a per-GHz measurement and 10x faster on a per-core measurement, and would be as cheap as the single Opteron system. Not to mention the energy savings...
Last fiddled with by ET_ on 2017-03-14 at 10:20
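The per-core figure can be checked against the 2048K lines of the two listings in this post. This is only a back-of-the-envelope sketch: it assumes the 0.134 sec/iter line is a single-core Opteron timing and that the ARM timing used all 4 cores, and the 3x SIMD factor is the hypothetical one from the post, not a measurement. The function name is invented for illustration.

```python
opteron_ms = 0.134 * 1000.0  # 2048K line, sec/iter converted to msec/iter
arm_ms = 206.38              # 2048K line of the non-SIMD ARM listing
SIMD_SPEEDUP = 3.0           # assumption taken from the post, not measured
ARM_CORES = 4                # assumption: the ARM run used all 4 cores


def per_core_ratio(ref_ms, arm_ms, speedup, cores):
    """Speed of one ARM core relative to the reference core, after the speedup."""
    arm_simd_ms = arm_ms / speedup            # projected 4-core time with SIMD
    whole_chip_ratio = ref_ms / arm_simd_ms   # > 1 means the ARM board is faster
    return whole_chip_ratio / cores


ratio = per_core_ratio(opteron_ms, arm_ms, SIMD_SPEEDUP, ARM_CORES)
print(f"per-core ratio at 2048K: {ratio:.2f}")  # prints 0.49
```

Under those assumptions the projected ARM core comes out at roughly half an Opteron core, which is where the 0.5x figure comes from.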
#44 |
|
"Victor de Hollander"
Aug 2011
the Netherlands
10010011000₂ Posts |
You got it working, nice!
That is a Pine64 with 4x ARM Cortex-A53 cores (@1.4GHz), right? I'm a little bit surprised it is about as fast as my Odroid-U2 (4x ARM Cortex-A9 cores @1.7GHz), which is only 32-bit and a much older architecture.
http://mersenneforum.org/showpost.ph...5&postcount=94
Code:
1024 msec/iter = 121.70 ROE[avg,max] = [0.298214286, 0.312500000] radices = 128 16 16 16 0 0 0 0 0 0
1152 msec/iter = 142.69 ROE[avg,max] = [0.225310407, 0.250000000] radices = 144 16 16 16 0 0 0 0 0 0
1280 msec/iter = 161.44 ROE[avg,max] = [0.251618304, 0.312500000] radices = 160 16 16 16 0 0 0 0 0 0
1408 msec/iter = 185.52 ROE[avg,max] = [0.297056362, 0.375000000] radices = 176 16 16 16 0 0 0 0 0 0
1536 msec/iter = 195.56 ROE[avg,max] = [0.234742955, 0.312500000] radices = 192 16 16 16 0 0 0 0 0 0
1664 msec/iter = 208.36 ROE[avg,max] = [0.254631696, 0.312500000] radices = 208 16 16 16 0 0 0 0 0 0
1792 msec/iter = 222.32 ROE[avg,max] = [0.234012277, 0.250000000] radices = 224 16 16 16 0 0 0 0 0 0
1920 msec/iter = 243.65 ROE[avg,max] = [0.235016741, 0.281250000] radices = 240 16 16 16 0 0 0 0 0 0
2048 msec/iter = 255.25 ROE[avg,max] = [0.310714286, 0.312500000] radices = 256 16 16 16 0 0 0 0 0 0
2304 msec/iter = 297.26 ROE[avg,max] = [0.228341239, 0.281250000] radices = 288 16 16 16 0 0 0 0 0 0
2560 msec/iter = 339.70 ROE[avg,max] = [0.256682478, 0.312500000] radices = 160 16 16 32 0 0 0 0 0 0
2816 msec/iter = 384.56 ROE[avg,max] = [0.296219308, 0.375000000] radices = 176 16 16 32 0 0 0 0 0 0
3072 msec/iter = 413.85 ROE[avg,max] = [0.239704241, 0.281250000] radices = 192 16 16 32 0 0 0 0 0 0
3584 msec/iter = 370.28 ROE[avg,max] = [0.231487165, 0.281250000] radices = 224 16 16 32 0 0 0 0 0 0
4096 msec/iter = 455.10 ROE[avg,max] = [0.282142857, 0.312500000] radices = 128 16 32 32 0 0 0 0 0 0
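To put a number on "about as fast", the msec/iter values can be compared at FFT lengths that appear in both listings. A small sketch, with three values copied from the Pine64 and Odroid-U2 listings in this thread; the thread count of the Odroid-U2 run is not stated here, so treat the comparison as rough. The helper name `slowdown` is invented for illustration.

```python
# msec/iter copied from the two mlucas.cfg listings in this thread
pine64 = {1024: 114.57, 2048: 206.38, 4096: 472.52}
odroid_u2 = {1024: 121.70, 2048: 255.25, 4096: 455.10}


def slowdown(other, reference):
    """Ratio other/reference of msec/iter for each FFT length present in both."""
    shared = sorted(other.keys() & reference.keys())
    return {k: other[k] / reference[k] for k in shared}


ratios = slowdown(odroid_u2, pine64)
for fft_k, r in ratios.items():
    # A ratio above 1 means the Odroid-U2 needs more time per iteration.
    print(f"{fft_k}K: Odroid-U2 / Pine64 = {r:.2f}")
```

For these three lengths the ratios are 1.06, 1.24 and 0.96, so the two boards are within about 25% of each other.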
So I dusted off the machine and also ran Mlucas:
Intel Core2Duo E7400 @2.8GHz
NTHREADS = 1
Code:
14.1
1024 msec/iter = 33.76 ROE[avg,max] = [0.264564732, 0.265625000] radices = 32 32 16 32 0 0 0 0 0 0
1152 msec/iter = 40.30 ROE[avg,max] = [0.237220982, 0.273437500] radices = 36 16 32 32 0 0 0 0 0 0
1280 msec/iter = 45.42 ROE[avg,max] = [0.251841518, 0.296875000] radices = 40 16 32 32 0 0 0 0 0 0
1408 msec/iter = 52.31 ROE[avg,max] = [0.285110910, 0.375000000] radices = 44 16 32 32 0 0 0 0 0 0
1536 msec/iter = 53.31 ROE[avg,max] = [0.239299665, 0.281250000] radices = 24 32 32 32 0 0 0 0 0 0
1664 msec/iter = 61.81 ROE[avg,max] = [0.261802455, 0.312500000] radices = 52 16 32 32 0 0 0 0 0 0
1792 msec/iter = 65.81 ROE[avg,max] = [0.267229353, 0.312500000] radices = 28 32 32 32 0 0 0 0 0 0
1920 msec/iter = 70.98 ROE[avg,max] = [0.243638393, 0.281250000] radices = 60 16 32 32 0 0 0 0 0 0
2048 msec/iter = 71.88 ROE[avg,max] = [0.257366071, 0.257812500] radices = 32 32 32 32 0 0 0 0 0 0
2304 msec/iter = 81.60 ROE[avg,max] = [0.236948940, 0.281250000] radices = 36 32 32 32 0 0 0 0 0 0
2560 msec/iter = 90.96 ROE[avg,max] = [0.255691964, 0.312500000] radices = 40 32 32 32 0 0 0 0 0 0
2816 msec/iter = 102.69 ROE[avg,max] = [0.283956473, 0.343750000] radices = 44 32 32 32 0 0 0 0 0 0
3072 msec/iter = 112.85 ROE[avg,max] = [0.233879743, 0.265625000] radices = 48 32 32 32 0 0 0 0 0 0
3328 msec/iter = 123.71 ROE[avg,max] = [0.267947824, 0.312500000] radices = 52 32 32 32 0 0 0 0 0 0
3584 msec/iter = 135.08 ROE[avg,max] = [0.267689732, 0.301757812] radices = 56 32 32 32 0 0 0 0 0 0
3840 msec/iter = 144.52 ROE[avg,max] = [0.242107282, 0.281250000] radices = 60 32 32 32 0 0 0 0 0 0
4096 msec/iter = 154.69 ROE[avg,max] = [0.263169643, 0.281250000] radices = 64 32 32 32 0 0 0 0 0 0
4608 msec/iter = 177.26 ROE[avg,max] = [0.236798968, 0.281250000] radices = 36 16 16 16 16 0 0 0 0 0
5120 msec/iter = 201.17 ROE[avg,max] = [0.257240513, 0.312500000] radices = 40 16 16 16 16 0 0 0 0 0
5632 msec/iter = 224.76 ROE[avg,max] = [0.291057478, 0.375000000] radices = 44 16 16 16 16 0 0 0 0 0
6144 msec/iter = 244.47 ROE[avg,max] = [0.233741978, 0.265625000] radices = 48 16 16 16 16 0 0 0 0 0
6656 msec/iter = 271.08 ROE[avg,max] = [0.264965820, 0.312500000] radices = 52 16 16 16 16 0 0 0 0 0
7168 msec/iter = 292.72 ROE[avg,max] = [0.274094936, 0.312500000] radices = 56 16 16 16 16 0 0 0 0 0
7680 msec/iter = 312.74 ROE[avg,max] = [0.249065290, 0.290039062] radices = 60 16 16 16 16 0 0 0 0 0
Code:
14.1
1024 msec/iter = 21.01 ROE[avg,max] = [0.273214286, 0.281250000] radices = 32 16 32 32 0 0 0 0 0 0
1152 msec/iter = 25.43 ROE[avg,max] = [0.237220982, 0.273437500] radices = 36 16 32 32 0 0 0 0 0 0
1280 msec/iter = 28.85 ROE[avg,max] = [0.259319196, 0.312500000] radices = 20 32 32 32 0 0 0 0 0 0
1408 msec/iter = 35.14 ROE[avg,max] = [0.280566406, 0.343750000] radices = 176 16 16 16 0 0 0 0 0 0
1536 msec/iter = 33.98 ROE[avg,max] = [0.239299665, 0.281250000] radices = 24 32 32 32 0 0 0 0 0 0
1664 msec/iter = 38.98 ROE[avg,max] = [0.261802455, 0.312500000] radices = 52 16 32 32 0 0 0 0 0 0
1792 msec/iter = 40.84 ROE[avg,max] = [0.267229353, 0.312500000] radices = 28 32 32 32 0 0 0 0 0 0
1920 msec/iter = 45.63 ROE[avg,max] = [0.243638393, 0.281250000] radices = 60 16 32 32 0 0 0 0 0 0
2048 msec/iter = 45.92 ROE[avg,max] = [0.257366071, 0.257812500] radices = 32 32 32 32 0 0 0 0 0 0
2304 msec/iter = 54.36 ROE[avg,max] = [0.236948940, 0.281250000] radices = 36 32 32 32 0 0 0 0 0 0
2560 msec/iter = 54.64 ROE[avg,max] = [0.255691964, 0.312500000] radices = 40 32 32 32 0 0 0 0 0 0
2816 msec/iter = 63.06 ROE[avg,max] = [0.283956473, 0.343750000] radices = 44 32 32 32 0 0 0 0 0 0
3072 msec/iter = 67.77 ROE[avg,max] = [0.233879743, 0.265625000] radices = 48 32 32 32 0 0 0 0 0 0
3328 msec/iter = 74.36 ROE[avg,max] = [0.267947824, 0.312500000] radices = 52 32 32 32 0 0 0 0 0 0
3584 msec/iter = 79.71 ROE[avg,max] = [0.267689732, 0.301757812] radices = 56 32 32 32 0 0 0 0 0 0
3840 msec/iter = 87.04 ROE[avg,max] = [0.242107282, 0.281250000] radices = 60 32 32 32 0 0 0 0 0 0
4096 msec/iter = 92.87 ROE[avg,max] = [0.263169643, 0.281250000] radices = 64 32 32 32 0 0 0 0 0 0
4608 msec/iter = 106.31 ROE[avg,max] = [0.238187081, 0.281250000] radices = 288 16 16 32 0 0 0 0 0 0
5120 msec/iter = 116.95 ROE[avg,max] = [0.241458566, 0.312500000] radices = 160 16 32 32 0 0 0 0 0 0
5632 msec/iter = 147.80 ROE[avg,max] = [0.278641183, 0.312500000] radices = 176 16 32 32 0 0 0 0 0 0
6144 msec/iter = 150.32 ROE[avg,max] = [0.247349330, 0.281250000] radices = 192 16 32 32 0 0 0 0 0 0
6656 msec/iter = 164.51 ROE[avg,max] = [0.250781250, 0.289062500] radices = 208 16 32 32 0 0 0 0 0 0
7168 msec/iter = 172.77 ROE[avg,max] = [0.277169364, 0.343750000] radices = 224 16 32 32 0 0 0 0 0 0
7680 msec/iter = 191.50 ROE[avg,max] = [0.253627232, 0.281250000] radices = 240 16 32 32 0 0 0 0 0 0
Code:
[Tue Mar 14 19:28:48 2017]
Compare your results to other computers at http://www.mersenne.org/report_benchmarks
Intel(R) Core(TM)2 Duo CPU E7400 @ 2.80GHz
CPU speed: 2800.02 MHz, 2 cores
CPU features: Prefetch, SSE, SSE2, SSE4
L1 cache size: 32 KB
L2 cache size: 3 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 256
Prime95 64-bit version 28.7, RdtscTiming=1
Best time for 1024K FFT length: 16.199 ms., avg: 16.704 ms.
Best time for 1280K FFT length: 20.961 ms., avg: 21.575 ms.
Best time for 1536K FFT length: 26.163 ms., avg: 27.718 ms.
Best time for 1792K FFT length: 30.755 ms., avg: 32.141 ms.
Best time for 2048K FFT length: 34.946 ms., avg: 38.731 ms.
Best time for 2560K FFT length: 43.191 ms., avg: 46.909 ms.
Best time for 3072K FFT length: 53.965 ms., avg: 59.120 ms.
Best time for 3584K FFT length: 69.864 ms., avg: 83.959 ms.
Best time for 4096K FFT length: 71.973 ms., avg: 72.495 ms.
Best time for 5120K FFT length: 87.800 ms., avg: 88.870 ms.
Best time for 6144K FFT length: 110.473 ms., avg: 111.362 ms.
Best time for 7168K FFT length: 131.831 ms., avg: 132.743 ms.
Best time for 8192K FFT length: 146.812 ms., avg: 147.631 ms.
Timing FFTs using 2 threads.
Best time for 1024K FFT length: 15.401 ms., avg: 15.644 ms.
Best time for 1280K FFT length: 18.143 ms., avg: 19.026 ms.
Best time for 1536K FFT length: 21.927 ms., avg: 22.995 ms.
Best time for 1792K FFT length: 26.605 ms., avg: 27.481 ms.
Best time for 2048K FFT length: 30.460 ms., avg: 31.351 ms.
Best time for 2560K FFT length: 38.699 ms., avg: 39.689 ms.
Best time for 3072K FFT length: 47.988 ms., avg: 49.353 ms.
Best time for 3584K FFT length: 85.181 ms., avg: 85.865 ms.
Best time for 4096K FFT length: 62.209 ms., avg: 66.705 ms.
Best time for 5120K FFT length: 79.554 ms., avg: 80.260 ms.
Best time for 6144K FFT length: 92.489 ms., avg: 94.000 ms.
Best time for 7168K FFT length: 116.309 ms., avg: 119.709 ms.
Best time for 8192K FFT length: 125.236 ms., avg: 128.261 ms.
Timings for 1024K FFT length (1 cpu, 1 worker): 16.37 ms. Throughput: 61.08 iter/sec.
Timings for 1024K FFT length (2 cpus, 2 workers): 30.59, 31.69 ms. Throughput: 64.25 iter/sec.
Timings for 1280K FFT length (1 cpu, 1 worker): 21.24 ms. Throughput: 47.07 iter/sec.
Timings for 1280K FFT length (2 cpus, 2 workers): 37.86, 39.14 ms. Throughput: 51.96 iter/sec.
Timings for 1536K FFT length (1 cpu, 1 worker): 26.08 ms. Throughput: 38.34 iter/sec.
Timings for 1536K FFT length (2 cpus, 2 workers): 45.43, 47.68 ms. Throughput: 42.99 iter/sec.
Timings for 1792K FFT length (1 cpu, 1 worker): 31.05 ms. Throughput: 32.21 iter/sec.
Timings for 1792K FFT length (2 cpus, 2 workers): 52.50, 53.32 ms. Throughput: 37.81 iter/sec.
Timings for 2048K FFT length (1 cpu, 1 worker): 35.05 ms. Throughput: 28.53 iter/sec.
Timings for 2048K FFT length (2 cpus, 2 workers): 61.40, 63.17 ms. Throughput: 32.12 iter/sec.
Timings for 2560K FFT length (1 cpu, 1 worker): 43.36 ms. Throughput: 23.06 iter/sec.
Timings for 2560K FFT length (2 cpus, 2 workers): 77.50, 79.16 ms. Throughput: 25.54 iter/sec.
Timings for 3072K FFT length (1 cpu, 1 worker): 53.71 ms. Throughput: 18.62 iter/sec.
Timings for 3072K FFT length (2 cpus, 2 workers): 96.11, 97.25 ms. Throughput: 20.69 iter/sec.
Timings for 3584K FFT length (1 cpu, 1 worker): 67.86 ms. Throughput: 14.74 iter/sec.
Timings for 3584K FFT length (2 cpus, 2 workers): 164.50, 169.02 ms. Throughput: 12.00 iter/sec.
Timings for 4096K FFT length (1 cpu, 1 worker): 71.87 ms. Throughput: 13.91 iter/sec.
[Tue Mar 14 19:33:59 2017]
Timings for 4096K FFT length (2 cpus, 2 workers): 127.57, 128.14 ms. Throughput: 15.64 iter/sec.
Timings for 5120K FFT length (1 cpu, 1 worker): 87.87 ms. Throughput: 11.38 iter/sec.
Timings for 5120K FFT length (2 cpus, 2 workers): 153.62, 158.10 ms. Throughput: 12.83 iter/sec.
Timings for 6144K FFT length (1 cpu, 1 worker): 110.52 ms. Throughput: 9.05 iter/sec.
Timings for 6144K FFT length (2 cpus, 2 workers): 187.40, 186.73 ms. Throughput: 10.69 iter/sec.
Timings for 7168K FFT length (1 cpu, 1 worker): 132.18 ms. Throughput: 7.57 iter/sec.
Timings for 7168K FFT length (2 cpus, 2 workers): 236.89, 243.20 ms. Throughput: 8.33 iter/sec.
Timings for 8192K FFT length (1 cpu, 1 worker): 151.83 ms. Throughput: 6.59 iter/sec.
Timings for 8192K FFT length (2 cpus, 2 workers): 263.17, 260.16 ms. Throughput: 7.64 iter/sec.
Last fiddled with by VictordeHolland on 2017-03-14 at 18:50 Reason: Mlucas on Windows????
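The throughput figures in the Prime95 output above appear consistent with summing each worker's iterations per second, i.e. 1000 divided by its ms/iter. The sketch below is a reading of the printed numbers, not Prime95's actual code; `throughput` is an invented name, and small differences against the printed values come from the ms figures being rounded.

```python
def throughput(ms_per_iter_by_worker):
    """Total iterations per second across workers running side by side."""
    return sum(1000.0 / ms for ms in ms_per_iter_by_worker)


one_worker = throughput([16.37])          # 1024K, 1 cpu, 1 worker
two_workers = throughput([30.59, 31.69])  # 1024K, 2 cpus, 2 workers
gain = two_workers / one_worker

print(f"1 worker:  {one_worker:.2f} iter/sec")
print(f"2 workers: {two_workers:.2f} iter/sec")
print(f"gain from the second core: {gain:.2f}x")
```

At 1024K the second worker adds only about 5% total throughput on this Core2Duo, since each worker runs at nearly half the single-worker speed.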