#23
∂²ω=0
Sep 2002
República de California
103·113 Posts
Quote:

#24
∂²ω=0
Sep 2002
República de California
103×113 Posts
V14.1 is available - details via the readme-file link in the opening post.

#25
Romulan Interpreter
Jun 2011
Thailand
7×1,373 Posts
How does the newer version compare with P95? I mean, I have read your "less than two times slower" stuff there, but I assume that is a figure of speech...
(hey, I am the guy who DC-ed Mike's work, remember?)

#26
∂²ω=0
Sep 2002
República de California
103·113 Posts
Quote:
Code:
FFT(K)  msec/iter (4-threaded)
------  ---------
1024     2.65
1152     3.15
1280     3.43
1408     4.01
1536     4.19
1664     4.61
1792     4.81
1920     5.29
2048     5.35
2304     6.07
2560     6.51
2816     7.54
3072     8.40
3328     8.74
3584     9.13
3840    10.16
4096    10.54
4608    11.98
5120    13.80
5632    15.92
6144    17.54
6656    18.62
7168    19.69
7680    22.00

#27
Jan 2008
France
2×5²×11 Posts
For comparison, http://mersenneforum.org/showpost.ph...&postcount=633
i5-4670K @ 3.8 GHz, Dual DDR3 1600 Code:
Best time for 1024K FFT length: 1.336 ms., avg: 1.374 ms.
Best time for 1280K FFT length: 1.839 ms., avg: 1.865 ms.
Best time for 1536K FFT length: 2.333 ms., avg: 2.370 ms.
Best time for 1792K FFT length: 2.833 ms., avg: 3.277 ms.
Best time for 2048K FFT length: 3.350 ms., avg: 3.374 ms.
Best time for 2560K FFT length: 4.239 ms., avg: 4.276 ms.
Best time for 3072K FFT length: 5.124 ms., avg: 5.155 ms.
Best time for 3584K FFT length: 6.006 ms., avg: 6.042 ms.
Best time for 4096K FFT length: 6.970 ms., avg: 7.000 ms.
Best time for 5120K FFT length: 8.705 ms., avg: 8.745 ms.
Best time for 6144K FFT length: 10.496 ms., avg: 10.543 ms.
Best time for 7168K FFT length: 12.371 ms., avg: 12.451 ms.
Best time for 8192K FFT length: 14.673 ms., avg: 14.735 ms.

#28
∂²ω=0
Sep 2002
República de California
2D77₁₆ Posts
Quote:
BTW, if anyone has access to a Broadwell system running Linux (or MinGW64 under Windoze), I'd very much appreciate timings on such a system, and I have some special preprocessor flags to try for Broadwell as well.

#29
Jan 2008
France
2×5²×11 Posts
Quote:
Quote:

#30
Jan 2008
France
2·5²·11 Posts
I gave Mlucas a try on my i7-4770K.
Code:
gcc -c -Os -m64 -DUSE_AVX2 -DUSE_THREADS *.c
rm -f rng*.o util.o qfloat.o
gcc -c -O1 -m64 -DUSE_AVX2 -DUSE_THREADS rng*.c util.c qfloat.c
gcc -o Mlucas *.o -lm -lpthread -lrt
./Mlucas -fftlen 192 -iters 100 -radset 0 -nthread 2
...
100 iterations of M3888517 with FFT length 196608 = 192 K
Res64: 579D593FCE0707B2. AvgMaxErr = 0.274916295. MaxErr = 0.343750000. Program: E14.1
Res mod 2^36 = 67881076658
Res mod 2^35 - 1 = 21674900403
Res mod 2^36 - 1 = 42893438228
Code:
This particular testcase should produce the following 100-iteration residues, with some platform-dependent variability in the roundoff errors:
100 iterations of M3888509 with FFT length 196608 = 192 K
Res64: 71E61322CCFB396C. AvgMaxErr = 0.226967076. MaxErr = 0.281250000. Program: E3.0x
Res mod 2^36 = 12028950892
Res mod 2^35 - 1 = 29259839105
Res mod 2^36 - 1 = 50741070790

How do you get an output similar to the Prime95 benchmark?

#31
∂²ω=0
Sep 2002
República de California
103·113 Posts
Quote:
If you run
./Mlucas -m 3888509 -fftlen 192 -iters 100 -radset 0 -nthread 2
you will see the result indicated on the webpage (which I have since corrected). Thanks for the catch.
Quote:
./Mlucas -s m -iters 1000
1000 iterations give cleaner timings (and better roundoff testing) than the "quick look" 100-iteration tests. With no thread count specified, the code will use all the physical cores on your system. The README page discusses all this stuff.
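A note on reading the residue lines in the 192K outputs above: "Res mod 2^36" is simply the low 36 bits of the full residue, so it must agree with the low 9 hex digits of Res64 (for the corrected M3888509 case, 0x2CCFB396C = 12028950892). The sketch below illustrates this on toy exponents. It assumes the residue is the Lucas-Lehmer sequence s₀ = 4, s → s² − 2 (mod 2^p − 1), and it uses plain Python big-integer arithmetic in place of Mlucas's FFT-based multiply, so it is only practical for small p; the helper names are hypothetical and not part of Mlucas.

```python
# Toy illustration of the residue checksums Mlucas prints (Res64, Res mod 2^36, ...).
# Assumption: the residue is the Lucas-Lehmer sequence s_0 = 4, s_{k+1} = s_k^2 - 2
# (mod M_p = 2^p - 1). Python big ints stand in for Mlucas's FFT multiply, so this
# is only practical for small exponents p. All names here are hypothetical.

MASK64 = (1 << 64) - 1
MASK36 = (1 << 36) - 1


def ll_residue(p: int, iters: int) -> int:
    """Return the Lucas-Lehmer residue after `iters` iterations, modulo 2^p - 1."""
    m = (1 << p) - 1
    s = 4
    for _ in range(iters):
        s = (s * s - 2) % m  # Python's % keeps the result non-negative
    return s


def residue_checksums(res: int) -> dict:
    """The four checksums, in the same form as the Mlucas output lines."""
    return {
        "Res64": format(res & MASK64, "016X"),  # low 64 bits, uppercase hex
        "mod 2^36": res & MASK36,               # same as res % 2**36
        "mod 2^35 - 1": res % ((1 << 35) - 1),
        "mod 2^36 - 1": res % MASK36,           # MASK36 == 2**36 - 1
    }


def low36_consistent(res64_hex: str, mod_2_36: int) -> bool:
    """Res mod 2^36 must equal the low 36 bits (9 hex digits) of Res64."""
    return (int(res64_hex, 16) & MASK36) == mod_2_36


if __name__ == "__main__":
    # M13 = 8191 is prime, so the residue after p - 2 = 11 iterations is 0.
    print(residue_checksums(ll_residue(13, 11)))
    # M11 = 2047 = 23 * 89 is composite, so its residue after 9 iterations is nonzero.
    print(residue_checksums(ll_residue(11, 9)))
    # Cross-check the two 192K outputs quoted in this thread.
    print(low36_consistent("579D593FCE0707B2", 67881076658))  # M3888517 run
    print(low36_consistent("71E61322CCFB396C", 12028950892))  # M3888509 webpage
```

The same check is a quick way to spot a transcription error when comparing residues by hand: if the low 9 hex digits of Res64 do not reproduce the "Res mod 2^36" value, one of the two numbers was copied wrongly.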

#32
∂²ω=0
Sep 2002
República de California
103·113 Posts
Quote:
Quote:
Here are 4-threaded results for my Haswell system:
[Worker #1 Dec 19 16:21] Timing FFTs using 4 threads.
[Worker #1 Dec 19 16:21] Timing 39 iterations of 1024K FFT length. Best time: 1.293 ms., avg time: 1.344 ms.
[Worker #1 Dec 19 16:21] Timing 31 iterations of 1280K FFT length. Best time: 1.825 ms., avg time: 1.850 ms.
[Worker #1 Dec 19 16:21] Timing 26 iterations of 1536K FFT length. Best time: 1.993 ms., avg time: 2.305 ms.
[Worker #1 Dec 19 16:21] Timing 25 iterations of 1792K FFT length. Best time: 2.317 ms., avg time: 2.356 ms.
[Worker #1 Dec 19 16:21] Timing 25 iterations of 2048K FFT length. Best time: 2.766 ms., avg time: 2.785 ms.
[Worker #1 Dec 19 16:21] Timing 25 iterations of 2560K FFT length. Best time: 3.462 ms., avg time: 3.500 ms.
[Worker #1 Dec 19 16:21] Timing 25 iterations of 3072K FFT length. Best time: 4.141 ms., avg time: 4.190 ms.
[Worker #1 Dec 19 16:21] Timing 25 iterations of 3584K FFT length. Best time: 4.957 ms., avg time: 5.009 ms.
[Worker #1 Dec 19 16:21] Timing 25 iterations of 4096K FFT length. Best time: 5.639 ms., avg time: 5.722 ms.
[Worker #1 Dec 19 16:21] Timing 25 iterations of 5120K FFT length. Best time: 7.151 ms., avg time: 7.202 ms.
[Worker #1 Dec 19 16:21] Timing 25 iterations of 6144K FFT length. Best time: 8.471 ms., avg time: 8.639 ms.
[Worker #1 Dec 19 16:21] Timing 25 iterations of 7168K FFT length. Best time: 10.197 ms., avg time: 10.272 ms.
[Worker #1 Dec 19 16:21] Timing 25 iterations of 8192K FFT length. Best time: 11.917 ms., avg time: 11.952 ms.
Now assembling the average times for 4-threaded Prime95 and Mlucas at the above FFT lengths, plus the intermediate radix-9/11/13/15-based lengths supported by Mlucas (this updates the previous table, now using 10000-iteration timings run after a reboot, right after which I ran the above Prime95 timing test), and adding the resulting [Mlucas/Prime95] timing ratio. Where an FFT length is not supported by Prime95, its timing at the next-higher length is used as the denominator: Code:
FFTlen  Prime95    Mlucas     Timing Ratio
(Kdbl)  msec/iter  msec/iter  [Mlucas/P95]
------  ---------  ---------  ------------
1024    1.344       2.60      1.93
1152                3.13      1.69
1280    1.850       3.56      1.92
1408                3.98      1.73
1536    2.305       4.02      1.74
1664                4.63      1.97
1792    2.356       4.70      1.99
1920                5.29      1.90
2048    2.785       5.29      1.90
2304                6.00      1.71
2560    3.500       6.44      1.84
2816                7.47      1.78
3072    4.190       8.25      1.97
3328                8.84      1.76
3584    5.009       9.02      1.80
3840               10.06      1.76
4096    5.722      10.46      1.83
4608               11.78      1.64
5120    7.202      13.47      1.87
5632               15.52      1.80
6144    8.639      17.40      2.01
6656               18.48      1.80
7168    10.272     19.02      1.85
7680               21.49      1.80
8192    11.952     22.33      1.87
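The "next-higher length" rule used for the Haswell ratio column can be written out in a few lines. This is a minimal sketch, assuming the timings are exactly the averages listed in the Haswell table (FFT length in K doubles, msec/iter); the dictionary and function names are illustrative only and not from either program.

```python
# Sketch of the [Mlucas/P95] ratio rule: if Prime95 has no timing at a given
# FFT length, divide by its timing at the smallest larger length it does support.
# Timings (avg msec/iter, 4-threaded Haswell) are copied from the table above.

PRIME95 = {1024: 1.344, 1280: 1.850, 1536: 2.305, 1792: 2.356, 2048: 2.785,
           2560: 3.500, 3072: 4.190, 3584: 5.009, 4096: 5.722, 5120: 7.202,
           6144: 8.639, 7168: 10.272, 8192: 11.952}

MLUCAS = {1024: 2.60, 1152: 3.13, 1280: 3.56, 1408: 3.98, 1536: 4.02,
          1664: 4.63, 1792: 4.70, 1920: 5.29, 2048: 5.29, 2304: 6.00,
          2560: 6.44, 2816: 7.47, 3072: 8.25, 3328: 8.84, 3584: 9.02,
          3840: 10.06, 4096: 10.46, 4608: 11.78, 5120: 13.47, 5632: 15.52,
          6144: 17.40, 6656: 18.48, 7168: 19.02, 7680: 21.49, 8192: 22.33}


def p95_denominator(fftlen: int) -> float:
    """Prime95 time at fftlen, or at the next-higher length it supports."""
    candidates = [n for n in PRIME95 if n >= fftlen]
    if not candidates:
        raise ValueError(f"no Prime95 timing at or above {fftlen}K")
    return PRIME95[min(candidates)]


def timing_ratio(fftlen: int) -> float:
    """Mlucas time divided by the (possibly next-higher) Prime95 time."""
    return MLUCAS[fftlen] / p95_denominator(fftlen)


if __name__ == "__main__":
    for n in sorted(MLUCAS):
        print(f"{n:6d}  {MLUCAS[n]:6.2f}  {timing_ratio(n):5.2f}")
```

Rounded to two decimals, this reproduces every entry of the ratio column. Using the next-higher length is the conservative choice for Mlucas: Prime95 would presumably have to fall back to that larger FFT for any exponent too big for its next-smaller length, so the comparison does not flatter Mlucas at its extra radix-9/11/13/15 lengths.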

#33
∂²ω=0
Sep 2002
República de California
11639₁₀ Posts
Here is the head-to-head comparison on my new Xyzzy-built Broadwell (i3) NUC, with both programs run 4-threaded on the 2 physical cores of the system (that setup gives the best per-iteration timing for both programs on this system). These timings and ratios can be compared with the Haswell ones in the above post:
Code:
FFTlen Prime95 Mlucas Timing Ratio
(Kdbl) msec/iter msec/iter [Mlucas/P95] Comments
------ --------- --------- ------------ ------------
1024 3.894 6.869 1.76
1152 4.634 8.294 1.79
1280 4.990 8.702 1.74
1408 5.502 10.118 1.84 [Prime95 1440K]
1536 6.203 10.298 1.66
1664 6.506 11.562 1.78 [Prime95: average of the 1600K and 1728K timings]
1792 7.473 11.904 1.59
1920 7.843 13.186 1.68
2048 7.898 13.946 1.77
2304 8.889 15.846 1.78
2560 9.930 17.281 1.74
2816 11.369 19.931 1.75 [Prime95 2880K]
3072 12.465 22.373 1.79
3328 13.688 23.541 1.72 [Prime95 3360K]
3584 14.567 25.318 1.74
3840 16.079 27.987 1.74
4096 16.917 29.488 1.74
4608 19.762 34.077 1.72
5120 21.736 37.573 1.73
5632 25.657 43.197 1.68 [Prime95 5760K]
6144 26.867 50.179 1.87
6656 30.958 51.091 1.65 [Prime95 6720K]
7168 32.399 54.929 1.70
7680 34.025 60.411 1.78
8192 34.791 65.911 1.89
Avg: 1.75
Last fiddled with by ewmayer on 2015-05-22 at 06:41
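The "Avg: 1.75" line in the Broadwell table is consistent with a plain, unweighted arithmetic mean of the per-length ratios; the post does not say how the average was taken, so that is an assumption here. The sketch below recomputes the ratio column and its mean from the raw msec/iter values. The Prime95 entries at 1408K, 1664K, 2816K, 3328K, 5632K and 6656K are the substitute timings described in the table's Comments column, and the names are illustrative only.

```python
# Recompute the Broadwell [Mlucas/P95] column and its average from the raw timings.
# Each row is (FFT length in K doubles, Prime95 msec/iter, Mlucas msec/iter), copied
# from the table. Assumption: "Avg" is the unweighted arithmetic mean of the ratios.
from statistics import mean

BROADWELL = [
    (1024, 3.894, 6.869), (1152, 4.634, 8.294), (1280, 4.990, 8.702),
    (1408, 5.502, 10.118), (1536, 6.203, 10.298), (1664, 6.506, 11.562),
    (1792, 7.473, 11.904), (1920, 7.843, 13.186), (2048, 7.898, 13.946),
    (2304, 8.889, 15.846), (2560, 9.930, 17.281), (2816, 11.369, 19.931),
    (3072, 12.465, 22.373), (3328, 13.688, 23.541), (3584, 14.567, 25.318),
    (3840, 16.079, 27.987), (4096, 16.917, 29.488), (4608, 19.762, 34.077),
    (5120, 21.736, 37.573), (5632, 25.657, 43.197), (6144, 26.867, 50.179),
    (6656, 30.958, 51.091), (7168, 32.399, 54.929), (7680, 34.025, 60.411),
    (8192, 34.791, 65.911),
]


def ratios(rows):
    """Map FFT length -> Mlucas time / Prime95 time."""
    return {n: mlucas / p95 for n, p95, mlucas in rows}


def average_ratio(rows) -> float:
    """Unweighted arithmetic mean of the per-length ratios."""
    return mean(list(ratios(rows).values()))


if __name__ == "__main__":
    for n, r in ratios(BROADWELL).items():
        print(f"{n:6d}  {r:5.2f}")
    print(f"Avg: {average_ratio(BROADWELL):.2f}")
```

Averaging the unrounded ratios gives about 1.746, and averaging the two-decimal values printed in the table gives about 1.745; both round to the 1.75 shown.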