#1 |
P90 years forever!
Aug 2002
Yeehaw, FL
1111111011111₂ Posts |
The first beta of version 24.12 can be downloaded from:
Windows: ftp://mersenne.org/gimps/p95v2412.zip
Linux: ftp://mersenne.org/gimps/mprime2412.zip.tar.gz
Static Linux: ftp://mersenne.org/gimps/sprime2412.zip.tar.gz

This version is 2 - 10% faster on all SSE2 machines. The SSE2 code supports FFT sizes up to 32 million, but don't use them yet; they need some more QA (see next post). It has a workaround for the Error 3 problem. It should also be harder for computers to spontaneously rename themselves.
#2 |
P90 years forever!
Aug 2002
Yeehaw, FL
1FDF₁₆ Posts |
I have just one idea left for speeding up the code. After working on that for a while, I will need volunteers to time prime95 on various L2 cache sizes and volunteers to QA the large FFT sizes.
I especially need an AMD SSE2 machine with 256K L2 cache. I'll also need a P4 with 1MB L2 cache and an AMD64 machine with 512K L2 cache, but these are very common. I need an AMD64 machine and a P4 machine with 2GB of memory to QA the large FFTs. If you can help, please revisit this section of the forums over the next week or two. I'll announce a test version to download for benchmarking and QA.
#3 |
Aug 2002
North San Diego County
1444₈ Posts |
Spiffy!
Code:
Intel(R) Pentium(R) 4 CPU 2.40GHz
CPU speed: 2392.49 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE, SSE2
L1 cache size: 8 KB
L2 cache size: 512 KB
L1 cache line size: 64 bytes
L2 cache line size: 128 bytes
TLBS: 64
Prime95 version 23.8, RdtscTiming=1
Best time for 512K FFT length: 23.986 ms.
Best time for 640K FFT length: 29.059 ms.
Best time for 768K FFT length: 35.340 ms.
Best time for 896K FFT length: 41.411 ms.
Best time for 1024K FFT length: 46.952 ms.
Best time for 1280K FFT length: 61.214 ms.
Best time for 1536K FFT length: 73.665 ms.
Best time for 1792K FFT length: 88.070 ms.
Best time for 2048K FFT length: 101.813 ms.

[Wed Jun 08 18:07:27 2005]
Compare your results to other computers at http://www.mersenne.org/bench.htm
That web page also contains instructions on how your results can be included.

Intel(R) Pentium(R) 4 CPU 2.40GHz
CPU speed: 2392.31 MHz
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2
L1 cache size: 8 KB
L2 cache size: 512 KB
L1 cache line size: 64 bytes
L2 cache line size: 128 bytes
TLBS: 64
Prime95 32-bit version 24.12, RdtscTiming=1
Best time for 512K FFT length: 21.825 ms.
Best time for 640K FFT length: 28.142 ms.
Best time for 768K FFT length: 34.530 ms.
Best time for 896K FFT length: 39.891 ms.
Best time for 1024K FFT length: 45.745 ms.
Best time for 1280K FFT length: 58.705 ms.
Best time for 1536K FFT length: 72.312 ms.
Best time for 1792K FFT length: 83.354 ms.
Best time for 2048K FFT length: 95.058 ms.
Best time for 2560K FFT length: 123.420 ms.
Best time for 3072K FFT length: 157.165 ms.
Best time for 3584K FFT length: 182.897 ms.
Best time for 4096K FFT length: 212.733 ms.
Best time for 58 bit trial factors: 12.394 ms.
Best time for 59 bit trial factors: 12.347 ms.
Best time for 60 bit trial factors: 12.269 ms.
Best time for 61 bit trial factors: 12.373 ms.
Best time for 62 bit trial factors: 13.325 ms.
Best time for 63 bit trial factors: 13.277 ms.
Best time for 64 bit trial factors: 16.332 ms.
Best time for 65 bit trial factors: 16.332 ms.
Best time for 66 bit trial factors: 16.229 ms.
Best time for 67 bit trial factors: 16.231 ms.
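For anyone who wants to line up two benchmark dumps like the ones above without reading them by eye, here is a minimal Python sketch. It assumes only the "Best time for NK FFT length: X ms." wording visible in the pasted output; the function name `parse_fft_times` and the shortened sample strings are made up for illustration and are not part of Prime95.

```python
import re

# Matches lines such as "Best time for 512K FFT length: 23.986 ms."
# (format taken from the benchmark output pasted in this post).
FFT_LINE = re.compile(r"Best time for (\d+)K FFT length: ([\d.]+) ms\.")

def parse_fft_times(text):
    """Return {fft_size_in_K: best_time_ms} for every FFT timing found in text."""
    return {int(size): float(ms) for size, ms in FFT_LINE.findall(text)}

# Shortened excerpts of the 23.8 and 24.12 runs on the P4 2.40GHz above.
v238 = ("Best time for 512K FFT length: 23.986 ms. "
        "Best time for 1024K FFT length: 46.952 ms. "
        "Best time for 2048K FFT length: 101.813 ms.")
v2412 = ("Best time for 512K FFT length: 21.825 ms. "
         "Best time for 1024K FFT length: 45.745 ms. "
         "Best time for 2048K FFT length: 95.058 ms. "
         "Best time for 4096K FFT length: 212.733 ms.")

old = parse_fft_times(v238)
new = parse_fft_times(v2412)

# Only FFT sizes present in both runs can be compared (23.8 stops at 2048K).
common_sizes = sorted(old.keys() & new.keys())
for size in common_sizes:
    print(f"{size}K: {old[size]} ms -> {new[size]} ms")
```

Because the pattern works on the whole text rather than line by line, it does not matter whether the output was pasted with its line breaks intact.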
#4 |
Jun 2003
153₁₀ Posts |
George, I wanted to make sure that you were aware of an interesting article by Xbit Labs on the Pentium IV Replay feature. It might explain the weirdness experienced by PhilF and delta_t that you commented on in the Early Beta of Version 24.11 thread.
Link http://www.xbitlabs.com/articles/cpu...ay/replay.html |
#5 | |
Banned
"Luigi"
Aug 2002
Team Italia
3×1,619 Posts |
![]() Quote:
![]() Luigi |
#6 | |
Apr 2003
Milan, Italy
2²×3² Posts |
![]() Quote:
![]() ![]() ![]() ![]() |
#7 |
May 2005
1628₁₀ Posts |
There is a significant gain (>20%) going from 23.8 to 24.11; however, 24.12 offers only a marginal further improvement over 24.11 (below 1%).
#8 |
Apr 2003
Milan, Italy
2²·3² Posts |
Cool!
About 10% gain.
Code:
[Thu Jun 9 10:00:33 2005]
Compare your results to other computers at http://www.mersenne.org/bench.htm
That web page also contains instructions on how your results can be included.

AMD Athlon(tm) 64 Processor 3800+
CPU speed: 2410.47 MHz
CPU features: RDTSC, CMOV, Prefetch, 3DNow!, MMX, SSE, SSE2
L1 cache size: 64 KB
L2 cache size: 512 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 32
L2 TLBS: 512
Prime95 32-bit version 24.12, RdtscTiming=1
Best time for 512K FFT length: 20.036 ms.
Best time for 640K FFT length: 26.049 ms.
Best time for 768K FFT length: 31.640 ms.
Best time for 896K FFT length: 37.771 ms.
Best time for 1024K FFT length: 42.064 ms.
Best time for 1280K FFT length: 53.510 ms.
Best time for 1536K FFT length: 65.706 ms.
Best time for 1792K FFT length: 79.678 ms.
Best time for 2048K FFT length: 88.757 ms.
Best time for 2560K FFT length: 120.701 ms.
Best time for 3072K FFT length: 146.729 ms.
Best time for 3584K FFT length: 178.234 ms.
Best time for 4096K FFT length: 200.030 ms.
Best time for 58 bit trial factors: 4.824 ms.
Best time for 59 bit trial factors: 4.819 ms.
Best time for 60 bit trial factors: 4.815 ms.
Best time for 61 bit trial factors: 4.828 ms.
Best time for 62 bit trial factors: 9.107 ms.
Best time for 63 bit trial factors: 9.123 ms.
Best time for 64 bit trial factors: 11.584 ms.
Best time for 65 bit trial factors: 11.513 ms.
Best time for 66 bit trial factors: 11.522 ms.
Best time for 67 bit trial factors: 11.479 ms.

Last fiddled with by Kaboom on 2005-06-09 at 08:05
#9 | |
Jan 2003
CE₁₆ Posts |
![]() Quote:
#10 |
Jan 2003
2×103 Posts |
The following were run on a 1MB cache P4, so we can see the new cache size tuning come into play:
Intel(R) Pentium(R) 4 CPU 2.80GHz
CPU speed: 3227.28 MHz
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2
L1 cache size: 16 KB
L2 cache size: 1024 KB
L1 cache line size: 64 bytes
L2 cache line size: 128 bytes
TLBS: 64
Prime95 version 24.11, RdtscTiming=1
Best time for 512K FFT length: 17.486 ms.
Best time for 640K FFT length: 21.240 ms.
Best time for 768K FFT length: 25.714 ms.
Best time for 896K FFT length: 30.632 ms.
Best time for 1024K FFT length: 34.529 ms.
Best time for 1280K FFT length: 45.018 ms.
Best time for 1536K FFT length: 54.450 ms.
Best time for 1792K FFT length: 65.444 ms.
Best time for 2048K FFT length: 73.603 ms.
Best time for 58 bit trial factors: 8.512 ms.
Best time for 59 bit trial factors: 8.567 ms.
Best time for 60 bit trial factors: 8.483 ms.
Best time for 61 bit trial factors: 8.549 ms.
Best time for 62 bit trial factors: 11.898 ms.
Best time for 63 bit trial factors: 11.961 ms.
Best time for 64 bit trial factors: 13.826 ms.
Best time for 65 bit trial factors: 13.817 ms.
Best time for 66 bit trial factors: 13.801 ms.
Best time for 67 bit trial factors: 13.754 ms.
--------------------------------------------------------
Intel(R) Pentium(R) 4 CPU 2.80GHz
CPU speed: 3227.31 MHz
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2
L1 cache size: 16 KB
L2 cache size: 1024 KB
L1 cache line size: 64 bytes
L2 cache line size: 128 bytes
TLBS: 64
Prime95 32-bit version 24.12, RdtscTiming=1
Best time for 512K FFT length: 16.216 ms.
Best time for 640K FFT length: 20.766 ms.
Best time for 768K FFT length: 25.187 ms.
Best time for 896K FFT length: 29.880 ms.
Best time for 1024K FFT length: 34.054 ms.
Best time for 1280K FFT length: 42.345 ms.
Best time for 1536K FFT length: 51.512 ms.
Best time for 1792K FFT length: 61.213 ms.
Best time for 2048K FFT length: 69.289 ms.
Best time for 2560K FFT length: 90.056 ms.
Best time for 3072K FFT length: 110.067 ms.
Best time for 3584K FFT length: 131.981 ms.
Best time for 4096K FFT length: 148.198 ms.
Best time for 58 bit trial factors: 8.525 ms.
Best time for 59 bit trial factors: 8.543 ms.
Best time for 60 bit trial factors: 8.531 ms.
Best time for 61 bit trial factors: 8.528 ms.
Best time for 62 bit trial factors: 11.948 ms.
Best time for 63 bit trial factors: 11.947 ms.
Best time for 64 bit trial factors: 13.613 ms.
Best time for 65 bit trial factors: 13.751 ms.
Best time for 66 bit trial factors: 13.720 ms.
Best time for 67 bit trial factors: 13.816 ms.
--------------------------------------------------------
So from my analysis - percentage speedup in (brackets):
512K : 17.486 vs 16.216 (+7.263%)
640K : 21.24 vs 20.766 (+2.232%)
768K : 25.714 vs 25.187 (+2.049%)
896K : 30.632 vs 29.88 (+2.455%)
1024K : 34.529 vs 34.054 (+1.376%)
1280K : 45.018 vs 42.345 (+5.938%)
1536K : 54.45 vs 51.512 (+5.396%)
1792K : 65.444 vs 61.213 (+6.465%)
2048K : 73.603 vs 69.289 (+5.861%)

Conclusion:
1. There is no improvement for trial factoring.
2. Nice small improvements throughout the FFT range.

Is it safe to switch to this new version of the client yet?

Last fiddled with by db597 on 2005-06-09 at 12:35
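The bracketed percentages in the analysis above are consistent with a reduction in time per iteration, (old - new) / old. That formula is inferred from the numbers, not stated in the post. A small Python sketch, with the hypothetical helper `time_reduction_pct` and the 24.11 / 24.12 timings copied from the two runs above:

```python
def time_reduction_pct(old_ms, new_ms):
    """Percentage by which the per-iteration time dropped, relative to the old time."""
    return (old_ms - new_ms) / old_ms * 100.0

# FFT size in K -> (24.11 time in ms, 24.12 time in ms), from the runs above.
timings = {
    512: (17.486, 16.216),
    640: (21.240, 20.766),
    768: (25.714, 25.187),
    896: (30.632, 29.880),
    1024: (34.529, 34.054),
    1280: (45.018, 42.345),
    1536: (54.450, 51.512),
    1792: (65.444, 61.213),
    2048: (73.603, 69.289),
}

reductions = {size: time_reduction_pct(old, new) for size, (old, new) in timings.items()}
for size, pct in reductions.items():
    print(f"{size}K : {pct:+.3f}%")
```

Note that this measures how much shorter each iteration is. Expressed as extra throughput (old / new - 1), the same 512K result works out to about 7.8% rather than 7.3%, so the two conventions should not be mixed when comparing posts.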
#11 | |
P90 years forever!
Aug 2002
Yeehaw, FL
41×199 Posts |
![]() Quote:
I've switched my P4s; my first double-check will complete on Monday.