mersenneforum.org  

Old 2003-01-29, 21:19   #12
outlnder
 

Will V. 23.1 help any other machine besides Celeron and P4?

Does it only help on LL testing?

Does it only help with certain FFT lengths?
Old 2003-01-29, 22:15   #13
Prime95

Quote:
Originally Posted by outlnder
Will V. 23.1 help any other machine besides Celeron and P4?
Does it only help on LL testing?
Does it only help with certain FFT lengths?

The answer is probably only the P4 and P4 Celeron, and definitely only LL testing. The new FFTs are for:
P4 Celeron - 640K to 2M FFTs
P4 Northwood - 1280K to 4M FFTs

Why would any other situation be faster? Well, remember in the IRC chat I mentioned the weird problem where the debug version was faster than the non-debug version? It turns out this is because the debug version fills memory with 0xCD. Doing so walks through the pages linearly, which makes it more likely that the VM manager will allocate them linearly in physical memory. Prime95's FFT assembly code is optimized for this situation. This is especially important in the FFTs where close to all of the L2 cache is being utilized. If pages are not in contiguous physical memory, then some page reads will force other pages to be kicked out of the cache.
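To make the cache argument above concrete, here is a minimal sketch of how a physically indexed cache maps pages onto cache "colors". The geometry (128 KB cache, 4 ways, 4 KB pages) and the names page_color and max_pages_per_color are illustrative assumptions; they are not taken from Prime95 or from any particular CPU's documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative geometry only: these values are assumptions, not measured. */
#define PAGE_SIZE   4096u            /* bytes per page */
#define CACHE_SIZE  (128u * 1024u)   /* bytes of L2 cache */
#define CACHE_WAYS  4u               /* assumed associativity */

/* Number of distinct page "colors": pages with the same color compete
   for the same cache sets. */
#define NUM_COLORS  (CACHE_SIZE / (CACHE_WAYS * PAGE_SIZE))

/* Color of the physical page containing phys_addr. */
static unsigned page_color(uint64_t phys_addr)
{
    return (unsigned)((phys_addr / PAGE_SIZE) % NUM_COLORS);
}

/* Largest number of pages sharing one color among n physical page addresses.
   More than CACHE_WAYS fully used pages of one color cannot all stay cached. */
static unsigned max_pages_per_color(const uint64_t *phys_pages, size_t n)
{
    assert(phys_pages != NULL || n == 0);
    unsigned counts[NUM_COLORS] = {0};
    unsigned worst = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned c = page_color(phys_pages[i]);
        if (++counts[c] > worst)
            worst = counts[c];
    }
    return worst;
}
```

Under these assumed numbers, 32 physically contiguous pages (128 KB) spread evenly at 4 pages per color, which is exactly what the cache can hold. If the same 32 pages are scattered so that many share a color, they evict each other even though the total data size still fits.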

What I did was add a call to memset right after allocating memory so that the non-debug version behaves just like the debug version. Which FFTs were using close to all of the L2 cache and thus see the biggest benefit? On the P4 Celeron, 896K and 1M. On the P4 Willamette, 1792K and 2M. There should be some improvement in many FFT sizes, but it will not be very noticeable for most.
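The fix described above could look roughly like the following. This is only a sketch: alloc_and_touch is a hypothetical helper name, and it is not Prime95's actual allocation code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate a buffer and immediately write every byte
   in address order, so the OS has to back each page as it is touched. */
static void *alloc_and_touch(size_t size)
{
    assert(size > 0);
    void *buf = malloc(size);
    if (buf == NULL)
        return NULL;
    /* 0xCD mirrors the debug-build fill value mentioned above; any value
       would touch the pages equally well. */
    memset(buf, 0xCD, size);
    return buf;
}
```

Whether the pages actually end up contiguous in physical memory is still up to the OS. Touching them in order only makes that outcome more likely.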

I have no data on whether the memset fix will make a big impact on any FFTs on the Athlon, P3, etc. We know that timings vary from run to run, so it may be hard to pinpoint. Please post any CPU/FFT combinations where you consistently see an improvement of 3% or more in v23 - others will be interested.
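For anyone comparing benchmark output by hand, the 3% check can be sketched as below. The function names improvement_pct and worth_reporting are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Percentage by which new_ms improves on old_ms (positive = faster). */
static double improvement_pct(double old_ms, double new_ms)
{
    assert(old_ms > 0.0);
    return 100.0 * (old_ms - new_ms) / old_ms;
}

/* True when the improvement reaches the 3% threshold requested above. */
static bool worth_reporting(double old_ms, double new_ms)
{
    return improvement_pct(old_ms, new_ms) >= 3.0;
}
```

For instance, going from 100.0 ms to 96.0 ms per iteration is a 4% improvement and would qualify, while 100.0 ms to 98.0 ms would not. Since timings vary from run to run, compare best times taken from several runs at the same clock speed.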
Old 2003-01-29, 22:46   #14
Prime95

OK, try the latest v23.1 to see if it detects the L2 cache size correctly.
Old 2003-01-30, 02:11   #15
outlnder
 

23.1 helps my P4s do their 33Ms.

About 3%.
Old 2003-01-30, 15:55   #16
priwo
 

Here are your results:

Intel(R) Celeron(R) CPU 2.00GHz
CPU speed: 2425.00 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE, SSE2
L1 cache size: 8 KB
L2 cache size: unknown
L1 cache line size: 64 bytes
L2 cache line size: unknown
TLBS: 64
Prime95 version 22.13, RdtscTiming=1
Best time for 256K FFT length: 13.283 ms.
Best time for 320K FFT length: 20.281 ms.
Best time for 384K FFT length: 23.058 ms.
Best time for 448K FFT length: 28.685 ms.
Best time for 512K FFT length: 35.766 ms.
Best time for 640K FFT length: 68.557 ms.
Best time for 768K FFT length: 103.835 ms.
Best time for 896K FFT length: 126.564 ms.
Best time for 1024K FFT length: 148.212 ms.
Best time for 1280K FFT length: 231.566 ms.
Best time for 1536K FFT length: 279.798 ms.
Best time for 1792K FFT length: 342.215 ms.
[Thu Jan 30 16:33:01 2003]

Intel(R) Celeron(R) CPU 2.00GHz
CPU speed: 2425.14 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE, SSE2
L1 cache size: 8 KB
L2 cache size: 128 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 version 23.1, RdtscTiming=1
Best time for 384K FFT length: 20.535 ms.
Best time for 448K FFT length: 25.709 ms.
Best time for 512K FFT length: 33.254 ms.
Best time for 640K FFT length: 42.347 ms.
Best time for 768K FFT length: 55.776 ms.
Best time for 896K FFT length: 70.974 ms.
Best time for 1024K FFT length: 83.636 ms.
Best time for 1280K FFT length: 96.240 ms.
Best time for 1536K FFT length: 120.434 ms.
Best time for 1792K FFT length: 164.635 ms.
Best time for 2048K FFT length: 178.784 ms.
[Thu Jan 30 16:35:09 2003]

Intel(R) Celeron(R) CPU 2.00GHz
CPU speed: 2017.71 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE, SSE2
L1 cache size: 8 KB
L2 cache size: 128 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 version 23.1, RdtscTiming=1
Best time for 384K FFT length: 24.646 ms.
Best time for 448K FFT length: 30.928 ms.
Best time for 512K FFT length: 39.963 ms.
Best time for 640K FFT length: 50.842 ms.
Best time for 768K FFT length: 67.195 ms.
Best time for 896K FFT length: 85.514 ms.
Best time for 1024K FFT length: 100.128 ms.
Best time for 1280K FFT length: 115.591 ms.
Best time for 1536K FFT length: 146.711 ms.
Best time for 1792K FFT length: 196.979 ms.
Best time for 2048K FFT length: 215.330 ms.
[Thu Jan 30 16:41:09 2003]

Intel(R) Celeron(R) CPU 2.00GHz
CPU speed: 2018.10 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE, SSE2
L1 cache size: 8 KB
L2 cache size: unknown
L1 cache line size: 64 bytes
L2 cache line size: unknown
TLBS: 64
Prime95 version 22.13, RdtscTiming=1
Best time for 256K FFT length: 15.998 ms.
Best time for 320K FFT length: 24.348 ms.
Best time for 384K FFT length: 27.829 ms.
Best time for 448K FFT length: 34.437 ms.
Best time for 512K FFT length: 43.061 ms.
Best time for 640K FFT length: 82.321 ms.
Best time for 768K FFT length: 124.978 ms.
Best time for 896K FFT length: 152.649 ms.
Best time for 1024K FFT length: 178.291 ms.
Best time for 1280K FFT length: 277.927 ms.
Best time for 1536K FFT length: 336.477 ms.
Best time for 1792K FFT length: 410.994 ms.
Old 2003-01-30, 22:13   #17
Paulie
 
Thinkpad A31 - P4/1.6/512c Mobile

Intel(R) Pentium(R) 4 Mobile CPU 1.60GHz
CPU speed: 1199.02 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE, SSE2
L1 cache size: 8 KB
L2 cache size: 512 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 version 22.12, RdtscTiming=1
Best time for 256K FFT length: 15.023 ms.
Best time for 320K FFT length: 19.940 ms.
Best time for 384K FFT length: 24.177 ms.
Best time for 448K FFT length: 28.870 ms.
Best time for 512K FFT length: 32.792 ms.
Best time for 640K FFT length: 42.319 ms.
Best time for 768K FFT length: 51.536 ms.
Best time for 896K FFT length: 63.530 ms.
Best time for 1024K FFT length: 69.550 ms.
Best time for 1280K FFT length: 99.515 ms.
Best time for 1536K FFT length: 123.621 ms.
Best time for 1792K FFT length: 153.544 ms.

Intel(R) Pentium(R) 4 Mobile CPU 1.60GHz
CPU speed: 1199.07 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE, SSE2
L1 cache size: 8 KB
L2 cache size: 512 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 version 23.1, RdtscTiming=1
Best time for 384K FFT length: 24.114 ms.
Best time for 448K FFT length: 28.829 ms.
Best time for 512K FFT length: 32.651 ms.
Best time for 640K FFT length: 42.160 ms.
Best time for 768K FFT length: 51.535 ms.
Best time for 896K FFT length: 63.071 ms.
Best time for 1024K FFT length: 68.382 ms.
Best time for 1280K FFT length: 91.864 ms.
Best time for 1536K FFT length: 111.431 ms.
Best time for 1792K FFT length: 138.291 ms.
Best time for 2048K FFT length: 148.697 ms.
Old 2003-01-30, 22:15   #18
Paulie
 
Compaq DL360/Gen3 P4/2.8/512c Xeon

Intel(R) Xeon(TM) CPU 2.80GHz
CPU speed: 2787.36 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE, SSE2
L1 cache size: 8 KB
L2 cache size: 512 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 version 22.12, RdtscTiming=1
Best time for 256K FFT length: 9.437 ms.
Best time for 320K FFT length: 12.028 ms.
Best time for 384K FFT length: 14.538 ms.
Best time for 448K FFT length: 17.354 ms.
Best time for 512K FFT length: 19.651 ms.
Best time for 640K FFT length: 25.286 ms.
Best time for 768K FFT length: 30.692 ms.
Best time for 896K FFT length: 37.598 ms.
Best time for 1024K FFT length: 40.882 ms.
Best time for 1280K FFT length: 57.051 ms.
Best time for 1536K FFT length: 72.264 ms.
Best time for 1792K FFT length: 89.022 ms.

Intel(R) Xeon(TM) CPU 2.80GHz
CPU speed: 2787.37 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE, SSE2
L1 cache size: 8 KB
L2 cache size: 512 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 version 23.1, RdtscTiming=1
Best time for 384K FFT length: 14.415 ms.
Best time for 448K FFT length: 17.315 ms.
Best time for 512K FFT length: 19.521 ms.
Best time for 640K FFT length: 25.115 ms.
Best time for 768K FFT length: 30.626 ms.
Best time for 896K FFT length: 37.338 ms.
Best time for 1024K FFT length: 40.576 ms.
Best time for 1280K FFT length: 53.926 ms.
Best time for 1536K FFT length: 65.628 ms.
Best time for 1792K FFT length: 81.447 ms.
Best time for 2048K FFT length: 87.222 ms.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.