mersenneforum.org

Perpetual benchmark thread... (https://www.mersenneforum.org/showthread.php?t=59)

petrw1 2008-11-26 17:02

[QUOTE=petrw1;150462]Ok...so now I did a Manual Communication and I NOW have a second computer line added (for the same computer ... same name) but the second one shows as a Q9550. However, the first entry still shows the assignments and progress.[/QUOTE]

Last night Windows did one of those automatic updates and restarts, and now the Xeon computer is back in my list :shock:

[CODE]Model Intel Pentium III Xeon processor
Features 4 core, Prefetch,SSE,SSE2,SSE4
Speed 2.833 GHz (2.801 GHz P4 effective equivalent)
L1/L2 Cache 32 / 6144 KB
Computer Memory 4094 MB configured usage 1600 MB day / 1600 MB night [/CODE]

[QUOTE]P.S. I have changed the setting to "Let me decide when to update" so I have no more surprises.[/QUOTE]

I believe I can redo what I did last time to get rid of it again BUT will this continue to happen with every reboot?

Part II
I renamed one of each pair (I now have two pairs) and found that not only do I have assignments charged to each member of each pair BUT I also have results associated to each member of each pair. SO... I am afraid that if I drop one CPU from each pair I will also lose the associated credits.

Please advise.

cheesehead 2008-11-26 21:14

[quote=stars10250;150799]Wow, 141 hours...that's crazy! I thought 24 hrs was a good enough indicator of stability.

< snip >

I do plan to continue to lower the voltages more, so I can find the stability point, but I note your advice of failure at 141 hours.[/quote]Consider this point of view:

How long will an L-L test take? Do you think a system that fails before the length of time of even one LL test is "Prime95 stable"?

After all, the Prime95 "torture test" is simply a set of partial LL tests with known results. Nothing there that isn't in a real LL test ... except that the torture test is shorter. (Well ... it takes no break for writing a save file, as a real test does every half-hour or whatever, so it's slightly more strenuous in that regard.)

Also: Suppose your system completes an LL test in which there was one hardware error that changed a bit or two. The residue it reports will not match the residue reported by a doublecheck on someone else's (stable) system, _and thus it won't contribute a useful result to the GIMPS goal_ (someone's triplecheck run will be necessary) despite all the time it spent on that LL test. Prime95 LL calculations are probably the most stressful things your system will ever do for which not even a single bit error is acceptable.

OTOH, Prime95 does some crosschecks to try to catch errors during the LL test so that it can back up to the previous save file and run the most recent portion again, so it _is_ possible for some errors to occur without invalidating the result. So where I wrote "error" previously, substitute "undetected error".
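To make the "undetected error" point concrete, here is a minimal sketch of the Lucas-Lehmer iteration on a toy exponent, with an optional single-bit flip injected partway through. This is not how Prime95 computes (it does the squaring with FFT multiplication); the function name and the `flip_at` / `flip_bit` parameters are made up for illustration.

```python
def ll_residue(p, flip_at=None, flip_bit=0):
    """Final Lucas-Lehmer residue for M_p = 2**p - 1 (0 means M_p is prime).

    flip_at / flip_bit are hypothetical knobs: after iteration index
    flip_at, one bit of the running value is flipped to mimic a
    hardware error that nothing catches.
    """
    m = (1 << p) - 1
    s = 4
    for i in range(p - 2):          # an LL test is p - 2 squarings
        s = (s * s - 2) % m
        if i == flip_at:
            s ^= 1 << flip_bit      # simulated single-bit error
    return s


print(ll_residue(7))               # 0  -> 2**7 - 1 = 127 is prime
print(ll_residue(7, flip_at=1))    # 66 -> one flipped bit, wrong residue
```

Every iteration feeds the next, so the wrong value is carried all the way to the end and the final residue no longer matches a clean run.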

[quote] Maybe it got particularly hot in your room at that time?[/quote]Might the room where your system resides experience an occasional thermal excursion over the course of ... (how long you plan to use this system for GIMPS)? :smile:

stars10250 2008-11-26 21:53

I'm not sure I know the right answer here. What is a satisfactory amount of time to determine stability of an OC'd system specifically designed to run prime95 24/7? My new system is running stable for about 30 hours now, which is encouraging, but like you say that's nowhere near 1 LL test which will take over a month. The last thing I want to do is contribute meaningless work or add confusion to GIMPS. OTOH, how infallible are stock computers? I religiously watch and clean all my computers of dust because they tend to accumulate a lot running 24/7, but if others don't then they may overheat slightly and cause bit errors. Then there's the random error rate that any computer is subject to. Advice? Regarding room temperature changes, I watch mine regularly and air condition when needed.

Jeff Gilchrist 2008-11-27 01:00

[QUOTE=stars10250;150886]I'm not sure I know the right answer here. What is a satisfactory amount of time to determine stability of an OC'd system specifically designed to run prime95 24/7? My new system is running stable for about 30 hours now, which is encouraging, but like you say that's nowhere near 1 LL test which will take over a month. The last thing I want to do is contribute meaningless work or add confusion to GIMPS. [/QUOTE]

I would suggest setting up Prime95 to do Double Checks for now. The ones the server is handing out should finish within 10 days or so, and you can do 4 at a time with your Quad. Then you can check the results on the server to make sure they match the previous submission. If all 4 double checks are good, you can be pretty sure your system is stable.

retina 2008-11-27 01:15

[QUOTE=Jeff Gilchrist;150896]..., you can be pretty sure your system is stable.[/QUOTE]I think that is the important statement that everyone must realise. The absolute most one can ever expect is to be "pretty sure" of stability. Absolute certainty is never possible. Even a machine that has been running for many months (or years) without issue can still throw an error at any time. It all comes down to probability, which is never 100%.

CADavis 2008-11-27 04:01

[QUOTE=stars10250;150799]Maybe it got particularly hot in your room at that time?[/QUOTE]

Well that [i]could[/i] be a possibility, but I am running with water cooling and my cores stay under 40 degrees Celsius at all times, even when it gets into the 80s Fahrenheit in my room and there is dust clogging all my fan filters. A better explanation might be some sort of anomaly in the power source (a temporary under-voltage), but again I'm running a PC Power & Cooling PSU (built prior to OCZ acquiring them); those are well known for extreme stability, and it doesn't drop volts even going from 0% to 100% load. Maybe God just intervened?

petrw1 2008-11-28 16:41

:question:[QUOTE=petrw1;150820]Last night Windows did one of those automatic updates and restarts, and now the Xeon computer is back in my list :shock:

I believe I can redo what I did last time to get rid of it again BUT will this continue to happen with every reboot?

Part II
I renamed one of each pair (I now have two pairs) and found that not only do I have assignments charged to each member of each pair BUT I also have results associated to each member of each pair. SO... I am afraid that if I drop one CPU from each pair I will also lose the associated credits.

Please advise.[/QUOTE]

Another shutdown and now I have 3 entries for the same CPU. Maybe I just need to let the CPU report be for now, because I know how many CPUs I have and all the results are getting reported properly.

stars10250 2008-11-30 17:30

I've confirmed the obvious..C2Q=3C LL (even if OC)
 
I have some data on my cheap $360 C2Q build (lots of newegg rebates). My goal was to see how much performance/price I could get and if the new i7 is worth buying for dedicated crunching (I don't own an i7 but there are benchmarks to compare with). My other goal was to learn and have fun, which I did.

For hardware, the configuration I settled on was a Q6600, Gigabyte GA-EP45-DS3L, 2 GB Corsair PC8500, air cooled. I based this on recommendations made by members here, and I'm happy with it. This is my first OC ever.

Regarding OC, the system just seems to like 3.2 GHz (8x), 400 MHz FSB (1600 MHz rated FSB), and 4-4-4-12 ram timing. I can run it a little faster (3.4 GHz), but it just doesn't "feel" right for the 24/7 performance I want out of it. I've spent several days playing around with the voltages and timings, and I've come to sense when it is going to go unstable or when it will run but it doesn't like it. This is hard to explain without trying it.

Here are my benchmarks. Note that it thinks the CPU is running at 3.6 GHz while I can definitively state that it was running at 3.2 GHz. I'm not sure why it does this.

Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
CPU speed: 3600.08 MHz, 4 cores
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2
L1 cache size: 32 KB
L2 cache size: 4 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 256
Prime95 32-bit version 25.7, RdtscTiming=1
Best time for 768K FFT length: 13.290 ms.
Best time for 896K FFT length: 15.930 ms.
Best time for 1024K FFT length: 18.304 ms.
Best time for 1280K FFT length: 22.636 ms.
Best time for 1536K FFT length: 27.548 ms.
Best time for 1792K FFT length: 32.760 ms.
Best time for 2048K FFT length: 36.395 ms.
Best time for 2560K FFT length: 47.995 ms.
Best time for 3072K FFT length: 58.576 ms.
Best time for 3584K FFT length: 69.480 ms.
Best time for 4096K FFT length: 77.767 ms.
Best time for 5120K FFT length: 99.739 ms.
Best time for 6144K FFT length: 121.398 ms.
Best time for 7168K FFT length: 147.373 ms.
Best time for 8192K FFT length: 161.898 ms.
Timing FFTs using 2 threads.
Best time for 768K FFT length: 9.597 ms.
Best time for 896K FFT length: 10.831 ms.
Best time for 1024K FFT length: 19.158 ms.
Best time for 1280K FFT length: 14.285 ms.
Best time for 1536K FFT length: 17.300 ms.
Best time for 1792K FFT length: 20.300 ms.
Best time for 2048K FFT length: 22.771 ms.
Best time for 2560K FFT length: 29.233 ms.
Best time for 3072K FFT length: 35.827 ms.
Best time for 3584K FFT length: 41.938 ms.
Best time for 4096K FFT length: 47.609 ms.
Best time for 5120K FFT length: 59.309 ms.
Best time for 6144K FFT length: 71.110 ms.
Best time for 7168K FFT length: 84.468 ms.
Best time for 8192K FFT length: 95.391 ms.
Timing FFTs using 3 threads.
Best time for 768K FFT length: 9.665 ms.
Best time for 896K FFT length: 10.809 ms.
Best time for 1024K FFT length: 16.281 ms.
Best time for 1280K FFT length: 10.957 ms.
Best time for 1536K FFT length: 13.239 ms.
Best time for 1792K FFT length: 15.507 ms.
Best time for 2048K FFT length: 17.861 ms.
Best time for 2560K FFT length: 23.127 ms.
Best time for 3072K FFT length: 28.356 ms.
Best time for 3584K FFT length: 33.093 ms.
Best time for 4096K FFT length: 37.591 ms.
Best time for 5120K FFT length: 45.997 ms.
Best time for 6144K FFT length: 56.665 ms.
Best time for 7168K FFT length: 68.545 ms.
Best time for 8192K FFT length: 77.433 ms.
Timing FFTs using 4 threads.
Best time for 768K FFT length: 8.898 ms.
Best time for 896K FFT length: 9.781 ms.
Best time for 1024K FFT length: 13.938 ms.
Best time for 1280K FFT length: 9.415 ms.
Best time for 1536K FFT length: 11.331 ms.
Best time for 1792K FFT length: 13.370 ms.
Best time for 2048K FFT length: 15.248 ms.
Best time for 2560K FFT length: 19.062 ms.
Best time for 3072K FFT length: 23.626 ms.
Best time for 3584K FFT length: 27.530 ms.
Best time for 4096K FFT length: 31.789 ms.
Best time for 5120K FFT length: 39.873 ms.
Best time for 6144K FFT length: 48.571 ms.
Best time for 7168K FFT length: 58.240 ms.
Best time for 8192K FFT length: 67.463 ms.
Best time for 58 bit trial factors: 3.370 ms.
Best time for 59 bit trial factors: 3.338 ms.
Best time for 60 bit trial factors: 3.341 ms.
Best time for 61 bit trial factors: 3.325 ms.
Best time for 62 bit trial factors: 5.568 ms.
Best time for 63 bit trial factors: 5.567 ms.
Best time for 64 bit trial factors: 5.176 ms.
Best time for 65 bit trial factors: 5.141 ms.
Best time for 66 bit trial factors: 5.142 ms.
Best time for 67 bit trial factors: 5.135 ms.
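
A note on the 3600.08 MHz reading in the listing above: core clock is just multiplier times FSB clock, and the sketch below does only that arithmetic. The Q6600's stock multiplier of 9 is not stated in this thread, and the idea that the benchmark derived its figure from the stock multiplier rather than the 8x actually in use is an assumption, offered as one possible explanation only.

```python
def core_mhz(multiplier, fsb_mhz):
    """Core clock in MHz = CPU multiplier x FSB clock."""
    return multiplier * fsb_mhz


print(core_mhz(8, 400))   # 3200 -> the overclock as configured (8x)
print(core_mhz(9, 400))   # 3600 -> what a 9x multiplier would imply (assumption)
print(4 * 400)            # 1600 -> quad-pumped "rated" FSB
```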

Personally, I've never liked this benchmark technique because it's not what I want to know about a system. I don't run 4 cores on 1 exponent and I don't think others do either. Maybe one can infer quad-core performance based on these numbers, but I can't. Instead, here are some real-life numbers to see what running the different cores does to performance. The 4 cores were working on 4 different exponents (all of order 47.842M, 2560K FFT) for these results:

core 0 running, iteration time: 49 ms
cores 0 and 2 running, iteration times: 51 ms
cores 0 and 1 and 2 running, iteration times: 53 ms
cores 0 and 1 and 2 and 3 running, iteration times: 65 ms

(side note: when running multiple cores, the iteration times were the same across all cores)

If you work some math, you'll find that running 4 cores accomplishes the most work...but it is only slightly faster than running 3 cores. This confirms what others have said about getting ~3 cores worth of performance from a C2Q.
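
The math can be sketched as follows, using only the four iteration times listed above. The function names are made up for illustration; "core equivalents" here means total throughput divided by the throughput of one core running alone.

```python
# Iteration times (ms) from the post, keyed by number of busy cores.
iteration_ms = {1: 49, 2: 51, 3: 53, 4: 65}


def throughput(cores, ms):
    """Total iterations per second summed over all busy cores."""
    return cores * 1000.0 / ms


def core_equivalents(cores, ms, single_ms):
    """Throughput in units of one core running by itself."""
    return cores * single_ms / ms


for n, ms in iteration_ms.items():
    print(n, "cores:",
          round(throughput(n, ms), 1), "iter/s,",
          round(core_equivalents(n, ms, iteration_ms[1]), 2), "core equivalents")
```

With these numbers, 4 cores come out at about 3.02 core equivalents, and about 8.7% more total throughput than 3 cores.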

I suspected a large slowdown might occur when the FFT size no longer fit nicely into the L2 cache (when running cores 0 and 1 for instance), but this didn't really happen. The Q6600 has 4 MB of L2 per 2 cores, so one 2560K FFT fits nicely but two don't. As you can see, however, going from cores 0 and 2 running to cores 0, 1 and 2 running only increased the iteration time by 2 ms. The real penalty was felt when going from 3 to 4 cores running, which implies the main memory is the bottleneck. People have always referred to the memory bottleneck, but I didn't know if they meant L2 cache memory or main motherboard memory. Now I see it is the latter.

I thought I would be able to do better than this since my FSB is running pretty fast at 400 MHz. But while it is running fast, the processor is also pumping out more data as it too is OC'd. I played around with faster processor and FSB speeds, but I could only shave a few ms off the numbers and it wasn't worth it in my opinion.

For fun I ran the computer at its proper settings of 2.4 GHz, 266 MHz FSB, 4 cores, and got iteration times of 96 ms (versus 65 ms). That's a pretty big improvement from OC. For those that have never OC'd and are afraid (for lack of a better word), my advice is to try it with a proven system like this. The cost is low, the performance gain (~30%) is significant, and it's actually quite fun. Read up on it a bit but then go for it. Reading forever isn't like doing it.
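
To put the 96 ms vs 65 ms figures in terms of a whole LL test, here is a rough sketch. It assumes an exponent of exactly 47,842,000 (the post only says "order 47.842M") and a constant iteration time with no interruptions; an LL test of exponent p takes p - 2 iterations.

```python
SECONDS_PER_DAY = 86_400


def ll_days(exponent, iteration_ms):
    """Days for one LL test: (p - 2) iterations at a fixed time each."""
    return (exponent - 2) * iteration_ms / 1000 / SECONDS_PER_DAY


p = 47_842_000  # assumed; the exact exponent is not given in the post

stock_days = ll_days(p, 96)
oc_days = ll_days(p, 65)
time_saved = 1 - 65 / 96        # fraction of run time removed
throughput_gain = 96 / 65 - 1   # extra work done per unit time

print(round(stock_days, 1), round(oc_days, 1))
print(round(time_saved, 2), round(throughput_gain, 2))
```

That is roughly 53 days per test at stock against 36 days overclocked. The ~30% figure matches the drop in iteration time (about 32%); counted as extra throughput it is closer to 48%.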

For comparison to the 920 i7 (2.6GHz), the benchmark page lists the iteration time for a 2560K FFT at 52.37 ms. Also, a different user reported four instances ran on the i7 at about the same speed, indicating that the improved memory management likely did away with the bottleneck. So let's assume the i7 will run all 4 cores at this speed. My OC'd system was only able to achieve 65 ms for 4 instances, so the i7 is about 25% faster. But a system built around a 920 i7 will readily cost twice as much ($720 vs $360) as the one here, making the performance/price better for the Q6600.
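
The performance/price comparison can be sketched like this. It uses the prices and iteration times quoted above and keeps the same assumption that the i7 holds 52.37 ms with all 4 cores busy; the function name is made up for illustration.

```python
def iters_per_sec_per_dollar(cores, iteration_ms, price_usd):
    """Total iterations per second across all cores, per dollar of system cost."""
    return cores * 1000.0 / iteration_ms / price_usd


q6600 = iters_per_sec_per_dollar(4, 65.0, 360)    # OC'd Q6600 build
i7_920 = iters_per_sec_per_dollar(4, 52.37, 720)  # assumes no slowdown at 4 cores

speed_advantage_i7 = 65.0 / 52.37 - 1
value_advantage_q6600 = q6600 / i7_920

print(round(speed_advantage_i7, 2))
print(round(value_advantage_q6600, 2))
```

On these figures the i7 is about 24% faster, while the Q6600 build delivers about 1.6 times the throughput per dollar.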

Overall a fun experiment.

Prime95 2008-11-30 18:43

[QUOTE=stars10250;151377]I suspected a large slowdown might occur when the FFT size no longer fit nicely into the L2 cache (when running cores 0 and 1 for instance), but this didn't really happen. The Q6600 has 4 MB of L2 per 2 cores, so one 2560K FFT fits nicely but two don't.[/QUOTE]

One 2560K FFT takes 2560K * 8 bytes (size of a double precision float) = 20MB. And this is just for the FFT data. Sin/cos data and IBDWT weights add to the total. No FFTs presently being run fit in the L2 or L3 cache of any processor currently available.
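
The size arithmetic, as a quick sketch. It assumes "K" means 1024 elements and "MB" means 2^20 bytes, which is what makes 2560K * 8 come out to exactly 20; sin/cos tables and IBDWT weights are left out, so real usage is higher.

```python
BYTES_PER_DOUBLE = 8
KIB = 1024
MIB = 1024 * KIB


def fft_data_bytes(fft_len_k):
    """Bytes of FFT data alone for a length given in K (1K = 1024 doubles)."""
    return fft_len_k * KIB * BYTES_PER_DOUBLE


l2_bytes = 4 * MIB  # Q6600: 4 MB of L2 shared by each pair of cores

print(fft_data_bytes(2560) / MIB)       # 20.0
print(fft_data_bytes(2560) / l2_bytes)  # 5.0
print(fft_data_bytes(768) / MIB)        # 6.0
```

Even the smallest FFT in the benchmark listing (768K) needs 6 MB for its data, which already exceeds the 4 MB of L2.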

stars10250 2008-11-30 21:17

[quote=Prime95;151381]One 2560K FFT takes 2560K * 8 bytes (size of a double precision float) = 20MB. And this is just for the FFT data. Sin/cos data and IBDWT weights add to the total. No FFTs presently being run fit in the L2 or L3 cache of any processor currently available.[/quote]

Thanks for the clarification :)

stars10250 2008-11-30 21:25

[quote=stars10250;151377] a system built around a 920 i7 will readily cost twice as much ($720 vs $360) as the one here, making the performance/price better for the Q6600.
[/quote]

I guess to be fair, one could consider overclocking the i7 system to achieve even better performance for the price. I've seen several reports on this already, and I believe all the current X58 motherboards are overclockable. I don't know of any P95 benchmarks for such a system however.


All times are UTC. The time now is 22:58.
