#12

Jan 2005
19 Posts
I concur that load average is not the issue here. Proper use of the CPU might be, but the CPU in this case is showing 97.2% utilization for user processes, with some hardware interrupts (hi) and software interrupts (si) thrown in. This CPU is doing its job and running EVERYTHING it is told to.
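Chuck reads the 97.2% user figure and the hi/si columns straight from top. As a rough illustration of where numbers like these come from, here is a minimal Python sketch that derives similar percentages from two snapshots of the per-CPU counters in /proc/stat. It assumes the field order documented in proc(5) and is not top's actual code; the function names are made up for this example.

```python
# Sketch: top-style CPU percentages from two /proc/stat samples (Linux).
# Assumes the field order documented in proc(5); any later fields
# (guest, guest_nice) are ignored in this sketch.
import time

FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> dict[str, int]:
    """Turn 'cpu0 100 0 50 ...' into {'user': 100, 'nice': 0, ...}."""
    values = [int(v) for v in line.split()[1:1 + len(FIELDS)]]
    values += [0] * (len(FIELDS) - len(values))  # older kernels report fewer fields
    return dict(zip(FIELDS, values))


def cpu_percentages(before: str, after: str) -> dict[str, float]:
    """Share of the elapsed jiffies spent in each state between two samples."""
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    deltas = {k: a[k] - b[k] for k in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * v / total for k, v in deltas.items()}


def read_cpu_lines(path: str = "/proc/stat") -> dict[str, str]:
    """Map 'cpu' (all CPUs) and 'cpu0', 'cpu1', ... to their raw lines."""
    with open(path) as f:
        return {ln.split()[0]: ln for ln in f if ln.startswith("cpu")}


def show_per_core(interval: float = 1.0) -> None:
    """Print us/sy/hi/si/id per CPU, similar to top's per-CPU view."""
    first = read_cpu_lines()
    time.sleep(interval)
    second = read_cpu_lines()
    for name in sorted(first):
        if name not in second:
            continue
        p = cpu_percentages(first[name], second[name])
        print(f"{name:5s} us={p['user']:5.1f} sy={p['system']:5.1f} "
              f"hi={p['irq']:5.1f} si={p['softirq']:5.1f} id={p['idle']:5.1f}")
```

Calling show_per_core() on a busy box should print one line per core plus the aggregate 'cpu' line, which is the same breakdown you get after pressing '1' in top.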
The program 'top' can display the usage of each CPU *core* on its own line, just like above, except that CPU0, CPU1, CPU2 and CPU3 are each listed individually on multi-core AND multi-CPU systems. This is the SMP-view on/off toggle listed in top's help (the '1' key). Read up on it, and then also check the displays for field groups 1, 2, 3 and 4 (the 'G' key).
Regarding the Wikipedia explanation of load average: there are a few VERY minor changes to how load average is computed under virtualization (multiple CPUs/boxen put together as one 'virtual CPU'), but Xyzzy is correct, as is the Wiki. This is EXACTLY how I expect a proper Linux machine to behave. Any newer Linux worth its weight that is thread/interrupt based (typically kernel 2.6 and up, IMHO) will show a more accurate instantaneous display (1/sec rate). The difference between BSD 4.3 (1980s era, I/O first) and today's Linux is not worth mentioning as far as top and power-consumption / load limiters are concerned.
I think either there is an application limitation (check the number of active threads for each PID) or the number of system calls per second is eating into the effective utilization. If you have any CPU power-management limiters turned on, then all bets are off, as they negate the reason for having multiple cores. If you have an Intel CPU with 'throttle protection', then again, all bets are off if you are above the temperature threshold (read up on the P4 thermal-throttle diode horrors of 1999-2001).
Also, you must pay close attention to which CPU is the boot CPU (where the scheduler runs). Although Linux should (and usually does) allow the scheduler to run anywhere, there is a bias toward running some things on the boot (initial load) CPU. This varies per board; typically (but not as a hard rule) the CPU used for IPL (after the BIOS is done) will be the 'boot' CPU. So there is a bit of asymmetry in an otherwise symmetric environment.
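The Wikipedia explanation of load average mentioned above boils down to an exponentially damped moving average of the number of runnable (and uninterruptible) tasks, updated every 5 seconds. The sketch below models it in floating point; the real kernel uses fixed-point arithmetic, and the helper names are hypothetical. per_core_load() relies only on os.getloadavg() and os.cpu_count() from the standard library (Unix-like systems).

```python
# Sketch: Linux-style 1/5/15-minute load average as an exponentially damped
# moving average (floating point; the kernel itself uses fixed-point math).
import math
import os

SAMPLE_INTERVAL = 5.0  # seconds between load samples (assumed, as on Linux)


def decay_factor(minutes: float) -> float:
    """exp(-5 s / period); about 0.920 for the 1-minute average."""
    return math.exp(-SAMPLE_INTERVAL / (60.0 * minutes))


def update_load(load: float, active_tasks: int, minutes: float) -> float:
    """One sample: the old value decays and the current task count blends in."""
    e = decay_factor(minutes)
    return load * e + active_tasks * (1.0 - e)


def simulate(active_tasks: int, seconds: float, minutes: float, start: float = 0.0) -> float:
    """Load average after `seconds` with a constant number of active tasks."""
    load = start
    for _ in range(int(seconds // SAMPLE_INTERVAL)):
        load = update_load(load, active_tasks, minutes)
    return load


def per_core_load() -> tuple[float, float, float]:
    """Current 1/5/15-minute load averages divided by the CPU count (Unix only)."""
    cpus = os.cpu_count() or 1
    one, five, fifteen = os.getloadavg()
    return one / cpus, five / cpus, fifteen / cpus
```

With four CPU-bound workers on a quad-core, simulate(4, 60, 1) gives 4 x (1 - e^-1), roughly 2.53, after one minute, and the value then settles near 4.0, i.e. about 1.0 per core. That is why load average alone says little about how well each core is actually being used.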
My Tyan board uses Socket 1, Core 0 as its boot CPU, and Socket 0, Core 0 as a regular CPU. Check your machine thoroughly.
To run load-balancing tests that confirm or debunk any theories about what's going on, use the 'taskset' command, which lets you 'LOCK' processes onto specific CPU(s). Notice I said PROCESSES... not threads... but a proper virtual process (cheater's thread) created with vfork() will show up as its own process and be schedulable/manageable as one. Move entire DC PIDs (using taskset) to manually balance the load of USER applications. See: man taskset. Pay attention to the '-c' option, which lets you give the allowed CPU(s) as a list (e.g. 0,2 or 1-3) instead of a hex mask, and the '-p' option, which operates on an existing PID. When starting a new process, -p is not needed, as the process is created with the CPU mask you specified.
I hope I was awake enough while writing this to make sense. I have written enough kernel code to make one's stomach churn. Please let me know if there is anything I can help with by looking at, etc. Chuck
PS: You could have a 'trimmed down' kernel build... i.e., small table limits, which can easily be corrected by building and installing a new kernel (or by specifying some command-line options at boot time, depending on the kernel). I need more info to see what's going on, but Xyzzy is on target. Last fiddled with by Tumo on 2007-10-29 at 18:29 Reason: kernel build comment
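For anyone who would rather script the affinity experiments described above, here is a minimal Python sketch (Linux only) built on the standard library's os.sched_setaffinity and os.sched_getaffinity, which wrap the same sched_setaffinity(2) system call taskset uses. The helpers parse_cpu_list, cpu_mask and pin are hypothetical names for illustration, and parse_cpu_list only approximates taskset's list syntax.

```python
# Sketch (Linux only): the CPU-affinity control that taskset provides,
# via Python's os.sched_setaffinity / os.sched_getaffinity.
# parse_cpu_list, cpu_mask and pin are hypothetical helper names.
import os


def parse_cpu_list(spec: str) -> set[int]:
    """Parse a taskset -c style list such as '0,2-3' into {0, 2, 3}."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            if lo > hi:
                raise ValueError(f"bad CPU range: {part!r}")
            cpus.update(range(lo, hi + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpu_mask(cpus: set[int]) -> str:
    """Hex bitmask form (taskset's default, without -c): {0, 2} -> '0x5'."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return hex(mask)


def pin(pid: int, spec: str) -> set[int]:
    """Restrict task `pid` (0 = caller) to the listed CPUs, roughly like
    `taskset -p -c SPEC PID`, and return the affinity the kernel reports."""
    os.sched_setaffinity(pid, parse_cpu_list(spec))
    return os.sched_getaffinity(pid)
```

For example, pin(12345, "2") would restrict a hypothetical PID 12345 to CPU 2. On Linux the call applies only to the task whose ID you pass, so other threads of a multithreaded program keep their own masks, which fits the processes-versus-threads caveat above. A child created with fork() inherits its parent's mask and keeps it across exec, which is why `taskset -c 0 command` needs no -p.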

#13

"Jason Goatcher"
Mar 2005
3·7·167 Posts
It's right before dinner, which is when I'm stupidest, so I'll probably come back to this thread in about 3.5-4 hours.
Thanks for all the help.

#14

"Jason Goatcher"
Mar 2005
DB3₁₆ Posts
I'll keep an eye on the computer. There was a power failure, and when I restarted it, everything was fine.
Anything I should search for in the log files? Nothing jumps out at me, but then again I don't know what to look for.

#15

"Jason Goatcher"
Mar 2005
3×7×167 Posts
ECM definitely causes it, and as far as my experimenting went, it only happened when I told it to save Step 1.
Could any of the following be the problem? I basically grepped everything around the time of the error.
Code:
/var/log/daemon.log:Oct 28 21:30:04 jason-desktop dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 18
/var/log/daemon.log:Oct 28 21:30:22 jason-desktop dhclient: No DHCPOFFERS received.
/var/log/daemon.log:Oct 28 21:30:22 jason-desktop dhclient: No working leases in persistent database - sleeping.
/var/log/daemon.log.0:Oct 23 21:30:55 jason-desktop dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 5
/var/log/daemon.log.0:Oct 26 21:30:16 jason-desktop dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 5
/var/log/daemon.log.0:Oct 26 21:30:21 jason-desktop dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 9
/var/log/daemon.log.0:Oct 26 21:30:30 jason-desktop dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 17
/var/log/daemon.log.0:Oct 26 21:30:47 jason-desktop dhclient: No DHCPOFFERS received.
/var/log/daemon.log.0:Oct 26 21:30:47 jason-desktop dhclient: No working leases in persistent database - sleeping.
/var/log/daemon.log.0:Oct 27 21:30:21 jason-desktop dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 6
/var/log/daemon.log.0:Oct 27 21:30:27 jason-desktop dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 8
/var/log/daemon.log.0:Oct 27 21:30:35 jason-desktop dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 8
/var/log/daemon.log.0:Oct 27 21:30:43 jason-desktop dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 9
/var/log/daemon.log.0:Oct 27 21:30:52 jason-desktop dhclient: No DHCPOFFERS received.
/var/log/daemon.log.0:Oct 27 21:30:52 jason-desktop dhclient: No working leases in persistent database - sleeping.
/var/log/messages.0:Oct 24 21:30:02 jason-desktop -- MARK --
/var/log/messages.0:Oct 25 21:30:19 jason-desktop -- MARK --
/var/log/messages.0:Oct 26 21:30:36 jason-desktop -- MARK --
/var/log/messages.0:Oct 27 21:30:54 jason-desktop -- MARK --
/var/log/syslog.0:Oct 28 21:30:04 jason-desktop dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 18
/var/log/syslog.0:Oct 28 21:30:22 jason-desktop dhclient: No DHCPOFFERS received.
/var/log/syslog.0:Oct 28 21:30:22 jason-desktop dhclient: No working leases in persistent database - sleeping.
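Grepping by hand works, but a small script makes it easier to pull every entry logged near the failure time on *any* day and to see whether the same entries recur, as the dhclient and '-- MARK --' lines above do around 21:30. This Python sketch assumes the classic 'Mon DD HH:MM:SS host message' syslog timestamp (no year), optionally prefixed with 'filename:' the way grep prints matches from several files; near_time_of_day and recurring_messages are hypothetical helpers.

```python
# Sketch: find syslog-style entries logged near a given time of day, on any date.
import re
from collections import Counter
from typing import Iterable, Iterator, Optional, Tuple

# Optional "filename:" prefix (as grep prints it), then "Mon DD HH:MM:SS", then the rest.
LINE_RE = re.compile(
    r"^(?:(?P<file>[^:\s]+):)?"
    r"(?P<stamp>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) (?P<rest>.*)$"
)

Hit = Tuple[Optional[str], str, str]  # (file or None, timestamp, host + message)


def seconds_of_day(clock: str) -> int:
    """'21:30:04' or '21:30' -> seconds since midnight."""
    parts = [int(x) for x in clock.split(":")]
    parts += [0] * (3 - len(parts))
    h, m, s = parts
    return h * 3600 + m * 60 + s


def near_time_of_day(lines: Iterable[str], hhmm: str, window_minutes: int = 5) -> Iterator[Hit]:
    """Yield entries within +/- window_minutes of hhmm, wrapping at midnight."""
    target = seconds_of_day(hhmm)
    window = window_minutes * 60
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a syslog-style line
        diff = abs(seconds_of_day(m["stamp"].split()[-1]) - target)
        diff = min(diff, 86400 - diff)  # 23:58 is close to 00:01
        if diff <= window:
            yield m["file"], m["stamp"], m["rest"]


def recurring_messages(hits: Iterable[Hit]) -> Counter:
    """Count messages with the host name dropped, to spot entries that repeat daily."""
    return Counter(rest.split(" ", 1)[1] if " " in rest else rest for _, _, rest in hits)
```

For example, feeding it the lines of /var/log/syslog, /var/log/daemon.log and their rotated .0 copies with near_time_of_day(lines, "21:30", 5) lists everything logged within five minutes of 21:30 on each day.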

#16

Jan 2005
13₁₆ Posts
Someone please correct me if I'm wrong, but isn't ECM still GMP-based? If so, memory bottlenecks will cause strange things to happen (or *appear* to happen). That said, if memory *is* the issue, I strongly suggest you run the P95 torture test to ensure your CPU is up to par with the rest of the system. Also, if the CPU is flaking out (which has been known to happen, as it did to me recently with an L1 cache failure), the torture test will heat things up and make it visible. Also, run Memtest86+ (the latest version) from a boot CD and let it show you what it finds, to ensure the memory itself is up to spec. The electrical balance (capacitive loading) has to be correct, so make sure the modules are in the right DIMM slots. There should be NO surprises... I assume you have DDR2-667 or DDR2-800?
My DDR2-800 gets fussy (MUCH warmer than normal) if matched TwinX pairs are put in the wrong banks (dual-channel lanes). Check this against what your board and memory manufacturers specify. A single-bit error will drive ECM nuts. I suggest starting with a nice, cool, idle machine and working from there: run memtest first, then reboot and add instances of P95 (using taskset/affinity) for each core, once you are sure all preceding tests have passed. You are likely to find a bad bit somewhere in your memory before anything else; high memory traffic like ECM's will definitely fail on that. Watch your syslog, as well as the boot (init) startup messages, for any pages that are blocked out, and check that all sizes show up correctly. Also, capacitive loading of the DIMMs in the correct slots is a must at DDR2 speeds, as well as for dual-channel operation (which I assume you have). 'dmesg' will show recent kernel errors (the same kernel messages that also end up in /var/log/syslog), but these programs (memtest and P95) stop at the first sign of trouble.
Hope this helps. FWIW, I have 4 Barcelonas (left over from beta testing, on 2 boards), two other multi-core CPUs, one per board (one is a simple A64 X2 4400), and a lonely P4B & K6-2. I keep them below 40C to guarantee proper operation AND maximum life. They WILL stay below 40C (K6-2 & P4 below 35C) under full load with the right cooling (copper and AS-5). The memory does get warm and is therefore placed directly in the path of the airflow; even Corsair DDR2 gets warm. All memory is Corsair... from PC-3200 XL (2-2-2-5) up to ECC (3-2-3-6) and 5-5-5-12 DDR2. I'm running a multi-boot configuration here... 2x 2k/FC7 dual boot, FC7 and 2k single boots, with XP SP2 on the P4B 2.4G. I also have a virtual CPU constructed (PVM) that uses multiple CPUs and mobos (with a dual gigabit private backbone for this), and another gigabit network as the main one, with 13TB of storage (the K6-2 is the primary network file server, with SATA-2 boards and gigabit networking, FC6 + SMB), but I want reflective memory on fibre.
I am working on data for another thesis. This monster keeps me warm in the winter, and I *SHOULD* buy stock in the local power utility OR get my own wind generator. :) Hope this helps. C. Last fiddled with by Tumo on 2007-10-30 at 08:52 Reason: fixed typo
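Chuck suggests watching the boot messages to confirm that all sizes show up correctly. One quick way to check the size part is to compare MemTotal in /proc/meminfo against what you physically installed. The Python sketch below assumes a 5% tolerance (an assumption, not a kernel rule), since MemTotal always comes in somewhat under the installed size. A DIMM the BIOS did not map, or a trimmed-down 32-bit kernel without high-memory support, could show up as a MemTotal well short of that. The helper names are hypothetical.

```python
# Sketch: check that the kernel sees roughly the memory you installed (Linux).
# The 5% default tolerance is an assumption: MemTotal is always a bit below the
# installed size because firmware and the kernel reserve some memory.


def mem_total_kib(meminfo_text: str) -> int:
    """Return the MemTotal value from /proc/meminfo text (reported in kB)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal line not found")


def looks_complete(meminfo_text: str, installed_gib: float, tolerance: float = 0.05) -> bool:
    """True if the kernel-visible total is within `tolerance` of installed_gib."""
    seen_gib = mem_total_kib(meminfo_text) / (1024 * 1024)
    return seen_gib >= installed_gib * (1.0 - tolerance)


def read_meminfo(path: str = "/proc/meminfo") -> str:
    """Read the live /proc/meminfo text."""
    with open(path) as f:
        return f.read()
```

On a box with, say, 4 GiB installed, looks_complete(read_meminfo(), 4) returning False would be a reason to look closer at the boot messages before running memtest or P95.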

#17

"Jason Goatcher"
Mar 2005
3·7·167 Posts
Can a wall power problem cause my problem AND be so bad that a computer ends up refusing to boot, even though it is "protected" by a battery backup?
The reason I ask is that my Windows computer is now refusing to boot AND my KVM switch is misbehaving. I'm going to stick to using just my cheapest Linux box until I can figure out what's going on. Last fiddled with by jasong on 2007-10-31 at 21:54

#18

Jan 2005
19 Posts
Quote:
A faulty or unconnected ground AND a floating ground in the UPS will definitely cause problems. I had an APCC wall cord go bad (the UPS was fine), but it left a floating ground (which shows up as a wiring problem on the UPS) and failed to ground the 3rd leg of the transformer. This is magnified if any of the switches, bridges, routers, or the modem are NOT plugged in. Get a meter and check the case against TRUE earth ground (a metal pipe or something... or even the wall socket's ground if you must). Ground is ground is ground and should be true throughout, and you don't want a ground loop (as it's known). There should be ZERO volts AC difference. Also make sure you have respectable tolerance settings for incoming power levels on the UPS. I have to run mine on HIGH here, but it works. Setting it on LOW would allow a brownout or high-line condition to start before the UPS reacts. The faster it reacts, the better. Last fiddled with by Tumo on 2007-11-01 at 05:37 Reason: ups sensitivity settings.

#19

"Jason Goatcher"
Mar 2005
3×7×167 Posts
Other than the fact that the previous post was in English, I'm clueless. ;)
My dad worked in military communications back in the 80s, so I suspect he knows enough to figure out that post. I'll ask him tomorrow afternoon to test for grounding problems. Thanks, Chuck. :)

#20

"Jason Goatcher"
Mar 2005
110110110011₂ Posts
Okay, I reread Chuck's (Tumo's) post, and I've managed to understand it a little better. My dad and I basically do this through a strange sort of teamwork: I get ideas from the various forums, tell them to my dad, and together we manage to puzzle out what actually needs to be done.
I'll talk to him and see what we come up with. Okay, I talked to him, and according to his large but somewhat antiquated knowledge, the battery backup and the plugs are all properly grounded. In his opinion (not claiming he's right OR wrong), as long as the power is grounded up to the point where it enters the case, the power isn't the problem. Although, to be honest, when it comes to power fluctuations, his concern is my Dell box, which went belly up last night or the night before. At the moment, my priority is to get my quad-core back online so that the KVM switch is no longer necessary. I'm planning on moving totally to Linux, whether or not my Linux box can be fixed. Please keep the ideas coming, but for the moment I'm going to be concerning myself with a few other things.

#21

Sep 2002
Database er0rr
3,739 Posts
A dead box, right? Does the CPU fan spin when the box is switched on? Do the keyboard LEDs flash briefly a few seconds after the box is switched on?
Last fiddled with by paulunderwood on 2007-11-02 at 01:58

#22

Jan 2005
19 Posts
Jasong,
You and your dad basically have it right. I should have waited until I was more awake to answer, and hence been more brief. The UPS should not be showing any alarms/warnings with all machine(s)/device(s) plugged in. As per Paul's test... also see whether, after turning the power supply off and then back on, pressing the ON switch on the case itself makes things light up. I have seen a few fans go bad. If all is well to this point, you should get a keyboard LED flash and, as Paul also said, the KVM lighting up (mine lights up the 'upper' half of a channel when a machine is connected and the 'lower' half when it is selected), showing all powered-on machine(s) as generating video, with the selected one also lit. That *should* get you video on the screen so you can watch the BIOS run... then the standard BIOS boot flash, the bootloader prompt, etc., if the PC is working OK. If a CPU is hung, it will take a while to diagnose; toasted memory and/or crashed drive(s) are much easier. Given you had a 'toes up' situation the other night... if you get 'da blinken und da flaschen boxen' (LOL), grab a Fedora 7 boot disk (I think; someone please correct me if I'm wrong) so you can boot from CD and have it run memtest86... (I can't remember if that's on the CD or the DVD... sorry), or just run plain memtest86 from its own boot CD. You will know very quickly if the CPU got toasted. A core may still be bad, but if the startup core is bad, this will show it. I will look for you in the 'other' places online. Thanks for keeping my steering true, Paul. Chuck.