#23
"Ed Hall"
Dec 2009
Adirondack Mtns
5564₁₀ Posts
Quote:
#24
"Jacob"
Sep 2006
Brussels, Belgium
11110100010₂ Posts
If the machines are kept on (and you do not have the cold-start problem), you might adjust the fan profiles through the BIOS or a utility program, for example modifying the case-fan profile so those fans stop below a certain temperature to keep the heat in the case. I would not stop the processor-cooler fans: there must be a minimum of air circulation in the case to avoid hot spots.
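As a rough illustration of what such a profile does, here is a minimal Python sketch of a piecewise-linear fan curve. The function names and every number in it (35 °C, 70 °C, 20 % and 30 % minimum duty) are hypothetical placeholders, not settings from any particular BIOS or fan-control program. The case-fan curve is allowed to stop below its threshold, while the CPU-cooler curve keeps a nonzero floor.

```python
def fan_duty(temp_c: float, stop_below: float, full_at: float,
             min_duty: float, allow_stop: bool) -> float:
    """Return a fan duty cycle (percent) for a temperature in degrees C.

    Hypothetical piecewise-linear profile: below `stop_below` the fan either
    stops (allow_stop=True) or idles at `min_duty`; between the thresholds the
    duty ramps linearly from `min_duty` to 100; at or above `full_at` it is 100.
    """
    if temp_c < stop_below:
        return 0.0 if allow_stop else min_duty
    if temp_c >= full_at:
        return 100.0
    frac = (temp_c - stop_below) / (full_at - stop_below)
    return min_duty + frac * (100.0 - min_duty)


# Illustrative profiles (assumed numbers, not from any real board):
# case fans may stop to keep heat in the case; CPU fans never drop below 30 %.
def case_fan(temp_c: float) -> float:
    return fan_duty(temp_c, stop_below=35.0, full_at=70.0, min_duty=20.0, allow_stop=True)


def cpu_fan(temp_c: float) -> float:
    return fan_duty(temp_c, stop_below=35.0, full_at=70.0, min_duty=30.0, allow_stop=False)


if __name__ == "__main__":
    for t in (10.0, 35.0, 52.5, 80.0):
        print(f"{t:5.1f} C  case={case_fan(t):5.1f} %  cpu={cpu_fan(t):5.1f} %")
```

Real boards usually let you set a few (temperature, duty) points rather than a formula, but the effect is the same: the case fans go quiet when cold, while the CPU coolers always move some air.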
#25
If I May
"Chris Halsall"
Sep 2002
Barbados
2×11²×47 Posts
Yup. Network monitoring is what Cacti (and RRDtool) was originally created for. But, at the end of the day, anything that can be sampled regularly can be logged, graphed, and analyzed: temperatures, utilization, latency, UPS stats, etc.
No serious network engineer doesn't deploy this (or something similar).
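For the temperature part, the sampling side can be very small. The sketch below assumes a Linux host and reads the kernel's hwmon sysfs files (`/sys/class/hwmon/hwmon*/temp*_input`, reported in millidegrees Celsius), appending timestamped rows to a plain CSV file. It is not Cacti or RRDtool; the CSV log, the file name and the function names are illustrative assumptions, and the resulting file could be fed to whatever grapher you prefer.

```python
import csv
import glob
import os
import time


def read_hwmon_temps(root: str = "/sys/class/hwmon") -> dict:
    """Return {"hwmonN:chip/sensor": degrees C} for every temp*_input under root."""
    readings = {}
    for chip_dir in sorted(glob.glob(os.path.join(root, "hwmon*"))):
        try:
            with open(os.path.join(chip_dir, "name")) as f:
                chip = f.read().strip()  # e.g. "coretemp", "w83795adg"
        except OSError:
            chip = "unknown"
        # Include the hwmonN directory so two sockets both named "coretemp" don't collide.
        prefix = f"{os.path.basename(chip_dir)}:{chip}"
        for input_path in sorted(glob.glob(os.path.join(chip_dir, "temp*_input"))):
            sensor = os.path.basename(input_path)[: -len("_input")]  # e.g. "temp2"
            label_path = os.path.join(chip_dir, sensor + "_label")
            if os.path.exists(label_path):
                with open(label_path) as f:
                    sensor = f.read().strip()  # e.g. "Core 0"
            try:
                with open(input_path) as f:
                    millideg = int(f.read().strip())
            except (OSError, ValueError):
                continue  # some sensors fail on read; skip rather than abort
            readings[f"{prefix}/{sensor}"] = millideg / 1000.0  # millidegrees -> degrees C
    return readings


def append_sample(csv_path: str, readings: dict, timestamp: float) -> None:
    """Append one (timestamp, sensor, value) row per sensor to a CSV log."""
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        for name, value in sorted(readings.items()):
            writer.writerow([int(timestamp), name, f"{value:.1f}"])


def sample_loop(csv_path: str, interval_s: float = 60.0) -> None:
    """Sample forever at a fixed interval (stop with Ctrl-C)."""
    while True:
        append_sample(csv_path, read_hwmon_temps(), time.time())
        time.sleep(interval_s)
```

Calling `sample_loop("temps.csv")` from a long-running process, or running a single `append_sample(...)` per invocation from cron, builds up the history to graph later.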
#26
"Ed Hall"
Dec 2009
Adirondack Mtns
1010110111100₂ Posts
Here's a screenshot of the temps for the machine I had trouble with already this year. This was taken while running all 24 threads as CADO-NFS clients:
Code:
top - 08:59:50 up 1 day, 21:50, 2 users, load average: 23.28, 23.25, 23.12
Tasks: 418 total, 2 running, 416 sleeping, 0 stopped, 0 zombie
%Cpu(s): 94.2 us, 1.5 sy, 0.0 ni, 4.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 24036.7 total, 16509.3 free, 3601.6 used, 3925.8 buff/cache
MiB Swap: 16373.0 total, 16373.0 free, 0.0 used. 20004.4 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
190326 math98 20 0 1476540 863236 9532 S 763.8 3.5 20:09.80 las
190369 math98 20 0 1477580 860944 9408 S 757.1 3.5 12:43.95 las
190286 math98 20 0 1480620 867696 9532 S 756.1 3.5 21:54.07 las
#27
"Ed Hall"
Dec 2009
Adirondack Mtns
2²×13×107 Posts
Maybe my real trouble is a motherboard sensor. Does temp2 look suspicious to anyone else?
#28
If I May
"Chris Halsall"
Sep 2002
Barbados
2×11²×47 Posts
Quote:
For comparison, here are my sensors:
Code:
chalsall@hobbit:~$ sensors
nouveau-pci-0100
Adapter: PCI adapter
fan1: 897 RPM
temp1: +48.0°C (high = +95.0°C, hyst = +3.0°C)
(crit = +105.0°C, hyst = +5.0°C)
(emerg = +135.0°C, hyst = +5.0°C)
acpitz-acpi-0
Adapter: ACPI interface
temp1: +16.8°C (crit = +20.8°C)
temp2: +27.8°C (crit = +119.0°C)
temp3: +29.8°C (crit = +119.0°C)
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +64.0°C (high = +82.0°C, crit = +100.0°C)
Core 0: +61.0°C (high = +82.0°C, crit = +100.0°C)
Core 1: +60.0°C (high = +82.0°C, crit = +100.0°C)
Core 2: +59.0°C (high = +82.0°C, crit = +100.0°C)
Core 3: +60.0°C (high = +82.0°C, crit = +100.0°C)
Core 4: +59.0°C (high = +82.0°C, crit = +100.0°C)
Core 5: +62.0°C (high = +82.0°C, crit = +100.0°C)
nvme-pci-0b00
Adapter: PCI adapter
Composite: +35.9°C (low = -0.1°C, high = +74.8°C)
(crit = +79.8°C)
#29
"Ed Hall"
Dec 2009
Adirondack Mtns
2²·13·107 Posts
Even though I ran sudo sensors-detect, I've been suspicious of my readings anyway. This is a thrown-together dual-processor machine (6c/12t per processor). Here's my full sensors display, which is heavy with alarms:
Code:
$ sensors
w83627dhg-isa-0a10
Adapter: ISA adapter
Vcore: 728.00 mV (min = +0.00 V, max = +1.74 V)
in1: 712.00 mV (min = +0.68 V, max = +0.77 V)
AVCC: 3.34 V (min = +2.98 V, max = +3.63 V)
+3.3V: 3.34 V (min = +2.98 V, max = +3.63 V)
in4: 1.02 V (min = +0.51 V, max = +1.13 V)
in5: 712.00 mV (min = +1.28 V, max = +0.65 V) ALARM
in6: 712.00 mV (min = +1.04 V, max = +0.33 V) ALARM
3VSB: 3.39 V (min = +2.98 V, max = +3.63 V)
Vbat: 3.15 V (min = +2.70 V, max = +3.63 V)
fan1: 0 RPM (min = 285 RPM, div = 128) ALARM
fan2: 0 RPM (min = 239 RPM, div = 128) ALARM
fan3: 0 RPM (min = 659 RPM, div = 128) ALARM
fan5: 0 RPM (min = 10546 RPM, div = 128) ALARM
temp1: +48.0°C (high = -128.0°C, hyst = +87.0°C) sensor = thermistor
temp2: -88.0°C (high = +80.0°C, hyst = +75.0°C) sensor = thermistor
temp3: +43.0°C (high = +80.0°C, hyst = +75.0°C) sensor = thermistor
cpu0_vid: +0.000 V
intrusion0: ALARM
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +36.0°C (high = +80.0°C, crit = +96.0°C)
Core 1: +29.0°C (high = +80.0°C, crit = +96.0°C)
Core 2: +33.0°C (high = +80.0°C, crit = +96.0°C)
Core 8: +32.0°C (high = +80.0°C, crit = +96.0°C)
Core 9: +34.0°C (high = +80.0°C, crit = +96.0°C)
Core 10: +35.0°C (high = +80.0°C, crit = +96.0°C)
w83795adg-i2c-0-2f
Adapter: SMBus I801 adapter at 0400
in0: 1.19 V (min = +0.67 V, max = +1.49 V)
in1: 1.18 V (min = +0.67 V, max = +1.49 V)
in2: 1.52 V (min = +1.35 V, max = +1.65 V)
in3: 1.53 V (min = +1.35 V, max = +1.65 V)
in4: 1.26 V (min = +1.13 V, max = +1.38 V)
in5: 1.26 V (min = +1.13 V, max = +1.38 V)
in6: 1.83 V (min = +1.63 V, max = +2.00 V)
in7: 1.47 V (min = +1.42 V, max = +1.53 V)
in11: 1.12 V (min = +1.48 V, max = +1.82 V) ALARM
+3.3V: 3.24 V (min = +2.96 V, max = +3.63 V)
3VSB: 3.26 V (min = +2.96 V, max = +3.63 V)
Vbat: 3.13 V (min = +2.70 V, max = +3.63 V)
fan1: 2566 RPM (min = 712 RPM)
fan2: 2495 RPM (min = 712 RPM)
fan3: 0 RPM (min = 712 RPM) ALARM
fan4: 4838 RPM (min = 712 RPM)
fan5: 0 RPM (min = 712 RPM) ALARM
fan6: 3040 RPM (min = 712 RPM)
temp1: +14.2°C (high = +127.0°C, hyst = +127.0°C)
(crit = +127.0°C, hyst = +127.0°C) sensor = thermal diode
temp5: +10.0°C (high = +127.0°C, hyst = +127.0°C)
(crit = +75.0°C, hyst = +70.0°C) sensor = thermistor
temp7: +40.2°C (high = +95.0°C, hyst = +92.0°C)
(crit = +95.0°C, hyst = +92.0°C) sensor = Intel PECI
temp8: +43.0°C (high = +95.0°C, hyst = +92.0°C)
(crit = +95.0°C, hyst = +92.0°C) sensor = Intel PECI
intrusion0: OK
beep_enable: enabled
coretemp-isa-0001
Adapter: ISA adapter
Core 0: +37.0°C (high = +80.0°C, crit = +96.0°C)
Core 1: +32.0°C (high = +80.0°C, crit = +96.0°C)
Core 2: +30.0°C (high = +80.0°C, crit = +96.0°C)
Core 8: +34.0°C (high = +80.0°C, crit = +96.0°C)
Core 9: +39.0°C (high = +80.0°C, crit = +96.0°C)
Core 10: +39.0°C (high = +80.0°C, crit = +96.0°C)
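One quick way to spot readings like temp2 is to scan the `sensors` text for values that cannot be physical, or for limits that contradict each other. The Python sketch below is a hypothetical helper, not part of lm-sensors: it flags temperatures below an assumed floor of -30 °C, temperature `high` limits below that same floor, and voltage lines whose `min` is greater than their `max`. Fan lines and `(crit = ...)` continuation lines are ignored, and the degree sign may be either `°` or the U+FFFD replacement character a mis-encoded `°` can turn into.

```python
import re

# "name:  value unit  rest"; the temperature unit accepts "°C" or "\ufffdC".
READING_RE = re.compile(
    r"^\s*(?P<name>[^:]+):\s+(?P<value>[+-]?\d+(?:\.\d+)?)\s*"
    r"(?P<unit>[°\ufffd]C|mV|V)(?P<rest>.*)$"
)
LIMIT_RE = re.compile(r"\b(min|max|high)\s*=\s*([+-]?\d+(?:\.\d+)?)")


def suspicious_readings(text: str, floor_c: float = -30.0) -> list:
    """Return (sensor, reason) pairs for readings that look physically implausible.

    floor_c is an assumed lower bound for any real temperature inside a running PC.
    """
    problems = []
    for line in text.splitlines():
        m = READING_RE.match(line)
        if not m:
            continue  # fan lines, adapter headers, continuation lines, etc.
        name = m.group("name").strip()
        value = float(m.group("value"))
        limits = {key: float(val) for key, val in LIMIT_RE.findall(m.group("rest"))}
        if m.group("unit").endswith("C"):
            if value < floor_c:
                problems.append((name, f"reading {value:+.1f} C below {floor_c:+.1f} C"))
            if "high" in limits and limits["high"] < floor_c:
                problems.append((name, f"high limit {limits['high']:+.1f} C below {floor_c:+.1f} C"))
        elif "min" in limits and "max" in limits and limits["min"] > limits["max"]:
            problems.append((name, f"min limit {limits['min']} V above max {limits['max']} V"))
    return problems
```

Fed the first (w83627dhg) block above, it would report temp1 (high = -128.0 °C), temp2 (-88.0 °C), and in5/in6 (min above max), all on the same chip.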
#30
If I May
"Chris Halsall"
Sep 2002
Barbados
2×11²×47 Posts
Quote:
I'd suggest you spend some "quality time" down in the BIOS interface, seeing if you're actually running "in spec". Perhaps a "reset" and/or a BIOS upgrade (if available) would be in order. Also, I have found some situations where sensors-detect doesn't get all the values correct.
#31
"Ed Hall"
Dec 2009
Adirondack Mtns
2²×13×107 Posts
Quote:
#32
"/X\(‘-‘)/X\"
Jan 2013
https://pedan.tech/
2⁴·199 Posts
Could also be condensation. Do you see any when you look at the system, around the time of day of the restarts?
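Whether condensation is even possible can be estimated from the dew point: a surface at or below the dew point of the surrounding air can collect moisture. Below is a minimal Python sketch using the Magnus approximation with the commonly quoted constants b = 17.62 and c = 243.12 °C, where γ = ln(RH/100) + bT/(c + T) and T_d = cγ/(b − γ). The air temperature and relative humidity are inputs you would have to measure near the machines (no humidity figures appear in this thread), and the function names are hypothetical.

```python
import math

# Magnus-formula constants over liquid water (commonly quoted values)
B = 17.62
C = 243.12  # degrees Celsius


def dew_point_c(air_temp_c: float, rel_humidity_pct: float) -> float:
    """Approximate dew point (C) from air temperature (C) and relative humidity (%)."""
    if not 0.0 < rel_humidity_pct <= 100.0:
        raise ValueError("relative humidity must be in (0, 100]")
    gamma = math.log(rel_humidity_pct / 100.0) + B * air_temp_c / (C + air_temp_c)
    return C * gamma / (B - gamma)


def condensation_risk(surface_temp_c: float, air_temp_c: float, rel_humidity_pct: float) -> bool:
    """True if a surface is at or below the dew point of the surrounding air."""
    return surface_temp_c <= dew_point_c(air_temp_c, rel_humidity_pct)


if __name__ == "__main__":
    td = dew_point_c(20.0, 50.0)
    print(f"dew point at 20 C / 50 % RH: {td:.1f} C")
```

For example, air at 20 °C and 50 % relative humidity has a dew point of roughly 9.3 °C, so a component actually sitting near 10 °C would be uncomfortably close.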
#33
"Ed Hall"
Dec 2009
Adirondack Mtns
2²×13×107 Posts
Quote:
* The systems are configured to power up after AC loss, but I hesitate to use that as a reset.