mersenneforum.org Intel Xeon PHI?

2020-11-18, 23:48   #67
ewmayer
∂²ω=0

Sep 2002
República de California

26554₈ Posts

I prefer not to have to register online for anything if it can be avoided. I created a boot disk from the CentOS iso file (CentOS-8.2.2004-x86_64-minimal.iso, downloaded from one of the official mirrors), plugged it into the KNL and booted from it using the "install" option. It got through a bunch of steps with a green "[ OK ]" in the terminal output, then hit a bunch of repeated warnings of the form

[time-since-boot] dracut-initqueue[3265]: Warning: dracut-initqueue timeout - starting timeout scripts

After a few minutes of that repeating, I got
Code:
Starting Setup Virtual Console...
[ OK ] Started Setup Virtual Console.
Starting Dracut Emergency Shell...
Warning: /dev/root does not exist

Generating "/run/initramfs/rdsosreport.txt"

Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
after mounting them and attach it to a bug report.

dracut:/#

The journalctl command pages its output like Linux "more" (which is itself unavailable in this emergency-mode shell environment). Some highlights follow, with a few annotations by me in []. Note that this appears in fact to be a 68-physical-core system (4 logical CPUs per physical core, thus 272) like kriesel's, not the 64-core one I thought I was buying (not that I'm complaining, mind you :)
Code:
smpboot: Allowing 272 CPUs, 0 hotplug CPUs
...
Booting paravirtualized kernel on bare hardware.
...
Kernel command line: BOOT_IMAGE=vmlinux [stuff about the CentOS iso] quiet
Specific versions of hardware are certified with Red Hat Enterprise Linux 8. Please see the list of hardware are certified with Red Hat Ent[line cuts off]
...
x86: Booting SMP configuration:
...
smp: brought up 1 node, 272 CPUs
...
ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
ACPI: bus type PCI registered
...
ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored
[bunch of pci-init stuff]
SCSI subsystem initialized
...
can't derive routing for PCI INT D
PCI INT D: not connected
...
New USB device found [stuff re. Linux boot image]
...
can't derive routing for PCI INT D
PCI INT D: no GSI
...
[stuff about loading CentOS kernel signing key]
...
[this is in bright red font]usb 3-8: device descriptor read/64, error -110
...
igb: Intel(R) Gigabit Ethernet Network Driver - version 5.6.0-k
...
sd 3:0:0:0: [sda] Attached SCSI disk [preceding line confirms 1TB, i.e. the SSD I bought for this build]

Then we get the repeating "dracut" timeout warnings I mentioned above, ending in "Warning: Could not boot." and the emergency-shell stuff.
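The 68 × 4 = 272 arithmetic can also be checked from a running system rather than inferred from the boot log. A minimal Python sketch, assuming the usual x86 Linux /proc/cpuinfo layout (one blank-line-separated block per logical CPU, with `processor`, `physical id` and `core id` fields); `count_cpus` is a hypothetical helper, not an existing tool:

```python
import os
from typing import Tuple


def count_cpus(cpuinfo_text: str) -> Tuple[int, int]:
    """Return (logical_cpus, physical_cores) from /proc/cpuinfo-style text.

    Assumes the x86 Linux layout: one blank-line-separated block per logical
    CPU, with 'processor', 'physical id' and 'core id' fields.
    """
    logical = 0
    cores = set()
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" in fields:
            logical += 1
            # A physical core is identified by its (socket, core) pair; if the
            # fields are missing, fall back to treating each logical CPU as a core.
            cores.add((fields.get("physical id", "0"),
                       fields.get("core id", fields["processor"])))
    return logical, len(cores)


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        logical, physical = count_cpus(f.read())
    if physical:
        print(f"{logical} logical CPUs on {physical} physical cores "
              f"({logical // physical} per core)")
```

On a 68-core KNL with 4 threads per core this should report 272 logical CPUs on 68 physical cores, assuming the kernel exposes the topology fields as on other x86 parts.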
2020-11-19, 02:57   #68
paulunderwood

Sep 2002
Database er0rr

7120₈ Posts

This person claims to have "fixed the problem" (https://forums.centos.org/viewtopic.php?t=63043) by running dracut -f in rescue(?) mode. He might have meant emergency mode; see https://linux.die.net/man/8/dracut

Another solution seems to be to enter "exit" at the dracut prompt. This might be the safer option to try first.

Last fiddled with by paulunderwood on 2020-11-19 at 03:22
2020-11-19, 03:57   #69
ewmayer
∂²ω=0

Sep 2002
República de California

26554₈ Posts

@paul: Thanks for digging that out. Interestingly, it proved unnecessary: I powered the system back up with the boot USB plugged in, waited the several minutes this system needs to run all its BIOS stuff, then hit at the SuperMicro boot screen. Now for the new part: the first time I did the above, the boot menu listed 3 items corresponding to the boot USB, 'boot general' and under that 'boot partition 1' and 'boot partition 2'. I recalled that when I created the boot USB, dd copied the iso image to partition 1, but on try #1 I had just hit 'boot general'. This time I selected 'boot partition 1' and everything worked; root/user info is all entered, and it's copying files and configuring the kernel as I write this. Fingers crossed; time to get dinner and go offline for the evening. Update tomorrow.
2020-11-19, 06:47   #70
axn

Jun 2003

3²×19×29 Posts

Quote:
 Originally Posted by Xyzzy Most likely you need to remove quiet from the kernel command line. We are not sure how to do this with Ubuntu but we know it is possible. Once that is done you can see the boot message log to determine where the failure is.

EDIT:- Obviously you need to boot first to do this :-(

Last fiddled with by axn on 2020-11-19 at 06:48
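Once the GRUB defaults file is reachable (on the machine itself, or on another Linux box with the disk mounted), removing quiet is a one-token edit. A minimal Python sketch, assuming the common /etc/default/grub variables GRUB_CMDLINE_LINUX (CentOS) and GRUB_CMDLINE_LINUX_DEFAULT (Ubuntu); `drop_kernel_arg` is a hypothetical helper. The edited file only takes effect after the GRUB config is regenerated (e.g. update-grub on Ubuntu); for a single boot, the entry can instead be edited by pressing e at the GRUB menu.

```python
import re

# Matches e.g. GRUB_CMDLINE_LINUX="..." (CentOS) or
# GRUB_CMDLINE_LINUX_DEFAULT="..." (Ubuntu) in /etc/default/grub text.
_CMDLINE = re.compile(r'^(GRUB_CMDLINE_LINUX(?:_DEFAULT)?=)"([^"]*)"', re.MULTILINE)


def drop_kernel_arg(grub_default: str, arg: str = "quiet") -> str:
    """Return /etc/default/grub text with the bare kernel argument `arg` removed."""
    def strip_arg(m):
        # Keep every whitespace-separated argument except an exact match.
        kept = [a for a in m.group(2).split() if a != arg]
        return m.group(1) + '"' + " ".join(kept) + '"'

    return _CMDLINE.sub(strip_arg, grub_default)


if __name__ == "__main__":
    sample = 'GRUB_TIMEOUT=5\nGRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet"\n'
    print(drop_kernel_arg(sample), end="")
```

Matching whole tokens rather than doing a plain string replace avoids mangling arguments that merely contain the word.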

2020-11-19, 11:44   #71
xilman
Bamboozled!

"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across

11×971 Posts

Quote:
 Originally Posted by axn https://askubuntu.com/questions/4778...n-quiet-splash EDIT:- Obviously you need to boot first to do this :-(
Yes, but not necessarily the system you wish to boot.

Extract the boot disk and plug it into another Linux box that boots properly. Mount the added disk somewhere, edit to your heart's content, then do a clean shutdown, followed by reinserting the disk into its own machine.

Done this several times when systems won't boot from their own disk and I don't have a convenient bootable CD/DVD/memstick to hand.

2020-11-19, 14:24   #72
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2²×3×5²×17 Posts

Quote:
 Originally Posted by ewmayer waited the several minutes this system needs to run all its BIOS stuff, then hit at the SuperMicro boot screen.
Some observations here:
1) Sometimes mine seems to get stuck during the BIOS initialization, requiring a power cycle to try again.
2) The sequence is interminable but the time window for F11, F12, or DEL to select options from the white SuperMicro boot screen is brief. It would be nice to be able to shorten the one or lengthen the other.
3) Haven't experimented with BIOS settings to possibly skip / disable some portions of the initialization.
4) The BIOS seems to support a commercial-size-kitchen-sink set of approaches. Disabling the unused ones might provide a considerable startup speedup, if possible, by eliminating timeout periods for things that ain't gonna happen (IPMI IP# issuance for example).
5) Jumper changes are another possibility, e.g. disabling the BMC.
6) One more way my system is wired oddly: the motherboard documentation indicates the two adjacent RJ45 jacks are regular LAN ports, but if the one nearer the USB ports (#7 in fig 5-2 of the manual found online) is connected, DHCP assigns an IPMI IP# (remote console via IP) instead of providing LAN connectivity.

Some good news: checking prime.log and the prime95 worker windows shows no sign of detected errors in the 17 workers' 58.3M-59.1M LL DC runs, which are 31-35% complete each so far, with a few Jacobi checks each. These should all complete by about month's end.

Last fiddled with by kriesel on 2020-11-19 at 14:26

2020-11-19, 20:56   #73
ewmayer
∂²ω=0

Sep 2002
República de California

2²×3²×17×19 Posts

The install finished successfully last night, and after rebooting from the installation on the SSD we got a login prompt, but in basic terminal mode ... I must've overlooked whatever option is needed to install the windowing system in the initial install screens. So I did 'shutdown -h now', just now plugged the boot USB back in ... and the system won't power up. Note, the front-panel power button on this is mis-connected, if it's connected at all; I have always simply needed to unplug/replug the power cord in the back to turn it off/on, until last night's successful CentOS install at least gave us the 'shutdown -h now' option for the 'off' part. Grrr ... no time right now to poke around in the damn case and wire-trace and whatnot. I was hoping to be building and testing code on this sucker by now ... annoying as hell.

Last fiddled with by ewmayer on 2020-11-19 at 20:57
2020-11-19, 21:48   #74
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2²·3·5²·17 Posts

Quote:
 Originally Posted by ewmayer ... and the system won't power up. Note, the front-panel power button on this is mis-connected, if it's connected at all - have always simply needed to unplug/replug the power cord in the back to turn it off/on, until successful CentOS install last night at least gave us the 'shutdown -h now' option for the 'off' part. Grrr ... no time right now to poke around in the damn case and wiretrace and whatnot. I was hoping to be building and testing code on this sucker by now ... annoying as hell.
Unplug, open the case, and check JF1 by the main motherboard 24-pin power connector. I had the same won't-start issue after a Windows shutdown -s, though -r (restart) was not typically a problem. It certainly created a sinking feeling: after that the box was completely unresponsive, even to power-strip cycling. No POST, no signal at the VGA, most onboard LEDs stayed off, and the radiator fans did not initially spin up to mini-propjet-takeoff sound level as they previously had. After a while I managed to recover mine by temporarily putting a separate power button on JF1. Then reconnect power, push the added button, go into the BIOS settings, and change the Power On behavior from Last State to Always Start, to reduce the chance of a repeat.
It's a catch-22: the system won't start when power is applied because its last state was off, because that's what the user told it through the OS; you can't turn it on because the case power switch is not connected; you can't change the BIOS setting because it won't turn on, because...
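The catch-22 can be written down as a tiny state model. A simplified Python sketch: the policy names in `AcPolicy` are hypothetical stand-ins for the BIOS power-on setting described above (Last State vs Always Start), and remote power-on through the BMC/IPMI is deliberately ignored:

```python
from enum import Enum


class AcPolicy(Enum):
    """Hypothetical names for the BIOS setting that decides what happens
    when AC power is (re)applied."""
    LAST_STATE = "last state"
    ALWAYS_ON = "always on"
    STAY_OFF = "stay off"


def powers_on(policy: AcPolicy, was_on_at_power_loss: bool,
              button_wired: bool, button_pressed: bool = False) -> bool:
    """Simplified model: can the board be brought up after plugging in the cord?

    Ignores BMC/IPMI remote power-on and BIOS hangs.
    """
    if policy is AcPolicy.ALWAYS_ON:
        return True
    if policy is AcPolicy.LAST_STATE and was_on_at_power_loss:
        return True  # pulling the cord while running leaves "last state" = on
    # Otherwise only a front-panel button actually wired to JF1 can start it.
    return button_wired and button_pressed


if __name__ == "__main__":
    # After 'shutdown -h now' the last state is off, and the button isn't wired:
    print(powers_on(AcPolicy.LAST_STATE, was_on_at_power_loss=False,
                    button_wired=False, button_pressed=True))  # False
```

In this model, the unplug/replug routine works only because the cord was pulled while the board was on; an OS shutdown records "off", and then either wiring the button or switching to an always-on policy is needed to break the loop.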

This case has a sort of "secret compartment". Removing one side panel gives a view of the CPU, PCIe slots, top side of the motherboard, etc., plus the 3.5" drive mountings. Removing the opposite side panel shows the 2.5" drive mountings and some cabling. In mine I found the unconnected end of the power switch/reset switch/power LED/drive LED cable in that "secret compartment". So you might be able to activate the front power button by fishing that out and connecting "Power SW" to JF1 pins 1 & 2, if you don't have a spare switch/LED cable assembly in your parts drawer.

This thing is like a sports car (or some partners I've known). Expect some cost/pain as the price of the interesting or fun times.
I suspect these were set up to run as part of a server farm, in rows of warehouse style welded-wire racks, with Unit ID LEDs and remote management enabled, local intervention disabled.

Congrats on getting over the OS install hurdles thus far, and thanks for confirming the hardware obstinacy as shipped. I'm wondering how minimal CentOS -minimal is, at 1.6 GB. Is a GUI included? The other iso is 7.7 GB, too big for a single-layer DVD.

FYI prime95 benchmark results on Win10 can be found at https://www.mersenneforum.org/showth...304#post563304. A large screen or a magnifier or some serious zoom may be useful.

Last fiddled with by kriesel on 2020-11-19 at 22:21

2020-11-20, 20:13   #75
ewmayer
∂²ω=0

Sep 2002
República de California

2D6C₁₆ Posts

@above: Thanks, Ken, you're a lifesaver! (Or at least a major time saver.)

OK, I removed the side panel on the underside of the mobo and found the dangling power+LED connector bundle. The 2-pin power connector has no color coding (all black) and no +/- polarity marking, just a "Power SW" label on one side. I found a downloadable mobo PDF at the Manualslib site I linked to previously in HTML-manual form, and have attached a snip of the mobo section in question. Say I'm looking at the mobo in the same orientation: big 24-pin main power plug at the bottom, smaller 12-pin jumper array above it, 2 power pins at the far right end of the latter. Do I want to hook the 2-pin power-button connector to that pair of pins so that the "Power SW" label faces leftward or rightward?
Attached Files
5038ki_pwrconnects.pdf (215.7 KB)
5038ki_ctrlpanel.pdf (80.5 KB)

Last fiddled with by ewmayer on 2020-11-21 at 03:23

2020-11-20, 21:35   #76
Nick

Dec 2012
The Netherlands

7·239 Posts

Is there no CLR CMOS jumper on the mobo any more these days?
2020-11-20, 21:38   #77
PhilF

Feb 2005

1001101000₂ Posts

Quote:
 Originally Posted by ewmayer Do I want to hook the 2-pin power-button connector to the latter pair of pins so that the "Power SW" label faces leftward or rightward?
For the power switch, polarity doesn't matter: it is a momentary switch that simply shorts the two pins together.

