[QUOTE=James Heinrich;478016][quote]In order to keep our services ... reliable ... Your server may be unavailable for 6 to 8 hours while the maintenance is taking place. [/quote][/QUOTE]Haha, like Alice in Wonderland. To keep it reliable we have to make it unreliable. :loco:
[QUOTE=retina;478019]Haha, like Alice in Wonderland. To keep it reliable we have to make it unreliable. :loco:[/QUOTE]
Reliable != available. Reliable is much closer in meaning to "predictable" in this case, and this is far more predictable than not performing the maintenance.
Mersenne work distribution has about 5,000 assignments in the 4xM ranges.
GPU72 has about 9,500.
[QUOTE=chalsall;473139]So, one of the drives on the GPU72 server is failing.
It is scheduled to be replaced in the next hour or so. It's "hot-swappable", and one of the drives in a RAID1 set, so there should be no downtime.[/QUOTE]And now mersenne.ca has the same problem.
[code]A Fail event had been detected on md device /dev/md3.
It could be related to component device /dev/sdb3.

A Fail event had been detected on md device /dev/md1.
It could be related to component device /dev/sdb1.[/code]
Likewise, the drive is supposed to be hot-swapped within the next few hours. No surprise, the drives in my server are also Seagate. And yet a quick SMART check passed on both drives :unsure:
[code]=== START OF INFORMATION SECTION ===
Model Family:     Seagate Constellation CS
Device Model:     ST1000NC000-1CX162
Serial Number:    Z1D7MCPW
LU WWN Device Id: 5 000c50 05d012139
Firmware Version: CE02
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Fri Apr 13 14:20:25 2018 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Constellation CS
Device Model:     ST1000NC000-1CX162
Serial Number:    Z1DAZ0ZY
LU WWN Device Id: 5 000c50 0669d8a73
Firmware Version: CE02
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Fri Apr 13 14:19:28 2018 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED[/code]
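A passing overall-health result only summarizes basic attribute thresholds; it doesn't mean a long self-test has been run recently. A hedged sketch of what one might run to dig deeper, assuming smartmontools and mdadm are installed (the device names /dev/sdb and /dev/md1 follow the mdadm mail quoted above; adjust for your layout):

```shell
# Kick off the long (extended) self-test in the background;
# this can take an hour or more on a 1 TB drive.
smartctl -t long /dev/sdb

# Later: review the self-test log and the attributes that most often
# precede failure (Reallocated_Sector_Ct, Current_Pending_Sector,
# Offline_Uncorrectable).
smartctl -l selftest -A /dev/sdb

# See which mirror members the kernel currently considers failed.
cat /proc/mdstat
mdadm --detail /dev/md1
```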
[QUOTE=James Heinrich;485248]And now mersenne.ca has the same problem. [code]A Fail event had been detected on md device /dev/md3.
It could be related to component device /dev/sdb3.

A Fail event had been detected on md device /dev/md1.
It could be related to component device /dev/sdb1.[/code][/QUOTE]
VERY sorry to hear that. The good news is that both failures were reported from the same physical drive in the pair (/dev/sdb), so you hopefully won't need to do a full restore from backups.

[QUOTE=James Heinrich;485248]Likewise, the drive is supposed to be hot-swapped within the next few hours. No surprise, the drives in my server are also Seagate. And yet a quick SMART check passed on both drives :unsure:[/QUOTE]
You will still have to reboot your server to bring the hot-swapped drive back into the "correct" logical location. After the hot-swap the new drive will likely appear as /dev/sdc; you can then work with it (setting up the partition table, etc.).

One last thing... Don't trust the SMART short test; always periodically run the long one.

Oh, also... My previously documented experience taught me that minimizing CPU usage and HD writes helped a great deal with the RAID rebuild time.
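The replacement procedure described above can be sketched roughly as follows. This is only an illustration: /dev/sdb, /dev/sdc and the md device names follow the posts above, /dev/sda is assumed to be the surviving mirror member, and the partition-table copy assumes MBR disks (use sgdisk for GPT):

```shell
# Mark the failing members faulty and remove them from the arrays
# (mdadm may have already flagged them faulty on its own).
mdadm /dev/md1 --fail /dev/sdb1 --remove /dev/sdb1
mdadm /dev/md3 --fail /dev/sdb3 --remove /dev/sdb3

# After the hot-swap, the new drive may appear as /dev/sdc.
# Clone the partition table from the surviving drive (assumed /dev/sda).
sfdisk -d /dev/sda | sfdisk /dev/sdc

# Re-add the new partitions and watch the rebuild progress.
mdadm /dev/md1 --add /dev/sdc1
mdadm /dev/md3 --add /dev/sdc3
watch cat /proc/mdstat
```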
Just received this while accessing GPU72.
[B][SIZE=2]FYI only Chris. It resolved within a minute.[/SIZE][/B]

[B]Error 1001 Ray ID: 410b6b51e2235534 • 2018-04-24 20:46:25 UTC[/B]
[B]DNS resolution error[/B]

[B]What happened?[/B]
You've requested a page on a website ([url]www.gpu72.com[/url]) that is on the [URL="https://www.cloudflare.com/5xx-error-landing?utm_source=error_100x"]Cloudflare[/URL] network. Cloudflare is currently unable to resolve your requested domain ([url]www.gpu72.com[/url]). There are two potential causes of this:
[LIST][*][B]Most likely:[/B] if the owner just signed up for Cloudflare it can take a few minutes for the website's information to be distributed to our global network.[*][B]Less likely:[/B] something is wrong with this site's configuration. Usually this happens when accounts have been signed up with a partner organization (e.g., a hosting provider) and the provider's DNS fails.[/LIST]
[QUOTE=kladner;486125]You've requested a page on a website ([url]www.gpu72.com[/url]) that is on the [URL="https://www.cloudflare.com/5xx-error-landing?utm_source=error_100x"]Cloudflare[/URL] network.[/QUOTE]
Hmmm... Something's odd there. GPU72.com is NOT on Cloudflare; DNS should resolve to 74.208.74.21, which is a 1&1 dedicated server. Possibly someone is trying a DNS poisoning attack (although I can't imagine why). Is anyone else seeing this? And please ensure your browser doesn't give any SSL cert warnings when you try to log in.
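One way to check whether your local resolver (rather than the authoritative record) is at fault is to compare answers from several resolvers and inspect the served certificate. A sketch, assuming dig and openssl are available; 74.208.74.21 is the expected answer per the post above:

```shell
# Ask the default resolver, then two well-known public resolvers.
dig +short www.gpu72.com
dig +short www.gpu72.com @8.8.8.8
dig +short www.gpu72.com @1.1.1.1

# Inspect the subject and issuer of the certificate actually served.
echo | openssl s_client -connect www.gpu72.com:443 \
    -servername www.gpu72.com 2>/dev/null \
  | openssl x509 -noout -subject -issuer
```

If the public resolvers agree on 74.208.74.21 but your default resolver returns something else, the problem is local or on the path to your resolver.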
There was a BGP attack against Route53 this morning.
[url]https://www.theregister.co.uk/2018/04/24/myetherwallet_dns_hijack/[/url]

It's possible related shenanigans are afoot.
Are we short on workers doing trial factoring at the LL wavefront?
I've noticed that a lot of exponents are being assigned for P-1 just days after being factored to 76 bits. I also haven't seen that many TF assignments at the LL wavefront turned in lately.
[QUOTE=ixfd64;487911]Are we short on workers doing trial factoring at the LL wavefront?[/QUOTE]
There are 50,000 LL assignments available in the 80-90M range that have been TF'ed to 75 or 76 bits, so no need to worry :smile:.
[QUOTE=ixfd64;487911]Are we short on workers doing trial factoring at the LL wavefront? I've noticed that a lot of exponents are being assigned for P-1 just days after being factored to 76 bits. I also haven't seen that many TF assignments at the LL wavefront turned in lately.[/QUOTE]
Sorry for the latency in the reply; I've been busy doing a CFD analysis. Yes, we are a bit low on LLTF resources at the moment. One of our biggest contributors has reduced his throughput by about 70% because he's had to reallocate some of his resources. We should still be OK in that we're far ahead of the LL'ing wavefront, but we might have to start giving the P-1'ers work "only" TF'ed to 75 bits, and then take it up to 76 later, before letting the LL'ers at it. But if anyone has any resources they could allocate to LLTF'ing (preferably to at least 75 bits), it would be appreciated.
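To illustrate why that last bit level is the expensive one: each TF bit level roughly doubles the number of factor candidates to test, so taking an exponent from 75 to 76 bits costs about as much as all previous levels combined. A toy arithmetic sketch (scaled-down units so the numbers stay small; level 16 here stands in for the 76-bit level):

```shell
# Relative cost of one TF bit level b, in arbitrary units:
# the candidate count roughly doubles per level, so cost(b) ~ 2^b.
cost() { echo $(( 1 << $1 )); }

# Sum the cost of all levels below 16.
total=0
for b in $(seq 1 15); do
  total=$(( total + $(cost $b) ))
done

echo "level 16 cost:          $(cost 16)"   # 65536
echo "levels 1..15 combined:  $total"       # 65534
```

So one extra bit level is (almost exactly) as much work as everything that came before it, which is why deferring the 75-to-76 step is a meaningful resource decision.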