#45
"/X\(‘-‘)/X\"
Jan 2013
2²·733 Posts
There are five total zones in us-east-1. Our primary account has access to all five since it was made long ago. Our other accounts don't get access to all the zones. I could probably pester them but we haven't had a pressing need.
#46
Sep 2003
5×11×47 Posts
Has anyone ever tried out Google Cloud or Microsoft Azure?
I took a brief look at the documentation for Google Compute Engine: it says spot instances ("preemptible instances") are automatically "terminated" after they run for 24 hours. However, Google's definition of a "terminated" instance is really a "stopped" instance; it can be restarted (unless it has a local SSD device attached). So I guess you could just keep restarting it every day... Last fiddled with by GP2 on 2016-05-27 at 04:52
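For what it's worth, the daily restart could be automated from any always-on machine with the gcloud SDK. This is only a sketch of the idea, not a tested setup; the instance name and zone below are made-up placeholders:

```shell
# Crontab entry on a machine authorized with the gcloud SDK.
# "mprime-worker" and "us-central1-a" are hypothetical placeholders.
# Runs hourly as a catch-all, since the 24-hour stop can happen at any
# time of day; starting an already-running instance is harmless.
5 * * * * gcloud compute instances start mprime-worker --zone=us-central1-a --quiet
```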
#47
"Patrik Johansson"
Aug 2002
Uppsala, Sweden
5²·17 Posts
I just saw this discussion and wanted to tell you that I made a script last year that successfully attached an EBS volume to a persistent spot instance request. (Persistent means that if the instance has terminated due to the spot price going above the maximum price you have given, it automatically launches again when the price goes below that level again.)
I have prepared an EBS volume with an mprime executable configured with my PrimeNet user-id and preferences. I made a modified AMI with the mount point /mnt/xvdf created and the following lines added to /etc/rc.local (which on this Linux system is the file that executes during boot):

Code:

# Attach volume with mprime and run
/home/ec2-user/start_mprime.sh

The script /home/ec2-user/start_mprime.sh:

Code:

#! /bin/sh
#
# Attach volume with partial result and continue work with mprime
#

# Attach volume to only instance (script only works with one)
/home/ec2-user/attach_vol.sh

# Wait for device to appear
while [ ! -e /dev/xvdf ]
do
    sleep 1
done

# Sleep 10 extra seconds just to be safe
sleep 10

# Mount disk
mount /dev/xvdf /mnt/xvdf

# Wait in case the mount command is not blocking (didn't check)
while [ ! -e /mnt/xvdf/mprime ]
do
    sleep 1
done

# Start mprime
nohup /mnt/xvdf/mprime/mprime -d >> /mnt/xvdf/mprime/log.txt &

The script /home/ec2-user/attach_vol.sh:

Code:

aws ec2 attach-volume --volume-id vol-[my-id-masked] --instance-id `/home/ec2-user/instance_id.sh` --device /dev/xvdf

And /home/ec2-user/instance_id.sh:

Code:

#! /bin/sh
aws ec2 describe-instances --filters Name=instance-state-name,Values=running | grep "InstanceId" | awk '{print $2}' | sed -e s/\"//g -e s/,//
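For anyone wanting to set up the persistent request itself, it can also be made from the CLI. This is only a sketch: the price and the launch-specification file are placeholders you would fill in with your own AMI, key pair and instance type:

```shell
# Sketch of a persistent spot request (placeholder price and spec file).
# With --type persistent, EC2 re-launches an instance whenever the spot
# price falls back below your maximum.
aws ec2 request-spot-instances \
    --spot-price "0.10" \
    --type "persistent" \
    --instance-count 1 \
    --launch-specification file://my-spec.json
```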
Btw, welcome back, GP2! Last fiddled with by patrik on 2016-07-06 at 10:16 Reason: Welcome GP2
#48
Sep 2003
2585₁₀ Posts
Quote:
If there is enough interest, perhaps a Cloud Computing sub-forum could be created.

Are you currently still using Amazon? There is a better solution available now: in just the past few days, Amazon began general availability of EFS (Elastic File System), a variation of the standard NFS networked file system. It's currently only available in us-east-1 (N. Virginia), us-west-2 (Oregon) and eu-west-1 (Ireland), but presumably they will soon deploy it more widely.

With EFS you no longer need to allocate a separate do-not-delete-on-termination EBS volume with a fixed 1 GB allocation. Rather, EFS just grows automatically as necessary, and it lets you store the work directories (with the worktodo.txt and save files) of all the instances as sibling subdirectories. The mprime executable and configuration files (prime.txt and local.txt) also live on the EFS filesystem. All the availability zones within a single region can share the same EFS filesystem.

I wrote a user-data (startup) script that lets newly-launched instances automatically locate orphaned subdirectories (of instances that were terminated for whatever reason) and resume the work. I can share it if anyone's interested. It uses the User Data field that is filled in when the instance is configured and launched, so you can just use the standard Amazon AMI rather than creating one of your own just to modify the operating system startup scripts. This allows for any number of simultaneous instances, and multiple instance types (c4.large, c4.xlarge, etc.).

Quote:
I do use describe-instances in my user-data startup script, to discover all the other running instances, so I can figure out which EFS subdirectories are orphaned. Each subdirectory has a name which matches the instance id that created it, so if that instance is no longer running then the subdirectory is orphaned and a newly-launched instance can take over its worktodo and save files. Note that your IAM role has to grant permission to run describe-instances. Quote:
Thanks. By the way, there was an amusing case where you did both the original first-time LL test and the double check, eight years apart: M35062633. I think the exponent was randomly assigned both times. I've seen this happen with Curtis C. as well.
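For reference, since EFS is just NFSv4.1 under the hood, the user-data script only needs an ordinary mount before any of the above. The filesystem id and DNS name below are placeholders, not my actual setup; at EFS launch the mount targets were addressed by availability-zone-prefixed DNS names:

```shell
# Placeholder EFS filesystem id/DNS name; substitute your own.
mkdir -p /mnt/efs
mount -t nfs4 -o nfsvers=4.1 \
    us-east-1a.fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs
```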
#49
Sep 2003
5031₈ Posts
#50
If I May
"Chris Halsall"
Sep 2002
Barbados
2602₁₆ Posts
Just wondering... How do you avoid race conditions in the case where two or more instances come back online at the same instant? File-system lock files? Or do you manually launch instances so this isn't an issue?
#51
Sep 2003
5×11×47 Posts
Quote:
I'm no expert on handling race conditions, so any critiques would be welcome.

Each subdirectory has the same name as the instance id that created it. When a new instance launches, it first creates a temp file that contains a list (one per line) of all orphaned subdirectories, i.e., subdirectories with names starting with "i-*" that don't correspond to any running instances. If orphaned subdirectories exist, it renames one of them to its own instance id, and then it will take over and resume any pending worktodo and save files.

Here's how I try to handle race conditions: the script reads the list of orphaned subdirectories one line at a time. For each line, it attempts to rename the orphaned subdirectory to its own instance id, and then checks for the existence of the renamed subdirectory. Normally it will succeed on the first line and break out of the loop, but in a race another instance might have renamed that subdirectory in the meantime, in which case the rename will have failed. It then continues looping to the next line and tries that one, and so forth. If it reaches end of file without finding any suitable orphaned subdirectory to rename (or perhaps the list was empty to begin with), then it simply creates a new subdirectory named after its own instance id.

Note: the IAM role that your instances run under must include a policy that allows describe-instances.

Code:
# Derive the region from this instance's availability zone (strip the trailing letter)
availability_zone=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
region=$(echo -n "${availability_zone}" | sed 's/[a-z]$//')

# List all existing work subdirectories (named after instance ids)
all_subdirs_tmpfile=$(mktemp)
ls -d -1 i-* 2>/dev/null > "${all_subdirs_tmpfile}"

# List all currently running instances in this region.
# Make sure to filter out recently terminated instances, which otherwise remain for up to one hour.
all_instances_tmpfile=$(mktemp)
aws ec2 describe-instances --region=${region} --output=text --query 'Reservations[*].Instances[*].InstanceId' --filters "Name=instance-state-name,Values=running" | sed 's/\t/\n/g' | sort > "${all_instances_tmpfile}"

# Orphans = subdirectories with no matching running instance.
# comm needs both inputs sorted; ls output already is.
orphaned_subdirs_tmpfile=$(mktemp)
comm -2 -3 "${all_subdirs_tmpfile}" "${all_instances_tmpfile}" > "${orphaned_subdirs_tmpfile}"

instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

while read -r line; do
    # Attempt to claim this orphan; the rename is atomic, so at most one
    # instance succeeds. Ignore the error if another instance got there first.
    mv "${line}" "${instance_id}" 2>/dev/null
    if [ -d "${instance_id}" ]; then
        break
    fi
done < "${orphaned_subdirs_tmpfile}"

# No orphan claimed (or none existed): start a fresh work directory
if [ ! -d "${instance_id}" ]; then
    mkdir "${instance_id}"
fi
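The atomic-rename claim can be checked without AWS at all. Here is a self-contained simulation with made-up directory names: two "claimants" try to rename the same orphaned directory, and exactly one succeeds, because mv within a single filesystem is an atomic rename(2):

```shell
#!/bin/sh
# Simulate two instances claiming the same orphaned work directory.
workdir=$(mktemp -d)
cd "${workdir}"
mkdir i-orphan                        # one orphaned subdirectory
# Both claimants try to rename the same orphan to their own instance id;
# whichever mv runs second fails because the source no longer exists.
mv i-orphan i-claimant-a 2>/dev/null
mv i-orphan i-claimant-b 2>/dev/null
# Count how many claims succeeded: it must be exactly one.
winners=0
if [ -d i-claimant-a ]; then winners=$((winners + 1)); fi
if [ -d i-claimant-b ]; then winners=$((winners + 1)); fi
echo "successful claims: ${winners}"
```

Running the two mv commands truly concurrently (e.g. from two machines against the same EFS mount) gives the same outcome, a single winner, which is what the claiming loop relies on.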
#52
If I May
"Chris Halsall"
Sep 2002
Barbados
2×5×7×139 Posts
#53
Serpentine Vermin Jar
Jul 2014
CEF₁₆ Posts
One other technique that might be helpful is to add a randomized delay before starting the process, just in case two systems started up at the exact same time. This is somewhat common in recovery situations where race conditions (or power surges if we're talking about hardware, or bandwidth surges when things are scheduled at the same time) are a possibility.
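A sketch of that randomized delay, using /dev/urandom so it works in plain /bin/sh (bash's $RANDOM would do too); the 30-second window is an arbitrary choice:

```shell
#!/bin/sh
# Draw a random delay in [0, 30) seconds before contending for shared state.
# od reads two bytes from /dev/urandom and prints them as an unsigned integer.
jitter=$(( $(od -An -N2 -tu2 /dev/urandom) % 30 ))
echo "startup jitter: ${jitter}s"
# sleep "${jitter}"   # the actual delay; commented out so the sketch runs instantly
```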
#54
If I May
"Chris Halsall"
Sep 2002
Barbados
2602₁₆ Posts
This doesn't actually eliminate the race condition, though; it just makes it less likely. What GP2 has implemented should be deterministically sane (again, assuming renames are atomic; since EFS is based on NFSv4.1, I believe they are).
#55
Sep 2003
5·11·47 Posts
Quote:
So if the virtual machine is ever rebooted for any reason, none of the user-data startup script gets re-run, including the part that starts mprime. The virtual machine will just sit there doing nothing, but of course the instance is still running and billable.

Normally this isn't a problem, since you would rarely need to reboot your virtual machine, and if you ever did you could just manually re-run the user-data script commands. However, one of my instances spontaneously rebooted today for an unknown reason, and I didn't discover that until ten hours later. It might have been hardware related, because when I clicked on it in the EC2 console it displayed "Retiring: This instance is scheduled for retirement after [date about two weeks from now]". So I just terminated it, and the spot fleet request automatically launched a new instance.

The moral of the story is that you have to monitor: just because your instance is up and running doesn't mean that mprime is up and running. With EFS it's easy to periodically do an "ls -lt" command to check the last-modified dates of the sibling subdirectories that contain worktodo and save files: any subdirectory older than half an hour (i.e. the DiskWriteTime interval) means that mprime isn't writing save files, so it probably isn't running.

For traditional setups that don't use EFS (which is currently only available in the us-east-1, us-west-2 and eu-west-1 regions), I'm not sure what the best way would be to ensure that mprime is still running. Maybe screen-scrape http://www.mersenne.org/workload/ and check that the percentage-complete statistic keeps incrementing daily.

Or you could theoretically go to http://www.mersenne.org/cpus/, click on each registered computer, and set the Email option to send an email if the computer is late contacting the PrimeNet server. But that's completely impractical if you're dealing with ephemeral virtual machines rather than physical hardware, especially spot instances that can get terminated at any time; you'd also get bogus e-mails when terminated instances stop talking to PrimeNet.
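The "ls -lt" check can be automated with find's -mmin test. Below is a self-contained sketch using made-up directory names, with GNU touch faking an old timestamp so it runs anywhere (on the real EFS mount you would just point find at the mount point):

```shell
#!/bin/sh
# Flag work directories whose save files haven't been touched for 30+
# minutes (the DiskWriteTime interval), i.e. mprime probably died there.
efs_root=$(mktemp -d)              # stands in for the EFS mount point
mkdir "${efs_root}/i-fresh" "${efs_root}/i-stale"
touch "${efs_root}/i-fresh"
touch -d '2 hours ago' "${efs_root}/i-stale"   # GNU touch; BSD syntax differs
# Any immediate subdirectory not modified for over 30 minutes is suspect:
stale=$(find "${efs_root}" -mindepth 1 -maxdepth 1 -type d -mmin +30)
echo "stale work dirs: ${stale}"
```

In a real deployment this would run from cron and mail or log the stale list instead of echoing it.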