mersenneforum.org Strange behaviour

2017-07-24, 16:17   #1
ET_
Banned

"Luigi"
Aug 2002
Team Italia

2×41×59 Posts

Strange behaviour

I created a spot instance of type c4.large (2 vCPUs) on eu-west-1 and got an E5-2666 @ 2.9 GHz running 1-threaded llr at 0.373 msec/bit using a zero-padded FMA3 FFT.

I then created another spot instance of type c4.xlarge (4 vCPUs) on us-west-1 and got an E5-2666 @ 2.9 GHz running 2-threaded llr at 0.360 msec/bit using a zero-padded AVX FFT.

The software was identical and was downloaded from the same link. The processor type comes from /proc/cpuinfo. The frequency was 2.9 GHz on both servers. The input data was the same.

Why did I get a slower machine? I wasted 3 hours configuring and tuning everything...

I guess it's karma telling me not to compulsively waste my money on cloud servers.

Last fiddled with by ET_ on 2017-07-24 at 16:20

2017-07-24, 18:48   #2
GP2

Sep 2003

5·11·47 Posts

As you mentioned, c4.large is one core (2 vCPUs = two hyperthreads), and c4.xlarge is two cores (4 vCPUs = four hyperthreads).

The c4 servers are Haswell, so they have FMA and AVX2. You can verify that by looking for "fma" and "avx2" in the output of the command

Code:
grep flags /proc/cpuinfo

The servers all run the same custom chip at the same 2.9 GHz clock frequency. Of course these are virtual servers sharing an 18-core physical server with other AWS users, so maybe they are competing for cache usage. But usually benchmarks are fairly consistent, it's very unlikely that you could get a machine that was somehow slowed down so much.

It's hard to know more without knowing exactly how the program behaves and what flags it was compiled with.

PS, Three hours on a c4.xlarge costs not much more than 10 cents in total... keep trying

Last fiddled with by GP2 on 2017-07-24 at 18:50

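The flag check described above can also be scripted, which makes it easier to compare two instances side by side. The following is a minimal Python sketch, not anything llr or gwnum does internally; the function names `cpu_flags` and `has_fma_and_avx2` are hypothetical, and the sample text is an abbreviated stand-in for a real /proc/cpuinfo.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_fma_and_avx2(cpuinfo_text):
    """True if both 'fma' and 'avx2' appear among the flags."""
    return {"fma", "avx2"} <= cpu_flags(cpuinfo_text)


# Abbreviated sample, for illustration only (not a complete cpuinfo dump).
sample = "model name : Intel(R) Xeon(R) CPU\nflags : fpu sse sse2 fma avx avx2\n"
print(has_fma_and_avx2(sample))
```

On a real Linux instance the text would come from reading /proc/cpuinfo, for example with `open("/proc/cpuinfo").read()`.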
2017-07-24, 19:33   #3
chalsall
If I May

"Chris Halsall"
Sep 2002

2²×2,551 Posts

Quote:
 Originally Posted by GP2 Of course these are virtual servers sharing an 18-core physical server with other AWS users, so maybe they are competing for cache usage. But usually benchmarks are fairly consistent, it's very unlikely that you could get a machine that was somehow slowed down so much.
This is why I tend to rent instances which consume most of the resources of the machine. Then if an instance isn't performing well, I shut it down and spin up another one.

Rinse; repeat. Manage the situation.

2017-07-24, 20:35   #4
ET_
Banned

"Luigi"
Aug 2002
Team Italia

2×41×59 Posts

Quote:
 Originally Posted by GP2 have FMA and AVX2. You can verify that by looking for "fma" and "avx2" in the output of the command grep flags /proc/cpuinfo It's hard to know more without knowing exactly how the program behaves and what flags it was compiled with. PS, Three hours on a c4.xlarge costs not much more than 10 cents in total... keep trying
I will keep trying and follow chalsall's hint.
The issue I was pointing out is not about the timing: both instances (c4.large and c4.xlarge) had the same flags in /proc/cpuinfo (obviously the c4.xlarge had 4 (virtual) cores instead of just 2), and the program (llr, statically linked) was downloaded in both cases from Jean Penné's site, same link. And still, on the c4.large instance the FMA3 code was activated, while on the c4.xlarge instance the AVX code was activated.

I will try a new instance (as chalsall says) and this time I will keep track of every testing log. I just thought it had happened before to someone else.

Will keep you informed.

Luigi

2017-07-24, 20:54   #5
ET_
Banned

"Luigi"
Aug 2002
Team Italia

2×41×59 Posts

The content of /proc/cpuinfo on the c4.xlarge instance:

Code:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 63
model name : Intel(R) Xeon(R) CPU E5-2666 v3 @ 2.90GHz
stepping : 2
microcode : 0x25
cpu MHz : 2900.105
cache size : 25600 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs :
bogomips : 5800.22
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

As you can see, avx2 is present.

Querying the sllr 3.8.10 program (based on gwnum 28.x):

Code:
./sllr -m
Main Menu
1. Test/Input Data
2. Test/Continue
3. Test/Exit
4. Options/CPU
5. Options/Preferences
6. Advanced/Priority
7. Help/About
Your choice: 4
CPU Information:
Intel(R) Xeon(R) CPU E5-2666 v3 @ 2.90GHz
CPU speed: 2900.15 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE, SSE2, AVX, FMA3
L1 cache size: 32 KB
L2 cache size: 256 KB

The program recognizes the processor and its FMA3 instructions. But when I run the program:

Code:
./sllr -d -t2 800k_900k.txt
Resuming probable prime test of 1600122*2^859433-1600121 at bit 152154 [17.70%]
Using zero-padded AVX FFT length 84K, Pass1=448, Pass2=192, 2 threads, a = 3
^C600122*2^859433-1600121, bit: 210000 / 859454 [24.43%]. Time per bit: 0.337 ms.

To make a comparison, the other instance (a c4.large) runs as follows:

Code:
Starting probable prime test of 1497090*2^859433-1497089
Using zero-padded FMA3 FFT length 84K, Pass1=448, Pass2=192, a = 3
1497090*2^859433-1497089, bit: 130000 / 859454 [15.12%]. Time per bit: 0.372 ms.

I also used the same input file on both instances, with the same results.

I terminated the spot instance and will create a new one tomorrow. Let's hope it will work!

Luigi

Last fiddled with by ET_ on 2017-07-24 at 20:57

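To put the two "Time per bit" figures from these logs in perspective, here is a small Python sketch of the arithmetic. It assumes the per-bit time stays constant over the whole test, which a single progress line cannot confirm; the name `total_seconds` is a hypothetical helper, and the bit count is taken from the progress lines above.

```python
TOTAL_BITS = 859454  # from the "bit: ... / 859454" progress lines


def total_seconds(ms_per_bit, bits=TOTAL_BITS):
    """Estimated wall time for a full test at a constant per-bit cost."""
    return ms_per_bit * bits / 1000.0


avx_two_threads = total_seconds(0.337)   # c4.xlarge, AVX FFT, 2 threads
fma3_one_thread = total_seconds(0.372)   # c4.large, FMA3 FFT, 1 thread
speedup = fma3_one_thread / avx_two_threads

print(f"AVX, 2 threads:  {avx_two_threads:.1f} s")
print(f"FMA3, 1 thread:  {fma3_one_thread:.1f} s")
print(f"ratio: {speedup:.2f}x")
```

Under that assumption the two-thread AVX run is only about 10% faster than the one-thread FMA3 run, so the second core on the c4.xlarge is buying very little here.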
2017-07-25, 18:25   #6
ET_
Banned

"Luigi"
Aug 2002
Team Italia

2·41·59 Posts

I keep getting connection reset from Amazon on the Ohio instance, and I can't get rid of the instance itself: when I terminate it, it gets automatically restarted, while when it gets disconnected (broken pipe), it restarts automatically and stays idle.

I hate losing time to analyze this... :-)

Luigi

2017-07-25, 19:08   #7
Mark Rose

"/X\(‘-‘)/X\"
Jan 2013

3×11×89 Posts

Quote:
 Originally Posted by ET_ I keep getting connection reset from Amazon on the Ohio instance, and I can't get rid of the instance itself: when I terminate it, it gets automatically restarted, while when it gets disconnected (broken pipe), it restarts automatically and stays idle. I hate losing time to analyze this... :-) Luigi ---
Check if you have any Auto Scaling Groups or open Spot Requests.

2017-07-25, 21:36   #8
ET_
Banned

"Luigi"
Aug 2002
Team Italia

2·41·59 Posts

Quote:
 Originally Posted by Mark Rose Check if you have any Auto Scaling Groups or open Spot Requests.
I have another spot request in a different region (eu-west-1); I don't think that should matter... and no Auto Scaling group.

2017-07-26, 00:06   #9
chalsall
If I May

"Chris Halsall"
Sep 2002

2²·2,551 Posts

Quote:
 Originally Posted by ET_ I have another spot request on a different zone (eu-west-1), I don't think it should care... and no auto scaling group
Just quickly throwing out some ideas:

1. Are you absolutely sure that you didn't set your spot request in Ohio to be persistent?

2. When you say the connection resets, are you sure the instance terminated? Have you checked the uptime after reconnecting?

3. When you say it restarts when you terminate the instance, have you cancelled the instance request (this would be related to point 1 above)? Although related, instances and instance *requests* are separate records in AWS' knowledge base.

Beyond that, I don't have a clue what to suggest, except possibly contacting AWS' support team. There is a non-zero probability you've encountered a bug.

Personally I have never had a spot instance re-start except when explicitly requested.
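Points 1 and 3 above come down to finding spot requests that are persistent and still live. A minimal Python sketch of that check follows; it assumes the input dictionary has the shape returned by `aws ec2 describe-spot-instance-requests --output json` (or boto3's `describe_spot_instance_requests`), the function name is hypothetical, and the request and instance IDs in the sample are made up.

```python
def persistent_live_requests(response):
    """Return IDs of spot requests that are persistent and still open or active."""
    live_states = {"open", "active"}
    return [
        req["SpotInstanceRequestId"]
        for req in response.get("SpotInstanceRequests", [])
        if req.get("Type") == "persistent" and req.get("State") in live_states
    ]


# Made-up sample data in the shape of the AWS response (IDs are not real).
sample = {
    "SpotInstanceRequests": [
        {"SpotInstanceRequestId": "sir-example1", "Type": "persistent",
         "State": "active", "InstanceId": "i-example1"},
        {"SpotInstanceRequestId": "sir-example2", "Type": "one-time",
         "State": "active", "InstanceId": "i-example2"},
        {"SpotInstanceRequestId": "sir-example3", "Type": "persistent",
         "State": "cancelled"},
    ]
}
print(persistent_live_requests(sample))
```

Any ID this returns is a request that can relaunch an instance after it is terminated. Such a request has to be cancelled (`aws ec2 cancel-spot-instance-requests`) as well as terminating the instance, since cancelling the request and terminating the instance are separate actions.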

2017-07-26, 00:34   #10
GP2

Sep 2003

5·11·47 Posts

Quote:
 Originally Posted by ET_ I keep getting connection reset from Amazon on the Ohio instance, and I can't get rid of the instance itself: when I terminate it, it gets automatically restarted, while when it gets disconnected (broken pipe), it restarts automatically and stays idle.
If you are starting the program in the user-data script, then you can simply do something like:

Code:
./mprime > /dev/null 2>&1 &
(substitute your own program for "mprime")

The program will run outside of any terminal.

If you are starting it from an SSH terminal window, then you should use nohup:

Code:
nohup ./mprime > /dev/null 2>&1 &
The program will then keep running even if your terminal disconnects (unless the disconnection itself was caused by the spot instance terminating, rather than the terminal remaining idle for too long and timing out).

If you are running PuTTY, it is simple to reconnect to your still-running spot instance. Just right-click on the top bar of the PuTTY terminal window and select "Restart terminal" from the drop-down menu. You can then log in again.

2017-07-26, 11:02   #11
ET_
Banned

"Luigi"
Aug 2002
Team Italia

2×41×59 Posts

Quote:
 Originally Posted by chalsall Just quickly throwing out some ideas: 1. Are you absolutely sure that you didn't set your spot request in ohio to be persistent? 2. When you say the connection resets, are you sure the instance terminated? Have you checked the uptime after reconnecting? 3. When you say it restarts when you terminate the instance, have you cancelled the instance request (this would be related to point 1 above)? Although related, instances and instance *requests* are separate records in AWS' knowledge base. Beyond that, I don't have a clue what to suggest, except possibly contacting AWS' support team. Non-zero probably you've encountered a bug. Personally I have never had a spot instance re-start except when explicitly requested.
persistent = maintain. Got it, thanks chalsall.

Now, let's see how long it runs this time.


All times are UTC.
