mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > Cloud Computing

2017-07-24, 16:17   #1
ET_
Banned
"Luigi"
Aug 2002
Team Italia

3×1,601 Posts
Strange behaviour

I created a spot instance of type c4.large (2 vCPUs) in eu-west-1 and got an E5-2666 @ 2.9 GHz running single-threaded llr at 0.373 msec/bit using the zero-padded FMA3 FFT.

I then created another spot instance of type c4.xlarge (4 vCPUs) in us-west-1 and got an E5-2666 @ 2.9 GHz running two-threaded llr at 0.360 msec/bit using the zero-padded AVX FFT.

The software was identical, and downloaded from the same link.
The processor type comes from /proc/cpuinfo.
The frequency was 2.9 GHz on both servers.
The input data was the same.

Why did I get a slower machine? I wasted 3 hours configuring and tuning everything...
I guess it's the karma telling me not to compulsively waste my money on cloud servers.

Last fiddled with by ET_ on 2017-07-24 at 16:20
2017-07-24, 18:48   #2
GP2
Sep 2003

3²·7·41 Posts

As you mentioned, c4.large is one core (2 vCPUs = two hyperthreads), and c4.xlarge is two cores (4 vCPUs = four hyperthreads).

The c4 servers are Haswell, so they have FMA and AVX2. You can verify that by looking for "fma" and "avx2" in the output of the command "grep flags /proc/cpuinfo".
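A minimal POSIX shell sketch of that check, which also reports physical cores versus hardware threads from the "cpu cores" and "siblings" fields of /proc/cpuinfo (on the c4.xlarge dump posted later in this thread those are 2 and 4). The function name cpu_summary is hypothetical; it reads /proc/cpuinfo by default, or a saved copy passed as its first argument.

```shell
# Hypothetical helper: summarize cores/threads and the fma, avx, avx2 flags.
# Usage: cpu_summary [file]   (defaults to /proc/cpuinfo)
cpu_summary() {
    f=${1:-/proc/cpuinfo}
    # Take the values from the first processor block only ("exit" after the first match).
    cores=$(awk -F': *' '/^cpu cores/ {print $2; exit}' "$f")
    threads=$(awk -F': *' '/^siblings/ {print $2; exit}' "$f")
    echo "cores=$cores threads=$threads"
    for flag in fma avx avx2; do
        # grep -w matches whole words only, so "avx" does not match inside "avx2".
        if grep '^flags' "$f" | grep -qw "$flag"; then
            echo "$flag=yes"
        else
            echo "$flag=no"
        fi
    done
}
```

On a c4.xlarge it should print cores=2 threads=4 followed by yes for all three flags. Note that gwnum picks its FFT code path from CPUID itself, so this only confirms what the kernel exposes, not what llr will choose.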

The servers all run the same custom chip at the same 2.9 GHz clock frequency.

Of course these are virtual servers sharing an 18-core physical server with other AWS users, so maybe they are competing for cache. But benchmarks are usually fairly consistent; it's very unlikely that you would get a machine that was somehow slowed down that much.

It's hard to know more without knowing exactly how the program behaves and what flags it was compiled with.

PS,
Three hours on a c4.xlarge costs not much more than 10 cents in total... keep trying

Last fiddled with by GP2 on 2017-07-24 at 18:50
2017-07-24, 19:33   #3
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados

19·499 Posts

Quote:
Originally Posted by GP2
Of course these are virtual servers sharing an 18-core physical server with other AWS users, so maybe they are competing for cache usage. But usually benchmarks are fairly consistent, it's very unlikely that you could get a machine that was somehow slowed down so much.

This is why I tend to rent instances that consume most of the resources of the machine. Then if an instance isn't performing well, I shut it down and spin up another one.

Rinse; repeat. Manage the situation.
2017-07-24, 20:35   #4
ET_
Banned
"Luigi"
Aug 2002
Team Italia

3·1,601 Posts

Quote:
Originally Posted by GP2
have FMA and AVX2. You can verify that by looking for "fma" and "avx2" in the output of the command grep flags /proc/cpuinfo

It's hard to know more without knowing exactly how the program behaves and what flags it was compiled with.

PS,
Three hours on a c4.xlarge costs not much more than 10 cents in total... keep trying

I will keep trying and follow chalsall's hint.
The issue I was pointing out is not about the timing: both instances (c4.large and c4.xlarge) showed the same flags in /proc/cpuinfo (obviously the c4.xlarge had 4 virtual cores instead of just 2), and the program (statically linked llr) was downloaded in both cases from Jean Penné's site, using the same link. And still, on the c4.large instance the FMA3 code was activated, while on the c4.xlarge instance the AVX code was activated.

I will try a new instance (as chalsall says) and this time I will keep track of every testing log. I just thought it had happened before to someone else.

Will keep you informed.

Luigi
2017-07-24, 20:54   #5
ET_
Banned
"Luigi"
Aug 2002
Team Italia

12C3₁₆ Posts

The content of /proc/cpuinfo on the c4.xlarge instance:

Code:
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 63
model name      : Intel(R) Xeon(R) CPU E5-2666 v3 @ 2.90GHz
stepping        : 2
microcode       : 0x25
cpu MHz         : 2900.105
cache size      : 25600 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs            :
bogomips        : 5800.22
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

As you can see, avx2 is present.

Querying the sllr 3.8.10 program (based on gwnum 28.x):

Code:
./sllr -m
	     Main Menu

	 1.  Test/Input Data
	 2.  Test/Continue
	 3.  Test/Exit

	 4.  Options/CPU
	 5.  Options/Preferences
	 6.  Advanced/Priority

	 7.  Help/About
Your choice: 4

CPU Information:
Intel(R) Xeon(R) CPU E5-2666 v3 @ 2.90GHz
CPU speed: 2900.15 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE, SSE2, AVX, FMA3
L1 cache size: 32 KB
L2 cache size: 256 KB

The program recognizes the processor and its FMA3 instructions.

But when I run the program:

Code:
./sllr -d -t2 800k_900k.txt 
Resuming probable prime test of 1600122*2^859433-1600121 at bit 152154 [17.70%]
Using zero-padded AVX FFT length 84K, Pass1=448, Pass2=192, 2 threads, a = 3
^C600122*2^859433-1600121, bit: 210000 / 859454 [24.43%].  Time per bit: 0.337 ms.

For comparison, the other instance (a c4.large) runs as follows:

Code:
Starting probable prime test of 1497090*2^859433-1497089
Using zero-padded FMA3 FFT length 84K, Pass1=448, Pass2=192, a = 3
1497090*2^859433-1497089, bit: 130000 / 859454 [15.12%].  Time per bit: 0.372 ms.

I also used the same input file on both instances, with the same results.
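To keep track of each run's results, a small POSIX shell/awk sketch can pull the FFT code path and the latest time per bit out of a saved llr log, and multiply by the total bit count for a rough full-test estimate. It assumes the log lines look exactly like the sllr 3.8.10 output quoted in this post; the function name llr_summary is hypothetical.

```shell
# Hypothetical helper: summarize an llr log (file argument, or stdin).
# Prints the FFT code path, the last reported time per bit, and an estimated
# full-test time = total bits * ms per bit / 1000.
llr_summary() {
    awk '
        /^Using .* FFT length/ {
            # The word just before "FFT" names the code path (FMA3, AVX, ...).
            for (i = 2; i <= NF; i++) if ($i == "FFT") fft = $(i - 1)
        }
        /Time per bit:/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "/") bits = $(i + 1)   # total bits follow the "/"
                if ($i == "bit:" && $(i - 1) == "per") ms = $(i + 1)
            }
        }
        END {
            printf "fft=%s ms_per_bit=%s est_total_s=%.1f\n", fft, ms, bits * ms / 1000
        }
    ' "${1:--}"
}
```

Fed the two runs above, it reports AVX at 0.337 ms/bit (about 290 s per 859454-bit test) and FMA3 at 0.372 ms/bit (about 320 s), so a full test takes roughly five minutes on either instance.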

I terminated the spot instance and will create a new one tomorrow. Let's hope it works!

Luigi

Last fiddled with by ET_ on 2017-07-24 at 20:57
2017-07-25, 18:25   #6
ET_
Banned
"Luigi"
Aug 2002
Team Italia

3×1,601 Posts

I keep getting connection resets from Amazon on the Ohio instance, and I can't get rid of the instance itself: when I terminate it, it gets restarted automatically, and when it gets disconnected (broken pipe), it restarts automatically and stays idle.

I hate losing time to analyze this... :-)

Luigi
---
2017-07-25, 19:08   #7
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013

29×101 Posts

Quote:
Originally Posted by ET_
I keep getting connection reset from Amazon on the Ohio instance, and I can't get rid of the instance itself: when I terminate it, it gets automatically restarted, while when it gets disconnected (broken pipe), it restarts automatically and stays idle.

I hate losing time to analyze this... :-)

Luigi
---

Check if you have any Auto Scaling Groups or open Spot Requests.
2017-07-25, 21:36   #8
ET_
Banned
"Luigi"
Aug 2002
Team Italia

3·1,601 Posts

Quote:
Originally Posted by Mark Rose
Check if you have any Auto Scaling Groups or open Spot Requests.

I have another spot request in a different region (eu-west-1); I don't think it should matter... and no Auto Scaling group.
2017-07-26, 00:06   #9
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados

2509₁₆ Posts

Quote:
Originally Posted by ET_
I have another spot request on a different zone (eu-west-1), I don't think it should care... and no auto scaling group

Just quickly throwing out some ideas:

1. Are you absolutely sure that you didn't set your spot request in Ohio to be persistent?

2. When you say the connection resets, are you sure the instance terminated? Have you checked the uptime after reconnecting?

3. When you say it restarts when you terminate the instance, have you cancelled the instance request (this would be related to point 1 above)? Although related, instances and instance *requests* are separate records in AWS' knowledge base.

Beyond that, I don't have a clue what to suggest, except possibly contacting AWS' support team. There is a non-zero probability that you've encountered a bug.

Personally, I have never had a spot instance restart except when explicitly requested.
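Points 1 and 3 can be scripted with the AWS CLI: list the open or active spot requests, cancel the persistent ones, and only then terminate the instances they launched (cancelling a spot request does not by itself terminate its running instance, and terminating the instance alone lets a persistent request launch a replacement). This is a sketch assuming the AWS CLI is installed and configured; the function name stop_persistent_spot is hypothetical, and the Ohio region is us-east-2.

```shell
# Hypothetical helper: cancel persistent spot requests in a region, then
# terminate the instances they launched, so nothing gets relaunched.
# Usage: stop_persistent_spot REGION
stop_persistent_spot() {
    region=$1
    aws ec2 describe-spot-instance-requests --region "$region" \
        --filters Name=state,Values=open,active \
        --query 'SpotInstanceRequests[].[SpotInstanceRequestId,Type,InstanceId]' \
        --output text |
    while read -r req type inst; do
        # Leave one-time requests alone; only persistent ones relaunch instances.
        [ "$type" = persistent ] || continue
        # Cancel the request first, otherwise a terminated instance is replaced.
        aws ec2 cancel-spot-instance-requests --region "$region" \
            --spot-instance-request-ids "$req"
        # An open request may not have an instance yet (text output shows None).
        if [ -n "$inst" ] && [ "$inst" != None ]; then
            aws ec2 terminate-instances --region "$region" --instance-ids "$inst"
        fi
    done
}
```

For the Ohio instance this would be run as "stop_persistent_spot us-east-2". Checking the request type in the loop, rather than in the filter, keeps the listing visible if you want to echo it first.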
2017-07-26, 00:34   #10
GP2
Sep 2003

3²×7×41 Posts

Quote:
Originally Posted by ET_
I keep getting connection reset from Amazon on the Ohio instance, and I can't get rid of the instance itself: when I terminate it, it gets automatically restarted, while when it gets disconnected (broken pipe), it restarts automatically and stays idle.

If you are starting the program in the user-data script, then you can simply do something like:

Code:
./mprime > /dev/null 2>&1 &

(substitute your own program for "mprime")

The program will run outside of any terminal.


If you are starting it from an SSH terminal window, then you should use nohup:

Code:
nohup ./mprime > /dev/null 2>&1 &

The program will then keep running even if your terminal disconnects (unless the disconnection was itself caused by the spot instance terminating, rather than by the terminal sitting idle for too long and timing out).
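A variant of the nohup line that keeps the output in a log file instead of /dev/null (handy for the per-run logs ET_ mentioned) and records the PID, so that after reconnecting you can check whether the job survived. This is only a sketch: start_detached and is_running are hypothetical names, and "kill -0" merely tests whether the process exists without actually signalling it.

```shell
# Hypothetical helpers around nohup.
# start_detached LOG PIDFILE COMMAND [ARGS...]
start_detached() {
    log=$1
    pidfile=$2
    shift 2
    # nohup makes the job ignore SIGHUP, so it survives the SSH session closing.
    nohup "$@" > "$log" 2>&1 &
    echo $! > "$pidfile"
}

# is_running PIDFILE: succeeds while the recorded process still exists.
is_running() {
    kill -0 "$(cat "$1")" 2>/dev/null
}
```

For example, "start_detached llr.log llr.pid ./sllr -d -t2 800k_900k.txt", and after logging back in, "is_running llr.pid && tail -n 2 llr.log".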

If you are running PuTTY, it is simple to reconnect to your still-running spot instance. Just right-click on the top bar of the PuTTY terminal window and select "Restart terminal" from the drop-down menu. You can then log in again.
2017-07-26, 11:02   #11
ET_
Banned
"Luigi"
Aug 2002
Team Italia

3×1,601 Posts

Quote:
Originally Posted by chalsall
Just quickly throwing out some ideas:

1. Are you absolutely sure that you didn't set your spot request in ohio to be persistent?

2. When you say the connection resets, are you sure the instance terminated? Have you checked the uptime after reconnecting?

3. When you say it restarts when you terminate the instance, have you cancelled the instance request (this would be related to point 1 above)? Although related, instances and instance *requests* are separate records in AWS' knowledge base.

Beyond that, I don't have a clue what to suggest, except possibly contacting AWS' support team. Non-zero probably you've encountered a bug.

Personally I have never had a spot instance re-start except when explicitly requested.

persistent = maintain. Got it, thanks chalsall.

Now, let's see how long it runs this time.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.