mersenneforum.org  

2008-03-01, 23:56   #1
ShiningArcanine
Dec 2005

134₈ Posts
Linux is faster than Windows?

I originally posted this on channel9.msdn.com, but I am reposting it here because I want answers as to why my Mersenne prime finder runs faster on Linux than on Windows.

Quote:
Last November, I wrote a C program that finds large prime numbers using the Lucas-Lehmer test for Mersenne numbers. Today, I installed Ubuntu 6.10 LTS in Microsoft Virtual PC 2007 and then installed gcc, m4, autoconf and GMP. I also installed all of the available Ubuntu updates and upgraded the kernel to the i686 version.

I downloaded the source code for my program from my university's Unix server using scp and compiled it with GCC. Imagine my surprise when I discovered that my program ran far faster inside Virtual PC than it does natively on Windows.

Here are some numbers:

Testing all of the Mersenne numbers 2^x - 1 with exponents x from 0 to 2281 takes 1 second on Ubuntu and 3 seconds on Windows. Testing exponents 0 to 3217 takes 5 seconds on Ubuntu and 10 seconds on Windows. Testing M21701 (2^21701 - 1) alone takes 6 seconds on Ubuntu and 20 seconds on Windows.

You could say that my program runs twice as fast on Ubuntu, but that would understate it: Microsoft Virtual PC does not support SMP, so these tests run in a single thread on Ubuntu but in two worker threads on Windows (three if you count the main thread, which just waits for the workers to finish). The exception is the M21701 test, which uses a single thread on both.

Since I am using pthreads-win32 for multithreading, I did another run to rule out contention over shared resources as a source of overhead, by making sure each thread had plenty of work to do before it needed to lock the mutex. So I tested all of the Mersenne numbers with exponents between 2281 and 3217. That took 4 seconds on Ubuntu and 7 seconds on Windows.

The variables here are the operating systems and the compilers. On the host I am running Windows Media Center 2005 Edition with Visual Studio 2008 Professional; under Microsoft Virtual PC 2007 I am running Ubuntu 6.10 LTS with GCC 4.0.3. In Visual Studio 2008 Professional I compile in release mode with every optimization flag I could find, including /Ox, /Ob2, /Ot, /Oy, /GL, /arch:SSE2, /fp:fast and /GS-. On Ubuntu I compile with the following command:

gcc -m32 -O2 -fomit-frame-pointer -mtune=k8 -march=k8 mersenne.c /usr/local/lib/libgmp.so

There are no background processes consuming CPU time, and I ran each test several times (particularly on Windows) to make sure the Windows numbers were not worse than they needed to be.

The code is the same and the hardware is the same. The only other variable is Virtual PC, which should hurt performance on Ubuntu, not on Windows. So can anyone tell me exactly what is making my program run so much slower on Windows than on Ubuntu?
Does anyone know why this is happening?
2008-03-02, 00:05   #2
retina
Undefined
"The unspeakable one"
Jun 2006
My evil lair

16FD₁₆ Posts

How are you doing the timing? Is it with a manual stopwatch or is your program reading a system timer somewhere?

Are you doing your Windows timing with Virtual PC also running, or do you shut it down before testing?

Try making your code completely single-threaded and then compare (apples-to-apples).
2008-03-02, 00:09   #3
rogue
"Mark"
Apr 2003
Between here and the

6038₁₀ Posts

Quote:
Originally Posted by ShiningArcanine
I originally posted this on channel9.msdn.com, but I am reposting it here because I want answers as to why my Mersenne prime finder runs faster on Linux than on Windows.

Does anyone know why this is happening?
It appears that multithreading is being used. I wouldn't be surprised if starting a thread is more expensive on Windows than on Unix.
2008-03-02, 00:15   #4
bsquared
"Ben"
Feb 2007

2³·419 Posts

My bet is the compiler. I've found gcc to be a much better optimizer than MSVC, but I've only used up through MSVC 6.
2008-03-02, 00:36   #5
ShiningArcanine
Dec 2005

92₁₀ Posts

Quote:
Originally Posted by retina
How are you doing the timing? Is it with a manual stopwatch or is your program reading a system timer somewhere?

Are you doing your Windows timing with Virtual PC also running, or do you shut it down before testing?

Try making your code completely single-threaded and then compare (apples-to-apples).
I am doing my timing with the functions in time.h.

I am doing my timing runs with Virtual PC running in the background (with Ubuntu paused), but the results are no different from what they were before I installed Ubuntu 6.10.

If I were to run my program in single-threaded mode on Windows, the numbers would look even worse for Windows, since my program scales as perfectly as time.h lets me measure and uses 99% of the CPU according to Task Manager. I will rerun the tests in single-threaded mode, but the numbers are only going to get worse for Windows.

Quote:
Originally Posted by rogue
It appears that multithreading is being used. I wouldn't be surprised if starting a thread is more expensive on Windows than on Unix.
The single-threaded version is the one running on Linux, while the multithreaded version runs on Windows. My program starts two threads (one per processor) to run the Lucas-Lehmer test and then waits for both to finish. The two coordinate through two mutexes: one for getting the next exponent to test and another for submitting the exponents of Mersenne primes they find.

Quote:
Originally Posted by bsquared
My bet is the compiler. I've found gcc to be a much better optimizer than MSVC, but I've only used up through MSVC 6.
That is what I suspect, although I am trying to be as fair as possible to Visual Studio. Here is the URL of the thread at Channel9:

http://channel9.msdn.com/ShowPost.aspx?PostID=387457

No one has replied yet. Both developers that use Microsoft products and developers that work for Microsoft post there, including members of the Visual Studio team, so I am curious what they have to say.

Edit: I reran the tests in single-threaded mode on Windows. The new numbers are 6 seconds for exponents 0-2281, 20 seconds for 0-3217 and 12 seconds for 2281-3217. M21701 still takes 20 seconds.

Last fiddled with by ShiningArcanine on 2008-03-02 at 00:42
2008-03-02, 00:56   #6
retina
Undefined
"The unspeakable one"
Jun 2006
My evil lair

1011011111101₂ Posts

Quote:
Originally Posted by ShiningArcanine
I am doing my timing runs with Virtual PC running in the background (with Ubuntu paused), but the results are no different from what they were before I installed Ubuntu 6.10.

If I were to run my program in single-threaded mode on Windows, the numbers would look even worse for Windows, since my program scales as perfectly as time.h lets me measure and uses 99% of the CPU according to Task Manager. I will rerun the tests in single-threaded mode, but the numbers are only going to get worse for Windows.
You need to shut down VPC because it uses Windows memory. You also need to test with a single thread, in case the two threads are pushing each other's code and data out of the cache and causing cache thrashing.
2008-03-02, 01:17   #7
ShiningArcanine
Dec 2005

2²·23 Posts

Okay. I shut down Virtual PC and Firefox and reran the tests. The numbers were 6 seconds for exponents 0-2281, 20 seconds for 0-3217 and 13 seconds for 2281-3217, with M21701 taking 20 seconds. I also set the processor affinity to make sure that the single thread was not jumping from core to core.

I then realized that Visual Studio was starting my program with debugging enabled, so I started it without debugging and reran the tests. The numbers after that were 5 seconds for 0-2281, 17 seconds for 0-3217, 12 seconds for 2281-3217 and 20 seconds for M21701. Task Manager indicated that my program never used more than 976 KB of memory during the tests.

Those figures are an improvement, but my program on Ubuntu is still running circles around its Windows counterpart.
2008-03-02, 01:33   #8
retina
Undefined
"The unspeakable one"
Jun 2006
My evil lair

5×11×107 Posts

Are you running your program as a sub-task under MSVC? You need to run it as an independent task, the same way as in Ubuntu (apples-to-apples); I'm assuming that means opening a CMD prompt and typing the exe name to run it. Of course, you also need to make sure you run the exe generated with all the optimisations turned on.
2008-03-02, 01:48   #9
ShiningArcanine
Dec 2005

2²×23 Posts

Visual Studio 2008 Professional has an option called "Start without debugging" that opens a program in a CMD window.

In the past, I have opened a CMD prompt myself, navigated to my program and run tests from there, and I could detect no significant difference in performance. I doubt there is any difference between running my program from a CMD prompt I opened and from one that Visual Studio opened.

Retina, if you would like, I will share the source code with you (with the personally identifying comments and the multithreaded code path stripped out; send me a PM with your email address if you are interested), so you can run your own tests, but I expect you will get the same results I did.

Last fiddled with by ShiningArcanine on 2008-03-02 at 01:52 Reason: Clarified post
2008-03-02, 02:00   #10
retina
Undefined
"The unspeakable one"
Jun 2006
My evil lair

5·11·107 Posts

Quote:
Originally Posted by ShiningArcanine
Retina, if you would like, I will share the source code with you (with the personally identifying comments and the multithreaded code path stripped out; send me a PM with your email address if you are interested), so you can run your own tests, but I expect you will get the same results I did.
Thanks for the offer. It is not that I disbelieve you; I am just trying to cover all the bases. If you say you have done all of this and it made no difference, then that is good enough for me.
2008-03-02, 03:17   #11
ewmayer
2ω=0
Sep 2002
República de California

3·7·19·29 Posts

Have you considered that since your code is based on GMP, the speed of the GMP build might be the key factor in the speed difference?

I suggest trying some 100-iteration timing tests using much larger exponents [say in the millions]; you may be seeing effects that are due to overhead, not core computation. I also don't know precisely at what operand size the GMP modmul routine sets its breakover points from small-operand algorithms [e.g. Toom-Cook multiply] to large-operand FFT-based multiply, but thousands of bits is the typical range for such a breakover. You want to be sure you're comparing FFT-based multiply timings, not small-operand (e.g. Toom-Cook) timings. Also, are you building GMP locally or using a precompiled version?

I use both Linux and MSVC all the time for builds of my Mlucas code, and find MSVC 7 and 8 to have quite a decent optimizer. Unless you've somehow buggered the optimization flags, I strongly suspect something other than the compiler is responsible here.
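Why a 100-iteration timing is a fair proxy for a full run: the Lucas-Lehmer test of M_p repeats the same squaring step p - 2 times. Assuming the per-iteration cost is roughly constant for a fixed p (which holds once the squaring dominates), the full-test time scales linearly with the iteration count:

```latex
\[
  s_0 = 4, \qquad s_{k+1} = \left(s_k^2 - 2\right) \bmod M_p, \qquad
  M_p = 2^p - 1 \text{ is prime} \iff s_{p-2} = 0 \quad (p \text{ an odd prime})
\]
\[
  T_{\text{full}}(p) \approx \frac{p-2}{100}\, T_{100}(p)
\]
```

For example, M21701 needs 21699 iterations, so its full test should take about 217 times as long as 100 iterations at the same exponent. If a small-exponent run does not follow this ratio, its time is probably dominated by setup overhead rather than by the multiplications.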

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.