mersenneforum.org  

2005-08-18, 20:14   #1
JHagerson

May 2005
Naperville, IL, USA

11000100₂ Posts
Comparing Intel and AMD

I am under the impression that given current architectures, AMD is a better platform for factoring than Intel.

I just received a new PC at work. It has an Intel Pentium D, 3.20GHz. Prime95 reports RDTSC, CMOV, Prefetch, MMX, SSE, and SSE2. The L1 cache is 16K and the L2 cache is 1024K. Because this is a true dual-core machine, I am running two copies of Prime95 and get similar results on both. At home, I have an AMD Athlon 3000+ processor (and I don't have any more specifics available here at work).

I have both machines running in the M35M range, factoring to 62 bits. The Intel machine can factor a number in about 14 minutes on each of its cores. The AMD machine requires about 9 minutes.
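
For reference, the timings above can be turned into throughput figures. This is a minimal Python sketch, not anything taken from Prime95. It assumes the Athlon 3000+ runs a single copy of Prime95 and that both machines work on comparable exponents (both are described as M35M to 62 bits). The function and variable names are made up for illustration.

```python
# Timings quoted in the post (minutes per exponent, factoring to 62 bits).
PENTIUM_D_MINUTES_PER_CORE = 14.0
PENTIUM_D_CORES = 2          # two copies of Prime95, one per core
ATHLON_MINUTES = 9.0         # assumption: a single copy on a single core


def exponents_per_hour(minutes_each: float, workers: int = 1) -> float:
    """Exponents finished per hour when `workers` copies each need `minutes_each`."""
    return workers * 60.0 / minutes_each


pentium_d_core = exponents_per_hour(PENTIUM_D_MINUTES_PER_CORE)
pentium_d_total = exponents_per_hour(PENTIUM_D_MINUTES_PER_CORE, PENTIUM_D_CORES)
athlon = exponents_per_hour(ATHLON_MINUTES)

# How much faster one Athlon core is than one Pentium D core.
per_core_ratio = PENTIUM_D_MINUTES_PER_CORE / ATHLON_MINUTES

print(f"Pentium D, one core : {pentium_d_core:.2f} exponents/hour")
print(f"Pentium D, both cores: {pentium_d_total:.2f} exponents/hour")
print(f"Athlon 3000+         : {athlon:.2f} exponents/hour")
print(f"Athlon core vs Pentium D core: {per_core_ratio:.2f}x")
```

Under these assumptions the Athlon is clearly ahead per core, while the Pentium D as a whole machine still finishes more exponents per hour because it runs two copies.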

I have another Intel machine that is factoring in the M37.9M range. Its specs are Intel Pentium 4, 2.66GHz. Prime95 reports RDTSC, CMOV, Prefetch, MMX, SSE, and SSE2. The L1 cache is 8K and the L2 cache is 512K. This machine requires about 17 minutes to factor a number.

I'm surprised that there is not more difference between the performance of the two Intel machines, given what appear to be the large differences between the two processors.

I would appreciate any comments about how the details of the processors account for these differences in performance. Thank you.

2005-08-18, 22:58   #2
lycorn

"GIMFS"
Sep 2002
Oeiras, Portugal

2·3²·83 Posts

Intel P4 processors are very poor performers when factoring under 64 bits, because the SSE2 optimizations won't kick in below that level. The factoring code is mostly based on integer operations, and the P4 cores are not good at that, at least compared with AMD's. When factoring to 64 bits or above the P4s perform decently, thanks to their excellent implementation of SSE2.
That means that for the levels you are factoring to, Athlons are definitely a much better choice than P4s, and that it is a waste of CPU power to use P4s for that purpose (they should be doing LL work, which is heavily SSE2 optimized).
Just to give you an idea of how weak P4s are when factoring to low levels:
Some time ago I compared the timings of a P4 1.6GHz with a P3 700MHz, both factoring numbers to 59 bits - the P3 was slightly faster than the P4 ...

2005-08-19, 00:00   #3
garo

Aug 2002
Termonfeckin, IE

2²·691 Posts

Agree with what lycorn is saying. Definitely do not use P4s for factoring to 62 bits - though it is your machine and you can do what you want with it, P4s are much better at LL testing or factoring above 64 bits and even to some extent factoring in the 62bit-64bit range. So please reconsider your machine assignment. If you still want to factor with your P4s then please consider factoring numbers from Primenet and not from LMH.

As to your question about the performance difference, remember that factoring to a given bit level takes less time as the exponents become larger. Since your 2.66GHz P4 is working in the M37.9M range, it is performing less work per number than the 3.2GHz machine working in the M35M range.
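
A rough way to see why larger exponents mean less work: any factor q of 2^p - 1 (p an odd prime) has the form q = 2kp + 1, so the number of candidates in a given bit range shrinks as p grows. The sketch below only counts raw candidates. The function name is made up, and the two exponents are round stand-ins for the M35M and M37.9M ranges, not actual assignments. Real trial factoring code also discards candidates that are not 1 or 7 mod 8 and sieves out candidates with small prime factors, which should trim both counts in roughly the same proportion.

```python
def candidate_count(p: int, low_bits: int, high_bits: int) -> int:
    """Count k such that q = 2*k*p + 1 satisfies 2**low_bits <= q < 2**high_bits."""
    k_below_high = (2**high_bits - 2) // (2 * p)  # largest k with q < 2**high_bits
    k_below_low = (2**low_bits - 2) // (2 * p)    # largest k with q < 2**low_bits
    return k_below_high - k_below_low


# Stand-in exponents for the two ranges mentioned in the thread (assumption).
P_35M = 35_000_000
P_37_9M = 37_900_000

count_35m = candidate_count(P_35M, 61, 62)
count_37_9m = candidate_count(P_37_9M, 61, 62)
ratio = count_35m / count_37_9m

print(f"candidates from 61 to 62 bits at p ~ 35.0M: {count_35m:.3e}")
print(f"candidates from 61 to 62 bits at p ~ 37.9M: {count_37_9m:.3e}")
print(f"ratio: {ratio:.4f}")
```

The ratio comes out close to 37.9/35, so by this count the machine in the M37.9M range has roughly 8% fewer candidates per exponent. That accounts for part of the small gap between the two Intel machines, not all of it.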
One final note, on the dual core, running two LL tests may not give as much throughput as desired - someone did benchmarks and posted on the forum but I am too lazy to dig them up :) - so running one LL test and one factoring test may possibly be better.

2005-08-19, 01:06   #4
JHagerson

May 2005
Naperville, IL, USA

2²×7² Posts

Quote:
Originally Posted by garo
One final note, on the dual core, running two LL tests may not give as much throughput as desired - someone did benchmarks and posted on the forum but I am too lazy to dig them up :) - so running one LL test and one factoring test may possibly be better.
I think I found this behavior on the single-core, dual-threaded, Pentia I have worked with.

Thank you for your comments. I have some PIII computers at work assigned to factoring through PrimeNet. Because the P4 machines were making "decent" progress on the LMH assignments, I guess I was satisfied and didn't give much thought to assigning them other work.

If anyone else cares to throw in their JPY2, please do so.

2005-08-19, 08:19   #5
lycorn

"GIMFS"
Sep 2002
Oeiras, Portugal

10111010110₂ Posts

I would suggest you swap them over: assign Primenet factoring work (the bulk of it is well above 62 bits) to the P4s, and LMH work (61-62 bits) to the P3s.

The Athlon 3000+ should also be nice to factor to higher limits, as it also implements SSE2. If you install Windows 64-bit on that machine and then run the 64-bit version of Prime95, you will have a LARGE speed increase for Trial Factoring.

2005-08-19, 10:00   #6
Dresdenboy

Apr 2003
Berlin, Germany

169₁₆ Posts

Quote:
Originally Posted by garo
One final note, on the dual core, running two LL tests may not give as much throughput as desired - someone did benchmarks and posted on the forum but I am too lazy to dig them up :) - so running one LL test and one factoring test may possibly be better.
At least the results were much better than for HT, as expected: LL testing is about 3% slower when 2 instances of Prime95 are running on a Pentium D. This means roughly 1.94 times the LL testing throughput of a single core.
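
The 1.94 figure follows from the 3% number with simple arithmetic. "About 3% slower" can be read two ways (each instance runs at 97% of its solo speed, or each iteration takes 3% longer), and the sketch below, with made-up function names, shows that both readings land on about the same value.

```python
def throughput_if_speed_drops(instances: int, drop: float) -> float:
    """Total throughput (in units of one solo core) if each instance loses `drop` of its speed."""
    return instances * (1.0 - drop)


def throughput_if_time_grows(instances: int, growth: float) -> float:
    """Total throughput (in units of one solo core) if each instance's run time grows by `growth`."""
    return instances / (1.0 + growth)


by_speed = throughput_if_speed_drops(2, 0.03)
by_time = throughput_if_time_grows(2, 0.03)

print(f"reading 1 (speed drops 3%): {by_speed:.3f}")
print(f"reading 2 (time grows 3%) : {by_time:.3f}")
```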

2005-08-19, 12:44   #7
garo

Aug 2002
Termonfeckin, IE

2²·691 Posts

1.94 is good enough. I will live with two LL tests. If I remember correctly, with HT you get zero speedup. And with two Xeons sharing the memory bus you get about 1.8 or so.

2005-08-19, 14:31   #8
Dresdenboy

Apr 2003
Berlin, Germany

19² Posts

Quote:
Originally Posted by garo
1.94 is good enough. I will live with two LL tests. If I remember correctly, with HT you get zero speedup. And with two Xeons sharing the memory bus you get about 1.8 or so.
Yes, HT doesn't offer any advantage here, except in a parallel LL test/factoring run.

Instead it is counterproductive, since a low priority FPU app can slow down a high priority FPU app. The HT CPUs don't look at priorities.

I also think that running 2 LL tests in parallel is fine on the Pentium D.

When were these Xeon tests made and on which hardware? Pentium D has at least FSB 800, and the more recent versions of Prime95 (some 24.1x) became less dependent on memory speed.

2005-08-20, 16:33   #9
garo

Aug 2002
Termonfeckin, IE

2²×691 Posts

Well actually, I just ran some tests on a 2-CPU 2.8GHz Xeon system today. It has Registered ECC DRAM with conservative memory settings and I found that the total throughput was only about 1.46 of a single processor. The system is running Linux kernel 2.6 and the cpu affinities were set appropriately in Prime95 v24.12.
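
To put the 1.46 figure next to the Pentium D number quoted earlier in the thread, the aggregate throughput can be converted back into a per-instance slowdown. This is only a sketch with invented names. It assumes both figures are expressed relative to one CPU running a single LL test alone, and that both instances slow down equally.

```python
def per_instance_slowdown(total_throughput: float, instances: int) -> float:
    """Factor by which each instance's run time grows compared with running alone."""
    return instances / total_throughput


def scaling_efficiency(total_throughput: float, instances: int) -> float:
    """Fraction of ideal linear scaling that is actually achieved."""
    return total_throughput / instances


# (aggregate throughput, number of instances) as reported in the thread.
reported = {
    "Pentium D, 2 cores": (1.94, 2),
    "2 x Xeon 2.8GHz": (1.46, 2),
}

for name, (total, n) in reported.items():
    slowdown = per_instance_slowdown(total, n)
    efficiency = scaling_efficiency(total, n)
    print(f"{name}: each test takes {slowdown:.2f}x as long, efficiency {efficiency:.0%}")
```

On these numbers each LL test on the dual Xeon box takes about 1.37 times as long as it would alone, against about 1.03 times on the Pentium D.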

So I was a bit optimistic in my estimate with the Xeons.

Last fiddled with by garo on 2005-08-20 at 17:08

2005-08-20, 19:05   #10
Dresdenboy

Apr 2003
Berlin, Germany

551₈ Posts

Quote:
Originally Posted by garo
Well actually, I just ran some tests on a 2-CPU 2.8GHz Xeon system today. It has Registered ECC DRAM with conservative memory settings and I found that the total throughput was only about 1.46 of a single processor. The system is running Linux kernel 2.6 and the cpu affinities were set appropriately in Prime95 v24.12.
How much L2 cache do these Xeons have? I assume they are Northwood-based with 512 kB L2. Do they use FSB533? In combination with reg. ECC DRAM and conservative mem settings, this could lead to the result you are seeing there.

2005-08-20, 21:59   #11
garo

Aug 2002
Termonfeckin, IE

2²·691 Posts

vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.80GHz
stepping : 7
cpu MHz : 2791.929
cache size : 512 KB

This is a production server so a reboot to get into the BIOS is not possible. I did not build it so I do not know what the mem settings and FSB are. Do you know if there is another way of finding out the FSB at least?

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.