mersenneforum.org  

Old 2003-01-28, 17:33   #1
eepiccolo
 
Dec 2002
Frederick County, MD

2·5·37 Posts
Some info on version 23 please?

Perhaps George, you'll see this thread here and give us some information on what new things might be coming up in version 23 of Prime95? :)

Thanks very much.
Old 2003-01-28, 18:46   #2
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

17×487 Posts
Re: Some info on version 23 please?

Quote:
Originally Posted by eepiccolo
Perhaps George, you'll see this thread here and give us some information on what new things might be coming up in version 23 of Prime95?
I don't have any grand plans right now. Most likely a few speed improvements for P4 CPUs. I have a few tricks I'd like to try out as spare time permits.

The next big change is likely to come with any server improvements. P-1 factoring as a work type, separate userids and teamids, etc.
Old 2003-02-03, 02:00   #3
pakaran
 
Aug 2002

11111001₂ Posts
Re: Some info on version 23 please?

Quote:
Originally Posted by Prime95
The next big change is likely to come with any server improvements. P-1 factoring as a work type, separate userids and teamids, etc.
George,

Do you know what the timeline is likely to be on those server improvements?

We've been talking here for several months now.

Thanks!
Old 2003-02-17, 21:09   #4
crash893
 
Sep 2002

23·37 Posts

Was/is there any plan for Athlon speed improvements?
Old 2003-02-22, 19:09   #5
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

8279₁₀ Posts

No plans on Athlon speed improvements. I don't have access to an Athlon and, even more importantly, the stack-based architecture of the x87 FPU registers makes it extremely tedious to try new code ideas. The directly addressable SSE2 registers solve this problem.
Old 2003-02-28, 17:38   #6
pakaran
 
Aug 2002

3×83 Posts

Quote:
Originally Posted by Prime95
No plans on Athlon speed improvements. I don't have access to an Athlon and, even more importantly, the stack-based architecture of the x87 FPU registers makes it extremely tedious to try new code ideas. The directly addressable SSE2 registers solve this problem.
Is that the reason for the P4's massive speed advantage over the XP, even comparing their "real" clock speeds?
Old 2003-02-28, 18:35   #7
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

8279₁₀ Posts

Quote:
Originally Posted by pakaran
Is that the reason for the P4's massive speed advantage over the XP, even comparing their "real" clock speeds?
An interesting question. I'm not enough of an Athlon expert to make a definitive assessment.

The P4 and Athlon have the same theoretical FPU throughput - one add and one mul per clock cycle.

However, the P4 has several other advantages:

1) SSE2 gives you 16 floating point values in registers vs. the Athlon's 8
2) SSE2 gives you direct addressing of registers eliminating the need for fxch instructions (which may make register renaming harder for the Athlon?)
3) The P4 has 128-byte cache lines to main memory, meaning better bandwidth
4) The P4 is supposed to have better bandwidth between the L2 and L1 caches.
5) A single SSE2 instruction does twice the work of an x87 FPU instruction. This means that there are half as many instructions to schedule and retire.

The Athlon has some advantages too:

1) The latency for an add or multiply is significantly less than the P4
2) The penalty for a mis-predicted branch is less.

What I can't tell you is which of the above causes prime95 to shine on the P4. Nor can I tell you how much a rewrite of the FFT routines for the Athlon would reduce iteration times.
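
To make point 5 concrete, here is a minimal C sketch (not prime95's actual code) of a single SSE2 instruction working on two doubles at once. The helper name mul_add_pair is hypothetical; the intrinsics are the standard ones from <emmintrin.h>, and a scalar fallback is included on the assumption that the code might be built without SSE2.

```c
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics: __m128d, _mm_mul_pd, ... */
#endif

/* Hypothetical helper: out[i] = a[i] * b[i] + c[i] for i = 0 and 1. */
static void mul_add_pair(const double a[2], const double b[2],
                         const double c[2], double out[2])
{
#if defined(__SSE2__)
    /* Each 128-bit register holds two doubles. */
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);
    __m128d vc = _mm_loadu_pd(c);

    /* One packed multiply and one packed add cover both lanes. */
    __m128d r = _mm_add_pd(_mm_mul_pd(va, vb), vc);
    _mm_storeu_pd(out, r);
#else
    /* Scalar fallback: two multiplies and two adds, one lane at a time. */
    out[0] = a[0] * b[0] + c[0];
    out[1] = a[1] * b[1] + c[1];
#endif
}
```

Calling it with a = {1.5, 2.0}, b = {2.0, 4.0}, c = {0.5, 1.0} fills out with {3.5, 9.0}. The SSE2 path gets there with one packed multiply and one packed add, so there are half as many arithmetic instructions to schedule and retire as in the scalar path.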
Old 2003-02-28, 18:55   #8
pakaran
 
Aug 2002

249₁₀ Posts

What about AMD's "3Dnow Pro" instruction set, for the Athlon XP?

Also, does the AXP having 3 FP pipelines help any?
Old 2003-02-28, 19:08   #9
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

20127₈ Posts

Quote:
Originally Posted by pakaran
What about AMD's "3Dnow Pro" instruction set, for the Athlon XP?
Also, does the AXP having 3 FP pipelines help any?
My understanding is 3Dnow Pro is single-precision floats - useless to prime95.

By three pipelines do you mean add/sub, mul, load/store? If so, the P4 has those too.
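
A small C sketch of why single precision is a non-starter, assuming (as with floating-point FFT multiplication in general) that results have to round back to exact integers. The helper names are hypothetical. A float carries a 24-bit significand, so integers above 2^24 are no longer all representable; a double carries 53 bits.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if n survives a round trip through
   a single-precision float unchanged, 0 otherwise. */
static int exact_in_float(int64_t n)
{
    volatile float f = (float)n;   /* volatile forces a true 32-bit store */
    return (int64_t)f == n;
}

/* Same check for a double-precision value. */
static int exact_in_double(int64_t n)
{
    volatile double d = (double)n; /* volatile forces a true 64-bit store */
    return (int64_t)d == n;
}
```

For example, exact_in_float(16777217) returns 0 because 2^24 + 1 is rounded to 2^24, while exact_in_double(16777217) returns 1.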
Old 2003-02-28, 19:30   #10
pakaran
 
Aug 2002

249₁₀ Posts

I'm not sure - I just vaguely remember that from an OCing website, so that may indeed be all that's meant.

In many senses the AXP is designed to work well with existing programs - which is why so many gamers are fond of it.
Old 2003-03-04, 00:13   #11
willmore
 
Aug 2002

61 Posts

Quote:
Originally Posted by Prime95
However, the P4 has several other advantages:

1) SSE2 gives you 16 floating point values in registers vs. the Athlon's 8
2) SSE2 gives you direct addressing of registers eliminating the need for fxch instructions (which may make register renaming harder for the Athlon?)
3) The P4 has 128-byte cache lines to main memory, meaning better bandwidth
4) The P4 is supposed to have better bandwidth between the L2 and L1 caches.
5) A single SSE2 instruction does twice the work of an x87 FPU instruction. This means that there are half as many instructions to schedule and retire.

The Athlon has some advantages too:

1) The latency for an add or multiply is significantly less than the P4
2) The penalty for a mis-predicted branch is less.

What I can't tell you is which of the above causes prime95 to shine on the P4. Nor can I tell you how much a rewrite of the FFT routines for the Athlon would reduce iteration times.
P4 advantage #2 shouldn't be it: fxch instructions aren't supposed to even reach the execution engine, nor, if I remember right, are they supposed to take an execution slot, though they do have to be decoded. They're effectively a NOP that just swaps entries in the aliasing table of the FP rename unit. Of course, this feature helps George write and test more code, so in the big picture it might be the real winner, from a system perspective. :)

My money is on 3 and 4 doing most of the work to make the current P4 implementations rock on this code. Of course, the real winner is simply SSE2 allowing you to do 2 DP FP ops/(cycle|instruction). That beats one 80-bit op/(cycle|instruction).

If code can be written to increase the tolerable execution/reuse latency of the P4, then the higher throughput of that chip/implementation will clobber the current Athlon family. The Athlons have been optimized for low latency, which is very useful for simple and relatively unoptimized code. The large L1/L2 of that chip factors in similarly: it's very forgiving of code that's not blocked well or which has a poor data layout.

So, short answer: the P4 is harder to code for and less forgiving, but offers more absolute performance, while the Athlon is easier to code for, but doesn't have all of the resources of the P4. So, for 90+% of the code out there, the Athlon will be the winner. But if you optimize and your algorithm is capable of it, the P4 can beat it out. mprime fits in the latter category thanks to George (prime95).
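
One common way to make code tolerate longer instruction latency is to keep several independent dependency chains in flight. A minimal C sketch, not taken from mprime and with hypothetical function names: the first sum makes every add wait for the previous one, while the second splits the work across four accumulators so a pipelined adder always has something ready to start.

```c
#include <stddef.h>

/* One accumulator: each add depends on the result of the previous add,
   so the loop runs at one add per add-latency. */
static double sum_chained(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Four accumulators: four independent chains, so up to four adds can be
   in the pipeline at once before any of them has to wait. */
static double sum_interleaved(const double *x, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    /* Leftover elements when n is not a multiple of 4. */
    for (; i < n; i++)
        s0 += x[i];

    return (s0 + s1) + (s2 + s3);
}
```

Both functions return the same value whenever the additions are exact (for instance, small integers stored as doubles); for general input the different summation order can change the last few bits of the result.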

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
