mersenneforum.org  

2007-03-07, 02:52   #1
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
3·11·239 Posts

Hyperthread v25.2 benchmarks

I need a volunteer or two to do some advanced hyperthreading benchmarks under Windows. You'll need version 25.2 at ftp://mersenne.org/gimps/p95v252.zip. DO NOT OVERWRITE YOUR CURRENT PRIME95 - put this test software in its own folder.

The question I'm trying to answer is this: Should the default setting for prime95 on a hyperthreaded CPU be to run one LL test using two threads? Or do you get more throughput by running 2 LL tests?

We can try to answer this question by using Advanced/Benchmark, but I think we'll need to actually time throughput by running several minutes of one LL test using 2 threads and several minutes of two independent LL tests.

Last fiddled with by Prime95 on 2007-03-07 at 02:53

2007-03-07, 07:14   #2
E_tron
Sep 2002
Austin, TX
1061₈ Posts

so, this calls for volunteers possessing:

Intel Pentium 4 with Hyper-Threading technology
Intel Xeon (Pentium 4 based) with Hyper-Threading technology

any others?

2007-03-07, 12:38   #3
Andi47
Oct 2004
Austria
2·17·73 Posts

I will download the file as soon as my internet connection at home is working again (hopefully this evening).

I could do some benchmarks on a Hyperthreaded P4. Are there particular numbers or FFT sizes to run for a few minutes?

Do you also need some benchmarks on a Dual Processor system? I could do some on my Core 2 Duo laptop in one or two weeks. (currently it is running a Huge P-1 stage 2 with GMP-ECM, so I don't want to interrupt it.)

Edit: Is 25.2 mature enough to run a few doublechecks on small Mersennes (for testing the software)?

Last fiddled with by Andi47 on 2007-03-07 at 12:41

2007-03-07, 14:43   #4
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
1111011001111₂ Posts

Yes, a hyper-threaded P4 is required. A variety of L2 cache sizes would be nice as that could affect the answer.

I do not need any dual-processor benchmarks. I own a dual-core P4. As expected you get more throughput by running 2 independent LL tests.

After running a simple benchmark, I think running throughput timings on the 1M and 2M FFT sizes should be sufficient.

25.2 ought to work on a double-check. It has not been thoroughly QA'ed. Communication with the server has been disabled so you'd have to email any results.

2007-03-07, 19:53   #5
Ethan Hansen
Oct 2005
2³×5 Posts

I tested 1M and 2M FFT sizes on two processors. Both CPUs are 3GHz, making the comparisons easier. The first system uses a desktop P4 processor with an 8K L1 cache and 512K L2. The second is a Xeon with 16K L1 and 2M L2. The systems have roughly equivalent memory; 2G DDR-400 for the P4, 3G DDR2-400 for the Xeon.

Notation conventions: F1/T1 = 1M FFT, 1 thread; F2/T2 = 2M FFT, 2 threads (two exponents tested simultaneously); etc. Timings are average seconds per iteration over 10K LL iterations (reported every 500 iterations, average value used), with the test repeated twice.

Timings:

Code:
Processor      F1/T1   F1/T2    F2/T1    F2/T2
----------------------------------------------
    P4         0.031   0.063    0.068    0.150
  Xeon         0.036   0.076    0.066    0.132
My guess is that the larger L2 cache line size aids the P4 system with smaller exponents. Running two exponents in parallel had minimal effect on the timings at 1M FFT sizes for either processor. The Xeon system showed no degradation in throughput for the 2M FFT size as well.
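A rough sketch of how the table above converts into throughput. It assumes the T2 columns are per-test iteration times measured while two tests run side by side, so total throughput with two tests is 2 divided by that time. The names `timings`, `throughput` and `ht_gain` are made up for this illustration and are not part of prime95.

```python
# Per-iteration timings in seconds, copied from the table above.
# Key: (processor, FFT size); value: (time with 1 test, time per test with 2 tests).
timings = {
    ("P4", "1M"): (0.031, 0.063),
    ("P4", "2M"): (0.068, 0.150),
    ("Xeon", "1M"): (0.036, 0.076),
    ("Xeon", "2M"): (0.066, 0.132),
}


def throughput(sec_per_iter, tests):
    """Total LL iterations per second summed over all running tests."""
    return tests / sec_per_iter


def ht_gain(single, dual):
    """Fractional throughput change from running two tests instead of one."""
    return throughput(dual, 2) / throughput(single, 1) - 1.0


for (cpu, fft), (single, dual) in timings.items():
    print(f"{cpu:5s} {fft}: {ht_gain(single, dual):+.1%}")
```

A gain near zero or below means the second independent test adds nothing over a single test on that configuration, under the assumption stated above.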

2007-03-07, 21:43   #6
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
3×11×239 Posts

Ethan, if I interpret your numbers correctly, you get absolutely no benefit from hyperthreading by running two independent LL tests.

Can you now try v25.2 (get the one I just made today)? Go into Test/Worker threads dialog box and select 1 worker thread using 2 threads per LL test. Thanks.

2007-03-07, 21:50   #7
E_tron
Sep 2002
Austin, TX
231₁₆ Posts

There's one more chip capable of Hyper-Threading:

Dual-Core Pentium Extreme Edition, codenamed Smithfield. It's basically 2 Prescott Pentium 4s with HT enabled.

2007-03-07, 23:07   #8
Ethan Hansen
Oct 2005
40₁₀ Posts

George,

Sorry, I'm not following you. The Test/Worker Threads dialog has an entry for Number of Worker Threads, but I do not see an option to set the number of threads per LL test. I'm running the P95 timestamped March 7th, 16:30.

E_tron: There were two EE versions that supported hyperthreading. The first used the Smithfield core (EE 840), while the second used Presler (EE 955/965). The difference between these and the standard processors, aside from a hefty price premium, was the unlocked multiplier and hyperthreading being enabled. HT only worked on a very few motherboards; the 840 required the 955X chipset, while the 955/965 only functioned with the i975X. Not all motherboards sporting these chipsets allowed HT to be used. To the 37 or so people who actually purchased these mini space heaters: my condolences.

2007-03-08, 01:15   #9
ATH
Einyen
Dec 2003
Denmark
2⁴×3²×23 Posts

Here is my CPU info from Prime95 and CPU-Z: cpu.jpg

I tested LL test at 36M:

2 LL test at once, 1 on each thread: 2LL.jpg

1 LL test on both threads: 1LL.jpg

It's almost the same. In 1 hour you would get 3600/0.1137 + 3600/0.112 = 63,805 total iterations running 2 LLs, and 3600/0.0569 = 63,269 iterations on the single LL, but that's only 0.84% more and may just be minor fluctuations.
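The arithmetic above can be checked with a few lines. This is only a sketch using the three per-iteration timings quoted in this post; `iterations_per_hour` is a name invented for the illustration. The 0.84% figure comes out when the difference is taken relative to the two-test total.

```python
SECONDS_PER_HOUR = 3600


def iterations_per_hour(sec_per_iter_each):
    """Sum of hourly iteration counts over all tests running at once."""
    return sum(SECONDS_PER_HOUR / t for t in sec_per_iter_each)


# Two independent LL tests, one per logical CPU (seconds per iteration each).
two_tests = iterations_per_hour([0.1137, 0.112])
# One LL test spread over both logical CPUs.
one_test = iterations_per_hour([0.0569])

diff = two_tests - one_test
print(f"two LL tests: {two_tests:,.0f} iterations/hour")
print(f"one LL test:  {one_test:,.0f} iterations/hour")
print(f"difference:   {diff / two_tests:.2%} of the two-test total")
```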

Let me know if you want more or longer tests. I can do them tomorrow (Thursday).


Edit: Oops, I just saw I never changed available memory above 8 MB, but that shouldn't affect it, right? It did use more than 8 MB anyway.
Btw, for the 2 LL test it didn't save the worktodo properly:
AdvancedTest=36000109
AdvancedTest=36000199
[Thread #2]
So when I restarted it, worker thread2 did not have any work.

Last fiddled with by ATH on 2007-03-08 at 01:20

2007-03-08, 01:37   #10
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
3×11×239 Posts

Quote:
Originally Posted by Ethan Hansen
Sorry, I'm not following you. The Test/Worker Threads dialog has an entry for Number of Worker Threads, but I do not see an option to set the number of threads per LL test.

It is "Number of CPUs to use". Perhaps I could word that better.

2007-03-08, 01:55   #11
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
7887₁₀ Posts

Quote:
Originally Posted by ATH
It's almost the same. In 1 hour you would get 3600/0.1137 + 3600/0.112 = 63,805 total iterations running 2 LLs, and 3600/0.0569 = 63,269 iterations on the single LL, but that's only 0.84% more and may just be minor fluctuations.

Let me know if you want more or longer tests. I can do them tomorrow (Thursday).

The memory setting is irrelevant. Can you also do 1 LL test with 1 cpu per test? Thanks.


So far, from the data I have:

A P4 with 512K L2 cache:
1M FFTs and smaller are 2-3% faster running 1 LL test with 2 threads
FFTs larger than 1M are slower.


Running 2 threads increases the number of instructions scheduled on the ALU/FPU but increases the pressure on the L1 and L2 caches.