mersenneforum.org  

2011-11-21, 10:33   #177
nucleon
Mar 2003
Melbourne
5·103 Posts

If I wanted to maximize the GHz-days/day from an ATI/AMD card with mfakto, what should I be getting? And how many GHz-days/day could I hope to achieve?

Just hypothetical questions at this stage.

The best I can do so far is about 300GHz-days/day with a single GTX580.

-- Craig

2011-11-21, 12:00   #178
KyleAskine
Oct 2011
Maryland
100100010₂ Posts

Quote:
Originally Posted by nucleon
If I wanted to maximize the GHz-days/day from an ATI/AMD card with mfakto, what should I be getting? And how many GHz-days/day could I hope to achieve?

Just hypothetical questions at this stage.

The best I can do so far is about 300GHz-days/day with a single GTX580.

-- Craig
I am not sure, but I have a 5870 and two 6950s (flashed as 6970s). With only one instance of mfakto per card and sieving 5000 primes, I get numbers right around 150 in the output (or maybe 150M... I don't remember) for the 5870. Of course I don't know what the column means, other than that bigger is better, which is why I can't remember exactly what it said.

Another metric is that I factor one number every half hour from 70 to 71 on the 5870.

I am positive I can do better with more primes being sieved and more instances. I can look into it a bit more when I get home today and let you know more exactly. I am interested in how AMD cards match up with nVidia's myself.

Last fiddled with by KyleAskine on 2011-11-21 at 12:02

2011-11-21, 12:41   #179
bcp19
Oct 2011
7×97 Posts

Quote:
Originally Posted by nucleon
If I wanted to maximize the GHz-days/day from an ATI/AMD card with mfakto, what should I be getting? And how many GHz-days/day could I hope to achieve?

Just hypothetical questions at this stage.

The best I can do so far is about 300GHz-days/day with a single GTX580.

-- Craig
With an HD 6770 I can get 100 M/s with 2 mfakto instances running on an i5 2400, which is similar to the 120 M/s of my GTS 450, but it is kind of a low-end card. It can do roughly three 48M exponents from 2^69 to 2^72 per mfakto ~= 90-105 GHz-days/day. With a 560 Ti running 1 mfaktc I can get 170 M/s on the 2400. Since a second instance only gets me up to 200 M/s, I let P95 have the core.

2011-11-21, 14:52   #180
Wizzard
Feb 2011
Bratislava
1B₁₆ Posts

Hello. Is Radeon HD 3400 supported too? If so, where can I download the latest mfakto? Thank you :)

edit: well, I found version 0.08, and I see "GPU not found", so I assume it is not supported.

Last fiddled with by Wizzard on 2011-11-21 at 15:06

2011-11-21, 15:41   #181
KyleAskine
Oct 2011
Maryland
122₁₆ Posts

Quote:
Originally Posted by Wizzard
Hello. Is Radeon HD 3400 supported too? If so, where can I download the latest mfakto? Thank you :)

edit: well, I found version 0.08, and I see "GPU not found", so I assume it is not supported.
OpenCL is supported on the HD 4xxx series and newer.

2011-11-22, 22:26   #182
nucleon
Mar 2003
Melbourne
203₁₆ Posts

Quote:
Originally Posted by KyleAskine
I am not sure, but I have a 5870 and two 6950s (flashed as 6970s). With only one instance of mfakto per card and sieving 5000 primes, I get numbers right around 150 in the output (or maybe 150M... I don't remember) for the 5870. Of course I don't know what the column means, other than that bigger is better, which is why I can't remember exactly what it said.

Another metric is that I factor one number every half hour from 70 to 71 on the 5870.

I am positive I can do better with more primes being sieved and more instances. I can look into it a bit more when I get home today and let you know more exactly. I am interested in how AMD cards match up with nVidia's myself.
On one machine, I have 2 instances running on a GTX 580. According to GPU-Z, its usage hovers around 95-97%, so it is practically maxed out. Sieve primes=5000, and the CPU is at a constant 100%. I don't have any more CPU cycles to throw at it at this stage.

Some timing data:

Code:
Instance0:
20111123-033143 no factor for M45006901 from 2^69 to 2^72 [mfaktc 0.16-Win barre
20111123-064935 no factor for M45034081 from 2^69 to 2^72 [mfaktc 0.16-Win barre

Instance1:
20111123-044206 no factor for M46251449 from 2^68 to 2^72 [mfaktc 0.16-Win barre
20111123-074124 no factor for M46629067 from 2^68 to 2^72 [mfaktc 0.16-Win barre
The first column is the completion time, in YYYYMMDD-hhmmss format.

To get the timing data, I run this command in the background:

Code:
tail -n 0 -F results.txt | xargs -I XX -n 1 bash -c "echo \`date +%Y%m%d-%H%M%S\` \"XX\"" >> results.log &
The timings are accurate to within +/-1 sec, if I understand tail correctly. Yes, it's a hack, but it's a good start.

By my guess, 70-71 takes about 45 mins, and I'll have about 2 results in that time.
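
The timing data above can be turned into per-result durations with a short script. This is only a sketch, assuming each line of results.log starts with the YYYYMMDD-hhmmss stamp described above. The function names are made up for illustration, and the two sample lines are the Instance0 entries exactly as posted, truncated ends included.

```python
from datetime import datetime

STAMP_FORMAT = "%Y%m%d-%H%M%S"  # matches the `date +%Y%m%d-%H%M%S` prefix


def completion_times(lines):
    """Parse the leading timestamp field of every non-empty log line."""
    return [datetime.strptime(line.split()[0], STAMP_FORMAT)
            for line in lines if line.strip()]


def intervals(lines):
    """Time between consecutive results, i.e. how long each later result took."""
    times = completion_times(lines)
    return [later - earlier for earlier, later in zip(times, times[1:])]


sample = [
    "20111123-033143 no factor for M45006901 from 2^69 to 2^72 [mfaktc 0.16-Win barre",
    "20111123-064935 no factor for M45034081 from 2^69 to 2^72 [mfaktc 0.16-Win barre",
]

for gap in intervals(sample):
    print(gap)  # prints 3:17:52
```

The first line of a log has no predecessor, so n lines give n-1 durations. With several instances, each one needs its own log for the gaps to mean anything.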

-- Craig

2011-11-23, 12:10   #183
KyleAskine
Oct 2011
Maryland
2×5×29 Posts

Quote:
Originally Posted by nucleon
To get the timing data, I run this command in the background:

Code:
tail -n 0 -F results.txt | xargs -I XX -n 1 bash -c "echo \`date +%Y%m%d-%H%M%S\` \"XX\"" >> results.log &
The timings are accurate to within +/-1 sec, if I understand tail correctly. Yes, it's a hack, but it's a good start.

By my guess, 70-71 takes about 45 mins, and I'll have about 2 results in that time.

-- Craig
Alright, I will throw this on my linux box tonight!

2011-11-24, 16:01   #184
KyleAskine
Oct 2011
Maryland
290₁₀ Posts

Quote:
Originally Posted by KyleAskine
Alright, I will throw this on my linux box tonight!
So I embarrassed myself. When I said I did one factor per half hour from 70 to 71, I meant one factor per half hour from 69 to 70. Only off by a factor of two!! This is with an HD5870. I have two 6970s that are around the same speed.

Anyway, this is with only one instance of mfakto only sieving 5000 primes on an old AMD Phenom system. I can get a little bit more with two systems, but not enough to make it worthwhile in my opinion.

Code:
20111123-163748 no factor for M50771309 from 2^69 to 2^70 [mfakto 0.09 mfakto_cl_barrett79]
20111123-171212 no factor for M50781161 from 2^69 to 2^70 [mfakto 0.09 mfakto_cl_barrett79]
20111123-174634 no factor for M50781917 from 2^69 to 2^70 [mfakto 0.09 mfakto_cl_barrett79]
20111123-182057 no factor for M50783597 from 2^69 to 2^70 [mfakto 0.09 mfakto_cl_barrett79]
20111123-185520 no factor for M50789623 from 2^69 to 2^70 [mfakto 0.09 mfakto_cl_barrett79]
20111123-192943 no factor for M50801119 from 2^69 to 2^70 [mfakto 0.09 mfakto_cl_barrett79]
20111123-200406 no factor for M50803657 from 2^69 to 2^70 [mfakto 0.09 mfakto_cl_barrett79]
20111123-203829 no factor for M50804087 from 2^69 to 2^70 [mfakto 0.09 mfakto_cl_barrett79]
20111123-211252 no factor for M50806543 from 2^69 to 2^70 [mfakto 0.09 mfakto_cl_barrett79]
20111123-214716 no factor for M50807389 from 2^69 to 2^70 [mfakto 0.09 mfakto_cl_barrett79]
20111123-222140 no factor for M50807563 from 2^69 to 2^70 [mfakto 0.09 mfakto_cl_barrett79]
20111123-225603 no factor for M50807587 from 2^69 to 2^70 [mfakto 0.09 mfakto_cl_barrett79]
20111123-233027 no factor for M50812409 from 2^69 to 2^70 [mfakto 0.09 mfakto_cl_barrett79]
20111124-000449 no factor for M50823419 from 2^69 to 2^70 [mfakto 0.09 mfakto_cl_barrett79]
So it looks like this approx. $200 video card is around half as fast as a comparable nVidia???

2011-11-24, 16:25   #185
bcp19
Oct 2011
679₁₀ Posts

Quote:
Originally Posted by KyleAskine
So I embarrassed myself. When I said I did one factor per half hour from 70 to 71, I meant one factor per half hour from 69 to 70. Only off by a factor of two!! This is with an HD5870. I have two 6970s that are around the same speed.

Anyway, this is with only one instance of mfakto only sieving 5000 primes on an old AMD Phenom system. I can get a little bit more with two systems, but not enough to make it worthwhile in my opinion.

So it looks like this approx. $200 video card is around half as fast as a comparable nVidia???
Not sure what is considered a comparable nVidia card, but you are a tad slower than my 560 Ti. I see 30-32 minutes on it from 2^69 to 2^70. Looking at http://www.hwcompare.com/8915/geforc...adeon-hd-5870/ your card has a higher memory bandwidth than mine. Like you, I can get a little better with 2 instances running (~20%), but it's not really worth it.

Edit: My bad, that was my 560... the Ti does it in ~24 min.

Last fiddled with by bcp19 on 2011-11-24 at 16:48

2011-11-24, 19:41   #186
flashjh
"Jerry"
Nov 2011
Vancouver, WA
1,123 Posts
Works!

Quote:
Originally Posted by Bdot
Deinstalling 11.10, removing system32\amdocl64.dll, system32\amdoclcl64.dll, syswow64\amdocl.dll and syswow64\amdoclcl.dll, and then installing 11.9 did the trick, now mfakto runs again, also in 64-bits!

And now that I know that these are the critical files that are not removed during the driver deinstallation, I can as well try the latest version ;-)

Edit: I tried, and 11.11 has the same issues as 11.10. So 11.9 stays the last usable version (for mfakto).
This works, thanks! Also, I can confirm that 11.11 does not work.

2011-11-24, 22:37   #187
Bdot
Nov 2010
Germany
3×199 Posts

Quote:
Originally Posted by KyleAskine
So I embarrassed myself. When I said I did one factor per half hour from 70 to 71, I meant one factor per half hour from 69 to 70. Only off by a factor of two!! This is with an HD5870. I have two 6970s that are around the same speed.

Anyway, this is with only one instance of mfakto only sieving 5000 primes on an old AMD Phenom system. I can get a little bit more with two systems, but not enough to make it worthwhile in my opinion.
The OpenCL version of mfaktc is slower than the original for various reasons:
  1. It is a rather plain port, not initially designed for OpenCL. I made some changes to "make it work", but only a few performance optimizations (yet).
  2. OpenCL does not (easily) allow direct access to the hardware's capabilities. For instance, no mul24_hi is available in OpenCL, even though the GPU has that instruction.
  3. ATI GPUs do not have hardware carry. Even if direct access to the whole instruction set were available, arithmetic with more than 32 bits requires additional instructions and/or registers to maintain carry/borrow.
  4. The kernel that would most likely get the optimal performance out of the AMD chips is not yet included: a Barrett kernel based on 24-bit instructions.
  5. OpenCL's multi-threaded approach to driving the GPU has disadvantages in heavily loaded systems. mfakto will slow down when prime95 runs on the same box, even though mfakto runs at higher priority.
Given all that, I think it would not be too bad if same-price NV cards delivered only double what AMD cards do. However, the 35 min per test you report is what I get on my HD5770 card (with 2 Phenom cores @3.4GHz, SievePrimes between 130k and 180k). An HD5870 should be 50 to 100% faster, so I guess that the GPU is not the limit in your case. An HD6970 should add another ~10% speed ... did you test switching to the mul24 kernel, which should suit the HD6xxx better?
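
Point 3 in the list above can be illustrated with a minimal sketch, written in Python rather than OpenCL, of adding two 96-bit numbers held as three 32-bit words when no carry flag is available. The name `add96` and the word layout are assumptions made for illustration and are not taken from mfakto's kernels. The carry out of each word has to be recovered with a comparison, which is the kind of extra instruction and register use the missing hardware carry costs.

```python
MASK32 = 0xFFFFFFFF  # keep every intermediate value within 32 bits


def to_words(n):
    """Split an integer below 2**96 into three 32-bit words, least significant first."""
    return [(n >> (32 * i)) & MASK32 for i in range(3)]


def from_words(words):
    """Reassemble least-significant-first 32-bit words into a Python integer."""
    return sum(w << (32 * i) for i, w in enumerate(words))


def add96(a, b):
    """Add two 3-word numbers without a carry flag; returns (words, carry_out)."""
    result = []
    carry = 0
    for x, y in zip(a, b):
        s = (x + y) & MASK32        # wrapping 32-bit add
        c = 1 if s < x else 0       # the sum wrapped around, so a carry left this word
        t = (s + carry) & MASK32    # fold in the carry from the previous word
        c |= 1 if t < s else 0      # adding the incoming carry can wrap as well
        result.append(t)
        carry = c
    return result, carry


a = 2**64 - 1
b = 1
words, carry_out = add96(to_words(a), to_words(b))
print(from_words(words) == a + b, carry_out)  # True 0
```

Each word needs two compares in addition to the two adds, where hardware with an add-with-carry instruction would get by with a single instruction per word.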



I have a lot of ideas about what I could test/enhance/implement ... however, the current driver issues are not exactly motivating. And time is always limited ...

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
