mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfakto: an OpenCL program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=15646)

nucleon 2011-11-21 10:33

If I wanted to maximize the GHz-days/day from an ATI/AMD card with mfakto, what should I be getting? And how many GHz-days/day could I hope to achieve?

Just hypothetical questions at this stage.

The best I can do so far is about 300 GHz-days/day with a single GTX580.

-- Craig

KyleAskine 2011-11-21 12:00

[QUOTE=nucleon;279366]If I wanted to maximize the GHz-days/day from an ATI/AMD card with mfakto, what should I be getting? And how many GHz-days/day could I hope to achieve?

Just hypothetical questions at this stage.

The best I can do so far is about 300 GHz-days/day with a single GTX580.

-- Craig[/QUOTE]

I am not sure, but I have a 5870 and two 6950s (flashed as 6970s). With only one instance of mfakto for each card and sieving 5000 primes, I get numbers right around 150 on the output (or maybe 150M... I don't remember) for the 5870. Of course, I don't know what the column means, other than bigger is better, which is why I can't remember exactly what it said.

Another metric is that I factor one number every half hour from 70 to 71 on the 5870.

I am positive I can do better with more primes being sieved and more instances. I can look a bit more into it when I get home today and let you know more exactly. I am interested in how AMD cards match up with nVidia's myself.

bcp19 2011-11-21 12:41

[QUOTE=nucleon;279366]If I wanted to maximize the GHz-days/day from an ATI/AMD card with mfakto, what should I be getting? And how many GHz-days/day could I hope to achieve?

Just hypothetical questions at this stage.

The best I can do so far is about 300 GHz-days/day with a single GTX580.

-- Craig[/QUOTE]

With an HD 6770 I can get 100 M/s with 2 mfakto instances running on an i5 2400, which is similar to the 120 M/s of my GTS 450, but it is kind of a low-end card. It can do roughly 3 48M exponents from 69 to 72 per mfakto, ~= 90-105 GHz-days/day. With a 560 Ti running 1 mfaktc I can get 170 M/s on the 2400. Since 2 instances only get me up to 200 M/s, I let P95 have the core.

Wizzard 2011-11-21 14:52

Hello. Is Radeon HD 3400 supported too? If so, where can I download the latest mfakto? Thank you :)

edit: well, I found version 0.08, and I see "GPU not found", so I assume it is not supported.

KyleAskine 2011-11-21 15:41

[QUOTE=Wizzard;279383]Hello. Is Radeon HD 3400 supported too? If so, where can I download the latest mfakto? Thank you :)

edit: well, I found version 0.08, and I see "GPU not found", so I assume it is not supported.[/QUOTE]

OpenCL is supported in 4xxx series and newer.

nucleon 2011-11-22 22:26

[QUOTE=KyleAskine;279371]I am not sure, but I have a 5870 and two 6950s (flashed as 6970s). With only one instance of mfakto for each card and sieving 5000 primes, I get numbers right around 150 on the output (or maybe 150M... I don't remember) for the 5870. Of course, I don't know what the column means, other than bigger is better, which is why I can't remember exactly what it said.

Another metric is that I factor one number every half hour from 70 to 71 on the 5870.

I am positive I can do better with more primes being sieved and more instances. I can look a bit more into it when I get home today and let you know more exactly. I am interested in how AMD cards match up with nVidia's myself.[/QUOTE]

On one machine, I have 2 instances running on a GTX580. According to GPU-Z, its usage hovers around 95-97%, so it is practically maxed out. SievePrimes=5000, and the CPU is at a constant 100%. I don't have any more CPU cycles to throw at it at this stage.

Some timing data:

[CODE]Instance0:
20111123-033143 no factor for M45006901 from 2^69 to 2^72 [mfaktc 0.16-Win barre
20111123-064935 no factor for M45034081 from 2^69 to 2^72 [mfaktc 0.16-Win barre

Instance1:
20111123-044206 no factor for M46251449 from 2^68 to 2^72 [mfaktc 0.16-Win barre
20111123-074124 no factor for M46629067 from 2^68 to 2^72 [mfaktc 0.16-Win barre[/CODE]

The first column is the completion time, in YYYYMMDD-hhmmss format.

To get the timing data, I run this command in the background:

[CODE]tail -n 0 -F results.txt | xargs -I XX -n 1 bash -c "echo \`date +%Y%m%d-%H%M%S\` \"XX\"" >> results.log &[/CODE]

The timings are accurate to within +/-1 sec, if I understand tail correctly. Yes, it's a hack, but it's a good start.
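A possible alternative to the xargs pipeline is a plain read loop. This is only a sketch: the function name `stamp_lines` is made up for illustration, and it assumes bash and the same results.txt / results.log file names as the command above. It avoids starting a new bash process for every line and leaves any quotes or backslashes in the result line untouched.

```bash
#!/usr/bin/env bash
# Prefix every line read from stdin with the current local time,
# formatted as YYYYMMDD-hhmmss (same format as the command above).
# stamp_lines is a hypothetical helper name, not part of mfaktc or mfakto.
stamp_lines() {
    local line
    while IFS= read -r line; do
        printf '%s %s\n' "$(date +%Y%m%d-%H%M%S)" "$line"
    done
}

# Usage, assuming the same file names as above:
#   tail -n 0 -F results.txt | stamp_lines >> results.log &
```

As with the original command, the timestamp records when the line was read from tail, not when the program actually wrote it.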

By my guess, 70-71 takes about 45 mins, and I'll have about 2 results in this time.

-- Craig

KyleAskine 2011-11-23 12:10

[QUOTE=nucleon;279534]
To get the timing data, I run this command in the background:

[CODE]tail -n 0 -F results.txt | xargs -I XX -n 1 bash -c "echo \`date +%Y%m%d-%H%M%S\` \"XX\"" >> results.log &[/CODE]

The timings are accurate to within +/-1 sec, if I understand tail correctly. Yes, it's a hack, but it's a good start.

By my guess, 70-71 takes about 45 mins, and I'll have about 2 results in this time.

-- Craig[/QUOTE]

Alright, I will throw this on my linux box tonight!

KyleAskine 2011-11-24 16:01

[QUOTE=KyleAskine;279577]Alright, I will throw this on my linux box tonight![/QUOTE]

So I embarrassed myself. When I said I did one factor per half hour from 70 to 71, I meant one factor per half hour from 69 to 70. Only off by one factor of two!! This is with an HD5870. I have two 6970s that are around the same speed.
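The factor of two follows from how trial-factoring ranges scale: the interval from 2^70 to 2^71 is twice as wide as the one from 2^69 to 2^70, so for the same exponent there are roughly twice as many candidate factors to test. A small sketch of that arithmetic follows. The function name and the choice of a base level are illustrative assumptions, and it ignores any difference in kernel speed between bit levels.

```bash
#!/usr/bin/env bash
# Relative amount of work for trial factoring from 2^from to 2^to,
# measured in units of the single bit level 2^base to 2^(base+1).
# Requires base <= from <= to. work_units is a hypothetical helper name.
work_units() {
    local base=$1 from=$2 to=$3
    echo $(( (1 << (to - base)) - (1 << (from - base)) ))
}

# Examples, all relative to the 2^69 to 2^70 level:
#   work_units 69 69 70   # prints 1
#   work_units 69 70 71   # prints 2
#   work_units 69 69 72   # prints 7
```

By this estimate, a 2^69 to 2^72 assignment is about seven times the work of a 2^69 to 2^70 one, which is worth keeping in mind when comparing timings for different ranges.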

Anyway, this is with only one instance of mfakto only sieving 5000 primes on an old AMD Phenom system. I can get a little bit more with two systems, but not enough to make it worthwhile in my opinion.

[CODE]20111123-163748 no factor for M50771309 from 2^69 to 2^70 [mfakto 0.09 mfakto_cl_barrett79]
20111123-171212 no factor for M50781161 from 2^69 to 2^70 [mfakto 0.09 mfakto_cl_barrett79]
20111123-174634 no factor for M50781917 from 2^69 to 2^70 [mfakto 0.09 mfakto_cl_barrett79]
20111123-182057 no factor for M50783597 from 2^69 to 2^70 [mfakto 0.09 mfakto_cl_barrett79]
20111123-185520 no factor for M50789623 from 2^69 to 2^70 [mfakto 0.09 mfakto_cl_barrett79]
20111123-192943 no factor for M50801119 from 2^69 to 2^70 [mfakto 0.09 mfakto_cl_barrett79]
20111123-200406 no factor for M50803657 from 2^69 to 2^70 [mfakto 0.09 mfakto_cl_barrett79]
20111123-203829 no factor for M50804087 from 2^69 to 2^70 [mfakto 0.09 mfakto_cl_barrett79]
20111123-211252 no factor for M50806543 from 2^69 to 2^70 [mfakto 0.09 mfakto_cl_barrett79]
20111123-214716 no factor for M50807389 from 2^69 to 2^70 [mfakto 0.09 mfakto_cl_barrett79]
20111123-222140 no factor for M50807563 from 2^69 to 2^70 [mfakto 0.09 mfakto_cl_barrett79]
20111123-225603 no factor for M50807587 from 2^69 to 2^70 [mfakto 0.09 mfakto_cl_barrett79]
20111123-233027 no factor for M50812409 from 2^69 to 2^70 [mfakto 0.09 mfakto_cl_barrett79]
20111124-000449 no factor for M50823419 from 2^69 to 2^70 [mfakto 0.09 mfakto_cl_barrett79][/CODE]
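The completion timestamps in a log like the one above can be turned into an average time per exponent. A minimal sketch, assuming bash, that every line starts with a YYYYMMDD-hhmmss stamp, and that consecutive results are less than 24 hours apart (the function name `mean_gap` is made up for illustration):

```bash
#!/usr/bin/env bash
# Print the mean gap, in whole seconds, between consecutive timestamped
# lines on stdin. Only the hhmmss part is used; a negative difference is
# treated as a single midnight rollover (assumes gaps shorter than 24 h).
mean_gap() {
    local stamp rest t secs gap prev="" total=0 n=0
    while read -r stamp rest; do
        t=${stamp#*-}    # hhmmss part of the stamp
        # 10# forces base 10 so that "08" and "09" are not read as octal
        secs=$(( 10#${t:0:2} * 3600 + 10#${t:2:2} * 60 + 10#${t:4:2} ))
        if [ -n "$prev" ]; then
            gap=$(( secs - prev ))
            if [ "$gap" -lt 0 ]; then
                gap=$(( gap + 86400 ))   # crossed midnight
            fi
            total=$(( total + gap ))
            n=$(( n + 1 ))
        fi
        prev=$secs
    done
    if [ "$n" -gt 0 ]; then
        echo $(( total / n ))
    fi
}

# Usage, assuming the log was saved as results.log:
#   mean_gap < results.log
```

Fed the fourteen lines above, this gives 2063 seconds, i.e. a little over 34 minutes per exponent.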

So it looks like this approx. $200 video card is around 2x as slow as a comparable nVidia???

bcp19 2011-11-24 16:25

[QUOTE=KyleAskine;279716]So I embarrassed myself. When I said I did one factor per half hour from 70 to 71, I meant one factor per half hour from 69 to 70. Only off by one factor of two!! This is with an HD5870. I have two 6970s that are around the same speed.

Anyway, this is with only one instance of mfakto only sieving 5000 primes on an old AMD Phenom system. I can get a little bit more with two systems, but not enough to make it worthwhile in my opinion.

So it looks like this approx. $200 video card is around 2x as slow as a comparable nVidia???[/QUOTE]

Not sure what is considered a comparable nVidia card, but you are a tad slower than my 560 Ti. I see 30-32 minutes on it from 2^69 to 2^70. Looking at [URL]http://www.hwcompare.com/8915/geforce-gtx-560-ti-vs-radeon-hd-5870/[/URL], your card has a higher memory bandwidth than mine. Like you, I can get a little better with 2 instances running (~20%), but it's not really worth it.

Edit: My bad, that was my 560... the Ti does it in ~24 min.

flashjh 2011-11-24 19:41

Works!
 
[QUOTE=Bdot;278750]Deinstalling 11.10, removing system32\amdocl64.dll, system32\amdoclcl64.dll, syswow64\amdocl.dll and syswow64\amdoclcl.dll, and then installing 11.9 did the trick, now mfakto runs again, also in 64-bits!

And now that I know that these are the critical files that are not removed during the driver deinstallation, I can as well try the latest version ;-)

Edit: I tried, and 11.11 has the same issues as 11.10. So 11.9 stays the last usable version (for mfakto).[/QUOTE]

This works, thanks! Also, I can confirm that 11.11 does not work.

Bdot 2011-11-24 22:37

[QUOTE=KyleAskine;279716]So I embarrassed myself. When I said I did one factor per half hour from 70 to 71, I meant one factor per half hour from 69 to 70. Only off by one factor of two!! This is with an HD5870. I have two 6970s that are around the same speed.

Anyway, this is with only one instance of mfakto only sieving 5000 primes on an old AMD Phenom system. I can get a little bit more with two systems, but not enough to make it worthwhile in my opinion.
[/QUOTE]

The OpenCL version of mfaktc is slower than its original for various reasons:
[LIST=1]
[*]It is a rather plain port, not initially designed for OpenCL. I made some changes to "make it work", but only a few optimizations for performance (yet).
[*]OpenCL does not (easily) allow direct access to the hardware's capabilities. For instance, no mul24_hi is available in OpenCL, even though the GPU has that instruction.
[*]ATI GPUs do not have hardware carry. Even if direct access to the whole instruction set were available, arithmetic with more than 32 bits requires additional instructions and/or registers to maintain carry/borrow.
[*]The kernel that would most likely get the optimal performance out of the AMD chips is not yet included: a Barrett kernel based on 24-bit instructions.
[*]OpenCL's multi-threaded approach to driving the GPU has disadvantages on heavily loaded systems. mfakto will slow down when prime95 runs on the same box, even though mfakto runs at higher priority.
[/LIST]
Given all that, I think it would not be too bad if same-price NV cards delivered only double what AMD cards do. However, the 35 min per test in your log is what I get on my HD5770 card (with 2 Phenom cores @3.4GHz and SievePrimes between 130k and 180k). An HD5870 should be 50 to 100% faster, so I guess that the limit is not the GPU in your case. An HD6970 should add another ~10% speed ... did you test switching to the mul24 kernel, which should suit the HD6xxx better?
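To illustrate the third point in the list above: without a hardware carry flag, a multi-word addition has to work out the overflow of each 32-bit word itself and feed it into the next word. The following is only a rough sketch in bash arithmetic, not mfakto's kernel code; the function name `add96` and the three-word layout are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Add two 96-bit numbers, each given as three 32-bit words, least
# significant word first. Prints the three result words as hex, again
# least significant first. A carry out of the top word is discarded.
# add96 is a hypothetical helper name used only for this illustration.
add96() {
    local -a a=("$1" "$2" "$3") b=("$4" "$5" "$6") r=()
    local mask=$(( 0xFFFFFFFF )) carry=0 sum i
    for i in 0 1 2; do
        sum=$(( a[i] + b[i] + carry ))   # fits in bash's 64-bit integers
        r[i]=$(( sum & mask ))           # keep the low 32 bits
        carry=$(( sum >> 32 ))           # explicit carry for the next word
    done
    printf '%08x %08x %08x\n' "${r[0]}" "${r[1]}" "${r[2]}"
}

# Example: (2^64 - 1) + 1 carries through two words.
#   add96 0xFFFFFFFF 0xFFFFFFFF 0 1 0 0   # prints 00000000 00000000 00000001
```

The mask and shift in each loop iteration are the kind of extra work that a hardware add-with-carry instruction would otherwise absorb.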



I have a lot of ideas for what I could test/enhance/implement ... however, the current driver issues are not exactly motivating. And time is always limited ...


All times are UTC.