mfakto: an OpenCL program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=15646)

KyleAskine 2011-11-25 12:21

[QUOTE=Bdot;279761]
Given all that I think it would not be too bad if same-price NV cards delivered only double of AMDs. However, the 35 min per your test is what I get on my HD5770 card (and 2 Phenom-cores @3.4GHz, SievePrimes between 130k and 180k). HD5870 should be 50 to 100% faster, so I guess that the limit is not the GPU in your case. HD6970 should add another ~10% speed ... did you test switching to the mul24 kernel which should suit the HD6xxx better?

[/QUOTE]

That is confusing to me. Unless you get a 100% speed boost from sieving the extra primes, I agree, my card should be significantly faster than yours. So you are saying you have two instances of mfakto running at around 75M/s each, because that would indeed match what I am able to do with one instance (150M/s).

I tried running multiple instances once on my linux box, but the second instance actually locked up. Is there some trick that other people do to get a second instance up and running on linux? I could have done something wrong. However, I don't think it looked significantly faster so I didn't worry about it. Plus this way I could have two cores devoted to mprime, instead of 0.

However, the 6970s are a touch slower than the 5870, perhaps because they actually have fewer shaders than the 5870 (which is the gold standard for shader count, if those are what really matter with the OpenCL implementation).

KyleAskine 2011-11-25 12:51

I have a general mfakto/c question. Is avg. wait the video card waiting for the processor, or the processor waiting for the video card?

Bdot 2011-11-25 12:56

[QUOTE=KyleAskine;279801]That is confusing to me. Unless you get a 100% speed boost from sieving the extra primes, I agree, my card should be significantly faster than yours. So you are saying you have two instances of mfakto running at around 75M/s each, because that would indeed match what I am able to do with one instance (150M/s).

I tried running multiple instances once on my linux box, but the second instance actually locked up. Is there some trick that other people do to get a second instance up and running on linux? I could have done something wrong. However, I don't think it looked significantly faster so I didn't worry about it. Plus this way I could have two cores devoted to mprime, instead of 0.

However, the 6970s are a touch slower than the 5870, perhaps because they actually have fewer shaders than the 5870 (which is the gold standard for shader count, if those are what really matter with the OpenCL implementation).[/QUOTE]

Actually, to get both the CPU and the GPU to (almost) full load, I have to run 3 mfakto instances and 3 prime95 threads on my quad-core Phenom. This way, the 3 mfakto instances add up to almost 2 CPU cores (with peaks to ~220%). Two of the prime95 threads advance at normal speed; the third just takes what's left over (~5-10% CPU, i.e. it crawls along). I don't pin mfakto to any core; I let Windows 7 choose.

Each mfakto instance runs at ~40M/s. Because of the higher SievePrimes, the three together deliver about the same test throughput as a single instance at 150M/s with SievePrimes 5k.
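To put a rough number on why a lower M/s at higher SievePrimes can give the same test throughput, here is a minimal Python sketch. It rests on two assumptions that are not stated in this thread: that the M/s figure counts only candidates that survive the CPU sieve, and that sieving with a prime q removes about 1/q of the remaining candidates. The function names are made up for illustration, and 150000 is simply a value inside the 130k to 180k range mentioned above.

```python
import math


def first_primes(count):
    """Return the first `count` primes using a sieve of Eratosthenes."""
    if count < 1:
        return []
    # Rosser's bound: the n-th prime is below n * (ln n + ln ln n) for n >= 6.
    if count < 6:
        limit = 15
    else:
        limit = int(count * (math.log(count) + math.log(math.log(count)))) + 1
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(limit) + 1):
        if sieve[i]:
            # Mark every multiple of i from i*i upwards as composite.
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    primes = [i for i, flag in enumerate(sieve) if flag]
    return primes[:count]


def relative_survivors(low, high):
    """Candidates left after sieving with `high` primes, relative to
    sieving with only the first `low` primes (assumes each prime q
    removes a fraction 1/q of what is left)."""
    ratio = 1.0
    for q in first_primes(high)[low:high]:
        ratio *= 1.0 - 1.0 / q
    return ratio


def equivalent_rate(rate, low, high):
    """Rate at `low` sieve primes that clears the same work per second
    as `rate` M/s measured at `high` sieve primes."""
    return rate / relative_survivors(low, high)


if __name__ == "__main__":
    ratio = relative_survivors(5000, 150000)
    print(f"survivors at 150k vs 5k sieve primes: {ratio:.3f}")
    print(f"3 x 40 M/s at 150k is like {equivalent_rate(120, 5000, 150000):.0f} M/s at 5k")
```

Under these assumptions the ratio comes out at roughly 0.74, so 3 x 40 M/s at the higher setting would correspond to something in the region of 160 M/s at SievePrimes 5k, which is in the same ballpark as the 150 M/s figure. The real sieve treats the smallest primes differently, so take this as an estimate only.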

The lock-up on Linux happened to me quite often, usually with a line "[fglrx] ASIC hang happened" and a stack dump in /var/log/messages. Around the time I upgraded to 11.9, I gave the card a little more air (moved it to the other PCIe slot and set a higher fan speed). Since then I have had no such lock-ups. Not sure which of the actions helped.

I think I'll add a raw performance measurement mode to mfakto, detailing the pure kernel runtime per kernel. This way it would be easier to compare the cards with each other and with NV/mfaktc. Until then, use tools like GPU-Z, or "aticonfig --od-getclocks | grep load" to find out how much room the GPU still has. I unfortunately have access to only 2 different ATI cards, one of them bound to 11.11 :-(

[QUOTE=KyleAskine]
I have a general mfakto/c question. Is avg. wait the video card waiting for the processor, or the processor waiting for the video card?
[/quote]
The latter: the processor waiting for the video card.

bcp19 2011-11-25 14:25

I'm guessing the CPU 'quality' is a fair factor here? I'm running 2x mfaktc on a Core 2 Quad Q8200/GTS 450 with SievePrimes set to 15000 and getting ~60M/s each with a bit over 2kus wait times. This setup keeps the GPU at 99% load. When I don't lock SievePrimes, the M/s drops (can't remember to what, seems 40 or so) but the overall time to complete the same assignment increased and, if I remember correctly, GPU load was 60-70%. My Core 2 Duo took both cores to almost max out my 560, but when I upgraded to the 2500K one core outdoes what the Duo did. Is this a fair assumption?

flashjh 2011-11-25 16:13

I'm running a QX9650 with a Gigabyte EP45-UD3P (overclocked to 400 MHz FSB, 3.4 GHz) & 8 GB RAM.

I have two Sapphire 5870s in CrossFire. I can run two instances (I use -d 11 and -d 12). I get ~120M/s each while trial factoring in the 50M range from 70 to 71 bits. If I run one instance, I only get ~130M/s, so two is definitely better. With two instances, my CPU runs at 85% across all four cores. When I start Prime95 with one worker doing an LL test, my rates drop to ~105M/s with the CPU at 100%. The system is still usable in all circumstances, but I have to shut down mfakto to use the GPU.

flashjh 2011-11-25 19:46

[QUOTE=bcp19;279817]I'm guessing the CPU 'quality' is a fair factor here? I'm running 2x mfaktc on a Core 2 Quad Q8200/GTS 450 with SievePrimes set to 15000 and getting ~60M/s each with a bit over 2kus wait times. This setup keeps the GPU at 99% load. When I don't lock SievePrimes, the M/s drops (can't remember to what, seems 40 or so) but the overall time to complete the same assignment increased and, if I remember correctly, GPU load was 60-70%. My Core 2 Duo took both cores to almost max out my 560, but when I upgraded to the 2500K one core outdoes what the Duo did. Is this a fair assumption?[/QUOTE]

From what I can tell you need two or more instances to max out the GPUs. If you have a slower CPU, you might max it out before you max the GPU. Like in my case, my GPUs sit at 60% each with two instances, but my CPU doesn't have much more to throw at it, sitting at 85%. My wait times are quite low, mostly 0, but up to 200µs. But, when I change sieve to anything other than 5000, my M/s drops way down. Could be the difference between ATI & nVidia?

KyleAskine 2011-11-25 22:17

Alright, I just started a second instance of mfakto on my linux (5870) box. I now have two instances running at around 30000 primes sieved and 100 M/s.

bcp19 2011-11-26 01:22

[QUOTE=flashjh;279858]From what I can tell you need two or more instances to max out the GPUs. If you have a slower CPU, you might max it out before you max the GPU. Like in my case, my GPUs sit at 60% each with two instances, but my CPU doesn't have much more to throw at it, sitting at 85%. My wait times are quite low, mostly 0, but up to 200µs. But, when I change sieve to anything other than 5000, my M/s drops way down. Could be the difference between ATI & nVidia?[/QUOTE]

I had a 6770 in my i5 2400, and with 2 mfakto instances running I was only seeing ~70% GPU load and, I believe, 100 M/s combined. I really didn't want to use up yet another core to try to max it out. The 560 Ti I replaced the 6770 with runs at 165 M/s and ~65% load with 1 core; if I start a second mfaktc I can get up to 95% and roughly 230 M/s. The 560 is almost maxed with 1 core: 1 core gives me 175 M/s and 2 only gets it up to 200 M/s combined.

flashjh 2011-11-26 05:59

4670
 
So I have a 4670 in my older computer. I tested it out after finally getting the ATI 11.9 driver installed with the AGP card. It all works - I get about 12.2 M/s with the CPU @ ~85%. Prime95 can run one LL test with no impact on the M/s, but it takes the CPU to 100%.

Two questions -

First, what exactly does changing the sieved primes do? On my faster machine any number over 5000 slows things down. I haven't messed with it too much on the slower machine, but 5000 or 200000 gives the same result on the slow system.

Second, what is the comparison between a GPU and a CPU system? I get 240 M/s out of the fast machine and 12.2 M/s out of the slow one, but how do I tell whether I'm better off letting Prime95 do the work or letting the GPU handle it?

BTW - I get slightly better performance out of the 'mfakto_cl_barrett79' kernel. I don't know how many have reported for HD4xxx cards, but I figured I'd let you know since the .ini file still asks for reports. The 'mfakto_cl_71' kernel gives between 10 and 11 M/s; the 'mfakto_cl_barrett79' kernel gives me 12.2 M/s.

flashjh 2011-11-26 06:03

[QUOTE=bcp19;279895]I had a 6770 in my i5 2400 and with 2 mfakto running I was only seeing ~70% GPU load and I believe 100 M/s combined and I really didn't want to use up yet another core to try and max it out, but the 560 Ti I replaced the 6770 with works with 1 core at 165 M/s and ~65% load. If I start a second Mfaktc I can get up to 95% and roughly 230 M/s. The 560 is almost maxed with 1 core, 1 core gives me 175 M/s and 2 only gets it up to 200 M/s combined.[/QUOTE]

How are you starting multiple instances of mfakto and how are you specifying a single core or multiple cores on your CPU for each instance? Each time I start mfakto no matter how I run it, I see usage across all four cores.

flashjh 2011-11-26 06:09

[QUOTE=KyleAskine;279876]Alright, I just started a second instance of mfakto on my linux (5870) box. I now have two instances running at around 30000 primes sieved and 100 M/s.[/QUOTE]

What did you do to fix it?

