mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfaktc and PCIe bus width (https://www.mersenneforum.org/showthread.php?t=18011)

Mark Rose 2015-09-17 18:59

[QUOTE=flashjh;410642]Did you ever figure out the answer to your question? I just installed a Titan Z in a PCIe 3.0 socket and I'm no longer able to keep the card at 100%, even with several mfaktc instances. I'm wondering if I need to get a faster MB\CPU combo, or if it's a limit of the PCIe 3.0 bus at this point?[/QUOTE]

Is your card running at 16x? Or are the PCIe lanes split with other cards?

LaurV 2015-09-17 19:08

Man, don't waste the Z on factoring; give it an LL and a DC for the same exponent (to check that the residues match), one in each GPU, and let it go.
OTOH, what does "PerfCap Reason" say? You may not be able to max out that card due to power, thermal, or voltage limits, or other reasons. Also, disable DP in the Nvidia Control Panel if you insist on doing TF with it (that almost doubles the speed).

TheJudger 2015-09-17 19:21

With GPU sieving enabled, even PCIe x1 Gen 1 should be more than enough, but I don't know for sure.

Oliver

flashjh 2015-09-17 19:35

It's running at 16x; it hovers at ~97% on both sides. No biggie, just wondering why it won't go to 100% (99%). I have plenty of power to keep the card at full load. I thought the same about the GPU sieve, so I'll just let it go. Each side is already significantly faster than a 580.

I can run LL\DC on the card, but I had to remove my last TF 580 to put this card in, so if I move to LL I won't be doing any TF anymore. Do we have the running TF capacity to lose the 450 GHzDays\Day?

chalsall 2015-09-17 19:43

[QUOTE=flashjh;410679]Do we have the running TF capacity to lose the 450 GHzDays\Day?[/QUOTE]

No. Please.

We should be better off next week, but not today -- we need a bit more of a TF'ed buffer for both LL (various categories) and P-1.

I /really/ need to get out more.... :smile:

flashjh 2015-09-17 20:58

[QUOTE=chalsall;410683]No. Please.

We should be better off next week, but not today -- we need a bit more of a TF'ed buffer for both LL (various categories) and P-1.

I /really/ need to get out more.... :smile:[/QUOTE]

Yes, agreed! Ok, no problem. I'll leave the 'Z' on TF. It's doing ~1150 GHzDays\Day as of right now. Hope to get a little more out of it, but I need to finish testing.

LaurV 2015-09-18 01:47

[QUOTE=flashjh;410688]Yes, agreed! Ok, no problem. I'll leave the 'Z' on TF. It's doing ~1150 GHzDays\Day as of right now. Hope to get a little more out of it, but I need to finish testing.[/QUOTE]
Now ye talking! I was going to say that 450 is a bit on the low side for that card... :yucky:

henryzz 2015-09-18 11:32

[QUOTE=flashjh;410642]Did you ever figure out the answer to your question? I just installed a Titan Z in a PCIe 3.0 socket and I'm no longer able to keep the card at 100%, even with several mfaktc instances. I'm wondering if I need to get a faster MB\CPU combo, or if it's a limit of the PCIe 3.0 bus at this point?[/QUOTE]

No one got back to me on this. I hope to get a much faster new PC in around a year's time. I will do before-and-after tests for the GPU.

airsquirrels 2015-09-18 12:02

I know mfakto and mfaktc target different architectures even though they are based on similar code. That said, PCIe width seems to make a huge difference on fast AMD cards with mfakto, even with GPU sieving. I've been looking into exactly why that is, but I haven't had much time lately. Is there any reason the number of classes per exponent is set to what it is? From the mfakto code it looks like we may end up doing a lot of data transfer to and from the card even between GPU sieving and the TF step. Results checking also reads back a bit or so for each k checked, which adds up. In my case I'm losing about 30% of my potential capacity to PCIe lane saturation in my hosts.
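To put that per-k readback in perspective, a back-of-envelope calculation (my own arithmetic; the one-bit-per-k figure comes from the paragraph above, and the candidate rate is a hypothetical round number):

```python
# Back-of-envelope: PCIe readback bandwidth if the host reads back
# roughly one bit per factor candidate k checked (illustrative only;
# the candidate rate used below is a made-up round number).

def readback_mb_per_s(candidates_per_s, bits_per_k=1):
    """Host-bound bandwidth in MB/s for a given candidate throughput."""
    return candidates_per_s * bits_per_k / 8 / 1e6

# A fast card checking ~1e9 candidates per second:
print(readback_mb_per_s(1e9))  # 125.0 MB/s
```

At that rate the readback alone approaches the capacity of a narrow link, which would be consistent with lane width mattering on fast cards.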

How similar are the two programs in how they handle scheduling work on the cards?

TheJudger 2015-09-18 17:32

Hi,

I'm not sure how accurately nvidia-smi measures bandwidth, but:
[CODE]# nvidia-smi -l 1 -a | grep Throughput
Tx Throughput : 2000 KB/s
Rx Throughput : 2000 KB/s
Tx Throughput : 24000 KB/s
Rx Throughput : 1000 KB/s
Tx Throughput : 0 KB/s
Rx Throughput : 189000 KB/s
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Tx Throughput : 0 KB/s
[...]
[/CODE]
This was started at the same time as mfaktc on a (factory-OC'ed) GTX 970. Rx/Tx values are shown once per second; the first 3 pairs (3 seconds) are from the initial selftest, which exercises both the CPU-sieve and GPU-sieve kernels. After that it is doing regular work on M73.xxx.xxx with GPU sieving.

Oliver

TheJudger 2015-09-18 17:38

[QUOTE=airsquirrels;410760]Is there any reason the number of classes per exponent is set to what it is?[/QUOTE]

[URL="http://www.mersenneforum.org/showpost.php?p=200887&postcount=37"]yes[/URL] :smile:

There is nothing special about 420/4620; it would work with any other natural number >= 1, too, but some numbers allow more efficient sieving than others.
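For readers following along, the linked post boils down to this: any factor q of M_p = 2^p - 1 has the form q = 2*k*p + 1 and must satisfy q ≡ 1 or 7 (mod 8), so whole residue classes of k (mod 420 or mod 4620, both divisible by 4, 3, 5, and 7, the latter also by 11) can be discarded up front. A minimal sketch (my own illustration, not mfaktc's actual code) counting the surviving classes:

```python
# Sketch (not mfaktc's actual code): count the residue classes of k that
# survive the class-level sieve for M_p = 2^p - 1. Candidates have the
# form q = 2*k*p + 1; q must be ≡ 1 or 7 (mod 8), and classes where
# every q is divisible by 3, 5, or 7 (and 11 for 4620) are skipped.

def surviving_classes(p, num_classes):
    small_primes = [3, 5, 7] + ([11] if num_classes % 11 == 0 else [])
    survivors = 0
    for k in range(num_classes):
        q = 2 * k * p + 1
        if q % 8 not in (1, 7):
            continue  # invariant per class, since num_classes % 4 == 0
        if any(q % s == 0 for s in small_primes):
            continue  # every member of this class shares that divisor
        survivors += 1
    return survivors

p = 57885161  # an odd prime exponent (M57885161 is a known Mersenne prime)
print(surviving_classes(p, 420))   # 96
print(surviving_classes(p, 4620))  # 960
```

Only 96 of 420 classes (about 23%), or 960 of 4620 (about 21%), need any sieving at all, which is presumably why such highly divisible class counts were picked.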

Oliver

