[QUOTE=Dubslow;280050]Maximum throughput is determined by time per class. (You'll notice that if you increase SievePrimes, M/s drops a lot, but time per class should drop less than M/s does; it will still drop, however, because you are shifting more work to the CPU.) When you say both mfakto instances, do you mean one for each card? If you mean two on just one card and SievePrimes is already as low as possible (5000), then there's nothing you can do except run a third instance, if your CPU isn't maxed out already. You're right, the avg. wait times suggest that mfakto is limited by the CPU; since you can't decrease SievePrimes any further, the only fix is more instances.
Edit: Whoops, cross posting. In mfaktc at least, one of the columns printed to the CLI is "Time per class". If you can't find that, then ETA plus classes completed is the next best way to determine overall throughput. If you're comparing two instances with the same SievePrimes, you can compare M/s; if their SievePrimes differ, you need to look at time per class. Edit 2: Can you run two mfakto instances on one GPU, set the affinities (to make sure it's only using two cores of your CPU), and then post their SievePrimes and avg. wait? It sounds like even this won't be pretty... Edit 3: I will come back in ten minutes to avoid more cross posting. Edit 4: Just read your last post in more detail. Two cores gets one GPU to 65% load? I would still be interested in the numbers.[/QUOTE] Sorry for the cross post. Anyhow, I've got two instances on one GPU with 2 cores. Results: ~5.2 sec per status printout, each covering between 1 and 5 classes. Avg. wait is about 3900 μs now. M/s is 40 on each instance; I was getting about 125 M/s before. GPU load is 41%. SievePrimes autoadjusted itself on both instances from 5000 up to 200,000. What do you think?
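On the "set the affinities" suggestion: one common way to pin an instance on Windows is launching it with `start /affinity <hexmask> mfakto.exe`, where the mask is a hexadecimal bitmask of logical cores. A small helper for computing such masks (the helper function itself is mine, for illustration, not part of mfakto):

```python
def affinity_mask(cores):
    """Hex bitmask for Windows `start /affinity`, one bit per logical core."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return format(mask, "X")

# Pin one instance to cores 0-1 and a second to cores 2-3:
print(affinity_mask([0, 1]))  # "3"  -> start /affinity 3 mfakto.exe
print(affinity_mask([2, 3]))  # "C"  -> start /affinity C mfakto.exe
```

Task Manager's "Set Affinity" dialog does the same thing after launch, but a mask in a launcher script survives restarts.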
The autoadjust is screwing you up. Turn it off and see what you get with SievePrimes=5000, and then try 10,000 since it's trying so hard to get it higher. Do you not have a time per class column?
[QUOTE=Dubslow;280052]The autoadjust is screwing you up. Turn it off and see what you get with SievePrimes=5000, and then try 10,000 since it's trying so hard to get it higher. Do you not have a time per class column?[/QUOTE]
Ok, I'll shut it off... no, mfakto does not have a time per class column.
[QUOTE=flashjh;280053]Ok, I'll shut it off... no, mfakto does not have a time per class column.[/QUOTE]
Hmm... Bdot? Am I remembering mfaktc wrong? (I do not have access to my comp ATM)
[QUOTE=Dubslow;280054]Hmm... Bdot? Am I remembering mfaktc wrong? (I do not have access to my comp ATM)[/QUOTE]
Attached is a screenshot at SievePrimes=5000. I also forced 5000 and 10000: with two instances both on the same GPU and the same two CPU cores, 5000 puts the GPU at 62%; changing to 10000 drops the GPU to around 58%. The CPU cores are just shy of 100% either way.
Hmm. That is really odd that the load is at 62% even though avg. wait is so high. Bdot? Maybe you should try letting the CPU sleep?
[QUOTE=Dubslow;280058]Hmm. That is really odd that the load is at 62% even though avg. wait is so high. Bdot? Maybe you should try letting the CPU sleep?[/QUOTE]
If I let it sleep it probably won't wake up again. After testing everything, I think it's clear that my CPU and board architecture just can't keep up with the GPUs. Not that that's a bad thing, but I think the optimum for mfakto here is two cores per instance, one instance for each GPU, with SievePrimes at 5000 -- thoughts? I attached a screenshot of one of the instances with that setup. The CPU sits at about 85%, and the GPUs are at about 64%.
[QUOTE=Dubslow;280045]Keep in mind that M/s is not necessarily the best comparison, because M/s changes depending on SievePrimes without affecting actual throughput. Time per class for a similar assignment is a better metric.[/QUOTE]
Correct! [QUOTE=Dubslow;280045]Going with the above, SievePrimes determines how much work is done on the CPU before being sent to the GPU. Essentially, the CPU eliminates ('sieves out') factor candidates that are divisible by small primes. The higher SievePrimes is, the more work the CPU does, and the more candidates are eliminated as composite. The candidates that survive the sieve are tested on the GPU for being a factor. Avg. wait tells how long the CPU must wait for the GPU before doing more sieving. If it is less than ~100 μs, then the CPU is being overloaded and out-powered by the GPU. To rectify this, decrease SievePrimes (which shifts more work to the GPU rather than the CPU) or run more than one mfakto instance. If avg. wait is greater than 1000 or 2000 μs, then the process is bottlenecked by the GPU; the CPU is doing a lot of waiting. Fix this by increasing SievePrimes (which shifts more work to the CPU) or by running fewer instances.[/QUOTE] Again correct (and well written)! [QUOTE=Dubslow;280050]Edit: Whoops, cross posting. In mfaktc at least, one of the columns printed to the CLI is "Time per class". If you can't find that, then ETA+classes complete is the next best way to determine overall throughput. If you're comparing two instances with the same SievePrimes, then you can compare M/s. If they're different SievePrimes you need to look at time per class. [/QUOTE] The same SievePrimes and the same exponent are needed for a perfect comparison. To be honest, it is usually OK if the exponents are of about the same size. Oliver
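As a toy sketch of the sieving idea described above (this is not mfakto's actual siever; the function and the parameter values are mine): candidate factors of 2^p - 1 have the form q = 2kp + 1 with q ≡ 1 or 7 (mod 8), and the CPU discards any candidate divisible by a small prime, so only the survivors need GPU trial division.

```python
def surviving_candidates(p, k_max, sieve_primes):
    """Toy model of mfakto-style CPU sieving for factors of 2**p - 1.

    Candidates have the form q = 2*k*p + 1 and must satisfy q % 8 in {1, 7};
    anything divisible by a small sieve prime is composite and is discarded,
    so only the survivors would be handed to the GPU for trial division."""
    survivors = []
    for k in range(1, k_max + 1):
        q = 2 * k * p + 1
        if q % 8 not in (1, 7):
            continue  # number-theoretic class filter: q cannot divide 2**p - 1
        if any(q % s == 0 for s in sieve_primes):
            continue  # known composite, eliminated by the CPU sieve
        survivors.append(q)
    return survivors

# For p = 11, the survivors include the real factors 23 and 89 (2047 = 23 * 89).
print(surviving_candidates(11, 10, [3, 5, 7, 11, 13, 17, 19]))  # [23, 89, 199]
```

Raising SievePrimes corresponds to a longer `sieve_primes` list: more CPU work per candidate, fewer survivors for the GPU.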
Correct me if I'm wrong, but I don't think the actual siever is multithreaded, so you won't sieve deeper or more efficiently if you assign more than one CPU core per instance.
So more than one core per mfakto should not help. Try running 1 or 2 more instances of mfakto. I'll take my system as an example: [CODE]Instance  Exponent  SievePrimes  M/s per instance  ETA     Avg. wait  M/s system  Exp. tested per hour
1         26.8M     9000         160               26m40s  500us      160         2.25
2         26.8M     26000        120               33m50s  500us      240         3.55
3         26.8M     56000        80                44m15s  600us      240         4.06[/CODE] Although I would get more GHz/day by running a 3rd instance, Prime95 actually gets to have my last 2 cores.
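The "Exp. tested per hour" column in the table above follows from the ETA and instance count alone; a quick sanity check (assuming each instance completes one assignment per ETA):

```python
def exponents_per_hour(instances, eta_minutes):
    # Each instance finishes one exponent every `eta_minutes`,
    # so the system completes instances * (60 / eta_minutes) per hour.
    return instances * 60.0 / eta_minutes

print(round(exponents_per_hour(1, 26 + 40 / 60), 2))  # 2.25
print(round(exponents_per_hour(2, 33 + 50 / 60), 2))  # 3.55
print(round(exponents_per_hour(3, 44 + 15 / 60), 2))  # 4.07 (table shows 4.06, likely truncated)
```

This confirms the trade-off in the table: per-instance ETA gets worse as instances are added, but total work per hour still rises until the GPU (here at 240 M/s) saturates.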
[QUOTE=diamonddave;280091]Correct me if I'm wrong, but I don't think the actual siever is multithreaded, so you won't sieve deeper or more efficiently if you assign more than one CPU core per instance. So more than one core per mfakto should not help. Try running 1 or 2 more instances of mfakto. I'll take my system as an example: [CODE]Instance  Exponent  SievePrimes  M/s per instance  ETA     Avg. wait  M/s system  Exp. tested per hour
1         26.8M     9000         160               26m40s  500us      160         2.25
2         26.8M     26000        120               33m50s  500us      240         3.55
3         26.8M     56000        80                44m15s  600us      240         4.06[/CODE] Although I would get more GHz/day by running a 3rd instance, Prime95 actually gets to have my last 2 cores.[/QUOTE] I can't speak to the multithreaded siever, but if I only assign one core per instance, it badly underpowers the GPU. Thanks everyone for the input. I was able to max the CPU and push the GPUs fairly hard with three instances, but in the end my CPU doesn't have the ability to maximize the GPUs. All in all, I can run two instances for about 240 M/s along with Prime95, and I'm happy with that. mfakto doesn't have a time per class column, per se, but with these settings I get the best throughput and fastest times.
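To close on the recurring point that M/s alone can't be compared across different SievePrimes settings, here is a toy model (invented numbers, not measurements from this thread): with deeper sieving, fewer raw candidates survive to the GPU, so the same class finishes sooner even though the GPU's candidate rate is unchanged. Time per class, not M/s, reflects real throughput.

```python
def class_time_seconds(class_size_M, survivor_fraction, gpu_Ms):
    """Seconds to finish one class: only the survivors of the CPU sieve
    (survivor_fraction of the raw candidates) need GPU trial division."""
    return class_size_M * survivor_fraction / gpu_Ms

# Same GPU rate (120 M survivors/s) in both cases, yet deeper sieving
# finishes the identical class sooner -- so compare time per class.
shallow = class_time_seconds(100, 0.30, 120)  # 0.25 s
deep    = class_time_seconds(100, 0.20, 120)  # ~0.167 s
print(shallow, deep)
```

The cost of the faster class time is the extra CPU work needed to cut the survivor fraction, which is exactly the CPU-vs-GPU balancing act discussed above.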
For whatever reason, the Windows task scheduler seems able to spread a single thread across both cores, and it seems he's gotten the most effective results this way.