mersenneforum.org

mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

James Heinrich 2011-07-13 17:18

Each core adds more overall performance; there's never a case where X cores do more total work than X+1 cores, but the performance of each core drops the more heavily the CPU is loaded.

So the answer to your question would be: "4 cores"

apsen 2011-07-13 18:26

[QUOTE=James Heinrich;266298]Each core adds more overall performance; there's never a case where X cores do more total work than X+1 cores, but the performance of each core drops the more heavily the CPU is loaded.[/QUOTE]

mfaktc performance drops 50% when the 4th core is loaded. If all cores suffer the same penalty, then 4/4*0.5 is less than 3/4. But Prime95 performance does not seem to drop off as badly when the 4th core is added... I'll need to do some tests. As it is, the 8800 GTS performs at 3/4 of the GTX 465 :-(
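
To spell out that arithmetic, here is a minimal C sketch. It assumes, as the paragraph above does, that every loaded core takes the same 50% hit once the 4th core is in use, and that three loaded cores run at full speed. The function name `fraction_of_ideal` is made up for illustration; it is not part of mfaktc or Prime95.

```c
#include <assert.h>

/* Fraction of the machine's ideal throughput that is actually delivered:
 * (loaded cores / total cores) * per-core efficiency.
 * The efficiency values are assumptions, not measurements. */
double fraction_of_ideal(int loaded_cores, int total_cores, double per_core_efficiency)
{
    assert(total_cores > 0);
    assert(loaded_cores >= 0 && loaded_cores <= total_cores);
    return (double)loaded_cores / (double)total_cores * per_core_efficiency;
}
```

With these assumed numbers, `fraction_of_ideal(4, 4, 0.5)` gives 0.5 and `fraction_of_ideal(3, 4, 1.0)` gives 0.75, so four loaded cores would do less total work than three. Whether the penalty really applies equally to all cores is exactly what still needs testing.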

James Heinrich 2011-07-13 19:10

Someone else will likely correct me, but I believe a GTX 465 needs more than 1 instance to show its potential. I'd try 2 instances of mfaktc and 2 Prime95 workers and see what your overall throughput is like.

Karl M Johnson 2011-07-13 19:47

Yes. Even with 64 bit mfaktc.

apsen 2011-07-13 20:12

[QUOTE=James Heinrich;266306]Someone else will likely correct me, but I believe a GTX 465 needs more than 1 instance to show its potential. I'd try 2 instances of mfaktc and 2 Prime95 workers and see what your overall throughput is like.[/QUOTE]

I do not know how to check, but it was said that CUDALucas maxes out the GPU. Maybe I'll just run CUDALucas on the 465 (+4 Prime95 workers) and run mfaktc on my two 8800s?

apsen 2011-07-17 17:55

1 Attachment(s)
[QUOTE=Christenson;265657]10200 = 27D8....you sure you have the right return-type declared for cudaStreamCreate?

If you are just trying to run mfaktc, I'd be inclined to ignore the "I can't build it" problem. What do you hope to do with the modification?[/QUOTE]

I figured this one out - toolkit mismatch.

I have also modified mfaktc so it no longer needs atomics and compiles for any CUDA compute capability under CUDA 2.2/3.1/3.2. I haven't tried 4.0, but I do not see why it would have a problem with that.

Anyway, here's the modified mfaktc:

TheJudger 2011-07-17 22:57

apsen: why didn't you change the version string? It seems you made [B]a lot[/B] more changes than just removing the atomics...

Oliver

TheJudger 2011-07-17 23:38

apsen: your changes seem to screw something up. :sad:

On my GTX 8800 (CUDA 4.0) it [B]sometimes[/B] fails the short selftest!
Performance is half of the expected value (no async CPU/GPU computation?).

[CODE]
running a simple selftest...
ERROR: selftest failed for M49635893!
expected result: 000F300E 00B13196 00D84F67
reported result: 001DAC4B 001DAC50 001DAC55
reported result: 001DAC57 001DAC5D 001DAC5F
reported result: 001DAC61 001DAC67 001DAC6B
reported result: 001DAC70 001DAC73 001DAC7A
reported result: 001DAC7E 001DAC84 001DAC8A
reported result: 001DAC8C 001DAC8E 001DAC99
reported result: 001DAC9A 001DAC9E 001DACA0
reported result: 001DACA2 001DACA3 001DACA5
reported result: 001DACAF 001DACB0 001DACB5
reported result: 001DACB6 001DACBD 001DACBE
Selftest statistics
number of tests 31
successfull tests 30
wrong factor reported 1

selftest FAILED!
[/CODE]

[COLOR="Red"][SIZE="4"]I don't recommend running apsen's version until this is fixed![/SIZE][/COLOR]

Oliver

Christenson 2011-07-18 01:49

Hi Oliver:

I've been putting my time into parse.c ... it has gone through one rewrite and needs another to get it organized around a parse_line function that returns a structure holding both the data found and the original line.
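
As a rough illustration of that shape (not the actual parse.c), here is a C sketch of a `parse_line` that returns one structure carrying both the parsed fields and a copy of the original line. The `parsed_line` type, its field names, and the simplified `Factor=exponent,bit_min,bit_max` line format are all assumptions made for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINE 256  /* hypothetical maximum line length */

/* Hypothetical result type: parsed data plus the untouched source line. */
typedef struct {
    int          valid;               /* 1 if the line matched the assumed format */
    unsigned int exponent;
    int          bit_min;
    int          bit_max;
    char         original[MAX_LINE];  /* copy of the input, kept for rewriting the file */
} parsed_line;

parsed_line parse_line(const char *line)
{
    parsed_line p;

    assert(line != NULL);
    memset(&p, 0, sizeof p);
    /* memset above guarantees the copy stays NUL-terminated */
    strncpy(p.original, line, MAX_LINE - 1);

    /* Assumed simplified format; only a full match of all three fields counts. */
    if (sscanf(line, "Factor=%u,%d,%d", &p.exponent, &p.bit_min, &p.bit_max) == 3)
        p.valid = 1;
    return p;
}
```

Keeping the original text next to the parsed fields means lines that fail to parse (comments, unknown entries) can be written back unchanged instead of being lost.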

I really don't have time to check over apsen's changes right now, as work has gotten rather rough....I'm supposed to be doing something I never have done before, with few resources and little support.

apsen 2011-07-18 01:51

[QUOTE=TheJudger;266725]apsen: your changes seem to screw something up. :sad:

On my GTX 8800 (CUDA 4.0) it [B]sometimes[/B] fails the short selftest!
Performance is half of the expected value (no async CPU/GPU computation?).

[COLOR="Red"][SIZE="4"]I don't recommend running apsen's version until this is fixed![/SIZE][/COLOR]

Oliver[/QUOTE]

Sorry, it wasn't really meant for general consumption. That's why I did not post the executable. I was hoping for you to take a look at it. We could move this to a private conversation.

For me all tests (including the long one) come up fine. I did have a problem in the interim, so maybe I need to check whether I posted the right version. Also, I haven't tested with CUDA 4.0...

The idea is simple: give each thread its own chunk of memory to write its results to, so there's no need for a shared variable.

I did have to rearrange the code a little bit to make it possible, but I tried to keep it easy to diff. It could use a little straightening otherwise.
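
To make the idea concrete, here is a plain C sketch of the per-thread layout, with the GPU threads simulated by a thread index. It is not the code from the attachment; the names `thread_results`, `report_result`, `collect_results` and the capacity `SLOTS_PER_THREAD` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SLOTS_PER_THREAD 4  /* hypothetical per-thread capacity */

/* One private region per thread, so no counter is shared between threads. */
typedef struct {
    unsigned int count;                      /* results reported by this thread */
    unsigned int results[SLOTS_PER_THREAD];  /* this thread's private slots */
} thread_results;

/* Called by thread `tid` only; it touches nothing outside its own region. */
void report_result(thread_results *all, size_t tid, unsigned int value)
{
    assert(all != NULL);
    thread_results *mine = &all[tid];
    if (mine->count < SLOTS_PER_THREAD)
        mine->results[mine->count] = value;
    mine->count++;  /* keeps counting past capacity so overflow is detectable */
}

/* Host side, after all threads have finished: gather everything into one list. */
size_t collect_results(const thread_results *all, size_t nthreads,
                       unsigned int *out, size_t out_cap)
{
    size_t n = 0;
    for (size_t t = 0; t < nthreads; t++) {
        unsigned int stored = all[t].count;
        if (stored > SLOTS_PER_THREAD)
            stored = SLOTS_PER_THREAD;
        for (unsigned int i = 0; i < stored && n < out_cap; i++)
            out[n++] = all[t].results[i];
    }
    return n;
}
```

The trade-off in a layout like this is more result memory and a scan over every thread's region on the host, in exchange for not needing an atomic update of one shared index.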

Christenson 2011-07-18 01:51

[QUOTE=James Heinrich;266306]Someone else will likely correct me, but I believe a GTX 465 needs more than 1 instance to show its potential. I'd try 2 instances of mfaktc and 2 Prime95 workers and see what your overall throughput is like.[/QUOTE]

Someone else says you are dead on, James. Here's why: right now mfaktc is sieving on the CPU...so you will either need a very hot CPU core or two cores to reach full potential on a hot GPU card....
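
A toy model of that bottleneck, as a C sketch: the GPU can only test candidates as fast as the CPU-side sieving supplies them, so throughput is capped by whichever is smaller. The function name and the rates used below are invented for illustration, not measurements of any card or CPU.

```c
#include <assert.h>

/* Toy model: delivered GPU throughput is the smaller of what the CPU
 * instances can feed and what the GPU can consume. Units are arbitrary. */
double gpu_throughput(int instances, double feed_per_instance, double gpu_peak)
{
    assert(instances >= 0);
    double supplied = instances * feed_per_instance;
    return supplied < gpu_peak ? supplied : gpu_peak;
}
```

With made-up rates, `gpu_throughput(1, 60.0, 100.0)` gives 60.0 and `gpu_throughput(2, 60.0, 100.0)` gives 100.0: one instance leaves the card partly idle, while a second one feeds it to its limit.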


All times are UTC.