#1068 |
|
"James Heinrich"
May 2004
ex-Northern Ontario
6535₈ Posts |
Each core adds more overall performance: there's never a case where X cores do more total work than X+1 cores, but the performance of each core drops the more loaded the CPU is.
So the answer to your question would be: "4 cores"
Last fiddled with by James Heinrich on 2011-07-13 at 17:19 |
|
|
|
|
|
#1069 |
|
Jun 2011
83₁₆ Posts |
mfaktc performance drops 50% on loading the 4th core. If all cores suffer the same penalty, then 4/4*0.5 is less than 3/4. But Prime95 performance does not seem to drop off as badly on addition of the 4th core... I'll need to do some tests. As it is, the 8800 GTS performs at 3/4 of the GTX 465 :-(
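The comparison in this post can be written out as a tiny calculation. This is only a sketch of the arithmetic: the 50% figure comes from the post, the "same penalty on every core" case is the poster's hypothetical, and `total_throughput` is a made-up helper name.

```c
#include <assert.h>

/* Total throughput in "unloaded-core equivalents": the number of busy
   cores times the fraction of full speed each one still achieves.
   Hypothetical helper, not part of mfaktc or Prime95. */
double total_throughput(int busy_cores, double per_core_efficiency)
{
    assert(busy_cores >= 0);
    assert(per_core_efficiency >= 0.0 && per_core_efficiency <= 1.0);
    return busy_cores * per_core_efficiency;
}
```

With these assumptions, `total_throughput(4, 0.5)` gives 2.0 core-equivalents and `total_throughput(3, 1.0)` gives 3.0, so the fourth core would only pay off if the per-core penalty stayed above 75% efficiency.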
|
|
|
|
|
|
#1070 |
|
"James Heinrich"
May 2004
ex-Northern Ontario
11·311 Posts |
Someone else will likely correct me, but I believe a GTX 465 needs more than 1 instance to show its potential. I'd try 2 instances of mfaktc and 2 Prime95 workers and see what your overall throughput is like.
|
|
|
|
|
|
#1071 |
|
Mar 2010
411₁₀ Posts |
Yes. Even with 64 bit mfaktc.
|
|
|
|
|
|
#1072 |
|
Jun 2011
131 Posts |
I do not know how to check, but it was said that CUDALucas maxes out the GPU. Maybe I'll just run CUDALucas on the 465 (+4 Prime95 workers) and run mfaktc on my two 8800s?
|
|
|
|
|
|
#1073 | |
|
Jun 2011
131 Posts |
Quote:
I have also modified mfaktc so it no longer needs atomics and compiles for any CUDA compute capability under CUDA 2.2/3.1/3.2. I haven't tried 4.0, but I do not see why it would have a problem with that. Anyway, here's the modified mfaktc: |
|
|
|
|
|
|
#1074 |
|
"Oliver"
Mar 2005
Germany
11×101 Posts |
apsen: why didn't you change the version string? It seems that you made a lot more changes than just the removal of the atomics...
Oliver |
|
|
|
|
|
#1075 |
|
"Oliver"
Mar 2005
Germany
11·101 Posts |
apsen: your changes seem to screw something up.
On my GTX 8800 (CUDA 4.0) it sometimes fails the short selftest! Performance is half of the expected value (no async CPU/GPU computation?).
Code:
running a simple selftest...
ERROR: selftest failed for M49635893!
expected result: 000F300E 00B13196 00D84F67
reported result: 001DAC4B 001DAC50 001DAC55
reported result: 001DAC57 001DAC5D 001DAC5F
reported result: 001DAC61 001DAC67 001DAC6B
reported result: 001DAC70 001DAC73 001DAC7A
reported result: 001DAC7E 001DAC84 001DAC8A
reported result: 001DAC8C 001DAC8E 001DAC99
reported result: 001DAC9A 001DAC9E 001DACA0
reported result: 001DACA2 001DACA3 001DACA5
reported result: 001DACAF 001DACB0 001DACB5
reported result: 001DACB6 001DACBD 001DACBE
Selftest statistics
number of tests 31
successfull tests 30
wrong factor reported 1
selftest FAILED!
Oliver
Last fiddled with by TheJudger on 2011-07-17 at 23:48 |
|
|
|
|
|
#1076 |
|
Dec 2010
Monticello
5·359 Posts |
Hi Oliver:
I've been putting my time into parse.c ... gone through 1 rewrite, need another to get it organized with a parse_line function that returns a structure with both the data found and the original line. I really don't have time to check over apsen's changes right now, as work has gotten rather rough... I'm supposed to be doing something I have never done before, with few resources and little support. |
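A `parse_line` that hands back both the parsed data and the original line might look roughly like the sketch below. Everything here is an assumption for illustration: the `struct parsed_line` layout, the field names, and the simplified `Factor=<exponent>,<bit_min>,<bit_max>` line format are hypothetical and are not taken from the actual parse.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINE 256   /* hypothetical maximum line length */

/* Hypothetical result type: the parsed fields plus the untouched input line. */
struct parsed_line {
    int valid;                /* 1 if the line matched the assumed format */
    unsigned int exponent;
    int bit_min;
    int bit_max;
    char original[MAX_LINE];  /* copy of the line exactly as it was read */
};

struct parsed_line parse_line(const char *line)
{
    struct parsed_line out;
    char trailing;

    assert(line != NULL);
    memset(&out, 0, sizeof out);
    /* The memset above guarantees the copy stays NUL-terminated. */
    strncpy(out.original, line, MAX_LINE - 1);

    /* Assumed format: "Factor=<exponent>,<bit_min>,<bit_max>".
       The trailing " %c" rejects lines with extra characters after bit_max. */
    if (sscanf(line, "Factor=%u,%d,%d %c",
               &out.exponent, &out.bit_min, &out.bit_max, &trailing) == 3
        && out.bit_min < out.bit_max)
        out.valid = 1;

    return out;
}
```

Keeping the original text next to the parsed fields means a caller could write unrecognized lines back out unchanged instead of dropping them, which is one plausible reason for wanting both in the same structure.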
|
|
|
|
|
#1077 | |
|
Jun 2011
131 Posts |
Quote:
For me all tests (including the long one) come up fine. I did have a problem in the interim, so maybe I need to check if I posted the right version. Also, I haven't tested with CUDA 4.0... The idea is simple: give each thread its own chunk of memory to write the results to, so there's no need for a shared variable. I did have to rearrange the code a little bit to make it possible, but I tried to keep it so it's easy to diff. It could use a little straightening otherwise.
Last fiddled with by apsen on 2011-07-18 at 02:02 |
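The "own chunk of memory per thread" idea can be illustrated in plain C. This is a host-side stand-in, not the modified mfaktc kernel: the threads are simulated by calling `thread_body` once per thread id, and `thread_results`, `thread_body`, `gather_results`, and both constants are hypothetical names chosen for the sketch.

```c
#include <assert.h>
#include <string.h>

#define NUM_THREADS 4        /* hypothetical thread count */
#define SLOTS_PER_THREAD 2   /* hypothetical per-thread result capacity */

/* One private region per thread: a hit count plus a few result slots. */
struct thread_results {
    unsigned int count;
    unsigned int found[SLOTS_PER_THREAD];
};

/* Stand-in for the per-thread work: thread `tid` trial-divides n by its own
   slice of candidates and records hits only in its own region, so no atomic
   increment of a shared counter is needed. */
void thread_body(int tid, const unsigned int *candidates, int per_thread,
                 unsigned int n, struct thread_results *all)
{
    assert(tid >= 0 && tid < NUM_THREADS);
    struct thread_results *mine = &all[tid];  /* no other thread writes here */
    for (int i = 0; i < per_thread; i++) {
        unsigned int c = candidates[tid * per_thread + i];
        if (c > 1 && n % c == 0 && mine->count < SLOTS_PER_THREAD)
            mine->found[mine->count++] = c;
    }
}

/* Merge step, run once after every thread has finished: walk the private
   regions in thread order and copy the hits into one flat array. */
int gather_results(const struct thread_results *all, int num_threads,
                   unsigned int *out, int out_cap)
{
    int total = 0;
    for (int t = 0; t < num_threads; t++)
        for (unsigned int k = 0; k < all[t].count && total < out_cap; k++)
            out[total++] = all[t].found[k];
    return total;
}
```

The trade-off in this layout is more memory and a merge pass in exchange for dropping the shared counter; the regions have to be zeroed before each run, otherwise stale counts from a previous pass would be gathered again.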
|
|
|
|
|
|
#1078 |
|
Dec 2010
Monticello
5×359 Posts |
Someone else says you are dead on, James. Here's why: right now mfaktc is sieving on the CPU...so you will either need a very hot CPU core or two cores to reach full potential on a hot GPU card....
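To give a feel for the CPU-side work being described, here is a minimal sketch of sieving factor candidates before they would be handed to the GPU. It relies on two standard facts about factors q of 2^p - 1 for an odd prime p: q = 2kp + 1 for some k, and q is 1 or 7 mod 8. The function names, the list of small primes, and the overall structure are assumptions for illustration and are not mfaktc's actual sieve.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small primes used to discard obviously composite candidates
   (an illustrative choice, not mfaktc's sieve configuration). */
static const uint64_t SMALL_PRIMES[] = {3, 5, 7, 11, 13};

/* Returns 1 if q = 2*k*p + 1 survives the cheap CPU-side filters. */
int candidate_survives(uint64_t p, uint64_t k)
{
    assert(p > 0 && k > 0);
    assert(k <= (UINT64_MAX - 1) / 2 / p);   /* q must fit in 64 bits */
    uint64_t q = 2 * k * p + 1;
    uint64_t r = q % 8;
    if (r != 1 && r != 7)
        return 0;                            /* factors are +-1 mod 8 */
    for (size_t i = 0; i < sizeof SMALL_PRIMES / sizeof SMALL_PRIMES[0]; i++)
        if (q != SMALL_PRIMES[i] && q % SMALL_PRIMES[i] == 0)
            return 0;                        /* composite, skip it */
    return 1;
}

/* Collects the surviving k in [k_start, k_end) into out[]; returns how many
   were stored. These are the candidates a GPU would then test. */
int sieve_candidates(uint64_t p, uint64_t k_start, uint64_t k_end,
                     uint64_t *out, int cap)
{
    int n = 0;
    for (uint64_t k = k_start; k < k_end && n < cap; k++)
        if (candidate_survives(p, k))
            out[n++] = k;
    return n;
}
```

For p = 11 and k from 1 to 4 this keeps only k = 1 and k = 4, i.e. q = 23 and q = 89, which are exactly the two factors of 2^11 - 1 = 2047. Every candidate costs the CPU several modulo operations here, which hints at why a fast GPU can end up waiting on the sieve.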