#2047
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
1C35₁₆ Posts
Quote:
Like I said, the "gains almost useless" point is (quite a bit) lower than the max he mentions; the max refers to the memory required to process the "standard" number of relative primes in one pass (where "standard" is a hand-waving oversimplification).
#2048
"Mike"
Aug 2002
2×23×179 Posts
Quote:
#2049
"James Heinrich"
May 2004
ex-Northern Ontario
6535₈ Posts
Quote:
However, the P-1 bounds are partially selected based on the amount of RAM available, so a machine with 512MB allocated and another with 20GB allocated won't pick the same bounds for the same exponent. The one with more RAM will pick higher bounds, run a little slower, but have a higher chance of finding a factor. If they were forced to use the same bounds, the machine with more RAM would run the assignment slightly faster.
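A rough way to picture the "one pass" maximum and the "slightly faster with the same bounds" point: the Python sketch below assumes (hypothetically; this is not Prime95's actual code) that stage 2 needs one FFT-sized buffer per relative prime modulo some D, with the "standard" set taken as the φ(D)/2 residues below D/2 that are coprime to D. With less memory the same residues are spread over more passes. D = 2310 and the 24 MiB buffer size are illustrative values only.

```python
from math import ceil, gcd


def relative_primes(d):
    """Residues r with 1 <= r < d/2 and gcd(r, d) == 1.

    Hypothetical stand-in for the "standard" set of relative primes.
    """
    return [r for r in range((1), (d + 1) // 2) if gcd(r, d) == 1]


def memory_for_one_pass(d, buffer_bytes):
    """Memory needed to hold one buffer per relative prime (the "max")."""
    return len(relative_primes(d)) * buffer_bytes


def stage2_passes(d, buffer_bytes, mem_bytes):
    """Passes needed when only mem_bytes // buffer_bytes buffers fit at once."""
    per_pass = mem_bytes // buffer_bytes
    if per_pass < 1:
        raise ValueError("not enough memory for even one buffer")
    return ceil(len(relative_primes(d)) / per_pass)


if __name__ == "__main__":
    buf = 3 * 2**20 * 8  # hypothetical: 3M-element FFT of 8-byte doubles (24 MiB)
    print(len(relative_primes(2310)))               # 240
    print(memory_for_one_pass(2310, buf) // 2**20)  # 5760 (MiB)
    print(stage2_passes(2310, buf, 512 * 2**20))    # 12
    print(stage2_passes(2310, buf, 20 * 2**30))     # 1
```

Under these assumptions (and treating MB as MiB), the 512MB machine needs 12 passes for work the 20GB machine does in one. If each pass carries a roughly fixed overhead, going from 12 passes to 2 saves far more than going from 2 to 1, which is one possible reading of why the "gains almost useless" point sits well below the one-pass maximum.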
#2050
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3·29·83 Posts
#2051
Bemusing Prompter
"Danny"
Dec 2002
California
95B₁₆ Posts
Quote:
Quote:
#2052
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
16065₈ Posts
Looks like you're missing a header file.
#2053
"James Heinrich"
May 2004
ex-Northern Ontario
11×311 Posts
Quote:
#2054
"Oliver"
Mar 2005
Germany
457₁₆ Posts
Hi,
Quote:
The problem is that you are trying to compile a file that is only part of other files: tf_96bit_base_math.cu can't be compiled standalone. It contains common code that is used by other .cu files (tf_96bit.cu, tf_barrett96.cu and tf_barrett96_gs.cu); take a look at the makefile to see those dependencies. I'm not using the Microsoft IDE, so I have no project file for you. I'm using GNU Make on Windows, too.
Oliver
P.S. I plan to upgrade my Windows setup to CUDA 5.0 within the next few days so I can provide CUDA 5.0 executables, too.
Last fiddled with by TheJudger on 2013-01-08 at 15:24
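For anyone hitting the same build error: a file like tf_96bit_base_math.cu is meant to be compiled as part of the files that use it, not on its own. Assuming (hypothetically; the post only says the code is shared, and the makefile is the real reference) that the sharing is done with a textual #include, the small Python helper below lists which .cu files pull in a given file, so you know which ones to build instead.

```python
import re
from pathlib import Path

# Matches preprocessor lines such as:  #include "tf_96bit_base_math.cu"
# ([ \t]* rather than \s* so a match never spans several lines.)
INCLUDE_RE = re.compile(r'^[ \t]*#[ \t]*include[ \t]+"([^"]+)"', re.MULTILINE)


def includers(sources, target):
    """Sorted names of the files in `sources` (name -> text) that #include `target`."""
    return sorted(name for name, text in sources.items()
                  if target in INCLUDE_RE.findall(text))


def load_sources(directory, pattern="*.cu"):
    """Read every file matching `pattern` in `directory` into a name -> text dict."""
    return {p.name: p.read_text(errors="replace")
            for p in Path(directory).glob(pattern)}
```

On a real checkout you would call `includers(load_sources("src"), "tf_96bit_base_math.cu")`, where "src" is a placeholder for wherever the .cu files live; the makefile remains the place to check what actually gets built.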
#2055
Feb 2012
195₁₆ Posts
Quote:
Thank you
#2056
Feb 2012
3⁴·5 Posts
BTW, I just tried running 0.19 and 0.20 side by side: no problem, everything appears to work (although at a slight loss of overall efficiency, about 5% at first glance).
0.20 is insanely fast. Thank you very much.
#2057
Romulan Interpreter
Jun 2011
Thailand
10010110110111₂ Posts
Well, not really (LaurV grumpy now!). Everybody seems to miss the fact that with the old 0.18 you had to run several instances to max out the GPU. Comparing the new one with the old one "side by side" is like comparing plums with mangoes: when they are green they look exactly the same except for the size. Mangoes are 4 times bigger.

With the old version I was able to get 340-360 GHzDays/day from a single card, running 3 or 4 instances on 3 (non-HT) or 2 (HT) overclocked CPU cores, respectively. With the new one I get 390-410 GHzDays/day for the same exponent range and the same bit levels, with NO CPU participation. You cannot get more from a card than it can give, apart from small optimizations. Mfaktc is now a (brilliant) mature product; future optimizations may make it a bit better and a bit faster, but you shouldn't expect future versions to be 100 times faster. Or 10 times faster. Or even 3 times faster! Nor did I expect that from 0.20, of course. It just uses the card better, for a small gain in speed.

Of course, if you max out the card (97-100% busy) with a single instance, that run will be "insanely fast", theoretically 2, 3 or 4 times faster than a single instance of the old version. It's the same as using more cores in P95 to LL/DC one exponent: the time per iteration halves, or becomes 3-4 times shorter (so the LL test finishes sooner), depending on how many cores you use. Put 3-4 instances of the new mfaktc on the same card and you will see that the times are comparable. The old version was losing time in CPU/GPU communication, which is "solved" by GPU sieving in the new version. That is where the "additional" speed comes from (plus other small things).

The biggest advantage of the new version (as I repeatedly said in the past, when we were talking about what I want and what we should expect from newer versions) is that IT LEAVES YOUR CPU FREE, besides the fact that it is "a little bit" faster. For me, this (leaving the CPU free) is manna from heaven (as everybody knows, my systems are all CPU-bottlenecked). Now I can run P-1, LL, DC or aliquots on the CPU, which I could not do before. THIS IS THE BIG ADVANTAGE, for which I bow again to the people who made this possible.
Last fiddled with by LaurV on 2013-01-09 at 08:18
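The figures in the post above can be checked with a little arithmetic. The Python sketch below computes the total-throughput gain of 0.20 over the old multi-instance setup, and the per-instance view that makes a side-by-side comparison look so dramatic. It assumes, for illustration only, that instances sharing one card split its throughput evenly.

```python
def percent_gain(old, new):
    """Relative gain of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


def per_instance(total, instances):
    """Throughput each instance sees if `instances` share a card evenly (assumed)."""
    return total / instances


if __name__ == "__main__":
    # GHzDays/day from the post: old version with 3-4 instances vs. 0.20 with one.
    old_lo, old_hi = 340.0, 360.0
    new_lo, new_hi = 390.0, 410.0
    print(round(percent_gain(old_hi, new_lo), 1))  # worst case: 8.3
    print(round(percent_gain(old_lo, new_hi), 1))  # best case: 20.6
    print(round(percent_gain(350.0, 400.0), 1))    # midpoints: 14.3
    # One old instance out of 3 or 4 vs. a single new instance using the whole card.
    for n in (3, 4):
        print(n, round(400.0 / per_instance(350.0, n), 2))  # "3 3.43", then "4 4.57"
```

So the card as a whole gains roughly 8-21% (about 14% at the midpoints), while a single new instance does about 3.4-4.6 times the work of one old instance: the "plums versus mangoes" effect described above.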