#1024 |
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2·3³·109 Posts |
Would it be possible to split mfaktc into two programs? One would generate the candidates and write them to disk, and the other would pass the candidates to the GPU. This would completely remove being CPU-bound, as long as enough computers are available (in theory across mersenneforum, not just one person, depending on file size).
#1025 | |
"Oliver"
Mar 2005
Germany
11×101 Posts |
Quote:
Currently mfaktc needs
If you manage to reduce the needed bandwidth per factor candidate (FC) to one 1-byte integer, you'll need 100MB/sec. 1 byte per FC is easy if you evaluate those FCs serially, but not so easy if you need to do it highly parallel and independent. But even if you get it down to 1 bit per FC, you'll need 12.5MB/sec for 100M candidates per second. Oliver |
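The arithmetic behind these figures can be sketched as follows. This is only a back-of-the-envelope check: the rate of 100M candidates per second comes from the post, while the function name `sieve_output_bandwidth` and the per-day total are illustrative additions.

```python
def sieve_output_bandwidth(candidates_per_sec, bits_per_candidate):
    """Bytes per second needed to stream sieved candidates to the tester."""
    return candidates_per_sec * bits_per_candidate / 8


RATE = 100_000_000  # 100M factor candidates per second (figure from the post)
SECONDS_PER_DAY = 86_400

for bits in (8, 1):  # one byte per FC, then one bit per FC
    per_sec = sieve_output_bandwidth(RATE, bits)
    per_day = per_sec * SECONDS_PER_DAY
    print(f"{bits} bit(s)/FC: {per_sec / 1e6:.1f} MB/sec, "
          f"{per_day / 1e12:.2f} TB/day")
```

At one byte per FC this works out to several terabytes per day of sustained writes, which is why a disk-based handoff between the two programs looks impractical.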
#1026 | |
"Lucan"
Dec 2006
England
1100101001010₂ Posts |
Quote:
"on the fly". I worked with batches of 15,015 x 8 bits for reasons anyone who has tried sieving (which I know includes you!) will understand. David Last fiddled with by davieddy on 2011-06-16 at 18:40 |
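One plausible reading of the batch size, offered here as an assumption rather than something the post states: 15,015 = 3 × 5 × 7 × 11 × 13, so the pattern of positions struck out by those five primes repeats exactly every 15,015 entries and can be precomputed once. The sketch below only checks that arithmetic; what the "x 8 bits" refers to is not spelled out in the post and is not modelled.

```python
from math import gcd, prod

SMALL_PRIMES = (3, 5, 7, 11, 13)
PERIOD = prod(SMALL_PRIMES)  # length after which the strike-out pattern repeats

# Residues modulo PERIOD that are divisible by none of the small primes.
survivors = [r for r in range(PERIOD) if gcd(r, PERIOD) == 1]

print(PERIOD, len(survivors), round(len(survivors) / PERIOD, 4))
```

Under this reading, a little over a third of the positions in each batch survive the first five primes, and the same mask can be reused for every batch.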
#1027 |
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
16FE₁₆ Posts |
Good point, the volume is just too great. Even if the storage space were available, the disk drive would struggle to keep up.
#1028 | |
Mar 2003
Melbourne
5×103 Posts |
Quote:
Sheesh, tough crowd.
a) Not sure exactly what you mean here. I take it you're annoyed when your video card does the equivalent of x GHz-days of work and you get x/100 credit because it found a 'cheap' factor. Seriously, deal with it. We're all in the same boat here. And I'm assuming here it's the same as CPU TF work.
b) Write a lynx script if it annoys you (it's not difficult). I'm hardly a script guru and I did one in a weekend. I'm a fan of what I suggested previously: prime95 having a generic extensions option, rather than writing custom code to sit on top of mfaktc. As a general rule, custom code always costs more in the long run than using generic code.
c) So it's using the best of both worlds. Anything else is a compromise. A decent video card requires a decent PC. Match appropriately and you shall be rewarded.
d) Do a breadth-first search rather than a depth-first search. Look at my stats above: more factors are found doing a large number of small TFs than doing large bit depth TFs. Or do a periodic rsync to another medium if it really concerns you. If you're worried about GPU efficiency, combining some of the earlier stages with "Stages=0" gives similar efficiencies to doing larger bit depths.
e) All the issues above have workarounds if they concern you.
I have nothing but praise for mfaktc, and the performance it's getting is awesome. I'm getting 10x the results on a similar cost basis. I'd buy Oliver a beer if we were near each other. :)
I guess I'm getting defensive at the 'slightly broken' phrase. If you had said "here's a list of suggested improvements", I probably wouldn't have got so defensive.
-- Craig |
#1029 | |
"Oliver"
Mar 2005
Germany
10001010111₂ Posts |
Hi Craig,
Quote:
Oliver |
#1030 |
Dec 2010
Monticello
5·359 Posts |
Nucleon, don't take offense...what you have to understand is that I code for a living, and code is never "perfect"....and I intend to address these opportunities by submitting the patches to Mr Oliver, "The Judger"...it's just taken me longer than I would like to find the "gwthread.c" routines in P95 to re-use, and I'm more distractible than I'd like. P95's code for communications is actually pretty straightforward, and indeed uses mutexes to keep lines, messages, and results in one piece between different threads. The only change I would make in P95 is to add a call to block further low-priority (communications thread) access to mutexes when a high-priority thread is within a few seconds of reporting a result and getting more work to do.
As for "splitting" mfaktc into sieving and factoring parts, in some sense, it is split now. The issue with the sieving is that it is to some degree CPU-bound, since there are now several hundred parallel CPUs on the GPU that can use the output of the sieve to run TF tests. I would argue that it might be useful to move the sieving process onto the GPU. The underlying requirement is significant bandwidth, as described above, from the sieving process to the TF testing process. You wouldn't want the sieve output to cross the disk, just memory, which leads back to the single process, multithreaded model now in use. It is a question of whether we can effectively (low cost task switch, maybe stay off the main PCI bus) run heterogeneous threads on the GPU. To me the major problem with using up a significant part of a good CPU is that that CPU is taken away from other GIMPS work, particularly LL tests....as I calculated above, we might remove 10% additional candidates from the LL pool, so lots and lots of LL still has to be done. |
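The single-process model described above, in which a sieving thread hands candidates to a testing stage through memory rather than disk, can be sketched roughly as below. This is an illustration only, not mfaktc's code: mfaktc runs the test stage on the GPU, whereas here both stages are plain Python threads, and all names (`sieve_candidates`, `divides_mersenne`, `find_factors`) are hypothetical. It relies on two standard facts: any factor q of 2^p − 1 (p an odd prime) has the form q = 2kp + 1 with q ≡ ±1 (mod 8), and q divides 2^p − 1 exactly when 2^p mod q = 1.

```python
import queue
import threading

SMALL_PRIMES = (3, 5, 7, 11, 13)


def sieve_candidates(p, k_max):
    """Yield q = 2*k*p + 1 for k <= k_max that survive the cheap filters."""
    for k in range(1, k_max + 1):
        q = 2 * k * p + 1
        if q % 8 not in (1, 7):  # factors of 2^p - 1 are +-1 mod 8
            continue
        if any(q % s == 0 and q != s for s in SMALL_PRIMES):
            continue  # composite with a small factor, skip it
        yield q


def divides_mersenne(p, q):
    """True if q divides 2^p - 1 (the trial factoring test)."""
    return pow(2, p, q) == 1


def find_factors(p, k_max, queue_size=1024):
    """Run sieve (producer) and TF test (consumer) joined by a bounded queue."""
    fc_queue = queue.Queue(maxsize=queue_size)  # in-memory handoff, no disk
    done = object()  # sentinel marking the end of the candidate stream

    def producer():
        for q in sieve_candidates(p, k_max):
            fc_queue.put(q)
        fc_queue.put(done)

    sieve_thread = threading.Thread(target=producer)
    sieve_thread.start()

    factors = []
    while True:
        q = fc_queue.get()
        if q is done:
            break
        if divides_mersenne(p, q):
            factors.append(q)
    sieve_thread.join()
    return factors


print(find_factors(29, 40))  # factors of 2^29 - 1 with k <= 40
```

The bounded queue is the point of the sketch: when the tester is faster than the sieve, the queue runs empty and the tester waits, which is the CPU-bound situation the thread is discussing.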
#1031 | |
"Lucan"
Dec 2006
England
2×3×13×83 Posts |
Quote:
being "slightly pregnant", thinking (narrow-mindedly) that a competent exhaustive search for factors was not likely to miss 30% of them. He (Christenson) was merely pointing out my naivety! Anyway, the mystery of the low factor discovery rate has been solved, with the realization that P-1 had already been done. David |
#1032 |
"Mike"
Aug 2002
20052₈ Posts |
FWIW, we like mfaktc the way it is now. The UI is simple and getting work queued up is no problem. None of our computers used for this are networked externally, so we just load things up manually in two week chunks. Not having a GUI is a plus!
We have had only one issue overall, but that was due to user error. (The forum assistant responsible has been beaten mercilessly.) We have always liked the idea of programs doing one thing well, and chaining programs together to do what we want. |
#1033 |
Dec 2010
Monticello
703₁₆ Posts |
The proposal under discussion has simply been to automate the fetching of work and reporting of results, as an option, just as mprime has an option to use or not use PrimeNet.
I'm still feeding mfaktc manually; it's just that two weeks at a time seems like an awful lot of assignments to handle at once. |
#1034 | |
Mar 2003
Melbourne
5×103 Posts |
Quote:
But by all means include something in the readme or the ini file under stages. The point of diminishing returns for combining seems to be around k-4 or k-3, where k is the last bit depth, for the exponents I looked at. -- Craig |
| World's second-dumbest CUDA program | fivemack | Programming | 112 | 2015-02-12 22:51 |