#188 | |
|
Banned
"Luigi"
Aug 2002
Team Italia
61×79 Posts |
Quote:
Total success. Luigi |
|
|
|
|
|
|
#189 | |
|
"Oliver"
Mar 2005
Germany
11×101 Posts |
Quote:
To be honest: I was a little bit afraid, but now I'm happy! Oliver |
|
|
|
|
|
|
#190 | |
|
Banned
"Luigi"
Aug 2002
Team Italia
61·79 Posts |
Quote:
It should be at least 5-10 times faster than Prime95 (I will do some benchmarking tonight). As for the comparison, I have some improvements planned for Factor5_64 bits, but they only take advantage of integer k and GCD. You are right, I wrote Factor5 just to play with very big exponents (like M41234123412341, which I took up to 82.3 bits and Ernst Mayer took to 85, finding a nice big factor). As there was no software ready for that, I didn't optimize it for best efficiency, preferring versatility. :guilty smile: Luigi |
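For readers new to the thread, here is a minimal Python sketch of the "integer k" idea behind Mersenne trial factoring. It relies only on two well-known facts: any prime factor q of M_p = 2^p - 1 (p an odd prime) has the form q = 2kp + 1, and q is congruent to 1 or 7 modulo 8. The function names and the `k_max` parameter are made up for illustration; this is not the code of Factor5, mfaktc, or Prime95.

```python
def mersenne_factor_candidates(p, k_max):
    """Yield (k, q) with q = 2*k*p + 1 and q % 8 in (1, 7), for k = 1..k_max."""
    for k in range(1, k_max + 1):
        q = 2 * k * p + 1
        # Factors of a Mersenne number are always +1 or -1 modulo 8.
        if q % 8 in (1, 7):
            yield k, q


def find_factors(p, k_max):
    """Return every candidate q (k <= k_max) that divides 2**p - 1."""
    # q divides 2**p - 1 exactly when 2**p is congruent to 1 modulo q.
    return [q for _, q in mersenne_factor_candidates(p, k_max)
            if pow(2, p, q) == 1]


if __name__ == "__main__":
    # M11 = 2047 = 23 * 89
    print(find_factors(11, 10))
```

A real trial factoring program additionally sieves the candidates with small primes before doing the modular exponentiation, which is the part the "siever" discussion in this thread is about.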
|
|
|
|
|
|
#191 |
|
Jun 2003
117358 Posts |
|
|
|
|
|
|
#192 |
|
Banned
"Luigi"
Aug 2002
Team Italia
61·79 Posts |
|
|
|
|
|
|
#193 |
|
"Oliver"
Mar 2005
Germany
11·101 Posts |
Hi,
I had access to a GTX 480. The mfaktc code works without changes, but I needed to adjust the compile script. Without code changes the performance is ~50% higher than on my GTX 275. This is a little bit lower than expected, but anyway it's fast. Using 2 cores of a Core i7 750 (by starting 2 instances of mfaktc 0.07-pre2) it can take two exponents (M115.xxx.xxx) from 2^63 to 2^71 in ~2h 15m. This means ~21 per day! The siever of 0.07 is a little bit faster compared to 0.06 (at least on Core i7 series, untested on other CPU types). Oliver Last fiddled with by TheJudger on 2010-05-10 at 13:31 |
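The "~21 per day" figure follows directly from the numbers in the post: two instances each finish one exponent every 2h 15m. A small Python sketch of that arithmetic, with a hypothetical helper name chosen for illustration:

```python
def exponents_per_day(instances, hours, minutes):
    """Exponents finished per day when `instances` parallel runs each
    complete one exponent every `hours`:`minutes` of wall-clock time."""
    batch_minutes = hours * 60 + minutes
    return instances * (24 * 60) / batch_minutes


rate = exponents_per_day(instances=2, hours=2, minutes=15)
print(f"{rate:.1f} exponents per day")
```

This assumes the card runs around the clock and that both instances keep the same pace.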
|
|
|
|
|
#194 |
|
Jun 2005
129₁₀ Posts |
Here's my attempt at a 64-bit Windows port. It has successfully found a few factors that I found with earlier versions, but I wouldn't say it's heavily tested.
If anyone wants a 32-bit version I can do that as well. Nvidia is out to annoy me (you can't have both 32 and 64 bit versions of the CUDA tools installed at the same time) but it's not a huge effort to switch back and forth. The ZIP file includes source as well as an EXE. The source is mainly there for TheJudger to look at (and see how badly I mangled it), but others are welcome to fix any bugs they find. Report any problems or questions here in the thread. mfaktc-0.06-win.zip Last fiddled with by kjaget on 2010-05-11 at 00:01 |
|
|
|
|
|
#195 |
|
Jun 2005
3·43 Posts |
Here's a version which includes two changes I've been working on in parallel with the main bit of work from TheJudger.
First off, you can specify how many streams (GPU threads) to spawn in the ini file. I added this because in my testing, I had better results with 3-5 threads instead of the 2 hardcoded in the current version. I think this is an issue specific to Win7, but others may find it useful. The second addition is code which tries to minimize the execution time per TF class - turn it on by setting SievePrimesAdjust=2 in the ini file. This is different from the current code, which adjusts to keep the average wait time in an optimum range. On my system, I see a noticeable improvement using this approach - I'm curious if it helps on other systems. The code is a delta off the code I posted previously and also includes a Windows exe. I don't have a Linux system to build on, sorry. If the Win porting code I added is OK it should build cleanly, but I'm not willing to guarantee that's 100% true. Use this at your own risk. I've done some testing but certainly not enough to say it's ready for prime time. I'm posting it mostly to get another (few) sets of eyes to look at the code, but if anyone wants to run it against some known factors that would be great as well. mfaktc-0.06-hack.zip |
|
|
|
|
|
#196 |
|
Jun 2005
3·43 Posts |
OK, already found a small problem with the hacked version. For the new way of changing sieve primes it was starting at the wrong value (picking the value from the ini file rather than searching starting from the middle of the full range from min to max).
Try this instead of the last one. mfaktc-0.06-hack-2.zip |
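To illustrate the idea described in the two posts above (pick the sieve primes value that minimizes the time per class, starting the search from the middle of the min-to-max range), here is a rough Python sketch. Everything in it is an assumption made for illustration: the function name, the `step` parameter, the hill-climbing strategy, and the toy timing model are not taken from the hacked mfaktc source, which may work quite differently.

```python
def tune_sieve_primes(time_per_class, lo, hi, step):
    """Hill-climb toward the setting with the lowest time per class.

    time_per_class: callable returning the measured time for one setting
                    (hypothetical; a real program would time actual classes).
    lo, hi:         allowed range for the setting.
    step:           initial step size, halved whenever no neighbour is better.
    """
    current = (lo + hi) // 2          # start in the middle of the full range
    best_time = time_per_class(current)
    while step >= 1:
        improved = False
        for candidate in (current - step, current + step):
            if lo <= candidate <= hi:
                t = time_per_class(candidate)
                if t < best_time:
                    current, best_time, improved = candidate, t, True
                    break
        if not improved:
            step //= 2                # refine the search around the current best
    return current


if __name__ == "__main__":
    # Toy model only: pretend the time per class is lowest at 30000.
    toy = lambda sp: abs(sp - 30000) / 1000 + 10
    print(tune_sieve_primes(toy, lo=5000, hi=100000, step=8000))
```

With real measurements the timings are noisy, so a practical version would probably average several classes per setting before comparing.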
|
|
|
|
|
#197 |
|
"Oliver"
Mar 2005
Germany
11·101 Posts |
Hi kjaget,
thank you for your modifications / hints / tests. I'll take a look at them later. I've played around with the number of GPU streams, too. But on my system there was no difference between 2 and 4 streams. If you say it is useful on Windows, I'll add it. I screwed up the code path without USE_ASYNC_COPY in 0.06. (This was known to me but I didn't write it here or in the readme.) I think I'll remove this part completely (not needed except for some integrated GPUs?) Oliver |
|
|
|
|
|
#198 |
|
Banned
"Luigi"
Aug 2002
Team Italia
61×79 Posts |
I just tried the hack-2 version on Windows 7 64 bit.
It complains about cudart.dll. I have cudart64_30_14.dll and cuda32_30_14.dll on my system. No cudart.dll. Luigi |
|
|
|