[QUOTE=TheJudger;214150]Hi Luigi,
just to be sure: did mfaktc also find all already-known factors within the ranges? About the triple check: you noticed that one of the three new factors only appeared with Factor5. Is the triple check still running, or did the other factors not come up (false positives)? Oliver[/QUOTE] I can confirm that all factors discovered by OBD were found by mfaktc (range 1-69 or 1-71). There are still 5 factors not checked because they are above 71 bits. The three new factors found for the first time by mfaktc have been rediscovered by Factor5. Total success. Luigi
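(As an aside: each reported factor q of M_p = 2^p - 1 can be cross-checked independently of any of these programs, since q divides M_p exactly when 2^p ≡ 1 (mod q). A minimal Python sketch, using the classic factorization M11 = 2047 = 23 × 89 as the example:)

```python
def is_factor(q, p):
    """True iff q divides the Mersenne number M_p = 2^p - 1."""
    # q | 2^p - 1  <=>  2^p ≡ 1 (mod q); three-argument pow() checks
    # this without ever forming the huge number 2^p - 1 itself.
    return pow(2, p, q) == 1

print(is_factor(23, 11))   # True: 23 * 89 = 2047 = 2^11 - 1
```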
[QUOTE=ET_;214157]I can confirm that all factors discovered by OBD were found by mfaktc (range 1-69 or 1-71). There are still 5 factors not checked because they are above 71 bits. The three new factors found for the first time by mfaktc have been rediscovered by Factor5.
Total success. Luigi[/QUOTE] Yeah, good news! :groupwave: To be honest, I was a little afraid, but now I'm happy! Oliver
[QUOTE=TheJudger;214151]How fast is it compared to Factor5?
AFAIK Factor5 is using GMP functions which allow _MUCH_ bigger factor limits, so the comparison is not 100% fair... Oliver[/QUOTE] mfaktc is about 60-80 times faster than Factor5 in the range 1-71 of 3,321,xxx,xxx exponents. It should be at least 5-10 times faster than Prime95 (I will do some benchmarking tonight). As for the comparison, I have some improvements planned for the 64-bit Factor5, but they only take advantage of integer k and GCD. You are right: I wrote Factor5 just to play with very big exponents (like M41234123412341, which I took up to 82.3 bits and Ernst Mayer to 85, finding a nice big factor). As there was no software ready for that, I didn't optimize it for maximum efficiency, preferring versatility. :smile: Luigi
[QUOTE=ET_;214161]mfaktc is about 60-80 times faster than Factor5 in the range 1-71 of 3,321,xxx,xxx exponents.
It should be at least 5-10 times faster than Prime95 (I will do some benchmarking tonight). [/QUOTE] What is your GPU?
[QUOTE=axn;214162]What is your GPU?[/QUOTE]
GTX 275 @ 1404 MHz, a G200-series card. I will test my 9500M GS @ 950 MHz as soon as the mfaktc code for Windows is released... :smile: Luigi
Hi,
I had access to a GTX 480; the mfaktc code works without changes, but I needed to adjust the compile script. Without code changes the performance is ~50% higher than on my GTX 275. This is a little lower than expected, but it's fast anyway. Using 2 cores of a Core i7 750 (by starting 2 instances of mfaktc 0.07-pre2) it can take two exponents (M115.xxx.xxx) from 2^63 to 2^71 in ~2h 15m. This means ~21 per day! :smile: The siever of 0.07 is a little faster than in 0.06 (at least on the Core i7 series; untested on other CPU types). Oliver
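(The ~21/day figure follows directly from the quoted timing — two exponents every ~2.25 hours. A quick back-of-the-envelope check:)

```python
hours_per_batch = 2.25      # ~2h 15m to take two exponents from 2^63 to 2^71
exponents_per_batch = 2     # two mfaktc instances running in parallel
per_day = 24 / hours_per_batch * exponents_per_batch
print(round(per_day, 1))    # about 21.3 exponents per day
```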
1 Attachment(s)
Here's my attempt at a 64-bit windows port. It has successfully found a few factors that I found with earlier versions, but I wouldn't say it's heavily tested.
If anyone wants a 32-bit version I can do that as well. Nvidia is out to annoy me (you can't have both the 32- and 64-bit versions of the CUDA tools installed at the same time), but it's not a huge effort to switch back and forth. The ZIP file includes the source as well as an EXE. The source is mainly there for TheJudger to look at (and to see how badly I mangled it), but others are welcome to fix any bugs they find. Report any problems or questions here in the thread. [ATTACH]5176[/ATTACH]
For the truly brave
1 Attachment(s)
Here's a version which includes two changes I've been working on in parallel with the main bit of work from TheJudger.
First off, you can specify in the ini file how many streams (GPU threads) to spawn. I added this because in my testing I had better results with 3-5 streams instead of the 2 hardcoded in the current version. I think this is an issue specific to Win7, but others may find it useful.

The second addition is code which tries to minimize the execution time per TF class - turn it on by setting SievePrimesAdjust=2 in the ini file. This is different from the current code, which adjusts to keep the average wait time in an optimum range. On my system I see a noticeable improvement with this approach - I'm curious whether it helps on other systems.

The code is a delta off the code I posted previously and also includes a Windows exe. I don't have a Linux system to build on, sorry. If the Win porting code I added is OK it should build cleanly, but I'm not willing to guarantee that's 100% true. Use this at your own risk. I've done some testing, but certainly not enough to say it's ready for prime time. I'm posting it mostly to get another (few) sets of eyes on the code, but if anyone wants to run it against some known factors, that would be great as well. [ATTACH]5177[/ATTACH]
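(The minimize-time-per-class idea can be illustrated with a simple hill-climbing loop. This is a hypothetical sketch, not the actual patch: `measure_time`, the bounds, and the step size are made-up stand-ins. It probes neighboring SievePrimes values, keeps whichever measures fastest, and narrows the step when neither direction improves:)

```python
def tune_sieve_primes(measure_time, lo, hi, step, iters=40):
    """Hill-climb toward the SievePrimes value with minimal time per class."""
    cur = (lo + hi) // 2          # start from the middle of the allowed range
    best = measure_time(cur)
    for _ in range(iters):
        moved = False
        for cand in (cur - step, cur + step):
            if lo <= cand <= hi and measure_time(cand) < best:
                best, cur, moved = measure_time(cand), cand, True
                break
        if not moved:
            step = max(step // 2, 1)   # no neighbor is faster: refine the step
    return cur

# Mock timing curve with its minimum at 30000, just to exercise the loop.
print(tune_sieve_primes(lambda x: abs(x - 30000), 5000, 100000, 5000))
```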
1 Attachment(s)
OK, already found a small problem with the hacked version. For the new way of adjusting sieve primes, it was starting at the wrong value (picking the value from the ini file rather than starting the search from the middle of the full range from min to max).
Try this instead of the last one. [ATTACH]5178[/ATTACH]
Hi kjaget,
thank you for your modifications / hints / tests. :smile: I'll take a look at them later. I've played around with the number of GPU streams, too, but on my system there was no difference between 2 and 4 streams. Since you say it is useful on Windows, I'll add it. [B]I screwed up the code path without USE_ASYNC_COPY in 0.06.[/B] (This was known to me, but I didn't write it here or in the readme.) I think I'll remove this part completely (not needed except for some integrated GPUs?). Oliver
I just tried hack-2 version on Windows 7 64 bit.
It complains about cudart.dll. I have cudart64_30_14.dll and cuda32_30_14.dll on my system. No cudart.dll. Luigi