mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)
mersenneforum.org, GPU Computing

TheJudger 2010-03-19 22:31

Hi Luigi,

I don't have many new things to report; I was busy with other stuff.
SIEVE_PRIMES is now a runtime option (at the cost of a very small performance penalty for the siever).

For some reason the current version doesn't work on G80-GPUs :/

Oliver

ET_ 2010-03-20 20:16

[QUOTE=TheJudger;208951]Hi Luigi,

I don't have many new things to report; I was busy with other stuff.
SIEVE_PRIMES is now a runtime option (at the cost of a very small performance penalty for the siever).

For some reason the current version doesn't work on G80-GPUs :/

Oliver[/QUOTE]

Oliver, I didn't mean to be demanding about your work, I do understand what "Real Life"(TM) means... :smile:

I was just afraid that you might stop your development. Slow progress is still progress.

You know, I am sure your work could really help all of us factorers, and when you release your code I'd like to try extending it to Fermat factorization too :razz: (AFAIK the code is similar).

I had similar ups and downs while developing my Mersenne factoring code, and I assure you that having many followers asking me about my progress helped me a lot in speeding up the release of the code...

Take your time, but please don't abandon your application as it is. :grin:

Luigi

TheJudger 2010-03-21 14:02

Hi Luigi (and everybody else),

as I'm writing this I'm downloading CUDA 3.0. :)
AFAIK CUDA 3.0 doesn't support device emulation, so I can remove some lines of the code. CUDA 3.0 favors other debugging methods.

I would like to see some full-length tests by you and others to see whether the code runs stably or not. E.g. [B]re-run[/B] factoring attempts (with known factors).
The built-in selftest shows only that the calculations are correct (at least for the 1300+ test cases).

Oliver

P.S. you are welcome to contribute some code (e.g. parsing a worktodo from prime95). Contact me before you start your attempt.

ET_ 2010-03-22 01:08

Well, at least my shortage of time turned out to be a win, as I hadn't yet had the time to install CUDA 2.6 :smile:

I will soon install CUDA 3.0 and rerun the tests with version 0.05.

As for the parsing request, I will gladly help with a function that reads the file and creates the appropriate output variables; if you already have a data structure in mind, you can PM me.

Luigi

TheJudger 2010-04-10 20:54

Hi,

just a few notes for those who are interested:

The current code [U]should[/U] work on Fermi based GPUs but is not optimized for them. G80 to GT200b have a throughput of a 24x24 integer multiply every 4 clocks and a 32x32 multiply every 16 clocks (per shader core). Fermi can do a 32x32 multiply every clock! I have no clue how fast (or slow) a 24x24 multiply is on Fermi.

The current GPU code uses 3 (6) 32-bit integers to handle numbers up to 72 (144) bits. (Only 24 of the 32 bits are used, so I have 8 bits for carry handling.)
For future versions I'm thinking about a 2nd GPU code path which uses more than 24 bits per 32-bit integer. Perhaps 30 bits per chunk, so it could handle factors up to 2^89. This code path will use 32x32 multiplies, of course.
Expected speed (compared to the 72-bit code path) on G80..GT200b: ~50-65%
On Fermi based GPUs this might be even faster than the 72-bit code.
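
To make the chunk layout above concrete, here is a minimal host-side sketch in plain C, assuming three limbs of 24 bits each (3 x 24 = 72). The type and function names (int72, int72_from_u64, int72_add, int72_low64) are invented for this illustration and are not taken from mfaktc. It only shows how the spare upper bits of each 32-bit word absorb the carry during an addition; the real GPU code also needs multiplication and modular reduction.

```c
#include <stdint.h>

#define LIMB_BITS 24u
#define LIMB_MASK ((1u << LIMB_BITS) - 1u)

/* Hypothetical 72-bit integer: three 32-bit words, 24 bits used in each.
   d[0] is the least significant limb. */
typedef struct { uint32_t d[3]; } int72;

/* Split a 64-bit value into 24-bit limbs (the top limb gets 16 bits). */
static int72 int72_from_u64(uint64_t x)
{
    int72 r;
    r.d[0] = (uint32_t)(x & LIMB_MASK);
    r.d[1] = (uint32_t)((x >> 24) & LIMB_MASK);
    r.d[2] = (uint32_t)((x >> 48) & LIMB_MASK);
    return r;
}

/* Add two 72-bit values. Each limb sum needs at most 25 bits, so the
   carry lands in the unused upper bits and is moved to the next limb.
   A carry out of the top limb (beyond 72 bits) is discarded. */
static int72 int72_add(int72 a, int72 b)
{
    int72 r;
    uint32_t t = a.d[0] + b.d[0];
    r.d[0] = t & LIMB_MASK;
    t = a.d[1] + b.d[1] + (t >> LIMB_BITS);
    r.d[1] = t & LIMB_MASK;
    t = a.d[2] + b.d[2] + (t >> LIMB_BITS);
    r.d[2] = t & LIMB_MASK;
    return r;
}

/* Reassemble the low 64 bits (drops bits 64..71). */
static uint64_t int72_low64(int72 a)
{
    return (uint64_t)a.d[0]
         | ((uint64_t)a.d[1] << 24)
         | ((uint64_t)(a.d[2] & 0xFFFFu) << 48);
}
```

Adding 1 to 0xFFFFFF, for example, clears the lowest limb and carries into the second one. The same layout with 30 bits per limb would leave only 2 spare bits per word.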

Another variant would have been to use 4 (8) 32-bit integers to handle numbers up to 96 (192) bits (just expanding the current 72-bit code with a 4th chunk). I would expect a speed of ~60% compared to the 72-bit code on G80..GT200b. But with this variant I would still need to write another code path for good performance on Fermi based cards. To be honest: I've discarded this idea and prefer the version which uses 32x32 multiplies.
---
My current version supports changing SIEVE_PRIMES without recompiling; it is read from a configuration file. I can also switch automatic adjustment of SIEVE_PRIMES during runtime on and off in the configuration file. This is not working perfectly, but it already yields good performance numbers.
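
As an illustration of reading such a setting at runtime, here is a small C sketch that parses one "key=value" line. The post does not describe the configuration file format, so the key name "SievePrimes" used in the usage note and the helper parse_int_setting are assumptions made for this example only.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: if `line` has the form "<key>=<integer>", store the
   integer in *out and return 1; otherwise return 0 and leave *out alone.
   Trailing whitespace or a line ending after the number is accepted. */
static int parse_int_setting(const char *line, const char *key, long *out)
{
    size_t n = strlen(key);
    char *end;
    long value;

    /* The key must match exactly and be followed directly by '='. */
    if (strncmp(line, key, n) != 0 || line[n] != '=')
        return 0;

    value = strtol(line + n + 1, &end, 10);
    if (end == line + n + 1)
        return 0;                     /* no digits after '=' */

    while (*end == ' ' || *end == '\t' || *end == '\r' || *end == '\n')
        end++;
    if (*end != '\0')
        return 0;                     /* trailing garbage */

    *out = value;
    return 1;
}
```

A caller would read the file line by line and try, for example, parse_int_setting(line, "SievePrimes", &sieve_primes), keeping a compiled-in default when no line matches.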
---
I tried to put the 72-bit GPU code into a separate .cu file (as a preparation for having multiple code paths), but in the current state I'm not able to build a binary from the code (linking problems).
---
Luigi (ET_) is coding some functions to handle P95 worktodo files. I'm thankful for his help. This will shorten the time to make mfaktc ready for productive usage. :)
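
For readers wondering what handling a worktodo entry might involve, here is a rough C sketch. It assumes trial-factoring lines of the form "Factor=exponent,bit_min,bit_max", optionally with an assignment ID as an extra first field; that format, the struct tf_assignment and the function parse_factor_line are assumptions for this example, not Luigi's actual functions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result type for one trial-factoring assignment. */
typedef struct {
    unsigned int exponent;  /* p in 2^p - 1 */
    int bit_min;            /* already factored up to 2^bit_min */
    int bit_max;            /* factor up to 2^bit_max */
} tf_assignment;

/* Parse one line; returns 1 on success, 0 if the line is not a valid
   "Factor=" entry in one of the two assumed forms. */
static int parse_factor_line(const char *line, tf_assignment *out)
{
    static const char prefix[] = "Factor=";
    const char *p, *q;
    unsigned int exponent;
    int lo, hi, commas = 0;

    if (strncmp(line, prefix, sizeof prefix - 1) != 0)
        return 0;
    p = line + sizeof prefix - 1;

    /* Two commas: no assignment ID. Three commas: skip the first field. */
    for (q = p; *q != '\0'; q++)
        if (*q == ',')
            commas++;
    if (commas == 3)
        p = strchr(p, ',') + 1;
    else if (commas != 2)
        return 0;

    if (sscanf(p, "%u,%d,%d", &exponent, &lo, &hi) != 3)
        return 0;
    if (lo < 0 || hi <= lo)
        return 0;                     /* bit range must be increasing */

    out->exponent = exponent;
    out->bit_min = lo;
    out->bit_max = hi;
    return 1;
}
```

Lines of other work types (anything not starting with "Factor=") are simply rejected, so a caller can skip them and leave them in the file untouched.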


Oliver

ixfd64 2010-04-11 00:35

[QUOTE=TheJudger;211326]Luigi (ET_) is coding some functions to handle P95 worktodo files. I'm thankful for his help. This will shorten the time to make mfaktc ready for productive usage. :)[/QUOTE]

That sounds really exciting!

However, problems may arise when it comes to submitting the results. The source code that generates the PrimeNet checksums is not publicly available, so I don't know if results from mfaktc will be rejected. If that's the case, we have three alternatives:

1) E-mail the results to George every time.
2) Convince George to implement the mfaktc code in Prime95.
3) Convince George to configure PrimeNet such that it accepts results from mfaktc. This is definitely a possibility, since PrimeNet already "trusts" [url=http://mersenneforum.org/showthread.php?t=12576&page=4]MacLucasFFTW[/url] (I think).

Personally, I like option #2. :smile:

Mini-Geek 2010-04-11 01:55

[quote=ixfd64;211344]However, problems may arise when it comes to submitting the results. The source code that generates the PrimeNet checksums is not publicly available, so I don't know if results from mfaktc will be rejected. If that's the case, we have three alternatives:

1) E-mail the results to George every time.[/quote]
You could always make it output Prime95-like results, and submit those through the manual results form (no e-mailing needed), but I've got a feeling that wouldn't be preferred...

A GPU-aware Prime95 (if only for TF to start with) would be a great next step. :smile:
[quote=ixfd64;211344]3) Convince George to configure PrimeNet such that it accepts results from mfaktc. This is definitely a possibility, since PrimeNet already "trusts" [URL="http://mersenneforum.org/showthread.php?t=12576&page=4"]MacLucasFFTW[/URL] (I think).[/quote]
Yes, I believe so. [url]http://www.mersenne.org/manual_result/?data=&B1=Submit[/url] lists "MacLucasFFTW lines" as one of the types it looks for. This is another possibility.

cheesehead 2010-04-11 06:26

[quote=ixfd64;211344]However, problems may arise when it comes to submitting the results. The source code that generates the PrimeNet checksums is not publicly available, so I don't know if results from mfaktc will be rejected.[/quote]
Perhaps I'm mistaken, but ... I think that factoring result reports submitted to PrimeNet with incorrect checksums are currently:

1) accepted if they are "has a factor" reports (since the reported factor can be quickly verified, there's no reason to ignore a correct factor), but with no PrimeNet/GIMPS credit given to the submitter, or

2) rejected, discarded, or set aside for "manual" examination only if they are no-factor-found reports.

At least, that's what seems logical.
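
The "quickly verified" part is easy to illustrate: q divides 2^p - 1 exactly when 2^p mod q equals 1, which takes only about log2(p) modular squarings. Below is a minimal C sketch of that check for q below 2^63; the function names are made up for this example, and it says nothing about how PrimeNet itself validates reports.

```c
#include <stdint.h>

/* (a * b) mod m without overflow for m < 2^63, by shift-and-add. */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m)
{
    uint64_t r = 0;
    a %= m;
    while (b != 0) {
        if (b & 1u) {
            r += a;
            if (r >= m) r -= m;
        }
        a += a;
        if (a >= m) a -= m;
        b >>= 1;
    }
    return r;
}

/* Returns 1 if q divides the Mersenne number 2^p - 1, else 0.
   Uses right-to-left binary exponentiation of 2 modulo q. */
static int divides_mersenne(uint32_t p, uint64_t q)
{
    uint64_t result = 1, base;
    if (q < 2 || (q >> 63) != 0)
        return 0;                     /* outside the range this sketch handles */
    base = 2 % q;
    while (p != 0) {
        if (p & 1u)
            result = mulmod(result, base, q);
        base = mulmod(base, base, q);
        p >>= 1;
    }
    return result == 1;
}
```

For example, 23 and 89 both divide 2^11 - 1 = 2047, so divides_mersenne(11, 23) and divides_mersenne(11, 89) return 1, while divides_mersenne(11, 7) returns 0.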

I should conduct a test: after finding a factor, but before communicating with PrimeNet, make a copy of the prime.spl file and then manually edit the checksum on both a "has a factor" result and a no-factor-found result. Now, all I have to do is find a factor by TF! (No, I'm not going to waste good time by repeating the find of a known factor.)

henryzz 2010-04-11 07:18

[quote=ixfd64;211344]That sounds really exciting!

However, problems may arise when it comes to submitting the results. The source code that generates the PrimeNet checksums is not publicly available, so I don't know if results from mfaktc will be rejected. If that's the case, we have three alternatives:

1) E-mail the results to George every time.
2) Convince George to implement the mfaktc code in Prime95.
3) Convince George to configure PrimeNet such that it accepts results from mfaktc. This is definitely a possibility, since PrimeNet already "trusts" [URL="http://mersenneforum.org/showthread.php?t=12576&page=4"]MacLucasFFTW[/URL] (I think).

Personally, I like option #2. :smile:[/quote]
4) Hack Prime95 to get the checksums.

ET_ 2010-04-11 10:14

[QUOTE=henryzz;211359]4) Hack Prime95 to get the checksums.[/QUOTE]

5) Link the object code available to mfaktc executable...

IIRC, the checksum is just a CRC32 code, while the hidden code is related to client/server authentication. So, if I am correct, you can submit factors once the checksum code is ready, but your software may be considered "untrusted".
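
For reference, this is what a plain CRC-32 looks like in C (the common reflected variant with polynomial 0xEDB88320, as used by zlib and Ethernet). Whether Prime95 uses this variant, and which bytes of a result line it would cover, is not known from this thread, so treat the sketch as a generic example and not as the PrimeNet checksum.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32, reflected polynomial 0xEDB88320, initial value and
   final XOR of 0xFFFFFFFF. Slow but table-free. */
static uint32_t crc32_ieee(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    size_t i;
    int k;

    for (i = 0; i < len; i++) {
        crc ^= data[i];
        for (k = 0; k < 8; k++) {
            /* If the low bit is set, shift and XOR in the polynomial. */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return ~crc;
}
```

The standard check value for this variant is 0xCBF43926 for the nine ASCII bytes "123456789".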

Luigi

Mini-Geek 2010-04-11 11:47

[quote=cheesehead;211356]I should conduct a test: after finding a factor, but before communicating with PrimeNet, make a copy of the prime.spl file and then manually edit the checksum on both a "has a factor" result and a no-factor-found result. Now, all I have to do is find a factor by TF! (No, I'm not going to waste good time by repeating the find of a known factor.)[/quote]
You could always do TF-LMH or manually choose numbers TFd to a very low bit level and do 1 bit on it to speed the process (and be playing with less consequential things, so you don't have to potentially give up the credit for two full-size TFs).


All times are UTC.