mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

TheJudger 2010-09-07 11:58

[QUOTE=ET_;228777]No way, yours is more memory-effective! :smile:
[/QUOTE]
Yep, but the algorithm is the same.

[QUOTE=ET_;228777]My question was about "sieving" itself. I didn't have the time to look closely at your code. After the initialization, the whole process of sieving consists of clearing bits and then testing those that were not cleared. Am I correct?

Luigi[/QUOTE]

Right: clear bits, then create k_tab by testing for the non-cleared bits.
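
The two steps described above (clear bits, then collect the survivors into k_tab) can be sketched as follows. This is only an illustration, not mfaktc's code: it assumes factor candidates of the form q = 2*k*p + 1, uses one byte per k instead of one bit, and the names `sieve_k_candidates`, `alive` and `small_primes` are made up for the example. The small primes must all be smaller than the smallest candidate q, otherwise a prime q would clear itself.

```python
def sieve_k_candidates(p, k_start, count, small_primes):
    """Return k_tab: the k in [k_start, k_start + count) whose candidate
    q = 2*k*p + 1 is not divisible by any of small_primes."""
    # One flag per k (a real sieve would pack these into bits).
    alive = bytearray([1]) * count
    for s in small_primes:
        # Find the first offset whose candidate is divisible by s.
        # If none exists within one period of s (e.g. s == 2 or s divides p),
        # s never divides any candidate and nothing is cleared.
        for first in range(min(s, count)):
            if (2 * (k_start + first) * p + 1) % s == 0:
                # Divisibility repeats with period s in k: clear those flags.
                for i in range(first, count, s):
                    alive[i] = 0
                break
    # "Create k_tab": keep only the k whose flag was not cleared.
    return [k_start + i for i in range(count) if alive[i]]


# Exponent 11, k = 1..10, sieved with 3, 5 and 7.
k_tab = sieve_k_candidates(11, 1, 10, [3, 5, 7])
print(k_tab)  # [1, 3, 4, 9, 10]
```

The surviving list still contains k = 1 and k = 4, i.e. q = 23 and q = 89, which are the two factors of 2^11 - 1 = 2047. It also contains k = 10 (q = 221 = 13 * 17), because 13 was not among the sieving primes: sieving only thins out the candidates, it does not prove them prime.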

Oliver

kjaget 2010-09-11 12:27

Windows binary
 
1 Attachment(s)
As requested, here's a Windows binary. I've compiled it for 64-bit CPUs, arch sm_11, using CUDA 3.1. It includes the executable, the (required?) CUDA DLL and my Windows makefile. I used the unmodified source code TheJudger posted earlier, so grab that if you want to reproduce what I built.

Let me know if there are any problems.

[ATTACH]5671[/ATTACH]

Karl M Johnson 2010-09-13 08:41

Well, I've got a major problem on a GTX 465 and Win 7 x64.
It crashed immediately after passing the selftest.
Windows reports it's some BEX64 thing.
A hint: I can remove the cudart DLL from the app's folder, and it will still pass the selftest and then crash. Usually, if apps are coded to use cudart, they won't run at all without it. :smile:
Oh, and I toyed with the settings in different ways. No luck.
ForceWare 258.96 + CUDA Toolkit 3.1.

TheJudger 2010-09-13 16:21

Hi Karl,

actually I can only give some hints/ideas. :sad:

Can you provide a "screen shot" (copy&paste the text!)?

BEX64... *hmm* Can you check your DEP (Data Execution Prevention) settings in Windows and turn it off for mfaktc (only if it was turned on)?

Perhaps you've discovered a memory leak or buffer overflow. :unsure:
Do earlier versions of mfaktc work on this system?

Oliver

Karl M Johnson 2010-09-13 17:12

Ok, I was wrong.
It worked like a charm for this worktodo.txt:
[code]Factor=bla,3321928097,1,69[/code]
I can't reproduce it anymore, so I guess [U]everything is fine[/U].
Btw, DEP can't be disabled for 64-bit apps.

Sooo, can I do more intensive testing?
Maybe some worktodo.txt file with a bunch of numbers?
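
For anyone preparing such a file, here is a minimal sketch of how a line like the one quoted above could be split into its fields. The field meanings are an assumption based on that single example (an arbitrary key, the exponent, then the lower and upper bit level for trial factoring); check the README shipped with mfaktc before relying on it. The function name `parse_factor_line` and the dictionary keys are hypothetical.

```python
def parse_factor_line(line):
    """Split a 'Factor=key,exponent,bit_min,bit_max' line into its fields.

    Assumed layout (taken from the single example in this thread):
    key, exponent, lower bit level, upper bit level.
    """
    keyword, _, rest = line.strip().partition("=")
    if keyword != "Factor":
        raise ValueError("not a Factor= line: %r" % line)
    fields = rest.split(",")
    if len(fields) != 4:
        raise ValueError("expected 4 comma-separated fields: %r" % line)
    key, exponent, bit_min, bit_max = fields
    return {
        "key": key,
        "exponent": int(exponent),
        "bit_min": int(bit_min),
        "bit_max": int(bit_max),
    }


entry = parse_factor_line("Factor=bla,3321928097,1,69")
print(entry["exponent"], entry["bit_min"], entry["bit_max"])  # 3321928097 1 69
```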

henryzz 2010-09-13 19:53

[quote=Karl M Johnson;229609]Btw, DEP can't be disabled for 64-bit apps.
[/quote]
You could turn it off in the BIOS under CPU features, I think. :smile:

Karl M Johnson 2010-09-14 08:27

Ah, yes, that's a clever way to deal with DEP.
One drawback is that it will then be disabled for all apps.

TheJudger 2010-09-14 11:39

I have to learn a lot about 64-bit Windows...

Karl:
First you could/should run the built-in (long) selftest: run 'mfaktc.exe -st'
Then you could try to run some exponents with known factors as "regular TF assignments" via worktodo.txt.

Thank you,
Oliver

Karl M Johnson 2010-09-14 15:28

Ehehe, great, 1557 tests passed :smile:
Oh, and I found this -tt flag; here's the result (no idea what it means):
[CODE]141351095 time measurements within ten seconds
negative steps: 0
zero steps: 131366735
positive steps: 9984360
smallest (non-zero) time step: 1usec
biggest time step: 374usec[/CODE]

Thanks, Oliver!:grin:

TheJudger 2010-09-14 15:57

Hi Karl,

seems that your system is ready for some productive usage! :smile:

About the -tt output: in short, it means that everything is fine with the timing function on your system. There are no negative steps and the granularity is fine (it caught almost all of the 10000000 possible time values, i.e. 10 seconds at 1 usec resolution).
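
To make the counters in that output concrete, here is a rough sketch of what such a timer test could look like. It is not mfaktc's implementation; the function name `timer_test`, its parameters and the use of Python's wall clock are assumptions for illustration. It reads a microsecond clock in a tight loop and classifies the difference between consecutive readings.

```python
import time


def timer_test(duration_us=10_000_000, clock_us=None):
    """Read a microsecond clock repeatedly and classify the steps between readings."""
    if clock_us is None:
        # Wall-clock time in whole microseconds (assumed stand-in for the real timer).
        clock_us = lambda: time.time_ns() // 1000
    stats = {"measurements": 0, "negative": 0, "zero": 0, "positive": 0,
             "smallest_nonzero": None, "biggest": 0}
    prev = clock_us()
    end = prev + duration_us
    while prev < end:
        now = clock_us()
        step = now - prev
        stats["measurements"] += 1
        if step < 0:
            stats["negative"] += 1          # clock went backwards: a bad sign
        elif step == 0:
            stats["zero"] += 1              # two readings within the same microsecond
        else:
            stats["positive"] += 1
            if stats["smallest_nonzero"] is None or step < stats["smallest_nonzero"]:
                stats["smallest_nonzero"] = step
            stats["biggest"] = max(stats["biggest"], step)
        prev = now
    return stats


# Deterministic demo with a fake clock instead of waiting ten seconds.
readings = iter([0, 0, 1, 1, 5, 3, 10])
demo = timer_test(duration_us=10, clock_us=lambda: next(readings))
print(demo["negative"], demo["zero"], demo["positive"])  # 1 2 3
```

In Karl's output the zero and positive steps add up to the total number of measurements (131366735 + 9984360 = 141351095), and the 9984360 positive steps are close to the 10000000 distinct microsecond values available in ten seconds.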

I've used the timer test to track down a strange problem on a Core i7 Linux system with Turbo Boost enabled and an "old" Linux kernel. It seems that the kernel had problems with those "extra" clock cycles from Turbo Boost. The timing function sometimes reported ~5 seconds too many or too few... This gave negative steps and huge steps in the timer test.

Oliver

TheJudger 2010-09-21 12:48

Hi,

some of you know already:
mfaktc 0.12 will be final soon ([B]perhaps[/B] next week).
It features two new kernels based on Barrett's modular reduction, which are faster than the current kernels in some ranges (e.g. up to 50% faster on compute capability 2.x GPUs for factors above 2^75 and below 2^79). :smile:
A second feature is a combined binary for sm_11 and sm_20 code so we can have one binary which delivers "optimal" performance on all currently supported GPUs.
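
As background on the Barrett reduction mentioned above, here is a minimal scalar sketch of the idea: replace the division in "x mod n" by a multiplication with a precomputed reciprocal and a shift, followed by a small correction. This is not the mfaktc kernel (which works on fixed-width multiword integers on the GPU); the function names are made up, and the example simply relies on Python's arbitrary-precision integers.

```python
def barrett_setup(n):
    """Precompute the shift k and mu = floor(4**k / n) for modulus n."""
    k = n.bit_length()
    mu = (1 << (2 * k)) // n
    return k, mu


def barrett_reduce(x, n, k, mu):
    """Return x mod n for 0 <= x < n*n without dividing by n."""
    q = (x * mu) >> (2 * k)   # estimate of x // n, never too large
    r = x - q * n
    while r >= n:             # the estimate is off by at most a small amount
        r -= n
    return r


def divides_mersenne(p, q):
    """True if q (>= 3) divides 2**p - 1, i.e. 2**p is congruent to 1 mod q."""
    k, mu = barrett_setup(q)
    result = 1
    for bit in bin(p)[2:]:    # left-to-right square-and-multiply
        result = barrett_reduce(result * result, q, k, mu)
        if bit == "1":
            result = barrett_reduce(2 * result, q, k, mu)
    return result == 1


print(divides_mersenne(11, 23), divides_mersenne(11, 89), divides_mersenne(11, 67))
# True True False
```

The precomputation of mu is done once per factor candidate, while the reduction is repeated for every squaring step, which is why trading the division for multiplications can pay off.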

What's next? I don't know.
Feel free to post some ideas for what I could implement next. Of course, posting an idea does [B]not[/B] guarantee that I'll implement it.

Things which won't happen in the short term, if ever:
- PrimeNet integration
- a GUI
- multicore support (CPU): if one CPU core isn't fast enough, just start another instance of mfaktc on another exponent in a separate directory

Oliver


All times are UTC.
