mersenneforum.org


Bdot 2011-06-07 13:24

mfakto: an OpenCL program for Mersenne prefactoring
 
This is an early announcement that I have ported parts of Oliver's (aka TheJudger) mfaktc to OpenCL.

Currently, I have only the Win64 binary, running an adapted version of Oliver's 71-bit-mul24 kernel. It is not yet optimized and does not yet make use of the vectors available in OpenCL. A very simple (and slow) 95-bit kernel is there as well, so that the complete selftest finishes successfully on my box.

On my HD5750 it runs about 60M/s in the 50M exponent range - certainly a lot of headroom :smile:
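
For readers new to the topic, the check that each factor candidate goes through can be sketched in a few lines of Python. This is only an illustration, not code from mfakto or mfaktc: the function names are made up, and the built-in `pow` stands in for the 71-bit or 95-bit arithmetic a GPU kernel has to implement by hand. It relies on two well-known facts about any prime factor q of 2^p - 1 with p an odd prime: q = 2kp + 1 for some k, and q ≡ ±1 (mod 8).

```python
def divides_mersenne(p: int, q: int) -> bool:
    """Return True if q divides 2**p - 1, i.e. 2**p ≡ 1 (mod q)."""
    return pow(2, p, q) == 1


def trial_factor(p: int, max_k: int):
    """Try candidates q = 2*k*p + 1 for k = 1..max_k.

    Returns the first q that divides 2**p - 1, or None if none is found.
    Hypothetical helper for illustration; p is assumed to be an odd prime.
    """
    for k in range(1, max_k + 1):
        q = 2 * k * p + 1
        # Prime factors of 2**p - 1 are always 1 or 7 modulo 8,
        # so other candidates can be skipped without any arithmetic.
        if q % 8 not in (1, 7):
            continue
        if divides_mersenne(p, q):
            return q
    return None
```

For example, `trial_factor(11, 10)` finds 23, since 2^11 - 1 = 2047 = 23 * 89. A rate such as 60M/s would then presumably count how many such candidates are tested per second.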

As I have only this one ATI GPU I wanted to see if anyone would be willing to help testing on different hardware.

Current requirements: OpenCL 1.1 (i.e. only ATI GPUs), Windows 64-bit.

There's still a lot of work until I may eventually release this to the public, but I'm optimistic for the summer.

Next steps (unordered):
[LIST]
[*]Linux port (is Windows 32-bit needed too?)
[*]check if [URL]http://mersenneforum.org/showpost.php?p=258140&postcount=7[/URL] can be used (looks like it's way faster)
[*]fast 92/95-bit kernels (Barrett)
[*]use of vector data types
[*]various other performance/optimization tests & enhancements
[*]of course, bug fixes :boxer:
[*]docs and licensing stuff :yucky:
[*]clarify if/how this new kid may contribute to PrimeNet
[/LIST]
Bdot
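
As background for the "Barrett" item in the list above: Barrett reduction replaces the division in `x mod q` by a multiplication with a precomputed constant plus a small correction. That is attractive on GPUs, where division is expensive. The Python sketch below shows the textbook idea only. It is not the planned 92/95-bit kernel, and all names (`barrett_setup`, `barrett_reduce`, `pow2_mod`) are hypothetical.

```python
def barrett_setup(q: int):
    """Precompute k = bit length of q and mu = floor(4**k / q)."""
    k = q.bit_length()
    mu = (1 << (2 * k)) // q
    return k, mu


def barrett_reduce(x: int, q: int, k: int, mu: int) -> int:
    """Return x mod q for 0 <= x < 4**k without dividing by q."""
    t = (x * mu) >> (2 * k)   # estimate of x // q, never too large
    r = x - t * q
    while r >= q:             # the estimate is at most 2 too small
        r -= q
    return r


def pow2_mod(p: int, q: int) -> int:
    """Compute 2**p mod q by left-to-right square-and-multiply."""
    k, mu = barrett_setup(q)
    r = 1
    for bit in bin(p)[2:]:
        r = barrett_reduce(r * r, q, k, mu)      # square
        if bit == "1":
            r = barrett_reduce(2 * r, q, k, mu)  # multiply by the base 2
    return r
```

Because q stays fixed for the whole exponentiation, `barrett_setup` is paid once per candidate and every squaring afterwards needs only multiplications, shifts, and subtractions.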

stefano.c 2011-06-07 14:40

Hi, I can run it on my hardware.
I've an AMD Radeon HD 6950 1GB, OpenCL 1.1, Windows 7 64-bit.

diep 2011-06-07 18:12

If you have written it in OpenCL, what do you need a Linux port for? OpenCL works on Linux, doesn't it?

Are you talking about the sieve?

Note that last week I wrote a CPU sieve that generates factor candidates (FCs). It's for Wagstaff, but it really takes only a 1-byte change to have it generate Mersenne FCs.

Its speed is bad though, and I don't see how I can make it much faster in C.

With an 80k prime base it generates 17M/s on a 2.3GHz Barcelona core.
With 5000 primes as the prime base it generates somewhere around 40M/s,
but that is not relevant, as 5000 is too few: the majority of what it generates is composite,
by more than a factor of 2.5 or so (I posted exact statistics on this somewhere).
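
To make the prime base trade-off concrete, here is a minimal Python sketch of such a factor-candidate sieve. It is not diep's code. The function names, the trial-division prime generator, and the `residues` parameter are assumptions made for the example. Candidates have the form q = 2kp + 1. For each small prime s in the prime base, the k values where s divides q form an arithmetic progression and are crossed off. A larger prime base removes more composites but costs more work per block, which is the trade-off described above.

```python
def small_primes(count: int):
    """First `count` odd primes by trial division (fine for a sketch)."""
    primes = []
    n = 3
    while len(primes) < count:
        if all(n % s for s in primes if s * s <= n):
            primes.append(n)
        n += 2
    return primes


def sieve_candidates(p: int, k_limit: int, base, residues=(1, 7)):
    """Return candidates q = 2*k*p + 1 (1 <= k < k_limit) that survive.

    p        : odd prime exponent
    base     : the "prime base", a list of small odd primes
    residues : allowed values of q mod 8, (1, 7) for Mersenne numbers
    """
    alive = bytearray([1]) * k_limit
    alive[0] = 0                        # k = 0 gives q = 1
    for s in base:
        if s == p:
            continue                    # q ≡ 1 (mod p), never divisible by p
        # Solve 2*k*p + 1 ≡ 0 (mod s) for the first k to cross off.
        k0 = (-pow(2 * p, -1, s)) % s
        for k in range(k0, k_limit, s):
            if 2 * k * p + 1 != s:      # keep q if it is the small prime itself
                alive[k] = 0
    return [2 * k * p + 1 for k in range(1, k_limit)
            if alive[k] and (2 * k * p + 1) % 8 in residues]
```

With `p = 11`, `k_limit = 10` and the first five odd primes as the prime base, the survivors are 23, 89 and 199. Passing `residues=(1, 3)` selects the residue classes that factors of 2^p + 1 must fall into, which is the Wagstaff case. The modular inverse via `pow(x, -1, s)` needs Python 3.8 or newer.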

The idea was to write this first to get some experience writing an FC sieve, and then write the sieve for the GPU.

17M/s is of course painfully slow for feeding a GPU, but there are no legal issues with this code, as I wrote it myself :)
If it would help you, I can put a GPL header on top of it. Would you mind sending me the OpenCL you wrote?
Then I can hook it up directly for Wagstaff here :)

P.S. Do you also test in the same manner as TheJudger, just multiplying zeros?

Vincent

diep 2011-06-07 18:28

Heh Bdot, what speed does your card run at?

My main setup here is an XFX HD6970 running OpenCL under Linux.
The machine is a 16-core 2.3GHz Opteron box (Barcelona) with 10GB RAM.

Total machine price including the GPU is 1300 euro.

Yes, it has no case; that would double the price of the machine!

Obviously it's easy to test on this.

What's your setup there?

Regards,
Vincent

vsuite 2011-06-07 20:40

Hi, would it work with an integrated ATI GPU?

How soon will there be a Windows 32-bit version?

Cheers

diep 2011-06-07 21:09

The GPU needs to be 4000 series or newer to work, I guess.
On the 5970 only one of the two GPUs is supported (very bad of AMD; I guess because otherwise this card would be the fastest GPU on planet Earth for GPGPU, especially in terms of price).

I do not know about OpenCL drivers on Windows; I am typing this from an OS X laptop, and the production machines here use Linux. Windows Server 2003 32-bit will not work, as they only have drivers for Vista and newer, not for server versions AFAIK.

There are a lot of horror reports regarding OpenCL on both Nvidia and AMD, especially Nvidia. But Nvidia has CUDA with TheJudger's fast code.

Christenson 2011-06-07 21:43

And Bdot's code is essentially TheJudger's, I imagine, except for OpenCL versus CUDA. I don't think the Linux port will be too bad, as mfaktc seems to be pretty much the same on either platform; there is no GUI yet.

I have an integrated ATI Radeon on my six-core AMD Linux64 box, too.

Do let's keep these codes in touch so we have a common effort on the non-factoring parts of the problem.

And diep, are these horrors for speed, for correctness, or for setting cards on fire? mfakto will be the first OpenCL app I have any contact with. I now have enough results from mfaktc that I can help test mfakto.

diep 2011-06-07 21:45

Very good that you have experience with mfackt. Have fun helping out this guy.

Note my codebase is called mfockt

vsuite 2011-06-07 22:51

[QUOTE=diep;263213]the gpu needs to be 4000 series or newer to work i guess.[/QUOTE] Awwww. Too bad. ATI Radeon 3000 Graphics

KingKurly 2011-06-08 02:09

Would the Radeon HD 5450 work for this, eventually? I am using Linux, so I would have to wait for that port to be ready.

[URL]http://www.amd.com/us/products/desktop/graphics/ati-radeon-hd-5000/hd-5450-overview/pages/hd-5450-overview.aspx#2[/URL]

Linux's 'lspci' says:
02:00.0 VGA compatible controller: ATI Technologies Inc Cedar PRO [Radeon HD 5450]

Christenson 2011-06-08 02:37

[QUOTE=diep;263217]Very good that you have experience with mfackt. Have fun helping out this guy.

Note my codebase is called mfockt[/QUOTE]

That's a terrible name, especially if it is good. Call it something like TF-noCPU, or CPUFreeTF. And stick a C in there for CUDA and an O in there for OpenCL.

*******
Actually, all of this is small enough to live within one code that goes out and finds out what is available if we want it to.
********

