mersenneforum.org  

2011-06-07, 13:24   #1
Bdot
Nov 2010
Germany

597₁₀ Posts
mfakto: an OpenCL program for Mersenne prefactoring

This is an early announcement that I have ported parts of Oliver's (aka TheJudger's) mfaktc to OpenCL.

Currently, I have only the Win64 binary, running an adapted version of Oliver's 71-bit mul24 kernel. It is not yet optimized and does not yet use the vector data types available in OpenCL. A very simple (and slow) 95-bit kernel is included as well, so the complete self-test finished successfully on my box.
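A rough sketch of the arithmetic a "mul24" kernel works with, assuming (as the kernel name suggests) that numbers of up to 71 bits are held in three 24-bit limbs, 72 bits in total. Each limb-by-limb product then fits in 48 bits, which is what makes 24-bit hardware multiplies usable. This is illustrative Python, not mfaktc's or mfakto's code; the helper names are hypothetical.

```python
LIMB_BITS = 24
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(x, n=3):
    """Split x into n little-endian 24-bit limbs (3 limbs hold 72 bits)."""
    if not 0 <= x < 1 << (LIMB_BITS * n):
        raise ValueError("value does not fit in the given number of limbs")
    return [(x >> (LIMB_BITS * i)) & LIMB_MASK for i in range(n)]


def from_limbs(limbs):
    """Reassemble an integer from little-endian 24-bit limbs."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def mul_limbs(a, b):
    """Schoolbook product of two limb vectors.

    Every a[i] * b[j] is below 2**48, so on 24-bit multiply hardware each
    partial product has to be produced as a low part and a high part.
    """
    acc = [0] * (len(a) + len(b))
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            acc[i + j] += ai * bj
    # Carry propagation brings every limb back below 2**24.
    carry = 0
    for idx in range(len(acc)):
        carry += acc[idx]
        acc[idx] = carry & LIMB_MASK
        carry >>= LIMB_BITS
    return acc
```

For any a, b below 2**72, `from_limbs(mul_limbs(to_limbs(a), to_limbs(b)))` equals `a * b`, held in six 24-bit limbs.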

On my HD5750 it runs at about 60M/s (factor candidates per second) in the 50M exponent range, so there is certainly a lot of headroom.
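For context on what is being tested at that rate: any factor q of M_p = 2^p - 1 (p prime) has the form q = 2kp + 1, and q divides M_p exactly when 2^p ≡ 1 (mod q). A minimal sketch of that check using left-to-right square-and-multiply, the same shape a GPU kernel follows with multi-word arithmetic instead of Python integers (the function name is hypothetical):

```python
def divides_mersenne(p, k):
    """Return True if q = 2*k*p + 1 divides M_p = 2**p - 1."""
    q = 2 * k * p + 1
    r = 1
    # Scan the exponent from its most significant bit: square at every step,
    # and double when the bit is set (the "multiply" step is just by 2).
    for bit in bin(p)[2:]:
        r = r * r % q
        if bit == "1":
            r = 2 * r % q
    return r == 1
```

For example, M_11 = 2047 = 23 × 89, and both 23 (k = 1) and 89 (k = 4) pass the check.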

As I have only this one ATI GPU, I wanted to see if anyone would be willing to help with testing on different hardware.

Current requirements: OpenCL 1.1 (i.e. only ATI GPUs), Windows 64-bit.

There's still a lot of work to do before I can release this to the public, but I'm optimistic about the summer.

Next steps (unordered):
  • Linux port (is Windows 32-bit needed too?)
  • check if http://mersenneforum.org/showpost.ph...40&postcount=7 can be used (it looks way faster)
  • fast 92/95-bit kernels (Barrett)
  • use of vector data types
  • various other performance/optimization tests & enhancements
  • of course, bug fixes
  • docs and licensing stuff
  • clarify if/how this new kid may contribute to PrimeNet
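Barrett reduction replaces the division in each modular step with multiplications and shifts by a constant that is precomputed once per modulus, which is why it suits fast wide kernels. A minimal textbook base-2 sketch, assuming x < 2^(2k) where k is the bit length of n; mfaktc's own Barrett kernels may be organized differently, and the function names are hypothetical:

```python
def barrett_setup(n):
    """Precompute k (bit length of n) and mu = floor(4**k / n) once per modulus."""
    k = n.bit_length()
    mu = (1 << (2 * k)) // n
    return k, mu


def barrett_reduce(x, n, k, mu):
    """Return x mod n for 0 <= x < 2**(2*k), using shifts and multiplies only."""
    q = ((x >> (k - 1)) * mu) >> (k + 1)  # estimate of x // n, never too large
    r = x - q * n
    while r >= n:  # the estimate is short by at most 2
        r -= n
    return r
```

In trial factoring, n would be the factor candidate q, so mu is computed once per candidate and reused for every squaring step.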
Bdot
2011-06-07, 14:40   #2
stefano.c
 
Aug 2010
sigma gimps team (Italia)

112 Posts

Hi, I can run it on my hardware.
I have an AMD Radeon HD 6950 1GB, OpenCL 1.1, Windows 7 64-bit.
2011-06-07, 18:12   #3
diep
Sep 2006
The Netherlands

2·337 Posts

Quote:
Originally Posted by Bdot
  • Linux port (Is Windows 32-bit needed too?)
If you have written it in OpenCL, why do you need a Linux port? OpenCL works on Linux, doesn't it?

Or do you mean the sieve?

Note that last week I wrote a sieve for the CPU that generates FCs (factor candidates). It's for Wagstaff numbers, but changing just 1 byte makes it generate Mersenne FCs.

Its speed is bad, though, and I don't see how to make it a lot faster in C.

With an 80k prime base it generates 17M/s on a 2.3GHz Barcelona core.
With 5000 primes as the prime base it generates somewhere around 40M/s,
but that is not relevant, as 5000 primes is too few: the majority of what it generates is composite,
over a factor of 2.5 or so (I posted exact statistics on this somewhere).
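A minimal sketch of the kind of factor-candidate sieve described above, written for Mersenne candidates (this is not diep's code, which was not posted). For each small prime s in the prime base, q = 2kp + 1 is divisible by s exactly when k ≡ -(2p)^(-1) (mod s), so one residue class of k per prime can be crossed off; survivors are further restricted to q ≡ ±1 (mod 8), which holds for every factor of M_p. Function names and parameters are hypothetical, and `pow(x, -1, s)` needs Python 3.8 or newer.

```python
def small_primes(limit):
    """Primes below limit, via a plain sieve of Eratosthenes."""
    flags = bytearray([1]) * limit
    flags[:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if flags[i]:
            flags[i * i::i] = bytes(len(range(i * i, limit, i)))
    return [i for i, f in enumerate(flags) if f]


def mersenne_candidates(p, k_start, count, prime_base):
    """Yield k in [k_start, k_start + count) whose q = 2*k*p + 1 survives.

    Assumes every q in the range exceeds the largest sieving prime; otherwise
    a small prime q would cross itself off.
    """
    alive = bytearray([1]) * count
    for s in prime_base:
        if s == 2 or s == p:
            continue  # q is always odd, and q = 2kp + 1 is never a multiple of p
        # s divides q  <=>  k = -(2p)^(-1) (mod s)
        k0 = -pow(2 * p, -1, s) % s
        first = (k0 - k_start) % s
        alive[first::s] = bytes(len(range(first, count, s)))
    for i in range(count):
        q = 2 * (k_start + i) * p + 1
        if alive[i] and q % 8 in (1, 7):  # factors of M_p are +-1 mod 8
            yield k_start + i
```

A larger prime base removes more composite candidates per survivor but costs more sieving work per k, which is the trade-off behind the 5000-prime versus 80k-prime numbers above.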

The idea was to write this first to get some experience writing an FC sieve, and then write the sieve for the GPU.

17M/s is far too slow to feed a GPU, of course, yet there are no legal issues with this code, as I wrote it myself :)
If it would help you, I can put a GPL header on top of it. Would you mind sending me the OpenCL code you wrote?
Then I can connect it directly to Wagstaff here :)

P.S. Do you also test in the same manner as TheJudger, just multiplying zeros?

Vincent

Last fiddled with by diep on 2011-06-07 at 18:16
2011-06-07, 18:28   #4
diep
Sep 2006
The Netherlands

2·337 Posts

Heh, Bdot, what speed does your card run at?

My main setup here is an XFX HD6970 running OpenCL under Linux.
The machine is a 16-core 2.3GHz Opteron box (Barcelona) with 10GB RAM.

The total machine price, including the GPU, is 1300 euros.

Yes, it has no case; that would double the price of the machine!

Obviously it's easy to test on this.

What's your setup there?

Regards,
Vincent

Last fiddled with by diep on 2011-06-07 at 18:34
2011-06-07, 20:40   #5
vsuite
 
Jan 2010

2×3×19 Posts

Hi, would it work with an integrated ATI GPU?

How soon will the Windows 32-bit version be available?

Cheers
2011-06-07, 21:09   #6
diep
Sep 2006
The Netherlands

2A2₁₆ Posts

Quote:
Originally Posted by vsuite
Hi, would it work with an integrated ATI GPU?

How soon the windows 32bit version?

Cheers
The GPU needs to be 4000 series or newer to work, I guess.
The 5970 is only supported as 1 GPU, not 2 (very bad of AMD; I guess that's because otherwise this would be the fastest GPU on planet earth for GPGPU, especially from a price point of view).

I do not know about OpenCL drivers in Windows; I am typing this from an OS X laptop, and the production machines here use Linux. Windows 2003 Server 32-bit will not work, as they only have drivers for Vista and newer, not for server versions AFAIK.

There are a lot of horror reports regarding OpenCL on both Nvidia and AMD, especially Nvidia. Yet Nvidia has CUDA, with TheJudger's fast code.

Last fiddled with by diep on 2011-06-07 at 21:10
2011-06-07, 21:43   #7
Christenson
Dec 2010
Monticello

5×359 Posts

And Bdot's code is essentially TheJudger's, I imagine, except for OpenCL versus CUDA. I don't think the Linux port will be too bad, as mfaktc seems to be pretty much the same on either platform; there is no GUI yet.

I have an integrated ATI Radeon on my six-core AMD Linux64 box, too.

Let's keep these codes in touch so that we have a common effort on the non-factoring parts of the problem.

And diep, are these horrors about speed, correctness, or setting cards on fire? mfakto will be the first OpenCL app I have any contact with. I now have enough results from mfaktc that I can help test mfakto.
2011-06-07, 21:45   #8
diep
Sep 2006
The Netherlands

2×337 Posts

Quote:
Originally Posted by Christenson
And diep, are these horrors for speed, for correctness, or for setting cards on fire? mfakto will be the first OpenCL app I have any contact with. I now have enough results from mfaktc that I can help test mfakto.
Very good that you have experience with mfaktc. Have fun helping out this guy.

Note: my codebase is called mfockt.
2011-06-07, 22:51   #9
vsuite
 
Jan 2010

114₁₀ Posts

Quote:
Originally Posted by diep
the gpu needs to be 4000 series or newer to work i guess.
Awwww. Too bad. Mine is an ATI Radeon 3000 Graphics.

Last fiddled with by vsuite on 2011-06-07 at 22:51
2011-06-08, 02:09   #10
KingKurly
Sep 2010
Annapolis, MD, USA

3³×7 Posts

Would the Radeon HD 5450 work for this, eventually? I am using Linux, so I would have to wait for that port to be ready.

http://www.amd.com/us/products/deskt...verview.aspx#2

Linux's 'lspci' says:
02:00.0 VGA compatible controller: ATI Technologies Inc Cedar PRO [Radeon HD 5450]
2011-06-08, 02:37   #11
Christenson
Dec 2010
Monticello

5·359 Posts

Quote:
Originally Posted by diep
Very good that you have experience with mfaktc. Have fun helping out this guy.

Note: my codebase is called mfockt.
That's a terrible name, especially if the code is good. Call it something like TF-noCPU or CPUFreeTF, and stick a C in there for CUDA and an O in there for OpenCL.

*******
Actually, all of this is small enough to live within one program that finds out at runtime what hardware is available, if we want it to.
********
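A minimal sketch of that single-program idea, assuming hypothetical probe callables (for example, thin wrappers around a CUDA or OpenCL device query) that report whether each backend is usable; the backend names and the preference order are illustrative only:

```python
def pick_backend(probes, preference=("cuda", "opencl", "cpu")):
    """Return the first backend in preference order whose probe says it is usable.

    probes maps a backend name to a zero-argument callable returning a bool.
    A probe that raises (e.g. because a driver is missing) counts as
    "not available". Returns None if nothing is usable.
    """
    for name in preference:
        probe = probes.get(name)
        if probe is None:
            continue
        try:
            if probe():
                return name
        except Exception:
            continue
    return None
```

With one probe per backend, a single binary can fall back from CUDA to OpenCL to a CPU path without separate builds for each kind of card.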

All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.