mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing
Old 2013-09-25, 14:35   #177
kracker ("Mr. Meeseeks")
Hate to break it to you, but LL tests are very unscalable, and a brute-force approach will not work. And I can just imagine the latency of LAN versus PCI-E... LL tests aren't something distributable (e.g. half of the work on one GPU and half on another): each iteration depends on the previous one.
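kracker's point about each iteration depending on the last is the heart of the matter. The Lucas-Lehmer recurrence can be sketched in a few lines; this is a toy Python version for tiny exponents, not how CUDALucas/clLucas do their FFT-based arithmetic, but it makes the serial dependence obvious:

```python
# Lucas-Lehmer test for Mersenne numbers M_p = 2^p - 1 (p an odd prime).
# Each step s = (s*s - 2) mod M_p consumes the previous s, so the
# iteration chain cannot be split across two GPUs.
def lucas_lehmer(p):
    m = (1 << p) - 1          # the Mersenne number M_p
    s = 4                     # standard starting value
    for _ in range(p - 2):    # p - 2 strictly sequential squarings
        s = (s * s - 2) % m
    return s == 0             # M_p is prime iff the residue is 0

# Known Mersenne-prime exponents in this range: 3, 5, 7, 13, 17, 19, 31
print([p for p in (3, 5, 7, 11, 13, 17, 19, 23, 31) if lucas_lehmer(p)])
# → [3, 5, 7, 13, 17, 19, 31]
```

What the real programs parallelize is the big-integer squaring *within* one iteration (via an FFT), which is why multi-GPU work would have to split a single FFT across cards rather than splitting the iterations.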
Old 2013-09-25, 14:36   #178
Robish ("Rob Gahan")
Quote:
Originally Posted by Karl M Johnson View Post
Multi-GPU support should be added first, then.
It's not an easy undertaking, even if a single thread controls all the GPUs.
It makes more sense to add multi-GPU support to CUDALucas/clLucas, since it would ideally cut the LL test time by the number of GPUs in the system (assuming identical GPUs, versus a single GPU).
Once that is done, merging several rigs and assigning a single exponent to them could be considered.

That's my thinking exactly. I agree it would probably be difficult (way beyond me), but Atom from OCLHashcat has done something similar, so there's a starting point for an approach, and the benefits could be enormous.
Old 2013-09-25, 14:39   #179
Robish ("Rob Gahan")
Quote:
Originally Posted by kracker View Post
Hate to break it to you, but LL tests are very unscalable, and a brute-force approach will not work. And I can just imagine the latency of LAN versus PCI-E... LL tests aren't something distributable (e.g. half of the work on one GPU and half on another): each iteration depends on the previous one.

Point taken, I just thought the discussion was worth exploring.
Old 2013-09-25, 14:41   #180
VictordeHolland ("Victor de Hollander")
It's not all about computing power; memory bandwidth also plays a big role. PCI-E is quite fast, but not as fast as the GPU's own memory. Connecting more than 3 GPUs would also require very expensive motherboards and InfiniBand (or similar). Running one task on each GPU is probably going to be much more efficient and less expensive.
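The bandwidth gap can be made concrete with rough, back-of-envelope arithmetic. The figures below are assumed, era-appropriate estimates (a Radeon HD 7970's GDDR5, PCIe 3.0 x16, and a 10 Gb/s InfiniBand link), not measurements:

```python
# Back-of-envelope bandwidth comparison (assumed figures, GB/s):
gpu_mem   = 264.0   # HD 7970 on-card GDDR5 memory bandwidth
pcie3_x16 = 15.75   # PCIe 3.0 x16 host <-> card link
ib_10g    = 1.25    # 10 Gb/s InfiniBand between rigs (10 / 8)

print(f"PCIe 3.0 x16 is ~{gpu_mem / pcie3_x16:.0f}x slower than GPU memory")
print(f"10 Gb InfiniBand is ~{gpu_mem / ib_10g:.0f}x slower than GPU memory")
# → PCIe 3.0 x16 is ~17x slower than GPU memory
# → 10 Gb InfiniBand is ~211x slower than GPU memory
```

Any scheme that ships intermediate FFT data off-card every iteration has to pay one of those penalties on every one of the millions of iterations in an LL test.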
Old 2013-09-25, 19:25   #181
Robish ("Rob Gahan")
Quote:
Originally Posted by Robish View Post
Point taken, I just thought the discussion was worth exploring.

It's just that when you read articles like this, you wonder if things can advance dramatically.

http://arstechnica.com/security/2012...rd-in-6-hours/

"350 billion guesses per second"... I mean, WOW, that's powerful. But I take kracker's comments on board. I don't know enough about LL to even argue ;-)
Old 2013-09-25, 19:30   #182
kracker ("Mr. Meeseeks")
Quote:
Originally Posted by Robish View Post
It's just that when you read articles like this, you wonder if things can advance dramatically.

http://arstechnica.com/security/2012...rd-in-6-hours/

"350 billion guesses per second"... I mean, WOW, that's powerful. But I take kracker's comments on board. I don't know enough about LL to even argue ;-)

Well, it's not like LL tests take a humongous amount of time to do these days, anyway.

(Wish I had a GPU setup like that)
Old 2013-09-25, 19:31   #183
Robish ("Rob Gahan")
Quote:
Originally Posted by VictordeHolland View Post
It's not all about computing power; memory bandwidth also plays a big role. PCI-E is quite fast, but not as fast as the GPU's own memory. Connecting more than 3 GPUs would also require very expensive motherboards and InfiniBand (or similar). Running one task on each GPU is probably going to be much more efficient and less expensive.

You're right. InfiniBand is what Gosney uses on his rig: 10 Gb/sec. However, he says he bought all the cards, connectors and the switch for $800 on eBay (used), less than the price of a single high-end GPU.

Okay, I'll shut up about it now, just thought it worth mentioning. :-)
Old 2013-09-25, 19:36   #184
Robish ("Rob Gahan")
Quote:
Originally Posted by kracker View Post
Well, it's not like LL tests take a humongous amount of time to do these days, anyway.

(Wish I had a GPU setup like that)

:-) I'm only a newbie here, kracker. Did it take longer a few years ago? I suppose it had to have. Older chips and such.

(Trying to build a smaller-scale one.) Learning Linux is tough (no Windows support), but I'm getting there :-)
Old 2013-09-25, 21:53   #185
Robish ("Rob Gahan")
Quote:
Originally Posted by Robish View Post
:-) I'm only a newbie here, kracker. Did it take longer a few years ago? I suppose it had to have. Older chips and such.

(Trying to build a smaller-scale one.) Learning Linux is tough (no Windows support), but I'm getting there :-)

I'm sorry, did I get distracted?... Breaking Bad ROCKS!!!! :-) Hic, ahem, sorry, eh, er, ahem, cough... bit drunk right now, hmm, ah, I'll be OK tomorrow and do more LLs :-)

PS: prime searching is fun, is it not? Whiskey, however... more. ;-)
Old 2013-09-26, 02:13   #186
kracker ("Mr. Meeseeks")
@msft: by the way, is there a "worktodo.txt" in clLucas right now?
Old 2013-09-26, 06:59   #187
Karl M Johnson
Quote:
Originally Posted by kracker View Post
Hate to break it to you, but LL tests are very unscalable... LL tests aren't something distributable (e.g. half of the work on one GPU and half on another): each iteration depends on the previous one.

Indeed.
Mlucas' parallel code doesn't scale well beyond 4 cores, but going beyond 4 still brings some speed increase.
There will be no ideal 300% speed increase for, say, 4 GTX Titans or 4 7970s, but the status of a single candidate Mersenne number would surely be known sooner (versus a single GPU).
Besides, several gaming GPUs inside a single machine are far easier to come by than several server-class CPUs.
The idea has a chance to live.
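Karl's "no ideal 300% increase, but still sooner" expectation is essentially Amdahl's law: the serial part of each iteration (inter-GPU transposes, PCIe traffic) caps the speedup. A small sketch, using a purely hypothetical 90% parallel fraction for illustration:

```python
# Amdahl's-law sketch: if a fraction f of each LL iteration can run in
# parallel across n GPUs and the rest (1 - f) stays serial, then
# speedup(n) = 1 / ((1 - f) + f / n).
def speedup(f, n):
    return 1.0 / ((1.0 - f) + f / n)

# Hypothetical f = 0.90; a 4-GPU rig gets ~3.1x, not the ideal 4x.
for n in (1, 2, 4):
    print(n, round(speedup(0.90, n), 2))
# → 1 1.0
# → 2 1.82
# → 4 3.08
```

Even a sub-ideal 3x still means one exponent is settled in a third of the time, which is Karl's point about knowing the status of a single candidate sooner.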