mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   LL with OpenCL (https://www.mersenneforum.org/showthread.php?t=18297)

kracker 2013-09-23 14:26

[QUOTE=LaurV;353858]
it would be a 4THzD difference, which might take time (talking about DC, not about TF where thousands of GHzD/D are possible), even if I work full power and you are sleeping...
[/QUOTE]

That's what I mean. I'm not planning on stopping anytime soon! :razz: Also, should be in top 2 in DC there soon.

[QUOTE=LaurV;353858]
But I just looked to my other stats, which are the GIMPS' Lifetime, and thinking that you may be talking about this:
[ATTACH]10294[/ATTACH]
which should be a trifle, just few days of full throttle to put you far behind... :razz:

Still thinking about it... But for me now it seems to be more important to do some P-1, because I will be soon pushed out of Lifetime's Top100, where I am trying to stay, once I was able to reach it... hehe. So, my cards will do some P-1 for the time being. Let you get some more advance, hehe ... you know what the problem with roosters is?
[/QUOTE]
Fine, so be it! I was really curious about your power and whether it was really [SIZE=1]true[/SIZE]...
[quote]
Joking apart, I really like the new clLucas! I certainly have to play more with it! I think you guys did a wonderful job! kotgw and kudos![/quote]I didn't do anything, msft did everything. :smile:
Also, the 32-bit build should be up in a few days at /clLucas.

kracker 2013-09-23 14:42

@Robish
+1. I would suggest doing a few DCs just to make sure.

[quote]
(quote from the web, I spent some time to search for this old joke, I can never tell it properly in English, but I use to tell it to every young engineer who come to work for the company, when he hits the upper threshold of the door's frame with his head... :razz:, buddy, you are the young rooster here!)
[/quote]

Challenge accepted.

kracker 2013-09-23 23:55

The x86 (32-bit) build is up.

Robish 2013-09-24 10:44

[QUOTE=Bdot;353810]For me (rather for a 7850), -threads 64 was fastest. Slightly behind was 128, and a lot slower: 256.[/QUOTE]

Thanks, mine's a 7870, so they should behave similarly. I'll give it a shot. Thanks, Bdot.

kracker 2013-09-24 15:19

Thanks to Xyzzy, clLucas is hosted here:
[URL]http://www.mersenneforum.org/cllucas/[/URL]

Robish 2013-09-24 21:56

-aggressive
 
[QUOTE=kracker;353809]@Robish: Try -aggressive if that does anything for you.

(6 hours till my second DC completes :smile:)[/QUOTE]

Hey Kracker

-aggressive = roughly 11% saving (16.29 down to 14.48 ms/iter), i.e. the ETA drops from 284 to 252 hours


Cool. Nice one. Thanks. ;-)

clLucas_x64_1.01 62868347 -f 4194304 -threads 256
Platform :Advanced Micro Devices, Inc.
Device 0 : Pitcairn


start M62868347 fft length = 4194304
Iteration 10000 M( 62868347 )C, 0x2fead152a6afa7d8, n = 4194304, clLucas v1.01 err = 0.002441 (2:42 real, 16.2905 ms/iter, ETA 284:24:15)
^C caught. Writing checkpoint.

clLucas_x64_1.01 62868347 -f 4194304 -threads 256 -aggressive
Platform :Advanced Micro Devices, Inc.
Device 0 : Pitcairn


start M62868347 fft length = 4194304
Iteration 10000 M( 62868347 )C, 0x2fead152a6afa7d8, n = 4194304, clLucas v1.01 err = 0.002441 (2:25 real, 14.4776 ms/iter, ETA 252:45:15)
^C caught. Writing checkpoint.
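
For reference, the saving can be checked directly from the two ms/iter figures in the logs above. This is only a minimal Python sketch: the function names are made up for illustration, and the ETA estimate assumes a Lucas-Lehmer test of M(p) needs p - 2 iterations at a constant rate, so it will only roughly match what clLucas prints.

```python
def saving(before_ms, after_ms):
    """Fractional reduction in time per iteration."""
    return 1.0 - after_ms / before_ms


def eta_hours(exponent, done, ms_per_iter):
    """Rough hours left, assuming p - 2 total iterations at a constant rate."""
    remaining = (exponent - 2) - done
    return remaining * ms_per_iter / 1000.0 / 3600.0


if __name__ == "__main__":
    p = 62868347
    plain, aggressive = 16.2905, 14.4776  # ms/iter from the two runs above
    print(f"saving: {saving(plain, aggressive):.1%}")
    print(f"ETA plain:      {eta_hours(p, 10000, plain):.1f} h")
    print(f"ETA aggressive: {eta_hours(p, 10000, aggressive):.1f} h")
```

The checksum after 10000 iterations (0x2fead152a6afa7d8) is identical in both runs, so at least on this exponent the flag changes the speed and not the result.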

Robish 2013-09-24 23:39

VCL
 
[QUOTE=msft;353648]Hi,
Thank you observations.[/QUOTE]


Hi msft

Have you any thoughts on distributing the code over several GPUs?
OK, I may be jumping ahead, but I've been following VCL ([URL="http://www.mosix.com"]www.mosix.com[/URL]) for about a year now. It allows massive computation power over TCP/IP but makes the local PC think all the GPUs are local within the PC.

Epixoip's video and guides

[URL]http://www.youtube.com/watch?v=3axK5P8xw-E[/URL]

[URL]http://hashcat.net/wiki/doku.php?id=linux_server_howto[/URL]

[URL]http://hashcat.net/wiki/doku.php?id=vcl_cluster_howto[/URL]



Food for thought?

VBCurtis 2013-09-25 03:09

[QUOTE=Robish;354063]Hi msft

Have you any thoughts on distributing the code over several GPUs?
OK, I may be jumping ahead, but I've been following VCL ([URL="http://www.mosix.com"]www.mosix.com[/URL]) for about a year now. It allows massive computation power over TCP/IP but makes the local PC think all the GPUs are local within the PC.

Epixoip's video and guides

[URL]http://www.youtube.com/watch?v=3axK5P8xw-E[/URL]

[URL]http://hashcat.net/wiki/doku.php?id=linux_server_howto[/URL]

[URL]http://hashcat.net/wiki/doku.php?id=vcl_cluster_howto[/URL]



Food for thought?[/QUOTE]

How would this have any advantage over running one test on each GPU? If we're after maximum workrate, why pay the networking cost?

Karl M Johnson 2013-09-25 10:35

Multi-GPU support should be added first, then.
It's not an easy undertaking, even if a single thread controls all the GPUs.
It makes more sense to add multi-GPU support to CUDALucas/clLucas, since it would ideally cut the LL test time by a factor equal to the number of GPUs in the system (assuming identical GPUs, compared with a single GPU).
Once that is done, merging several rigs and assigning a single exponent to them could be considered.
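
The trade-off between the two approaches discussed above (one test per GPU versus one test spread over all GPUs) can be put into numbers with a toy model. This is a sketch under stated assumptions, not a measurement: the function names are hypothetical, the 252-hour figure is rounded from the ETA posted earlier in the thread, and the parallel efficiency of 0.7 is made up purely for illustration.

```python
def independent(n_gpus, hours_per_test):
    """One LL test per GPU: (hours until the first result, tests per hour)."""
    return hours_per_test, n_gpus / hours_per_test


def shared(n_gpus, hours_per_test, efficiency):
    """One LL test split over all GPUs with parallel efficiency in (0, 1]."""
    hours = hours_per_test / (n_gpus * efficiency)
    return hours, 1.0 / hours


if __name__ == "__main__":
    n, t = 4, 252.0  # 4 identical GPUs, ~252 h per test on one GPU
    for label, (hours, rate) in [
        ("independent", independent(n, t)),
        ("shared, ideal", shared(n, t, 1.0)),
        ("shared, eff=0.7 (assumed)", shared(n, t, 0.7)),
    ]:
        print(f"{label:28s} first result after {hours:6.1f} h, {rate:.5f} tests/h")
```

In this model, splitting one test only shortens the wait for a single result. Total throughput matches the one-test-per-GPU setup in the ideal case and falls below it as soon as the efficiency is under 1.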

Robish 2013-09-25 14:08

[QUOTE=VBCurtis;354078]How would this have any advantage over running one test on each GPU? If we're after maximum workrate, why pay the networking cost?[/QUOTE]

I'm not an expert in the area by any means, but oclHashcat divides the "work" and spreads the load evenly over all GPUs, i.e. the job is done quicker and there is only one queue to maintain.

Add VCL and you get up to 128 GPU cores all working together in unison, i.e. a supercomputer, potentially completing LL tasks in minutes or seconds instead of days.
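
For context on what would have to be divided, here is a minimal sketch of the Lucas-Lehmer iteration in plain Python. It is not how clLucas implements it (the logs earlier in the thread show it working with an FFT of a given length). It uses Python's built-in big integers and is only practical for small exponents.

```python
def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime, for an odd prime p."""
    m = (1 << p) - 1          # the Mersenne number M(p)
    s = 4
    for _ in range(p - 2):    # p - 2 squarings, one after another
        s = (s * s - 2) % m   # each step needs the previous value of s
    return s == 0


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13, 17, 19, 23, 31):
        print(p, lucas_lehmer(p))
```

Each iteration needs the result of the one before it, so extra GPUs cannot take separate iterations the way a password cracker hands out separate candidates. They would have to cooperate inside every single squaring.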

Robish 2013-09-25 14:20

[QUOTE=Robish;354106]I'm not an expert in the area by any means, but oclHashcat divides the "work" and spreads the load evenly over all GPUs, i.e. the job is done quicker and there is only one queue to maintain.

Add VCL and you get up to 128 GPU cores all working together in unison, i.e. a supercomputer, potentially completing LL tasks in minutes or seconds instead of days.[/QUOTE]

In addition, a single HD 7990 has 8 teraflops of single-precision compute power and 2 teraflops of double precision.
If someone had the money to build a cluster of 64 x 7990 (2 GPUs each) with VCL, they could achieve 128 teraflops (double precision) of compute power (almost equal to that of all current GIMPS volunteers)....
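
The arithmetic behind those totals, as a small Python sketch. The per-card figures are the ones quoted in this post, the function name is made up for illustration, and the result is a peak-rate sum that says nothing about how well one LL test would scale over a network.

```python
def cluster_totals(cards, gpus_per_card, sp_tflops_per_card, dp_tflops_per_card):
    """Return (GPU count, peak single-precision TFLOPS, peak double-precision TFLOPS)."""
    return (
        cards * gpus_per_card,
        cards * sp_tflops_per_card,
        cards * dp_tflops_per_card,
    )


if __name__ == "__main__":
    gpus, sp, dp = cluster_totals(64, 2, 8.0, 2.0)
    print(f"{gpus} GPUs, {sp:.0f} TFLOPS single precision, {dp:.0f} TFLOPS double precision")
```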

