mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing
Old 2013-01-11, 17:00   #2091
ixfd64
Bemusing Prompter

"Danny"
Dec 2002
California

4533₈ Posts

Quote:
Originally Posted by TheJudger View Post
Hi,



so you managed to edit the mfaktc.ini, but did you read it?
Code:
# Keep in mind that "number of candidates (M/G)" and "rate (M/s)" are NOT
# comparable between CPU- and GPU-sieving. When sieving is done on the GPU,
# those numbers count all factor candidates prior to sieving, while CPU
# sieving counts the numbers after the sieving process.
#
I'm aware of that; I just didn't think it would make such a big difference. :o
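To make the difference concrete, here's a toy Python sketch (the 90% sieve fraction is an invented illustration, not mfaktc's actual sieve efficiency):

```python
# Toy illustration: why a GPU-sieving rate and a CPU-sieving rate are
# not comparable. The 90% sieve fraction below is an invented example,
# not mfaktc's actual sieve efficiency.

def effective_rate(reported_m_per_s, sieve_removes, counted_before_sieve):
    """Rate of candidates that actually reach the trial-factoring kernel.

    GPU sieving reports all candidates *before* sieving, so its number
    must be scaled down by the surviving fraction; CPU sieving already
    reports candidates *after* sieving.
    """
    if counted_before_sieve:  # GPU-sieving style count
        return reported_m_per_s * (1.0 - sieve_removes)
    return reported_m_per_s   # CPU-sieving style count

# Same real work, very different headline numbers:
print(effective_rate(1000.0, 0.90, True))   # GPU-reported 1000 M/s -> ~100 M/s effective
print(effective_rate(100.0, 0.90, False))   # CPU-reported 100 M/s -> 100 M/s effective
```

So a GPU-sieving rate ten times the CPU-sieving rate can describe exactly the same amount of useful work.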
Old 2013-01-11, 17:20   #2092
Dubslow
Basketry That Evening!

"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

16065₈ Posts

Quote:
Originally Posted by swl551 View Post
CudaLucas will NOT run at the high OC rates I ran 0.19 on. I learned that instantly with CuLu. The answers regarding CuLu's sensitivity to GPU execution errors made sense. No one has stated that 0.20 has similar constraints. Maybe that is what we are uncovering here.
Well, that's what he said: 0.20 stresses memory where 0.19 does not. CUDALucas' sensitivity is in the memory, and that's what's different between the two versions, so if CUDALucas fails at those higher clocks, then there's the issue.

Thanks for that information TheJudger -- very good to know.
Old 2013-01-11, 17:38   #2093
swl551

Aug 2012
New Hampshire

23·101 Posts

Quote:
Originally Posted by Dubslow View Post
Well, that's what he said: 0.20 stresses memory where 0.19 does not. CUDALucas' sensitivity is in the memory, and that's what's different between the two versions, so if CUDALucas fails at those higher clocks, then there's the issue.

Thanks for that information TheJudger -- very good to know.
Yes, I am agreeing with the scenario.
Old 2013-01-11, 21:09   #2094
TheJudger

"Oliver"
Mar 2005
Germany

11·101 Posts

Scott,

I'm pretty sure that you had problems with mfaktc 0.19 at your OC clock/voltage, too. But I *guess* that errors in mfaktc have a very high chance of staying silent:
  • once started, there is no memory allocation; everything is static after startup
  • it reads 12 bytes per factor candidate from memory and then runs in registers
  • only in very, very rare cases is data written to memory (only when a factor was found); billions of FCs can pass with no data written to memory

The selftest usually won't catch this: each test case typically tests 10-15M FCs, but at the end it only checks whether the known factor was found or not; the other millions of results are not verified. The selftest also usually doesn't stress the GPU very hard.
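A toy back-of-envelope in Python of why checking only the known factor misses almost everything (the result counts and error counts below are illustrative assumptions, not measurements from mfaktc):

```python
# Toy model: a selftest case computes n_results factor-candidate results
# but verifies only ONE of them (the known factor). If n_wrong results
# are silently corrupted, the selftest only notices when one of those
# bad results happens to be the single verified one.
# The numbers used below are illustrative assumptions.

def p_selftest_catches(n_results, n_wrong):
    """Probability that at least one of n_wrong bad results lands on the
    single verified result, assuming errors hit results uniformly."""
    return 1.0 - (1.0 - 1.0 / n_results) ** n_wrong

# 10M results per test case, 5 silently corrupted results:
print(p_selftest_catches(10_000_000, 5))  # ~5e-7: essentially never caught
```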

Oliver
Old 2013-01-11, 21:20   #2095
swl551

Aug 2012
New Hampshire

23×101 Posts

Quote:
Originally Posted by TheJudger View Post
Scott,


I'm pretty sure that you had problems with mfaktc 0.19 at your OC clock/voltage, too. But I *guess* that errors in mfaktc have a very high chance of staying silent:
  • once started, there is no memory allocation; everything is static after startup
  • it reads 12 bytes per factor candidate from memory and then runs in registers
  • only in very, very rare cases is data written to memory (only when a factor was found); billions of FCs can pass with no data written to memory
The selftest usually won't catch this: each test case typically tests 10-15M FCs, but at the end it only checks whether the known factor was found or not; the other millions of results are not verified. The selftest also usually doesn't stress the GPU very hard.

Oliver
Again, thank you. I no longer believe there is a code defect causing the problem.
Old 2013-01-12, 12:09   #2096
TheJudger

"Oliver"
Mar 2005
Germany

11×101 Posts

Hi James,

Quote:
Originally Posted by James Heinrich View Post
I'm just curious if you can quantify "horrible"? Presumably it works, but is slower than CPU-sieving even on a slow CPU? By how much?
Currently I can't give exact numbers because my GTX 275 is retired (a new GTX 680 went into my main rig, and the GTX 470 moved to the secondary rig, replacing the GTX 275). For mfaktc the GTX 680 is not a very smart decision*1: it is the same speed as the GTX 470 (though with less electrical energy and noise). But its main purpose is gaming, and for that the 680 is not the worst decision.
From memory, GPU sieving on the GTX 275 ran at about half the speed of the GTX 275 paired with one i7 (Nehalem series) core @ 3.5GHz doing CPU sieving. The CPU kept the GPU busy easily.
I want to set up a test rig with the 275; I can provide exact numbers someday.

*1 I now have permanent access to a Kepler-based GPU, so perhaps I can tweak the code a little bit, but this is no promise.

Oliver
Old 2013-01-12, 12:31   #2097
James Heinrich

"James Heinrich"
May 2004
ex-Northern Ontario

11×311 Posts

Quote:
Originally Posted by TheJudger View Post
For mfaktc the GTX 680 is not a very smart decision*1
I thought I remembered you having a GTX 680... can you run a benchmark and submit it on my site, please?
I currently don't have any data on how CC 3.0 performs on v0.20-32 with GPU sieving. I've updated the chart to reflect the new performance level of CC 2.0 and 2.1 (thanks to everyone who submitted benchmarks!), and performance is much more consistent than it was with CPU-sieving before. But nobody with a CC 3.0 card has submitted a benchmark yet.
So until someone does (preferably several someones), the relative performance of all CC 3.0 GPUs (e.g. GTX 6xx) is probably inaccurate.
Old 2013-01-12, 14:47   #2098
TheJudger

"Oliver"
Mar 2005
Germany

11·101 Posts

Hi James,
  • stock GTX 680 (1008MHz, avg. Boost 1058MHz, actual clock around 1080MHz (this is no OC!))
  • M66362159 from 2^70 to 2^71
  • mfaktc 0.20, Win32, default settings

Jan 12 15:21 | 4617 100.0% | 1.085 0m00s | 298.90 82485 n.a.%
no factor for M66362159 from 2^70 to 2^71 [mfaktc 0.20 barrett76_mul32_gs]
tf(): total time spent: 17m 21.457s
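As a quick sanity check on those numbers (plain arithmetic, nothing mfaktc-specific):

```python
# Sanity check: at the reported 298.90 GHz-days/day, a run that takes
# 17m 21.457s corresponds to roughly 3.6 GHz-days of work.

rate_ghzd_per_day = 298.90            # from the status line above
elapsed_s = 17 * 60 + 21.457          # tf(): total time spent
ghz_days = rate_ghzd_per_day * elapsed_s / 86400.0
print(round(ghz_days, 2))             # ~3.6
```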

OK?


Oliver

Last fiddled with by TheJudger on 2013-01-12 at 14:47
Old 2013-01-12, 15:05   #2099
James Heinrich

"James Heinrich"
May 2004
ex-Northern Ontario

D5D₁₆ Posts

Thanks. I've revised my GFLOPS:GHzd/d ratio from 15.0 in v0.19 to 12.0 as a conservative estimate last week, and now, with your benchmark, down to 11.0 (for comparison: CC 2.0 = 3.6, CC 2.1 = 5.3, CC 1.x = 14.0).
I'll refine this as more users (hopefully) submit benchmarks on my site.
According to these numbers, a GTX 470 is actually still about 7% faster than a GTX 680. But 9% lower power consumption, plus higher gaming performance, still counts for something.
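For what it's worth, the ~7% figure follows from the quoted divisors; the peak single-precision GFLOPS values below are approximate published specs (cores × clock × 2 FLOPs), so treat them as assumptions:

```python
# Reproducing the ~7% figure from the stated GFLOPS:GHzd/d divisors.
# Peak single-precision GFLOPS are approximate spec values (assumed):
# GTX 470: 448 cores x ~1215 MHz shader clock x 2 FLOPs/cycle
# GTX 680: 1536 cores x ~1006 MHz x 2 FLOPs/cycle

gtx_470_gflops = 448 * 1.215 * 2      # ~1088.6, CC 2.0
gtx_680_gflops = 1536 * 1.006 * 2     # ~3090.4, CC 3.0

ghzd_470 = gtx_470_gflops / 3.6       # CC 2.0 divisor
ghzd_680 = gtx_680_gflops / 11.0      # CC 3.0 divisor

print(round(ghzd_470, 1), round(ghzd_680, 1))
print(round(ghzd_470 / ghzd_680 - 1, 3))   # ~0.07-0.08
```

With these clock assumptions the gap comes out around 7-8%; the exact value depends on which clocks you plug in.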

Last fiddled with by James Heinrich on 2013-01-12 at 15:07
Old 2013-01-12, 15:53   #2100
TheJudger

"Oliver"
Mar 2005
Germany

11×101 Posts

Quote:
Originally Posted by James Heinrich View Post
[...] a GTX 470 is actually still about 7% faster than a GTX 680. But 9% lower power consumption, plus higher gaming performance still counts for something.
Well, 9% lower TDP, but the real power consumption is much lower.
Those Teslas can measure power consumption directly. Comparing a Tesla M2075 (GF110, 448 cores @ 1150MHz) with a Tesla K10 (2x GK104, 1536 cores @ 745MHz), the K10 has ~70% of the throughput per GPU compared to the M2075, but its power consumption is less than half (~70W per GPU vs. 170W).
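Taking those figures at face value, the implied performance per watt (simple arithmetic on the numbers as stated, per single GPU):

```python
# Performance per watt implied by the figures above, per single GPU
# (not per card; the K10 carries two GK104 GPUs).

m2075_throughput = 1.00   # baseline
k10_throughput   = 0.70   # ~70% of the M2075, per GPU
m2075_watts      = 170.0
k10_watts        = 70.0

ratio = (k10_throughput / k10_watts) / (m2075_throughput / m2075_watts)
print(round(ratio, 2))    # ~1.7x better perf/W on the K10
```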

Oliver

Last fiddled with by TheJudger on 2013-01-12 at 15:53
Old 2013-01-12, 16:16   #2101
bcp19

Oct 2011

679₁₀ Posts

Had an interesting situation with the new .20. I have a GTX 560 in a Core2Quad Q6600. Running the 32-bit program by itself on core 4, it holds a steady 210-220 GHzd/d, but if I start P95 and run DC on cores 1 and 3, mfaktc starts varying wildly between 140 and 200 GHzd/d. I switched to the 64-bit program to see what would happen, and even with P95 running it stays fairly steady at 210-220 GHzd/d.





