mersenneforum.org  

Old 2012-03-16, 19:15   #991
flashjh
 
 
"Jerry"
Nov 2011
Vancouver, WA

1,123 Posts
Exclamation

Quote:
Originally Posted by Dubslow View Post
Even if it is correct, PrimeNet will still require a matching P95 run to complete it. That will happen eventually, but I'm offering my comp so that you guys know in at most 3 days if in fact it is correct or not, without wasting more GPU time.
Yes, in my haste to reply before heading back into work, I replied to the wrong post. I meant this for LaurV's post about the mismatch.

Certainly, either way a P95 run will be required at this point.
Old 2012-03-17, 04:29   #992
LaurV
Romulan Interpreter
 
 
"name field"
Jun 2011
Thailand

41·251 Posts
Default

Thanks for the offer Dubslow.

There is a chance my test is wrong, due to the "extreme" conditions I am pushing my hardware to. I don't recommend this to anyone; it is not profitable: if you get 10 or 20 percent more output, but one of the tests is wrong and you need to repeat it, then you are in fact far behind the "normal", "non-extreme" settings, leaving aside the fact that the extreme settings can shorten the lifetime of your hardware a lot. For me this is somehow part of the job, and I try to combine business with pleasure.

So, with my current settings and hardware, and with CL v1.65 or higher (I did not switch to 1.66 yet; if the only difference is the spelling of the switch, it does not bother me), I can kill a DC exponent in 8.5 hours, on average. This is the positive side. The negative side is that at this "speed" the probability of errors is high, and I have to repeat one test in x (where x could be 2, 3, 4, no idea; I have not collected enough statistical samples yet, but from the current data it is close to 3).
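To make the trade-off concrete, here is a rough sketch of the arithmetic. It assumes that each test fails independently with probability 1/x and that a bad test is simply rerun at the same settings until it passes. The 8.5 hours and x ≈ 3 come from the post; the "normal" run time is an assumption derived from the 10 to 20 percent figure quoted above, with no failures at all.

```python
def expected_hours_per_good_result(hours_per_test: float, error_prob: float) -> float:
    """Expected wall time to obtain one error-free result.

    Assumes independent failures and a full rerun after each bad test,
    so the number of attempts is geometric with mean 1 / (1 - error_prob).
    """
    if not 0.0 <= error_prob < 1.0:
        raise ValueError("error_prob must be in [0, 1)")
    return hours_per_test / (1.0 - error_prob)


# Figures from the post: 8.5 h per DC, roughly one bad test in three.
extreme = expected_hours_per_good_result(8.5, 1 / 3)

# Assumption: "normal" settings are 10-20% slower and never fail.
normal_fast = expected_hours_per_good_result(8.5 * 1.10, 0.0)
normal_slow = expected_hours_per_good_result(8.5 * 1.20, 0.0)

print(f"extreme: {extreme:.2f} h per good result")
print(f"normal:  {normal_fast:.2f} to {normal_slow:.2f} h per good result")
```

Under these assumptions the extreme settings come out behind the normal ones, which matches the "not profitable" remark.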

In this case, the best path to choose would be for me to repeat, by myself, the tests for which a mismatch occurred. So it makes no sense for you to run DC and TC (triple checks) with P95 as long as my result could be wrong. I can re-test it MUCH faster. Only if I am confident that my result is free of hardware errors does it make sense to spend P95 time.

So, the procedure should be like that:
1. I am running DC. If it matches, that is ok.
2. If it does not match, I will not report it (to keep the expo) and I will re-run CL 1.65 on it, on a different card (possibly with a different FFT length). Optionally, I can post the result of the first DC test here.
3. If I get a match with original residue, well, my first DC went crazy, let's forget all the story.
4. If I get a match with my initial DC, then here you can come in with your offer to test it with P95. Anyhow, somebody must re-do the (original) P95 test to clear the expo.
5. If there is no match with either my first DC or the original P95 test, go back to step 2.
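The five steps can be written down as a small decision loop. This is only a sketch: `rerun` is a hypothetical callback standing in for "run CL again on a different card or FFT length", and the `max_reruns` cap is an added assumption so the loop cannot run forever.

```python
from typing import Callable


def resolve_mismatch(
    first_check: str,
    double_check: str,
    rerun: Callable[[], str],
    max_reruns: int = 5,
) -> str:
    """Follow steps 1-5 and return what to do with the exponent.

    first_check  -- residue of the original P95 test
    double_check -- residue of the first CUDALucas DC
    rerun        -- hypothetical callback: reruns the test on the GPU
                    and returns the new residue
    """
    if double_check == first_check:
        return "report: DC matches"                # step 1
    for _ in range(max_reruns):                    # step 2: rerun, do not report
        residue = rerun()
        if residue == first_check:
            return "report: first DC was bad"      # step 3
        if residue == double_check:
            return "hand over to P95"              # step 4
        # step 5: matches neither residue, so go around again
    return "give up: no two runs agree"


# Made-up residues: the first rerun matches nothing, the second matches the P95 test.
reruns = iter(["cccc", "aaaa"])
print(resolve_mismatch("aaaa", "bbbb", lambda: next(reruns)))
```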

For 26068439, I am now running the TC (triple check); it is at iteration 19M and still matches my DC test. If I get a final match (in about 5 hours), then it is yours to test with P95.

Last fiddled with by LaurV on 2012-03-17 at 04:32
Old 2012-03-17, 04:36   #993
Dubslow
Basketry That Evening!
 
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3×29×83 Posts
Default

My point is, why do a double check on CUDALucas? I can test it almost as fast, and you find out either way if your result is correct or not, without running it twice. (In your terms, skip 2b/3/4 and go straight to P95 for any mismatch, no GPU double check.)
Edit: If you match yourself, don't report it until my test is turned in so we don't have to bother with the reservation system and whatnot. (PM me if you match yourself. I'll have about a 5 minute window around 7 hours from this post to add it immediately, otherwise it'll have to wait another 12.)

Last fiddled with by Dubslow on 2012-03-17 at 04:38
Old 2012-03-17, 12:35   #994
LaurV
Romulan Interpreter
 
 
"name field"
Jun 2011
Thailand

10100000110011₂ Posts
Default

Quote:
Originally Posted by Dubslow View Post
My point is, why do a double check on CUDALucas?
First, because it is much faster. The CPU can be as fast only if it uses 4 (or more) cores, all of them at the same time. Those cores can do a better job on some other rice field.

Second, because I broke the jar, so I should put it back together. I don't like to appear with many "bad results" on that list; someone will say I am doing it on purpose, reporting false results to raise my credit. I already have a few, from the period of testing CL. So I decided to refrain from reporting (or, say, delay reporting) the DCs for which I have mismatches, and rerun the test to confirm where the bad result lies: is it my DC, or the original "first" P95 check? (Let's call it FC.)

Ok, I don't report it, ok, I don't. But you realize I can not just forget about it, maybe my residue is good, and the original is bad. We found plenty in the past.

So, if FC ≠ DC, then I will run a TC, using CL, and report my result only:

1. if TC=DC (in that case the expo is still not cleared: a P95 test, in fact a QC, or quadruple check, must still be done to get a final match, but we only lost 18 hours for my TC)

2. or if TC=FC, in this case my DC was clearly crap, and we don't need to run a P95 test, gaining the 3-4 Days*Core work of the CPU (or one day with 3-4 cores).

It is a win-win, and this way I can make sure that I am only reporting CL DC tests which are free of hardware errors. If there is then a mismatch between such a CL result and a repeated P95 test, then we found a software bug in either CL or P95. It is a win-win-win :D


Ok. So for now I got another match for this:

Code:
Processing result: M( 26248279 )C, 0xccfa579d070618a8, n = 1572864, CUDALucas v1.65
LL test successfully completes double-check of M26248279
Together with the TC for 26068439, which we were discussing before, this makes 7 successes and 2 errors totally with CL v1.65.
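Nine tests is a small sample, so the error rate implied by "7 successes and 2 errors" is quite uncertain. As a rough illustration (not something from the post), the Wilson score interval below gives a 95% range for the underlying error probability, assuming the tests fail independently with the same probability.

```python
import math


def wilson_interval(errors: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is about 95%)."""
    if trials <= 0 or not 0 <= errors <= trials:
        raise ValueError("need trials > 0 and 0 <= errors <= trials")
    p = errors / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials**2)) / denom
    return centre - half, centre + half


# 7 successes and 2 errors with CL v1.65, as counted in the post.
low, high = wilson_interval(errors=2, trials=9)
print(f"observed error rate: {2 / 9:.0%}, 95% interval: {low:.0%} to {high:.0%}")
```

The interval is very wide, which fits the earlier remark that there are not yet enough samples to pin down x.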

I am staying on it for now. It would be nice to have an interactive way to switch between "aggressive" and "polite" by pressing a key, or by reading a .ini file every time there is screen output (not in real time or after every iteration, though even this is possible too: for example CTRL+A or another combination to toggle the agressive_f variable from 0 to 1 and vice versa, and write on the screen "ctrl-a detected, switching to aggressive", or "to polite"). When this is implemented, I will switch :D
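The ".ini file on every screen output" variant could look roughly like the sketch below. It is written in Python for brevity rather than in CUDALucas's own C/CUDA, and the file layout (a `[settings]` section with an `aggressive` key) is purely hypothetical; nothing here describes an existing CUDALucas feature.

```python
import configparser
from pathlib import Path


def read_aggressive_flag(ini_path: Path, current: bool) -> bool:
    """Return the 'aggressive' setting from the INI file, or the current
    value if the file or key is missing or cannot be parsed."""
    parser = configparser.ConfigParser()
    try:
        parser.read(ini_path)  # a missing file is silently skipped
        return parser.getboolean("settings", "aggressive", fallback=current)
    except (configparser.Error, ValueError):
        return current


def run(iterations: int, report_every: int, ini_path: Path) -> None:
    """Main loop that re-reads the flag only when it would print anyway."""
    aggressive = False
    for i in range(1, iterations + 1):
        # ... one Lucas-Lehmer iteration would go here ...
        if i % report_every == 0:
            new_mode = read_aggressive_flag(ini_path, aggressive)
            if new_mode != aggressive:
                aggressive = new_mode
                mode = "aggressive" if aggressive else "polite"
                print(f"iteration {i}: switching to {mode}")
```

Polling only at report time keeps the file access out of the hot loop, which is the point of "not after every iteration".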

So related to 26068439, you see from the attached picture that it would make no sense to waste your time. TC is on the left with lower FFT, DC is on the right with default FFT, I did not see it immediately as I was not at the computer, then I restarted. The final result was FC=TC, so my DC was crap at iteration 24M. Pretty nasty and unlucky too, huh?

edit: grrr I had to rescale it to max 1600..
[Attached image: 3-17-2012 6-37-20 PM.png]

Last fiddled with by LaurV on 2012-03-17 at 12:48
Old 2012-03-17, 12:38   #995
apsen
 
Jun 2011

131 Posts
Default

Quote:
Originally Posted by Dubslow View Post
If you want, I can run the expo in P95. I could get it done in... (5 days/2.3GHz=x/3.8Ghz) a bit over two days. (Actually probably a bit more due to memory bandwidth, say three.) That's a standing offer, so whenever you guys get a mismatch, don't turn in the result, keep the expo reserved, and I can run it for you.
(The idea is that you don't need to rerun it on the GPUs, when that won't complete the expo.)
I'll take you up on this offer. I've started to run one exponent on P95 but the projected finish time is mid-May :-( so I'd like you to run two exponents:
29027371
29198173

Thanks,
Andriy
Old 2012-03-17, 12:46   #996
flashjh
 
 
"Jerry"
Nov 2011
Vancouver, WA

1,123 Posts
Post

Quote:
Originally Posted by apsen View Post
I'll take you up on this offer. I've started to run one exponent on P95 but the projected finish time is mid-May :-( so I'd like you to run two exponents:
29027371
29198173

Thanks,
Andriy
Dubslow, I can run one if you want the other. Let me know.
Old 2012-03-17, 19:38   #997
flashjh
 
 
"Jerry"
Nov 2011
Vancouver, WA

1,123 Posts
Default

Quote:
choose fast fft length.
Code:
$ ./CUDALucas -f 1474560 26963099
DEVICE:0------------------------
name                GeForce GTX 550 Ti
~~~
start M26963099 fft length = 1474560
Iteration 10000 M( 26963099 )C, 0x8c15f65348aef031, n = 1474560, CUDALucas v1.66 err = 0.2138 (1:24 real, 8.3918 ms/iter, ETA 62:49:19)
Iteration 20000 M( 26963099 )C, 0x6f319a4dd6b32f62, n = 1474560, CUDALucas v1.66 err = 0.2138 (1:24 real, 8.3752 ms/iter, ETA 62:40:27)
Try.
I'm sure I'm missing something, but what is the method to choose the best FFT size? Where did you get these values?

Quote:
Originally Posted by msft View Post
Code:
CUFFT_Z2Z size= 1474560 time=3.070644 msec
CUFFT_Z2Z size= 1490944 time=4.516933 msec
CUFFT_Z2Z size= 1507328 time=4.897517 msec
CUFFT_Z2Z size= 1523712 time=5.199020 msec
CUFFT_Z2Z size= 1540096 time=5.449145 msec
CUFFT_Z2Z size= 1556480 time=4.972541 msec
CUFFT_Z2Z size= 1572864 time=3.496826 msec
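One way to read a timing table like this: among the FFT lengths that are large enough for the exponent, pick the one with the smallest time. The sketch below does exactly that on the quoted lines. The `min_length` cut-off is an assumption the caller has to supply (the smallest length whose round-off error is still acceptable); the table itself does not say what it is.

```python
import re

TIMING_LINE = re.compile(r"CUFFT_Z2Z size=\s*(\d+)\s+time=\s*([\d.]+)\s*msec")


def parse_timings(text: str) -> dict[int, float]:
    """Map FFT length -> time in msec from benchmark lines like those quoted."""
    return {int(n): float(t) for n, t in TIMING_LINE.findall(text)}


def fastest_length(timings: dict[int, float], min_length: int = 0) -> int:
    """Fastest benchmarked FFT length that is at least min_length."""
    candidates = {n: t for n, t in timings.items() if n >= min_length}
    if not candidates:
        raise ValueError("no benchmarked length is large enough")
    return min(candidates, key=candidates.get)


quoted = """
CUFFT_Z2Z size= 1474560 time=3.070644 msec
CUFFT_Z2Z size= 1490944 time=4.516933 msec
CUFFT_Z2Z size= 1507328 time=4.897517 msec
CUFFT_Z2Z size= 1523712 time=5.199020 msec
CUFFT_Z2Z size= 1540096 time=5.449145 msec
CUFFT_Z2Z size= 1556480 time=4.972541 msec
CUFFT_Z2Z size= 1572864 time=3.496826 msec
"""
timings = parse_timings(quoted)
print(fastest_length(timings))                      # no lower bound
print(fastest_length(timings, min_length=1500000))  # hypothetical lower bound
```

Note that the two fast sizes in the table, 1474560 = 2^15·3^2·5 and 1572864 = 2^19·3, have only small prime factors, which is presumably why they stand out.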

Last fiddled with by flashjh on 2012-03-17 at 19:41
Old 2012-03-17, 23:24   #998
Dubslow
Basketry That Evening!
 
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

16065₈ Posts
Default

Quote:
Originally Posted by apsen View Post
I'll take you up on this offer. I've started to run one exponent on P95 but the projected finish time is mid-May :-( so I'd like you to run two exponents:
29027371
29198173

Thanks,
Andriy
The second one has already been double checked (while it was msft both times, one was CL and one was Prime95), and the first one is assigned to ANONYMOUS, so I'd rather not poach. (@Flash: Yes, splitting is perfectly fine by me in the future. Pick one and let me know.)

@Anyone who wants to take this offer: The easiest way to do it is check your CL result BEFORE submitting, and if it doesn't match, DO NOT SUBMIT OR UNRESERVE. When I report my result, you will still have the assignment, and after you report, your result will then clear the expo without it getting reassigned to anyone else.

@LaurV: I haven't tested recently, but I suspect that with just one core, I can get 10-12 ms/iter times on a 26M expo. This is, save perhaps George or Pete with more aggressive OCs, the fastest single-core speed you'll find with Prime95. (Edit: WA predicts 10-11 ms.)
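For comparison, per-iteration times convert to total run time as follows. This is a back-of-the-envelope sketch that ignores start-up and checkpoint overhead; the exponent is the one discussed earlier in the thread, and applying LaurV's 8.5 hour average to that particular exponent is an assumption.

```python
def ll_test_hours(exponent: int, ms_per_iter: float) -> float:
    """Approximate wall time of a Lucas-Lehmer test in hours.

    An LL test of M(p) needs p - 2 squarings.
    """
    return (exponent - 2) * ms_per_iter / 1000.0 / 3600.0


p = 26068439  # exponent discussed earlier in the thread

for ms in (10.0, 12.0):
    hours = ll_test_hours(p, ms)
    print(f"{ms:4.1f} ms/iter -> {hours:5.1f} h ({hours / 24:.1f} days)")

# Conversely, an 8.5 h run on this exponent implies this many ms per iteration.
gpu_ms_per_iter = 8.5 * 3600 * 1000 / (p - 2)
print(f"8.5 h total -> {gpu_ms_per_iter:.2f} ms/iter")
```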

Last fiddled with by Dubslow on 2012-03-17 at 23:45 Reason: found LaurV's iteration times
Old 2012-03-17, 23:56   #999
apsen
 
Jun 2011

131 Posts
Default

Quote:
Originally Posted by Dubslow View Post
The second one has already been double checked (while it was msft both times, one was CL and one was Prime95), and the first one is assigned to ANONYMOUS, so I'd rather not poach. (@Flash: Yes, splitting is perfectly fine by me in the future. Pick one and let me know.)

@Anyone who wants to take this offer: The easiest way to do it is check your CL result BEFORE submitting, and if it doesn't match, DO NOT SUBMIT OR UNRESERVE. When I report my result, you will still have the assignment, and after you report, your result will then clear the expo without it getting reassigned to anyone else.
I did not realize msft already reported the second one... But it still looks reserved...


The first one is also me - I just did not realize I was not logged in when I reserved it.

Andriy
Old 2012-03-18, 00:00   #1000
apsen
 
Jun 2011

131₁₀ Posts
Default

Quote:
Originally Posted by apsen View Post
But it still looks reserved...
So much for being reserved... Got an error message submitting it... At least it's no longer reserved.
Old 2012-03-18, 00:02   #1001
Dubslow
Basketry That Evening!
 
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3·29·83 Posts
Default

The second one doesn't look assigned to me, it just looks complete. Cross post

Can you PM me the assignment key for the first one? I can then claim it via PrimeNet. (Normally I wouldn't bother, but since it's currently ANON, there's no reason not to.)

Last fiddled with by Dubslow on 2012-03-18 at 00:04

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
