mersenneforum.org

CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

flashjh 2012-03-16 19:15

[QUOTE=Dubslow;293225]Even if it is correct, PrimeNet will still require a matching P95 run to complete it. That will happen eventually, but I'm offering my comp so that you guys know in at most 3 days if in fact it is correct or not, without wasting more GPU time.[/QUOTE]

Yes, in my haste to reply before heading back into work, I replied to the wrong post. I meant this for LaurV's post about the mismatch.

Certainly, either way a P95 run will be required at this point.:smile:

LaurV 2012-03-17 04:29

Thanks for the offer Dubslow.

There is a chance my test is wrong, due to the "extreme" conditions I am pushing my hardware to. I don't recommend this to anyone; it is not profitable: if you get 10 or 20 percent more output, but one of the tests is wrong and you need to repeat it, then you are in fact far behind the "normal", "non-extreme" settings, leaving aside the fact that extreme settings can shorten the lifetime of your hardware a lot. For me this is somehow part of the job, and I try to combine business with pleasure :smile:

So, with my current settings and hardware, and with CL v1.65 or higher (I did not switch to 1.66 yet; if the only difference is the spelling of the switch, this does not bother me), I can kill a DC exponent in 8.5 hours on average. This is the positive side. The negative side is that at this "speed" the probability of errors is high, and I have to repeat one test in x (where x could be 2, 3, 4, no idea; I have not collected enough statistical samples yet, but from the current data it is close to 3).
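
To put a rough number on that trade-off: the sketch below assumes each run goes bad independently with the same probability, and that a bad run is simply repeated until it comes out right. Neither assumption comes from measured data; the 8.5 hours and the "one in 3" are the figures quoted above, and the function name is only illustrative.

```python
def expected_hours_per_good_result(hours_per_test: float, error_rate: float) -> float:
    """Mean GPU hours spent per correct result when bad runs are repeated.

    Assumes every run fails independently with probability `error_rate`,
    so the number of runs needed is geometric with mean 1 / (1 - error_rate).
    """
    if not 0.0 <= error_rate < 1.0:
        raise ValueError("error_rate must be in [0, 1)")
    return hours_per_test / (1.0 - error_rate)


# Figures from the post: 8.5 hours per test, roughly one bad test in three.
aggressive = expected_hours_per_good_result(8.5, 1 / 3)
print(f"{aggressive:.2f} hours per good result")
```

With these inputs the aggressive setting costs about 12.75 hours per good result, so under these assumptions it only pays off if the stable setting would need more than that per test.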

In this case, the best path to choose would be for me to repeat, myself, the tests for which a mismatch occurred. So it makes no sense for you to run DC and TC (triple checks) with P95 as long as my result could be wrong. I can re-test it MUCH faster. Only if I am confident that my result is free of hardware errors does it make sense to spend P95 time.

So, the procedure should be like this:
1. I run the DC. If it matches, that is OK.
2. If it does not match, I will not report it (to keep the expo) and I will re-run CL 1.65 on it, on a different card (possibly with a different FFT length). Optionally, I can post the result of the first DC test here.
3. If I get a match with the original residue, well, my first DC went crazy; let's forget the whole story.
4. If I get a match with my initial DC, then [B]here you can come in with your offer to test it with P95. Anyhow, somebody must redo the (original) P95 test to clear the expo.[/B]
5. If there is no match with either my first DC or the original P95 test, go back to step 2.
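
The five steps above can be sketched as a small decision function. This is only an illustration: the function name, the returned strings, and the idea of passing residues as hex strings are assumptions, not anything CUDALucas or PrimeNet provides.

```python
from typing import Optional


def next_step(fc: str, dc: str, tc: Optional[str] = None) -> str:
    """Decide what to do next from the residues seen so far.

    fc: residue of the original (first) P95 test
    dc: residue of the first CUDALucas double check
    tc: residue of the CUDALucas re-run, or None if it has not been run yet
    """
    fc, dc = fc.lower(), dc.lower()
    if dc == fc:
        return "report DC"                   # step 1: plain match
    if tc is None:
        return "hold DC and re-run"          # step 2: keep the expo, test again
    tc = tc.lower()
    if tc == fc:
        return "discard DC"                  # step 3: the first GPU run was bad
    if tc == dc:
        return "ask for a P95 run"           # step 4: GPU runs agree with each other
    return "hold DC and re-run"              # step 5: nothing matches, back to step 2


# Made-up residues, for illustration only.
print(next_step("aaaa", "bbbb"))            # mismatch, no re-run yet
print(next_step("aaaa", "bbbb", "bbbb"))    # both GPU runs agree
```

Note that in the step 5 case a real script would also have to decide which of the mismatching GPU residues to carry forward as the "DC"; the sketch leaves that to the caller.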

For 26068439, I am now running the TC (triple check); it is at iteration 19M and still matches my DC test. If I get a final match (in about 5 hours), then it is yours to test with P95.

Dubslow 2012-03-17 04:36

My point is, why do a double check on CUDALucas? I can test it almost as fast, and you find out either way if your result is correct or not, without running it twice. (In your terms, skip 2b/3/4 and go straight to P95 for any mismatch, no GPU double check.)
Edit: If you match yourself, don't report it until my test is turned in so we don't have to bother with the reservation system and whatnot. (PM me if you match yourself. I'll have about a 5 minute window around 7 hours from this post to add it immediately, otherwise it'll have to wait another 12.)

LaurV 2012-03-17 12:35

1 Attachment(s)
[QUOTE=Dubslow;293273]My point is, why do a double check on CUDALucas? [/QUOTE]
First, because it is much faster. The CPU can be as fast only if it uses 4 (or more) cores, all of them at the same time. Those cores can do a better job on some other rice field.

Second, because I broke the jar, so I should put it back together. I don't like to appear with many "bad results" on that list; someone will say I am doing it on purpose, reporting false results to raise my credit. I already have a few, from the period of testing CL. So I decided to refrain from reporting (or, say, delay reporting) the DCs for which I have mismatches, and rerun the test to confirm where the bad result lies: is it my DC, or the original "first" P95 check (let's call it FC)?

OK, I don't report it. But you realize I cannot just forget about it; maybe my residue is good and the original is bad. We have found plenty of those in the past.

So, if FC[TEX]\ne[/TEX]DC, then I will run a TC, using CL, and report my result only:

[B]1.[/B] if TC=DC (in which case the expo is still not cleared; a P95 test, in fact a QC, quadruple check, must still be done to get a final match, but we only lost 18 hours for my TC)

[B]2. [/B]or if TC=FC; in this case my DC was clearly crap, and we don't need to run a P95 test, saving the 3-4 days*core of CPU work (or one day with 3-4 cores).

It is a win-win, and this way I can make sure that I am only reporting CL DC tests which are free of hardware errors. If there is still a mismatch between such a CL result and a repeated P95 test, then we have found a software bug in either CL or P95. It is a win-win-win :D


Ok. So for now I got another match for this:

[CODE]Processing result: M( 26248279 )C, 0xccfa579d070618a8, n = 1572864, CUDALucas v1.65
LL test successfully completes double-check of M26248279[/CODE]
Together with the TC for 26068439, which we were discussing before, this makes 7 successes and 2 errors in total with CL v1.65.
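
Nine runs is a small sample, so the observed error rate is only a loose estimate. As a rough illustration, the sketch below computes a Wilson score interval for 2 errors in 9 runs; treating the runs as independent trials with one fixed error probability is an assumption, and the function name is made up for this example.

```python
import math


def wilson_interval(errors: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for an error rate (Wilson score method).

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if runs <= 0 or not 0 <= errors <= runs:
        raise ValueError("need runs > 0 and 0 <= errors <= runs")
    p = errors / runs
    denom = 1 + z * z / runs
    center = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    return center - half, center + half


low, high = wilson_interval(errors=2, runs=9)
print(f"observed error rate: {2 / 9:.0%}")
print(f"approx. 95% interval: {low:.0%} to {high:.0%}")
```

The interval is very wide (roughly 6% to 55%), which is another way of saying that there are not yet enough samples to pin the error rate down.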

I am staying on it for now. It would be nice to have an interactive way to switch between "aggressive" and "polite" by pressing a key, or by reading a .ini file every time there is screen output (not in real time or after every iteration, although even that is possible too), like CTRL+A or another combination to toggle the [B]agressive_f[/B] variable from 0 to 1 and vice versa, and write on the screen "ctrl-a detected, switching to aggressive", or "to polite". When this is implemented, I will switch :D
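
A minimal sketch of the .ini variant of that idea, in Python rather than in CUDALucas's own code: the file is re-read only at each screen-output interval, not on every iteration. The file name `toggle.ini`, the `[settings]` section and the `aggressive` key are all hypothetical; nothing like this exists in CL v1.65 as far as this thread shows.

```python
import configparser
import tempfile
from pathlib import Path


def read_aggressive(ini_path: Path, current: bool) -> bool:
    """Return the mode requested in the ini file, or `current` if it is unusable."""
    parser = configparser.ConfigParser()
    try:
        if not parser.read(ini_path):  # empty list: file missing or unreadable
            return current
        return parser.getboolean("settings", "aggressive", fallback=current)
    except (configparser.Error, ValueError):
        return current  # malformed file or value: keep the current mode


def run(iterations: int, report_every: int, ini_path: Path) -> bool:
    """Dummy main loop that checks the ini file only when it would print output."""
    aggressive = False  # start in "polite" mode
    for i in range(1, iterations + 1):
        # ... one LL iteration would run here ...
        if i % report_every == 0:
            wanted = read_aggressive(ini_path, aggressive)
            if wanted != aggressive:
                aggressive = wanted
                mode = "aggressive" if aggressive else "polite"
                print(f"iteration {i}: switching to {mode}")
    return aggressive


# Demo with a throwaway ini file that asks for aggressive mode.
with tempfile.TemporaryDirectory() as tmp:
    ini = Path(tmp) / "toggle.ini"
    ini.write_text("[settings]\naggressive = 1\n")
    final_mode = run(iterations=30000, report_every=10000, ini_path=ini)
```

Polling at the output interval keeps the file access out of the hot loop, at the price of a delay of up to one interval before a change takes effect.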

So, related to 26068439, you can see from the attached picture that it would have made no sense to waste your time. TC is on the left with the lower FFT, DC is on the right with the default FFT. I did not see it immediately as I was not at the computer; then I restarted. The final result was FC=TC, so my DC was crap at iteration 24M. Pretty nasty and unlucky too, huh?

edit: grrr I had to rescale it to max 1600..

apsen 2012-03-17 12:38

[QUOTE=Dubslow;293221]If you want, I can run the expo in P95. I could get it done in... (5 days/2.3GHz=x/3.8Ghz) a bit over two days. (Actually probably a bit more due to memory bandwidth, say three.) That's a standing offer, so whenever you guys get a mismatch, don't turn in the result, keep the expo reserved, and I can run it for you.
(The idea is that you don't need to rerun it on the GPUs, when that won't complete the expo.)[/QUOTE]

I'll take you up on this offer. I've started to run one exponent on P95 but the projected finish time is mid-May :-( so I'd like you to run two exponents:
29027371
29198173

Thanks,
Andriy

flashjh 2012-03-17 12:46

[QUOTE=apsen;293296]I'll take you up on this offer. I've started to run one exponent on P95 but the projected finish time is mid-May :-( so I'd like you to run two exponents:
29027371
29198173

Thanks,
Andriy[/QUOTE]

Dubslow, I can run one if you want the other. Let me know.

flashjh 2012-03-17 19:38

[QUOTE]choose fast fft length.
[code]
$ ./CUDALucas -f 1474560 26963099
DEVICE:0------------------------
name GeForce GTX 550 Ti
~~~
start M26963099 fft length = 1474560
Iteration 10000 M( 26963099 )C, 0x8c15f65348aef031, n = 1474560, CUDALucas v1.66 err = 0.2138 (1:24 real, 8.3918 ms/iter, ETA 62:49:19)
Iteration 20000 M( 26963099 )C, 0x6f319a4dd6b32f62, n = 1474560, CUDALucas v1.66 err = 0.2138 (1:24 real, 8.3752 ms/iter, ETA 62:40:27)
[/code]
Try.[/QUOTE]

I'm sure I'm missing something, but what is the method to choose the best FFT size? Where did you get these values?

[QUOTE=msft;293161][code]
CUFFT_Z2Z size= 1474560 time=3.070644 msec
CUFFT_Z2Z size= 1490944 time=4.516933 msec
CUFFT_Z2Z size= 1507328 time=4.897517 msec
CUFFT_Z2Z size= 1523712 time=5.199020 msec
CUFFT_Z2Z size= 1540096 time=5.449145 msec
CUFFT_Z2Z size= 1556480 time=4.972541 msec
CUFFT_Z2Z size= 1572864 time=3.496826 msec
[/code][/QUOTE]
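
One way to read a timing table like this is to take the fastest length that is still large enough for the exponent. The sketch below does only that selection. The `min_length` argument is an assumption supplied by the caller: the table says nothing about which lengths keep the round-off error acceptable for a given exponent (in the quoted run, 1474560 gave err = 0.2138 for M26963099). The function names are made up for this example.

```python
import re

# Timings copied from the quoted benchmark output.
TIMINGS = """\
CUFFT_Z2Z size= 1474560 time=3.070644 msec
CUFFT_Z2Z size= 1490944 time=4.516933 msec
CUFFT_Z2Z size= 1507328 time=4.897517 msec
CUFFT_Z2Z size= 1523712 time=5.199020 msec
CUFFT_Z2Z size= 1540096 time=5.449145 msec
CUFFT_Z2Z size= 1556480 time=4.972541 msec
CUFFT_Z2Z size= 1572864 time=3.496826 msec
"""

LINE = re.compile(r"CUFFT_Z2Z size=\s*(\d+)\s+time=([\d.]+) msec")


def parse_timings(text: str) -> list[tuple[int, float]]:
    """Extract (fft_length, milliseconds) pairs from benchmark output."""
    return [(int(size), float(ms)) for size, ms in LINE.findall(text)]


def fastest_length(timings: list[tuple[int, float]], min_length: int) -> int:
    """Return the fastest FFT length that is at least `min_length`."""
    candidates = [(ms, size) for size, ms in timings if size >= min_length]
    if not candidates:
        raise ValueError("no benchmarked length is large enough")
    return min(candidates)[1]


timings = parse_timings(TIMINGS)
print(fastest_length(timings, min_length=1474560))
print(fastest_length(timings, min_length=1490944))
```

The two fast entries are the lengths with only small prime factors (1474560 = 2^15 * 3^2 * 5 and 1572864 = 2^19 * 3), which is presumably why they stand out against their neighbours.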

Dubslow 2012-03-17 23:24

[QUOTE=apsen;293296]I'll take you up on this offer. I've started to run one exponent on P95 but the projected finish time is mid-May :-( so I'd like you to run two exponents:
29027371
29198173

Thanks,
Andriy[/QUOTE]

The second one has already been double checked (while it was msft both times, one was CL and one was Prime95), and the first one is assigned to ANONYMOUS, so I'd rather not poach. (@Flash: Yes, splitting is perfectly fine by me in the future. Pick one and let me know.)

@Anyone who wants to take this offer: The easiest way to do it is check your CL result BEFORE submitting, and if it doesn't match, DO NOT SUBMIT OR UNRESERVE. When I report my result, you will still have the assignment, and after you report, your result will then clear the expo without it getting reassigned to anyone else.

@LaurV: I haven't tested recently, but I suspect that with just one core, I can get 10-12 ms/iter times on a 26M expo. This is, save perhaps George or Pete with more aggressive OCs, the fastest single-core speed you'll find with Prime95. (Edit: [URL="http://www.wolframalpha.com/input/?i=2.3*17%3D3.9*x"]WA[/URL] predicts 10-11 ms.)
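
That estimate is plain inverse scaling with clock speed. The sketch below reproduces it, taking the 17 ms/iter at 2.3 GHz and the 3.9 GHz target from the query in the WA link. It assumes iteration time scales inversely with clock frequency and ignores memory bandwidth, so the real figure may well be higher. The function names are illustrative.

```python
def scaled_ms_per_iter(ms_at_base: float, base_ghz: float, target_ghz: float) -> float:
    """Scale a per-iteration time inversely with CPU clock speed."""
    return ms_at_base * base_ghz / target_ghz


def ll_test_hours(exponent: int, ms_per_iter: float) -> float:
    """Total hours for an LL test of M(exponent), which needs exponent - 2 squarings."""
    return (exponent - 2) * ms_per_iter / 1000 / 3600


ms = scaled_ms_per_iter(17.0, base_ghz=2.3, target_ghz=3.9)
hours = ll_test_hours(26068439, ms)
print(f"{ms:.2f} ms/iter, about {hours:.1f} hours for M26068439")
```

For a 26M exponent this comes out at roughly three days on one core.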

apsen 2012-03-17 23:56

[QUOTE=Dubslow;293329]The second one has already been double checked (while it was msft both times, one was CL and one was Prime95), and the first one is assigned to ANONYMOUS, so I'd rather not poach. (@Flash: Yes, splitting is perfectly fine by me in the future. Pick one and let me know.)

@Anyone who wants to take this offer: The easiest way to do it is check your CL result BEFORE submitting, and if it doesn't match, DO NOT SUBMIT OR UNRESERVE. When I report my result, you will still have the assignment, and after you report, your result will then clear the expo without it getting reassigned to anyone else.
[/QUOTE]

I did not realize msft already reported the second one... But it still looks reserved...


The first one is also me - I just did not realize I was not logged in when I reserved it.

Andriy

apsen 2012-03-18 00:00

[QUOTE=apsen;293332] But it still looks reserved...[/QUOTE]

So much for being reserved... Got an error message submitting it... At least it's no longer reserved.

Dubslow 2012-03-18 00:02

[strike]The second one doesn't look assigned to me, it just looks complete.[/strike][i][SIZE="2"]Cross post :razz:[/SIZE][/i]

Can you PM me the assignment key for the first one? I can then claim it via PrimeNet. (Normally I wouldn't bother, but since it's currently ANON, there's no reason not to.)


All times are UTC.