mersenneforum.org  

2012-03-10, 15:06   #936
flashjh
"Jerry"
Nov 2011
Vancouver, WA

1,123 Posts

I had a match, but the assignment had already been turned in. The good news is that the original LL was bad: my 1.64 run matched David's CUDALucas run.

M( 26002063 )C, 0x1c5e4ca283b033__, n = 1572864, CUDALucas v1.64

2012-03-10, 15:12   #937
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

1110101110111₂ Posts

Quote:
Originally Posted by LaurV
Which indeed would need to be much lower to catch the spikes going over 0.5. If you check it at every iteration (what -t is doing) then comparing it with 0.5 would be enough.
This is not correct. A roundoff error of 0.49 is harmless, but a roundoff error of 0.51 is deadly. The problem is that the program will report both as 0.49, since the measured error can never exceed 0.5.

So if CUDALucas reports a roundoff error of 0.49, how confident are you that it really wasn't a deadly roundoff of 0.51??? This is why PFGW aborts (actually, switches to a larger FFT length) when the roundoff error exceeds 0.45. Prime95 retries the iteration if the roundoff exceeds 0.40.
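A minimal sketch of why the two cases look identical, assuming roundoff is measured the usual way for floating-point FFT multiplication: as the distance from each output value to the nearest integer. This is not CUDALucas's actual code; `nearest_integer` and `roundoff_error` are hypothetical names, and the rounding is written out by hand only so the sketch needs no libm.

```c
#include <assert.h>

/* Nearest integer to x, rounding halves away from zero
   (written out by hand so the sketch needs no libm). */
static double nearest_integer(double x)
{
    return x >= 0.0 ? (double)(long long)(x + 0.5)
                    : -(double)(long long)(-x + 0.5);
}

/* Roundoff as typically measured after the inverse FFT: the distance
   from a computed value to the nearest integer. It can never exceed 0.5,
   so a true error of 0.51 shows up as 0.49. */
static double roundoff_error(double computed)
{
    double err = computed - nearest_integer(computed);
    if (err < 0.0)
        err = -err;
    assert(err <= 0.5);
    return err;
}
```

If the exact value of a word is 1000, a computed 1000.49 gives a roundoff of about 0.49 and rounds back to 1000, while 1000.51 also gives about 0.49 but rounds to 1001, a wrong result. Near 0.5 the reported value says nothing reliable about the true error, which is why the thresholds above (0.40, 0.45) sit well below it.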

2012-03-10, 17:02   #938
LaurV
Romulan Interpreter
Jun 2011
Thailand

25BF₁₆ Posts

Quote:
Originally Posted by Prime95
This is not correct. A roundoff error of 0.49 is harmless, but a roundoff error of 0.51 is deadly. The problem is that the program will report both as 0.49, since the measured error can never exceed 0.5.

So if CUDALucas reports a roundoff error of 0.49, how confident are you that it really wasn't a deadly roundoff of 0.51??? This is why PFGW aborts (actually, switches to a larger FFT length) when the roundoff error exceeds 0.45. Prime95 retries the iteration if the roundoff exceeds 0.40.
That was EXACTLY what I was talking about. You may not get that if you only read post 931, but please read my post 929 carefully [edit: the first observation, last part].

Last fiddled with by LaurV on 2012-03-10 at 17:06

2012-03-10, 22:20   #939
msft
Jul 2009
Tokyo

2·5·61 Posts

Quote:
Originally Posted by Prime95
So if CUDALucas reports a roundoff error of 0.49, how confident are you that it really wasn't a deadly roundoff of 0.51??? This is why PFGW aborts (actually, switches to a larger FFT length) when the roundoff error exceeds 0.45. Prime95 retries the iteration if the roundoff exceeds 0.40.
Code:
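Ver 1.64
default (roundoff checked every 100th iteration, and on each of the first 1000):
if ((iteration % 100) == 0 || iteration < 1000)
  if (roundoff > 0.35)
    increase FFT length

-t option (roundoff checked on every iteration):
if (roundoff > 0.49)
  exit program
else if (roundoff > 0.35)
  increase FFT length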
Code:
if (roundoff > 0.49)
  exit program
This is experimental code.
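The 1.64 pseudocode restated as a small C function, as a sketch only: `check_roundoff` and `enum roundoff_action` are hypothetical names, and the thresholds and iteration test are copied from the pseudocode in this post, not from the CUDALucas source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; this only restates the v1.64 policy described above. */
enum roundoff_action { CONTINUE, INCREASE_FFT_LENGTH, EXIT_PROGRAM };

static enum roundoff_action check_roundoff(unsigned long iteration,
                                           double roundoff, bool t_option)
{
    if (t_option) {
        /* -t: the check runs on every iteration. */
        if (roundoff > 0.49)
            return EXIT_PROGRAM;
        if (roundoff > 0.35)
            return INCREASE_FFT_LENGTH;
        return CONTINUE;
    }
    /* default: only every 100th iteration, plus each of the first 1000. */
    if ((iteration % 100) == 0 || iteration < 1000) {
        if (roundoff > 0.35)
            return INCREASE_FFT_LENGTH;
    }
    return CONTINUE;
}
```

In default mode, a spike at iteration 12345 is never examined (`check_roundoff(12345, 0.45, false)` returns `CONTINUE`), while the same value at iteration 12300 triggers a larger FFT length; with -t the spike at 12345 is caught as well.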

2012-03-11, 02:48   #940
flashjh
"Jerry"
Nov 2011
Vancouver, WA

1,123 Posts
Another good one

Another 1.64 success
Code:
 
Processing result: M( 26134351 )C, 0xb9d6a5672486c791, n = 1572864, CUDALucas v1.64
LL test successfully completes double-check of M26134351
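A sketch for handling result lines like the ones posted in this thread: parse the exponent, the 64-bit residue, and the FFT length, then compare two runs. The struct and function names are hypothetical, and the format is inferred only from the lines quoted here. It also computes p/n, the average number of bits stored per FFT word (about 16.6 for M26134351 at n = 1572864); more bits per word means larger roundoff errors.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Fields of a result line such as
   "M( 26134351 )C, 0xb9d6a5672486c791, n = 1572864, CUDALucas v1.64". */
struct ll_result {
    unsigned long exponent;
    uint64_t residue;          /* final 64-bit residue */
    unsigned long fft_length;
};

/* Returns 1 if the line parsed, 0 otherwise. A masked residue
   (trailing "__") stops the hex scan early and is rejected. */
static int parse_result(const char *line, struct ll_result *out)
{
    assert(line != NULL && out != NULL);
    int n = sscanf(line, "M( %lu )C, 0x%16" SCNx64 ", n = %lu",
                   &out->exponent, &out->residue, &out->fft_length);
    return n == 3;
}

/* Average number of bits of the number held in each FFT word: p / n. */
static double bits_per_word(const struct ll_result *r)
{
    return (double)r->exponent / (double)r->fft_length;
}

/* Two tests agree when exponent and final residue are identical. */
static int residues_match(const struct ll_result *a, const struct ll_result *b)
{
    return a->exponent == b->exponent && a->residue == b->residue;
}
```

A residue shown with its last digits masked (as in `0x1c5e4ca283b033__` earlier in the thread) is rejected rather than compared.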

2012-03-11, 03:50   #941
kladner
"Kieren"
Jul 2011
In My Own Galaxy!

23656₈ Posts

Quote:
Originally Posted by flashjh
Another 1.64 success
Code:
 
Processing result: M( 26134351 )C, 0xb9d6a5672486c791, n = 1572864, CUDALucas v1.64
LL test successfully completes double-check of M26134351
Encouraging.

2012-03-11, 18:35   #942
apsen
Jun 2011

131 Posts

Quote:
Originally Posted by msft
Code:
M( 29198173 )C, 0x6fd7e4d6557f5b77, n = 1572864, CUDALucas v1.58
correct.
That does not match the first-time test. I guess I'd better rerun it with P95.

2012-03-11, 19:29   #943
flashjh
"Jerry"
Nov 2011
Vancouver, WA

1123₁₀ Posts

Quote:
Originally Posted by apsen
That does not match first time test. I guess I better rerun it with P95.
You should submit the result to PrimeNet; it may be the correct one.

2012-03-12, 03:38   #944
LaurV
Romulan Interpreter
Jun 2011
Thailand

3×3,221 Posts

I finished first-time LL tests for 45130601 and 4520386. The tests were done with CL 1.64, with -s and -t, so the intermediate residues and all checkpoint files (every 250k iterations) are available if someone wants to do the double check with P95. When (and if) my cores become less loaded, I will attempt a DC with P95 myself, but this will not be in the coming weeks.

Currently, I am testing another exponent in the same range (45221537) on 2 cards at the same time, with no overclocking. This is to check whether CL 1.64 is "reliable" in the 45M range (in fact, it is more a test of whether the "cheap" GTX 580 cards with 1.5 GB of memory that I use are "reliable" from the hardware point of view: at the factory clock of 782 MHz, they should produce the same results whether or not the software is mathematically correct). So far, 19M iterations are done on both (they run at about the same speed; one is a bit slower, maybe because it is used as the primary display(?!)), and the residues match.

edit: Roughly 40 hours to go. I don't use -s or -t; in fact, that is the idea, to see how reliable it is without checking every iteration. However, I am saving the checkpoints every 30 minutes (using my batch file posted before), so that if there is a mismatch I do not have to start everything from the beginning. Without the -t switch, CL is faster, as discussed before.
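A sketch of the bookkeeping behind the two-card run described above: compare the interim residues saved at the same iterations, and on a mismatch resume from the last checkpoint both cards agree on. The one-entry-per-checkpoint array layout and the function names are assumptions for illustration; this is not LaurV's batch file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* card_a[i] and card_b[i] are the interim residues both runs logged at the
   same checkpoint i (hypothetical layout). Returns the index of the first
   checkpoint where they disagree, or count if they agree everywhere. */
static size_t first_mismatch(const uint64_t *card_a, const uint64_t *card_b,
                             size_t count)
{
    size_t i = 0;
    while (i < count && card_a[i] == card_b[i])
        i++;
    return i;
}

/* Checkpoint to resume both runs from after a mismatch: the last one both
   cards agreed on, or -1 meaning restart from the beginning. */
static long resume_checkpoint(const uint64_t *card_a, const uint64_t *card_b,
                              size_t count)
{
    size_t bad = first_mismatch(card_a, card_b, count);
    assert(bad < count); /* only meaningful when there is a mismatch */
    return (long)bad - 1;
}
```

Once two runs diverge, their later residues will essentially never agree again, so only the first mismatch matters.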

Anyhow, if two copies are testing the same exponent (in two different folders), then -s cannot be used, as both will try to write the SAME checkpoint files. The idea with the "backup" subfolder was to have it in the current folder, not in the root of the disk... like ".\backup\......." and not "c:\backup\.....". You could argue that no one will test the same exponent with several copies of CL at the same time, but if you re-test the same exponent later using -s, the checkpoint files will be overwritten too... Why not let the user customize the output path?
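A sketch of the suggested fix: build each checkpoint file name under a directory the user chooses, defaulting to a relative "backup" subfolder of the current folder. The option, the function `checkpoint_path`, and the file-name pattern `s<exponent>.<iteration>` are hypothetical, not the names CUDALucas 1.64 actually uses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "<dir>/s<exponent>.<iteration>" into buf (hypothetical naming).
   A NULL or empty dir falls back to the relative "backup" subfolder, so the
   path resolves against the current folder, not the root of the disk.
   Returns 0 on success, -1 if the path does not fit in buf. */
static int checkpoint_path(char *buf, size_t size, const char *dir,
                           unsigned long exponent, unsigned long iteration)
{
    assert(buf != NULL && size > 0);
    if (dir == NULL || dir[0] == '\0')
        dir = "backup";
    int n = snprintf(buf, size, "%s/s%lu.%lu", dir, exponent, iteration);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Forward slashes are accepted by the Windows file APIs as well, so the same format works on both platforms. Two copies started with different directories would no longer overwrite each other's files.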

Last fiddled with by LaurV on 2012-03-12 at 03:54

2012-03-12, 05:33   #945
Brain
Dec 2009
Peine, Germany

331 Posts
Responsibility

Quote:
Originally Posted by James Heinrich
I just started experimenting with CUDALucas yesterday. First impressions: it uses zero CPU, but its GPU usage is more aggressive than mfaktc's. Normal Windows usage is fine, but I can't watch even DVD-quality video smoothly with CUDALucas, whereas with mfaktc it's only 1080p video that I have to switch it off for. Most likely I'll go back to mfaktc, partly for usability, but also because the extra two cores don't scale so well with the new AVX cores in Prime95 (iteration times when running 6 workers are significantly slower than with 4 workers).
I cannot even play back low-resolution video with 1.64 because of lag / bad responsiveness. I suggest, again, a command-line switch, for example --polite or --aggressive, where --polite would be the default. This would insert an artificial CUDA wait loop in which other apps (playback) get a turn.

The problem was introduced when an unnecessary cudaMemcpy call was removed.
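A rough sketch of the proposed switch, under stated assumptions: `--polite` does not exist in CUDALucas 1.64, and `polite_pause`, the interval, and the pause length are hypothetical; the pause uses POSIX `nanosleep`. The host thread waits between batches of iterations, so the GPU is left idle for a moment and the display driver can schedule other work. A blocking `cudaMemcpy` on every iteration presumably had a similar throttling side effect, since it makes the host wait for the GPU.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical --polite mode: after every `every` iterations, pause the host
   for `pause_us` microseconds so other applications get GPU time. In a real
   CUDA program, cudaDeviceSynchronize() would be called first, so the queued
   kernels have finished before the pause starts.
   Returns true if it paused on this iteration. */
static bool polite_pause(unsigned long iteration, unsigned long every,
                         long pause_us)
{
    assert(every > 0 && pause_us >= 0 && pause_us < 1000000);
    if (iteration == 0 || iteration % every != 0)
        return false;
    struct timespec ts = { 0, pause_us * 1000L };
    nanosleep(&ts, NULL);
    return true;
}
```

With `every = 100` and `pause_us = 1000`, the loop would give up about 1 ms every 100 iterations; choosing those defaults is the trade-off between speed and smooth playback.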

2012-03-12, 06:15   #946
Karl M Johnson
Mar 2010

3×137 Posts

Quote:
Originally Posted by Brain
I cannot even play back low-resolution video with 1.64 because of lag / bad responsiveness. I suggest, again, a command-line switch, for example --polite or --aggressive, where --polite would be the default. This would insert an artificial CUDA wait loop in which other apps (playback) get a turn.

The problem was introduced when an unnecessary cudaMemcpy call was removed.
Or a CL option to control threads and blocks. That way, it's up to the user to decide whether to run at maximum performance or to leave the GPU partly idle.
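A sketch of the arithmetic such an option would feed: the user picks threads per block, and the block count is the ceiling of the element count divided by it. `valid_threads`, `blocks_for`, the power-of-two requirement, and the lower bound of 32 (one warp) are assumptions of this sketch; 1024 is the CUDA per-block thread limit on Fermi-class cards like the GTX 580. Whether smaller launches actually leave enough idle GPU time for video playback is not something this sketch shows.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical check for a user-supplied threads-per-block value:
   at least one warp (32), at most 1024 (Fermi limit), and a power of two
   (an extra restriction chosen for this sketch). */
static bool valid_threads(unsigned threads)
{
    return threads >= 32 && threads <= 1024 && (threads & (threads - 1)) == 0;
}

/* Number of blocks so that blocks * threads covers all n elements
   (ceiling division, the usual CUDA launch-size calculation). */
static unsigned long blocks_for(unsigned long n, unsigned threads)
{
    assert(valid_threads(threads));
    return (n + threads - 1) / threads;
}
```

For n = 1572864 (the FFT length in the results above) and 256 threads per block, this gives 6144 blocks.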

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.