#200 |
|
"Mr. Meeseeks"
Jan 2012
California, USA
2³·271 Posts |
New x64 binaries uploaded, zip from 1.9 MB to 753 KB, exe from ~5MB to ~500 KB. And yes, tested.
Get it here: http://mersenneforum.org/cllucas/ |
|
|
|
|
|
#201 |
|
Romulan Interpreter
Jun 2011
Thailand
10010110110101₂ Posts |
Finished my first DCLL with clLucas (win64, exe compiled by kracker; it worked well and took about 70-80 hours). The residue matched, so this is my first successful OpenCL LL test. Most probably there will not be another one soon, until the speed of the CL FFT improves; this card is better suited for other purposes, including coin mining.
Last fiddled with by LaurV on 2013-09-30 at 05:19 |
|
|
|
|
|
#202 | |
|
"Mr. Meeseeks"
Jan 2012
California, USA
2³·271 Posts |
Quote:
70-80 hours? My lower-end 7770 does one in 90-100 hours... and I get 11 ms per iteration, so you should get ~4?
Last fiddled with by kracker on 2013-09-30 at 13:52 |
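The timings quoted above can be cross-checked with simple arithmetic, since an LL test of exponent p needs p - 2 squarings. A rough sketch in Python; the exponent of 30,000,000 is an assumption chosen for illustration only and is not a number given in this thread:

```python
def ll_hours(exponent: int, ms_per_iter: float) -> float:
    """Estimated wall-clock hours for one LL test (exponent - 2 squarings)."""
    iterations = exponent - 2
    return iterations * ms_per_iter / 1000.0 / 3600.0


ASSUMED_EXPONENT = 30_000_000  # hypothetical exponent, illustration only

for ms in (11.0, 4.0):
    hours = ll_hours(ASSUMED_EXPONENT, ms)
    print(f"{ms:4.1f} ms/iter -> {hours:5.1f} hours")
```

Under that assumption, 11 ms per iteration lands in the 90-100 hour range mentioned above, and ~4 ms per iteration would bring the same test down to roughly a day and a half.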
|
|
|
|
|
|
#203 | |
|
Mar 2010
Jyvaskyla, Finland
2²·3² Posts |
Quote:
Also agreed on the efficiency point. Of course, even a super fast LL won't pay back the electricity bill, but short tasks are easier to do every now and then, as opposed to dedicating a GPU to some task for a week.

Speaking of efficiency, I've wondered if an integer transform would make LL faster on Radeons, which are notoriously fast for integer work. Then I found this ancient post on mersenne.org: http://www.mersenne.org/various/intfft.txt

I'm mostly thinking about this on a philosophical level, as I don't know about GPU programming, but surely there is a way to use the fast, parallel integer operations for our integer problem. As I understand it, LL on x86 uses floating point math because that is where the performance is focused by design, but GPUs have different design goals. Also, I think the multiplication algo for big numbers need not use the Fourier transform; any transform will do, as long as it satisfies the convolution theorem. |
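To make the last point concrete, here is a minimal sketch of big-number multiplication with a number-theoretic transform (NTT), an all-integer transform that satisfies the convolution theorem. It is not how clLucas works and it makes no claim about GPU performance; the prime 998244353, its primitive root 3, and the base-256 digit size are choices made for this sketch only.

```python
MOD = 998244353   # prime of the form 119 * 2**23 + 1
ROOT = 3          # primitive root modulo MOD
BASE = 256        # small digits so convolution sums stay below MOD


def ntt(a, invert=False):
    """In-place iterative NTT; len(a) must be a power of two."""
    n = len(a)
    j = 0
    for i in range(1, n):              # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:                 # butterfly passes
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u = a[k]
                v = a[k + half] * w % MOD
                a[k] = (u + v) % MOD
                a[k + half] = (u - v) % MOD
                w = w * w_len % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        for i in range(n):
            a[i] = a[i] * n_inv % MOD


def to_digits(x):
    """Little-endian base-BASE digits of a non-negative integer."""
    digits = []
    while x:
        digits.append(x % BASE)
        x //= BASE
    return digits or [0]


def multiply(x, y):
    """Multiply non-negative integers via an exact integer convolution."""
    a, b = to_digits(x), to_digits(y)
    n = 1
    while n < len(a) + len(b):
        n <<= 1
    # every convolution term must stay below MOD, or the result wraps
    assert n * (BASE - 1) ** 2 < MOD, "too large for this single-prime sketch"
    a += [0] * (n - len(a))
    b += [0] * (n - len(b))
    ntt(a)
    ntt(b)
    c = [p * q % MOD for p, q in zip(a, b)]   # pointwise product
    ntt(c, invert=True)
    # carry propagation: evaluate the digit polynomial at BASE
    return sum(coef * BASE ** i for i, coef in enumerate(c))
```

Because everything is done in modular integer arithmetic, there is no rounding error at all; the price is a hard size limit (the assertion above) unless larger or multiple primes are used.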
|
|
|
|
|
|
#204 | |
|
Romulan Interpreter
Jun 2011
Thailand
7²×197 Posts |
Quote:
I actually wasted about 30 hours on the former test... because I didn't pay attention to the FFT. In fact, I didn't know what to expect, either. Only after your post did I check carefully... Thanks.
Last fiddled with by LaurV on 2013-09-30 at 17:47 |
|
|
|
|
|
|
#205 | |
|
"Mr. Meeseeks"
Jan 2012
California, USA
2³×271 Posts |
Quote:
Last fiddled with by kracker on 2013-09-30 at 18:31 |
|
|
|
|
|
|
#206 |
|
Romulan Interpreter
Jun 2011
Thailand
7²×197 Posts |
Separate post for a separate subject.

Playing with the FFT for the reasons explained before, I found out that saving partial files with CL-Lucas is futile. You can't use the files for anything, as they contain timestamps which later make the checksums different, and therefore the whole content of the files is different, even if you get the same residues.

I did two runs of the first 200k iterations, saving checkpoints (-s switch) every 10k iterations, in two different folders. The resulting files differed from each other (and not only by a few bytes, whole parts of the files were completely messed up!) even though I obtained THE SAME residues every time. So I concluded that the files contain some timestamps, or whatever else makes them differ from one run to the other. This is BAD. They resume properly every time, therefore the files are not wrong. Just different.

Again, this is BAD. After a file is saved, there is no (easy) way to see the residue. For the second run I would like to check whether I get the same residues. If I don't have a log file of the first run (which can only be made by screen redirection or manual copy/paste), there is no way to see the residues. See the naming scheme of the checkpoint files used by cudaLucas to understand what I mean (we had the same discussion there, until I convinced Dubslow to adopt that naming scheme, with the residue "embedded" in the name of the file).

As they are now, the checkpoint files are not useful. If I have a match, I don't need them; in this case usually all DC history is deleted. If I have a mismatch, I can't see where the mismatch is, unless I look into the files in binary (grrr! seriously?) or I re-run iterations by binary search to see where the mismatch happened (like run first from 15M; if it is a match, run from 23M; if not, run from 7M; etc., until I find where the mismatch happened - surely THAT is not what the saving was intended for!). The fact that the files are different does not help me either, as I can't do a binary comparison. I need to RUN those iterations to find the mismatched residue, and that wastes precious time.

Either make the files independent of time, computer, etc., so that they are always the same and binary comparison is possible, or, better, put the residue in the name, in case I still want to use a different FFT size for a triple check...
Last fiddled with by LaurV on 2013-09-30 at 18:33 |
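A minimal sketch of the "residue in the file name" idea, in Python. The pattern `c<exponent>_<iteration>_<residue>.ckp` is invented for this example and is not the actual cudaLucas or clLucas naming scheme; taking the low 64 bits as the truncated 16-hex-digit residue is likewise an assumption of the sketch.

```python
import re

# Hypothetical naming pattern, invented for this sketch only.
_NAME_RE = re.compile(r"^c(\d+)_(\d+)_([0-9a-f]{16})\.ckp$")


def checkpoint_name(exponent, iteration, full_residue):
    """Build a checkpoint file name that carries the truncated residue."""
    res64 = full_residue & 0xFFFFFFFFFFFFFFFF   # keep the low 64 bits
    return f"c{exponent}_{iteration}_{res64:016x}.ckp"


def parse_checkpoint_name(name):
    """Return (exponent, iteration, residue_hex), or None if the name does not fit."""
    match = _NAME_RE.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), match.group(3)
```

With names like these, a plain directory listing of the save folder shows every checkpoint residue, regardless of what bytes the files themselves contain.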
|
|
|
|
|
#207 |
|
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
2516₁₆ Posts |
How is that different from Prime95 savefiles from two independent, properly started runs? Can you cmp them? No, you can't. Can you compare that they have the same residue (after the proper bit shift)? Yes, you can. (Can you do a lot of other, nefarious things with them? Yes, you can, but that's a separate issue.)
If you think that savefiles should match completely, you are missing a large part of the point of doublechecks in Prime95, which is this: no one (not even George) can guarantee that the FFT will not produce an overflow or a wrong carry after rounding in very rare circumstances (less rare if you are close to the FFT usability boundary). A run can produce a wrong result even if the hardware is absolutely perfect. For this reason, doublechecks are run with a different shift. This takes care of both hardware and overflow errors. Rarely, two perfect runs will produce different residues -- and that is not a problem in the grand scheme of things, because a third run with yet another shift will set the result straight. Maybe the OpenCL implementation has the shift, finally (unlike that other, arguably lazy implementation). I haven't looked. |
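For readers unfamiliar with the shift, here is a small exact-integer sketch in Python. It is not Prime95's code; it only illustrates why two runs with different shifts hold different intermediate data yet agree on the residue once the shift is undone. The function name and the small exponents used below are choices made for this example.

```python
def ll_residue(p, shift=0):
    """Lucas-Lehmer residue s_(p-2) mod 2**p - 1, computed on a bit-shifted state.

    p is assumed to be an odd prime exponent.
    """
    m = (1 << p) - 1
    k = shift % p
    x = (4 << k) % m                 # s_0 = 4, stored as s_0 * 2**k
    for _ in range(p - 2):
        k = (2 * k) % p              # squaring doubles the shift (2**p == 1 mod m)
        x = (x * x - (2 << k)) % m   # the "- 2" must be shifted as well
    # undo the shift: 2**(p - k) is the inverse of 2**k modulo m
    return (x << ((p - k) % p)) % m


for p in (11, 13):
    print(p, [ll_residue(p, s) for s in (0, 3, 5)])
```

In exact arithmetic every shift gives the same answer, of course. In a floating-point FFT implementation the different shift means different data flows through the transform, so a rounding or carry error is unlikely to repeat identically in both runs.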
|
|
|
|
|
#208 |
|
Romulan Interpreter
Jun 2011
Thailand
7²×197 Posts |
We are not talking here about P95. Its saving format is not compatible with our format.
cudaLucas does not implement the shift; all the files are the same, except the last 4 bytes, to which Dubslow added some time-stamp (to which I never agreed), but I can still binary-compare files from two different runs. I really doubt that CL implements any shift. If it did, there would not be parts of the files that are the same (whole screen pages) and parts that are different (whole screen pages) while still showing the same residue when I re-run from them (?!?). I suspect something different goes on there.

Anyhow, THIS is NOT the issue. I don't care what is inside the files, like a normal user who has no idea what "binary" means... What I want is a fast way to see the residues, WITHOUT redirecting my output into files for every run (I like to see it on screen too, to follow the progress). Then, when I do the second run, I can compare the residues immediately as they are printed, and STOP the LL test if they differ. I don't give a sh!t what's in the file. The "content" of the file will only be used in case I want to resume, which mostly I don't want and will not do. I am interested in the residues from the former run, not in the checkpoint files. Only if the residues do not match will I use the files to resume both runs and see which one of the two went nuts. But it is the residues I am after (edit: obviously, when I say "residues", I always mean the truncated, 16-hex-character residues, not all the stuff).

cudaLucas solves this very elegantly, by putting the residue in the name of the files. This is what I was screaming for... If I have the files in the saving folder, I see all the residues, and when I do the second run (in case of a triple check or mismatch) I see immediately where the error was, and need to re-run only very few iterations. I can even use a batch file to compare them automatically and abort the LL if it is printing things which are not in the NAMES of the files from the former run...
Last fiddled with by LaurV on 2013-09-30 at 19:04 |
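The automatic comparison described above could look roughly like this in Python. It is a sketch under stated assumptions: `reference` is a hypothetical mapping from iteration number to the 16-hex-character residue of the former run (however those residues were recovered), and `new_run` yields the (iteration, residue) pairs of the second run in increasing order. Neither clLucas nor cudaLucas is claimed to provide such a function.

```python
def first_mismatch(reference, new_run):
    """Find the first checkpoint where the second run disagrees with the first.

    Returns (last_good_iteration, bad_iteration), or None if every residue
    that can be compared matches.
    """
    last_good = 0
    for iteration, residue in new_run:
        expected = reference.get(iteration)
        if expected is None:
            continue                      # former run has no checkpoint here
        if expected.lower() != residue.lower():
            return last_good, iteration   # stop: only this window needs a rerun
        last_good = iteration
    return None


former = {10000: "a1b2c3d4e5f60718", 20000: "0f1e2d3c4b5a6978", 30000: "1122334455667788"}
second = [(10000, "a1b2c3d4e5f60718"), (20000, "0F1E2D3C4B5A6978"), (30000, "ffffffffffffffff")]
print(first_mismatch(former, second))
```

Here the two runs agree up to iteration 20000 and diverge by 30000, so only that 10k window has to be re-run from the last good checkpoint of each run to see which one went wrong.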
|
|
|
|
|
#209 |
|
"Mr. Meeseeks"
Jan 2012
California, USA
2³·271 Posts |
So, simplified: you just want clLucas to dump checkpoint residues to a file, is that it?
|
|
|
|