mersenneforum.org  

2013-09-27, 23:18   #199
kracker
"Mr. Meeseeks"
Jan 2012
California, USA

2168₁₀ Posts

4 DCs finished here:

Code:
M( 30766511 )C, 0x1ff14c8237b5e935, n = 2097152, clLucas v1.00
M( 30822937 )C, 0x1c656da41a256c21, n = 2097152, clLucas v1.01
M( 30888499 )C, 0xc296d9ac47d90339, n = 2097152, clLucas v1.01
M( 30976273 )C, 0x6c0367ea40d74647, n = 2097152, clLucas v1.01
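
For anyone who wants to compare such result lines automatically, here is a minimal Python sketch of a parser. The pattern is inferred only from the four lines posted above; the field names and the assumption that the letter after the parenthesis is a one-letter status flag are illustrative, and other clLucas versions may print a different format.

```python
import re

# Pattern inferred from the four result lines in this post only (assumption);
# the group names are made up for illustration.
RESULT_RE = re.compile(
    r"M\(\s*(?P<exponent>\d+)\s*\)(?P<status>[A-Z]),\s*"
    r"0x(?P<residue>[0-9a-fA-F]{16}),\s*"
    r"n\s*=\s*(?P<fft>\d+),\s*"
    r"clLucas v(?P<version>[\d.]+)"
)

def parse_result(line):
    """Return the fields of one result line as a dict, or None if it does not match."""
    m = RESULT_RE.search(line)
    if m is None:
        return None
    return {
        "exponent": int(m["exponent"]),
        "status": m["status"],
        "residue": m["residue"].lower(),
        "fft": int(m["fft"]),
        "version": m["version"],
    }

if __name__ == "__main__":
    line = "M( 30766511 )C, 0x1ff14c8237b5e935, n = 2097152, clLucas v1.00"
    print(parse_result(line))
```

Two runs of the same exponent can then be compared by checking that the `residue` fields are equal.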
2013-09-29, 21:58   #200
kracker
"Mr. Meeseeks"
Jan 2012
California, USA

2³×271 Posts

New x64 binaries uploaded: the zip went from 1.9 MB to 753 KB, the exe from ~5 MB to ~500 KB. And yes, tested.

Get it here: http://mersenneforum.org/cllucas/
2013-09-30, 05:19   #201
LaurV
Romulan Interpreter
Jun 2011
Thailand

25B5₁₆ Posts

Finished my first DCLL with clLucas (win64, exe compiled by kracker; it worked well and took about 70-80 hours). It matched, so this is my first successful OpenCL LL test. Most probably there will not be another one soon, not until the CL FFT gets faster; this card is better suited for other purposes, including coin mining.

Last fiddled with by LaurV on 2013-09-30 at 05:19
2013-09-30, 13:52   #202
kracker
"Mr. Meeseeks"
Jan 2012
California, USA

4170₈ Posts

Quote:
Originally Posted by LaurV
Finished my first DCLL with clLucas (win64, exe compiled by kracker; it worked well and took about 70-80 hours). It matched, so this is my first successful OpenCL LL test. Most probably there will not be another one soon, not until the CL FFT gets faster; this card is better suited for other purposes, including coin mining.

70-80 hours? My lower-end 7770 does one in 90-100 hours... and I get 11 ms per iteration; you should get ~4?
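
As a rough cross-check of these timings, here is a small Python sketch. It assumes that an LL test of M(p) takes p - 2 squarings and that the time per iteration stays constant over the whole run; the function name is made up for illustration.

```python
def ll_hours(exponent, ms_per_iter):
    """Estimated wall-clock hours for a full LL test at a constant iteration time."""
    iterations = exponent - 2          # an LL test of M(p) performs p - 2 squarings
    return iterations * ms_per_iter / 1000.0 / 3600.0

if __name__ == "__main__":
    # 30766511 is one of the exponents posted earlier in the thread.
    for ms in (11.0, 4.0):
        print(f"{ms:4.1f} ms/iter -> {ll_hours(30766511, ms):5.1f} h")
```

For an exponent around 30.8M this gives roughly 94 hours at 11 ms per iteration and roughly 34 hours at 4 ms, which is consistent with the 90-100 hours quoted for the 7770.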

Last fiddled with by kracker on 2013-09-30 at 13:52
2013-09-30, 14:30   #203
TeknoHog
Mar 2010
Jyvaskyla, Finland

2²·3² Posts

Quote:
Originally Posted by LaurV
Finished my first DCLL with clLucas (win64, exe compiled by kracker; it worked well and took about 70-80 hours). It matched, so this is my first successful OpenCL LL test. Most probably there will not be another one soon, not until the CL FFT gets faster; this card is better suited for other purposes, including coin mining.

Interesting, I didn't know about checking the result details that way. Fortunately, my GPU DCs all match up so far.

Also agreed on the efficiency point. Of course, even a super fast LL won't pay back the electricity bill, but short tasks are easier to do every now and then, as opposed to dedicating a GPU to one task for a week.

Speaking of efficiency, I've wondered if an integer transform would make LL faster on Radeons, which are notoriously fast for integer work. Then I found this ancient post on mersenne.org:

http://www.mersenne.org/various/intfft.txt

I'm mostly thinking about this on a philosophical level, as I don't know about GPU programming, but surely there is a way to use the fast, parallel integer operations for our integer problem. As I understand it, LL on x86 uses floating point math because that is where the performance is focused by design, but GPUs have different design goals. Also, I think the multiplication algo for big numbers need not use a Fourier transform; any transform that satisfies the convolution theorem will do.
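
To illustrate the convolution-theorem point, here is a minimal Python sketch of big-number multiplication with a number-theoretic transform (NTT), which works entirely in integers modulo a prime. The modulus 998244353 with primitive root 3 is a common textbook choice, not something taken from clLucas or from the linked post, and the function names are made up. A real LL implementation would additionally need arithmetic modulo 2^p - 1 and far larger transforms; this only shows that an integer transform gives an exact product.

```python
MOD = 998244353   # prime equal to 119 * 2**23 + 1, so power-of-two lengths up to 2**23 work
ROOT = 3          # a primitive root modulo MOD

def ntt(values, invert=False):
    """Iterative radix-2 NTT modulo MOD; len(values) must be a power of two."""
    a = list(values)
    n = len(a)
    # Bit-reversal permutation.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # Butterfly passes.
    length = 2
    while length <= n:
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u = a[k]
                v = a[k + half] * w % MOD
                a[k] = (u + v) % MOD
                a[k + half] = (u - v) % MOD
                w = w * w_len % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        a = [x * n_inv % MOD for x in a]
    return a

def multiply(x, y):
    """Multiply two non-negative integers via NTT convolution of their decimal digits."""
    dx = [int(c) for c in str(x)[::-1]]
    dy = [int(c) for c in str(y)[::-1]]
    n = 1
    while n < len(dx) + len(dy):
        n <<= 1
    fa = ntt(dx + [0] * (n - len(dx)))
    fb = ntt(dy + [0] * (n - len(dy)))
    conv = ntt([p * q % MOD for p, q in zip(fa, fb)], invert=True)
    # Summing coefficient * 10**i performs the carry propagation implicitly.
    return sum(c * 10**i for i, c in enumerate(conv))

if __name__ == "__main__":
    print(multiply(123456789, 987654321) == 123456789 * 987654321)
```

The result is exact as long as every convolution coefficient stays below MOD, which is the integer counterpart of the rounding-error limit of a floating-point FFT.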
2013-09-30, 17:43   #204
LaurV
Romulan Interpreter
Jun 2011
Thailand

7²·197 Posts

Quote:
Originally Posted by kracker
70-80 hours? My lower-end 7770 does one in 90-100 hours... and I get 11 ms per iteration; you should get ~4?

HUH? Only now do I see which FFT was selected... careless of me. Anyhow, the card was only partially used, but to set things right, I just started a new one (30415969) for which I carefully tuned the FFT size... and the threads, and I don't let the program choose what it wants.... grrrr... this one is only supposed to take about 37 hours...

I actually wasted about 30 hours on the former test... because I didn't pay attention to the FFT. In fact, I didn't know what to expect, either. Only after your post did I check carefully... Thanks.

Last fiddled with by LaurV on 2013-09-30 at 17:47
2013-09-30, 18:23   #205
kracker
"Mr. Meeseeks"
Jan 2012
California, USA

2168₁₀ Posts

Quote:
Originally Posted by LaurV
HUH? Only now do I see which FFT was selected... careless of me. Anyhow, the card was only partially used, but to set things right, I just started a new one (30415969) for which I carefully tuned the FFT size... and the threads, and I don't let the program choose what it wants.... grrrr... this one is only supposed to take about 37 hours...

I actually wasted about 30 hours on the former test... because I didn't pay attention to the FFT. In fact, I didn't know what to expect, either. Only after your post did I check carefully... Thanks.

2097152(*/)2 is the fastest, for now at least.
Attached thumbnail: lol.png (31.9 KB)

Last fiddled with by kracker on 2013-09-30 at 18:31
2013-09-30, 18:23   #206
LaurV
Romulan Interpreter
Jun 2011
Thailand

7²×197 Posts

Separate post for separate subject.

Playing with the FFT for the reasons explained before, I found out that saving partial files with clLucas is futile. You can't use the files for anything except resuming, as they apparently contain timestamps which make the checksums different, and therefore whole parts of the files differ even if you get the same residues. I did two runs of the first 200k iterations, saving checkpoints (-s switch) every 10k iterations, in two different folders. The resulting files differed from each other (and not only by a few bytes, whole parts of the files were completely different!) even though I obtained THE SAME residues every time. So I concluded that the files contain timestamps, or whatever else makes them differ from one run to the other. This is BAD.

They resume properly every time, so the files are not wrong. Just different. Again, this is BAD. After a file is saved, there is no (easy) way to see its residue. On the second run I would like to check whether I get the same residues. If I don't have a log file of the first run (which can only be made by redirecting the screen output, or by manual copy/paste), there is no way to see the residues.

See the naming scheme of the checkpoint files used by cudaLucas to understand what I mean (we had the same discussion there, until I convinced Dubslow to adopt that naming scheme, with the residue "embedded" in the name of the file).

As they are now, the checkpoint files are not useful. If I have a match, I don't need them. In this case usually all DC history is deleted. If I have a mismatch, I can't see where the mismatch is, unless I look into the files in binary (grrr! seriously?) or I re-run iterations by binary search to see where the mismatch happened... (like run first from 15M; if it is a match, run from 23M; if not, run from 7M; etc., until I find where the mismatch happened - surely THAT is not what the saving was intended for!). The fact that the files are different does not help me do a binary comparison either. I need to RUN those iterations to find the mismatched residue, and that wastes precious time.

Either make the files independent of time/computer/etc., so that they are always the same and a binary comparison is possible, or, better, put the residue in the name, in case I still want to use a different FFT size for a triple check...
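
The residue-in-the-name idea can be sketched in Python as follows. The naming pattern `c<exponent>.<iteration>.<16 hex digits>` is hypothetical and used only for illustration; it is neither the actual cudaLucas scheme nor anything clLucas produces, and the function names are made up as well.

```python
import re

# Hypothetical checkpoint name: c<exponent>.<iteration>.<16 hex digits>
NAME_RE = re.compile(r"^c(\d+)\.(\d+)\.([0-9a-f]{16})$")

def residues_from_names(names):
    """Map iteration -> residue for every file name that follows the scheme."""
    residues = {}
    for name in names:
        m = NAME_RE.match(name)
        if m:
            residues[int(m.group(2))] = m.group(3)
    return residues

def first_mismatch(run_a, run_b):
    """Earliest iteration saved by both runs whose residues differ, or None."""
    for iteration in sorted(run_a.keys() & run_b.keys()):
        if run_a[iteration] != run_b[iteration]:
            return iteration
    return None

if __name__ == "__main__":
    # Made-up residues, for demonstration only.
    first = residues_from_names(["c30766511.10000.0123456789abcdef",
                                 "c30766511.20000.aaaaaaaaaaaaaaaa"])
    second = residues_from_names(["c30766511.10000.0123456789abcdef",
                                  "c30766511.20000.bbbbbbbbbbbbbbbb"])
    print(first_mismatch(first, second))
```

With the names of two save folders (for example from `os.listdir`) fed into `residues_from_names`, the first diverging checkpoint is found without opening any file and without re-running iterations, and the comparison still works when the two runs used different FFT sizes.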

Last fiddled with by LaurV on 2013-09-30 at 18:33
2013-09-30, 18:44   #207
Batalov
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

2×47×101 Posts

How is that different from Prime95 savefiles from two independent, properly started runs? Can you cmp them? No, you can't. Can you check that they have the same residue (after the proper bit shift)? Yes, you can. (Can you do a lot of other, nefarious things with them? Yes, you can, but that's a separate issue.)

If you think that savefiles should completely match, you are missing a large part of the point of the doublecheck in Prime95. Which is this: no one (not even George) can guarantee you that the FFT will not produce an overflow or a wrong carry after rounding in very rare circumstances (less rare if you are close to the FFT usability boundary). A run can produce a wrong result even if the hardware is absolutely perfect. For this reason, doublechecks are run with a different shift. This takes care of both hardware and overflow errors. Rarely, two perfect runs will produce different residues -- and that is not a problem in the grand scheme of things, because a third run with yet another shift will set the result straight.

Maybe the OpenCL implementation has the shift, finally (unlike that other, arguably lazy implementation). I haven't looked.
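
The shift mentioned above can be sketched with plain Python integers. This is only an illustration of the arithmetic for an odd prime exponent p, not the way Prime95 or any GPU program is actually written: the stored value is the true LL value multiplied by 2^shift modulo M(p), squaring doubles the shift, and the constant 2 has to be subtracted at the shifted position. The function name is made up.

```python
def ll_residue(p, shift=0):
    """Lucas-Lehmer residue of M(p) = 2**p - 1 for odd prime p, computed on shifted data."""
    m = (1 << p) - 1
    s = shift % p
    x = (4 << s) % m                     # stored value: s_0 * 2**s (mod M(p))
    for _ in range(p - 2):
        s = (2 * s) % p                  # squaring doubles the shift; 2**p == 1 (mod M(p))
        x = (x * x - (2 << s)) % m       # subtract 2 at the shifted bit position
    # Undo the shift: multiplying by 2**(p - s) rotates the bits back.
    return (x << ((p - s) % p)) % m

if __name__ == "__main__":
    print(ll_residue(13) == 0)                       # M(13) = 8191 is prime
    print(ll_residue(11, 5) == ll_residue(11) != 0)  # M(11) = 2047 = 23 * 89
```

Two runs with different shifts operate on completely different intermediate bit patterns, which is why their savefiles cannot be compared byte for byte, yet the unshifted final residues agree when both runs are error-free.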
2013-09-30, 18:56   #208
LaurV
Romulan Interpreter
Jun 2011
Thailand

22665₈ Posts

We are not talking about P95 here. Its save format is not compatible with our format.

cudaLucas does not implement the shift; all the files are the same, except for the last 4 bytes, where Dubslow added a timestamp, to which I never agreed, but I can still binary-compare files from two different runs. I really doubt that clLucas implements any shift. If it did, there would not be parts of the files that are the same (whole screen pages) and parts that are different (whole screen pages), while still showing the same residue when I re-run from them (?!?). I suspect something different is going on there.

Anyhow, THIS is NOT the issue. I don't care what is inside the files, like a normal user who has no idea what "binary" means... What I want is a fast way to see the residues, WITHOUT redirecting my output into files for every run (I like to see it on screen too, to follow the progress). Then, when I do the second run, I can compare the residues immediately as they are printed, and STOP the LL test if they differ. I don't give a sh!t what's in the file. The "content" of the file will only be used in case I want to resume, which mostly I don't want, and will not do. I am interested in the residues from the former run, not in the checkpoint files. Only if the residues do not match will I use the files to resume both runs and see which of the two went nuts. But it is the residues that I am after (edit: obviously, when I say "residues", I always mean the truncated, 16-hex-character residues, not all the stuff).

cudaLucas solves this very elegantly, by putting the residue in the names of the files. This is what I was screaming for.... If I have the files in the save folder, I see all the residues, and when I do the second run (in case of a triple check or a mismatch) I see immediately where the error was, and need to re-run only very few iterations. I can even use a batch file to compare them automatically and abort the LL test if it prints things which are not in the NAMES of the files from the former run...

Last fiddled with by LaurV on 2013-09-30 at 19:04
2013-09-30, 19:01   #209
kracker
"Mr. Meeseeks"
Jan 2012
California, USA

2³·271 Posts

So, simplified, you just want clLucas to dump checkpoint residues to a file, is that it?
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.