mersenneforum.org  

Old 2017-07-12, 23:28   #199
ewmayer

Quote:
Originally Posted by chalsall View Post
Isn't that how Trump became Commander in Chief?
Only Trump?

But please, let's keep pawl-it-icks in the Soap Box.
Old 2017-07-12, 23:50   #200
chalsall

Quote:
Originally Posted by ewmayer View Post
But please, let's keep pawl-it-icks in the Soap Box.
Meow.
Old 2017-07-13, 00:11   #201
airsquirrels
 
I also tried running cuobjdump on the binary; no PTX or cubin was to be found, although perhaps that's where the encryption instructions are being used.

The CUDA calls are linked at run time, so no version information was easily extractable. The code that uses them does set up a stream and some host-to-device and device-to-host CUDA memcpy operations before a loop of kernel launches. If I could have got it running, I would have hooked a debugger to the module-load code and dumped the kernel for nvdisasm, to at least get some idea of the algorithm; however, the code itself plainly does not run.
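
(For illustration only: a minimal generic sketch of that host-side call pattern, using the CUDA runtime API with a trivial placeholder kernel, would look something like the following. This is not CEMPLLA's code; its actual kernel was only loaded at run time and never ran here.)
Code:
// Illustrative placeholder only -- NOT CEMPLLA's kernel or host code.
// Generic CUDA runtime-API pattern: stream, host-to-device copy,
// loop of kernel launches, device-to-host copy.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void placeholder_iteration(double* data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * data[i];   // stand-in for one real iteration
}

int main() {
    const size_t n = 1 << 20;
    const size_t bytes = n * sizeof(double);
    std::vector<double> host(n, 1.0);

    cudaStream_t stream;
    cudaStreamCreate(&stream);                 // set up a stream

    double* dev = nullptr;
    cudaMalloc((void**)&dev, bytes);
    cudaMemcpyAsync(dev, host.data(), bytes,   // host-to-device copy
                    cudaMemcpyHostToDevice, stream);

    const int threads = 256;
    const int blocks = (int)((n + threads - 1) / threads);
    for (int iter = 0; iter < 1000; ++iter)    // loop of kernel launches
        placeholder_iteration<<<blocks, threads, 0, stream>>>(dev, n);

    cudaMemcpyAsync(host.data(), dev, bytes,   // device-to-host copy
                    cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);

    printf("host[0] after the loop: %g\n", host[0]);
    cudaFree(dev);
    cudaStreamDestroy(stream);
    return 0;
}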

The installer crash was in the Visual Studio 2010 runtime, so there was nothing telltale about how to work around it.

The time spent was for the joy of the hunt, rather than the merit of the effort.
Old 2017-07-13, 00:33   #202
science_man_88
 
Quote:
Originally Posted by airsquirrels View Post
I also tried running cuobjdump on the binary; no PTX or cubin was to be found, although perhaps that's where the encryption instructions are being used.

The CUDA calls are linked at run time, so no version information was easily extractable. The code that uses them does set up a stream and some host-to-device and device-to-host CUDA memcpy operations before a loop of kernel launches. If I could have got it running, I would have hooked a debugger to the module-load code and dumped the kernel for nvdisasm, to at least get some idea of the algorithm; however, the code itself plainly does not run.

The installer crash was in the Visual Studio 2010 runtime, so there was nothing telltale about how to work around it.

The time spent was for the joy of the hunt, rather than the merit of the effort.
Can't you extract it into machine code?
Old 2018-10-03, 19:57   #203
kriesel
 
Quote:
Originally Posted by axn View Post
Those are integer instructions, so possibly he could be doing integer transforms.

IIRC, cudaLucas uses nvidia's cuFFT library where max FFT size is "128 million elements" (https://developer.nvidia.com/cufft)
I think clLucas also has same kind of limitation (dependent on clFFT library from AMD?)
gpuOWL, OTOH, uses hand-rolled FFT, but currently only supports p-o-2 2M & 4M FFTs. But maybe the author can write a 256M FFT for s&g.
George should write a 192M one for s&g as well (if not already done).
CUDALucas and clLucas max out at 64M FFT length, for good reason: run time per iteration grows roughly as n ln n ln ln n, and the iteration count grows with the exponent, so roughly doubling the exponent means more than 4 times the total run time. Similarly, George has chosen to limit prime95 FFTs above 32M to FMA3 hardware fast enough for them, and not to bother coding above 64M yet.
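
As a rough illustration of that scaling (my own numbers, assuming FFT length n proportional to the exponent p and about p iterations per LL test):
Code:
// Rough scaling sketch (illustrative numbers, not from any GIMPS program):
// per-iteration cost modeled as n*ln(n)*ln(ln(n)), FFT length n taken
// proportional to the exponent p, and ~p iterations per LL test.
#include <cmath>
#include <cstdio>

double per_iter_cost(double n) { return n * std::log(n) * std::log(std::log(n)); }

int main() {
    const double n1 = 64.0 * 1024 * 1024;   // 64M FFT
    const double n2 = 2.0 * n1;             // 128M FFT, roughly double the exponent
    const double iter_ratio  = per_iter_cost(n2) / per_iter_cost(n1);
    const double total_ratio = 2.0 * iter_ratio;      // twice as many iterations too
    printf("per-iteration cost ratio: %.2f\n", iter_ratio);   // a bit over 2
    printf("total run-time ratio:     %.2f\n", total_ratio);  // more than 4
    return 0;
}
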
gpuOwl has now progressed, since v3.5, to supporting up to a 144M FFT: https://www.mersenneforum.org/showpo...&postcount=505 Perhaps when Preda comes back from his long vacation he'll add an 8K W or 4K H, which would enable FFT lengths > 144M, potentially supporting gigadigit exponents with very long run times.

I've suspected, since I learned of CEMPLLA v1, that the reason for its 5-GPU minimum was Toom-Cook-3 in parallel, wrapped around a library implementation of a 64M FFT (5 GPUs, each computing one of the 5 products to which Toom-3 reduces the 9 partial products of a 3x3 schoolbook multiply, each at 64M size), giving 192M total, just barely big enough, I think, for gigadigit.

CUDALucas 2.06beta on a GTX1070 at 64M is about 85 msec/iteration, so on a 1080Ti it would be about 42 ms/iter. A Toom-Cook-3 iteration would take longer than that, so a modified CUDALucas spreading a single gigadigit exponent across multiple GPUs that way would take more than 13 years. (I recall reading somewhere that the CEMPLLA author had seen cases where larger exponents ran considerably FASTER than smaller exponents, or some such, and I recall in my own testing seeing cases where FFT timings are anomalously fast, in some cases by orders of magnitude, _BUT THOSE ARE ERROR CONDITIONS FATAL TO ACCURACY_.)
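
For scale, a back-of-envelope check of those figures (my arithmetic, assuming an LL test needs about p iterations):
Code:
// Back-of-envelope check (my arithmetic): a gigadigit Mersenne number has
// exponent p ~ 10^9 / log10(2) ~ 3.32e9, and an LL test needs ~p iterations.
#include <cmath>
#include <cstdio>

int main() {
    const double p = 1e9 / std::log10(2.0);          // ~3.32e9 iterations
    const double sec_per_year = 365.25 * 24 * 3600;

    const double years_at_42ms = p * 0.042 / sec_per_year;
    printf("at a plain 42 ms/iter: %.1f years\n", years_at_42ms);   // ~4.4

    // per-iteration time corresponding to a 13-year run, i.e. what the
    // Toom-Cook-3 wrapper plus inter-GPU coordination would push past
    const double ms_for_13_years = 13.0 * sec_per_year / p * 1000.0;
    printf("13 years corresponds to ~%.0f ms/iter\n", ms_for_13_years);  // ~124
    return 0;
}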

Last fiddled with by kriesel on 2018-10-03 at 20:00
Old 2018-10-03, 20:06   #204
retina

Quote:
Originally Posted by kriesel View Post
(I recall reading somewhere that the CEMPLLA author had seen cases where larger exponents ran considerably FASTER than smaller exponents, or some such, and I recall in my own testing seeing cases where FFT timings are anomalously fast, in some cases by orders of magnitude, _BUT THOSE ARE ERROR CONDITIONS FATAL TO ACCURACY_.)
Yeah. There is no point in running really really fast if you are going in the wrong direction.
retina is offline   Reply With Quote
Old 2018-10-27, 14:52   #205
kriesel
 
Anomalously fast iterations on GPUs

A detailed description of error cases producing fast-but-wrong iterations can be seen at https://mersenneforum.org/showpost.p...&postcount=617
Old 2019-04-29, 19:28   #206
kriesel
 
Quote:
Originally Posted by kriesel View Post
CUDALucas and ClLucas max out at 64M FFT length.
Nope. Well, sort of, depending on GPU model and CUDA level. Small-GPU-RAM models won't be able to primality test, threadbench, or fftbench above certain levels; a 1GB Quadro 2000 is limited to around 32768K - 38880K, as I recall, in CUDALucas or CUDAPm1. The CUDALucas code (v2.06 at least, and possibly some earlier versions), with a sufficiently high CUDA level, can go to 256M FFT length and to p ~ 2^31, probably because of the use of signed 32-bit integers in places. But run times are dreadfully long: p ~ 10^9 takes about 1.5 years on a GTX1080Ti, and p ~ 2^31 takes about 9 years on a GTX1080. CUDAPm1 has other issues that often occur at lower p than its memory limit.
Code:
Device              GeForce GTX 1080 Ti
Compatibility       6.1
clockRate (MHz)     1620
memClockRate (MHz)  5505

  fft    max exp  ms/iter
    1      22133   0.1083
...
 4608   85111207   3.2221
...
65536 1143276383  49.4602
69120 1204418959  49.4578
73728 1282931137  51.3181
75264 1309078039  56.8343
81920 1422251777  58.5331
82944 1439645131  60.2333
84672 1468986017  64.5615
86016 1491797777  66.1291
86400 1498314007  67.0704
93312 1615502269  67.4838
96768 1674025489  69.4963
98304 1700021251  72.0720
100352 1734668777  74.1605
102400 1769301077  77.7934
104976 1812840839  78.9627
110592 1907684153  80.0951
114688 1976791967  82.2443
115200 1985426669  86.3511
116640 2009707367  91.6873
131072 2147483647  94.1572
Code:
Device              GeForce GTX 1080
Compatibility       6.1
clockRate (MHz)     1797
memClockRate (MHz)  5005

  fft    max exp  ms/iter
    1      22133   0.1797
...
 4608   85111207   4.3534
...
65536 1143276383  66.6890
69120 1204418959  70.9543
69984 1219148351  73.3338
73728 1282931137  73.5568
75264 1309078039  81.4435
76832 1335757897  83.7932
81920 1422251777  83.9614
82944 1439645131  85.6173
84672 1468986017  91.7141
86016 1491797777  95.1226
86400 1498314007  95.1886
93312 1615502269  96.7214
96768 1674025489  99.2282
98304 1700021251 103.3630
100352 1734668777 105.7010
102400 1769301077 110.7946
104976 1812840839 112.6621
110592 1907684153 114.1252
114688 1976791967 117.3213
115200 1985426669 123.8906
116640 2009707367 131.4965
131072 2147483647 133.0254
139968 2147483647 149.7381
147456 2147483647 150.2931
163840 2147483647 171.1358
165888 2147483647 175.8527
169344 2147483647 185.7351
172032 2147483647 194.0131
172800 2147483647 200.9266
174960 2147483647 202.3867
184320 2147483647 202.9247
186624 2147483647 207.8176
193536 2147483647 215.1034
200704 2147483647 217.1268
204800 2147483647 225.7533
209952 2147483647 231.5600
221184 2147483647 232.8545
229376 2147483647 239.4878
230400 2147483647 267.3366
233280 2147483647 267.8970
236196 2147483647 272.5134
262144 2147483647 273.2270
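
Converting a couple of the fftbench rows above into run times gives the year figures mentioned earlier (my arithmetic, taking the iteration count equal to the exponent):
Code:
// Convert fftbench rows into run-time estimates: run time = p * ms/iter.
#include <cstdio>

double years(double p, double ms_per_iter) {
    return p * ms_per_iter / 1000.0 / (365.25 * 24 * 3600);
}

int main() {
    // GTX 1080 Ti, 65536K FFT row: up to p ~ 1.14e9 at ~49.5 ms/iter
    printf("p = 1e9  on the 1080Ti: %.1f years\n", years(1e9, 49.5));           // ~1.6
    // GTX 1080, 131072K FFT row: up to p = 2^31 - 1 at ~133 ms/iter
    printf("p = 2^31 on the 1080:   %.1f years\n", years(2147483647.0, 133.0)); // ~9.1
    return 0;
}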

Last fiddled with by kriesel on 2019-04-29 at 19:49
Old 2019-05-02, 12:34   #207
DukeBG
 
This thread was a fun read. Thanks for bringing it up. I wouldn't want to necropost, but since it's not my reply that brought it back up, I feel safe sharing some thoughts.

After reading it (and threads on the NVIDIA forums), I believe the author had honest intentions and the software was indeed written to do what's advertised. He was too ignorant + arrogant[1], however, so the real discussion, such as comparing actual residues and iteration timings, never happened.

But he did actually listen in the end. When he found out that factoring had already been done, and received criticism for the factoring software and for the shady internet connections, he removed the factoring software, removed the internet connections, and just hard-coded the set of exponents to LL-test in the installation.

The SSH internet connections were most definitely just for storing/reading factoring data. 4shared.com is just a file storage / file exchange website that allows SFTP connections; if you don't know, that's one of the "secure FTP" family of protocols, basically FTP over SSH, hence the SSH. He wouldn't want insecure FTP because people could then easily sniff the password and mess up his data. Since GIMPS has done more factoring for 100M-digit and gigadigit candidates than the guy was ever capable of, he just removed all of that in favor of the list of no-factor exponents.

He did implement res64s, from what it looks like (and the display of the current exponent was allegedly there from the beginning), though he never posted any for smaller exponents to compare the validity of the tests. Because of the i+a (see above).

He never posted timings... because he never actually had them. From the NVIDIA thread you can read that he doesn't himself own any of the hardware that his software "expects", and that there is code in place to abort calculations if they are "taking too long", and that code always fires for him. It's a wonder how he's developing software that he himself is unable to truly test, and that's probably the most amusing thing about the whole shebang.

_____
[1] – these two words go together so often that one non-native-English-speaker acquaintance of mine often mixes them up, but his sentences still tend to work!

Last fiddled with by DukeBG on 2019-05-02 at 12:37
Old 2019-05-02, 15:43   #208
kriesel
 
Quote:
Originally Posted by DukeBG View Post
He did implement res64s, from what it looks like (and the display of the current exponent was allegedly there from the beginning), though he never posted any for smaller exponents to compare the validity of the tests. Because of the i+a (see above).

He never posted timings... because he never actually had them. From the NVIDIA thread you can read that he doesn't himself own any of the hardware that his software "expects", and that there is code in place to abort calculations if they are "taking too long", and that code always fires for him. It's a wonder how he's developing software that he himself is unable to truly test, and that's probably the most amusing thing about the whole shebang.
Enjoyed your post. And it raised a question or two for me.

Where did you see the implementation of res64's in CEMPLLA?

It's my recollection that more than one forum member whom I believe to be respected and competent, and who had the necessary hardware, tried to install and test the CEMPLLA software, and that it failed to install and run.

It's also my recollection that the author made reference to displaying timing info as seconds/iteration. If that's multiple seconds, that's low performance.

If the author could not run it, and others could not run it, it may have been a substantial accomplishment in coding (undocumented performance questions aside), but I'm not sure we should call it developing.

That aside, if the tone of his announcements were more mainstream, the secrecy were gone, and the performance and reliability were reasonable, I think it would be welcomed.

Last fiddled with by kriesel on 2019-05-02 at 16:18
Old 2019-05-02, 15:51   #209
retina

Nah, the author was just greedy. Wanting to use other people's time, effort and resources to get the prize money for himself. That's all.

It was a mistaken path though, because people aren't stupid, and finding large primes is hard.