mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Data (https://www.mersenneforum.org/forumdisplay.php?f=21)
-   -   CEMPLLA: An alternative to GIMPS ? (https://www.mersenneforum.org/showthread.php?t=20489)

Gordon 2015-09-17 23:47

[QUOTE=R.D. Silverman;410702]

BTW, please describe the convolution algorithm you use in your LL code.[/QUOTE]

We're all still waiting, suspect it'll be a very long time :yawn:

ewmayer 2015-09-18 00:15

[QUOTE=Madpoo;410693]It was pointed out to me that what might be involved is breaking down a single LL iteration into parallel chunks and then combining the results.[/QUOTE]
I guarantee you that if he is in fact doing LLT rather than (say) bitcoin mining, that is what he is doing. Nothing new at all there, every existing LL tester worth discussing has that capability, though on the CPU side it tends to perform more poorly in terms of overall throughput than one-job-per-node. The GPU client(s) must be massively parallel in order to make halfway-decent usage of that kind of hardware, but AFAIK in multi-GPU systems the intra-GPU comms bandwidth dwarfs the inter-GPU bandwidth (GPU wonks, please correct me if I have that wrong), so throwing multiple GPUs at each FFT-mul is likely to behave similarly to parallel multicore-CPU implementations. Again, the dead silence from the OP w.r.t. benchmarks and FFT basics (e.g. hand-rolled or not; non-power-of-2 support or not; maxP for various FFT lengths) is not inspiring confidence.
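For context, the test under discussion is tiny in its textbook serial form; essentially all the engineering effort goes into the squaring step. A minimal Python sketch (the real clients replace the modular squaring with an FFT-based multiply plus the IBDWT, which is where all the parallelism lives):

```python
# Minimal Lucas-Lehmer primality test for Mersenne numbers M_p = 2^p - 1.
# Textbook serial form only: production testers (Prime95, Mlucas, CUDALucas)
# implement the s*s squaring as a length-n FFT multiply with the IBDWT so the
# mod 2^p - 1 reduction comes for free; that squaring is what gets parallelized.

def lucas_lehmer(p: int) -> bool:
    """Return True if M_p = 2**p - 1 is prime (p must be an odd prime)."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m   # the squaring here dominates the cost for large p
    return s == 0

# M_7 = 127 is prime; M_11 = 2047 = 23 * 89 is not.
print(lucas_lehmer(7), lucas_lehmer(11))  # True False
```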

[QUOTE]I wonder if the author considered the effects of limited memory bandwidth. Working with billion-digit Mersenne #'s, I imagine that's some hefty amounts of data flying around, and the need to run across 5 GPU's (which I think is tied to him wanting to break it down into 8,190 chunks of work) means a lot of data on PCI as well.[/QUOTE]
More or less same point as I made above.

[QUOTE]Oh well... those are problems I'm sure he's solved by now.[/QUOTE]
On what do you base your confidence? Show ... us .. the ... numbers.

[QUOTE]EDIT: specifically it sounds like the author was interested in this at one point:
[URL="https://en.wikipedia.org/wiki/Karatsuba_algorithm"]https://en.wikipedia.org/wiki/Karatsuba_algorithm[/URL][/QUOTE]
Karatsuba is one of the standard waypoints on the learning curve leading from grammar-school bignum multiply to O(n lg n) transform-based methods (Schönhage-Strassen in its various guises). There's a reason no one who desires to be taken seriously uses it for really big moduli.
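For readers following along, that waypoint fits in a few lines; a toy Karatsuba sketch in Python (the 64-bit base-case cutoff is an arbitrary choice here; real libraries tune it carefully):

```python
# Toy Karatsuba multiply: three half-size products instead of four, giving
# O(n^1.585) versus grammar-school O(n^2). For billion-digit operands the
# FFT-based O(n log n) methods win by a wide margin, as noted above.

def karatsuba(x: int, y: int) -> int:
    if x < 2**64 or y < 2**64:           # small enough: hardware multiply
        return x * y
    n = max(x.bit_length(), y.bit_length()) // 2
    xh, xl = x >> n, x & ((1 << n) - 1)  # split x = xh*2^n + xl
    yh, yl = y >> n, y & ((1 << n) - 1)
    a = karatsuba(xh, yh)                # high product
    b = karatsuba(xl, yl)                # low product
    c = karatsuba(xh + xl, yh + yl) - a - b   # middle term from one extra product
    return (a << (2 * n)) + (c << n) + b
```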

As the western-themed saying goes, so far I've seen a whole lot of hat, but no cattle.

Madpoo 2015-09-18 00:59

[QUOTE=ewmayer;410707][QUOTE]Oh well... those are problems I'm sure he's solved by now.[/QUOTE]

On what do you base your confidence? Show ... us .. the ... numbers.[/QUOTE]

LOL... I was being sarcastic but I can tell it wasn't that obvious. :smile:

[QUOTE=ewmayer;410707]Karatsuba is one of the standard waypoints on the learning curve leading from grammar-school bignum multiply to O(n lg n) transform-based methods (Schönhage-Strassen in its various guises). There's a reason no one who desires to be taken seriously uses it for really big moduli.[/QUOTE]

That's kind of the impression I got when looking at that info, that it works okay, but, as you eloquently put it, it's a bit grammar-school compared to the more advanced methods.

I suppose I shouldn't be surprised that his objections to using the GMP library were based mostly on his mistrust of all the "TODO" comments he saw, leading him to assume it wasn't ready for actual use.

At least he had some open source code to look at before assuming it was no good... we're forced to make assumptions on his program based on scraps of comments from 2011.

science_man_88 2015-09-18 01:07

[QUOTE=ewmayer;410707](Schönhage-Strassen in its various guises). [/QUOTE]

Other than the FFT and inverse FFT steps, even I understand enough of it to make the picture for the Wikipedia article.

Uncwilly 2015-09-18 01:13

[QUOTE=VBCurtis;410695]And since it can only find 64-bit factors, it also doesn't need to exist (since all candidates in the range of interest to the author have already been checked to 64 bits, or could be in hours with the right tools).[/QUOTE]
[QUOTE=Gordon;410706]We're all still waiting, suspect it'll be a very long time :yawn:[/QUOTE]
He's busy downloading factoring data from PrimeNet and OBD :lol:

LaurV 2015-09-18 02:25

[QUOTE=philmoore;410690]If I had a system with 5 GPUs, would the most efficient use of this system to LL test be to run five instances of CudaLucas? If so, this program would have to be 5 times as fast to be an improvement.[/QUOTE]
Well, this is another reason to believe this guy is a joker (I think someone pointed it out before, too). All "good" GPUs (i.e. not very old crap) are two slots wide, if not wider. Air-cooled "good" GPUs can be three slots (see the Asus DC2 cards). Five GPUs means 10 slots, and I don't know any mobo where I can connect 10 slots without lots of cabling, except for big servers. I had 4 water-cooled GPUs in one of my rigs, and currently have 3 (because they have "grown in size" mechanically), and there is no space to put a needle inside... And that's with a huge mobo and a HAF 932 case (i.e. HUGE).

danaj 2015-09-18 03:10

IIRC there was some discussion about "as little as $25,000 to start", so we're not talking about little home computers; more like one of the dense server units with 2-8 K80s (e.g. a Penguin Relion). Better hope there isn't much inter-GPU communication going on (the bandwidth limitations would be harsh). I'm wondering why anyone with one of these would bother running the code, given how little info they get about what it's doing.

I'm mystified by the requirement of 5 GPUs per node. Clearly it runs on multiple GPUs, but why hard code it? I don't want to jump on the rag-bandwagon, but it's odd.

LaurV 2015-09-18 03:16

[QUOTE=ewmayer;410707]
Karatsuba is one of the standard waypoints on the learning curve leading from grammar-school bignum multiply to O(n lg n) transform-based methods (Schönhage-Strassen in its various guises). There's a reason no one who desires to be taken seriously uses it for really big moduli.[/QUOTE]
Even if he uses a heavily optimized Toom-Cook multiplication, as someone pointed out earlier in the thread, NV's FFT libs can only handle ~200M-digit numbers, so you still need to split the 1B digits into 5 parts, FFT each part, then "Cook" the results; he would still have polynomials of degree 4 (with 5 coefficients) to multiply, where each coefficient is a 200M-digit number. It cannot be faster than 5 GPUs running 5 copies of CudaLucas, even if he could transfer intermediate data instantaneously between the GPUs.
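A rough sketch of the splitting described above, with hypothetical parameters (a small 16-bit limb size stands in for the ~200M-digit pieces, and schoolbook polynomial multiplication is used for clarity where Toom-5 would interpolate through 9 point evaluations):

```python
# Split a number too large for the FFT library into a degree-4 polynomial in
# B = 2**k with five coefficients; squaring it yields a degree-8 polynomial
# (9 result coefficients), each built from big coefficient products that would
# themselves be FFT multiplies at the scales discussed in this thread.

def split(x: int, k: int, parts: int = 5):
    """Write x = sum(c[i] * 2**(k*i)) with k-bit coefficients c[i]."""
    mask = (1 << k) - 1
    return [(x >> (k * i)) & mask for i in range(parts)]

def poly_square(c, k: int) -> int:
    """Schoolbook polynomial square, then evaluate at B = 2**k."""
    d = [0] * (2 * len(c) - 1)      # degree-8 result for 5 coefficients
    for i, ci in enumerate(c):
        for j, cj in enumerate(c):
            d[i + j] += ci * cj     # 25 coefficient multiplies here;
                                    # Toom-5 would get away with 9 evaluations
    return sum(di << (k * i) for i, di in enumerate(d))

x = 12345678901234567890
assert poly_square(split(x, 16), 16) == x * x
```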

Gordon 2015-09-18 17:56

[QUOTE=danaj;410728]IIRC there was some discussion about "as little as $25,000 to start", so we're not talking about little home computers; more like one of the dense server units with 2-8 K80s (e.g. a Penguin Relion). Better hope there isn't much inter-GPU communication going on (the bandwidth limitations would be harsh). I'm wondering why anyone with one of these would bother running the code, given how little info they get about what it's doing.

I'm mystified by the requirement of 5 GPUs per node. Clearly it runs on multiple GPUs, but why hard code it? I don't want to jump on the rag-bandwagon, but it's odd.[/QUOTE]

It's hard coded to ensure that the vast majority, if not nearly all, downloaders run MGCT....

chalsall 2015-09-18 18:46

[QUOTE=Gordon;410783]It's hard coded to ensure that the vast majority if not nearly all down-loaders run MGCT....[/QUOTE]

...which, if the claim is correct, is an extremely poor implementation of TF'ing. And yet, interestingly, the empirical evidence provided by Madpoo says that this "talks" to its Command and Control (C&C) constantly via an encrypted channel; and not even to a registered domain.

Those doing the shortest possible TF'ing in "just in time" mode through GPU72 only "talk" to the server a dozen times or so an hour (less than a hundred bytes per "chat"); and they have (I think) three or four reasonably high-end GPUs on the job.

Clearly, several things just don't add up. It will be interesting to see if "CEMPLLA Author" (do we actually know his real name yet?) will ever come back to explain himself. I bet a dollar no.

Madpoo 2015-09-18 19:08

[QUOTE=danaj;410728]I'm wondering why anyone with one of these would bother running the code, given how little info they get about what it's doing.[/QUOTE]

Well here's what he said on the Nvidia forum:
[QUOTE]In fact, I could see some well-off people doing it as an interesting hobby, once they've been made aware of the fact that they can own a "supercomputer" for little more than the price of a set of high-end golf clubs, and can immediately put it to work to make history.[/QUOTE]

So, I guess it would be limited to people with too much money who feel like owning a "supercomputer". I'd think someone with that much disposable income wouldn't be too interested in the EFF prize money, especially considering the odds are still against them.

While I'm thinking about it... I'm not a golfer at all. Are a set of high end clubs really that much? Good gravy!

