mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Hardware (https://www.mersenneforum.org/forumdisplay.php?f=9)
-   -   Parallella / Epiphany (https://www.mersenneforum.org/showthread.php?t=18589)

xilman 2017-03-14 07:23

[QUOTE=paulunderwood;454819]A bit of necro-post...

Did you, Paul and Mark, get anything useful done with these boards?

How is the fpga programmed?

What flavour of Linux are you using?

:bump2:[/QUOTE]My cluster is running ECM on the ARM cores. Nothing exciting.

LaurV 2017-03-14 14:52

[QUOTE=Madpoo;454838]
128-bit FP? Faster FMA? (just tossing out two examples that will probably get people laughing at me).[/QUOTE]
No laughing. 96-bit modular multiplication (or just squaring) in a single tick. I'm thinking about exponentiation and TF. This is not very difficult to implement for somebody who knows VHDL well. We tried once, got our hands dirty and our noses red, and gave up... we need to learn more. We are still dreaming about replicating the design 96 times in the same chip (a single multiplier may not use more than a hundredth of a Xilinx or so), so we could crunch all 96 classes (in a 420-class scheme) at the same time. But it may be a very expensive machine, and not extremely fast... I mean, it may be faster than the current tools, but to make it worthwhile it would need the next step: ASICs, low cost and low power consumption, hundreds of them, like Bitcoin mining. Which would need a big investment... etc.
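(To illustrate where the 96 classes come from: in the usual TF sieve, candidate factors q = 2kp+1 are grouped by k mod 420 = 4·3·5·7, and only the classes where q can be ±1 mod 8 and coprime to 3, 5, 7 survive. A minimal Python sketch, with function names of my own invention, not taken from any GIMPS tool:

```python
def surviving_classes(p, num_classes=420):
    """For exponent p, return the residues k mod num_classes for which
    q = 2*k*p + 1 can possibly be a factor of 2^p - 1."""
    keep = []
    for k in range(num_classes):
        q = 2 * k * p + 1
        if q % 8 not in (1, 7):
            continue              # factors of Mersenne numbers are +/-1 mod 8
        if any(q % s == 0 for s in (3, 5, 7)):
            continue              # sieve out classes divisible by tiny primes
        keep.append(k)
    return keep

print(len(surviving_classes(74207281)))   # 96 classes survive out of 420
```

The count 420 · (2/4) · (2/3) · (4/5) · (6/7) = 96 is independent of the exponent, as long as p is not divisible by 3, 5, or 7.)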

GP2 2017-03-14 18:02

[QUOTE=LaurV;454859]We are still dreaming about duplicating the design 96 times in the same chip (it may not use more than a hundredth of a xilinx or so, for a multiplier) so we could crunch all 96 classes (on a 420-based) at the same time.[/QUOTE]

I think TF is several years ahead of the LL wavefront, and GPUs are plentiful, so maybe other areas could benefit more. If it were somehow possible to use FPGAs to greatly speed up LL testing, the $150,000 prize for a 100M-decimal-digit prime could be an incentive.

We have had an unusually dense streak of prime finds in recent years. Historically there were some big gaps, including the ones between M127 and M521, or between M216091 and M756839. So if we're unlucky, it's certainly possible that the next exponent that yields a prime could be more than three and a half times larger than the current 74.2M, which would put it above 250M, and, as with the Sierpiński problem, we might wait nearly a decade between discoveries.

LaurV 2017-03-17 10:01

There is no profit in hunting for the EFF money; you should know that after so many years of hunting for primes :razz:. What we do, we do for fun, for socializing, for fame, etc., but from the expense point of view we are always in the negatives: you will spend more on hardware and electricity, unless you are bloody lucky. In fact, you contradict yourself in the second paragraph of your post by [U]correctly[/U] pointing out how long it may take to find that prime. Think about the effort and cost of building specialized hardware: even if you run it almost for free and you win the EFF's money, you will still be in the negatives when you find that prime...

OTOH, I was talking about TF because it is easy to implement: integers only, and shift-and-add registers in an FPGA are elementary. (That is not a single-tick solution, though; you would need a cleverer multiplication to be fast enough, otherwise you waste about 100 ticks for every squaring. But that may be OK too: since you need less hardware, you may be able to fit more multipliers and go for the full 960 classes in a 4620-class scheme.) Etc.

Implementing FFT multiplication for LL or P-1 in VHDL would be totally OVER my head... (and most of the people here).
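(The shift-and-add TF loop described above is just binary exponentiation: to test a candidate q = 2kp+1, compute 2^p mod q, one squaring per exponent bit, and check for 1. A toy Python sketch of mine, not code from any actual TF tool:

```python
def is_factor(p, k):
    """Check whether q = 2*k*p + 1 divides the Mersenne number 2^p - 1,
    via square-and-multiply -- the loop an FPGA design would unroll."""
    q = 2 * k * p + 1
    r = 1
    for bit in bin(p)[2:]:    # left-to-right binary exponentiation
        r = (r * r) % q       # one modular squaring per exponent bit
        if bit == '1':
            r = (r + r) % q   # multiplying by the base 2 is just a shift
    return r == 1             # q | 2^p - 1 iff 2^p == 1 (mod q)

print(is_factor(11, 1))   # True: q = 23 divides 2^11 - 1 = 2047 = 23 * 89
```

Each pass through the loop is the roughly-100-tick squaring mentioned above; a single-tick design would instead pipeline the whole 96-bit modular square.)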

retina 2017-03-17 15:54

[QUOTE=LaurV;455022]Implementing FFT multiplication for LL or P-1 in VHDL would be totally OVER my head... (and most of the people here).[/QUOTE]I think NTT would be the thing to target for FPGAs. Forget about FFT, too messy and untidy. The only reason we use it now is because CPUs are good at FFT, and not so good at NTT.
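(For concreteness, a minimal recursive NTT-based convolution over the prime 998244353 — an NTT-friendly prime chosen for illustration, nothing FPGA-specific. Every butterfly is integer-only, which is the point being made here:

```python
P = 998244353   # 119 * 2^23 + 1, a prime with 2^23 | P - 1
G = 3           # a primitive root mod P

def ntt(a, invert=False):
    """Recursive Cooley-Tukey number-theoretic transform; len(a) must be a
    power of two."""
    n = len(a)
    if n == 1:
        return a[:]
    w = pow(G, (P - 1) // n, P)          # principal n-th root of unity mod P
    if invert:
        w = pow(w, P - 2, P)             # its inverse, for the inverse NTT
    even = ntt(a[0::2], invert)
    odd = ntt(a[1::2], invert)
    out = [0] * n
    wk = 1
    for k in range(n // 2):
        t = wk * odd[k] % P              # butterfly: all integer arithmetic
        out[k] = (even[k] + t) % P
        out[k + n // 2] = (even[k] - t) % P
        wk = wk * w % P
    return out

def convolve(a, b):
    """Exact integer polynomial product via NTT, pointwise multiply, inverse."""
    n = 1
    while n < len(a) + len(b) - 1:
        n *= 2
    fa = ntt(a + [0] * (n - len(a)))
    fb = ntt(b + [0] * (n - len(b)))
    prod = ntt([x * y % P for x, y in zip(fa, fb)], invert=True)
    inv_n = pow(n, P - 2, P)             # divide by n to finish the inverse
    return [x * inv_n % P for x in prod[:len(a) + len(b) - 1]]

print(convolve([1, 2, 3], [4, 5]))   # [4, 13, 22, 15]
```

No rounding, no error analysis — the trade-off versus floating-point FFT is that modular multiplications are expensive on CPUs but cheap to pipeline in hardware.)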

GP2 2017-03-17 17:36

[QUOTE=retina;455040]I think NTT would be the thing to target for FPGAs. Forget about FFT, too messy and untidy. The only reason we use it now is because CPUs are good at FFT, and not so good at NTT.[/QUOTE]

Turns out there's actually a thing called a [URL="http://www.sciencedirect.com/science/article/pii/S1051200413002388"]Mersenne number transform[/URL], a specialization of NTT. Is that relevant to our interests by any chance?

GP2 2017-03-23 07:59

I was under the impression that you have to learn and use VHDL to use FPGAs. But apparently FPGAs can be programmed with the same OpenCL as for GPUs. I wonder if clLucas could be adapted for this?

Xilinx supports OpenCL in their [URL="http://www.xilinx.com/products/design-tools/software-zone/sdaccel.html"]SDAccel Development Environment[/URL] (the FPGA instances available in preview on Amazon cloud are Xilinx UltraScale Plus). They give a number of [URL="http://www.xilinx.com/products/design-tools/software-zone/sdaccel.html#examples"]examples of applications[/URL] (Bitcoin mining, etc). There are also some tutorial videos.

Edit: here's [URL="https://deixismagazine.org/2017/03/chipping-away/"]an article[/URL] that suggests FPGAs could become more widely used in high-performance computing applications.

jasonp 2017-03-23 16:04

Mersenne number transforms basically replace floating point numbers in complex FFTs with integers modulo a small Mersenne number. The most convenient such Mersenne number on a 64-bit system is 2^61-1. They've been known for a long time, probably since the early 1980s. If the figure of merit for how fast an integer NTT runs is the number of integer multiplications that you have to do for very big transform lengths, these kinds of transforms are probably the best tools available.
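(The reduction mod 2^61 - 1 is what makes this modulus convenient: since 2^61 ≡ 1, the high bits of a product just fold back down — a shift and an add, no division. A sketch in Python, with names of my own choosing:

```python
M61 = (1 << 61) - 1   # the Mersenne prime 2^61 - 1

def mod_m61(x):
    """Reduce x (up to ~122 bits) modulo 2^61 - 1 by folding high bits down
    twice, then one conditional subtract."""
    x = (x & M61) + (x >> 61)   # 2^61 == 1 (mod M61), so fold
    x = (x & M61) + (x >> 61)   # second fold handles the carry
    if x >= M61:
        x -= M61
    return x

def mulmod(a, b):
    """Multiply modulo 2^61 - 1; a, b assumed already reduced."""
    return mod_m61(a * b)
```

In hardware or on a 64-bit CPU this is a handful of cheap operations, which is why this modulus keeps reappearing in integer-transform designs.)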

Ernst and I have a lot of practice building them, and they are a playground for optimization; but floating point FFTs are still faster on general purpose hardware.

Edit: a thread from [url="https://www.mersenneforum.org/showthread.php?t=11255"]the last time this came up[/url]

GP2 2017-03-23 18:52

[QUOTE=jasonp;455351]Ernst and I have a lot of practice building them, and they are a playground for optimization; but floating point FFTs are still faster on general purpose hardware.[/QUOTE]

Sure, but what about FPGAs? Would integer transforms make more sense there? By "general purpose" I presume you mean the x86-64 architecture. Do GPU programs like cudaLucas and clLucas use integer or floating point?

jasonp 2017-03-23 23:41

They would be floating point only.


All times are UTC.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.