#56
Sep 2002
Database er0rr
5×937 Posts

I played with a Xilinx Spartan-6 dev board (10 MHz?). If anything, it teaches you to code well. Unless you design some FFT circuits around them, the given multipliers are almost useless for huge-integer calculations.
Last fiddled with by paulunderwood on 2021-04-28 at 15:09

#57
"Rashid Naimi"
Oct 2015
Remote to Here/There
95E₁₆ Posts

Precisely so.

I have used a CPU-less FPGA in a past job to design an NDT module. I have a way-on-the-backburner dream project to create a module that would perform arbitrary-precision calculations. It would entail designing external memory and programming everything from scratch to do the arbitrary-precision arithmetic. It is a Herculean task with no ETA. OTOH, you could design a very specialized and efficient system without a PC's OS overhead.

Last fiddled with by a1call on 2021-04-28 at 16:19

#58
Bamboozled!
"๐บ๐๐ท๐ท๐ญ"
May 2003
Down not across
2·17·347 Posts

Quote:

#59
"David Kirkby"
Jan 2021
Althorne, Essex, UK
2×229 Posts

Quote:
However, I could see this making a student project, perhaps with the possibility of a university getting a decent FPGA development board for free from Xilinx, Intel, etc.

#60
Bamboozled!
"๐บ๐๐ท๐ท๐ญ"
May 2003
Down not across
2·17·347 Posts

Quote:
Don't underestimate the value of self-education.

#61
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴×3×163 Posts

Quote:
Or stop advocating that someone else should do it. Getting an FPGA to do better than a mass-produced GPU driven by finely honed GIMPS software seems a tall order.

Last fiddled with by kriesel on 2021-04-28 at 17:49

#62
Feb 2016
UK
2⁶·7 Posts

Is trying to use FPGAs to implement FFTs the best way forward? By that I mean, is there some other algorithm that might work more efficiently on an FPGA by taking better advantage of what the hardware can do?

I wish I had the understanding, but from observation it is often said on this forum that GPUs are currently the best way. Yet for relatively smaller prime finding on PrimeGrid, CPUs mostly running LLR are the choice. Attempts at GPU usage for smaller prime finding have not resulted in good performance relative to CPUs. Is it a scaling problem? The exception is Genefer, which works great on GPUs even at relatively smaller sizes, although it is limited to generalised Fermat numbers. Whatever the method behind it, I wonder if it could be expanded to other prime types.

So, swinging it back to this thread: would something other than implementing FFTs work better on an FPGA? To answer this I think you'd need decent skills in mathematics, software and hardware. I don't fit in any of those categories.

#63
Sep 2002
Database er0rr
5·937 Posts

FPGAs usually have several, sometimes scores, of (say) 48x48-bit multipliers, which can all be harnessed. I am not sure about the existence of floating-point units.

Another thing to take into account is the clock frequency of the FPGA: the higher the clock, the more they cost. I'll let someone more knowledgeable chip in...

PS: here is a not-too-technical paper with many references that might be illuminating.

Last fiddled with by paulunderwood on 2021-04-29 at 10:30
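To give a feel for why "scores of multipliers" matters: a big-integer product breaks into many small limb-by-limb products, each of which fits a single hardware multiplier and all of which are independent. Here is a minimal schoolbook sketch in Python; the 48-bit limb width is an assumption for illustration, not any particular chip's DSP size.

```python
# Schoolbook big-integer multiplication split into small limb products --
# each partial product is the kind of unit one hard multiplier block could compute.
LIMB_BITS = 48                      # assumed width of one hardware multiplier
MASK = (1 << LIMB_BITS) - 1

def to_limbs(x):
    """Split a non-negative integer into base-2^LIMB_BITS limbs, little-endian."""
    limbs = []
    while x:
        limbs.append(x & MASK)
        x >>= LIMB_BITS
    return limbs or [0]

def mul_limbs(a, b):
    """Multiply via independent limb products; each ai*bj fits one small multiplier."""
    prod = 0
    for i, ai in enumerate(to_limbs(a)):
        for j, bj in enumerate(to_limbs(b)):
            prod += (ai * bj) << (LIMB_BITS * (i + j))  # partial product shifted into place
    return prod

x = 123456789 ** 20                 # a few hundred bits
assert mul_limbs(x, x) == x * x
```

The catch is that the number of these small products grows quadratically with the operand size, which is exactly why the FFT-based approaches discussed in this thread are unavoidable at 100M-bit sizes.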

#64
"Mihai Preda"
Apr 2015
1452₁₀ Posts

Quote:
For primality testing Mersenne numbers there is LL and PRP; the two are very similar from an implementation perspective. They both require squaring very large numbers, and that is done through FFTs. There is no avoiding FFTs for the bit-sizes we consider (let's say larger than 100M bits).

OTOH there is a wide range of choice about how to implement the FFT:

1. using complex numbers: based on some floating-point format (e.g. DP), or on some fixed-point format.
2. using FGT (Fast Galois Transform), which is similar to complex arithmetic but uses integers modulo a small prime.
3. using NTT (Number Theoretic Transform), which again involves modular arithmetic modulo a small prime.

(There is extensive discussion of these techniques in other threads.) Each of these choices has sub-variants and various tradeoffs. What they have in common is that they all require a lot of multiplications. These multiplications, at some fundamental level, always come down to integer multipliers (even for floating point).

There is also a different technique, "Nussbaumer convolution", which does not require multiplication (using additions and shifts instead). It does not seem competitive on GPUs, but the situation may be different on FPGAs; I have no idea.

Last fiddled with by preda on 2021-04-29 at 10:13
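Not FPGA code, but to make option 3 above concrete, here is a rough NTT sketch in Python: an FFT carried out over the integers modulo a small prime, so every butterfly is exact integer arithmetic. The prime 998244353 and the base-10 toy digits are illustrative assumptions only; real implementations choose primes and word sizes to match the hardware, and the actual GIMPS programs do things rather differently.

```python
# Number Theoretic Transform (NTT): an FFT over Z/pZ, so all arithmetic is exact.
P = 998244353                      # 119*2^23 + 1, a common NTT-friendly prime
G = 3                              # primitive root mod P

def ntt(a, invert=False):
    """In-place iterative radix-2 NTT; len(a) must be a power of two."""
    n = len(a)
    j = 0
    for i in range(1, n):          # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w_len = pow(G, (P - 1) // length, P)
        if invert:
            w_len = pow(w_len, P - 2, P)
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + length // 2):
                u, v = a[k], a[k + length // 2] * w % P
                a[k] = (u + v) % P
                a[k + length // 2] = (u - v) % P
                w = w * w_len % P
        length <<= 1
    if invert:
        n_inv = pow(n, P - 2, P)
        for i in range(n):
            a[i] = a[i] * n_inv % P
    return a

def square_via_ntt(digits):
    """Square a little-endian digit vector: pad, transform, square pointwise, invert."""
    n = 1
    while n < 2 * len(digits):
        n <<= 1
    a = digits + [0] * (n - len(digits))
    ntt(a)
    a = [x * x % P for x in a]
    ntt(a, invert=True)
    return a                        # carry propagation into fixed-size digits still needed

digits = [6, 5, 4, 3, 2, 1]         # 123456 in base 10, little-endian
conv = square_via_ntt(digits)
assert sum(d * 10**i for i, d in enumerate(conv)) == 123456 ** 2
```

Every operation inside the transform is an integer multiply or add modulo P, which is why the NTT (and FGT) route maps naturally onto hardware integer multipliers, whereas the complex floating-point route leans on FP units instead.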

#65
Feb 2016
UK
2⁶·7 Posts

Quote:
One other thought I had, and I think I've mentioned in the past: if we can do this, would it be beneficial to use number formats different from those currently available? For example, would it be more efficient overall to work on bigger data sizes rather than more, smaller ones, factoring in the hardware implementation cost? So unless there are fixed-functionality blocks in FPGAs that have to be used, could this be an option?

#66
"Ben"
Feb 2007
3×5×251 Posts

Quote:
It's complicated. Fundamentally, you could implement anything you like, limited only by the logic-cell capacity of the chip. Clock frequency, however, will be limited to the maximum possible fabric clock, which is usually way slower than the hard DSPs present on the chip. Also, the bigger the multiplier design gets, the harder it is to interconnect all of the logic, and the slower the clock will run.

The way to get around using relatively slow reconfigurable logic is to use any hard DSPs the chip may have (most chips have some, expensive chips have lots), which run a lot faster. But they are limited in size; modern ones I think are around 25x18 bits. A few of these can be cascaded to get to around 36x36 or maybe 48x48 without compromising clock frequency, but beyond that the clock will slow way down again (crossing slice boundaries).

Add to all of that:
* integrating the small multipliers discussed above into a working IBDWT algorithm
* interfacing with memory, probably external, probably hierarchical (using on-chip block RAMs)
* considering several parallel designs or one huge one
* controllers to interface with the outside world (MicroBlaze makes this easier than it used to be)
* probably lots more detail

Anyway, it would be a massive design effort, and even with the biggest, most expensive parts available it is questionable whether it could beat a nice GPU. Even figuring out the answer to that question (could it beat a GPU?) is maybe a master's-thesis-scale project, since it likely involves prototyping designs to figure out the maximum clock rates and sizes of many design variations, to essentially fill in the boxes of a large design-space matrix.

Last fiddled with by bsquared on 2021-04-29 at 16:57
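To illustrate the cascading idea above in software terms: a wider product can be composed from a handful of narrow partial products, which is roughly what chaining hard multiplier blocks does. A minimal sketch in Python; the symmetric 24-bit halves are an assumed simplification (real DSP blocks are asymmetric, e.g. around 25x18 as noted above, and cascade through dedicated adder/carry paths rather than full-width additions).

```python
import random

# Compose a 48x48 multiply from four 24x24 partial products -- the software
# analogue of cascading small hard multipliers into a wider one.
HALF = 24                          # assumed width of one hard multiplier input
HMASK = (1 << HALF) - 1

def mul48(a, b):
    """48x48 product from four 24x24 partial products plus shifts and adds."""
    assert 0 <= a < (1 << 2 * HALF) and 0 <= b < (1 << 2 * HALF)
    a_lo, a_hi = a & HMASK, a >> HALF
    b_lo, b_hi = b & HMASK, b >> HALF
    ll = a_lo * b_lo               # each of these four fits one small multiplier
    lh = a_lo * b_hi
    hl = a_hi * b_lo
    hh = a_hi * b_hi
    # shift the partial products into place and accumulate (the "cascade")
    return ll + ((lh + hl) << HALF) + (hh << 2 * HALF)

for _ in range(1000):
    a, b = random.getrandbits(48), random.getrandbits(48)
    assert mul48(a, b) == a * b
```

In hardware the four multiplies are the cheap part; routing the shifted partial products and carries between blocks is what eats clock speed once the cascade spills across slice boundaries.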
Similar Threads

| Thread | Thread Starter | Forum | Replies | Last Post |
| Intel Xeon E5 with Arria 10 FPGA on-package | BotXXX | Hardware | 1 | 2016-11-18 00:10 |
| FPGA based NFS sieving | wwf | Hardware | 5 | 2013-05-12 11:57 |
| Sugg. for work distr. based on Syd's database | hhh | Aliquot Sequences | 2 | 2009-04-12 02:26 |
| ECM/FPGA Implementation | rdotson | Hardware | 12 | 2006-03-26 22:58 |
| Number-theoretic FPGA function implementation suggestions? | rdotson | Hardware | 18 | 2005-09-25 13:04 |