mersenneforum.org > Great Internet Mersenne Prime Search > Hardware
2021-04-28, 15:05   #56
paulunderwood (Sep 2002, Database er0rr)

I played with a Xilinx Spartan-6 dev board (10 MHz?). If anything, it teaches you to code well. Unless you design some FFT circuits, the supplied multipliers are almost useless for huge-integer calculations.

Last fiddled with by paulunderwood on 2021-04-28 at 15:09
2021-04-28, 16:07   #57
a1call ("Rashid Naimi", Oct 2015, Remote to Here/There)

Precisely so.
I have used a CPU-less FPGA in a past job to design an NDT (non-destructive testing) module.
I have a way-on-the-backburner dream project to create a module that would perform arbitrary-precision calculations. It would entail designing external memory and programming everything from scratch. It is a Herculean task.

ETA: OTOH, you could design a very specialized/efficient system without a PC's OS overhead.

Last fiddled with by a1call on 2021-04-28 at 16:19
2021-04-28, 17:38   #58
xilman (Bamboozled!, May 2003, Down not across)

Quote:
Originally Posted by retina View Post
Digging ditches is also a career in itself. But amateurs can also partake in the fun if they wish.

If you want a fantastic ditch dug quickly, hire a professional. If you want to spend no money and lots of time farting about with a shovel to create a ditch that does its job only under a narrow range of special conditions then you can do it yourself.
2021-04-28, 17:40   #59
drkirkby ("David Kirkby", Jan 2021, Althorne, Essex, UK)

Quote:
Originally Posted by kriesel View Post
[...]

I know they are around, but the cheap boards don't have enough power to do anything very useful. They don't have enough RAM, for a start, to hold an FFT.

However, I could see this making a student project, with perhaps the possibility of a university getting a decent FPGA development board free from Xilinx, Intel etc.
2021-04-28, 17:40   #60
xilman (Bamboozled!, May 2003, Down not across)

Quote:
Originally Posted by retina View Post
Digging ditches is also a career in itself. But amateurs can also partake in the fun if they wish.

If you want a fantastic ditch dug quickly, hire a professional. If you want to spend no money and lots of time farting about with a shovel to create a ditch that does its job only under a narrow range of special conditions then you can do it yourself.
And you learn how to dig ditches along the way.

Don't underestimate the value of self-education.
2021-04-28, 17:46   #61
kriesel ("TF79LL86GIMPS96gpu17", Mar 2017, US midwest)

Quote:
Originally Posted by drkirkby View Post
I know they are around, but the cheap boards don't have enough power to do anything very useful. They don't have enough RAM for a start to hold an FFT.
Try crawling before sprinting or flying: one TF kernel. Small RAM requirements, and much less Herculean than attempting an FFT-based PRP or P-1 implementation.
Or stop advocating that someone else should do it.
Getting an FPGA to do better than a mass-produced GPU driven by finely honed GIMPS software seems a tall order.

Last fiddled with by kriesel on 2021-04-28 at 17:49
2021-04-29, 09:28   #62
mackerel (Feb 2016, UK)

Is trying to use FPGAs to implement FFTs the best way forward? By that I mean, is there some other algorithm that might work more efficiently on an FPGA by taking better advantage of what the hardware can do?

I wish I had the understanding, but from observation it is often said on this forum that GPUs are currently the best way. Yet for relatively smaller prime finding on PrimeGrid, CPUs mostly running LLR are the choice. Attempts at GPU usage for smaller prime finding have not resulted in good performance relative to CPUs. Is it a scaling problem? The exception is Genefer, which works great on GPUs even at relatively smaller sizes, although it is limited to generalised Fermat numbers. Whatever the method behind it, I wonder if it could be expanded to other prime types. So, swinging it back to this thread: would something other than implementing FFTs work better on an FPGA?

To answer this I think you'd need decent skills in mathematics, software and hardware. I don't fit in any of those categories.
2021-04-29, 09:53   #63
paulunderwood (Sep 2002, Database er0rr)

FPGAs usually have several, sometimes scores of, (say) 48x48-bit multipliers, which can all be harnessed. I am not sure about the existence of floating-point units.

Another thing to take into account is the clock frequency of the FPGA. The higher the clock, the more they cost.

I'll let someone more knowledgeable chip in...

PS: here is a not-too-technical paper with many references that might be illuminating.

Last fiddled with by paulunderwood on 2021-04-29 at 10:30
2021-04-29, 10:05   #64
preda ("Mihai Preda", Apr 2015)

Quote:
Originally Posted by mackerel View Post
Is trying to use FPGAs to implement FFTs the best way forward? By that I mean, is there some other algorithm that might work more efficiently on FPGA by taking better advantage of what the hardware can do?

I wish I had the understanding, but through observation it is often said on this forum that GPUs are the best way currently. Yet for relatively smaller prime finding on Primegrid, CPUs mostly running LLR are the choice. Attempts at GPU usage for smaller prime finding have not resulted in good performance relative to CPU. It is a scaling problem? The exception is genefer, which works great on GPUs even with relatively smaller sizes, although it is limited to generalised fermat. Whatever the method behind it, I wonder if it could be expanded to other prime types. So swinging it back to this thread, would something other than implementing FFTs work better on FPGA?

To answer this I think you'd need to have decent skills in mathematics, software and hardware. I don't fit in any of those categories.
TF (trial factoring) does not use FFTs.

For primality testing Mersenne numbers, there is LL and PRP; the two are very similar from an implementation perspective. They both require squaring very large numbers, and that is done through FFTs. There's no avoiding FFTs for the bit-sizes we consider (let's say larger than 100M bits).

OTOH there is a wide range of choice about how to implement the FFT:
1. using complex numbers: based on some floating point format (e.g. DP), or on some fixed-point format.
2. using FGT (Fast Galois Transform), which is similar to complex arithmetic but using integers modulo a small prime.
3. using NTT (Number Theoretic Transform), which again involves modular arithmetic modulo a small prime.

(there is extensive discussion of these techniques in other threads).
And each of these choices has sub-variants and various tradeoffs.

What they have in common is that they all require a lot of multiplications. These multiplications, at some fundamental level, always come down to integer multipliers (even for floating point).

There is also a different technique, "Nussbaumer convolution", which does not require multiplication (using additions and shifts instead). It does not seem competitive on GPUs, but the situation may be different on FPGAs, I have no idea.
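To make option 3 concrete, here is a toy all-integer squaring via an NTT in Python. The prime 998244353 (primitive root 3) and the 8-bit limb size are assumptions chosen so the convolution coefficients stay below the modulus; a real implementation, whether GPU or FPGA, would pick word sizes and primes (or a CRT combination of several primes) to suit the hardware:

```python
# Toy NTT-based squaring: an FFT over Z/pZ using only integer
# arithmetic, i.e. option 3 (Number Theoretic Transform) above.
MOD = 998244353          # 119 * 2^23 + 1, an NTT-friendly prime
ROOT = 3                 # primitive root mod MOD

def ntt(a, invert=False):
    """In-place iterative radix-2 NTT over Z/MOD; len(a) a power of 2."""
    n = len(a)
    j = 0                                # bit-reversal permutation
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2                           # butterfly passes
    while length <= n:
        w = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w = pow(w, MOD - 2, MOD)     # w^-1 via Fermat's little theorem
        for start in range(0, n, length):
            wn = 1
            for k in range(start, start + length // 2):
                u = a[k]
                v = a[k + length // 2] * wn % MOD
                a[k] = (u + v) % MOD
                a[k + length // 2] = (u - v) % MOD
                wn = wn * w % MOD
        length <<= 1
    if invert:                           # scale by n^-1
        n_inv = pow(n, MOD - 2, MOD)
        for i in range(n):
            a[i] = a[i] * n_inv % MOD

def square(x):
    """Square a nonnegative integer by NTT convolution of 8-bit limbs.
    With 8-bit limbs, each convolution coefficient is < n * 255^2,
    which stays below MOD for the sizes used here, so the result is exact."""
    if x == 0:
        return 0
    limbs, t = [], x
    while t:
        limbs.append(t % 256)
        t //= 256
    n = 1
    while n < 2 * len(limbs):
        n *= 2
    a = limbs + [0] * (n - len(limbs))
    ntt(a)
    a = [v * v % MOD for v in a]         # pointwise squaring
    ntt(a, invert=True)
    result, shift = 0, 0                 # carry propagation
    for v in a:
        result += v << shift
        shift += 8
    return result
```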

Last fiddled with by preda on 2021-04-29 at 10:13
2021-04-29, 15:48   #65
mackerel (Feb 2016, UK)

Quote:
Originally Posted by preda View Post
OTOH there is a wide range of choice about how to implement the FFT:
1. using complex numbers: based on some floating point format (e.g. DP), or on some fixed-point format.
2. using FGT (Fast Galois Transform), which is similar to complex arithmetic but using integers modulo a small prime.
3. using NTT (Number Theoretic Transform), which again involves modular arithmetic modulo a small prime.
That's one of many gaps in my knowledge filled in. I had taken FFTs to be #1 above only (throw more FP64 at it), and thought #3 was something else entirely. I think #3 might be the one used in Genefer, but I'm not 100% sure on that.

One other thought I had, and think I've mentioned in the past: if we can do this, would it be beneficial to use number formats different from those currently available? For example, would it be beneficial overall (more efficient) to work on bigger data sizes rather than more smaller ones, factoring in the hardware implementation cost? So unless there are fixed-functionality blocks in FPGAs that have to be used, could this be an option?
2021-04-29, 16:55   #66
bsquared ("Ben", Feb 2007)

Quote:
Originally Posted by mackerel View Post
That's one of many gaps in my knowledge filled out. I had taken FFTs to be #1 in the above only (throw more FP64 at it), and thought #3 was something else entirely. I think #3 might be the one used in genefer, but I'm not 100% on that.

One other thought I had, and think I've mentioned in the past, is if we can do this, would it be beneficial to use number formats different than those currently available. For example, would it be beneficial overall (more efficient) if you could work on bigger data sizes vs more smaller ones, factoring in the hardware implementation cost. So unless there are fixed functionality blocks in FPGAs that have to be used, could this be an option?
I think you are asking, could you implement, say, FP128 and use that instead of FP64 as a building block for the FFT?

It's complicated. Fundamentally, you could implement anything you like, limited only by the logic-cell capacity of the chip. Clock frequency, however, will be limited to the maximum possible fabric clock, which is usually far slower than the hard DSPs present on the chip. Also, the bigger the multiplier design gets, the harder it is to interconnect all of the logic, and the slower the clock will run.

The way to get around the relatively slow reconfigurable logic is to use any hard DSPs the chip may have (most chips have some; expensive chips have lots), which run a lot faster. But they are limited in size; modern ones are, I think, around 25x18 bits. A few of these can be cascaded to get to around 36x36 or maybe 48x48 without compromising clock frequency, but beyond that the clock will slow way down again (crossing slice boundaries).

Add to all of that:
* integrating the small multipliers discussed above into a working IBDWT algorithm
* interfacing with memory, probably external, probably hierarchical (using on-chip block RAMs)
* considering several parallel designs or one huge one
* controllers to interface with the outside world (MicroBlaze makes this easier than it used to be)
* probably lots more detail

Anyway, it would be a massive design effort, and even with the biggest, most expensive parts available, it is questionable whether it could beat a nice GPU.

Even figuring out the answer to that question (could it beat a GPU?) is maybe a master's-thesis-scale project, since it likely involves prototyping many design variations to find their maximum clock rates and sizes, essentially filling in the boxes of a large design-space matrix.
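The DSP-cascading idea above can be sketched in software: one wide multiply decomposed into four smaller partial products recombined with shifts and adds, which is roughly what the cascade adders in a DSP column do in hardware. The symmetric 24x24 split here is illustrative only; as noted, real DSP slices are often asymmetric (e.g. 25x18), which changes the tiling:

```python
# Building a 48x48-bit product from four 24x24-bit partial products,
# the way an FPGA toolchain chains hard DSP multipliers.

def mul48(a, b, w=24):
    """Multiply two 48-bit values using four w x w partial products."""
    mask = (1 << w) - 1
    a_lo, a_hi = a & mask, a >> w
    b_lo, b_hi = b & mask, b >> w
    # four small multiplies, each fitting one (w x w) hard multiplier
    p0 = a_lo * b_lo
    p1 = a_lo * b_hi
    p2 = a_hi * b_lo
    p3 = a_hi * b_hi
    # shift-and-add recombination (the cascade adders in hardware)
    return p0 + ((p1 + p2) << w) + (p3 << (2 * w))

x, y = 0xBEEF_CAFE_1234, 0xDEAD_F00D_5678
assert mul48(x, y) == x * y
```

The same tiling recurses: 96x96 from four 48x48 blocks, and so on, which is why interconnect and carry-chain routing, not the multipliers themselves, become the limiting factor as the design grows.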

Last fiddled with by bsquared on 2021-04-29 at 16:57
