#23 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
17×487 Posts |
I have no expertise in hardware design, so feel free to laugh at my off-the-wall idea.
Can you create, say, a 1024 x 1024 pipelined integer multiplier that produces a result in 1024 clocks? If so, and if we can manage to feed the FPGA with data fast enough, then you could do 1K 1024x1024 multiplies in 2047 clocks. I imagine a bigint package could make use of that. |
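The 2047-clock figure above follows from the usual pipeline timing rule: the first result appears after the full latency, and each further result appears one clock later. A minimal sketch of that arithmetic, assuming the multiplier accepts one new operand pair per clock (the function names are illustrative, not from any FPGA toolchain):

```python
def pipelined_clocks(num_ops: int, latency: int) -> int:
    """Clocks to finish num_ops operations on a fully pipelined unit.

    Assumes one new operation is accepted every clock and each
    operation takes `latency` clocks from input to result.
    """
    if num_ops <= 0:
        return 0
    # First result after `latency` clocks, then one more result per clock.
    return latency + (num_ops - 1)


def unpipelined_clocks(num_ops: int, latency: int) -> int:
    """Clocks if each operation must finish before the next one starts."""
    return max(num_ops, 0) * latency


if __name__ == "__main__":
    n, lat = 1024, 1024
    print("pipelined:  ", pipelined_clocks(n, lat))
    print("unpipelined:", unpipelined_clocks(n, lat))
```

For 1024 multiplies at 1024 clocks of latency this gives 1024 + 1023 = 2047 clocks, against 1024 × 1024 clocks if the unit were not pipelined.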
|
|
|
|
|
#24 | |
|
Undefined
"The unspeakable one"
Jun 2006
My evil lair
6,793 Posts |
Quote:
Last fiddled with by retina on 2018-02-16 at 10:02 |
|
|
|
|
|
|
#25 |
|
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
3·23·89 Posts |
How many clock cycles is a 1024 x 1024 multiplication on a CPU? Quite a few.
I would hope that an FPGA would be able to beat a CPU in throughput even adjusting for the difference in clock speed. One area where an FPGA could help on this forum is cases where programs run into a 64-bit wall, as >64-bit arithmetic is so much slower. I would love to see a speed comparison between 64 x 64 mod 64 and 65 x 65 mod 65 on an FPGA. On a CPU there is a huge difference, which shouldn't exist on an FPGA, I think. |
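To put a rough number on "quite a few", here is a sketch of plain schoolbook multiplication on 64-bit limbs that counts the 64 x 64 hardware multiplies it needs. It assumes the simplest algorithm only; real bigint libraries use faster methods and hand-tuned assembly, so treat the count as an upper-end illustration, not a measurement. All names are hypothetical.

```python
LIMB_BITS = 64
MASK = (1 << LIMB_BITS) - 1


def to_limbs(x: int, n: int) -> list:
    """Split x into n little-endian 64-bit limbs."""
    return [(x >> (LIMB_BITS * i)) & MASK for i in range(n)]


def from_limbs(limbs: list) -> int:
    """Reassemble an integer from little-endian 64-bit limbs."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def schoolbook_mul(a: list, b: list):
    """Multiply two limb arrays; return (product limbs, limb-product count)."""
    out = [0] * (len(a) + len(b))
    count = 0
    for i, ai in enumerate(a):
        carry = 0
        for j, bj in enumerate(b):
            # One 64 x 64 -> 128-bit product plus two additions per step.
            t = out[i + j] + ai * bj + carry
            out[i + j] = t & MASK
            carry = t >> LIMB_BITS
            count += 1
        out[i + len(b)] = carry
    return out, count


if __name__ == "__main__":
    x = (1 << 1024) - 1
    y = (1 << 1023) + 12345
    limbs, count = schoolbook_mul(to_limbs(x, 16), to_limbs(y, 16))
    assert from_limbs(limbs) == x * y
    print("64 x 64 products for one 1024 x 1024 multiply:", count)
```

A 1024-bit operand is 16 limbs, so the schoolbook method needs 16 × 16 = 256 limb products plus carry handling. The same code also shows the 64-bit wall: a 65-bit operand already needs two limbs, so a single hardware multiply becomes four.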
|
|
|
|
|
#26 | |
|
"David"
Jul 2015
Ohio
1000000101₂ Posts |
Quote:
I’ve got a pipeline that does SHA-3 512-bit hashes on 64-byte inputs in the GB/s range, so similar inputs but all Boolean operations and shifts. |
|
|
|
|
|
|
#27 | |
|
P90 years forever!
Aug 2002
Yeehaw, FL
20127₈ Posts |
Quote:
Does the FPGA have on-chip memory to hold a 100 Mbit input and 200 Mbit output? Anyway, I was just trying to come up with a way an FPGA could excel compared to a CPU. Pipelining was what I could think of. I think trying to emulate what a CPU does (double-precision floats for an FFT) is the losing path. |
|
|
|
|
|
|
#28 | |
|
"Ben"
Feb 2007
3×5×251 Posts |
Quote:
It looks like the largest Ultrascale+ parts have on the order of 100 Mb of fast on-chip block RAM, and even more of "ultraRAM" (not familiar with that). They are pretty darn capable devices. |
|
|
|
|
|
|
#29 | |
|
"David"
Jul 2015
Ohio
11×47 Posts |
Quote:
Most modern FPGAs have very fast serial transceivers on them, as well as PCIe bus capability. For reference, x16 PCIe 3.0 runs at 8 GT/s per lane. This is good for about 16 GByte/s less overhead (full duplex), so for argument's sake we will say 12 GB/second to the FPGA. Inside the fabric those 32 signal pairs (16 lanes) get deserialized and spit out to a 16- or 32-bit-wide buffer per lane. 32 bits gives us more breathing room on the FPGA clock (16 requires a 500 MHz or faster clock; 250 MHz is obtainable on much lower-priced devices), so we can get a nice wide 512 bits a clock in/out. That is 8 clocks total I/O latency plus pipeline latency, and quite likely the actual multipliers could be completely pipelined. You're looking at roughly 50 million 1024x1024 multiplications/second. Now what would you do with those? If you wanted 12 million/second we could likely put it on a comparatively cheap FPGA on x8 PCIe 2.0. |
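The throughput estimate above can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: each multiply ships two 1024-bit operands in (256 bytes) and one 2048-bit product out (256 bytes), the link is full duplex so the input direction is the limit, and the multiplier itself keeps up. The 3 GB/s figure for the cheaper link is an assumption (one quarter of the 12 GB/s above), not a number from the post.

```python
def fabric_bytes_per_s(bits_per_clock: int, clock_hz: float) -> float:
    """Raw data rate of an on-chip bus of the given width and clock."""
    return bits_per_clock * clock_hz / 8


def mults_per_second(usable_bytes_per_s: float, operand_bits: int = 1024) -> float:
    """Multiplies per second if the inbound link is the only bottleneck.

    Each multiply needs two operands of operand_bits each on the way in.
    The product is the same size and travels on the outbound direction.
    """
    bytes_in = 2 * operand_bits // 8
    return usable_bytes_per_s / bytes_in


if __name__ == "__main__":
    # 512 bits per clock at 250 MHz matches the raw x16 PCIe 3.0 rate.
    print("fabric bus:", fabric_bytes_per_s(512, 250e6) / 1e9, "GB/s")
    # 12 GB/s usable, as assumed in the post.
    print("x16 PCIe 3.0:", mults_per_second(12e9) / 1e6, "M mults/s")
    # Assumed 3 GB/s usable for the cheaper link (hypothetical figure).
    print("x8 PCIe 2.0:", mults_per_second(3e9) / 1e6, "M mults/s")
```

At 12 GB/s this works out to about 47 million multiplies per second, in line with the "roughly 50 million" estimate, and a link with a quarter of the bandwidth lands near 12 million.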
|
|
|
|
|
|
#30 |
|
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
2²·2,767 Posts |
Even if this winds up not being practical for daily use by many people, might it be worthwhile for GIMPS to get one up and running for fast checks of purported primes? (I would cough up a dozen schillings or more to help buy it.)
It could also be used for quick double or triple checks when the original looks hinky, and on the clean-up side for assignments that expire and are in the milestone hold-up zone (it could out-poach the poachers). |
|
|
|
|
|
#31 |
|
∂2ω=0
Sep 2002
República de California
2²·2,939 Posts |
Too busy with other stuff to crunch the numbers myself just now, but if one used best-of-breed multiword-mul algos to build such kilobit MULs from hardware arithmetic instructions, what would the estimated throughput be on a high-end GPU or KNL-style multicore CPU?
|
|
|
|
|
|
#32 | |
|
Romulan Interpreter
"name field"
Jun 2011
Thailand
41×251 Posts |
Quote:
Last fiddled with by LaurV on 2018-02-17 at 05:32 |
|
|
|
|
|
|
#33 | |
|
"Ben"
Feb 2007
3·5·251 Posts |
Quote:
|
|
|
|
|