#23
Aug 2002
Buenos Aires, Argentina
27638 Posts
The multiplication circuit shown in Knuth's book does not require floating-point support. It needs 11 flip-flops and some combinational logic per bit, and each block communicates only with the blocks immediately to its left and right. So I assume it could be clocked faster than 500-600 MHz.
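For readers who haven't seen this kind of design, here is a minimal Python simulation of a simpler linear array: a serial-parallel carry-save multiplier, not the circuit from Knuth's book. Each cell stores one bit of `a` plus a sum bit and a carry bit, and the only cell-to-cell traffic for partial results is the sum bit moving down one position per clock. One simplification to note: the multiplier bit `b_t` is broadcast to every cell here, whereas a fully local design would pipeline it too. The function names are illustrative.

```python
def full_adder(x, y, z):
    """One-bit full adder: returns (sum_bit, carry_bit)."""
    s = x ^ y ^ z
    c = (x & y) | (x & z) | (y & z)
    return s, c


def serial_parallel_multiply(a, b, n_bits_a, n_bits_b):
    """Multiply a by b using a linear array of n_bits_a cells.

    Assumes a < 2**n_bits_a and b < 2**n_bits_b. Bit i of a is stored in
    cell i; b is fed in least significant bit first, followed by zeros to
    flush the array. One product bit leaves cell 0 per clock.
    """
    a_bits = [(a >> i) & 1 for i in range(n_bits_a)]
    s = [0] * n_bits_a  # sum bit waiting at each cell (weight 2**i)
    c = [0] * n_bits_a  # carry bit held by each cell (weight 2**i)
    product = 0
    for t in range(n_bits_a + n_bits_b):
        b_t = (b >> t) & 1 if t < n_bits_b else 0  # broadcast (simplification)
        new_sum = [0] * n_bits_a
        for i in range(n_bits_a):
            new_sum[i], c[i] = full_adder(a_bits[i] & b_t, s[i], c[i])
        product |= new_sum[0] << t  # cell 0 emits one product bit
        # Shift: each cell takes the sum bit from its higher neighbour.
        s = new_sum[1:] + [0]
    return product
```

For example, `serial_parallel_multiply(13, 11, 4, 4)` returns 143 after 4 + 4 clocks. Because carries stay inside their own cell and sums only move one cell per clock, no signal has to ripple across the whole array within a clock period, which is what lets this style of circuit run at a high clock rate.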

#24
"Ben"
Feb 2007
3×5×251 Posts
Quote:

#25
Sep 2009
46408 Posts
Several years ago I read the spec for an Inmos A100. It was basically several multiply-adders arranged so that you could load one number into it and then feed in a second number a word at a time. Each stage multiplied the input word by its word of the first number, added the previous result, and passed the result along, so you could multiply two numbers in time proportional to their total length. Several chips could be chained to process numbers larger than one chip could handle. The total number of gates was proportional to the length of the shorter of the two numbers. Larger numbers could be done in slices.
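The word-serial scheme can be sketched in Python as below. This illustrates the idea only and is not the A100's actual datapath: the function names, the default 16-bit word size, and broadcasting each streamed word to all stages are assumptions. `pipeline_multiply` keeps one word of the stored number per stage and needs len(a) + len(b) clocks; `sliced_multiply` reuses a fixed number of stages on a longer number by splitting it into slices and adding the shifted partial products.

```python
def pipeline_multiply(a_words, b_words, base):
    """Word-serial multiply with one stage per word of a_words.

    Words are least significant first. Each clock, stage i forms
    a_words[i] * b_t + sum + carry, keeps the high word as its carry and
    passes the low word to stage i - 1; stage 0 emits one product word.
    """
    n = len(a_words)
    s = [0] * n  # low word waiting at each stage
    c = [0] * n  # carry word held by each stage
    out = []
    for t in range(n + len(b_words)):
        b_t = b_words[t] if t < len(b_words) else 0  # zeros flush the pipe
        new_sum = [0] * n
        for i in range(n):
            total = a_words[i] * b_t + s[i] + c[i]
            new_sum[i], c[i] = total % base, total // base
        out.append(new_sum[0])
        s = new_sum[1:] + [0]
    return out  # product words, least significant first


def to_words(x, base):
    """Split a non-negative integer into base-`base` words, LSW first."""
    words = []
    while x:
        words.append(x % base)
        x //= base
    return words or [0]


def from_words(words, base):
    """Inverse of to_words."""
    x = 0
    for w in reversed(words):
        x = x * base + w
    return x


def sliced_multiply(a, b, stages, base=2**16):
    """Multiply with at most `stages` stages by slicing a into chunks."""
    a_words, b_words = to_words(a, base), to_words(b, base)
    result = 0
    for start in range(0, len(a_words), stages):
        chunk = a_words[start:start + stages]
        partial = from_words(pipeline_multiply(chunk, b_words, base), base)
        result += partial * base ** start  # shift the slice into position
    return result
```

For instance, `pipeline_multiply([3], [5], 10)` returns `[5, 1]`, i.e. 15. With more stages, fewer slices are needed, which is the trade-off between chip area and the number of passes.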
Obviously, today you could put a lot more stages on one chip than you could then.
Chris

#26
∂2ω=0
Sep 2002
República de California
267548 Posts
Quote:

#27
"Ben"
Feb 2007
3×5×251 Posts
To be fair, he said the backyard circuit was O(log n), so the constant would need to be O(n), not O(log n).

#28
Aug 2002
Buenos Aires, Argentina
1,523 Posts
Quote:
Last fiddled with by alpertron on 2013-04-29 at 19:40

#30
Tribal Bullet
Oct 2004
5·23·31 Posts
If memory serves, a Dubner Cruncher was an off-the-shelf signal processor chip on an ISA board. It was much faster than PCs of its day because DSP chips are designed for high-throughput multiply-add chains, which are the bottleneck in filtering signals in real time. By comparison, PCs sucked at those kinds of computations.
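The multiply-add chain mentioned above is easiest to see in a direct-form FIR filter. Below is a generic Python sketch (the function name is illustrative, and nothing here describes the Dubner Cruncher's actual firmware): every output sample is one chain of multiply-accumulate operations, which is exactly the pattern DSP chips pipeline in hardware.

```python
def fir_filter(samples, taps):
    """Direct-form FIR filter: y[n] = sum over k of taps[k] * x[n - k].

    Samples before the start of the input are treated as zero.
    """
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]  # one multiply-accumulate per tap
        out.append(acc)
    return out
```

The same loop is also a convolution, which is why it matters for big-number work: with one number's digits as `taps` and the other's digits, padded with `len(taps) - 1` zeros, as `samples` (least significant digit first), it yields the schoolbook product before carry propagation. For example, `fir_filter([2, 1, 0], [3, 1])` gives `[6, 5, 1]`, and 6 + 5·10 + 1·100 = 156 = 12 × 13.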
Nowadays PCs can pipeline that stuff just as well as DSP chips can, to the point where you use DSP chips for reasons other than raw performance compared to PCs (e.g. space, weight, power, or real-time responsiveness constraints). A modern-day accelerator looks a lot like a current GPU: lots of memory bandwidth, but an advantage in arithmetic throughput over PC chips that is much larger still. Because an FFT needs wide access to memory, the largest speedup you see for LL testing on a GPU is about 4x, whereas for Mersenne-mod trial factoring (which depends only on having lots of ALUs) a GPU runs about 100x faster.
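To make the contrast concrete, here is a minimal Python sketch of both workloads. The function names and the `k_max` search bound are illustrative, and Python's built-in big integers stand in for the FFT-based squaring that real LL code uses. Trial factoring tests candidate divisors q = 2kp + 1 of M_p = 2^p - 1 (for odd prime p, every factor has that form) with a modular exponentiation whose state fits in a few machine words, so it is pure ALU work. The Lucas-Lehmer test instead squares a p-bit residue p - 2 times; for exponents in the tens of millions each squaring is an FFT over millions of words, which is where memory bandwidth becomes the limit.

```python
def trial_factor(p, k_max):
    """Return every q = 2*k*p + 1 (1 <= k <= k_max) that divides 2**p - 1.

    Each candidate costs one modular exponentiation on numbers about the
    size of q: a few words of state, no large arrays.
    """
    factors = []
    for k in range(1, k_max + 1):
        q = 2 * k * p + 1
        if pow(2, p, q) == 1:  # 2**p == 1 (mod q), so q divides 2**p - 1
            factors.append(q)
    return factors


def lucas_lehmer(p):
    """Lucas-Lehmer test for an odd prime p: True iff 2**p - 1 is prime.

    Every iteration squares a residue of about p bits; production code
    does this squaring with an FFT over the whole number.
    """
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0
```

For example, `trial_factor(11, 10)` returns `[23, 89]` (2047 = 23 × 89), and `lucas_lehmer(p)` is true for p = 3, 5, 7, 13 but false for p = 11.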

#31
Oct 2011
22 Posts
Not a full-out custom chip, but AMD is offering possible alternatives:
http://www.bit-tech.net/news/hardwar...-semi-custom/1
Possibly a 2P G34 chip with a larger L3?

#32
"Svein Johansen"
May 2013
Norway
3·67 Posts
I guess it's memory size that limits some of this on GPUs, since GPUs can only address a maximum of 8 GB of memory. The latest cards have 6 GB, but that's not even close to what's needed for proper testing.
With Nvidia's Maxwell architecture, which places an ARM CPU on-chip with the GPU, the GPU itself can address the host's memory, so installing 128 GB of memory in a machine, and virtualizing even more onto an SSD, will make it happen.

#33
∂2ω=0
Sep 2002
República de California
267548 Posts