mersenneforum.org > Great Internet Mersenne Prime Search > Hardware
Old 2013-04-29, 16:45   #23
alpertron
 
The multiplication circuit shown in Knuth's book does not require floating-point support. It requires 11 flip-flops and some combinational logic per bit, and each block communicates only with the block to its left and the block to its right. So I assume it could be clocked faster than 500-600 MHz.
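For anyone who wants to play with the idea, here is a minimal Python sketch of a serial carry-save multiplier in the same spirit: one 1-bit cell per bit of the multiplicand, each cell talking only to its neighbor, one product bit emitted per clock. This is not Knuth's exact real-time construction from section 4.3.3 (his automaton is more elaborate, and the multiplier bit is broadcast to all cells here rather than pipelined between them); it just illustrates O(n) cells producing a product in O(n) clocks:

Code:
def systolic_multiply(a, b):
    # One cell per bit of a; each cell keeps a sum bit s[i] and a carry c[i].
    if a == 0 or b == 0:
        return 0
    n = a.bit_length()
    a_bits = [(a >> i) & 1 for i in range(n)]   # static bit stored in each cell
    s = [0] * n                                 # per-cell sum bit
    c = [0] * n                                 # per-cell saved carry
    m = b.bit_length()
    out = 0
    for t in range(m + n + 1):                  # one clock per loop iteration
        bt = (b >> t) & 1 if t < m else 0       # bits of b stream in LSB-first
        # every cell full-adds its partial product, its sum bit and its carry
        full = [(a_bits[i] & bt) + s[i] + c[i] for i in range(n)]
        c = [f >> 1 for f in full]              # the carry stays in its own cell
        s = [full[i + 1] & 1 for i in range(n - 1)] + [0]  # sum bit shifts one cell down
        out |= (full[0] & 1) << t               # cell 0 emits the next product bit
    return out

assert systolic_multiply(123456789, 987654321) == 123456789 * 987654321

Each pass through the outer loop is one clock tick; the two list comprehensions are what all the cells would compute simultaneously in hardware.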
Old 2013-04-29, 16:57   #24
bsquared
 
Quote:
Originally Posted by alpertron
The multiplication circuit shown in Knuth's book does not require floating-point support. It requires 11 flip-flops and some combinational logic per bit, and each block communicates only with the block to its left and the block to its right. So I assume it could be clocked faster than 500-600 MHz.
In FPGAs, not much faster. The maximum frequency is limited by things other than your design (the clock tree, the FPGA fabric, etc.).
Old 2013-04-29, 17:27   #25
chris2be8
 
Several years ago I read the spec for the Inmos A100. It was basically several multiply-adders arranged so that you could load one number into it, then feed a second number in one word at a time; each stage multiplied the input word by its word of the first number, added the previous result, and passed the result along, so you could multiply two numbers in time proportional to the total length of the numbers. Several chips could be chained to process numbers larger than one chip could handle. The total number of gates was proportional to the shorter of the numbers to be processed; larger numbers could be done in slices.
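As a rough software model of that pipeline (my sketch, with an assumed 16-bit word size and a transposed-FIR-style arrangement rather than the A100's exact internals): each stage holds one word of the first number, multiplies each word of the second number streaming past by it, adds the partial sum handed over by the neighboring stage, and stage 0 delivers one product word per cycle:

Code:
def a100_style_multiply(a, b, bits=16):
    mask = (1 << bits) - 1

    def to_words(x):                # split into 16-bit words, least significant first
        words = []
        while x:
            words.append(x & mask)
            x >>= bits
        return words

    a_words, b_words = to_words(a), to_words(b)
    if not a_words or not b_words:
        return 0
    n, m = len(a_words), len(b_words)
    regs = [0] * (n + 1)            # pipeline registers; regs[n] is a constant 0
    digits, carry = [], 0
    for t in range(m + n):          # stream b through, then zeros to flush the pipe
        x = b_words[t] if t < m else 0
        for i in range(n):          # each stage: its word of a times the input word,
            regs[i] = a_words[i] * x + regs[i + 1]   # plus the neighbor's partial sum
        total = regs[0] + carry     # stage 0 emits one convolution digit per cycle
        digits.append(total & mask)
        carry = total >> bits
    while carry:                    # resolve the leftover carry at the end
        digits.append(carry & mask)
        carry >>= bits
    return sum(d << (bits * k) for k, d in enumerate(digits))

assert a100_style_multiply(10**50, 10**50 + 7) == 10**50 * (10**50 + 7)

The gate count grows with the number of stages (the length of the shorter operand), while the run time grows with the total length streamed through, matching the behaviour described above.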

Obviously today you could put a lot more stages on 1 chip than you could then.

Chris
Old 2013-04-29, 18:57   #26
ewmayer
Quote:
Originally Posted by alpertron
If you can design custom hardware with millions of gates, it is very simple to perform a multiplication in O(n) time (where n is the number of bits), as described at the end of section 4.3.3 of Donald Knuth's TAOCP.

If you could use billions of gates (not feasible now, but in the future that will not be very expensive), you would be able to multiply two numbers a lot faster than a PC.
And what kind of clock rate would you be able to achieve on such hardware capable of multiplying many-megabit exponents? O(n) loses to O(n log n) if the implied constant is larger by more than a factor of log n, say because your circuit covers the backyard.
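As a quick back-of-the-envelope on that constant, assuming a 100-megabit operand, log2(n) is only about 27, so the linear-time circuit can afford a per-bit constant at most ~27x worse than the FFT method's before it loses:

Code:
import math

n = 100_000_000                  # assumed operand size in bits
headroom = math.log2(n)          # ~26.6
# a c1*n circuit beats a c2*n*log2(n) FFT only while c1/c2 < log2(n),
# i.e. the circuit's per-bit cost may be at most ~27x the FFT's here
print(f"log2(n) = {headroom:.1f}")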
Old 2013-04-29, 19:02   #27
bsquared
 
Quote:
Originally Posted by ewmayer
And what kind of clock rate would you be able to achieve on such hardware capable of multiplying many-megabit exponents? O(n) loses to O(n log n) if the implied constant is larger by more than a factor of log n, say because your circuit covers the backyard.
To be fair, he said the backyard circuit was O(log n), so the constant would need to be O(n), not O(log n).
Old 2013-04-29, 19:37   #28
alpertron
 
Quote:
Originally Posted by ewmayer
And what kind of clock rate would you be able to achieve on such hardware capable of multiplying many-megabit exponents? O(n) loses to O(n log n) if the implied constant is larger by more than a factor of log n, say because your circuit covers the backyard.
For the 100-megabit case, a 1 GHz FPGA or ASIC would require 100 ms to perform one iteration. According to http://www.mersenne.org/report_benchmarks/ the fastest processors are only a few percent faster than that. This is because Knuth's multiplier requires one clock cycle per bit, while current processors operate on 128 or 256 bits simultaneously.
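A quick sanity check of that 100 ms figure, assuming a nominal 1 GHz clock and one pass of the bit-serial multiplier per iteration:

Code:
bits = 100_000_000        # 100-megabit operand
clock_hz = 10**9          # assumed 1 GHz FPGA/ASIC clock
cycles = bits             # Knuth's serial multiplier consumes one bit per cycle
print(cycles / clock_hz)  # 0.1 s = 100 ms per iteration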

Old 2013-04-29, 23:28   #29
Ken_g6
 
All this ASIC talk made me think I'd heard of something like that before, but it took me a while to remember the name: it was called a Dubner Cruncher. I wonder how hard it would be to make something similar today?
Old 2013-04-30, 00:18   #30
jasonp
If memory serves, a Dubner Cruncher was an off-the-shelf signal processor chip on an ISA board. It was much faster than PCs of its day because DSP chips are designed for high-throughput multiply-add chains, which are the bottleneck in filtering signals in real time. By comparison, PCs sucked at those kinds of computations.

Nowadays PCs can pipeline that stuff just as much as DSP chips can, to the point where you use DSP chips for reasons other than favorable performance compared to PCs (e.g. space, weight, power, or real-time responsiveness constraints).

A modern-day accelerator looks a lot like a latter-day GPU: lots of memory bandwidth and much larger arithmetic throughput than PC chips. The need for wide access to memory to execute an FFT means the largest speedup you see for LL testing on a GPU is a factor of ~4x, whereas for Mersenne-mod trial factoring (which depends only on having lots of ALUs) you see a GPU run 100x faster.
Old 2013-05-05, 11:48   #31
fruitflavor
 
Not a full-out custom chip, but AMD is providing possible alternatives.

http://www.bit-tech.net/news/hardwar...-semi-custom/1

A possible 2P G34 chip with a larger L3?
Old 2013-05-08, 10:34   #32
Manpowre
Maxwell makes this possible

I guess it's memory size that limits some of this on GPUs, as GPUs can only address a maximum of 8 GB of memory. The latest cards have 6 GB, which is not even close to what's needed for proper testing.

With Nvidia's Maxwell architecture, which places an ARM CPU on-chip with the GPU, the GPU itself can address the host's memory, so putting 128 GB of RAM in a machine, and virtualizing even more onto SSD, will make it happen.
Old 2013-05-09, 18:17   #33
ewmayer
Quote:
Originally Posted by Manpowre
I guess it's memory size that limits some of this on GPUs, as GPUs can only address a maximum of 8 GB of memory. The latest cards have 6 GB, which is not even close to what's needed for proper testing.
6 GB is "not even close to what's needed for proper testing" of what, exactly?