mersenneforum.org  

Old 2009-11-19, 00:12   #12
__HRB__
 
 
Dec 2008
Boycotting the Soapbox

2⁴·3²·5 Posts

Quote:
Originally Posted by bsquared View Post
[...]You'd be lucky to spend $700 just talking to the guy in the fab house who needs to build this for you. Never mind the design time/resources.

The reason PCs are affordable, as you mention, is economy of scale. Pay a team of 10 engineers for a year to design a board, then build 10 million of them to amortize the cost. What you are talking about here is, in my view, decidedly more complex than a PC motherboard and completely out of reach for the average Joe.
Congratulations, you now own 24.9% of the company's stock and are head of hardware engineering!

Since I don't possess any expertise in hardware design, I expect it should be easy for you to come up with ten brilliant ideas of how to cut production cost while at the same time increasing performance, so we can look at some of our options during the Monday meeting. How long do you think it will take you to locate all points on the efficient frontier? Software-development hasn't kicked into high gear yet, so feel free to give them some of the dirty work.

The implicit assumption in the business sketch was that one would first make, say, 10,000 units using $10M in capital. If you want a more detailed answer, you'll be lucky to spend $700 just talking to the consultant who needs to design your business plan for you.

Getting the mucho $$$ is not much of a problem, if one can first find a bunch of people with brains for technology and/or business who are willing to quit/decline a $250K job/offer and work for room and board in their own company, wanting to become billionaires. There is a reason why VCs don't give $10M to homeless people who claim they can build perpetuum mobiles.
Old 2009-11-19, 03:45   #13
bsquared
 
 
"Ben"
Feb 2007

3·5·251 Posts

Quote:
Originally Posted by __HRB__ View Post
Software-development hasn't kicked into high gear yet, so feel free to give them some of the dirty work.
Heh. That is a bit of fun.

Quote:
Originally Posted by __HRB__ View Post
The implicit assumption in the business sketch was that one would first make, say, 10,000 units using $10M in capital. If you want a more detailed answer, you'll be lucky to spend $700 just talking to the consultant who needs to design your business plan for you.

Getting the mucho $$$ is not much of a problem, if one can first find a bunch of people with brains for technology and/or business who are willing to quit/decline a $250K job/offer and work for room and board in their own company, wanting to become billionaires. There is a reason why VCs don't give $10M to homeless people who claim they can build perpetuum mobiles.
I guess I misinterpreted your plan. I'll be among the first in line to buy a multi-FPGA number cruncher, complete with bitstreams for a variety of DC projects, for $399 + tax. Not sayin' you can't get there from here, just trying to illustrate some of the hurdles should the VC schmoozing and manpower hunting prove successful.
Old 2009-11-19, 04:50   #14
__HRB__
 
 
Dec 2008
Boycotting the Soapbox

2⁴×3²×5 Posts

Quote:
Originally Posted by bsquared View Post
I guess I misinterpreted your plan. I'll be among the first in line to buy a multi-FPGA number cruncher, complete with bitstreams for a variety of DC projects, for $399 + tax. Not sayin' you can't get there from here, just trying to illustrate some of the hurdles should the VC schmoozing and manpower hunting prove successful.
Get back to work! The totally clueless general manager has found a 641 LUTs/$ solution from Altera, while you were snoozing on the job. http://www.buyaltera.com/scripts/par...me=544-2075-ND
A 160,000-unit volume discount might haggle this down to $29. Meaning $646 for a 400K-LUT Swiss-Chocolate solution. Cheap 4GB of RAM is $82 at newegg.com. Is $100 for RAM + board unreasonably optimistic? Retail price $1199.00 + tax?

Alternative: a 615 LUTs/$ solution.
http://www.buyaltera.com/scripts/par...me=544-2456-ND
At $19 volume discount, that would be ~$300 for 250K-LUT Hershey's. Retail price with 4GB RAM and board $599.00 + tax?

Would you still be a customer at $599 + tax?
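
For anyone who wants to check the numbers in this post, here is a rough Python sketch. The LUT counts, FPGA totals and retail prices are the figures quoted above; the name `summarize` is made up for illustration, and applying the $100 RAM + board guess to both options is an assumption, since the post only states it for the first one.

```python
# Figures quoted in the post above; the "haggled" totals are the poster's guesses.
OPTIONS = {
    "Swiss-Chocolate": {"luts": 400_000, "fpga_cost": 646, "retail": 1199},
    "Hershey's": {"luts": 250_000, "fpga_cost": 300, "retail": 599},
}
RAM_AND_BOARD = 100  # assumption: the $100 RAM + board guess applies to both options


def summarize(option):
    """Return (LUTs per dollar of FPGA cost, total parts cost, retail minus parts)."""
    luts_per_dollar = option["luts"] / option["fpga_cost"]
    parts = option["fpga_cost"] + RAM_AND_BOARD
    return luts_per_dollar, parts, option["retail"] - parts


for name, option in OPTIONS.items():
    lpd, parts, gap = summarize(option)
    print(f"{name}: {lpd:.0f} LUTs/$, parts ${parts}, retail minus parts ${gap}")
```

The "retail minus parts" figure ignores design, assembly and support costs, so it is an upper bound on margin at best.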

The software department did some pretend-shopping at opencores.com, but fell asleep. Therefore the accounting department has the following question: is a soft-processor with a 4096-bit register file and a USR-instruction prefix something people would like to have?
Old 2009-11-20, 17:51   #15
bsquared
 
 
"Ben"
Feb 2007

3×5×251 Posts

Quote:
Originally Posted by __HRB__ View Post
Get back to work! The totally clueless general manager has found a 641 LUTs/$ solution from Altera, while you were snoozing on the job. http://www.buyaltera.com/scripts/par...me=544-2075-ND
Sir, yes Sir! Good find Sir!

However, the Cyclone III doesn't have enough I/O to fully wire a DDR2 DIMM module (and doesn't support DDR3 at all). Which is fine, because you shouldn't be using them anyway. Happily, the device does support QDRII SRAM, which is what you should be using. I think they make these chips with 72Mb densities nowadays. Can you get by with 9MB of storage for an LL test? If not, how many multiples of 9 do you need? If more than a handful, then back to the drawing board.
Old 2009-11-20, 22:33   #16
__HRB__
 
 
Dec 2008
Boycotting the Soapbox

2⁴·3²·5 Posts

Quote:
Originally Posted by bsquared View Post
Sir, yes Sir! Good find Sir!

However, the Cyclone III doesn't have enough I/O to fully wire a DDR2 DIMM module (and doesn't support DDR3 at all). Which is fine, because you shouldn't be using them anyway. Happily, the device does support QDRII SRAM, which is what you should be using. I think they make these chips with 72Mb densities nowadays. Can you get by with 9MB of storage for an LL test? If not, how many multiples of 9 do you need? If more than a handful, then back to the drawing board.
Let's say we want to be able to do a Pepin test on F33 using 2^17+1 bit moduli, and we fit 2^9 of these into the 72 Mb SRAM caches to do an NTT of length 2^9. We need at least 2*9*2^9*2^17 =~ 1G LUT-operations to do the transform, which naively divided by the 16K of LUTs is 80,000 cycles or ~0.15 ms at 400 MHz, which would require main-memory bandwidth of 60 GB/s. Per FPGA! Wouldn't it already be very optimistic to get this kind of performance from the SRAM itself?
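
The estimate above can be re-run without the rounding as a small Python sketch. The operation count 2*9*2^9*2^17, the 16K LUTs and the 400 MHz clock come from the post. The bandwidth line assumes the 2^9 x 2^17-bit working set is streamed exactly once per transform, which is only one possible reading of how the 60 GB/s figure was reached; `ntt_estimate` is a name made up for illustration.

```python
CLOCK_HZ = 400e6            # 400 MHz, from the post
LUTS_PER_FPGA = 16 * 1024   # "16K of LUTs", from the post


def ntt_estimate(log2_len=9, modulus_bits=2**17):
    """Naive LUT-operation count, cycles, time and streaming bandwidth for one NTT."""
    n = 2**log2_len
    lut_ops = 2 * log2_len * n * modulus_bits   # the post's 2*9*2^9*2^17
    cycles = lut_ops / LUTS_PER_FPGA            # every LUT busy on every cycle
    seconds = cycles / CLOCK_HZ
    data_bytes = n * modulus_bits / 8           # assumption: one pass over the data
    return lut_ops, cycles, seconds, data_bytes / seconds


lut_ops, cycles, seconds, bandwidth = ntt_estimate()
print(f"{lut_ops / 1e9:.2f} G LUT-ops, {cycles:.0f} cycles, "
      f"{seconds * 1e3:.2f} ms, {bandwidth / 1e9:.1f} GB/s")
```

Run this way the figures come out somewhat different from the rounded ones in the post but in the same ballpark, and counting both the read and the write of each coefficient would double the bandwidth.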

Hm...it appears that the 'problem' with Schoenhage-Strassen is that the amount of 'arithmetic' is so ridiculously low that
Quote:
Originally Posted by ldesnogu View Post
No experience with FPGA, but IIRC some article I read years ago, memory bandwidth was a serious issue. Is that still the case?
has put his finger right upon the real issue.

But, just for fun, let's assume that memory bandwidth wasn't a problem and one actually manages to get every LUT to do something all the time... then I estimate the time to do a transform of length 2^17 as 2*2^8*0.15 ms -> 80 ms, recurse 3 times -> 250 ms, + inverse transform =~ 500 ms, divided by 16 FPGAs -> ~30 ms. 30 ms * 2^32 iterations / 86400 s / 365 -> 4 years.
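
The chain of estimates above, written out as a Python sketch. All inputs are the post's figures. Reading "recurse 3 times" as a plain factor of 3 is an assumption, and `years_for` is a hypothetical helper. The iteration count is left as a parameter: Pépin's test on F_n takes 2^n − 1 modular squarings, so the 2^32 used above is the post's own round figure, and the second total shows what 2^33 − 1 squarings would cost at the same per-iteration time.

```python
SECONDS_PER_YEAR = 86400 * 365


def years_for(ms_per_iteration, iterations):
    """Convert a per-iteration time in milliseconds into total years."""
    return ms_per_iteration / 1000 * iterations / SECONDS_PER_YEAR


base_ms = 0.15                      # the post's time for one length-2^9 NTT
transform_ms = 2 * 2**8 * base_ms   # 2 * 2^8 small transforms (post rounds to 80 ms)
recursed_ms = 3 * transform_ms      # assumption: "recurse 3 times" read as a factor of 3
round_trip_ms = 2 * recursed_ms     # forward plus inverse transform
per_iter_ms = round_trip_ms / 16    # spread over 16 FPGAs

print(f"per iteration: {per_iter_ms:.1f} ms (post rounds to 30 ms)")
print(f"2^32 iterations at 30 ms: {years_for(30, 2**32):.1f} years")
print(f"2^33 - 1 iterations at 30 ms: {years_for(30, 2**33 - 1):.1f} years")
```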

If we knew that F33 was in fact prime, then with $2000 Virtex-6 FPGAs clocked at 1600 MHz (we also get 10x the number of LUTs) one should be able to do this in about 1 year, for a handsome profit of >$300,000, because one would collect the $400,000 prize money for finding the first 100M- and 1G-digit prime.
Old 2009-11-21, 00:44   #17
fivemack
(loop (#_fork))
 
 
Feb 2006
Cambridge, England

2×7×461 Posts

The $32 XC6SLX16 chip has 32 block RAMs and two DDR3 memory controllers each of which can connect to a single 128Mx16 DDR3 chip (so in total half a gigabyte per FPGA, memory access for streaming at about 3.2Gbytes/second). The block RAMs are 32 bits wide and run at 260MHz, so 1Gbyte/sec/block RAM, so the 60Gbyte/sec rate isn't quite practical on the XC6SLX16.

The $200 XC6SLX150 has 268 of the block RAMs, so the internal data transfer rate is entirely reasonable, and four of the memory controllers. Block RAMs are of course individually quite small, one 2^17-bit modulus takes up eight of them.
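
A quick Python sketch of the block-RAM arithmetic in the two paragraphs above. The 32-bit width, the 260 MHz clock and the block-RAM counts are from the post; the 18 Kb capacity per block RAM is an assumption (it is not stated above, though it is consistent with "eight of them" per 2^17-bit modulus). Function names are made up for illustration.

```python
import math

BRAM_WIDTH_BITS = 32      # from the post
BRAM_CLOCK_HZ = 260e6     # from the post
BRAM_BITS = 18 * 1024     # assumption: 18 Kb per block RAM, not stated in the post
TARGET_BYTES_PER_S = 60e9 # the 60 GB/s figure discussed earlier in the thread


def aggregate_bandwidth(n_brams):
    """Peak internal bandwidth in bytes/s if every block RAM is read each cycle."""
    return n_brams * BRAM_WIDTH_BITS / 8 * BRAM_CLOCK_HZ


def brams_per_modulus(modulus_bits=2**17):
    """Number of block RAMs needed to hold one modulus-sized value."""
    return math.ceil(modulus_bits / BRAM_BITS)


for chip, n_brams in (("XC6SLX16", 32), ("XC6SLX150", 268)):
    bw = aggregate_bandwidth(n_brams)
    verdict = "meets" if bw >= TARGET_BYTES_PER_S else "falls short of"
    print(f"{chip}: {bw / 1e9:.1f} GB/s, {verdict} 60 GB/s")
print(f"block RAMs per 2^17-bit modulus: {brams_per_modulus()}")
```

These are peak numbers with every block RAM active on every cycle, so real designs would land below them.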

Of course, the $300 GeForce GTX275 has seven very much faster memory controllers, an internal data rate about an order of magnitude higher, and you don't have to program it in Verilog using a compiler that costs three thousand dollars and whose error reporting makes a bear with a sore head seem positively friendly. OK, CUDA's error reporting is nothing to write home about, but compared to VHDL ...

http://www.dinigroup.com/ sells the DNBFC_S12_PCIe, which looks like what you're trying to develop. It has 12 of the big XC6SLX150 Spartan-6 chips, each with a 128Mx16 DDR3 memory chip (1600Mbyte/second transfer rate) next door to it, on a slot that plugs into PCIe and uses about the power consumption of a low-end current GPU.

If that board costs less than ten thousand dollars I will be very surprised, so it's competing with six i7/860 boxes with competent GPUs, a competition which it is going to lose for almost any application.
Old 2009-11-21, 01:44   #18
Batalov
 
 
"Serge"
Mar 2008
San Diego, Calif.

281D₁₆ Posts

We had business with Dini Group. A lovely bunch of people, and a pet parrot crapping all over the office (that was in 2000). Notably, an FPGA-appropriate application is DNA/protein sequence alignments, and we were trying to implement a tricky variant of that. The competitors were TimeLogic (with a boring straightforward S/W alignment) and another company that used ASICs (what was its name?). Anyway, both companies went belly up even before the genomic hype subsided. (Nominally, TimeLogic still exists; it was bought for dimes and pennies by a biologics company.)

Maybe it was a good thing that we had a reprioritization very early on and only ended up spending ~$100K for the initial RFIs and setups, and then abandoned the project. (I wasn't the programmer, but even sitting in the spec-mulling meetings made me sick, as far as I remember. It's a jungle, Verilog and all. It's a full-time job.)

P.S.
Old 2009-11-21, 01:47   #19
__HRB__
 
 
Dec 2008
Boycotting the Soapbox

720₁₀ Posts

Quote:
Originally Posted by fivemack View Post
The $32 XC6SLX16 chip has 32 block RAMs and two DDR3 memory controllers each of which can connect to a single 128Mx16 DDR3 chip (so in total half a gigabyte per FPGA, memory access for streaming at about 3.2Gbytes/second). The block RAMs are 32 bits wide and run at 260MHz, so 1Gbyte/sec/block RAM, so the 60Gbyte/sec rate isn't quite practical on the XC6SLX16.

The $200 XC6SLX150 has 268 of the block RAMs, so the internal data transfer rate is entirely reasonable, and four of the memory controllers. Block RAMs are of course individually quite small, one 2^17-bit modulus takes up eight of them.[...]
Ok. What about the other extreme, using e.g. 16x16 $2 FPGAs with only enough logic cells to do the modular additions + write address generation of e.g. a length-16 split-radix NTT (the part before the bits get shifted and don't line up anymore) and 3.2GB/s memory attached?

Quote:
Originally Posted by fivemack View Post
[...]If that board costs less than ten thousand dollars I will be very surprised, so it's competing with six i7/860 boxes with competent GPUs, a competition which it is going to lose for almost any application.
It is highly suspicious when companies do not offer public price information. The only instance where this makes even remote sense is when you have a monopoly on something, in which case the lack of competition guarantees that the product will be inferior. If they are relying on the foolishness of their customers, they won't stay in business long, because foolish customers tend to run out of money very quickly.

EDIT: Batalov appears to confirm the suspicion. About Dini, not that I'm a troll, which is beyond all doubt.
Old 2011-02-25, 17:45   #20
MikeLaJolla
 
Feb 2011
La Jolla, CA

3 Posts

Quote:
Originally Posted by Batalov View Post
We had business with Dini Group. A lovely bunch of people, and a pet parrot crapping all over the office (that was in 2000). Notably, an FPGA-appropriate application is DNA/protein sequence alignments, and we were trying to implement a tricky variant of that. The competitors were TimeLogic (with a boring straightforward S/W alignment) and another company that used ASICs (what was its name?). Anyway, both companies went belly up even before the genomic hype subsided. (Nominally, TimeLogic still exists; it was bought for dimes and pennies by a biologics company.)

Maybe it was a good thing that we had a reprioritization very early on and only ended up spending ~$100K for the initial RFIs and setups, and then abandoned the project. (I wasn't the programmer, but even sitting in the spec-mulling meetings made me sick, as far as I remember. It's a jungle, Verilog and all. It's a full-time job.)

P.S.
DINI here. Parrot is still here. Forest the Parrot, but you might be confusing him with the Mousebirds. They all died.
Old 2011-02-25, 17:55   #21
MikeLaJolla
 
Feb 2011
La Jolla, CA

3 Posts
FPGA Expert here -- How can I help?

Quote:
Originally Posted by fivemack View Post
The $32 XC6SLX16 chip has 32 block RAMs and two DDR3 memory controllers each of which can connect to a single 128Mx16 DDR3 chip (so in total half a gigabyte per FPGA, memory access for streaming at about 3.2Gbytes/second). The block RAMs are 32 bits wide and run at 260MHz, so 1Gbyte/sec/block RAM, so the 60Gbyte/sec rate isn't quite practical on the XC6SLX16.

The $200 XC6SLX150 has 268 of the block RAMs, so the internal data transfer rate is entirely reasonable, and four of the memory controllers. Block RAMs are of course individually quite small, one 2^17-bit modulus takes up eight of them.

Of course, the $300 GeForce GTX275 has seven very much faster memory controllers, an internal data rate about an order of magnitude higher, and you don't have to program it in Verilog using a compiler that costs three thousand dollars and whose error reporting makes a bear with a sore head seem positively friendly. OK, CUDA's error reporting is nothing to write home about, but compared to VHDL ...

http://www.dinigroup.com/ sells the DNBFC_S12_PCIe, which looks like what you're trying to develop. It has 12 of the big XC6SLX150 Spartan-6 chips, each with a 128Mx16 DDR3 memory chip (1600Mbyte/second transfer rate) next door to it, on a slot that plugs into PCIe and uses about the power consumption of a low-end current GPU.

If that board costs less than ten thousand dollars I will be very surprised, so it's competing with six i7/860 boxes with competent GPUs, a competition which it is going to lose for almost any application.
DNBFC_S12_PCIe is $8950 in small quantity. We can put 12 in a chassis. That product is DNBFC_S12_12_Cluster. Cost with 12 BFC's is $120k.

Generally it is hard to compete against GPUs. That market is heavily subsidized.

This product has the most dedicated multipliers: DNV6F6PCIe, but the cost is high. With 6 SX475-1's, the price is $50k.
Old 2011-02-25, 18:02   #22
MikeLaJolla
 
Feb 2011
La Jolla, CA

3 Posts

Quote:
Originally Posted by __HRB__ View Post
EDIT: Batalov appears to confirm the suspicion. About Dini, not that I'm a troll, which is beyond all doubt.
Ummm -- DINI here. It would be stupid to publish price information on the web for products this specialized. This is far off topic, but I'll explain the issues if the moderators allow and you're interested.

All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
