#45
(loop (#_fork))
Feb 2006
Cambridge, England
14466₈ Posts
Quote:
You'd need rather richer and more confident hobbyists to accumulate around $60,000 to get a dozen PCIe boards with six 72Mbit QDRII SRAM chips, an SODIMM socket and a V6-LX240T designed and made - a few hundred hours of Shanzhai engineering and fabrication effort; the big problem being to find the people required to bridge the three degrees of separation between here and the Shenzhen electronic-engineering community (quick, does anyone here speak both Verilog and colloquial Mandarin?).

Quantity-1 XC6VLX240T chips in a package with enough I/Os turn out to cost more from Avnet than a devboard with one on it does, but that says very little about the cost at quantity 50. The next dozen would be cheaper, but we're not in the world of six-figure NREs here, let alone the seven and eight figures you're talking about; these could be made semi-practically for an audience of a dozen people, each prepared to pay the price of a 48-core Magny Cours server or a 300mm/2.8 telephoto lens.

Of course, I suspect such an audience doesn't exist: this forum is likely to be sampling the very far tail of the distribution of number-theory hobbyists, and contains about one person who has personally bought such a server.

Last fiddled with by fivemack on 2011-05-11 at 18:10
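A quick sanity check of the headline figure above (a sketch; the post doesn't break down the split between parts and engineering time, so this is only the amortised arithmetic):

```python
# ~$60,000 spread over a dozen boards -- which is roughly the "price of a
# 48-core Magny Cours server" that each of the dozen buyers would pay.
total_usd = 60_000
boards = 12
print(total_usd // boards)  # 5000 dollars per board
```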
#46
Sep 2006
The Netherlands
3×269 Posts
Quote:
So we cannot use that as proof of the claim above. We were discussing massive number crunching, and that GPU is going to annihilate such FPGAs in terms of performance, just not in power usage. So using it as an example of how to "speed up" number crunching is a very bad idea. DES cracking dates from before everyone at home could program a GPU (of course there was some GPU-type hardware back in the '90s, but it was neither common nor cheap).

Quote:

You realize a single GPU, at just 40 nm, already delivers an FMA every cycle at each stream core (which is four PEs)? That means a simple 500-euro card running at 830 MHz delivers 1.17 Tflop double precision. Your FPGA runs at just 400 MHz, which is currently the HIGHEST-clocked FPGA; sure, 1.5 GHz at 22 nm is coming, but the junk you refer to is clocked at, what, 30 MHz if you're lucky. How are you going to beat a gamer's card? Sure you can, if you invest a hell of a lot more money: custom-design the board, add this, add that. But that's a multimillion-dollar project, as printing one such FPGA is not interesting at all. When I wanted to produce an FPGA chip, the reality was that I had to produce at least a thousand of those chips, plus a thousand PCI cards (royalties, royalties). So it soon becomes a really expensive project if you want some sort of semi-capable FPGA.
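For what it's worth, peak-flops claims like the one above follow from the usual formula, peak = units × flops-per-cycle × clock. The unit count below is back-solved from the quoted 1.17 Tflop and 830 MHz figures, purely as an illustration, not a datasheet value:

```python
# Back-solve how many double-precision FMA results per cycle the quoted
# "1.17 Tflop DP at 830 MHz" implies (an FMA counts as 2 flops).
clock_hz = 830e6
flops_per_fma = 2
claimed_flops = 1.17e12
dp_units = claimed_flops / (clock_hz * flops_per_fma)
print(round(dp_units))  # ~705 DP FMA results retired per cycle
```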
#47
(loop (#_fork))
Feb 2006
Cambridge, England
2·7·461 Posts
Quote:
The EFF did make custom DES chips at one stage; they got two thousand 800nm custom chips for about $130,000, though I suspect that was cost price. http://cryptome.org/jya/cracking-des/cracking-des.htm

I just asked www.mosis.com for a quote; if you give them a full design (which, admittedly, requires probably the better part of a million dollars' worth of licences for design software), they will for $211,500 have 100 25mm^2 chips fabbed on the TSMC 65nm logic process, packaged in 256-pin packages, and sent to you by FedEx.

25mm^2 would be six megabytes of SRAM, or six Cortex-A8 processors (note that the licence fee for using Cortex-A8 processors is also enormous), or a fairly prodigious quantity of custom logic.
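The quote works out as follows (simple arithmetic on the numbers given above; the SRAM density comes from the "six megabytes in 25 mm^2" figure):

```python
# Per-chip cost and SRAM density implied by the MOSIS quote.
run_cost_usd = 211_500
chips = 100
area_mm2 = 25
sram_bytes = 6 * 1024 * 1024        # "six megabytes of SRAM" in 25 mm^2
per_chip_usd = run_cost_usd / chips           # $2,115 per packaged chip
sram_kib_per_mm2 = sram_bytes / area_mm2 / 1024
print(per_chip_usd, round(sram_kib_per_mm2))  # 2115.0 246
```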
#48
Sep 2006
The Netherlands
3×269 Posts
Quote:
And I currently own one video card; the plan, when I win the lottery, is to buy a few more. A single video card delivers 5 Tflop. The card I have has just one GPU on it and delivers 2.7 Tflop in theory; maybe I can squeeze 2 Tflop out of it. Note this is single precision, and of course it is only busy with integers.

On an FPGA you'd do it more efficiently. But you must do it more efficiently: you don't have the same number of transistors. It's not interesting to go to all that effort to produce one FPGA development kit. With a compiler of say $60k, the boards you need and some additional logic, the total project size is say $100k, programming time not counted. Then you've got one FPGA. By the time it's there, I can buy a GPU in the shop with 10k cores @ 1 GHz, delivering 20 Tflop single precision. All the code that runs on those GPUs is not extremely efficient, yet you lose just a small factor to inefficiency, not an order of magnitude.

In the end the problem with the FPGAs is the bandwidth to the caches. 0.5 TB/s, someone claimed. Yet on a GPU you laugh at 0.5 TB/s to the L1: just counting instructions (a multiply-add is one instruction), it has nearly 10 terabytes per second of bandwidth to the local caches, and the shared cache already delivers 1 TB/s. I hope you see the problem when designing for an FPGA: a 0.5 TB/s claim is not very convincing if the result of each internal calculation needs to get stored, say for an FFT.

If you already have problems beating one GPU, when you can soon buy these GPUs second hand dirt cheap and stack them up, how are you going to beat that?

FPGAs are interesting, for example, for traders at an exchange, because of their latency. But then you also need to implement TCP at the same latency as the fastest network card delivers it, which is a pretty complicated project. I guess that's why the FPGAs are allowed to get printed at 22 nm so quickly: just for the financial guys. This has hardly anything to do with number crunching, however.

Regards, Vincent
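To see why the bandwidth argument bites, here is a rough model (my own illustrative numbers, not the poster's): if every multiply-add result must be written back, as in a naive FFT butterfly, the store traffic alone swamps a 0.5 TB/s budget.

```python
# Store bandwidth needed if each FMA result is written back (illustrative).
sustained_flops = 1.0e12    # assume 1 Tflop/s sustained, single precision
flops_per_fma = 2           # one FMA = 2 flops, producing 1 result
bytes_per_result = 4        # single-precision float
store_bw = sustained_flops / flops_per_fma * bytes_per_result
print(store_bw / 1e12)  # 2.0 TB/s of stores alone -- 4x a 0.5 TB/s budget
```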
#49
Bamboozled!
May 2003
Down not across
2×17×347 Posts
Quote:
For the problem on which the hardware is working, a single GPU is approximately 50 times the speed of a single CPU core. A single FPGA instance is approximately 500 times faster than a single CPU core. A single FPGA can hold a number of instances. These are not made up numbers. I went to quite some effort to measure them. Quote:
Quote:
There are some problems which neither require large amounts of memory nor use "deep" algorithms, but which, rather, use algorithms that are very simple in hardware but rather complex in software. For instance, bit permutation is expensive in almost all forms of software but is completely free in hardware, because it consists solely of re-routing wires between computational elements. Primitives like shift registers and combinatorial logic are very cheap on FPGAs.

The same is true of ASICs, but a significant advantage of FPGAs is that they are Field Programmable Gate Arrays. That is, the same device can be reprogrammed later to solve a different problem if a better algorithm is implemented after the hardware is built.

Paul

Last fiddled with by xilman on 2011-05-11 at 18:46 Reason: Fix minor typo
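A small sketch of the software cost being contrasted here (my own example, not from the post): an arbitrary bit permutation takes a loop, or several mask-and-shift passes, in software, one step per bit moved, whereas in hardware the identical permutation is just a wiring pattern.

```python
# Generic bit permutation in software: O(width) work per word.
def permute_bits(x: int, perm: list[int], width: int = 8) -> int:
    """Move bit i of x to position perm[i]."""
    out = 0
    for i in range(width):
        if (x >> i) & 1:
            out |= 1 << perm[i]
    return out

# Example: reverse the bits of an 8-bit value.
reverse = [7 - i for i in range(8)]
print(bin(permute_bits(0b00000001, reverse)))  # 0b10000000
```

In an FPGA or ASIC the same `perm` table becomes fixed routing between elements, costing zero gates and zero cycles.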
#50
Tribal Bullet
Oct 2004
5×23×31 Posts
For the record, when I first thought about massively parallel NFS polynomial selection I realized that COPACOBANA would be a very nice target platform for it; but it's not clear whether the system would work out for a problem that needs somewhat more memory than an ECM core, yet still little enough to fit inside one mid-range FPGA. Plus a GPU was only a hundred dollars.
#51
Sep 2006
The Netherlands
3·269 Posts
Quote:
They just need to be convinced that producing a specific real CPU makes sense. I'm not sure what process technology they are allowed to fab in; usually it's not extremely bad, though.
#52
(loop (#_fork))
Feb 2006
Cambridge, England
1100100110110₂ Posts
Quote:
Quote:
#53 |
Sep 2006
The Netherlands
1100100111₂ Posts
Speaking of GPUs, I just had good news from the AMD helpdesk, assuming a knowledgeable person answered: the top bits of a 24 x 24 bit multiply also go at full PE speed (1536 processing elements per GPU @ 880 MHz). OpenCL currently doesn't expose this, but I would sure hope that in the next service pack they add an AMD-specific function for it, which hopefully also gets integrated into OpenCL.

That speeds up the AMD GPU by a factor of 3 or so for trial factoring. It might also make it 3x faster than the GTX 470 which was used as the Nvidia benchmark (and which is not the fastest Nvidia card for this). More on this in another thread. Currently the GPUs are slow at 32 x 32 bit multiplications, yet it's wishful thinking that this will remain so.

Some years ago it was easy for some random good programmer to pick up FPGA programming and outgun a CPU by a factor of 1000+. If we try that nowadays it is really a lot harder, and I'd guess it'll get ever harder in the future as well, if we simply speak about throughput.

The majority of the people here are interested in very big prime numbers, and I sure write from that context: when I tried to design an FPGA for that on paper, I sure couldn't do it cheaply. If we discuss something utmost tiny that works on a couple of thousand bits, obviously a lot of tricks are possible, yet that's not so relevant, I'd argue; maybe there are one or two guys with such a problem. If some oldie GPU is already a factor of 50 faster than a CPU there, and an FPGA a factor of 500, I'd argue: get 16 GPUs. And right now you can upgrade to new ones; some years from now a single GPU will outgun that.

They're busy fixing those GPUs more and more for HPC-type crunching workloads. Point is, if you never publicly show up with a problem that GPUs can't do fast, it's not so sure they'll fix them for it. What you post on the net, you have good odds they fix.

Regards, Vincent
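For context, a minimal model of the operation under discussion (the helper names are mine, not AMD's API): a 24 × 24 bit multiply yields a 48-bit product, and trial-factoring arithmetic needs both halves; the news above is that the high half can also be produced at full PE speed.

```python
# Model of a 24x24-bit multiply split into low/high 24-bit halves,
# as used in GPU trial-factoring kernels (hypothetical helper names).
MASK24 = (1 << 24) - 1

def mul24_lo(a: int, b: int) -> int:
    """Low 24 bits of the 48-bit product."""
    return ((a & MASK24) * (b & MASK24)) & MASK24

def mul24_hi(a: int, b: int) -> int:
    """High 24 bits of the 48-bit product -- the part just discussed."""
    return ((a & MASK24) * (b & MASK24)) >> 24

# The two halves reassemble into the exact 48-bit product.
a, b = 0xABCDEF, 0x123456
assert a * b == (mul24_hi(a, b) << 24) | mul24_lo(a, b)
```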
#54
Sep 2006
The Netherlands
3×269 Posts
Quote:
Having it doesn't mean people are already using it. That also tells you something about how unlikely it is that anyone over here would ever invest in FPGA technology. So I'd really argue that the 'show' and the 'proof' are there, yet even that hardly gets people to use it. That is a hard conclusion about the people, rather than about GPUs and proving them fast.