#1 |
|
Bemusing Prompter
"Danny"
Dec 2002
California
2³×313 Posts |
Suppose a very generous person decided to hire a company to design a custom microchip for the purpose of finding Mersenne primes, and that money is not an issue.
How fast could such a chip be, and how much would it cost to make one? Just curious. |
|
|
|
|
|
#2 |
|
"Ben"
Feb 2007
3·5·251 Posts |
Here's a rough guess based on nothing other than some past experience with state-of-the-art ASIC flows, digital logic development cycles and toolchains, and FFT algorithms in general:
One to ten million dollars and a man-year or two of labor for a ~10-100x speedup vs. modern general-purpose CPUs. An FPGA solution would probably be cheaper ($10k-100k plus 0.5 to 1 man-years of labor) for maybe a 10-20x speedup. ASIC solutions are generally only appropriate if a) you are going to build and sell millions of them or b) you really, really, really need the size/weight/power reduction or performance improvement (i.e., it is mission critical). |
|
|
|
|
|
#3 |
|
Tribal Bullet
Oct 2004
5·23·31 Posts |
The Hardware forum has a sticky that goes into a lot of the details of the answers to this question.
|
|
|
|
|
|
#4 |
|
Dec 2010
Monticello
11100000011₂ Posts |
I think the conclusion has been that it is cheaper and easier to put together 10 PCs than to get an FPGA flow working well, or to get a GPU to work on the problem. Memory bandwidth is a major issue.
Me, I'd want to look at what would happen if we noticed that what mprime does is very much bound by CPU<-->memory bandwidth, and built a PCI (or other favorite bus) coprocessor card (wait! is that a GPU?) with basically just the CPU, a memory slot, and a PCI interface (and, of course, a heatsink and a way to remove lots of heat). Architecturally, I'd want to build a machine that could carry out one step of an LL test, and then figure out how to keep it fed (for example, on each clock, feed in the inputs of a new LL step and remove the outputs of a just-completed LL step). The decision as to which LL step to feed in would be up to a general-purpose machine. Given all the steps (20-30) involved in a single FFT for an LL step, I would expect to have that many different LL steps in progress simultaneously. |
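For readers unfamiliar with what "one step of an LL test" means, here is a minimal Python sketch. It uses plain big-integer arithmetic rather than the FFT-based multiplication that mprime uses, and the names `ll_step`, `is_mersenne_prime`, and `interleaved_ll` are made up for illustration. The last function mimics the feeding idea above by advancing several independent tests one step at a time in round-robin order. Each test's own steps are strictly sequential, which is why a pipelined machine would need many tests in flight to stay busy.

```python
def ll_step(s: int, p: int) -> int:
    """One Lucas-Lehmer step: s -> s*s - 2 (mod 2**p - 1)."""
    m = (1 << p) - 1
    return (s * s - 2) % m


def is_mersenne_prime(p: int) -> bool:
    """Lucas-Lehmer test; p is assumed to be a prime exponent."""
    if p == 2:
        return True  # the recurrence below only applies to odd p
    s = 4
    for _ in range(p - 2):
        s = ll_step(s, p)
    return s == 0


def interleaved_ll(exponents):
    """Advance several independent LL tests one step per pass, round-robin.

    This only models the scheduling idea; it is not faster in software.
    Exponents are assumed to be distinct odd primes.
    """
    state = {p: 4 for p in exponents}          # current residue per test
    remaining = {p: p - 2 for p in exponents}  # steps left per test
    while any(remaining.values()):
        for p in exponents:
            if remaining[p] > 0:
                state[p] = ll_step(state[p], p)
                remaining[p] -= 1
    return {p: state[p] == 0 for p in exponents}
```

For example, `interleaved_ll([5, 7, 11])` runs three tests side by side; 2^11 - 1 = 2047 = 23 × 89 is the composite one.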
|
|
|
|
|
#5 | |
|
Tribal Bullet
Oct 2004
5·23·31 Posts |
Quote:
I've been wondering recently if it would be worthwhile to build a memory controller optimized for high *address* bandwidth to many banks of DRAM, rather than the traditional optimization for high *data* bandwidth. That might help achieve very high GUPS (giga-updates-per-second) rather than GBPS, and the former is a critical component of fast NFS sieving and linear algebra. |
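To make the GUPS metric concrete, here is a rough Python sketch of a random-update loop in the spirit of the HPC Challenge RandomAccess benchmark: each update reads a table word at a pseudo-random index, XORs it, and writes it back. The function name `random_update_rate` and its parameters are hypothetical, and in Python the interpreter overhead dominates, so the reported rate only illustrates the shape of the measurement, not what the memory system can do.

```python
import random
import time


def random_update_rate(table_words: int = 1 << 20, updates: int = 200_000, seed: int = 1):
    """Run `updates` read-modify-write operations at pseudo-random indices.

    Returns (updates_per_second, table). `table_words` must be a power of two
    so that a bit mask can pick the index.
    """
    if table_words <= 0 or table_words & (table_words - 1):
        raise ValueError("table_words must be a power of two")
    rng = random.Random(seed)
    table = list(range(table_words))
    mask = table_words - 1
    start = time.perf_counter()
    for _ in range(updates):
        r = rng.getrandbits(64)
        table[r & mask] ^= r  # one update: read, xor, write back
    elapsed = time.perf_counter() - start
    return updates / elapsed, table
```

Dividing the returned rate by 1e9 gives GUPS. The point of the metric is that each update touches only one word, so the cost is set by how many independent addresses the memory system can service per second rather than by how many bytes it can stream.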
|
|
|
|
|
|
#6 | |
|
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
2²×2,767 Posts |
Quote:
|
|
|
|
|
|
|
#7 |
|
Bemusing Prompter
"Danny"
Dec 2002
California
2³×313 Posts |
Argh, I meant to post this in the Hardware forum. Could a mod please move it there?
Thanks. |
|
|
|
|
|
#8 |
|
Bamboozled!
"๐บ๐๐ท๐ท๐ญ"
May 2003
Down not across
2·17·347 Posts |
|
|
|
|
|
|
#9 |
|
Bemusing Prompter
"Danny"
Dec 2002
California
2504₁₀ Posts |
Thanks!
By the way, I agree with Christenson regarding the memory bottleneck. I think someone here mentioned that his Tesla C2050 card was only performing at about 25% of the expected throughput, and the cause was found to be the limited memory bandwidth. |
|
|
|
|
|
#10 |
|
Dec 2010
Monticello
3403₈ Posts |
jasonp...you missed that huge, silly, sh*t-eating grin on my face when I said GPU! Maybe we could get an open SATA chipset to feed the inputs to the FPGAs for LL, or maybe use PCI-e to serialize for us. Either would be easier than building DDR3 controllers. For sieving work, we need to look at how to optimize that "worst" case: completely (pseudo-)random memory access, scattered all over a gig or more of memory. Such a bus would need to be deep, that is, have lots of memory modules, each of which can start a write and grab the data in one clock, but might need many clocks (and therefore many parallel modules) to commit the data. To sell it, find a non-sieving application, like non-colliding writes to a database, that could use similar performance.
I wonder what it would cost to get Nvidia to let us use that billion-dollar investment in bus controllers to front-end the kind of dedicated logic arrays that would be useful for Mersenne work? |
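The "deep bus" idea in the post above can be explored with a toy simulation. The model is an assumption, not a description of any real memory controller: one write is offered per clock, in order; a module accepts a write in one clock but then stays busy for `commit_clocks` clocks; a write aimed at a busy module stalls everything behind it. The function name `simulate_banks` and its parameters are hypothetical.

```python
import random


def simulate_banks(num_banks: int, commit_clocks: int,
                   requests: int = 10_000, seed: int = 1) -> float:
    """Return accepted writes per clock for uniformly random module targets."""
    rng = random.Random(seed)
    free_at = [0] * num_banks  # first clock at which each module can accept again
    clock = 0
    for _ in range(requests):
        bank = rng.randrange(num_banks)
        clock = max(clock, free_at[bank])    # stall until the module is free
        free_at[bank] = clock + commit_clocks
        clock += 1                           # accepting the write takes one clock
    return requests / clock
```

Under this model, throughput with random addresses approaches one write per clock only when the number of modules is large compared with the commit time, for example comparing `simulate_banks(4, 8)` with `simulate_banks(64, 8)`.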
|
|
|
|
|
#11 |
|
Sep 2009
4641₈ Posts |
Would it be possible to build a system with ~1Gb of level 3 (or 2) cache? If so, how fast would it be for sieving and how much would it cost? That's probably the simplest way to speed up main memory.
Chris K |
|
|
|