mersenneforum.org  

Old 2011-03-02, 17:19   #1
ixfd64
Bemusing Prompter
 
"Danny"
Dec 2002
California

2³×313 Posts
custom GIMPS hardware: how fast and how much?

Suppose a very generous person decided to hire a company to design a custom microchip for the purpose of finding Mersenne primes, and that money is not an issue.

How fast could such a chip be, and how much would it cost to make one? Just curious.
Old 2011-03-02, 19:43   #2
bsquared
 
"Ben"
Feb 2007

3·5·251 Posts

Here's a rough guess based on nothing other than some past experience with state of the art ASIC flows, digital logic development cycles and toolchains, and FFT algorithms in general:

One to ten million dollars and a man year or two of labor for ~10-100x speedup vs. modern general purpose CPUs.

An FPGA solution would probably be cheaper ($10k to $100k, plus 0.5 to 1 man-years of labor) for maybe a 10-20x speedup.

ASIC solutions are generally only appropriate if
a) you are going to build and sell millions of them
or
b) you really really really need the size/weight/power reduction or performance improvement (i.e., it is mission critical)
Old 2011-03-02, 20:17   #3
jasonp
Tribal Bullet
 
Oct 2004

5·23·31 Posts

The Hardware forum has a sticky that goes into a lot of the details of the answers to this question.
Old 2011-03-05, 04:03   #4
Christenson
 
Dec 2010
Monticello

11100000011₂ Posts

I think the conclusion has been that it is cheaper and easier to put together 10 PCs than to get an FPGA flow working well, or to get a GPU to work on the problem. Memory bandwidth is a major issue.
Me, I'd want to look at what would happen if we notice that what mprime does is very much bound by the CPU<-->memory bandwidth, and built a PCI (or other favorite bus) coprocessor card (wait! is that a GPU?) with basically the CPU, a memory slot, and a PCI interface only. (and of course, a heatsink and a way to remove lots of heat).
Architecturally, I'd want to build a machine that could carry out one step of an LL test, and then figure out how to keep it fed: for example, on each clock, feed in the inputs of a new LL step and remove the outputs of a just-completed LL step. The decision as to which LL step to feed in would be up to a general-purpose machine. Given all the steps (20-30) involved in a single FFT for an LL step, I would expect to have about that many different LL steps simultaneously in progress.
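For readers unfamiliar with the LL (Lucas-Lehmer) step being discussed, here is a minimal Python sketch. It is an illustration only, not how mprime or any proposed hardware is implemented: real clients do the squaring with floating-point FFTs, while this uses Python's built-in big integers. The function names (`ll_step`, `ll_test`, `interleaved_ll`) are made up for this example. Because each step of one test needs the result of the previous step, the round-robin loop assumes that the steps in flight at the same time come from different exponents, which is one way to read the "keep it fed" idea above.

```python
def ll_step(s, p):
    """One Lucas-Lehmer step: s -> s^2 - 2 (mod 2^p - 1)."""
    return (s * s - 2) % ((1 << p) - 1)


def ll_test(p):
    """Return True if 2^p - 1 is prime. Valid for odd prime p."""
    s = 4
    for _ in range(p - 2):
        s = ll_step(s, p)
    return s == 0


def interleaved_ll(exponents):
    """Run several independent LL tests round-robin, one step each per pass.

    This mimics, in software, a pipeline that always has a step from some
    other test ready to issue. `exponents` must be distinct odd primes.
    """
    state = {p: 4 for p in exponents}          # current residue per test
    remaining = {p: p - 2 for p in exponents}  # steps still to run per test
    while any(remaining.values()):
        for p in exponents:
            if remaining[p]:
                state[p] = ll_step(state[p], p)
                remaining[p] -= 1
    return {p: state[p] == 0 for p in exponents}
```

Calling `interleaved_ll([5, 7, 11, 13])` reports every exponent except 11 as giving a prime, since 2^11 - 1 = 2047 = 23 × 89.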
Old 2011-03-05, 16:41   #5
jasonp
Tribal Bullet
 
Oct 2004

5·23·31 Posts

Quote:
Originally Posted by Christenson
Me, I'd want to look at what would happen if we notice that what mprime does is very much bound by the CPU<-->memory bandwidth, and built a PCI (or other favorite bus) coprocessor card (wait! is that a GPU?) with basically the CPU, a memory slot, and a PCI interface only. (and of course, a heatsink and a way to remove lots of heat).
This is essentially what a GPU is, except that ATI and Nvidia spend billions of dollars building nice ones, which we can't do. i.e. there's no way I'd be able to build a DDR memory controller in an FPGA that can run as fast or as effectively as one of the multiple memory controllers in a GPU.

I've been wondering recently if it would be worthwhile to build a memory controller optimized for high *address* bandwidth to many banks of DRAM, rather than the traditional optimization for high *data* bandwidth. That might help achieve very high GUPS (giga-updates-per-second) rather than GBPS, and the former is a critical component of fast NFS sieving and linear algebra.
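To make the GUPS idea concrete, here is a rough Python sketch of a random-update loop, loosely in the style of random-access benchmarks that report updates per second. Everything here is an assumption for illustration: the function name `random_updates`, the table size, and the XOR update rule are not taken from any particular benchmark or sieving code. Interpreter overhead dominates in Python, so the rate it reports shows the access pattern, not what the memory system can really do.

```python
import random
import time


def random_updates(table_bits=20, updates=1_000_000, seed=1):
    """Apply read-modify-write updates at pseudo-random table indices.

    Returns (table, updates_per_second). Each update touches one word at
    an unpredictable address, so address traffic matters more than the
    amount of data moved per access.
    """
    size = 1 << table_bits
    mask = size - 1
    table = list(range(size))
    rng = random.Random(seed)          # fixed seed keeps runs repeatable
    start = time.perf_counter()
    for _ in range(updates):
        r = rng.getrandbits(64)
        table[r & mask] ^= r           # scattered read-modify-write
    elapsed = time.perf_counter() - start
    return table, updates / elapsed
```

Dividing the returned rate by 1e9 gives a GUPS figure for this toy loop.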
Old 2011-03-05, 18:20   #6
Uncwilly
6809 > 6502
 
"""""""""""""""""""
Aug 2003
101×103 Posts

2²×2,767 Posts

Quote:
Originally Posted by jasonp
I've been wondering recently if it would be worthwhile to build a memory controller optimized for high *address* bandwidth to many banks of DRAM, rather than the traditional optimization for high *data* bandwidth. That might help achieve very high GUPS (giga-updates-per-second) rather than GBPS, and the former is a critical component of fast NFS sieving and linear algebra.
The IBM 5162 (XT 286) running at 6 MHz could outperform the IBM 5170 (AT) running at 8 MHz because of memory-related issues. So your idea of using memory to speed up the machine does have a real-world precedent.
Old 2011-03-05, 19:24   #7
ixfd64
Bemusing Prompter
 
"Danny"
Dec 2002
California

2³×313 Posts

Argh, I meant to post this in the Hardware forum. Could a mod please move it there?

Thanks.
Old 2011-03-05, 19:44   #8
xilman
Bamboozled!
 
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across

2·17·347 Posts

Quote:
Originally Posted by ixfd64
Argh, I meant to post this in the Hardware forum. Could a mod please move it there?

Thanks.
Done.

Paul
Old 2011-03-05, 20:15   #9
ixfd64
Bemusing Prompter
 
"Danny"
Dec 2002
California

2504₁₀ Posts

Quote:
Originally Posted by xilman
Done.

Paul
Thanks!

By the way, I agree with Christenson regarding the memory bottleneck. I think someone here mentioned that his Tesla C2050 card was only performing at about 25% of the expected throughput, and the cause was found to be the limited memory bandwidth.
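A back-of-the-envelope way to see how a bandwidth limit can cap throughput well below peak is the usual "roofline" bound: attainable rate is the smaller of peak compute and bandwidth times arithmetic intensity. The sketch below is only an illustration. The function name and all the numbers are hypothetical, picked to show how a figure like 25% can come about; they are not the C2050's specifications or anyone's measurements.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Roofline-style upper bound on throughput, in GFLOP/s.

    peak_gflops    -- peak arithmetic rate of the device (GFLOP/s)
    bandwidth_gbs  -- sustained memory bandwidth (GB/s)
    flops_per_byte -- arithmetic intensity of the kernel (FLOP per byte moved)
    """
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


# Hypothetical numbers, for illustration only.
peak = 500.0        # GFLOP/s
bandwidth = 125.0   # GB/s
intensity = 1.0     # FLOP per byte

bound = attainable_gflops(peak, bandwidth, intensity)
fraction_of_peak = bound / peak   # 0.25 with the numbers above
```

With these made-up inputs the kernel is memory-bound, and raising the arithmetic intensity (more work per byte fetched) is the only way to move closer to peak.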
Old 2011-03-08, 01:39   #10
Christenson
 
Dec 2010
Monticello

3403₈ Posts

jasonp... you missed that huge, silly, sh*t-eating grin on my face when I said GPU! Maybe we could get an open SATA chipset to feed the inputs to the FPGAs for LL, or maybe use PCI-e to serialize for us. That would be easier than building DDR3 controllers.
For sieving work, we need to look at how to optimize the "worst" case: completely (pseudo-)random memory access, scattered all over a gig or more of memory. Such a bus would need to be deep, that is, have lots of memory modules, each of which can start a write and grab the data in one clock but might need many clocks (and therefore many parallel modules) to commit the data. To sell it, find a non-sieving application, like non-colliding writes to a database, that could use similar performance.
I wonder what it would cost to get Nvidia to let us use that billion-dollar investment in bus controllers to front-end the kind of dedicated logic arrays that would be useful for Mersenne work?
Old 2011-03-08, 17:23   #11
chris2be8
 
Sep 2009

4641₈ Posts

Would it be possible to build a system with ~1Gb of level 3 (or 2) cache? If so, how fast would it be for sieving and how much would it cost? That's probably the simplest way to speed up main memory.

Chris K


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
