|
|
#837 | |
|
Sep 2006
The Netherlands
36 Posts |
Quote:
8 socket box @ 64 cores intel is $205k. GPU is under 1000. Say 520 euro for the AMD 6990 and 720 for the Nvidia GTX590. In theory both are the same speed for LL.

So far you saw IBDWT timings from Woltman's superior CPU code. On the GPU this is not the fastest approach. Let me have a shot at it. But for the future please realize this: the CPUs are 32 nm now and get totally annihilated by GPUs wherever someone did some fulltime work for the GPU. On the CPU we have near-optimal code, programmed fulltime over a period of what is it, 15 years or so?

Even with a simple calculation I get to a timing of 10 hours for M42643801 on my 6970. It won't be easy to get to that timing soon. Sure, there are hurdles to take. Yet that's a card of by now 280 euro @ 40 nm.

I already see that the Nvidia GTX590 and AMD HD6990 will be on par for the initial implementation. That is because integer multiplication is handicapped on the AMD card, which has 3x more PEs than Nvidia. What will happen of course is that if the AMD guys have 2 braincells, they improve this 32x32 bit multiplication a tad, as well as the top bits calculation. That will speed up NTT by a factor of 2 already for sure. Note TF speeds up by a factor of 3 (but AMD definitely runs behind Nvidia there, under perfect circumstances for Nvidia, as TheJudger shows). Sure, you also need more bandwidth.

Moving to 22 nm will speed up the GPUs by a factor of 4. They have a power budget of like 500 watt. The CPUs have a power budget of say 80-90 watt. CPUs need cache coherency. CPUs need to do well in specint. CPUs need to be reliable under sustained load. CPUs are very expensive. So pricewise GPUs will always be a factor 10 ahead or more, delivering at least a factor 10 more gflops per dollar.

Now what most here don't realize is that for both AMD and Nvidia, with a few riser cards you can easily stack up a GPU or 8. Some moron here (heh was it you?) suggested to get the 5870. Actually not a bad plan. It'll be slower than the 6000 series, not sure by how much. But I saw it for 80 euro on ebay, if we'd get it from the USA (add 20% import tax bla bla and shipment bla bla). Let's say you get 8 of 'em for 80 euro each. It'll eat 1000 watt in total, maybe a bit more. But you can have separate PSUs for that. Total cost: 1000 euro power, 640 euro GPU costs, 400 euro for PSUs. Roughly 2000 euro in total. That'll probably beat any $205k box Intel releases by end 2012.

However it's not a fair comparison. If we take the 22 nm GPU that AMD will release by then, it's going to be hands down 10x faster than what they have now, whereas the GPU attached to the CPUs will suck of course, because of power limits and too much focus on double precision. All the fast codes on GPUs are 32 bit codes, or even fewer bits per unit in fact (24 bits huh?). The constraints on the CPUs are too many; that makes them too expensive.

If you ask me what I find disappointing on this planet, I'll argue it is the price of CPUs. That small piece of silicon is so, so overpriced. Thanks to Intel especially, and the patents they use to simply avoid any competition; they were quite successful at that until now, when GPUs take over. With GPUs they cannot (yet) do that very successfully. Intel is dead performance-wise of course, if you look at throughput per euro.

Sure, they will do well; society is changing. What was a desktop computer in the 80s and a laptop by the time the 21st century started is now a mobile phone of course, not to mention the billions of inhabitants of Asia who all buy only mobile phones, no computer at all; their mobile phone is their computer. That's where Intel can make big bucks, and so can AMD. That's where they will focus.

As for crunching, it will move completely to GPUs of course. Just a few things, like quickly testing algorithms, you can ideally do on CPUs rather than GPUs. The production work you do on a GPU of course. Just the spoiled 1st world nations with their thousands of constraints didn't use them well.

There is no competition possible against a card that can eat 500 watt for a CPU that can eat only 80 or so. Say 40 for the CPU and 40 for its vector units. So its integrated GPU can eat 40 watt or so. How to compete?????? Intel will have to either buy Nvidia or show up with their own GPU. They tried Larrabee. If you ask me, a total failure from every viewpoint. It just won't be able to compete. The tiny cores have won the crunching battle.

Intel is world champion at overclocking, or let's politely say at high-clocking CPUs in new process technologies others didn't master yet. They are world champion at delaying other manufacturers by means of court cases and blackmail (they managed to delay Opteron for a full year doing that huh?). Other companies in their position would have done exactly the same.

Intel is good at giving gifts, let's face it. That's why they have such great support, and for no other reason. I can give you evidence for that statement as well. But it will be very embarrassing for a few professors here, showing how corruptly their commissions work. Or let's politely say major incompetence, not yet proven 100% to be corruption. In the case of NCSA let's call it incompetence. China has been kicking their butt for some time now, despite having a hardware budget that's a factor 10 smaller. Just go to www.top500.org and see.

If I look at monsterboard for a job coding gpgpu however, I see 0 companies looking for 1. Very disappointing. The price of the GPUs is that it is far more complicated to write good code for them, and a big problem of especially open source communities is that they always have such ugly code; only when a company is interested in providing them code do they have something. With GPUs that didn't happen yet, despite industry already having embraced them massively. Not everywhere yet, but at many spots.

It all happened sneakily of course. Who'd notice it if a company buys 10k GPUs? No one would. Just 1 salesman knows it and he ain't gonna talk in public, as the discount they get is considerable then. At a total of tens of millions of GPUs sold, no one notices it. It is those tens of thousands of GPUs getting sold that make things cheap, dude. No CPU can compete with that. For now.

Of course when CPUs integrate GPUs, the risk is that fewer GPUs will get sold some years from now. That would be bad news for us, as then the price of GPUs will go up. You need something produced in massive numbers to keep it cheap.

Wasn't it Intel that wrote about the 2nd law of Moore? With every new process technology, the price of a factory to produce 'em also gets a factor 2 more expensive. They extrapolated that by 2020 it would be 20 billion dollars to build a single factory. Now they didn't account for inflation yet in that price. With the dollar currently dropping they'll of course reach that price point sooner, but it means you need big mass production. So far GPUs luckily had this, and for the coming few years will have it. They simply cannot sell gamer cards for over 1000 dollars.

Nvidia tries to rip off people with the Tesla/Quadro series a lot. With success. It's the same chip you know. It's just $2200 for a Tesla @ 448 cores. The Intel model. They turned on the double precision logic and turned it off on the gamer cards; a factor 4 difference in double precision is the result. Someone who tested it reported that a Tesla beats a GTX580 by a factor of 4 in double precision. That's what you get if the price doesn't get determined by teenagers gaming, but by what the HPC world pays. A factor 8 difference in price or so with the GTX 470, the exact equivalent in gaming?

The next generation CPUs that will deliver a big punch will be priced at a price point where they can roam the market, as it is called in marketing. They'll have a power budget that's a fraction of what the GPUs can use. So you can already prove a factor 10 difference in speed, provided your code works in single precision.

That's why all I am busy with is 32 bit code, and why TheJudger writes it all in 32 bit code and and and... ...why it will kick all those CPUs in a few years' time as well. Vincent |
|
|
|
|
|
|
#838 |
|
Dec 2010
Monticello
5·359 Posts |
Draft Specification for mfaktc automatic interaction with primenet attached. Comments welcome.
Hopefully this is minimal enough that most of it can be cribbed from Prime95. P.S. mfaktc rocks; 0.17 will rock more, as it shares at least the CPU pretty well. E |
|
|
|
|
|
#839 |
|
"Oliver"
Mar 2005
Germany
11×101 Posts |
Speaking about mfaktc: yes, it is!
Each factor candidate needs roughly 4 bytes transferred to the GPU. E.g. a GPU capable of 250M candidates/s needs ~1GB/sec of PCIe bandwidth. So PCIe x1 won't work for current GPUs. On my system (stock GTX 470 on X58 chipset PCIe 2.0 x16) I measure ~6GB/sec. PCIe 2.0 x8 is enough for mfaktc on current highend GPUs while PCIe 1.x x8 might be mostly saturated. Oliver |
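The bandwidth arithmetic in the post above can be sketched in a few lines. This is only a back-of-envelope check, not anything taken from mfaktc itself: the per-lane figures (250 MB/s for PCIe 1.x, 500 MB/s for PCIe 2.0, one direction, after encoding overhead) are theoretical peaks assumed here, and the function names are made up for the illustration.

```python
# Assumed theoretical per-lane, one-direction PCIe throughput in MB/s.
# Measured rates are lower (the post reports ~6 GB/s on a 2.0 x16 slot).
PCIE_MB_PER_LANE = {"1.x": 250, "2.0": 500}

BYTES_PER_CANDIDATE = 4  # "roughly 4 bytes" per factor candidate, from the post


def required_mb_per_s(candidates_per_s: float) -> float:
    """Host-to-GPU bandwidth needed to feed the given candidate rate."""
    return candidates_per_s * BYTES_PER_CANDIDATE / 1e6


def link_mb_per_s(generation: str, lanes: int) -> float:
    """Theoretical one-direction bandwidth of a PCIe link."""
    return PCIE_MB_PER_LANE[generation] * lanes


def link_utilisation(candidates_per_s: float, generation: str, lanes: int) -> float:
    """Fraction of the theoretical link bandwidth the candidate stream needs."""
    return required_mb_per_s(candidates_per_s) / link_mb_per_s(generation, lanes)


if __name__ == "__main__":
    rate = 250e6  # 250M candidates per second, as in the post
    for gen, lanes in [("1.x", 1), ("2.0", 1), ("1.x", 8), ("2.0", 8), ("2.0", 16)]:
        share = link_utilisation(rate, gen, lanes)
        print(f"PCIe {gen} x{lanes}: {share:.0%} of theoretical peak")
```

On these assumed peaks a x1 link is oversubscribed several times over, while PCIe 1.x x8 already needs half of its theoretical bandwidth. Since real transfers fall short of the peak, that is consistent with "mostly saturated".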
|
|
|
|
|
#840 |
|
Mar 2010
3×137 Posts |
I've noticed too that when the PCI-E slot is running at x1, most compute apps (incl. mfaktc) suffer from it.
There's no visible difference between x8 and x16. And it's not even PCI-E 2.0 (old P35 MB).
|
|
|
|
|
|
#841 |
|
Jan 2008
France
2×5²×11 Posts |
Do you have any numbers that prove that claim? I would have said that memory efficiency is by far the main bottleneck for high speed sieving, and I guess Intel CPUs are better in that regard. Am I wrong?
|
|
|
|
|
|
#842 | |
|
P90 years forever!
Aug 2002
Yeehaw, FL
7,537 Posts |
Quote:
The server will not accept TF bit level results out-of-order. That is, you cannot report no factor from 2^68 to 2^69 until you have reported no factor from 2^67 to 2^68. You can combine them into one message: no factor from 2^67 to 2^69. Are you planning on putting this code in mfaktc or run as a new separate program? |
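The in-order rule above could be handled on the client by merging adjacent bit ranges before reporting. The following is a minimal sketch under stated assumptions: the function names, the tuple representation of a range, and the `server_bit` input (the bit level the server already has for the exponent) are all hypothetical and not taken from mfaktc, prime95, or the PrimeNet protocol.

```python
def merge_bit_ranges(ranges):
    """Merge contiguous no-factor TF ranges, e.g. (67, 68) + (68, 69) -> (67, 69).

    `ranges` is an iterable of (from_bit, to_bit) pairs for ONE exponent.
    Returns a list of merged (from_bit, to_bit) pairs in ascending order.
    """
    merged = []
    for lo, hi in sorted(ranges):
        if lo >= hi:
            raise ValueError(f"empty or reversed range: {lo}..{hi}")
        if merged and lo <= merged[-1][1]:
            # Touches or overlaps the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def reportable(ranges, server_bit):
    """Return the single range that can be reported in order now, or None.

    `server_bit` is a hypothetical input: the bit level the server already
    holds for this exponent. A range that starts above it must wait.
    """
    for lo, hi in merge_bit_ranges(ranges):
        if lo <= server_bit < hi:
            return (server_bit, hi)
    return None
```

For example, `reportable([(68, 69)], 67)` gives `None` because 2^67 to 2^68 has not been done, while `reportable([(67, 68), (68, 69)], 67)` gives `(67, 69)`, the combined message described above.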
|
|
|
|
|
|
#843 |
|
Dec 2010
Monticello
3403₈ Posts |
I see no point in branching mfaktc except for the development version that I have on my desktop; therefore the additional code modules will be linked into mfaktc. I don't want to write a lot of code here, as it isn't at all unreasonable to merge mfaktc into prime95 at some point in the future. It is simply a little too ambitious for me right now.
Unstated assumptions: The automatic communications would be included in mfaktc in much the same way they are in prime95. Any keys we need (GUID for computer, primenet ID) would be added to mfaktc.ini. primenet.c/.h from prime95 will be added unmodified. A few dozen lines in the existing mfaktc will make hooks into the communications thread or set and release locks. If prime95 is reasonable in that area, I intend to borrow the code, possibly to the point of infringing copyright. I will certainly be using the same inter-thread communications methods.

After looking at how prime95 handles results.txt and prime.log, I'm wondering if my log file concepts in the proposal are reasonable. Combining multiple bit ranges from results.txt certainly resolves the race condition I was thinking about that might happen if primenet got very fast compared to the mfaktc communications, for example if communications was lost between transmitting the first and second bit ranges on the same exponent.

Finally, it's not clear to me that I want my factoring computers to share IDs with my non-factoring computers -- the problem is that, after 4 months of primenet work, I have half a dozen computers that have contributed about 2000GHz-days of work. Xyzzy said he was getting upwards of 500GHz-days per day of work on his card; my medium card is getting about 90GHz-days per day, and in either case, I'm going to end up with a pie chart on my summary that's all TF with a little tiny, indecipherable sliver of everything else. So, as with prime95, we will leave choice of CPU name and User ID up to the user to edit in the config files.

Last fiddled with by Christenson on 2011-05-04 at 20:30 |
|
|
|
|
|
#844 | |||
|
P90 years forever!
Aug 2002
Yeehaw, FL
7,537 Posts |
Quote:
Quote:
Quote:
I agree that using or not using the same GUID as prime95 is for the user to decide. |
|||
|
|
|
|
|
#845 |
|
Dec 2010
Monticello
1795₁₀ Posts |
When I say infringing copyright, I am talking about a high degree of copying...like cutting and pasting wholesale....since prime95 is open source, I understand that I have permission to do so, especially as the result will also be open source.
I am thinking that communications will be a separate thread. My locks don't have to be operating-system level locks, just something that guarantees mutual exclusion even if the communications thread gets scheduled on a different CPU than the compute thread or threads. They are included because I don't want this to break under a wide range of assumptions. The advantage of a separate thread is that it will have a low impact on the computing activity when problems happen with primenet, which, based on extrapolating the extremely small sample this year, happen every few months. It also lets me put just a few hooks into the existing mfaktc code and otherwise leave it alone. It also makes later upgrades, such as menus and/or incorporation into prime95, easier. |
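To make the separate-thread idea concrete, here is a rough sketch of the pattern, written in Python rather than the C that mfaktc uses. Everything in it is an assumption for illustration: the `ResultSpool` class, its method names, and the `send` callable standing in for whatever primenet.c would provide. It also swaps in an ordinary `threading.Lock`, although the post notes that something lighter than an OS-level lock would do.

```python
import threading


class ResultSpool:
    """Results shared between compute thread(s) and one communications thread."""

    def __init__(self, send):
        self._send = send              # hypothetical uploader: send(result) -> bool
        self._pending = []             # results not yet accepted by the server
        self._lock = threading.Lock()  # guards _pending
        self._wake = threading.Event()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def submit(self, result):
        """Hook for the compute thread: hold the lock only long enough to append."""
        with self._lock:
            self._pending.append(result)
        self._wake.set()

    def stop(self):
        """Ask the communications thread for one last flush, then wait for it."""
        self._stop.set()
        self._wake.set()
        self._thread.join()

    def pending(self):
        with self._lock:
            return list(self._pending)

    def _run(self):
        while True:
            self._wake.wait(timeout=1.0)
            self._wake.clear()
            self._flush()
            if self._stop.is_set():
                self._flush()  # pick up anything submitted just before stop()
                return

    def _flush(self):
        # Copy under the lock, then do the slow network call outside it,
        # so a stalled server never blocks the compute thread.
        with self._lock:
            batch = list(self._pending)
        sent = 0
        for result in batch:
            if not self._send(result):
                break  # keep order: retry this one before any later result
            sent += 1
        with self._lock:
            del self._pending[:sent]
```

The compute side only ever calls `submit()`, which matches the "few hooks" goal. A failed upload leaves the result queued for the next attempt instead of stalling the factoring work.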
|
|
|
|
|
#846 |
|
"James Heinrich"
May 2004
ex-Northern Ontario
110101011101₂ Posts |
Have you put any thought into how appropriate work quantity and range is determined for a given system? The number of "at least 30 minutes per assignment" was thrown around, but there will be systems pushing through anywhere from 25 to 500+ GHz-days/day, and what's an appropriate assignment for one system might not be for another. Would you make an attempt to examine the throughput rate of mfaktc and determine appropriate work levels that way, or rely on the user editing the .ini file to say "my system can do ____ GHz-days of work per hour"?
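One way to picture the first option (measuring throughput rather than asking the user) is the arithmetic below. It is a sketch only: the function names and the idea of keeping a list of (GHz-days, seconds) pairs for finished assignments are assumptions made for the example, not part of mfaktc or the draft specification.

```python
MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = 86400


def estimate_throughput(completed):
    """Estimate GHz-days/day from finished work.

    `completed` is a list of (ghz_days, wall_seconds) pairs, one per
    finished assignment (a hypothetical record the client would keep).
    """
    total_work = sum(work for work, _ in completed)
    total_seconds = sum(seconds for _, seconds in completed)
    if total_seconds <= 0:
        raise ValueError("need at least one assignment with non-zero run time")
    return total_work * SECONDS_PER_DAY / total_seconds


def min_assignment_ghz_days(throughput_per_day, min_minutes=30.0):
    """Smallest assignment (in GHz-days) that keeps the card busy for min_minutes."""
    return throughput_per_day * min_minutes / MINUTES_PER_DAY


if __name__ == "__main__":
    # The two ends of the range mentioned in the post.
    for rate in (25, 500):
        size = min_assignment_ghz_days(rate)
        print(f"{rate} GHz-days/day -> at least {size:.2f} GHz-days per assignment")
```

With the 30 minute figure, a 25 GHz-days/day system needs about 0.5 GHz-days per assignment and a 500 GHz-days/day system about 10.4, a factor of 20 apart, which is why a single fixed assignment size is unlikely to suit both.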
|
|
|
|
|
|
#847 |
|
Jan 2011
Cincinnati, OH
2²×5² Posts |
I don't think I've seen anyone mention this before, but last night, mfaktc found my first factor. It was on M76056139. I ran from bit 69 to 70, then 70 to 71, and then finally 71 to 72, which is where it found the factor. What I wasn't expecting was, when I manually uploaded the results file, PrimeNet gave me the correct credit for the first two tests, but then gave me P-1 credit for the last bit level rather than TF credit, and 3 GHz-Days less than the TF credit would have been.
Is this what is expected? Thanks, Doug |
|
|
|
| World's second-dumbest CUDA program | fivemack | Programming | 112 | 2015-02-12 22:51 |