|
|
#34 |
|
Jun 2003
3²·17 Posts |
Here is a link on Tom's Hardware that mentions Nvidia's CUDA technology for speeding up floating-point operations using its graphics cards: http://www.tomshardware.com/news/nvi...hics,5417.html.
George, I understand how doubling the TF speed provides only a minimal increase in Prime95 throughput when you raise the factoring limit from 69 to 70 bits. However, we can look at this question another way. What would increasing the number of exponents TF'd to a factoring limit of 69 do for Prime95 throughput? Is the trailing edge of exponents TF'd to 69 far enough ahead of LL testing that all LL testers are given exponents that have already been TF'd? |
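For readers unfamiliar with what a TF "factoring limit" means, here is a minimal Python sketch of trial factoring a Mersenne number 2^p - 1 up to a given bit level. It relies only on the standard facts that any factor q of 2^p - 1 (p an odd prime) has the form q = 2kp + 1 and satisfies q ≡ ±1 (mod 8). The function name `trial_factor` and its parameters are hypothetical and chosen for illustration; this is not how Prime95 or Factor5 implement it.

```python
def trial_factor(p, max_bits):
    """Return the smallest factor of 2**p - 1 below 2**max_bits, or None.

    Illustrative sketch only; assumes p is an odd prime.
    """
    limit = 1 << max_bits
    k = 1
    while True:
        q = 2 * k * p + 1          # every factor of 2**p - 1 has this form
        if q >= limit:
            return None            # nothing found below the factoring limit
        # Factors must be +1 or -1 mod 8; q divides 2**p - 1 iff 2**p == 1 (mod q).
        if q % 8 in (1, 7) and pow(2, p, q) == 1:
            return q
        k += 1


if __name__ == "__main__":
    print(trial_factor(11, 10))    # 2**11 - 1 = 2047 = 23 * 89
    print(trial_factor(13, 10))    # 2**13 - 1 = 8191 is prime
```

Raising `max_bits` by one doubles the range of candidates to check, which is why going from 69 to 70 bits costs about as much as all the lower bit levels combined.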
|
|
|
|
#35 |
|
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
2×4,909 Posts |
If a bunch of LMH'ers had GPUs doing TF, on top of the CPUs and only up to some normal limit, that would help (maybe not as much as getting a 5% gain in L-L, but some). And if the code was based on Luigi's Factor5(?) then that would leave George alone.
|
|
|
|
|
#36 | |
|
∂²ω=0
Sep 2002
República de California
2×3²×647 Posts |
Quote:
2048K: 1.00
2304K: 1.13
2560K: 1.23
Now in absolute terms the code is still slower than Prime95 [I only started serious SSE2 coding last Fall, so I'm just a little behind the curve] - Mlucas@2304K runs no faster than Prime95@2560K - but the above numbers indicate that there's no reason in principle a radix-9 enhancement shouldn't provide a decent speedup for qualifying exponents. Of course coding it is not completely trivial, and it's not my job to tell George how to deploy his programming effort. |
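To put the three figures above in perspective, here is a small Python sketch. It assumes (the post does not say so explicitly) that the numbers are per-iteration run times relative to the 2048K FFT length; the names `relative_time` and `saving` are hypothetical.

```python
# Assumption: values are per-iteration times relative to the 2048K FFT length.
relative_time = {"2048K": 1.00, "2304K": 1.13, "2560K": 1.23}


def saving(smaller, larger):
    """Fraction of time saved by running at `smaller` instead of `larger`."""
    return 1.0 - relative_time[smaller] / relative_time[larger]


if __name__ == "__main__":
    # Exponents that fit in 2304K but would otherwise need 2560K.
    print(f"{saving('2304K', '2560K'):.1%}")
```

Under that assumption, an exponent that fits in 2304K instead of being forced up to 2560K would run roughly 8% faster per iteration.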
|
|
|
|
|
#37 |
|
May 2008
2107₈ Posts |
If 2304K FFT is implemented would it also allow for 4608K, etc.?
|
|
|
|
|
#38 |
|
∂²ω=0
Sep 2002
República de California
26576₈ Posts |
Yes - once the radix-9-based front-end routine[s] are in place, any FFT length of the form 9*2^n can be handled. Of course one also needs a suitable set of power-of-2-radix routines - I have those for radix-8,16,32 but no larger, so 4608K would require a combination of radices such as 36,16,16,16,16. [With my code, larger front-end radices tend to be preferable because they lead to smaller dataset chunks and thus less spillover out of the L1 and L2 caches, and will also allow for better parallelization, once I get around to debugging and tuning the multithreaded implementation.] Still have some front-end radices for other intermediate FFT lengths [e.g. (3,7,11,13,15)*2^n] to code up first, though.
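The radix arithmetic above can be checked with a short Python sketch. It assumes that the product of the radices gives the complex FFT length and that the real-data length is twice that, which is an assumption made here to reconcile 36,16,16,16,16 with 4608K and is not stated in the post. The helper names are hypothetical.

```python
from math import prod

K = 1024


def real_length(radices):
    """Real-data FFT length, ASSUMING it is twice the product of the radices."""
    return 2 * prod(radices)


def is_nine_times_power_of_two(n):
    """True if n == 9 * 2**m for some integer m >= 0."""
    if n <= 0 or n % 9 != 0:
        return False
    m = n // 9
    return m & (m - 1) == 0        # exactly one bit set means a power of two


if __name__ == "__main__":
    n = real_length((36, 16, 16, 16, 16))
    print(n // K, is_nine_times_power_of_two(n))
```

With that assumption the radix set 36,16,16,16,16 comes out at 4608K = 9*2^19, and 2304K = 9*2^18 qualifies as well, while 2560K = 5*2^19 does not.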
|
|
|
|
|
#39 | |
|
Tribal Bullet
Oct 2004
110111010111₂ Posts |
Quote:
- buy graphics card (easy)
- get used to the SDK (harder)
- figure out how to get stock code to run on the card (harder). What would have to change in a command-line program to run on a GPU? What if there's no C library or console for output? What if double precision requires special contortions? Porting to a coprocessor is hard, the odds are overwhelming that a lot of little things will have to change
- figure out whether getting the floating point performance of a 100MHz pentium in your high-end card will convince people to spend $300 on a sufficiently modern graphics card of their own (easy: no) I don't know how many people have such cards already and are currently contributing to a project.
- do much more work to increase performance by 20x in order to justify all the work up until now (very hard)

Graphics cards have been programmable for years by now. If the porting process is straightforward, someone would have done it by now. By way of comparison, msieve was ported to the PowerPC processor in the PS3 many months ago (I was impressed how easy it was), but the performance is pretty disappointing because the real payoff involves optimizing the code for the Cell coprocessor engines.

Last fiddled with by jasonp on 2008-05-23 at 17:06 |
|
|
|
|
|
#40 |
|
Jun 2003
99₁₆ Posts |
Here is an interesting link about Intel's upcoming Larrabee graphics product: http://www.xbitlabs.com/news/video/d...onference.html
|
|
|
|
|
#41 | |
|
"Jason Goatcher"
Mar 2005
3×7×167 Posts |
Quote:
The slowdown on AMD hardware with his code is so pronounced that one might conclude that there's a George Woltman conspiracy going on. Suffice it to say, building new code specifically for AMD hardware probably isn't something anyone would want to undertake, so it's doubtful someone will show up with good AMD code. But if someone wanted to optimize graphics card code... Well, you've seen the speedup with Folding@Home. And graphics cards have WAY more throughput than CPUs. 10-20 graphics cards could run circles around everyone in GIMPS, including the teams. I don't have the skills to build a graphics card implementation, and the person I got my information from doesn't want to come forward, but GW is either lying (yes, I said it) or mistaken when he says making a graphics card implementation isn't worthwhile. He's either greedy for the prize or is sick of working on Prime95, in my opinion. |
|
|
|
|
|
#42 | |
|
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
2·47·101 Posts |
Quote:
Consider for a moment the opposite effect with the lattice sievers. I've just now tried to get them to run not 3 times slower (Q6600 vs. an Opteron). Achieved only a factor of 2.2 after multiple tunes and builds... This is something where Intel binaries suck. Opterons are excellent for these jobs. P.S. Generally it is not very polite to discuss things where one knows zilch, ok? Learn to read the assembly code first, then criticize. |
|
|
|
|
|
#43 | |
|
May 2008
3×5×73 Posts |
Quote:
|
|
|
|
|
|
#44 | |
|
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2·3³·109 Posts |
Quote:
|
|
|
|