#1
Bemusing Prompter
"Danny"
Dec 2002
California
2³·313 Posts
Can a Cisco TelePresence codec be used to find Mersenne primes?
The hardware in question uses an array of 32 Blackfin ADSP-BF561 processors. I know that DSP applications make heavy use of the FFT (fast Fourier transform), which is also used in GIMPS. So my question is: can one use this hardware for number crunching? Just curious.
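For readers wondering how an FFT relates to big-number arithmetic: multiplying two large integers is a convolution of their digit sequences, and an FFT turns that convolution into a pointwise product. The Python sketch below shows the idea with a plain recursive radix-2 FFT and base-10 digits. It is only an illustration, assuming small inputs where double-precision rounding stays exact; real GIMPS clients use a much more elaborate weighted transform (the irrational-base discrete weighted transform) to square numbers modulo 2^p − 1, and the function names here are hypothetical.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.

    With invert=True this computes the unscaled inverse transform
    (the caller divides by len(a)).
    """
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)  # twiddle factor
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def to_digits(v, base):
    """Little-endian digits of a non-negative integer v."""
    ds = []
    while v:
        v, d = divmod(v, base)
        ds.append(d)
    return ds or [0]


def fft_multiply(x, y, base=10):
    """Multiply non-negative integers by FFT convolution of their digits."""
    dx, dy = to_digits(x, base), to_digits(y, base)
    n = 1
    while n < len(dx) + len(dy):  # zero-pad so cyclic == linear convolution
        n *= 2
    fx = fft([complex(d) for d in dx] + [0j] * (n - len(dx)))
    fy = fft([complex(d) for d in dy] + [0j] * (n - len(dy)))
    # A pointwise product of transforms is a convolution of the digit sequences.
    conv = fft([p * q for p, q in zip(fx, fy)], invert=True)
    result, carry = 0, 0
    for i, c in enumerate(conv):
        total = round(c.real / n) + carry  # undo 1/n scaling, round off FP error
        carry, digit = divmod(total, base)
        result += digit * base ** i
    return result + carry * base ** n
```

The rounding step only recovers exact digits while the accumulated floating-point error stays well below 0.5, which is why these programs lean on fast double-precision hardware.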

#2
Dec 2010
Monticello
5·359 Posts
*ANY* computer with sufficient memory can crunch for GIMPS... the question is whether it has enough performance to make it interesting. "P90 years forever!" (from P95, no less!)
Sure looks like it has enough processing horses to be interesting... how much memory does the codec have as a whole? I assume this thing has an Ethernet interface for programming and I/O. The main obstacle I see is that you'd have to port something over yourself, as I don't see P95 or anyone else here thinking there will be enough of these on GIMPS duty to justify re-targeting mprime or any other code. mfaktc or CUDALucas might be the easiest to port, since the P95 core is full of assembler optimisations. With 32 processors, it might also be able to do the matrix reduction for factoring/sieving jobs. If the price were right, I could see it being pressed into that use. Now, at what cost might I get one? Can I get a developer's kit, or is it going to be like working on a PS3?
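mfaktc does trial factoring rather than LL testing, and its core computation is simple enough to show why it might be easier to port than hand-tuned assembler. A minimal Python sketch, assuming a prime exponent p: every factor of M(p) = 2^p − 1 has the form q = 2kp + 1 and satisfies q ≡ ±1 (mod 8), and q divides M(p) exactly when 2^p ≡ 1 (mod q). The name `trial_factor` and the `max_k` bound are hypothetical; mfaktc itself also sieves candidates by small primes and runs the modular exponentiations on the GPU.

```python
def trial_factor(p, max_k):
    """Look for a factor q = 2*k*p + 1 of M(p) = 2**p - 1, 1 <= k <= max_k.

    Assumes p is prime. Returns the first (smallest) q that divides M(p),
    or None if no candidate in range does.
    """
    for k in range(1, max_k + 1):
        q = 2 * k * p + 1
        if q % 8 not in (1, 7):
            continue  # such q can never divide 2**p - 1
        # q divides 2**p - 1 exactly when 2**p == 1 (mod q);
        # three-argument pow does the modular exponentiation.
        if pow(2, p, q) == 1:
            return q
    return None
```

For example, `trial_factor(11, 10)` returns 23, since 2047 = 23 × 89.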

#3
"Ben"
Feb 2007
3×5×251 Posts
Each ADSP-BF561 has 2 Blackfin cores, and each core has two 16-bit multiplier/accumulators (MACs) operating at 600 MHz, for a total of 2400 MMAC/s (million multiply/accumulates per second). While the instruction throughput approaches that of a modern processor (assuming it could be fully utilized), I'm doubtful that a 16-bit MAC would be useful for GIMPS. The processor is obviously not optimized for this sort of thing.
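The 2400 MMAC/s figure is simply cores × MAC units per core × clock rate, assuming (as the post does) one 16-bit multiply/accumulate per MAC unit per cycle. If the codec really contains 32 BF561 chips, as post #1 says, the same arithmetic scales linearly, though only for 16-bit operations:

```latex
\begin{aligned}
2\ \text{cores} \times 2\ \tfrac{\text{MACs}}{\text{core}} \times 600\ \text{MHz} &= 2400\ \text{MMAC/s per BF561} \\
32\ \text{chips} \times 2400\ \text{MMAC/s} &= 76\,800\ \text{MMAC/s for the whole codec}
\end{aligned}
```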
The memory architecture is also lacking for this application (or, sadly, for any factoring step). 32k of L1 SRAM cache is good, 128k of L2 SRAM cache is not, and a PC133 bus to main memory is downright ugly. Of course, there are those who will say that a cycle is a cycle. The datasheet says:

#4
Bemusing Prompter
"Danny"
Dec 2002
California
2³·313 Posts
I guess it would be safe to assume that programs like mprime, Glucas, etc. won't run on this thing?

#5
"Ben"
Feb 2007
3×5×251 Posts

#7
Dec 2010
Monticello
5×359 Posts
Only 16-bit MACs? That's gonna make it tough on the programmers here, and I didn't see the (relatively) SLOW bus to main memory.
It can be done, but it will be a major hack... got a source for a few thousand of these, cheap?

#8
"Ben"
Feb 2007
3·5·251 Posts
By the way, don't take my posting of links as an indication that I'm an expert with these devices. I'm just browsing the website and datasheets, the same as the rest of you.
Last fiddled with by bsquared on 2011-07-26 at 21:19. Reason: add the bit about assembly

#9
Dec 2010
Monticello
3403₈ Posts
Bsquared... thanks. I was thinking of these CPUs already assembled into teleconferencing machines that got scrapped (or something) for lack of a market, and were thus available for much less than the original manufacturing cost, maybe $10 apiece... so I could have a small farm of them instead of my next GPU. I've got some Pentium systems like that around my house, and they are all idle...
I'm still wrestling with the mathematician, who says, let's see, it's a large enough computer, and reasonably fast, so it CAN do the job (and leaves all the practical details as an exercise to the reader), and the engineer, who says, suppose the resources are committed to programming this beast: will the return be worth the effort? Will many machines run the code? I think a reasonably skilled C coder could take CUDALucas and/or P95 and make it run on the machine, but I think there are probably better-leveraged projects for GIMPS.
Oh, and divide the performance by approximately 5 if 32-bit operations have to be synthesized from 16-bit operations. (4 is the cost of doing four 16-bit multiplies to make one 32-bit multiply; add a fudge factor for the cost of managing the extra instructions to make it work.) We rather badly need an energy-per-operation cost for this machine to do any meaningful estimation.
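The "4 multiplies" figure comes from splitting each 32-bit operand into 16-bit halves: with x = x_h·2^16 + x_l and y = y_h·2^16 + y_l, the product is x_h·y_h·2^32 + (x_h·y_l + x_l·y_h)·2^16 + x_l·y_l, i.e. four 16×16 partial products plus shifts and adds. A minimal Python sketch of that schoolbook decomposition follows; `mul16` is a hypothetical stand-in for a single unsigned 16×16 → 32-bit hardware multiply and does not model real Blackfin instructions, accumulators, or signed arithmetic.

```python
MASK16 = 0xFFFF


def mul16(a, b):
    """Hypothetical model of one unsigned 16x16 -> 32-bit hardware multiply."""
    assert 0 <= a <= MASK16 and 0 <= b <= MASK16
    return a * b


def mul32_from_16(x, y):
    """Unsigned 32x32 -> 64-bit product built from four 16x16 multiplies.

    x*y = xh*yh*2**32 + (xh*yl + xl*yh)*2**16 + xl*yl
    """
    assert 0 <= x < 1 << 32 and 0 <= y < 1 << 32
    xh, xl = x >> 16, x & MASK16
    yh, yl = y >> 16, y & MASK16
    hh = mul16(xh, yh)  # weight 2**32
    hl = mul16(xh, yl)  # weight 2**16
    lh = mul16(xl, yh)  # weight 2**16
    ll = mul16(xl, yl)  # weight 1
    # On real hardware, these shifts, adds and carries are the extra
    # instructions that the "fudge factor" has to pay for.
    return (hh << 32) + ((hl + lh) << 16) + ll
```

Note that the two middle partial products can together exceed 32 bits, so hardware code also has to propagate a carry into the high word, which Python's unbounded integers hide here.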

#10
Tribal Bullet
Oct 2004
5·23·31 Posts
There is no free lunch. Compared to a high-performance general-purpose CPU costing hundreds of dollars, a $10 DSP is worth every penny.
If what you want is raw double-precision flops without regard to space constraints or power consumption, no embedded processor, or even an FPGA, is going to be an attractive option. It's not as if 1/100 of an LL test for 1/100 of the dollars is valuable.

#11
Dec 2010
Monticello
703₁₆ Posts
I would have thought that high-end GPUs offered the best available performance for under a grand or so, and that the ultimate performance in a given technology would probably involve some kind of dedicated FFT hardware plus something similar to an FPGA to route the results. (That is, take the FPGA concept, but make the fundamental units floating-point register dataflows instead of gates.)
In my mind, the question was whether enough of these DSPs were available to make the programming effort worthwhile. A completed LL test with a residue is a completed LL test. If it takes half a year and $0.10 worth of electricity instead of $0.75, it's possibly a worthwhile bargain, provided enough machines (hundreds) are available at the right price.
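For reference, the "completed LL test with a residue" is the output of the Lucas-Lehmer recurrence: starting from S = 4, repeat S ← S² − 2 mod M(p) for p − 2 steps; M(p) = 2^p − 1 (p an odd prime) is prime exactly when the final S is 0. A minimal Python sketch using Python's built-in big integers, where real clients would use the large FFT squarings discussed in this thread; the function name is hypothetical, and returning the low 64 bits mirrors the 64-bit residue that GIMPS clients report.

```python
def lucas_lehmer(p):
    """Lucas-Lehmer test of M(p) = 2**p - 1, assuming p is an odd prime.

    Returns (is_prime, res64): is_prime is True exactly when the final
    residue S(p-2) mod M(p) is zero; res64 is its low 64 bits.
    """
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        # This modular squaring is the step GIMPS programs do with big FFTs.
        s = (s * s - 2) % m
    return s == 0, s & 0xFFFFFFFFFFFFFFFF
```

Every one of the p − 2 iterations depends on the previous one, so a slow machine cannot skip ahead; it simply takes longer to reach the same residue.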