mersenneforum.org DSP hardware for number crunching?

2011-07-25, 22:30   #1
ixfd64
Bemusing Prompter

"Danny"
Dec 2002
California

4516₈ Posts

DSP hardware for number crunching?

Can a Cisco TelePresence codec be used to find Mersenne primes? The hardware in question uses an array of 32 Blackfin ADSP-BF561 processors. I know that DSP applications make heavy use of the FFT, which GIMPS also uses. So my question is: can one use this hardware for number crunching? Just curious.
2011-07-26, 02:38   #2
Christenson

Dec 2010
Monticello

5·359 Posts

*ANY* computer with sufficient memory can crunch for GIMPS... the question is whether it has enough performance to make it interesting. "P90 years forever!" (from P95, no less!)

Sure looks like it has enough processing horsepower to be interesting... how much memory does the codec have as a whole? I assume this thing has an Ethernet interface for programming and I/O.

The main obstacle I see is that you'd have to port something over yourself, as I don't see P95 or anyone else here thinking there will be enough of these on GIMPS duty to justify retargeting mprime or any other code. mfaktc or CUDALucas might be easiest to port, as the P95 core is full of assembler optimisations. With 32 processors, it might also be able to do matrix reduction for factoring/sieving jobs. If the price were right, I could see it being pressed into that use.

Now, at what cost might I get one? Can I get a developer's kit, or is it going to be like working on a PS3?
2011-07-26, 03:27   #3
bsquared

"Ben"
Feb 2007

3438₁₀ Posts

That device has 2 Blackfin cores; each Blackfin core has two 16-bit multiplier/accumulators (MACs) operating at 600 MHz, for a total of 2400 MMAC/s (million multiply-accumulates per second). While the instruction throughput approaches that of a modern processor (assuming it could be fully utilized), I'm doubtful that a 16-bit MAC would be useful for GIMPS, whose LL tests rely on double-precision floating-point FFTs. The processor is obviously not optimized for this sort of thing.
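A quick check of the peak figure above, as a minimal Python sketch. It simply multiplies cores × MAC units per core × clock rate and assumes every MAC issues once per cycle with no stalls, which real code will not achieve. The 32-chip total is an illustrative assumption that the "array of 32" in post #1 means 32 BF561 chips; note these are 16-bit integer MACs, not floating-point operations.

```python
def peak_macs_per_second(cores: int, macs_per_core: int, clock_hz: float) -> float:
    """Idealized peak multiply-accumulates per second: every MAC unit
    completes one operation per clock cycle, with no stalls."""
    return cores * macs_per_core * clock_hz

# Figures quoted above for one ADSP-BF561: 2 cores, 2 MACs per core, 600 MHz.
bf561_peak = peak_macs_per_second(cores=2, macs_per_core=2, clock_hz=600e6)

# Assumption: the codec from post #1 holds 32 of these chips.
codec_peak = 32 * bf561_peak

print(f"one BF561: {bf561_peak / 1e6:.0f} MMAC/s (16-bit)")
print(f"32 chips:  {codec_peak / 1e6:.0f} MMAC/s (16-bit)")
```

Even the 32-chip total is a count of 16-bit integer multiply-accumulates, so it cannot be compared directly with a CPU's double-precision FLOPS figure.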

The memory architecture is also lacking for this application (or for any factoring step, sadly). 32 KB of L1 SRAM cache is good, 128 KB of L2 SRAM is not enough, and a PC133 bus to main memory is downright ugly.
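To put the cache sizes above in perspective, here is a rough Python estimate of how large a single FFT buffer gets for a GIMPS-sized number. It assumes a hypothetical exponent of 50 million and roughly 18 bits of the number packed into each 8-byte double; both are illustrative round figures, not Prime95 parameters, and real programs round the FFT length up to one of their supported sizes.

```python
import math

L1_BYTES = 32 * 1024    # L1 SRAM size quoted above
L2_BYTES = 128 * 1024   # L2 SRAM size quoted above
BYTES_PER_DOUBLE = 8

def fft_buffer_bytes(exponent: int, bits_per_word: float = 18.0) -> int:
    """Rough size of one double-precision FFT buffer for a residue modulo
    2**exponent - 1. ASSUMPTION: about `bits_per_word` bits per double;
    real programs pick the FFT length from a table of supported sizes."""
    fft_length = math.ceil(exponent / bits_per_word)
    return fft_length * BYTES_PER_DOUBLE

buf = fft_buffer_bytes(50_000_000)  # hypothetical exponent
print(f"FFT buffer ~{buf / 2**20:.1f} MiB")
print(f"~{buf / L2_BYTES:.0f}x the L2 SRAM, ~{buf / L1_BYTES:.0f}x the L1 SRAM")
```

Under these assumptions one buffer is on the order of 20 MiB, far beyond 128 KB of L2, so every squaring would have to stream most of its data over the external memory bus.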

Of course there are those who will say that a cycle is a cycle. The datasheet says
Quote:
 The architecture has been optimized for use in conjunction with the VisualDSP C/C++ compiler
... so knock yourself out!

2011-07-26, 03:36   #4
ixfd64
Bemusing Prompter

"Danny"
Dec 2002
California

2·3·397 Posts

I guess it would be safe to assume that programs like mprime, Glucas, etc. won't run on this thing?
2011-07-26, 03:55   #5
bsquared

"Ben"
Feb 2007

2×3²×191 Posts

Quote:
 Originally Posted by ixfd64 I guess it would be safe to assume that programs like Mprime, glucas, etc. won't run on this thing?
I don't know for sure, but it's very doubtful since the device does not have an x86 architecture.

2011-07-26, 04:16   #6
bsquared

"Ben"
Feb 2007

3438₁₀ Posts

Quote:
 Originally Posted by Christenson Now, at what cost might I get one? Can I get a developer's kit, or is it going to be like working on a PS/3?
Eval kit.

2011-07-26, 21:06   #7
Christenson

Dec 2010
Monticello

5·359 Posts

Only 16-bit MACs? That's gonna make it tough on the programmers here, and I hadn't noticed that the bus to memory is SLOW by modern standards. It can be done, but it will be a major hack... you got a source for a few thousand of these, cheap?
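As a rough illustration of why 16-bit MACs make things tough, here is a Python sketch of the schoolbook way to build an unsigned 32×32 → 64-bit product out of four 16×16 partial products, the same count of 4 that Christenson uses in post #9. `mul16` is a hypothetical stand-in for one hardware 16-bit multiply, not a Blackfin intrinsic. Python integers never overflow, so the carry handling that real 32-bit registers would need when adding the middle terms is hidden here.

```python
MASK16 = 0xFFFF

def mul16(a: int, b: int) -> int:
    """Hypothetical stand-in for one unsigned 16x16 -> 32-bit hardware multiply."""
    assert 0 <= a <= MASK16 and 0 <= b <= MASK16
    return a * b

def mul32_from_16(x: int, y: int) -> int:
    """Unsigned 32x32 -> 64-bit product from four 16-bit partial products.
    x and y must each fit in 32 bits."""
    x_hi, x_lo = x >> 16, x & MASK16
    y_hi, y_lo = y >> 16, y & MASK16
    lo = mul16(x_lo, y_lo)      # weight 2**0
    mid_a = mul16(x_hi, y_lo)   # weight 2**16
    mid_b = mul16(x_lo, y_hi)   # weight 2**16
    hi = mul16(x_hi, y_hi)      # weight 2**32
    # On real hardware, mid_a + mid_b can overflow 32 bits and needs a carry.
    return (hi << 32) + ((mid_a + mid_b) << 16) + lo

x, y = 0xDEADBEEF, 0xFFFFFFFF
assert mul32_from_16(x, y) == x * y
print(hex(mul32_from_16(x, y)))
```

The extra shifts, adds and carries around the four multiplies are the kind of bookkeeping that pushes the real cost above a factor of 4.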
2011-07-26, 21:17   #8
bsquared

"Ben"
Feb 2007

110101101110₂ Posts

Quote:
 Originally Posted by Christenson Only 16 bit MACs? Gonna make it tough on the programmers here, didn't see the SLOW (relatively) bus to modern memory. It can be done, but it will be a major hack....you got a source for a few thousand of these, cheap?
Me? No. There *is* a 1000-piece price point, but I imagine that a few thousand would be hard to come by under any definition of cheap. And of course you'd have to design and build boards for them, buy the rest of the BOM (bill of materials: memory, capacitors, connectors and so on), and have them assembled (BGA packages can't really be attached by hand; that takes specialized equipment, which you don't have).

By the way, don't take my posting of links as an indication that I'm an expert with these devices. I'm just browsing the website and datasheets, the same as the rest of you.

Last fiddled with by bsquared on 2011-07-26 at 21:19 Reason: add the bit about assembly

2011-07-27, 03:09   #9
Christenson

Dec 2010
Monticello

5×359 Posts

Bsquared... thanks. I was thinking of these CPUs assembled into teleconferencing machines that were scrapped, or something similar, for lack of a market, and thus available for much less than the original manufacturing cost, maybe $10 apiece... so I could have a small farm of them instead of my next GPU. I've got some Pentium systems like that around my house, and they are all idle...

I'm still wrestling with the mathematician, who says "let's see, it's a large enough computer, and reasonably fast, so it CAN do the job" (and leaves all the practical details as an exercise for the reader), and the engineer, who says "suppose the resources are committed to programming this beast; will the return be worth the effort? Will many machines run the code?" I think a reasonably skilled C coder could take CUDALucas and/or P95 and make it run on the machine, but I think there are probably better-leveraged projects for GIMPS.

Oh, and divide the performance by approximately 5 if 32-bit operations have to be synthesized from 16-bit operations. (4 is the cost of doing four 16-bit multiplies to make one 32-bit multiply; add a fudge factor for the cost of managing the extra instructions to make it work.) We rather badly need an energy-per-operation figure for this machine to do meaningful estimation.

2011-08-02, 11:22   #10
jasonp
Tribal Bullet

Oct 2004

2×29×61 Posts

There is no free lunch. Compared to a high-performance general-purpose CPU costing hundreds of dollars, a $10 DSP is worth every penny. If what you want is raw double-precision flops without regard to space constraints or power consumption, no embedded processor, or even an FPGA, is going to be an attractive option. It's not like 1/100 of an LL test for 1/100 of the dollars is valuable.
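Christenson's rule of thumb in post #9 (divide by roughly 5 when 32-bit multiplies are synthesized from 16-bit MACs) can be applied to the 2400 MMAC/s peak from post #3. This is a minimal Python sketch; the 1.25 overhead factor is simply chosen so that 4 × 1.25 = 5 matches his estimate, not a measured value.

```python
def derated_rate(peak_16bit_macs: float, partial_products: int = 4,
                 overhead: float = 1.25) -> float:
    """Effective 32-bit multiply rate if each one costs `partial_products`
    16-bit MACs plus an `overhead` factor for the extra bookkeeping.
    The default overhead is a guess that makes the divisor 4 * 1.25 = 5."""
    return peak_16bit_macs / (partial_products * overhead)

bf561_peak = 2 * 2 * 600e6   # 2400 MMAC/s per chip (post #3)
rate = derated_rate(bf561_peak)
print(f"~{rate / 1e6:.0f} million 32-bit multiplies/s per chip")
```

This counts integer multiplies only; it says nothing about double-precision FFT throughput, which is the point jasonp's reply addresses.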
2011-08-08, 04:41   #11
Christenson

Dec 2010
Monticello

1795₁₀ Posts

I would have thought that high-end GPUs offered the best available performance for under a grand or so, and that ultimate performance in a given technology would probably involve some kind of dedicated FFT hardware plus something similar to an FPGA to route the results. (That is, take the FPGA concept, but the fundamental units aren't gates; they are floating-point register dataflows.)

In my mind, the question was whether there were enough of these DSPs available to make the programming effort worthwhile. A completed LL test with a residue is a completed LL test. If it takes half a year and $0.10 worth of electricity instead of $0.75, it's possibly a worthwhile bargain, provided there are enough machines (hundreds) available at the right price.
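For readers unfamiliar with what "a completed LL test with a residue" produces, here is a minimal Python sketch of the Lucas-Lehmer test. It uses Python's built-in big-integer arithmetic where real GIMPS clients use FFT-based squaring, so it is only practical for small exponents. Reporting the low 64 bits of the final value as the residue is assumed here to mirror the GIMPS "res64" convention.

```python
def lucas_lehmer(p: int) -> tuple[bool, int]:
    """Lucas-Lehmer test of M_p = 2**p - 1 for an odd prime p.
    Returns (is_prime, res64), where res64 is the low 64 bits of the final
    value of the sequence s_0 = 4, s_{k+1} = s_k**2 - 2 (mod M_p)."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m   # GIMPS clients do this squaring with FFTs
    return s == 0, s & 0xFFFF_FFFF_FFFF_FFFF

for p in (3, 5, 7, 11, 13, 17, 19, 23, 29, 31):
    is_prime, res64 = lucas_lehmer(p)
    print(f"M{p}: {'prime' if is_prime else 'composite'}, res64 = {res64:016X}")
```

A prime result ends with a zero residue; a composite one leaves a nonzero res64 that another machine can reproduce as a double-check, which is why a completed test is useful no matter what hardware produced it.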


All times are UTC.
