Parallella / Epiphany
So I have a [URL="http://www.parallella.org/"]Parallella[/URL] board with a 1 GHz [URL="http://www.adapteva.com/wp-content/uploads/2012/12/epiphany_arch_reference_3.12.12.18.pdf"]16-core Epiphany[/URL] [PDF] chip on its way to me in a month or so. If you're not familiar with the chip, it's a 32-bit RISC architecture with a nanosecond-latency fabric between the cores.
Each core can do a single 32-bit integer operation (load/store, add/sub, shift, bitwise ops) per clock. Each core also has an IEEE 754 single-precision FPU capable of one addition, subtraction, [b]fused multiply-add, fused multiply-subtract[/b], fixed-to-float conversion, absolute value, or float-to-fixed conversion per clock (2 GFLOPS per core via FMA). It also has 64 registers usable with no restrictions on access, 32 KB of local memory per core, and 1 GB of shared memory for the whole board, all in one unified address space. And its power draw is only 5 watts for the whole board, so it's efficient power-wise (but maybe not cost-wise). I bought it mainly as a way to get more into lower-level programming and to learn some math. It does support OpenCL, but I'm thinking writing assembly could get more performance out of it. What would such an architecture be useful for with regards to GIMPS (or SoB)?
Being RISC rather than x86, it's of very limited use. You could theoretically use it for GIMPS, but not SoB. You should talk to Ernst (ewmayer) about optimizing Mlucas for it.
[QUOTE=rogue;353020]Being RISC rather than x86, very limited. You could theoretically use it on GIMPS, but not SoB. You should talk to Ernst (ewmayer) about optimizing mlucas for it.[/QUOTE]
Sorry, Mlucas requires 64-bit int and double-precision floating-point support. W.r.t. LL testing (rather than TF), the former is needed for the quad-float emulation used in the trig-table inits; the FFT could obviously be recoded to use single floats, but that's not worth my time. I wish I had time to custom-code for every interesting arch out there, but I have, alas, just one lifetime and need to choose my coding battles carefully. Anyone is welcome to take the source and try to convert the scalar-double build stuff (as opposed to the x86 SIMD) to use float rather than double, but aside from very, very limited "start here and look at..." advice they'd be on their own.
[QUOTE=ewmayer;353068]Sorry, Mlucas requires 64-bit int and double-precision floating-point support. W.r.t. LL testing (rather than TF), the former is needed for the quad-float emulation used in the trig-table inits; the FFT could obviously be recoded to use single floats, but that's not worth my time. I wish I had time to custom-code for every interesting arch out there, but I have, alas, just one lifetime and need to choose my coding battles carefully.[/quote]
Having just one lifetime was the saddest realization I ever had. So many things I'll never have time to do... [quote]Anyone is welcome to take the source and try to convert the scalar-double build stuff (as opposed to the x86 SIMD) to use float rather than double, but aside from very, very limited "start here and look at..." advice they'd be on their own.[/QUOTE] I may yet try it. When the chip arrives I'll look into it in more detail.
[QUOTE=Mark Rose;353018]So I have a [URL="http://www.parallella.org/"]Parallella[/URL] board with a 1 GHz [URL="http://www.adapteva.com/wp-content/uploads/2012/12/epiphany_arch_reference_3.12.12.18.pdf"]16-core Epiphany[/URL] [PDF] chip on its way to me in a month or so. If you're not familiar with the chip, it's a 32-bit RISC architecture with a nanosecond-latency fabric between the cores.
Each core can do a single 32-bit integer operation (load/store, add/sub, shift, bitwise ops) per clock. Each core also has an IEEE 754 single-precision FPU capable of one addition, subtraction, [b]fused multiply-add, fused multiply-subtract[/b], fixed-to-float conversion, absolute value, or float-to-fixed conversion per clock (2 GFLOPS per core via FMA). It also has 64 registers usable with no restrictions on access, 32 KB of local memory per core, and 1 GB of shared memory for the whole board, all in one unified address space. And its power draw is only 5 watts for the whole board, so it's efficient power-wise (but maybe not cost-wise). I bought it mainly as a way to get more into lower-level programming and to learn some math. It does support OpenCL, but I'm thinking writing assembly could get more performance out of it. What would such an architecture be useful for with regards to GIMPS (or SoB)?[/QUOTE]Just placed an order for the 4-board system with delivery expected in November. OK, so it was a wild impulse. Likely initial projects will be to get ECM implemented and/or to write an RNS arithmetic library with each core using its own modulus. 64 cores will allow just short of 1024-bit arithmetic to be implemented mostly in parallel.
[QUOTE=Mark Rose;353082]Having just one lifetime was the saddest realization I ever had. So many things I'll never have time to do...
I may yet try it. When the chip arrives I'll look into it in more detail.[/QUOTE] Having just recently completed the main phase [meaning "aside from ongoing optimization efforts"] of my big coding project begun earlier this year, porting all my Mlucas SIMD code to AVX, for my part I am now going to spend some weeks bringing the old scalar-double code under the same pthread parallelization umbrella. [I have ditched all the old experimental OpenMP pragmas - the interface is just way too opaque - and having all the pthread infrastructure in place for the SIMD builds makes it relatively trivial to handle the scalar code the same way.] Long story short, the official releases will still be doubles-based, but this should make it much easier for someone to try a port to architectures like the one under discussion here.
[QUOTE=ewmayer;353068]Sorry, Mlucas requires 64-bit int and floating-double support. W.r.to LL testing (rather than TF) the former is needed for the quad-float emulation used in trig-tables inits; the FFT could obviously be recoded to use single-floats, but not worth my time. I wish I had time to custom-code for every interesting arch out there, but have alas just one lifetime and need to choose my coding battles carefully.
[/QUOTE] Some computing history... Indeed. Back in the '80s, Sam Wagstaff and Jeff Smith at UGA wanted to produce a custom architecture to run CFRAC. It was known as the EPOC [Extended Precision Operand Computer]. Very shortly thereafter I got MPQS running on a VAX, and people came to realize that trying to implement algorithms on custom hardware was (in general) not worth the effort. The NSF made an informal decision to look at custom-architecture proposals VERY carefully. At the same time, the MicroVAX and Suns were becoming available. It was also realized that it was more practical, more cost-effective, and VASTLY more portable to "ride the technology curve" as low-cost distributed small computers became widespread. One can always squeeze more performance (sometimes a lot more) from custom hardware. The price is portability. History has shown that custom implementations that only run on special hardware are generally not worth the effort.
[QUOTE=R.D. Silverman;353646]History has shown that custom implementations that only run on special hardware are generally not worth the effort.[/QUOTE]Even TWINKLE?
[QUOTE=only_human;353654]Even TWINKLE?[/QUOTE]
Never implemented. It would not have been funded by the NSF. It was of theoretical interest only.
A group in Japan did implement an NFS line sieve in dedicated hardware several years ago. SHARCS has some nice papers on hardware architectures for sieving too.
GPUs are technically special-purpose, but the economies of scale in the gaming market make them dramatically more common. Now BOINC participants think you're a moron if your computation doesn't run on their cards.
[QUOTE=jasonp;353665]...a moron if your computation doesn't run on their cards.[/QUOTE]
Do they?! /gasp/
Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.