mersenneforum.org

Parallella / Epiphany (https://www.mersenneforum.org/showthread.php?t=18589)

firejuggler 2013-09-21 01:34

GPUs are special-purpose hardware, but they are widespread.

ewmayer 2013-09-21 02:07

[QUOTE=firejuggler;353669]GPUs are special-purpose hardware, but they are widespread.[/QUOTE]

Similar to x86 - where I indeed have spent much time on custom coding these past 7-8 years. OTOH, the folks in the SPARC hardware group at Oracle inform me that the recent SPARC CPUs - like x86 - support SIMD, but there the bang-for-my-buck math alas doesn't justify the custom coding effort.

Actually, another reason I decided to do the parallelization of the scalar-double code as my next project is by way of preparation for a port to GPU. There the major "how best to spend one's time?" issues relate to coding APIs - e.g. CUDA vs OpenCL - and whether to focus on nVidia offerings or try to be more general.

My first toy GPU-coding trials used cuda/nVidia, but now it seems an open, non-single-vendor-tied standard like OpenCL is the way to go.

henryzz 2013-09-21 13:10

[QUOTE=ewmayer;353672]My first toy GPU-coding trials used cuda/nVidia, but now it seems an open, non-single-vendor-tied standard like OpenCL is the way to go.[/QUOTE]
I am not sure that people have been very successful at running OpenCL code like mfakto on nVidia cards, at least not at a good speed. It would probably be worth checking that this problem has been solved before using OpenCL exclusively. nVidia cards are still more common on this forum, I think.

sanaris 2013-09-21 23:56

It won't work, and not just because it is special-purpose hardware.
I tested Prime95 on Intel Atoms - its performance is 400 ms per iteration - it is just not worth looking into.

If you pay for a 10-fold decrease in power with a 100-fold decrease in performance, it just doesn't pay for itself. Any chip drawing less than ~80 watts is just not worth looking into (unless, of course, it is an ASIC or a big FPGA costing from $10k, like those in the MDGRAPE project, for example).

Second, every fabless attempt at production is useless. Plants have all the money in the world and will reject any deal under $10M. Only powerful nations can afford plants and production. It is a political question why the US is not powerful enough to achieve its computing targets. Nothing can be done in this area from the perspective of a developer who does not have $10M right in his pocket. Nvidia and AMD are now failing companies; they were relying on the "fabless fake dream".
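
The arithmetic behind that claim can be sketched in a few lines. This is only an illustration: the function name and its two parameters are made up here, and it assumes that energy per iteration is simply power draw times time per iteration.

```python
def relative_energy_per_iteration(power_reduction, slowdown):
    """Energy per iteration relative to the baseline chip.

    power_reduction: factor by which power draw drops (10 means one tenth).
    slowdown: factor by which each iteration takes longer (100 means 100x).
    Assumes energy = power * time, so the ratio is slowdown / power_reduction.
    """
    return slowdown / power_reduction


# The 10x-less-power, 100x-less-performance case from the post.
ratio = relative_energy_per_iteration(power_reduction=10, slowdown=100)
print(ratio)  # 10.0, i.e. ten times more energy spent per iteration
```

Under that assumption the low-power chip spends ten times the energy per iteration, on top of taking 100 times longer.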

jasonp 2013-09-22 00:09

[QUOTE=Batalov;353667]Do they?! /gasp/[/QUOTE]
I've seen several posts on BOINC message boards that say 'it is well known that GPUs are awesome. When will your project use them?'

Batalov 2013-09-22 01:06

I think I've seen similar posts, too, but I don't think they concluded that any particular author was a moron. ;-) They very well may think that the GPUs are awesome and the author is awesome but is busy with IRL stuff. Try George's reply on them:
[quote]At no point in time did I say I was going to implement this [([I]immediately[/I])]. I can guarantee it won't happen this decade*.[/quote]And then, when you have implemented it in a couple of years, they will be pleasantly surprised.

How does your schedule look?
______________
*keep in mind that there was ~a year left in [I]that[/I] decade, so it was a cleverly playful answer.

xilman 2013-10-25 08:10

[QUOTE=xilman;353554]Just placed an order for the 4-board system with delivery expected in November. OK, so it was a wild impulse.

Likely initial projects will be to get ECM implemented and/or an RNS arithmetic library with each core using its own modulus. 64 cores will allow just short of 1024-bit arithmetic to be implemented mostly in parallel.[/QUOTE]Delivery has slipped to December but the architecture manual and datasheet have just come out.
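
For readers unfamiliar with RNS (residue number system) arithmetic, here is a rough Python sketch of the one-modulus-per-core idea. The choice of moduli is an assumption, not something stated in the post: it takes the 64 largest primes below 2**16, which gives a dynamic range just under 2**1024 and keeps each residue product below 2**32. The helper names (`to_rns`, `rns_mul`, `from_rns`) are hypothetical, and the sketch runs sequentially rather than on 64 cores.

```python
from math import prod


def is_prime(n):
    """Trial division; adequate for the 16-bit candidates used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def largest_primes_below(limit, count):
    """Return the `count` largest primes strictly below `limit`."""
    primes = []
    n = limit - 1
    while len(primes) < count:
        if is_prime(n):
            primes.append(n)
        n -= 1
    return primes


# Assumption: one prime modulus per core, each just under 2**16.
MODULI = largest_primes_below(1 << 16, 64)
M = prod(MODULI)  # dynamic range: results are exact modulo M


def to_rns(x):
    """Split an integer into one small residue per modulus."""
    return [x % m for m in MODULI]


def rns_mul(a, b):
    """Multiply lane by lane; each lane is independent of the others."""
    return [(x * y) % m for x, y, m in zip(a, b, MODULI)]


def from_rns(residues):
    """Recombine residues with the Chinese Remainder Theorem."""
    total = 0
    for r, m in zip(residues, MODULI):
        partial = M // m
        total += r * partial * pow(partial, -1, m)  # modular inverse, Python 3.8+
    return total % M
```

The multiplication needs no communication between lanes, which is what makes it attractive for many small cores; the reconstruction step is the expensive part, and the result is only correct while the true product stays below `M`.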

The Epiphany co-processors are 32-bit floating point --- no double precision --- OR 32-bit integer arithmetic together with an independently usable limited integer ALU. The latter doesn't have hardware multiplication/division and the former shares opcodes between fp and [i]signed[/i] integer arithmetic. [i]There is no division operation in either mode[/i] but fused multiply-add and multiply-sub are available in both.
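
One textbook way to live without a divide instruction is Newton-Raphson iteration for the reciprocal, which needs only multiply-add style steps. The sketch below is in Python with ordinary double-precision floats, so it only illustrates the iteration; it is not Epiphany code, the function names are made up, and `math.frexp`/`math.ldexp` stand in for whatever exponent manipulation real single-precision code would use.

```python
import math


def reciprocal(d, iterations=4):
    """Approximate 1/d using only multiplies and adds in the loop."""
    if d == 0:
        raise ZeroDivisionError("reciprocal of zero")
    sign = -1.0 if d < 0 else 1.0
    # Scale so that abs(d) = m * 2**e with 0.5 <= m < 1.
    m, e = math.frexp(abs(d))
    # Classic linear first guess for 1/m on [0.5, 1); the two
    # constants would be precomputed, so no run-time division is needed.
    x = 48.0 / 17.0 - (32.0 / 17.0) * m
    for _ in range(iterations):
        r = 1.0 - m * x  # residual; one fused multiply-sub on FMA hardware
        x = x + x * r    # correction; one fused multiply-add
    # Undo the scaling: 1/abs(d) = (1/m) * 2**(-e).
    return sign * math.ldexp(x, -e)


def divide(a, b):
    """Division expressed as multiplication by a reciprocal."""
    return a * reciprocal(b)
```

Each pass roughly squares the relative error, so a handful of iterations is enough here; for 32-bit floats fewer would probably do, though the result may still differ from a correctly rounded division in the last bit.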

The coprocessors are optimised for DSP and should also be pretty good for symmetric crypto primitives (at first sight) but high performance multi-precision integer arithmetic might be challenging.

I'm looking forward to receiving the kit in a few weeks.

paulunderwood 2017-03-13 20:56

A bit of a necro-post...

Did you, Paul and Mark, get anything useful done with these boards?

How is the FPGA programmed?

What flavour of Linux are you using?

:bump2:

Mark Rose 2017-03-13 22:02

Nope, I did not.

I never did find a solution to the flaky ethernet, plus I never got around to getting quiet cooling on my kickstarter board. So it's basically sitting in a parts drawer.

GP2 2017-03-13 22:06

[QUOTE=paulunderwood;454819]How is the FPGA programmed?[/QUOTE]

I know absolutely nothing about any of this, but I see the keyword FPGA and I'm reminded that Amazon [URL="https://aws.amazon.com/ec2/instance-types/f1/"]AWS recently introduced FPGA instances[/URL] (still in preview), using 16 nm Xilinx UltraScale Plus FPGA.

Is there any hope of some radically faster implementations via this route?

Madpoo 2017-03-14 04:12

[QUOTE=GP2;454824]I know absolutely nothing about any of this, but I see the keyword FPGA and I'm reminded that Amazon [URL="https://aws.amazon.com/ec2/instance-types/f1/"]AWS recently introduced FPGA instances[/URL] (still in preview), using 16 nm Xilinx UltraScale Plus FPGA.

Is there any hope of some radically faster implementations via this route?[/QUOTE]

I suppose the question there is, what kind of magic operation would be awesome to have that isn't currently available anywhere, or what current op/ops can be made faster if only it had a dedicated configuration?

128-bit FP? Faster FMA? (just tossing out two examples that will probably get people laughing at me).

Then it comes down to how (or whether) that can be implemented in the FPGA in question. Are there existing libraries that offer that, or is there anyone with the experience to have a go at coding it?

It sure seems interesting, but I suspect the work involved in just programming the chip to work as expected and deliver the gains you want (plus customizing mprime to use it) would be a major effort. Could be worthwhile in the long run, especially if Intel and/or AMD make FPGAs part of their future dies.


All times are UTC.
