mersenneforum.org  

2017-03-16, 05:35   #56
ldesnogu

Jan 2008
France

2×5²×11 Posts

Quote:
Originally Posted by fivemack
It has fused FMA support, but in the form Vd = Vd + Vm*Vn because there isn't space to pass four five-bit register names in a 32-bit opcode
To clarify, only the vector variant uses 3 registers. The scalar one has 4 registers.
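
To make the operand-count difference concrete, here is a minimal C model of the two forms. It is only a sketch: the `vreg2d` type and the function names are made up for illustration, and plain C arithmetic stands in for the hardware's fused (single-rounding) multiply-add.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one 128-bit SIMD register viewed as two doubles (.2D). */
typedef struct { double lane[2]; } vreg2d;

/* Vector form, modeled on FMLA Vd.2D, Vn.2D, Vm.2D.
   Only three registers are named, so Vd is both the addend and the
   destination: its old contents are overwritten.
   Note: real hardware rounds once; this plain C expression may round twice. */
static void fmla_2d(vreg2d *vd, const vreg2d *vn, const vreg2d *vm)
{
    assert(vd != NULL && vn != NULL && vm != NULL);
    for (int i = 0; i < 2; i++)
        vd->lane[i] = vd->lane[i] + vn->lane[i] * vm->lane[i];
}

/* Scalar form, modeled on FMADD Dd, Dn, Dm, Da.
   Four registers are named, so the addend Da is left untouched and the
   result goes to a separate destination (the return value here). */
static double fmadd_d(double dn, double dm, double da)
{
    return da + dn * dm;
}
```

With the vector form, code that still needs the old accumulator value has to copy it to another register first; with the scalar form it does not.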

2017-03-16, 22:03   #57
ewmayer
2ω=0

Sep 2002
República de California

11639₁₀ Posts

Quote:
Originally Posted by fivemack
AArch64 has 32 integer registers (but X31 reads as zero and throws away anything written to it, so basically that's 31 registers), and also 32 128-bit-wide "SIMD and floating-point" registers.

Code looks like

FADD V3.2D, V5.2D, V7.2D (which adds the doubles in V5[127:64] and V7[127:64] and puts the result in V3[127:64], and also adds the doubles in V5[63:0] and V7[63:0] and puts the result in V3[63:0])

or FADD S3, S7, S2 (which adds the bottom floats of V7 and V2, puts the result in the bottom float of V3, and sets the other three floats of V3 to zero)

It has fused FMA support, but in the form Vd = Vd + Vm*Vn because there isn't space to pass four five-bit register names in a 32-bit opcode (there is also an FMLS instruction that does Vd = Vd - Vm*Vn form).
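
As a rough illustration of the two FADD forms described in the quote, here is a small C model (not real NEON code). The `vreg` type and the helper names are made up for this sketch, and putting lane 0 in the lowest bytes of the buffer is an assumption of the model.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The model assumes IEEE-sized types: 4-byte float, 8-byte double. */
_Static_assert(sizeof(float) == 4 && sizeof(double) == 8, "unexpected FP sizes");

/* Hypothetical model of one 128-bit "SIMD and floating-point" register. */
typedef struct { unsigned char bytes[16]; } vreg;

/* Models FADD Vd.2D, Vn.2D, Vm.2D: both 64-bit lanes are added independently. */
static void fadd_2d(vreg *vd, const vreg *vn, const vreg *vm)
{
    double n[2], m[2], d[2];
    assert(vd != NULL && vn != NULL && vm != NULL);
    memcpy(n, vn->bytes, sizeof n);
    memcpy(m, vm->bytes, sizeof m);
    d[0] = n[0] + m[0];   /* bits [63:0]   */
    d[1] = n[1] + m[1];   /* bits [127:64] */
    memcpy(vd->bytes, d, sizeof d);
}

/* Models FADD Sd, Sn, Sm: only the bottom float is added;
   the other three float lanes of the destination are set to zero. */
static void fadd_s(vreg *vd, const vreg *vn, const vreg *vm)
{
    float n, m, d;
    assert(vd != NULL && vn != NULL && vm != NULL);
    memcpy(&n, vn->bytes, sizeof n);   /* read inputs first, so vd may alias them */
    memcpy(&m, vm->bytes, sizeof m);
    d = n + m;
    memset(vd->bytes, 0, sizeof vd->bytes);
    memcpy(vd->bytes, &d, sizeof d);
}
```

The zeroing in `fadd_s` is the part that is easy to miss when mixing scalar and vector code on the same register.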
Thanks - sounds like Wikipedia article has some bugs.

FMA3 is fine, as all my x86 vector-asm (starting with AVX2, obviously) is based on that. I will likely target a 16-vector-regs architecture for my initial implementation - that is more or less like my x86 AVX2/FMA3 code - question, will that appreciably broaden the base of ARM CPUs which can run said code?

Still at least a month until my first AVX-512 implementation is done, but then it will be time to get a suitable Odroid board and start coding!

2017-03-17, 01:56   #58
ldesnogu

Jan 2008
France

550₁₀ Posts

Quote:
Originally Posted by ewmayer
Thanks - sounds like Wikipedia article has some bugs.

FMA3 is fine, as all my x86 vector-asm (starting with AVX2, obviously) is based on that. I will likely target a 16-vector-regs architecture for my initial implementation - that is more or less like my x86 AVX2/FMA3 code - question, will that appreciably broaden the base of ARM CPUs which can run said code?

Still at least a month until my first AVX-512 implementation is done, but then it will be time to get a suitable Odroid board and start coding!
If you are targeting 64-bit, which you need in order to get DP SIMD, you can use all 32 registers, since that is the number mandated by the architecture.

2017-03-25, 01:26   #59
ewmayer
2ω=0

Sep 2002
República de California

103×113 Posts

The final stages of my initial AVX-512 port of Mlucas are proceeding more quickly than time-budgeted-for, so I'd like to go ahead and order an Odroid dev-board for 128-bit SIMD development under Linux. Any recommendations as to which of the options on offer I should choose? Also, what do I need in addition to the basic board such as PSU, cabling, WiFi? I only plan to use this for code development, so am fine simply using an Ethernet cable to connect and transfer data between my Macbook and the ARM system, but am open to "you really want the WiFi because..." pitches.

Also, the Odroid-C2 description notes

An additional MicroSD card or an eMMC module is required to install the OS.
We recommend the eMMC module as it has much higher performance than standard MicroSD cards.

So boot-from-OS-image-on-USB is not an option?

Lastly, are these boards strictly standalone - in which event I should probably invest in one of the protective housings - or can they be hosted in (say) an ATX-cased PC system?

Last fiddled with by ewmayer on 2017-03-25 at 01:28

2017-03-25, 04:27   #60
GP2

Sep 2003

5×11×47 Posts

Quote:
Originally Posted by ewmayer
Any recommendations as to which of the options on offer I should choose? Also, what do I need in addition to the basic board such as PSU, cabling, WiFi?
[...]
Lastly, are these boards strictly standalone - in which event I should probably invest in one of the protective housings - or can they be hosted in (say) an ATX-cased PC system?
I know next to nothing about hardware, but from a cursory look:

* The C2 option is the only one of the three that implements the 64/32-bit ARMv8-A architecture and AArch64 instruction set, and isn't that what you would be targeting? The other two only implement the 32-bit ARMv7-A architecture and AArch32 instruction set, and that would likely be pretty obsolete by the time (if and when) number crunching on ARM becomes practical or widespread.
* The FAQ (linked from the FAQs tab) mentions the peripherals that can be used. It doesn't seem to mention anything other than standalone, but mentions a discussion forum for questions not covered in the FAQ: http://forum.odroid.com

2017-03-25, 20:43   #61
VictordeHolland

"Victor de Hollander"
Aug 2011
the Netherlands

2³·3·7² Posts

My ODROID-U2 boots only from eMMC or microSD by design. It can NOT boot from USB.
It might be possible to 'hack' it to boot from the eMMC or SD and then continue loading the OS from USB, but that is unsupported and inadvisable.
eMMC is quite a bit faster than microSD, especially in I/O operations, but also more expensive (a small eMMC card costs almost as much as the board itself). So I use a class 10 microSD card that was otherwise gathering dust.

2017-03-26, 01:49   #62
ewmayer
2ω=0

Sep 2002
República de California

10110101110111₂ Posts

Will probably get a MicroSD, as (like Victor) I don't want the cost of the I/O device to double the system cost. So I would download the OS boot image from the Hardkernel site onto my Mac - as it happens I already have an IOGear USB-based MicroSD reader/writer, sans a MicroSD card (the IOGear was a found item which I figured would come in handy some day.)

So presumably I can get a MicroSD card wherever it's cheapest, and only need to get the C2 board and housing from Hardkernel - sounds like a plan.

Edit: order placed - I decided to just shell out the modest $8 for the MicroSD preflashed with Linux, no point in trying to save a few $ to get a cheaper-per-GB card elsewhere and end up spending an hour or more downloading the OS image and working thru the procedure to unzip and properly transfer it to the SD card myself. Here is what I ordered:

o ODROID-C2
Item# G145457216438 $46.00 USD

o 8GB MicroSD UHS-1 C2 Linux
Item# G145586100692 $8.00 USD

o 5V/2A Power Supply US Plug
Item# G143652633329 $5.00 USD

o ODROID-C2/C1+ Case Clear
Item# G143805171261 $4.50 USD

Plus $16 shipping, total $79.50. Vendor says up to 20 days lead time.

Last fiddled with by ewmayer on 2017-03-26 at 03:34

2017-04-20, 17:34   #63
ET_
Banned

"Luigi"
Aug 2002
Team Italia

3²×5×107 Posts

Just installed Raspberry 32 bit on my PI 3 - ARMv7 OS.

mlucas compiled like a charm, now I'm passing make check.

I hope tomorrow to install a new SD card with OpenSUSE 64bit.
I might have some heating issues.

Could you recommend a specific mlucas benchmark test that I can run on both OSes to show how much better 64-bit is than 32-bit?

Thank you.

And next step will be the Odroid Pico5 Cluster...

2017-04-20, 18:19   #64
ET_
Banned

"Luigi"
Aug 2002
Team Italia

1001011001111₂ Posts

Quote:
Originally Posted by ET_
Just installed Raspberry 32 bit on my PI 3 - ARMv7 OS.

mlucas compiled like a charm, now I'm passing make check.

I hope tomorrow to install a new SD card with OpenSUSE 64bit.
I might have some heating issues.

Would you recommend me a specific mlucas benchmark test I can run on both OSes to show how 64 bits is better than 32 bits?

Thank you.

And next step will be the Odroid Pico5 Cluster...
Make check didn't pass (the log file is attached).

I suppose it's because the Pi has only 1 GB of RAM and I was running the desktop GUI. Can I safely "make install" the program?
Attached Files
File Type: zip self_test.log.zip (4.1 KB, 76 views)

Last fiddled with by ET_ on 2017-04-20 at 18:20
ET_ is offline   Reply With Quote
Old 2017-04-21, 07:10   #65
ewmayer
2ω=0
 
ewmayer's Avatar
 
Sep 2002
República de California

103·113 Posts
Default

Quote:
Originally Posted by ET_ View Post
Make check didn't pass (the log file is attached).

I suppose it's because the PI has only 1 GB of free RAM and I was running the GUI on the desktop. can I safely "make install" the program?
Sure, go ahead and try 'make install'. If you're gonna do any LL-testing or DCing, though, you'll want to finish the self-tests that were interrupted by the out-of-memory-ness. (1 GB should be more than enough, BTW, but if your OS is not doing a decent job of recovering memory as each self-test sub-task completes and frees it, you get the sort of error you saw.) Did the installer script's self-test create a (partial) mlucas.cfg file somewhere in the install directory tree?

If you have a partial .cfg file and want to add the entries missed by the aborted self-test, you have to run each missing FFT length manually. For example, your self-test barfed in the middle of the 2048K FFT length test-all-radices step. To rerun that length (substitute whatever your binary is named, probably lowercase 'mlucas'), in the same dir as the partial mlucas.cfg is located (or copy the latter to a run directory of your own choosing and do things there):

Mlucas -fftlen 2048 -iters 100

Then do same for lengths 2304,2560,2816,3072,3328,3584,3840 and 4096.
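
To avoid retyping the command for each length, something like the following C sketch could generate the command lines. The lengths listed above run from 2304K to 4096K in 256K steps, which is what the loop reproduces. The helper names and the `./mlucas` binary path used in the usage note are assumptions; only the `-fftlen` and `-iters` flags come from the command above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes the self-test command for one FFT length
   (given in K) into buf. Returns the number of characters written,
   or -1 if buf is too small. */
static int mlucas_cmd(char *buf, size_t size, const char *binary, int fftlen_k)
{
    int n;
    assert(buf != NULL && binary != NULL);
    n = snprintf(buf, size, "%s -fftlen %d -iters 100", binary, fftlen_k);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Prints one command per remaining length: 2304K through 4096K in 256K steps.
   Returns how many commands were printed. */
static int print_remaining(const char *binary)
{
    char cmd[256];
    int count = 0;
    for (int k = 2304; k <= 4096; k += 256) {
        if (mlucas_cmd(cmd, sizeof cmd, binary, k) > 0) {
            puts(cmd);
            count++;
        }
    }
    return count;
}
```

Calling `print_remaining("./mlucas")` from a small `main` prints eight lines that can be pasted into a shell, or piped to `sh`, in the directory holding the partial mlucas.cfg.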

---------------------------

Aside: My Odroid arrived last week. I've opened the box to have a cursory glance, but still have some AVX-512 coding work to finish, and don't want to get distracted from that. Sorry about the delay.

2017-04-21, 14:30   #66
ET_
Banned

"Luigi"
Aug 2002
Team Italia

11317₈ Posts

Quote:
Originally Posted by ewmayer
Sure, go ahead and try 'make install'. If you're gonna do any LL-testing or DCing, though, you'll want to finish the self-tests that were interrupted by the out-of-memory-ness. (1 GB should be more than enough, BTW, but if your OS is not doing a decent job of recovering memory as each self-test sub-task completes and frees it, you get the sort of error you saw.) Did the installer script's self-test create a (partial) mlucas.cfg file somewhere in the install directory tree?

If you have a partial .cfg file and want to add the entries missed by the aborted self-test, you have to run each missing FFT length manually. For example, your self-test barfed in middle of the 2048K FFT length test-all-radices step. To rerun that length (substitute whatever your binary is named, probably lowercase 'mlucas'), in the same dir as the partial mlucas.cfg is located (or copy the latter to a run directory of your own choosing and do things there):

Mlucas -fftlen 2048 -iters 100

Then do same for lengths 2304,2560,2816,3072,3328,3584,3840 and 4096.

---------------------------

Aside: My Odroid arrived last week. I've opened the box to have a cursory glance, but still have some AVX-512 coding work to finish, and don't want to get distracted from that. Sorry about the delay.
Don't be sorry... I received my cluster 45 days ago, and it's still inside its box.

Testing single FFT lengths worked fine, but I hit the same allocation error while running the -s tiny test.
I'm not worried, as I'm just testing the environment before upgrading to the 64-bit Odroid C2. I also found some inconsistencies in the help text printed by the -h switch, but again that's not important.

The GMP v6.1.2 compiled and worked immediately on 32 bit ARMv7.
Attached Files
File Type: zip test.cfg.zip (670 Bytes, 69 views)

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.