mersenneforum.org  

Old 2016-01-19, 20:09   #89
Xyzzy
 
Aug 2002

7·1,237 Posts

Skylake NUCs: http://www.intel.com/content/www/us/...tml#newestnucs

i5 versus i3: http://ark.intel.com/compare/91160,88180

Note DDR4 and 32GiB capacity!

Old 2016-02-06, 02:24   #90
masser
 
Jul 2003
Behind BB

2·7·11·13 Posts

http://blog.codinghorror.com/the-scooter-computer/
Old 2016-02-13, 15:40   #91
Xyzzy
 
Aug 2002

7·1,237 Posts

http://www.theinquirer.net/inquirer/...ot-starter-kit

Quote:
The announcement explained: "We focused on the Intel NUC for its relatively low cost point for a starter platform (around $150) and broad availability (you can even find them on Amazon).
Old 2016-02-13, 18:48   #92
VictordeHolland
 
"Victor de Hollander"
Aug 2011
the Netherlands

1179₁₀ Posts
Odroid U2

I bought a Hardkernel Odroid U2 some time ago to play media from a PC or external hard drive. But a firmware update to my Blu-ray player made the Odroid redundant, as the player can now play H.264 (x264) files directly from an external hard drive.

Hardkernel Odroid U2 Specs:
http://www.hardkernel.com/main/produ...=G135341370451
Samsung Exynos4412 Prime
Process node: 32nm HKMG
CPU: 4x ARM Cortex-A9 @1.7GHz, 1MB shared L2 cache
Mem: 2GB LPDDR2-880
GPU: ARM Mali-400MP4 @440MHz
LAN: 100 Mbit/s
OS: Ubuntu 14.04 or Android 4.4.4
5V 2A adapter (so it shouldn't consume more than 10W)

It has been collecting dust for some time now, but I noticed the Mlucas and ECM for ARM threads in the past few weeks.
http://mersenneforum.org/showthread.php?t=20846
http://mersenneforum.org/showthread.php?t=20614

So I might try to get those working and, if I get lucky, post some benchmark results here ;). From the schematics it looks like it has only a single VFPv3 unit for (fast) floating-point calculations, so it's probably going to be slow.
Old 2016-02-13, 18:59   #93
xilman
Bamboozled!
 
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across

10111000010110₂ Posts

Quote:
Originally Posted by VictordeHolland
I bought a Hardkernel Odroid U2 some time ago to play media from a PC or external hard drive. But a firmware update to my Blu-ray player made the Odroid redundant, as the player can now play H.264 (x264) files directly from an external hard drive.

Hardkernel Odroid U2 Specs:
http://www.hardkernel.com/main/produ...=G135341370451
Samsung Exynos4412 Prime
Process node: 32nm HKMG
CPU: 4x ARM Cortex-A9 @1.7GHz, 1MB shared L2 cache
Mem: 2GB LPDDR2-880
GPU: ARM Mali-400MP4 @440MHz
LAN: 100 Mbit/s
OS: Ubuntu 14.04 or Android 4.4.4
5V 2A adapter (so it shouldn't consume more than 10W)

It has been collecting dust for some time now, but I noticed the Mlucas and ECM for ARM threads in the past few weeks.
http://mersenneforum.org/showthread.php?t=20846
http://mersenneforum.org/showthread.php?t=20614

So I might try to get those working and, if I get lucky, post some benchmark results here ;). From the schematics it looks like it has only a single VFPv3 unit for (fast) floating-point calculations, so it's probably going to be slow.
It would make a fine ECMNET client. My Parallella systems did sterling work on the (forthcoming) HCN extension tables.


Paul
Old 2016-02-16, 21:29   #94
VictordeHolland
 
"Victor de Hollander"
Aug 2011
the Netherlands

3²×131 Posts

Thanks to the Debian package I got Mlucas working (sort of) on the Odroid-U2 with Ubuntu 14.04.
There were some minor problems running the self-test: some radices resulted in an ERR_ASSERT or "threadpool_init failed" error. This happened with the -s tiny (FFT 88K), small (FFT 768K) and m (FFT 3584K) tests.
But if I do:
Code:
mlucas -s s -fftlen 768
Then 768K and 3584K work without spitting out errors (see also the screenshots)

Here are some preliminary results (FFT size in K, followed by msec/iter, round-off errors, and radices):
Code:
       128  msec/iter =    9.73  ROE[avg,max] = [0.244838170, 0.312500000]  radices =  16 16 16 16  0  0  0  0  0  0
       144  msec/iter =   12.18  ROE[avg,max] = [0.233816964, 0.281250000]  radices =  36  8 16 16  0  0  0  0  0  0
       160  msec/iter =   13.03  ROE[avg,max] = [0.242075893, 0.312500000]  radices =  20 16 16 16  0  0  0  0  0  0
       176  msec/iter =   15.92  ROE[avg,max] = [0.282393973, 0.375000000]  radices =  44  8 16 16  0  0  0  0  0  0
       192  msec/iter =   14.96  ROE[avg,max] = [0.223990304, 0.250000000]  radices =  24 16 16 16  0  0  0  0  0  0
       208  msec/iter =   18.59  ROE[avg,max] = [0.264927455, 0.312500000]  radices =  52  8 16 16  0  0  0  0  0  0
       224  msec/iter =   18.92  ROE[avg,max] = [0.249218750, 0.312500000]  radices =  56  8 16 16  0  0  0  0  0  0
       240  msec/iter =   20.66  ROE[avg,max] = [0.236710031, 0.281250000]  radices =  60  8 16 16  0  0  0  0  0  0
       256  msec/iter =   21.03  ROE[avg,max] = [0.281250000, 0.281250000]  radices =  32 16 16 16  0  0  0  0  0  0
       288  msec/iter =   25.00  ROE[avg,max] = [0.229432896, 0.312500000]  radices =  36 16 16 16  0  0  0  0  0  0
       320  msec/iter =   27.60  ROE[avg,max] = [0.254017857, 0.312500000]  radices =  40 16 16 16  0  0  0  0  0  0
       352  msec/iter =   32.23  ROE[avg,max] = [0.290244838, 0.406250000]  radices =  44 16 16 16  0  0  0  0  0  0
       384  msec/iter =   32.07  ROE[avg,max] = [0.222670201, 0.281250000]  radices =  24 16 32 16  0  0  0  0  0  0
       416  msec/iter =   37.82  ROE[avg,max] = [0.250251116, 0.312500000]  radices =  52 16 16 16  0  0  0  0  0  0
       448  msec/iter =   39.09  ROE[avg,max] = [0.237262835, 0.281250000]  radices =  56 16 16 16  0  0  0  0  0  0
       480  msec/iter =   42.05  ROE[avg,max] = [0.230156599, 0.281250000]  radices =  60 16 16 16  0  0  0  0  0  0
       512  msec/iter =   43.43  ROE[avg,max] = [0.375000000, 0.375000000]  radices =  32 16 32 16  0  0  0  0  0  0
       576  msec/iter =   51.38  ROE[avg,max] = [0.230461775, 0.281250000]  radices = 144  8 16 16  0  0  0  0  0  0
       640  msec/iter =   57.27  ROE[avg,max] = [0.269419643, 0.312500000]  radices = 160  8 16 16  0  0  0  0  0  0
       704  msec/iter =   65.28  ROE[avg,max] = [0.296651786, 0.390625000]  radices = 176  8 16 16  0  0  0  0  0  0
       768  msec/iter =   70.51  ROE[avg,max] = [0.230664062, 0.312500000]  radices = 192  8 16 16  0  0  0  0  0  0
And the sizes more relevant to the current wavefront and to a comparison with 'real' desktops:
Code:
      1024  msec/iter =  121.70  ROE[avg,max] = [0.298214286, 0.312500000]  radices = 128 16 16 16  0  0  0  0  0  0
      1152  msec/iter =  142.69  ROE[avg,max] = [0.225310407, 0.250000000]  radices = 144 16 16 16  0  0  0  0  0  0
      1280  msec/iter =  161.44  ROE[avg,max] = [0.251618304, 0.312500000]  radices = 160 16 16 16  0  0  0  0  0  0
      1408  msec/iter =  185.52  ROE[avg,max] = [0.297056362, 0.375000000]  radices = 176 16 16 16  0  0  0  0  0  0
      1536  msec/iter =  195.56  ROE[avg,max] = [0.234742955, 0.312500000]  radices = 192 16 16 16  0  0  0  0  0  0
      1664  msec/iter =  208.36  ROE[avg,max] = [0.254631696, 0.312500000]  radices = 208 16 16 16  0  0  0  0  0  0
      1792  msec/iter =  222.32  ROE[avg,max] = [0.234012277, 0.250000000]  radices = 224 16 16 16  0  0  0  0  0  0
      1920  msec/iter =  243.65  ROE[avg,max] = [0.235016741, 0.281250000]  radices = 240 16 16 16  0  0  0  0  0  0
      2048  msec/iter =  255.25  ROE[avg,max] = [0.310714286, 0.312500000]  radices = 256 16 16 16  0  0  0  0  0  0
      2304  msec/iter =  297.26  ROE[avg,max] = [0.228341239, 0.281250000]  radices = 288 16 16 16  0  0  0  0  0  0
      2560  msec/iter =  339.70  ROE[avg,max] = [0.256682478, 0.312500000]  radices = 160 16 16 32  0  0  0  0  0  0
      2816  msec/iter =  384.56  ROE[avg,max] = [0.296219308, 0.375000000]  radices = 176 16 16 32  0  0  0  0  0  0
      3072  msec/iter =  413.85  ROE[avg,max] = [0.239704241, 0.281250000]  radices = 192 16 16 32  0  0  0  0  0  0
      3584  msec/iter =  370.28  ROE[avg,max] = [0.231487165, 0.281250000]  radices = 224 16 16 32  0  0  0  0  0  0    
      4096  msec/iter =  455.10  ROE[avg,max] = [0.282142857, 0.312500000]  radices = 128 16 32 32  0  0  0  0  0  0
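To see how the timings scale with FFT length, one option is to divide each msec/iter figure by its FFT length in K. The sketch below uses a handful of rows copied from the two tables above; the helper name `ms_per_k` is only illustrative and not part of Mlucas.

```python
# Selected (FFT length in K, msec/iter) pairs copied from the Mlucas self-test output above.
timings = [
    (128, 9.73),
    (256, 21.03),
    (512, 43.43),
    (768, 70.51),
    (1024, 121.70),
    (2048, 255.25),
    (4096, 455.10),
]


def ms_per_k(fft_k, msec):
    """Return msec per iteration per K of FFT length (hypothetical helper)."""
    return msec / fft_k


for fft_k, msec in timings:
    print(f"{fft_k:5d}K  {ms_per_k(fft_k, msec):.4f} ms/iter per K")
```

For these rows the per-K cost stays between roughly 0.08 and 0.09 ms up to 768K and between roughly 0.11 and 0.12 ms from 1024K up, so the scaling is close to linear within each table, with a step between them.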
Timings increase almost linearly with FFT size, so the Odroid-U2 doesn't seem to be bandwidth-limited. The Cortex-A9 cores are just not very powerful. I don't know if it is using the VFPv3 unit to accelerate floating-point calculations. How can I check whether it is being used? At startup, Mlucas prints:
Code:
Mlucas 14.1

http://hogranch.com/mayer/README.html

INFO: testing qfloat routines...
CPU Family = ARM Embedded ABI, OS = Linux, 32-bit Version, compiled with Gnu C [or other compatible], Version 4.8.2.
INFO: Using inline-macro form of MUL_LOHI64.
INFO: MLUCAS_PATH is set to ""
INFO: using 53-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation. 
INFO: testing IMUL routines...
INFO: System has 4 available processor cores.
INFO: testing FFT radix tables...
Mlucas command line options:
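On Linux, one way to see whether the CPU advertises VFPv3 (and NEON) is the Features line of /proc/cpuinfo. That only shows what the hardware and kernel expose, not what the Mlucas binary was compiled to use; for the latter, the build attributes printed by `readelf -A` on the binary are the usual place to look. A minimal sketch of the first check follows. The function name `arm_fp_features` and the sample text are illustrative assumptions, not output captured from the Odroid-U2.

```python
def arm_fp_features(cpuinfo_text):
    """Return the floating-point/SIMD flags found on the 'Features' line(s)."""
    wanted = {"vfp", "vfpv3", "vfpv3d16", "vfpv4", "neon"}
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            found |= wanted & set(value.split())
    return sorted(found)


# Made-up sample in the usual 32-bit ARM /proc/cpuinfo layout (not from the U2).
sample = "processor\t: 0\nFeatures\t: swp half thumb fastmult vfp edsp neon vfpv3 tls\n"
print(arm_fp_features(sample))

# On the board itself one would read the real file instead:
# with open("/proc/cpuinfo") as f:
#     print(arm_fp_features(f.read()))
```

If `vfpv3` is missing from the real file, hardware floating point of that generation is not available to any program, regardless of how it was built.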
For comparison purposes, here is a Core2Duo E7400 @2.8GHz with DDR2-800 (judging by the 2-thread timings, it is probably single-channel) running the latest mprime:
Code:
Intel(R) Core(TM)2 Duo CPU     E7400  @ 2.80GHz
CPU speed: 2800.89 MHz, 2 cores
CPU features: Prefetch, SSE, SSE2, SSE4
L1 cache size: 32 KB
L2 cache size: 3 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 256
Prime95 64-bit version 28.7, RdtscTiming=1
Best time for 1024K FFT length: 16.134 ms., avg: 16.250 ms.
Best time for 1280K FFT length: 20.692 ms., avg: 27.239 ms.
Best time for 1536K FFT length: 25.937 ms., avg: 26.121 ms.
Best time for 1792K FFT length: 30.732 ms., avg: 31.735 ms.
Best time for 2048K FFT length: 34.630 ms., avg: 35.848 ms.
Best time for 2560K FFT length: 42.596 ms., avg: 44.089 ms.
Best time for 3072K FFT length: 52.945 ms., avg: 53.301 ms.
Best time for 3584K FFT length: 66.552 ms., avg: 82.680 ms.
Best time for 4096K FFT length: 70.515 ms., avg: 73.019 ms.
Best time for 5120K FFT length: 87.307 ms., avg: 94.365 ms.
Best time for 6144K FFT length: 108.814 ms., avg: 122.418 ms.
Best time for 7168K FFT length: 131.005 ms., avg: 149.828 ms.
Best time for 8192K FFT length: 144.657 ms., avg: 156.096 ms.
Timing FFTs using 2 threads.
Best time for 1024K FFT length: 15.166 ms., avg: 15.980 ms.
Best time for 1280K FFT length: 17.386 ms., avg: 18.087 ms.
Best time for 1536K FFT length: 21.690 ms., avg: 22.476 ms.
Best time for 1792K FFT length: 25.742 ms., avg: 26.861 ms.
Best time for 2048K FFT length: 28.558 ms., avg: 33.612 ms.
Best time for 2560K FFT length: 36.754 ms., avg: 39.345 ms.
Best time for 3072K FFT length: 46.804 ms., avg: 48.510 ms.
Best time for 3584K FFT length: 82.085 ms., avg: 83.430 ms.
Best time for 4096K FFT length: 60.806 ms., avg: 64.677 ms.
Best time for 5120K FFT length: 76.952 ms., avg: 78.853 ms.
Best time for 6144K FFT length: 90.912 ms., avg: 91.943 ms.
Best time for 7168K FFT length: 112.518 ms., avg: 123.265 ms.
Best time for 8192K FFT length: 123.618 ms., avg: 132.638 ms.
1024K 121.70ms vs. 16.25ms (=~13% of a single Core2Duo core)
2048K 255.25ms vs. 35.85ms (=~14%)
4096K 455.10ms vs. 73.02ms (=~16%)
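The percentages above can be reproduced from the two tables. This is a small sketch using the Odroid-U2 msec/iter figures and the E7400 single-thread averages; `relative_speed` is a made-up helper name.

```python
# (FFT length in K, Odroid-U2 Mlucas msec/iter, E7400 single-thread average ms),
# copied from the tables above.
rows = [
    (1024, 121.70, 16.250),
    (2048, 255.25, 35.848),
    (4096, 455.10, 73.019),
]


def relative_speed(slow_ms, fast_ms):
    """Fraction of the faster machine's throughput achieved by the slower one."""
    return fast_ms / slow_ms


for fft_k, odroid_ms, c2d_ms in rows:
    frac = relative_speed(odroid_ms, c2d_ms)
    print(f"{fft_k}K: {frac:.1%} of one E7400 core, i.e. {1 / frac:.1f}x slower")
```

This gives about 13%, 14% and 16%, or somewhere between 6 and 7.5 times slower per iteration.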

Conclusion: it's slow, but we already knew that.
But it uses very little power. When idle at the desktop my wall meter reads 0W, which really means it is drawing <1W, too little for the meter to measure accurately.
When running Mlucas on a random low DC exponent, let's say M35218831 (it's already DCed and I'm not going to finish it, as it would just take too long ;) )
Code:
M35218831: using FFT length 1920K = 1966080 8-byte floats.
 this gives an average   17.913223775227866 bits per digit
Using complex FFT radices       240        16        16        16
It's using 6.2W with very little variance and barely getting warm to the touch.
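For a rough idea of what finishing that exponent would cost, the figures above can be combined. The sketch assumes the 1920K self-test timing (243.65 msec/iter) holds for a real run and that the 6.2W reading stays constant; a Lucas-Lehmer test of M_p needs p - 2 squarings. It also reproduces the bits-per-digit value Mlucas prints.

```python
p = 35218831            # exponent of M35218831 (from the post)
fft_len = 1920 * 1024   # 1920K FFT length = 1966080 doubles
ms_per_iter = 243.65    # 1920K self-test timing from the table above (assumed to hold)
watts = 6.2             # wall-meter reading while running Mlucas (assumed constant)

bits_per_digit = p / fft_len
iterations = p - 2                      # Lucas-Lehmer test of M_p: p - 2 squarings
seconds = iterations * ms_per_iter / 1000
days = seconds / 86400
kwh = watts * seconds / 3.6e6           # 1 kWh = 3.6e6 joules

print(f"bits per digit: {bits_per_digit:.6f}")
print(f"estimated run time: {days:.0f} days")
print(f"estimated energy: {kwh:.1f} kWh")
```

Under those assumptions the test would take roughly 100 days and a little under 15 kWh.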

For web browsing and LibreOffice it is not a bad little thingy. Watching YouTube in 480p is OK, but it struggles with 720p. Setting Chromium to use H.264 instead of VP9 for videos makes playback slightly smoother.

[edit]
Next objective is to try to get GMP-ECM working.
Attached Thumbnails:
2016-02-16 01 -s m ERROR 768k.png
2016-02-16 02 -s m -fftlen 768k working.png
2016-02-16 03 CPU idle in browser.png
2016-02-16 04 CPU use Mlucas.png

Last fiddled with by VictordeHolland on 2016-02-16 at 21:36
Old 2016-02-16, 21:48   #95
ldesnogu
 
Jan 2008
France

597₁₀ Posts

Quote:
Originally Posted by VictordeHolland
For comparison purposes, a Core2Duo E7400 @2.8GHz with DDR2-800 (judging by the 2-thread timings, it is probably single-channel) running the latest mprime.
Wouldn't the comparison be fairer if you also used Mlucas on your C2D?

Last fiddled with by ldesnogu on 2016-02-16 at 21:48
Old 2016-02-16, 23:11   #96
VictordeHolland
 
"Victor de Hollander"
Aug 2011
the Netherlands

3²·131 Posts

Quote:
Originally Posted by ldesnogu
Wouldn't the comparison be fairer if you also used Mlucas on your C2D?
Yes it would, but it wouldn't change the outcome by much. The E7400_Ubuntu machine is not running at the moment (it uses too much electricity and I haven't got the space for it). I could have compared it with an i5 2500k or an i7 3770k, both running P95 under Win7. Both of them have AVX, so that would make the difference even larger. I don't think there is more than a single-digit % difference in performance between mprime and Mlucas on pre-AVX machines.

Running GMP-ECM on the Odroid-U2 probably makes more sense.
Old 2016-02-16, 23:37   #97
henryzz
Just call me Henry
 
"David"
Sep 2007
Liverpool (GMT/BST)

3·23·89 Posts

Considering that it is asm vs. C code, it isn't that bad.
That single-channel memory is crippling it, though. The memory on Core 2s is bad, but I haven't seen it that limited on a dual core before, only on quads.
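How limited the E7400 is can be read off the Prime95 output earlier in the thread by comparing the best 1-thread and 2-thread times. A small sketch with a few of those rows (`speedup` is just an illustrative helper):

```python
# (FFT length in K, best 1-thread ms, best 2-thread ms) from the Prime95
# output earlier in the thread.
best_ms = [
    (1024, 16.134, 15.166),
    (2048, 34.630, 28.558),
    (4096, 70.515, 60.806),
    (8192, 144.657, 123.618),
]


def speedup(one_thread_ms, two_thread_ms):
    """How many times faster the 2-thread run is than the 1-thread run."""
    return one_thread_ms / two_thread_ms


for fft_k, t1, t2 in best_ms:
    print(f"{fft_k}K: {speedup(t1, t2):.2f}x with 2 threads")
```

The second core adds only about 6% to 21% for these sizes, far from the ideal 2x. That is consistent with a memory bottleneck, though it does not prove one.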
Old 2016-02-17, 09:22   #98
ldesnogu
 
Jan 2008
France

597₁₀ Posts

Quote:
Originally Posted by VictordeHolland
Yes it would, but it wouldn't change the outcome by much. The E7400_Ubuntu machine is not running at the moment (it uses too much electricity and I haven't got the space for it). I could have compared it with an i5 2500k or an i7 3770k, both running P95 under Win7. Both of them have AVX, so that would make the difference even larger. I don't think there is more than a single-digit % difference in performance between mprime and Mlucas on pre-AVX machines.
I didn't realize Ernst had reached that point! Are there Mlucas benchmarks on various CPUs?

FWIW I would have expected the U2 to be about 4 times slower than your C2D, not >6 times, hence my original question.

Quote:
Running GMP-ECM on the Odroid-U2 probably makes more sense.
That's likely
Old 2016-02-23, 03:57   #99
Xyzzy
 
Aug 2002

7·1,237 Posts

http://arstechnica.com/gadgets/2016/...i-like-boards/

Quote:
32-bit-only ARMv8 chip is designed to consume as little as 4mW of power.
All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
