#144
∂²ω=0
Sep 2002
República de California
2DEC₁₆ Posts
Adding PRP/Gerbicz-check support is my #1 to-do item for the next release. Time frame unclear, but def. 1H2018, hopefully 1Q.

#145
Banned
"Luigi"
Aug 2002
Team Italia
4865₁₀ Posts

#146
∂²ω=0
Sep 2002
República de California
2²·2,939 Posts
Using a v17.1 SIMD build, my little Odroid C2 just successfully completed its first double-check. Woo Hoo! To paraphrase a certain wildly popular (and even more-wildly commercialized) SciFi movie of the 1970s, "Help me, Internet of Things and your billions of connected devices using ARM cores, you're my only hope." In other words, the business model here must needs be an "ARMy ant" one.

#147
"Victor de Hollander"
Aug 2011
the Netherlands
10010011011₂ Posts
Quote:

#148
∂²ω=0
Sep 2002
República de California
11756₁₀ Posts

#149
∂²ω=0
Sep 2002
República de California
2²·2,939 Posts
Couple of issues ARM-builders brought to my attention in the past week which I'd like to run by the readership:
1. An Odroid XU4 owner reported that he needed to specify the architecture (in his case via '-march=armv7ve') in order to get a working build. (Sans the -march specifier his build segfaulted.) I'd like to add some verbiage about that to the readme page, but as the laundry list of possible arch-types here is long, I'd really like a simpler solution if possible. Would suggesting use of '-march=native' be the portable way to go here?
2. A user whose ARM device runs the Open Pandora OS reports that the ARM-specific code around the has_asimd function in the util.c file gives a compile error because the '#include <sys/auxv.h>' fails under his OS. I've asked him for more details re. his platform's header-file tree and am awaiting a reply, but I wonder if the inelegant-but-portable way to go here would be to replace said header-file-include with code which simply parses the /proc/cpuinfo file and searches for the string 'asimd'. Thoughts?
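For illustration, here is a minimal sketch of the /proc/cpuinfo idea from item 2. It is written in Python purely to show the parsing logic; it is not the Mlucas util.c code, and the function names (`cpu_features`, `has_feature`, `read_cpuinfo`) are hypothetical. It assumes the kernel lists capabilities on a line keyed "Features" (ARM) or "flags" (x86), and the sample text in the usage lines is made up, not taken from the Open Pandora user's device.

```python
def cpu_features(cpuinfo_text):
    """Return the set of capability tokens found in /proc/cpuinfo-style text."""
    features = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Assumption: ARM kernels key this line "Features", x86 kernels "flags".
        if sep and key.strip().lower() in ("features", "flags"):
            features.update(value.split())
    return features


def has_feature(name, cpuinfo_text):
    """True if `name` appears as a whole token, not merely as a substring."""
    return name in cpu_features(cpuinfo_text)


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the file if it exists; return an empty string otherwise."""
    try:
        with open(path, encoding="ascii", errors="replace") as f:
            return f.read()
    except OSError:
        return ""


if __name__ == "__main__":
    # Illustrative sample only.
    sample = "processor\t: 0\nFeatures\t: fp asimd evtstrm crc32\n"
    print(has_feature("asimd", sample))
    print(has_feature("asimd", read_cpuinfo()))
```

One design point worth noting: a bare substring search for 'asimd' would also hit longer tokens such as 'asimdhp', so the sketch splits the line into tokens and tests membership instead.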

#150
"Victor de Hollander"
Aug 2011
the Netherlands
3²×131 Posts
Not sure this is the right thread, but most people interested in ARM stuff will probably read it here:
Qualcomm Snapdragon 845 (extensive) performance preview:
https://www.anandtech.com/show/12420...rmance-preview

Custom implementation of ARM Cortex A75 and A55 cores:
4x Kryo 385 gold (custom A75 with 256KB L2) @2.8GHz
4x Kryo 385 silver (custom A55 with 128KB L2) @1.77GHz
2MB L3 (shared between cores)
3MB system cache (shared between CPU, GPU, accelerators, etc.)
LPDDR4x (29.9GB/s bandwidth)
process: Samsung 10nm LPP (2nd gen 10nm)

- DynamIQ allows for more flexible core combinations (1x A75 + 7x A55 for instance or 2+6 etc.)
- Private L2 per core for lower latency (configurable)
- A75 (3 wide decode) vs. A73 (2 wide decode)

Most of the performance improvements are probably due to the new cache structure and the better memory subsystem. Architecturally there are some small improvements (wider decode and issue queues) to extract more IPS.

NEON/FP pipe stays more or less the same between A73 and A75 (source Anandtech):
Quote:
- dedicated store pipeline for the NEON/FP

A55 and A53 NEON/FP is also almost the same (source Anandtech):
Quote:
- A55 gains separate AGUs (Address Generation Units) for loads and stores (instead of 1 AGU that does both), so it can dual-issue a load and store at the same time.
- A75 and A55 both gain native support for FP16

ARM's primary goals for the big (OoO) designs over the last generations:
A57 --> (high) performance 64bit
A72 --> reducing power
A73 --> improving power efficiency
A75 --> improving performance

LITTLE in-order:
A53 --> high power efficiency, small die area
A55 --> improving performance

More info on the A75/A55:
https://www.anandtech.com/show/11441...cortex-a75-a55

#151
"Victor de Hollander"
Aug 2011
the Netherlands
1179₁₀ Posts
By the way, I tested throttling of a smartphone SoC (Samsung Exynos 7420 inside the Samsung Galaxy S6) on my Dutch blog:
https://victordehollander.tweakblogs...martphone-socs

To put an FP-heavy load on the cores I used BOINC and theSkyNet POGS project (so the cycles were put to some good use :) ).
https://pogs.theskynet.org/pogs/

Samsung Exynos 7420 Octa inside Samsung Galaxy S6
Instruction set: ARMv8-A (‘64bit’)
Process: Samsung 14nm LPE (Low Power Early) FINFETs
Microarchitecture: big.LITTLE 4x Cortex-A57 + 4x Cortex-A53 (GTS)
Max Frequency: 2.1GHz (A57 cluster) 1.5GHz (A53 cluster)
Memory: 3GB 64-bit (2x32bit) dual-channel LPDDR4 @1553MHz (24.88 GB/s bandwidth)

Some interesting conclusions:
- Geekbench 4 load is very spiky and sometimes has several seconds between tests in which the SoC can cool down.
- The glass front and back of the S6 seem to limit the thermal dissipation to about 2W.
- When using 2 cores the 7420 throttles after about 1 minute; the frequency never quite stabilises and keeps trying to find a balance between performance/power/heat.
- 3 cores can only run at max speed (2100MHz) for about 10-15 seconds. After 3 minutes it stabilises @1200MHz.
- With 4 cores throttling sets in almost immediately; the frequency keeps switching between 1700 and 1200MHz.
- An A57 core consumes almost 1.5W @2100MHz
- @1200MHz the A57 core uses ~0.5W
- @1500MHz the A57 core uses ~0.7W
- The SoC gets quite warm (70C+) when using more than 1 core. The phone itself only feels lukewarm, never hot to the touch.

In terms of performance for BOINC theSkyNet POGS:

Intel i5-2500K @4.0GHz core with DDR3-2133 (2133*64bits*2channels/8bits=34GB/s ?)
5476.59 BOINC credits in 103,484 CPUseconds
0.0529 credits/CPUsec (=100%)
0.0132 credits/CPUsec per 1GHz (=100%)

Samsung Galaxy S6 (Exynos 7420) A57 core @1500MHz
99.06 BOINC credits in 11,638 CPUseconds
0.0085 credits/CPUsec (=16%)
0.0057 credits/CPUsec per 1GHz (=43%)

So in terms of absolute performance a single SandyBridge core outperforms the entire SoC (no surprise there) in this scenario. In terms of performance per GHz the A57 scores about half that of SandyBridge.

If we look at performance per watt it is a whole different story: Let's take 100W @wall for the SandyBridge (5 seconds for a point); that gives about 472 Joule/BOINCpoint vs. the A57 @1500MHz (0.7W), which takes more than 100 seconds per point but works out to just 82J/BOINCpoint. Note: there are two generations of manufacturing process difference (32nm planar vs. 14nm FINFET). Smartphone SoCs have high memory bandwidth
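The arithmetic behind those percentages and Joule figures can be sketched as below. This is only a rough reconstruction under assumptions: the function names are hypothetical, and the `cores=4` argument for the i5-2500K is my guess that all four cores were crunching at once, chosen because it reproduces the quoted "5 seconds for a point" and 472 J. The A57 figure uses a single core at 0.7W, as in the post.

```python
def credits_per_cpu_second(credits, cpu_seconds):
    """Throughput of one core in BOINC credits per CPU-second."""
    return credits / cpu_seconds


def joules_per_credit(credits, cpu_seconds, watts, cores=1):
    """Energy per credit, assuming `cores` cores run concurrently at `watts` total."""
    wall_seconds_per_credit = cpu_seconds / credits / cores
    return watts * wall_seconds_per_credit


# Figures copied from the post above.
sandy_rate = credits_per_cpu_second(5476.59, 103484)   # i5-2500K @4.0GHz
a57_rate = credits_per_cpu_second(99.06, 11638)        # A57 @1.5GHz

absolute_pct = 100 * a57_rate / sandy_rate
per_ghz_pct = 100 * (a57_rate / 1.5) / (sandy_rate / 4.0)

# Assumption: 100W at the wall is shared by 4 busy cores on the i5.
sandy_joules = joules_per_credit(5476.59, 103484, watts=100.0, cores=4)
a57_joules = joules_per_credit(99.06, 11638, watts=0.7)

if __name__ == "__main__":
    print(f"A57 vs SandyBridge, absolute: {absolute_pct:.0f}%")
    print(f"A57 vs SandyBridge, per GHz:  {per_ghz_pct:.0f}%")
    print(f"SandyBridge: {sandy_joules:.0f} J/credit")
    print(f"A57:         {a57_joules:.0f} J/credit")
```

Keep in mind the comparison is lopsided by construction: 100W is a whole-system wall figure while 0.7W is a per-core estimate.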

#152
∂²ω=0
Sep 2002
República de California
2²·2,939 Posts
Thanks for the detailed data, Victor - so this again raises the fairly self-evident idea of ganging up a bunch of such phone chassis into a single larger 'compute block'. Perhaps a year or so following the release of a given popular phone model using the desired chipset, start looking for used ones on the cheap, e.g. due to cracked/missing screens and/or no-longer-working smartphone features which leave the cortex CPUs and basic OS intact. Affix suitable-sized mini-alu-fins to the CPUs and mount all the thus-modified screen-removed phones in a basic chassis which spaces them apart suitably (say 5-10mm gap) to permit external airflow, said chassis also providing a simple shared power rail, cooling fan and whatever kind of basic cabling/switch-setup is needed to allow the phones to be interfaced with in order to load software and monitor processes. I wonder how feasible - mainly in terms of cost-effectiveness - such a setup would be.

#153
"Victor de Hollander"
Aug 2011
the Netherlands
3²×131 Posts
Quote:
- Most phones have locked boot-loaders, so you would have to deal with whatever version of Android they come with. Chances of putting a Linux distro on them are small. Even if somehow the boot-loader could be hacked, there would still be the need for specialized drivers for the SoC?!
- Without displays and ethernet, is it even possible to SSH/remote desktop to an Android device?
- The price of a second-hand (fully working) Samsung S6, for instance (released April 2015), is still well over 100 euros, compared to 55 euros for an Odroid-C2 and about $110 for the announced Odroid-N

#154
"Composite as Heck"
Oct 2017
2·5²·19 Posts
ROC-RK3328-CC (Renegade)
https://www.indiegogo.com/projects/r...ndroid-linux#/
GCC: 7.2.0
Image: ROC-RK3328-CC_Ubuntu16.04_Arch64_20180124

It's an A53 board that's faster than a pi3b on paper, but my Renegade benchmarks are slower. I'm hoping that it's a poorly configured image (released weeks before anyone got their boards, still not many have the board, there are other problems with the image). Will retest if they get their act together.

Scalar 4 thread
Code:
17.1
1024 msec/iter = 89.21 ROE[avg,max] = [0.261718750, 0.312500000] radices = 256 8 16 16 0 0 0 0 0 0
1152 msec/iter = 108.25 ROE[avg,max] = [0.210023717, 0.250000000] radices = 288 8 16 16 0 0 0 0 0 0
1280 msec/iter = 133.32 ROE[avg,max] = [0.224469866, 0.250000000] radices = 160 16 16 16 0 0 0 0 0 0
1408 msec/iter = 148.25 ROE[avg,max] = [0.225892857, 0.250000000] radices = 176 16 16 16 0 0 0 0 0 0
1536 msec/iter = 167.77 ROE[avg,max] = [0.231222098, 0.250000000] radices = 192 16 16 16 0 0 0 0 0 0
1664 msec/iter = 179.01 ROE[avg,max] = [0.226935686, 0.281250000] radices = 208 16 16 16 0 0 0 0 0 0
1792 msec/iter = 198.90 ROE[avg,max] = [0.217843192, 0.281250000] radices = 224 16 16 16 0 0 0 0 0 0
1920 msec/iter = 213.38 ROE[avg,max] = [0.258705357, 0.312500000] radices = 240 16 16 16 0 0 0 0 0 0
2048 msec/iter = 217.56 ROE[avg,max] = [0.320089286, 0.375000000] radices = 256 16 16 16 0 0 0 0 0 0
2304 msec/iter = 259.63 ROE[avg,max] = [0.252232143, 0.312500000] radices = 288 16 16 16 0 0 0 0 0 0
2560 msec/iter = 323.83 ROE[avg,max] = [0.302678571, 0.375000000] radices = 160 16 16 32 0 0 0 0 0 0
2816 msec/iter = 340.89 ROE[avg,max] = [0.265848214, 0.312500000] radices = 176 16 16 32 0 0 0 0 0 0
3072 msec/iter = 381.85 ROE[avg,max] = [0.219266183, 0.281250000] radices = 192 16 16 32 0 0 0 0 0 0
3328 msec/iter = 416.90 ROE[avg,max] = [0.290401786, 0.343750000] radices = 208 16 16 32 0 0 0 0 0 0
3584 msec/iter = 438.84 ROE[avg,max] = [0.211718750, 0.250000000] radices = 224 16 16 32 0 0 0 0 0 0
3840 msec/iter = 490.59 ROE[avg,max] = [0.228404018, 0.257812500] radices = 240 16 16 32 0 0 0 0 0 0
4096 msec/iter = 513.95 ROE[avg,max] = [0.228599330, 0.312500000] radices = 256 16 16 32 0 0 0 0 0 0
4608 msec/iter = 612.99 ROE[avg,max] = [0.221770368, 0.250000000] radices = 288 16 16 32 0 0 0 0 0 0
5120 msec/iter = 706.41 ROE[avg,max] = [0.248325893, 0.312500000] radices = 160 16 32 32 0 0 0 0 0 0
5632 msec/iter = 788.81 ROE[avg,max] = [0.218415179, 0.281250000] radices = 176 16 32 32 0 0 0 0 0 0
6144 msec/iter = 891.81 ROE[avg,max] = [0.213281250, 0.250000000] radices = 24 16 16 16 32 0 0 0 0 0
6656 msec/iter = 978.03 ROE[avg,max] = [0.303348214, 0.375000000] radices = 208 16 32 32 0 0 0 0 0 0
7168 msec/iter = 1091.41 ROE[avg,max] = [0.215611049, 0.250000000] radices = 28 16 16 16 32 0 0 0 0 0
7680 msec/iter = 1270.44 ROE[avg,max] = [0.221010045, 0.281250000] radices = 240 16 32 32 0 0 0 0 0 0
Code:
17.1
1024 msec/iter = 81.63 ROE[avg,max] = [0.254687500, 0.312500000] radices = 256 8 16 16 0 0 0 0 0 0
1152 msec/iter = 91.43 ROE[avg,max] = [0.221044922, 0.250000000] radices = 288 8 16 16 0 0 0 0 0 0
1280 msec/iter = 104.53 ROE[avg,max] = [0.264508929, 0.343750000] radices = 160 16 16 16 0 0 0 0 0 0
1408 msec/iter = 120.61 ROE[avg,max] = [0.227343750, 0.265625000] radices = 176 16 16 16 0 0 0 0 0 0
1536 msec/iter = 153.76 ROE[avg,max] = [0.254241071, 0.312500000] radices = 192 16 16 16 0 0 0 0 0 0
1664 msec/iter = 147.07 ROE[avg,max] = [0.270758929, 0.312500000] radices = 208 16 16 16 0 0 0 0 0 0
1792 msec/iter = 161.98 ROE[avg,max] = [0.220532663, 0.250000000] radices = 224 16 16 16 0 0 0 0 0 0
1920 msec/iter = 172.94 ROE[avg,max] = [0.234137835, 0.265625000] radices = 60 32 32 16 0 0 0 0 0 0
2048 msec/iter = 185.06 ROE[avg,max] = [0.223493304, 0.250000000] radices = 64 32 32 16 0 0 0 0 0 0
2304 msec/iter = 213.12 ROE[avg,max] = [0.268526786, 0.312500000] radices = 144 32 16 16 0 0 0 0 0 0
2560 msec/iter = 237.47 ROE[avg,max] = [0.236908831, 0.312500000] radices = 160 32 16 16 0 0 0 0 0 0
2816 msec/iter = 274.68 ROE[avg,max] = [0.224888393, 0.312500000] radices = 44 32 32 32 0 0 0 0 0 0
3072 msec/iter = 301.81 ROE[avg,max] = [0.224818638, 0.251953125] radices = 48 32 32 32 0 0 0 0 0 0
3328 msec/iter = 327.87 ROE[avg,max] = [0.220971680, 0.250000000] radices = 52 32 32 32 0 0 0 0 0 0
3584 msec/iter = 360.36 ROE[avg,max] = [0.223172433, 0.250000000] radices = 56 32 32 32 0 0 0 0 0 0
3840 msec/iter = 382.60 ROE[avg,max] = [0.224260603, 0.250000000] radices = 60 32 32 32 0 0 0 0 0 0
4096 msec/iter = 403.07 ROE[avg,max] = [0.295089286, 0.343750000] radices = 128 32 32 16 0 0 0 0 0 0
4608 msec/iter = 462.49 ROE[avg,max] = [0.258928571, 0.312500000] radices = 144 32 32 16 0 0 0 0 0 0
5120 msec/iter = 513.99 ROE[avg,max] = [0.237137277, 0.281250000] radices = 160 32 32 16 0 0 0 0 0 0
5632 msec/iter = 584.83 ROE[avg,max] = [0.256919643, 0.312500000] radices = 176 32 32 16 0 0 0 0 0 0
6144 msec/iter = 667.16 ROE[avg,max] = [0.246651786, 0.281250000] radices = 192 32 32 16 0 0 0 0 0 0
6656 msec/iter = 702.36 ROE[avg,max] = [0.266085379, 0.312500000] radices = 208 16 32 32 0 0 0 0 0 0
7168 msec/iter = 798.79 ROE[avg,max] = [0.224874442, 0.281250000] radices = 224 32 32 16 0 0 0 0 0 0
7680 msec/iter = 882.90 ROE[avg,max] = [0.237053571, 0.281250000] radices = 240 32 32 16 0 0 0 0 0 0
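
For anyone wanting to compare the two tables above row by row, here is a rough sketch of a parser. It assumes each row starts with an integer key (which I take to be the FFT length in K) followed by `msec/iter = <number>`; the function names are hypothetical, and the sample rows are copied from the two tables, the second of which has no surviving label in this thread.

```python
import re

# Matches rows such as "1024 msec/iter = 89.21 ROE[avg,max] = ..."
ROW_RE = re.compile(r"^\s*(\d+)\s+msec/iter\s*=\s*([\d.]+)")


def parse_timings(text):
    """Map the leading integer of each row to its msec/iter value."""
    timings = {}
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match:  # version lines like "17.1" are skipped
            timings[int(match.group(1))] = float(match.group(2))
    return timings


def speedups(first, second):
    """Ratio first/second for every key present in both tables."""
    return {k: first[k] / second[k] for k in sorted(first.keys() & second.keys())}


FIRST_TABLE = """17.1
1024 msec/iter = 89.21 ROE[avg,max] = [0.261718750, 0.312500000] radices = 256 8 16 16 0 0 0 0 0 0
2048 msec/iter = 217.56 ROE[avg,max] = [0.320089286, 0.375000000] radices = 256 16 16 16 0 0 0 0 0 0
"""

SECOND_TABLE = """17.1
1024 msec/iter = 81.63 ROE[avg,max] = [0.254687500, 0.312500000] radices = 256 8 16 16 0 0 0 0 0 0
2048 msec/iter = 185.06 ROE[avg,max] = [0.223493304, 0.250000000] radices = 64 32 32 16 0 0 0 0 0 0
"""

if __name__ == "__main__":
    ratios = speedups(parse_timings(FIRST_TABLE), parse_timings(SECOND_TABLE))
    for key, ratio in ratios.items():
        print(f"{key}: {ratio:.2f}x")
```

Feeding in the full tables instead of the two sample rows gives the ratio at every size.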