mersenneforum.org  

Old 2020-05-22, 22:37   #1
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

11001₈ Posts
AVX512 hardware recommendations?

Please suggest a cost effective reliable laptop that supports mlucas and prime95 AVX512 operation. One with a keyboard that lasts would be good. (My Dell G3 keyboard is approaching a year old and already certain keys are unreliable.)
Old 2020-05-22, 23:03   #2
M344587487
 
"Composite as Heck"
Oct 2017

2×7×47 Posts

The jury is still out on the P95 performance of the Ryzen 4000 mobile chips (Zen 2). They're AVX2, not AVX512, but in a lot of the typical benchmarks even the lower-end Ryzen parts are beating the higher-end Intel parts. Power consumption in particular looks to be in Ryzen's favour, but the cache has been reduced, which may affect P95 heavily. The match-up is interesting at least: a more efficient instruction set (AVX512) on a less efficient node vs a less efficient instruction set (AVX2) on a more efficient node.

But as always, if the grunt is purely for P95 then you're better off getting a cheap laptop and putting the saved pennies towards another Radeon VII instead.
Old 2020-05-22, 23:23   #3
mackerel
 
Feb 2016
UK

389 Posts

I believe the mobile Ice Lake implementations with AVX-512 are single unit, so may not offer any more throughput than AVX2 anyway. Two unit AVX-512 is on the HEDT platform and some Xeons.
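
A rough back-of-envelope sketch of that point, in Python. It assumes each FMA unit retires one fused multiply-add per 64-bit lane per cycle and that the AVX2 parts being compared have two 256-bit FMA units per core; the function name is made up for illustration, and clock speeds, AVX-512 downclocking and memory bandwidth are ignored, so treat the numbers as a ceiling rather than a benchmark.

```python
def peak_dp_flops_per_cycle(vector_bits: int, fma_units: int) -> int:
    """Peak double-precision FLOPs per core per cycle (theoretical ceiling)."""
    lanes = vector_bits // 64          # 64-bit doubles per vector register
    return lanes * 2 * fma_units       # one FMA counts as a multiply plus an add


# Assumed configurations, not measured data.
avx2_dual = peak_dp_flops_per_cycle(256, 2)      # AVX2 core with two FMA units
avx512_single = peak_dp_flops_per_cycle(512, 1)  # single-unit AVX-512 (mobile)
avx512_dual = peak_dp_flops_per_cycle(512, 2)    # two-unit AVX-512 (HEDT / some Xeons)

print(avx2_dual, avx512_single, avx512_dual)     # prints: 16 16 32
```

Under those assumptions a single 512-bit unit has the same peak as two 256-bit units, so any real-world gain would have to come from elsewhere (more registers, fewer instructions issued).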

The smaller cache on mobile Zen 2 CPUs (compared to desktop) is a concern, but it depends on what tests you're doing. The other "problem" with Zen in general is the CCX design and the limited internal bandwidth back to RAM. That applies to the desktop parts; I don't know whether it also applies to the mobile ones.

Worth seeking out benchmarks to compare options. I'm not sure I'd want any laptop to run this type of load for a sustained time.
Old 2020-05-23, 00:25   #4
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1201₁₆ Posts

The point of requiring AVX512 is to be able to test prime95 on exponents that AVX2 or FMA3 hardware won't run, and builds of mlucas and mfactor for AVX512. It would be my only AVX512 hardware.

Re laptops running GIMPS software at high duty cycle, my HP G60-B72 is in year 10 of that and the keyboard still works too. The clamshell hinge anchors broke after too many dives from the arm of the couch to the carpet, but it still works. Alas it is a lowly i3-370M.

The Dell G3 (i7-8750H) is under extended warranty, so perhaps it will go home to Dell for keyboard repair at some point. Since it's 6-core and also has a discrete GTX 1050 Ti GPU, its keyboard can get uncomfortably hot.

Radeon VIIs are great. I'm not averse to adding some. But this inquiry is for a different purpose, different software, not raw throughput/watt-hour in gpuowl.

Last fiddled with by kriesel on 2020-05-23 at 00:30
Old 2020-05-23, 22:16   #5
mackerel
 
Feb 2016
UK

389 Posts

Is it the case that Prime95 can do some work with AVX-512 it can't without? My understanding was AVX-512 is more a throughput thing in this use case. It does more of the same, not doing something new as such. Not familiar with the other software.
Old 2020-05-23, 23:14   #6
ewmayer
2ω=0
 
Sep 2002
República de California

9,791 Posts

Quote:
Originally Posted by mackerel
Is it the case that Prime95 can do some work with AVX-512 it can't without? My understanding was AVX-512 is more a throughput thing in this use case. It does more of the same, not doing something new as such. Not familiar with the other software.

ISTR (someone please correct me if I'm wrong) that Prime95's exponent limit depends on the width of the SIMD supported by the architecture. Not sure how fundamental this limit is, though, i.e. whether, say, upping it for AVX2 is a mere matter of fiddling a #define or whether there is an FFT-code-related reason for the limits to be lower for AVX2 than for AVX-512.

Ken, not nec. the cheapest solution, but have a gander at the roadmap for the Intel NUC for a possible compact-footprint (my Broadwell/AVX2 one is on back of my monitor) option.
Old 2020-05-24, 23:59   #7
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

11001₈ Posts

Quote:
Originally Posted by ewmayer
Ken, not nec. the cheapest solution, but have a gander at the roadmap for the Intel NUC for a possible compact-footprint (my Broadwell/AVX2 one is on back of my monitor) option.

Does any NUC offer AVX512? As I understand it, prime95 / mprime requires AVX512 for FFT-based computations on exponents above 920.8M. That covers everything except TF, which I'm unsure of and regard as moot. See https://www.mersenneforum.org/showth...374#post546374
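
For checking a candidate box, here is a minimal Python sketch that looks for the `avx512f` (AVX-512 Foundation) flag. It assumes an x86 Linux system, where `/proc/cpuinfo` has a `flags` line per logical CPU; the helper names `cpu_flags` and `has_avx512f` are made up for illustration and are not part of prime95 or mlucas.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set:
    """Return the CPU feature flags from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx512f(cpuinfo_text: str) -> bool:
    """True if the AVX-512 Foundation flag is listed."""
    return "avx512f" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")   # Linux-only assumption
    if path.exists():
        print("AVX-512F supported:", has_avx512f(path.read_text()))
    else:
        print("No /proc/cpuinfo here; use a CPUID tool instead.")
```

This only reports what the kernel exposes; whether a given prime95 or mlucas build actually uses AVX-512 on that machine is a separate question.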

Last fiddled with by kriesel on 2020-05-25 at 00:25
Old 2020-05-25, 00:01   #8
Mysticial
 
Sep 2016

7·47 Posts

Quote:
Originally Posted by kriesel
Does any NUC offer AVX512?

The 8121U Cannon Lake. Discontinued, though, I think.
Old 2020-05-25, 00:14   #9
ewmayer
2ω=0
 
ewmayer's Avatar
 
Sep 2002
República de California

10011000111111₂ Posts

Quote:
Originally Posted by Mysticial
The 8121U Cannon Lake. Discontinued though I think.

So we have to go back (to late 2018), not forward, to find AVX-512-supporting NUCs. That could be ideal for Ken's needs, if he could score a used one somewhere.

Here are a couple of online reviews:

https://www.anandtech.com/show/13405...ep-dive-review

https://www.tomshardware.com/news/in...ing,38191.html

Last fiddled with by ewmayer on 2020-05-25 at 00:17
Old 2020-05-25, 16:39   #10
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

11001₈ Posts

Found a refurb i3-8121U-based complete little box for <$300 (<$330 with shipping and taxes). They still have several left.
The following items are included in the box:
Intel NUC 8 Home, a Mini PC with Windows 10 - NUC8i3CYSM, with the following components already installed:
Intel NUC Board NUC8i3CYB, with soldered-down dual-core Intel Core i3 processor 8121U
Discrete graphic card AMD Radeon 540 (soldered down)
Intel Wireless-AC 9560 module (soldered down)
Two wireless antennas
8GB LPDDR4 2400 MHz memory (soldered down)
Pre-installed 1TB 2.5-inch HDD
Operating system
Windows 10
19V power adapter with US Power Cord

When WSL2 goes to general release, it might make a handy little split-personality Win10/linux system.

Still would like to find an AVX512 laptop with 4 cores or better (8 threads with hyperthreading), a 17" screen, and a manufacturer known for durability (several years of heavy use).

Last fiddled with by kriesel on 2020-05-25 at 16:55
Old 2020-05-25, 20:53   #11
ewmayer
2ω=0
 
Sep 2002
República de California

9,791 Posts

Found another Anandtech deep-dive review from 25 Jan 2019 which mentions how hard it is to find any kind of laptop using the 10nm Cannon Lake Core i3-8121U - they found just one, a budget-priced educational-focused[!] laptop from Lenovo for the China market. Reviewer notes the laptop was poorly designed, but has some interesting comments re. the "disappointing" early releases of Intel's 10nm process node:

Intel's 10nm Cannon Lake and Core i3-8121U Deep Dive Review | Ian Cutress, Anandtech -- from the conclusion on page 14 (did I mention this was a deep-dive review?), "Conclusion: I Actually Used the Cannon Lake Laptop as a Daily System":
Quote:
When we lived in a world with Intel’s Tick Tock, Cannon Lake would be a natural tick – a known microarchitecture with minor tweaks but on a new process node. The microarchitecture is a tried and tested design, as we now have had four generations of it from Skylake to Coffee Lake Refresh, however the chip just isn’t suitable for prime time.

Looking at how Intel has presented its improvements on 10nm, with features like using Cobalt, Dummy Gates, Contact Over Active Gates, and new power design rules, if we assume that every advancement works perfectly then 10nm should have been a hit out of the gate. The problem is, semiconductor design is like having 300 different dials to play with, and tuning one of those dials causes three to ten others to get worse. This is the problem Intel has had with 10nm, and it is clear that some potential features work and others do not – but the company is not saying which ones for competitive and obvious reasons.

At Intel’s Architecture Day in December, the Chief Engineering Officer Dr. Murthy Renduchintala was asked if the 10nm design had changed. His response was contradictory and cryptic: ‘It is changing, but it hasn’t changed’. At that event the company was firmly in the driving seat of committing to 10nm by the end of 2019, in a quad core Ice Lake mobile processor, in a new 3D packaging design called Lakefield, in an Ice Lake server CPU for 2020, and in a 5G/AI focused processor called Snow Ridge. Whatever 10nm variant of the process they’re planning to use, we will have to wait and see.

I’ll go back to this slide that Intel presented back at the Technology and Manufacturing Day:
[snip]
In this slide it shows on the right that 10nm (and its variants) have lower power through lower dynamic capacitance. However, on the left, Intel shows both 10nm (Cannon Lake) and 10nm+ (Ice Lake) as having lower transistor performance than 14nm++, the current generation of Coffee Lake processors.

This means we might not see a truly high-performance processor on 10nm until the third generation of the process is put into place. Right now, based on our numbers on Cannon Lake, it’s clear that the first generation of 10nm was not ready for prime time.

Cannon Lake: The Blip That Almost Didn’t Happen

We managed to snap up a Cannon Lake chip by calling in a few favors to buy it from a Chinese reseller who I’m pretty sure should not have been selling them to the public. They were educational laptops that may not have sold well, and the reseller just needed to get rid of them. Given Intel’s reluctance to talk about anything 10nm at CES 2018, and we find that the chips ‘shipped for revenue’ end up in a backwater design like this, then it would look like that Intel was trying to hide them. That was our thought for a good while, until Intel announced the Cannon Lake NUC. Even then, from launch announcement to being at general retail took four months, and by that time most people had lost interest.

At some point Intel had to make good on its promises to investors by shipping something 10nm to somewhere. Exactly how many chips were sold (and to whom) is not discussed by Intel, but I have heard some numbers flying around. Based on our performance numbers, it’s obvious why Intel didn’t want to promote it. On the other hand, at least being told about it beyond a simple sentence would have been nice.

After testing the chip, the only way I’d recommend one of these things is for the AVX512 performance. It blows everything else in that market out of the water, however AVX512 enabled programs are few and far between. Plus, given what Intel has said about the Sunny Cove core, that part will have it instead. If you really need AVX512 in a small form factor, Intel will sell you a NUC.

I similarly could use a compact-size AVX-512-capable system for playing with code/build, especially some of the later-released portions of the instruction set not supported by the KNL I did my AVX-512 Mlucas code-dev on.

Found a "like new" system matching Ken's specs above for $255 with free shipping on Amazon. Ordered; I will likely ditch the Win10 install for a clean Ubuntu 19.10 one, or perhaps co-install the latter. It will be interesting to compare the throughput with that of an AVX2 build on my venerable dual-core Broadwell NUC, which is ~1/2" lower-profile due to its using an M.2 module versus the SSD on the just-ordered one. (But a 1TB SSD is a nice chunk of new storage; hell, that is worth over $100 by itself.)

Question: Does it make sense to use the Radeon 540 GPU on these for either TF or LL/PRP testing? Getting some decent GIMPS work from that would be a nice bonus.

Last fiddled with by ewmayer on 2020-05-26 at 00:16

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.