#133 | |
∂²ω=0
Sep 2002
República de California
3·7²·79 Posts |
Quote:
Will look into replacing the side panel in question with a fine-perforated metal-mesh one, similar to the one on top of the case covering the 2 water-cooler vent fans.

Here are some Mlucas avx-512 build timings at 64M-FFT - more below on why that large FFT length is of special interest ATM - on the KNL, all at the same FFT length, 1-thread-per-core (I found no benefit from any combination of hyperthreading I tried), #threads from 1-64. Parallel scaling is good through 16 threads but then falls off a cliff beyond that:
Code:
    64M FFT, 1-thread-per-core, #threads from 1-64:            #thread:  || scaling (vs 1-thr):
    65536  msec/iter = 1765.36  radices = 16 16 16 16 16 32        1         1.00
    65536  msec/iter =  943.43  radices = 16 16 16 16 16 32        2         .936
    65536  msec/iter =  496.24  radices = 16 16 16 16 16 32        4         .889
    65536  msec/iter =  259.18  radices = 16 16 16 16 16 32        8         .851
    65536  msec/iter =  125.93  radices = 16 16 16 16 16 32       16         .876
    65536  msec/iter =   85.70  radices = 256 16 16 16 32         32         .644
    65536  msec/iter =   69.06  radices = 256 16 16 16 32         64         .399

    [A] Hyperthreading: Physical-cores 0-15, 2-threads-per-core, radices 256,16,16,16,32: 131.97 ms/iter, slower than 16-thr/1-per-core;
    [B] 2 side-by-side runs, each using 16-thr: Each nets 136 ms/iter, 1.85x total throughput of one 16-thr job, 1.26x total throughput of one 32-thr job;
    [C] 4 side-by-side runs, each using 16-thr: Each nets 170 ms/iter, 2.96x total throughput of one 16-thr job, 1.62x total throughput of one 64-thr job.

First task I set the KNL on is to complete the 64M-FFT one of the pair of primality tests of F30 I started several years ago. I did ~2/3 of the needed 2^30-1 = 1073741823 iterations of said test on a pair of machines: one at 60M on a 32-core AVX2 Xeon server, the other at 64M on the GIMPS KNL. Both machines were physically hosted by David Stanfill, who went AWOL early this year. Ryan Proper was kind enough to pick up the 60M run and complete it on a manycore virtual machine he had access to, but the 64M one remained in need of completion. I picked that up at iteration 730M last night; based on timings so far, ETA for completion is a little over 8 months. Again, per the above table, this is getting less than half the total throughput the CPU is capable of.
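The scaling column in the table above is the 1-thread time divided by (threads x n-thread time), and the comparisons in [B] and [C] are ratios of total iterations per second. A minimal Python sketch that reproduces those figures from the quoted timings follows; the helper names are made up for illustration and are not part of Mlucas.

```python
# Timings (msec/iter) copied from the 64M-FFT table above, keyed by thread count.
TIMINGS_MS = {1: 1765.36, 2: 943.43, 4: 496.24, 8: 259.18,
              16: 125.93, 32: 85.70, 64: 69.06}


def scaling(threads, timings=TIMINGS_MS):
    """Parallel efficiency of an n-thread run relative to the 1-thread run."""
    return timings[1] / (threads * timings[threads])


def throughput(jobs, ms_per_iter):
    """Total iterations/sec for `jobs` identical side-by-side runs."""
    return jobs * 1000.0 / ms_per_iter


def speedup_vs_single(jobs, ms_per_iter, ref_threads, timings=TIMINGS_MS):
    """Throughput of `jobs` side-by-side runs relative to one job using
    `ref_threads` threads (hypothetical helper for the [B]/[C] ratios)."""
    return throughput(jobs, ms_per_iter) / throughput(1, timings[ref_threads])


if __name__ == "__main__":
    for t in sorted(TIMINGS_MS):
        print(f"{t:3d} threads: scaling {scaling(t):.3f}")
    # [B]: two 16-thread jobs at 136 ms/iter each
    print(f"[B] vs one 16-thr job: {speedup_vs_single(2, 136.0, 16):.2f}x")
    print(f"[B] vs one 32-thr job: {speedup_vs_single(2, 136.0, 32):.2f}x")
    # [C]: four 16-thread jobs at 170 ms/iter each
    print(f"[C] vs one 16-thr job: {speedup_vs_single(4, 170.0, 16):.2f}x")
    print(f"[C] vs one 64-thr job: {speedup_vs_single(4, 170.0, 64):.2f}x")
```

The same two functions can be pointed at timings from any other machine to see where its scaling starts to fall off.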
The above multiple-job results [B] and [C] point to a way of getting much better total throughput: as soon as the in-development Mlucas v20 has a working p-1 Stage 1 with restart-from-savefile capability, I should switch the above F30-test completion to 32-threaded and fire up a second 32-threaded job, in the form of a deep p-1 Stage 1 on F33. By deep I mean something on the order of a year's runtime. At that point - assuming none of the occasional GCDs during Stage 1 turns up a factor, which is the expected result based on the TF depth to date - the Stage 1 residue can be distributed to volunteers in possession of bigmem systems - any kind of halfway-fast Stage 2 will need at least 128GB of RAM - to run various Stage 2 subintervals in hopes one finds a factor. Said Stage 1 will of course slow the finishing-off of the F30@64M job, but we already know that number to be composite via a small TF-found factor; the primality test is to generate a residue for cofactor PRP-checking.
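For readers unfamiliar with what a p-1 Stage 1 with occasional GCDs does, here is a toy Python sketch of the textbook method on a tiny composite. It is not the Mlucas implementation, which works on multi-million-digit residues with FFT multiplication and savefiles; the function names, the `gcd_interval` parameter and the example number 299 are all assumptions chosen for illustration.

```python
from math import gcd


def primes_up_to(limit):
    """Sieve of Eratosthenes; adequate for toy-sized bounds."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def pm1_stage1(n, b1, base=3, gcd_interval=100):
    """Textbook p-1 Stage 1: raise `base` to every prime power <= b1 (mod n),
    taking a GCD of (residue - 1) with n every `gcd_interval` primes and once
    at the end. Returns a nontrivial factor of n, or None if none is found."""
    x = base % n
    for count, q in enumerate(primes_up_to(b1), start=1):
        qk = q
        while qk * q <= b1:          # largest power of q not exceeding b1
            qk *= q
        x = pow(x, qk, n)
        if count % gcd_interval == 0:
            g = gcd(x - 1, n)
            if 1 < g < n:
                return g
    g = gcd(x - 1, n)
    return g if 1 < g < n else None


if __name__ == "__main__":
    # 299 = 13 * 23, and 13 - 1 = 12 is 4-smooth, so b1 = 4 suffices.
    print(pm1_stage1(299, 4))
```

The residue `x` left at the end of Stage 1 is what a Stage 2 would continue from, which is why it can be handed to other machines.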
#134 |
"Ed Hall"
Dec 2009
Adirondack Mtns
5×727 Posts |
Thanks ewmayer! I followed the "Frankencable" thread as well as this one. These are part of why I'm interested. I'm just usually shy of spending so much at once. As you probably know, I dabble with discards rather than upgrading or jumping into something with potential. My trouble is that even if I bought a brand new system that could replace all the machines I currently run for less running cost, I'd probably just add it in with the others instead of replacing them. The same seems to be what I do with most of my additions, although I actually did retire all my Pentium 4 machines.
#135 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
4,931 Posts |
My temp & turbo reports were also with the motherboard-side cover off. Two fans draw air through the big radiator above them.
The lines from the block on the cpu to the radiator cross through the space where gpus would reside, so any gpu squeezed in there would need to be a low profile type, like the 2GB RX550 I have, or there would be mechanical interference. Mine also has the unusual power supply dimensions Ernst describes, and the physical mounting is unimpressive, involving 2 screws at one end and a zip tie at the other that still leaves it a bit wiggly.

Does the new Mlucas P-1 code support only Fermats, or also Mersennes? When will the new code be available for others to use?

Mlucas on CentOS: 170ms/iter x four 16-thread instances of 64M fft length corresponds to 4 x 1000 / 170 = 23.53 iters/sec throughput on 64 of the 68 cores. (At what average clock rate?) Prime95 on Windows 10 at the same 64M fft length benchmarked as 25.37 iters/sec throughput on all 68 cores. Straight line interpolating down to 64 cores would give 22.15 iters/sec, which is probably a bit pessimistic. The indicated throughput Mlucas vs. prime95 is within 8%, one way or the other.
Last fiddled with by kriesel on 2020-11-30 at 04:47
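The throughput comparison above is simple enough to check in a few lines of Python. This sketch assumes only the figures quoted in the post; the variable names are illustrative. Note that plain proportional scaling of 25.37 iters/sec from 68 to 64 cores comes out near 23.88 rather than the 22.15 quoted, so the post's interpolation may have been done differently. Either way the two programs land within roughly 8% of each other.

```python
def iters_per_sec(jobs, ms_per_iter):
    """Total iterations/sec for `jobs` side-by-side runs at `ms_per_iter` each."""
    return jobs * 1000.0 / ms_per_iter


# Mlucas: four 16-thread instances at 170 ms/iter, using 64 of the 68 cores.
mlucas_64_cores = iters_per_sec(4, 170.0)

# Prime95: benchmark figure quoted in the post, using all 68 cores.
prime95_68_cores = 25.37

# Assumption: throughput strictly proportional to core count.
prime95_64_cores_proportional = prime95_68_cores * 64 / 68

if __name__ == "__main__":
    print(f"Mlucas, 64 cores:               {mlucas_64_cores:.2f} iters/sec")
    print(f"Prime95, 68 cores:              {prime95_68_cores:.2f} iters/sec")
    print(f"Prime95, 64 cores (prop. est.): {prime95_64_cores_proportional:.2f} iters/sec")
    print(f"Prime95(68) / Mlucas(64):       {prime95_68_cores / mlucas_64_cores:.3f}")
```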
#136 | ||||
∂²ω=0
Sep 2002
República de California
10110101011101₂ Posts |
Quote:
Quote:
Quote:
Code:
    processor   : 0
    vendor_id   : GenuineIntel
    cpu family  : 6
    model       : 87
    model name  : Intel(R) Xeon Phi(TM) CPU 7250 @ 1.40GHz
    stepping    : 1
    microcode   : 0x1b0
    cpu MHz     : 1501.193
    cache size  : 1024 KB
Quote:
Do you have any watts-at-wall numbers for your system, idle and under load? All my wattmeters are currently hooked up to GPU-hosting systems which I don't want to unplug.
Last fiddled with by ewmayer on 2020-11-30 at 20:01
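The `cpu MHz` line in the /proc/cpuinfo excerpt above is a per-core snapshot, so answering the "average clock rate" question means averaging that field across all cores, ideally over several samples taken under load. A minimal Python sketch, assuming the standard Linux /proc/cpuinfo layout; the function names are illustrative:

```python
import re

# Abbreviated sample in the same format as the excerpt in the post.
SAMPLE = """\
processor : 0
vendor_id : GenuineIntel
model name : Intel(R) Xeon Phi(TM) CPU 7250 @ 1.40GHz
cpu MHz : 1501.193
cache size : 1024 KB
"""

# Matches one "cpu MHz" line per logical processor (tabs or spaces before ':').
_MHZ = re.compile(r"^cpu MHz\s*:\s*([0-9.]+)", re.MULTILINE)


def cpu_mhz_values(cpuinfo_text):
    """Return the 'cpu MHz' value of every processor entry in the text."""
    return [float(v) for v in _MHZ.findall(cpuinfo_text)]


def average_mhz(cpuinfo_text):
    """Mean of all 'cpu MHz' values, or None if the field is absent."""
    values = cpu_mhz_values(cpuinfo_text)
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    print(average_mhz(SAMPLE))
```

On a Linux box, passing the contents of /proc/cpuinfo (for example `open("/proc/cpuinfo").read()`) in place of `SAMPLE` gives the instantaneous average across all logical processors.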
#137 | ||||
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
4,931 Posts |
Quote:
Quote:
Quote:
Quote:
#138 |
∂²ω=0
Sep 2002
República de California
3·7²·79 Posts |
I found some useful perforated-metal-sheet product listings here, but those are building-supply-oriented, with few fine-mesh ones, though possibly a few are usable. Per my measurement the precise side-panel WxH = 14" x 12 7/16" (35.6 x 31.6 cm), can you confirm/deny those dimensions?
Some fine steel woven filter mesh, the kind one puts over a vent to keep bugs out, might also suit, but most I've seen via cursory search has max dimension <= 12". Perhaps something like this, or even a perforated baking mat cut to size and stretched over the side opening.

================

One last theme of interest re. KNL setup: exploring overclocking the CPU and/or onboard memory, and disabling the power-saving/auto-throttling modes, to boost performance. I've made the mobo manual for the Hydra workstation available here (10.6MB). BIOS setup is Chapter 7, very long and detailed. The key sections and settings for performance-tweaking appear to be (default settings noted with **):

[pg 7-6] CPU Configuration: Lots of stuff there, main items of interest appear to be frequency settings and power-saving/auto-throttling modes *enable*/disable.

[pg 7-13] Memory Configuration
o Enforce POR
  Select Enforce POR to enforce the onboard memory DIMM modules to operate and run at the frequency and voltage as specified by the Intel POR specifications. The options are *Enforce POR*, Disabled and Enforce Stretch Goals.
o Memory Frequency
  Use this feature to set the maximum memory frequency for onboard memory modules. The options are *Auto*, 1600, 1867, 2133, and 2400.
#139 |
"Ed Hall"
Dec 2009
Adirondack Mtns
E33₁₆ Posts |
Would window screening be too coarse?
[EWM: oops, I hit 'edit' instead of 'reply', but will just leave my reply here - this is me hijacking EdH's post below :]
That would be similar in terms of fineness to the latter 2 items I linked, but you want a bit of stiffness, and typical window-screen mesh is too supple and thus needs a frame. Using the setscrews to stretch the mesh obviates the too-pliant issue, but w/o a frame you'd tear holes in the window mesh. Not sure if the baking-mat mesh is any sturdier, but it's cheap enough to just order one; if it proves unsuitable for the intended use you can still use it as, well, a baking mat. :)
Last fiddled with by ewmayer on 2020-11-30 at 23:06
#140 | |||
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
4,931 Posts |
Quote:
Quote:
Quote:
Will be interested to see what you come up with for BIOS setting changes. |
#141 |
∂²ω=0
Sep 2002
República de California
3·7²·79 Posts |
First performance-tweak I tried was fiddling with the onboard-memory frequency. The mobo User Manual uses a quite stupid scheme for indicating submenu levels, right arrows of slightly differing sizes rather than section/subsection numbering. So to get to the memory settings, navigate:
Advanced Setup Configurations -> Chipset Configuration -> North Bridge -> Memory Configuration
Inside that rightmost submenu, I set the Enforce POR option to Disable (the only options were "Enforce POR" and "Disable"; the "Enforce Stretch Goals" option cited in the manual was not listed). Next I set Memory Frequency from its default "Auto" - nowhere did I see what actual clock setting that yielded - to the highest available, 2400, then rebooted. There is a related MemTest (Memory Test) option, for which we read "Select Enabled to enable memory testing during system boot. The options are *Enabled* and Disabled", so that is on by default and presumably we got at least a basic mem-test during boot.

Fired up Mlucas to resume the 64-threaded F30 continuation run, waited for next checkpoint ... no change in timings. So it seems "Auto" was already setting the onboard-mem to the max displayed value of 2400.
#142 | |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
4,931 Posts |
Quote:
Last fiddled with by kriesel on 2020-12-01 at 21:49 |
#143 | ||
∂²ω=0
Sep 2002
República de California
3·7²·79 Posts |
Quote:
"Memory Frequency Use this feature to set the maximum memory frequency for onboard memory modules. The options are Auto, 1600, 1867, 2133, and 2400."
Does 'onboard' not refer to the 16GB, um, onboard memory? OTOH the above clock settings are def. DDR-range, not MCDRAM range. From the Anandtech article you linked:
Quote:
I searched for 'mcdram' and 'flat' in the mobo manual, nothing for either.
Last fiddled with by ewmayer on 2020-12-01 at 22:30