#188
Sep 2016
238 Posts
I've got my KNL box idle again; is there an updated repo where I can test out your code branch?
I've also got a machine with four K20s idle.
#189
Einyen
Dec 2003
Denmark
2·17·101 Posts
You should do some trial factoring on those K20s, or even some LL testing; I believe they have decent double-precision performance.
You can get trial-factoring assignments through GPU72: https://www.gpu72.com
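For what it's worth, the usual tool on NVIDIA cards is the CUDA trial-factoring program mfaktc; a GPU72 assignment then just becomes a line in mfaktc's worktodo.txt of the following form (all fields shown here are placeholders, not a real assignment):

Factor=<assignment key>,<exponent>,<starting bit level>,<ending bit level>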
#190
∂2ω=0
Sep 2002
República de California
5·2,351 Posts
#191
∂2ω=0
Sep 2002
República de California
5·2,351 Posts
I didn't want to do any lengthy runs on the shared-dev KNL so as not to interfere with any timings my fellow developers may be doing, but I do need to do some multi-hour runs to test various aspects of optimal configuration for production-mode LL testing. If any other users of the system need to do some unloaded timings, please let me know (or ask David s. to kill my stuff).

Some further notes related to LL testing, based on my experiments with some DC exponents, all slightly larger than 40M, i.e. using FFT length 2304K:

1. I got no gain from trying to pin threads to virtual cores with index stagger > 1; e.g., for a 16-threaded run, setting affinity to the default cores 0:15 is as fast as or faster than anything else I tried. Timings get worse at larger staggers; e.g., using cores 0:255:16 (i.e. cores 0,16,32,...,240) runs only half as fast as 0:15.

2. Running just one 16-threaded job with default affinity (cores 0:15), 'top' shows ~1200% utilization; upping that to 4 such jobs drops each job's utilization to ~300%. To make sure this wasn't some OS quirk, I let all 4 same-affinity-set jobs run to the next checkpoint write, and indeed the timings quadrupled relative to the single job's 0.0090 sec/iter.

3. At that point I killed jobs 2-4, left #1 running on cores 0:15, and restarted 2-4 on core sets 16:31, 32:47 and 48:63, respectively, using the new -cpu affinity-setting option of the dev-branch Mlucas code. Now the utilizations look as hoped for, each job running at around the same ~1200% as the lone 16-thread job did. In other words, the enhanced affinity-setting user option is mandatory for such multiple multithreaded jobs.

4. Here is the first checkpoint output line from the first job, running all by itself:

[Sep 21 21:07:13] M40****** Iter# = 100000 [ 0.24% complete] [ 0.0090 sec/iter]

And here with all 4 jobs running on the above nonoverlapping 16-core sets:

[Sep 21 22:36:40] M40****** Iter# = 400000 [ 0.98% complete] [ 0.0091 sec/iter]
[Sep 21 22:26:58] M40****** Iter# = 200000 [ 0.49% complete] [ 0.0090 sec/iter]
[Sep 21 22:29:49] M40****** Iter# = 200000 [ 0.49% complete] [ 0.0091 sec/iter]
[Sep 21 22:30:55] M40****** Iter# = 200000 [ 0.49% complete] [ 0.0092 sec/iter]

5. Next I killed all 4 jobs and restarted each running 32-threaded, i.e. a total of 128 threads on the 64 physical cores. Note that a quirk (polite term for 'bug') in the code's options handling requires the user to supply an exponent matching the one in the topmost line of the local worktodo.ini file in order to also allow the core setting:

cd run0 && nice ../Mlucas -m 40****** -cpu 0:31 &
cd run1 && nice ../Mlucas -m 40****** -cpu 32:63 &
cd run2 && nice ../Mlucas -m 40****** -cpu 64:95 &
cd run3 && nice ../Mlucas -m 40****** -cpu 96:127 &

'top' now shows each job running at ~1800%, but the next-checkpoint summary outputs show things running ~5% slower per job. So I reverted to 4 x 16-thread (a launch sketch follows below) and will let those DCs run overnight as a stability-under-load test. At 0.0090 sec/iter, each run proceeds at roughly 86400/0.0090 ≈ 9.6 million iterations/day; with 4 such runs going and ~40M iterations needed per DC, that is around one DC per day of total throughput. Need to at least double that!
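For reference, a minimal bash sketch of the 4 x 16-thread relaunch described in item 3, assuming the run0-run3 directory layout and the dev-branch -cpu option shown above. EXP is a placeholder, not a real exponent; per the quirk noted in item 5, it must match the topmost worktodo.ini line in each run directory:

#!/bin/bash
# Hypothetical relaunch wrapper for the 4 x 16-thread configuration.
EXP=40000001   # placeholder; substitute the exponent from each run dir's worktodo.ini

for i in 0 1 2 3; do
  lo=$(( 16 * i ))    # first core of this job's set: 0, 16, 32, 48
  hi=$(( lo + 15 ))   # last core: yields the disjoint sets 0:15, 16:31, 32:47, 48:63
  ( cd "run$i" && exec nice ../Mlucas -m "$EXP" -cpu "$lo:$hi" ) &
done
wait   # optional: block here while the four jobs run

(The same lo:hi syntax extends to lo:hi:stride, so the slow stagger-16 experiment from item 1 corresponds to -cpu 0:255:16.)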
#192
Serpentine Vermin Jar
Jul 2014
5×677 Posts
I'm pinning a lot of hope on AVX-512 helping out... maybe I shouldn't get my hopes up too much, but it will be nice to see some numbers start coming out for that part of it all. Apples-to-apples, single core, single worker: how does AVX compare to AVX-512? That kind of thing.
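A minimal sketch of such an apples-to-apples check, assuming two separately built binaries (the names below are hypothetical) and that Mlucas's -fftlen/-iters self-test flags and the dev-branch -cpu option behave as described earlier in the thread:

# Hypothetical single-core, single-worker timing comparison (bash).
for bin in ./Mlucas_avx2 ./Mlucas_avx512; do   # placeholder binary names
  echo "== $bin =="
  nice "$bin" -fftlen 2304 -iters 1000 -cpu 0   # same FFT length, core 0 only
done
# Then compare the sec/iter figures each run reports.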
#193
"Ben"
Feb 2007
7×13×41 Posts
#194
Romulan Interpreter
"name field"
Jun 2011
Thailand
10,273 Posts
#195
∂2ω=0
Sep 2002
República de California
5·2,351 Posts
#196
"Ben"
Feb 2007
7×13×41 Posts
#197
P90 years forever!
Aug 2002
Yeehaw, FL
8150₁₀ Posts
Do you mind if I stop/resume these every now and then to run a quick timing test?
#198
∂2ω=0
Sep 2002
República de California
5·2,351 Posts