
frmky 2016-10-12 00:19

msieve on KNL
 
I've been playing with msieve linear algebra on Knights Landing CPUs. Specifically, each compute node has one Intel(R) Xeon Phi(TM) CPU 7250 @ 1.40GHz. This processor has 68 cores arranged in 34 tiles (two cores per tile), each core running 4 hardware threads, for a total of 272 threads per node.

I compiled msieve with MPI support using icc with the -xMIC-AVX512 option. This worked just fine. I also tried disabling the ASM instructions and using just the C code to see if the compiler would vectorize using AVX-512, but the resultant binary was slightly slower.

Trying out different parameters, I get by far the best performance with one MPI process per tile and 8 threads per process (one software thread per hardware thread of the tile's two cores). So with one compute node, the best layout is a 2x17 MPI grid with 8 threads per process. Here is a table of estimated runtimes on a 42.1M matrix:

[CODE]cores  nodes  time (hrs)
   68      1      444
  136      2      233
  272      4      131
  544      8       83
 1088     16       46
 2176     32       33
[/CODE]

The last entry uses a 32x34 MPI grid, which is the largest I can use without recompiling and rebuilding the matrix.
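For reference, here is a rough sketch of the per-node layout described above: 34 MPI ranks viewed as a 2x17 Cartesian grid, 8 threads per rank, covering the node's 272 hardware threads. This is not msieve's actual code (msieve sets up its grid internally); it just illustrates the arrangement.

[CODE]/* Sketch only: launch with 34 ranks (e.g. one per KNL tile).
 * 2 x 17 ranks with 8 threads each = 272 hardware threads per node. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int dims[2]    = {2, 17};   /* 2 x 17 = 34 ranks per node */
    int periods[2] = {0, 0};    /* no wraparound */
    MPI_Comm grid;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &grid);

    int rank, coords[2];
    MPI_Comm_rank(grid, &rank);
    MPI_Cart_coords(grid, rank, 2, coords);
    printf("rank %d -> grid cell (%d,%d), 8 threads here\n",
           rank, coords[0], coords[1]);

    MPI_Comm_free(&grid);
    MPI_Finalize();
    return 0;
}[/CODE]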

Would explicit use of AVX-512 speed up the matmul?

jasonp 2016-10-12 02:33

Probably; the scatter-gather instructions could be useful. Using 512-bit vectors explicitly in block Lanczos may or may not be faster, since the vector-vector operations would need hugely more memory for precomputation.
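As a very rough illustration of the gather idea (this is not msieve's code; the names and data layout are made up, and it assumes the AVX-512F intrinsics from immintrin.h): in the GF(2) matrix multiply, each nonzero of a sparse row selects one 64-bit vector word to XOR into that row's accumulator, and a gather can fetch 8 such words per instruction.

[CODE]/* Sketch only -- compile with -xMIC-AVX512 or -mavx512f. */
#include <immintrin.h>
#include <stdint.h>

/* XOR together the 64-bit vector words selected by one sparse row. */
uint64_t row_xor_gather(const uint32_t *col_idx, uint32_t num_cols,
                        const uint64_t *v)
{
    __m512i acc = _mm512_setzero_si512();
    uint32_t i = 0;

    for (; i + 8 <= num_cols; i += 8) {
        /* gather v[col_idx[i..i+7]], i.e. 8 x 64-bit words at once */
        __m256i idx = _mm256_loadu_si256((const __m256i *)(col_idx + i));
        __m512i words = _mm512_i32gather_epi64(idx, v, 8);
        acc = _mm512_xor_si512(acc, words);
    }

    /* horizontal XOR of the 8 lanes */
    uint64_t lanes[8];
    _mm512_storeu_si512(lanes, acc);
    uint64_t r = lanes[0] ^ lanes[1] ^ lanes[2] ^ lanes[3] ^
                 lanes[4] ^ lanes[5] ^ lanes[6] ^ lanes[7];

    /* scalar tail for the leftover nonzeros */
    for (; i < num_cols; i++)
        r ^= v[col_idx[i]];
    return r;
}[/CODE]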

frmky 2016-11-05 23:03

It turns out KNL doesn't like a nearly square MPI grid. In the table above I had run the 544-core job as a 16x17 grid, but an 8x34 grid runs nearly 10% faster. I have therefore also removed the 2176-core timing, since its 32x34 grid is likewise nearly square and probably similarly slow.

[CODE]cores  nodes  time (hrs)
   68      1      444
  136      2      233
  272      4      131
  544      8       76
 1088     16       46
 2176     32       ??
[/CODE]

Currently msieve has a max MPI grid dimension of 35. Is increasing this simply a matter of changing the value in include/common.h, or are there possible overflows or other gotchas to watch out for?

BTW, the last half of the 2,1285- linear algebra was run using the KNL nodes, so it works correctly. :smile:

jasonp 2016-11-06 11:45

I saw; that was awesome. The maximum grid size is just a definition in the code, but it also controls the size of a binary file, so once you change the definition you will be binary-incompatible with previous savefiles.

(Just change MAX_MPI_GRID_DIM in common.h)
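Something along these lines, purely as a sketch (64 is an arbitrary example value; only this one definition changes, but as noted above, savefiles written before the change will no longer be readable):

[CODE]/* include/common.h -- raising the MPI grid limit (sketch).
 * The new value is arbitrary; the old limit of 35 is the one quoted above.
 * This constant also fixes the layout of the binary savefiles, so files
 * written with the old value cannot be read after the change. */
#define MAX_MPI_GRID_DIM 64   /* previously 35 */[/CODE]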

