msieve on KNL
I've been playing with msieve linear algebra on Knights Landing CPUs. Specifically, each compute node has one Intel(R) Xeon Phi(TM) CPU 7250 @ 1.40GHz. This processor has 68 cores in 34 tiles (2 cores per tile), and each core has 4 hardware threads, for a total of 272 threads per node.
I compiled msieve with MPI support using icc with the -xMIC-AVX512 option. This worked just fine. I also tried disabling the ASM instructions and using just the C code to see if the compiler would vectorize using AVX-512, but the resultant binary was slightly slower.

Trying out different parameters, I get by far the best performance with one MPI process per tile with 8 threads per process. So with one compute node, the best layout is a 2x17 MPI grid with 8 threads. Here is a table of estimated runtimes on a 42.1M matrix:
[CODE]
cores  nodes  time (hrs)
   68      1         444
  136      2         233
  272      4         131
  544      8          83
 1088     16          46
 2176     32          33
[/CODE]
The last entry uses a 32x34 MPI grid, which is the largest I can use without recompiling and rebuilding the matrix. Would explicit use of AVX-512 speed up the matmul? |
Probably; the scatter-gather instructions could be useful. Using 512-bit vectors explicitly in block Lanczos may or may not be faster, because the vector-vector operations would need far more memory for precomputations.
|
Turns out KNL doesn't like a nearly symmetric grid. In the table above, I had run 544 cores as a 16x17 grid, but instead using an 8x34 grid runs nearly 10% faster. Therefore I have also removed the 2176 core run, which used a 32x34 grid.
[CODE]
cores  nodes  time (hrs)
   68      1         444
  136      2         233
  272      4         131
  544      8          76
 1088     16          46
 2176     32          ??
[/CODE]
Currently msieve has a max MPI grid dimension of 35. Is increasing this simply a matter of changing the value in include/common.h, or are there possible overflows or other gotchas to watch out for?

BTW, the last half of the 2,1285- linear algebra was run using the KNL nodes, so it works correctly. :smile: |
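As a side note, the revised table can be read as speedup and parallel efficiency relative to the single-node run. The sketch below is only an illustration: the runtimes are the estimates quoted in the table, and the names `runtimes_hrs` and `scaling` are made up for this example rather than taken from msieve.

```python
# Estimated runtimes copied from the revised table: nodes -> hours.
# The 32-node entry is omitted because its runtime is unknown ("??").
runtimes_hrs = {1: 444, 2: 233, 4: 131, 8: 76, 16: 46}


def scaling(runtimes):
    """Return (nodes, speedup, efficiency) tuples relative to the smallest run."""
    base_nodes = min(runtimes)
    base_time = runtimes[base_nodes]
    rows = []
    for nodes in sorted(runtimes):
        speedup = base_time / runtimes[nodes]
        # Efficiency is the achieved speedup divided by the ideal (linear) speedup.
        efficiency = speedup / (nodes / base_nodes)
        rows.append((nodes, speedup, efficiency))
    return rows


if __name__ == "__main__":
    for nodes, speedup, eff in scaling(runtimes_hrs):
        print(f"{nodes:3d} nodes: speedup {speedup:5.2f}, efficiency {eff:5.1%}")
```

By this measure the 16-node run still keeps roughly 60% parallel efficiency, which is one way to judge whether going to larger grids is worthwhile.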
I saw, that was awesome. The maximum grid size is just a definition in the code, but also controls the size of a binary file, so once you change the definition you will be binary incompatible with previous savefiles.
(Just change MAX_MPI_GRID_DIM in common.h) |
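To see why the limit of 35 becomes a constraint, here is a rough sketch that enumerates the MPI grid shapes available for a given node count. It assumes one MPI process per tile (34 per node), as in the layout described earlier, and assumes the limit is inclusive and applies to both grid dimensions. The function `grid_shapes` and its parameters are hypothetical helpers for this illustration, not part of msieve, and the sketch ignores which dimension is rows and which is columns.

```python
def grid_shapes(nodes, procs_per_node=34, max_dim=35):
    """List (small, large) factor pairs of the total process count
    whose larger dimension does not exceed max_dim.

    procs_per_node=34 assumes one MPI process per KNL tile.
    max_dim=35 mirrors the limit mentioned in the thread (assumed inclusive).
    """
    total = nodes * procs_per_node
    shapes = []
    r = 1
    while r * r <= total:
        if total % r == 0:
            c = total // r  # c >= r, so checking c covers both dimensions
            if c <= max_dim:
                shapes.append((r, c))
        r += 1
    return shapes


if __name__ == "__main__":
    for n in (1, 8, 32, 64):
        print(n, "nodes:", grid_shapes(n))
    print("64 nodes with a limit of 64:", grid_shapes(64, max_dim=64))
```

Under these assumptions, 8 nodes allow both the 8x34 and 16x17 grids discussed above, 32 nodes allow only 32x34, and 64 nodes have no valid shape until the limit is raised.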