478__891_13m1 factored
[QUOTE=richs;584027]Taking 478__891_13m1[/QUOTE]
[CODE]p76 factor: 3431470368665622242184217103591598096099932327907456020495877029935727723147
p82 factor: 7615720048137620822959331660523770456031672477991612884633860118579072277536159943[/CODE]
Approximately 6 hours on 6 threads of a Core i7-10510U with 12 GB of memory for a 2.67M matrix at TD=100. Log attached and at [URL="https://pastebin.com/ursm0Hmb"]https://pastebin.com/ursm0Hmb[/URL]. Factors added to factordb.
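For anyone who wants a quick sanity check on the reported factors, here's an illustrative plain-Python snippet (not part of the msieve run): it confirms the digit counts behind the p76/p82 labels and that both factors pass a Miller-Rabin probable-prime test.

```python
import random

def is_probable_prime(n, rounds=16):
    """Miller-Rabin probable-prime test, suitable for big integers."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23):
        if n % p == 0:
            return n == p
    # Write n - 1 = d * 2^s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True

p76 = 3431470368665622242184217103591598096099932327907456020495877029935727723147
p82 = 7615720048137620822959331660523770456031672477991612884633860118579072277536159943

print(len(str(p76)), len(str(p82)))          # 76 82
print(is_probable_prime(p76), is_probable_prime(p82))
```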
[QUOTE=RichD;584512]Pretty impressive. Do you know how big of a matrix can be solved on that 16G card?[/QUOTE]
It's the 32 GB version, but we need to store both the matrix and its transpose on the card in CSR format; otherwise, random reads from global memory kill performance. So we can go up to roughly 25M matrices, probably a little larger, on a single card.

Larger matrices can be divided across multiple GPUs with MPI. Currently, though, we have to transfer vectors off and back onto the GPU for the MPI communication multiple times in each iteration, which introduces a large performance hit. I've added support for CUDA-aware MPI, but OpenMPI still transfers off the card for collective reductions. I believe MVAPICH2-GDR supports collective reductions on the card, but it's still being tested on the SDSC Expanse cluster; hopefully that will be working in a few weeks. For now, quick tests show a 43M matrix on two cards in ~70 hours and an 84M matrix on four cards in ~350 hours.
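For readers wondering why both copies are needed: block Lanczos multiplies by both A and A^T every iteration, and keeping each in its own CSR copy turns both products into sequential row-wise reads, at the cost of doubling storage. Here's a minimal CPU-side Python sketch of the idea; the real code is CUDA operating on word-sized blocks of GF(2) vectors, and all names here are illustrative.

```python
def csr_from_rows(rows):
    """Pack a sparse GF(2) matrix (rows = lists of column indices) into CSR."""
    indptr, indices = [0], []
    for r in rows:
        indices.extend(r)
        indptr.append(len(indices))
    return indptr, indices

def spmv_gf2(indptr, indices, x):
    """y = A*x over GF(2): each x[j] is a machine word holding a block of
    vectors, so the GF(2) sum of selected entries is just XOR of words."""
    y = []
    for i in range(len(indptr) - 1):
        acc = 0
        for j in indices[indptr[i]:indptr[i + 1]]:
            acc ^= x[j]
        y.append(acc)
    return y

def transposed_rows(rows, ncols):
    """Explicit transpose, built once, so A^T*x also streams rows in order."""
    cols = [[] for _ in range(ncols)]
    for i, r in enumerate(rows):
        for j in r:
            cols[j].append(i)
    return cols

# Toy 3x3 matrix; in the linear algebra step the dimension is in the millions.
rows = [[0, 2], [1], [0, 1, 2]]
A = csr_from_rows(rows)
At = csr_from_rows(transposed_rows(rows, 3))

x = [1, 2, 4]              # three block-vector words
y = spmv_gf2(*A, x)        # A*x        -> [5, 2, 7]
z = spmv_gf2(*At, y)       # A^T*(A*x)  -> [2, 5, 2]
print(y, z)
```

Without the second copy, computing A^T*x from the row-major CSR of A means scattered updates across the output vector, which on a GPU becomes exactly the random global-memory traffic described above; trading 2x memory for streaming access is the point.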
275__471_7m1 is factored and posted. The 4.9M matrix took 37 minutes to solve on a V100.