The filtering does take a ton of memory. A previous CADO paper showed how to split the merge phase across many threads of a single machine, and per the report the filtering machine had 1.5TB of memory.
The other high-memory step is the middle part of block Wiedemann (the linear generator computation), and the more clusters you split the LA across, the more memory that middle part uses.
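To see roughly why splitting across more clusters costs memory in the middle step, here is a back-of-the-envelope sketch. It assumes the usual block Wiedemann setup: blocking factors m and n, and a middle step whose input is a sequence of about N/m + N/n matrices of size m x n, where N is the matrix dimension. It also assumes that running on more clusters means more independent sequences and so a larger n. The function name and the numbers in the usage line are made up for illustration, and this counts only the input sequence; it is not how CADO-NFS accounts for memory, and real peak usage is some multiple of it.

```python
import math

def lingen_input_bits(N, m, n, coeff_bits=1):
    """Rough size in bits of the sequence fed to the middle step.

    N          -- matrix dimension (hypothetical value supplied by caller)
    m, n       -- block Wiedemann blocking factors
    coeff_bits -- bits per coefficient (1 over GF(2); larger for a prime field)
    """
    # About N/m + N/n terms, each an m x n matrix of coefficients.
    length = math.ceil(N / m) + math.ceil(N / n)
    return m * n * length * coeff_bits

# Illustrative only: doubling n (and m with it) roughly doubles the input size.
small = lingen_input_bits(100_000_000, 128, 64)
large = lingen_input_bits(100_000_000, 256, 128)
print(large / small)
```

The product m * n * (N/m + N/n) simplifies to about (m + n) * N, so under these assumptions the input grows linearly with the blocking factors.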
The square root does not need high memory if you use Montgomery's very complex square root algorithm, or you can throw memory at the problem and use the much simpler algorithm that CADO-NFS uses.