mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Msieve (https://www.mersenneforum.org/forumdisplay.php?f=83)
-   -   Msieve with GNFS support (https://www.mersenneforum.org/showthread.php?t=5413)

frmky 2007-10-31 17:00

[QUOTE=frmky;117431]
Overnight, I am running the matrix with a single thread to approximate how long it would take to complete.
[/QUOTE]

The 6.4M matrix would have taken 7.7 days to complete using a single thread. For matrices of this size on Xeon 5160's, using 4 threads is 1.7x faster. For smaller matrices about 1.5M in size, I measured earlier on the same processors that 4 threads are about 2.5x faster.

Greg

jasonp 2007-10-31 17:18

[QUOTE=frmky;117457]The 6.4M matrix would have taken 7.7 days to complete using a single thread. For matrices of this size on Xeon 5160's, using 4 threads is 1.7x faster. For smaller matrices about 1.5M in size, I measured earlier on the same processors that 4 threads are about 2.5x faster.
[/QUOTE]
Hmm, my guess is that it's harder to run from cache when the matrix is this big. I guess it's time for another poll: what needs to be improved the most right now? For the largest jobs, the filtering, linear algebra and square root all take about the same amount of memory, i.e. too much. I think they all take too long as well.
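
(To give a feel for why a matrix this size falls out of cache: the core operation in block Lanczos is a sparse matrix times vector-block product over GF(2), where every nonzero entry XORs one 64-bit word of the vector block into an accumulator. Below is a minimal sketch of that inner loop; the compressed-column layout and array names are just for illustration, not msieve's actual data structures. With 6.4M rows and columns, each 64-bit-per-entry vector block is already around 50 MB, so the scattered accesses miss the cache almost every time.)

[CODE]
#include <stdint.h>

/* illustrative compressed-column layout: column j owns the entries
   row_idx[col_start[j] .. col_start[j+1]-1], each one a row number */
void mul_sparse_by_block(uint32_t ncols,
                         const uint32_t *col_start,
                         const uint32_t *row_idx,
                         const uint64_t *x,   /* 64 input vectors: one word per column */
                         uint64_t *b)         /* 64 output vectors: one word per row */
{
	for (uint32_t j = 0; j < ncols; j++) {
		uint64_t xj = x[j];
		for (uint32_t k = col_start[j]; k < col_start[j+1]; k++)
			b[row_idx[k]] ^= xj;   /* scattered writes into a ~50 MB array */
	}
}
[/CODE]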

frmky 2007-10-31 18:11

[QUOTE=jasonp;117459]what needs to be improved the most right now?[/QUOTE]

I'm not worried about the memory use. If you're doing jobs [I]this[/I] big, I don't think it's unreasonable to require the computer to have 4 GB of memory. Any linear algebra speed-ups would have the largest effect on overall runtime, so that would seem to have the biggest payoff. Having said that, after running the linear algebra for 4.5 days, waiting two hours for each square root and hoping it finds nontrivial factors is excruciating! :smile: Because of the memory required, running multiple square roots in parallel on the same machine isn't possible. How difficult would it be to thread the FFT? I know FFTW supports it, but that doesn't make it easy, and I understand your decision not to use FFTW because of the licensing.

Greg

jasonp 2007-10-31 20:43

[QUOTE=frmky;117461]I'm not worried about the memory use. If you're doing jobs [I]this[/I] big, I don't think it's unreasonable to require the computer to have 4 GB of memory. Any linear algebra speed-ups would have the largest effect on overall runtime, so that would seem to have the biggest payoff. Having said that, after running the linear algebra for 4.5 days, waiting two hours for each square root and hoping it finds nontrivial factors is excruciating! :smile: Because of the memory required, running multiple square roots in parallel on the same machine isn't possible. How difficult would it be to thread the FFT? I know FFTW supports it, but that doesn't make it easy, and I understand your decision not to use FFTW because of the licensing.
[/QUOTE]
2-way parallelism for the FFT is easy, and other optimizations can make it take less time too. Manually swapping the large numbers to disk can also reduce much of the need for gigs of memory, but that's a lot more work. I'll think about it.
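
(For the curious: the "easy" 2-way split presumably just means running the two forward transforms of an FFT-based big-integer multiply on separate threads, since they are independent of each other; the pointwise product and the inverse transform stay serial. A rough sketch follows; fft_forward, fft_pointwise_mul and fft_inverse are empty stand-ins for whatever transform routines the square root actually uses, not msieve's real interface.)

[CODE]
#include <pthread.h>
#include <stddef.h>

/* empty stubs standing in for real transform routines, just so the sketch compiles */
static void fft_forward(double *data, size_t n)                      { (void)data; (void)n; }
static void fft_pointwise_mul(double *a, const double *b, size_t n)  { (void)a; (void)b; (void)n; }
static void fft_inverse(double *data, size_t n)                      { (void)data; (void)n; }

struct fft_job { double *data; size_t n; };

static void *forward_worker(void *arg)
{
	struct fft_job *job = (struct fft_job *)arg;
	fft_forward(job->data, job->n);
	return NULL;
}

/* multiply two big integers held in FFT buffers a and b of length n;
   the two forward transforms are independent, so one runs on a second
   thread while the main thread handles the other */
void fft_mul_2threads(double *a, double *b, size_t n)
{
	pthread_t t;
	struct fft_job job = { b, n };

	pthread_create(&t, NULL, forward_worker, &job);
	fft_forward(a, n);           /* main thread transforms the other operand */
	pthread_join(t, NULL);

	fft_pointwise_mul(a, b, n);  /* serial: pointwise product... */
	fft_inverse(a, n);           /* ...and inverse transform */
}
[/CODE]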

fivemack 2007-11-01 00:18

[QUOTE=jasonp;117459]Hmm, my guess is that it's harder to run from cache when the matrix is this big. I guess it's time for another poll: what needs to be improved the most right now? For the largest jobs, the filtering, linear algebra and square root all take about the same amount of memory, i.e. too much. I think they all take too long as well.[/QUOTE]

I think they take perfectly reasonable amounts of both memory and time, though it's a bit of a pity to get only a 1.7x speed increase when using four threads; if it's taken several CPU-months to collect the relations, a CPU-week on a 4G 64-bit dual-core to process them is really not a concern.

fivemack 2007-11-01 00:21

[QUOTE=frmky;117431]Hi all,

As Richard posted in the NFSNet Discussion subforum, msieve 1.28 was used to complete the postprocessing of the NFSNet factorization of 5,323-. This one was a bit different in that a total of 87.9M relations were collected, but the CWI tools were used to initially filter the data, removing singletons and cliques to the point that there were 3.4M excess for ideals > 10M. The resultant 26M relations were fed to msieve to complete the filtering.
[/QUOTE]

Was the two-stage process needed because msieve choked when fed all 87.9M relations? I have had one previous example where msieve became much happier once I'd manually deduplicated the input and removed cases where a prime appeared only once on some side.
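
(By "a prime appeared only once on some side" I mean the usual singleton pass: count how many surviving relations contain each large ideal, throw out any relation containing an ideal that appears exactly once, and repeat until nothing changes, since each removal can turn other ideals into singletons. A toy sketch of that loop is below; the flat count array indexed by prime is just for illustration, the real filters hash arbitrary ideals.)

[CODE]
#include <stdio.h>
#include <string.h>

#define MAX_PRIME 100   /* toy bound on prime values, for the flat count array */
#define NRELS 5

struct ideal    { int side, prime; };            /* side 0 = rational, 1 = algebraic */
struct relation { int nideals; struct ideal ideals[4]; int alive; };

int main(void)
{
	struct relation rels[NRELS] = {
		{ 2, {{0, 11}, {1, 13}}, 1 },
		{ 2, {{0, 11}, {1, 13}}, 1 },
		{ 2, {{0, 23}, {1, 17}}, 1 },   /* survives only as long as 23 and 17 do */
		{ 2, {{0, 23}, {1, 19}}, 1 },   /* 19 is a singleton */
		{ 2, {{0, 31}, {1, 17}}, 1 },   /* 31 is a singleton */
	};
	int changed = 1;

	while (changed) {       /* repeat: each pass can create new singletons */
		int counts[2][MAX_PRIME];

		changed = 0;
		memset(counts, 0, sizeof(counts));

		for (int i = 0; i < NRELS; i++)
			if (rels[i].alive)
				for (int j = 0; j < rels[i].nideals; j++)
					counts[rels[i].ideals[j].side][rels[i].ideals[j].prime]++;

		for (int i = 0; i < NRELS; i++) {
			if (!rels[i].alive)
				continue;
			for (int j = 0; j < rels[i].nideals; j++) {
				if (counts[rels[i].ideals[j].side][rels[i].ideals[j].prime] == 1) {
					rels[i].alive = 0;      /* contains a singleton ideal */
					changed = 1;
					break;
				}
			}
		}
	}

	/* relations 3 and 4 go first; that orphans 23 and 17, so relation 2 goes next */
	for (int i = 0; i < NRELS; i++)
		printf("relation %d: %s\n", i, rels[i].alive ? "kept" : "removed");
	return 0;
}
[/CODE]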

frmky 2007-11-01 00:40

[QUOTE=fivemack;117497]Was the two-stage process needed because msieve choked when fed all 87.9M relations? [/QUOTE]
Nope. It was done merely to reduce, by 70%, the number of relations that needed to be transferred and converted from CWI to GGNFS format.

Greg

jbristow 2007-11-11 02:44

If I use the msieve Win32 binary in 64-bit Vista will it be able to use 4GB of memory or will it be limited to 2GB? (I don't know too much about this, but I read about a "large-address aware" option that can be selected at compile time.)

Would I be losing a lot of performance using the Win32 binary instead of trying to compile it myself in a 64-bit environment?

Thanks.

jasonp 2007-11-11 15:44

[QUOTE=jbristow;118195]If I use the msieve Win32 binary in 64-bit Vista will it be able to use 4GB of memory or will it be limited to 2GB? (I don't know too much about this, but I read about a "large-address aware" option that can be selected at compile time.)

Would I be losing a lot of performance using the Win32 binary instead of trying to compile it myself in a 64-bit environment?
[/QUOTE]
Microsoft's compiler does have such an option. I don't know about gcc for 64-bit Windows; supposedly there's a Win64 gcc cross-compiler available for use with MinGW.

Brian Gladman is the primary maintainer of the MSVC project files used for building msieve. A 64-bit build should work fine, though I doubt the resulting binary is much faster than the 32-bit version overall; Greg does report that the linear algebra is faster in 64-bit mode.

Also, I don't know for sure, but I would think that 32-bit binaries in 64-bit Vista would still be limited to 2GB of memory, since the registers used for pointers are still 32 bits in size and array offsets are still signed 32-bit integers. 32-bit Linux binaries should be limited to 3GB of virtual address space, because the OS by default only takes the top 1GB of each process's address space.
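
(For what it's worth, the large-address-aware setting is a linker flag rather than a source change; roughly along these lines, assuming a plain MSVC command-line build or a MinGW gcc build rather than Brian's actual project files:

[CODE]
rem MSVC: mark the 32-bit binary large-address aware at link time
cl /O2 msieve_sources.c /link /LARGEADDRESSAWARE

rem ...or flip the bit on an already-built binary
editbin /LARGEADDRESSAWARE msieve.exe

rem MinGW gcc, if your binutils is new enough to know the option:
gcc -O2 msieve_sources.c -o msieve.exe -Wl,--large-address-aware
[/CODE]

Reportedly a large-address-aware 32-bit process on 64-bit Windows can use close to the full 4GB of address space, but whether that actually helps here depends on the OS granting it.)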

Shiva 2007-11-13 08:07

Linear algebra checkpoints for smaller jobs
 
Hi Jason,

I'm thinking of modifying my copy of msieve so that it does linear algebra checkpointing on NFS jobs that are too "small" for the checkpointing currently. It looks like the changes to the source should be straightforward, but I'm wondering if there are any pitfalls I should watch for, other than not wasting too much time doing the checkpoints. Is there something about smaller matrices that prevents the checkpoints from working?

Thanks for all the msieve work!

Dennis

jasonp 2007-11-13 14:40

[QUOTE=Shiva;118345]
I'm thinking of modifying my copy of msieve so that it does linear algebra checkpointing on NFS jobs that are too "small" for the checkpointing currently. It looks like the changes to the source should be straightforward, but I'm wondering if there are any pitfalls I should watch for, other than not wasting too much time doing the checkpoints. Is there something about smaller matrices that prevents the checkpoints from working?
[/QUOTE]
Nothing I can see; just modify common/lanczos/lanczos.c starting around line 1318.

One thing to watch out for is that the linear algebra has to do at least 4 iterations before checkpointing becomes possible (the code enforces this automatically). Since each block Lanczos iteration handles a 64-vector block, this means the input linear system has to have dimension > 256 or so.
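
(In case it helps anyone making the same change: the shape of it is just a periodic dump of the solver's live state from inside the iteration loop, something like the toy sketch below. The real code has its own file format and its own set of vectors to save, so treat the struct fields and file names here purely as placeholders.)

[CODE]
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

/* toy stand-in for the solver state; the real code would save the
   iteration count plus whatever vector blocks it needs to resume */
struct solver_state {
	uint32_t iter;
	uint32_t n;      /* dimension of the linear system */
	uint64_t *x;     /* one 64-bit word per column */
};

/* write to a temp file and rename it into place, so a crash in the
   middle of a write never clobbers the previous good checkpoint */
static int write_checkpoint(const struct solver_state *s, const char *name)
{
	char tmp[256];
	FILE *f;

	snprintf(tmp, sizeof(tmp), "%s.tmp", name);
	f = fopen(tmp, "wb");
	if (f == NULL)
		return -1;
	fwrite(&s->iter, sizeof(s->iter), 1, f);
	fwrite(&s->n, sizeof(s->n), 1, f);
	fwrite(s->x, sizeof(uint64_t), s->n, f);
	fclose(f);
	return rename(tmp, name);
}

int main(void)
{
	struct solver_state s = { 8, 512, NULL };

	s.x = calloc(s.n, sizeof(uint64_t));
	/* in the real iteration loop this would be guarded by
	   iter >= 4 and done every so many iterations */
	if (write_checkpoint(&s, "toy.chk") != 0)
		perror("checkpoint");
	free(s.x);
	return 0;
}
[/CODE]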

