Weird results of Msieve
I tried factoring a C100 on two machines with GPUs:
USE_CUDA = True Command: "..\factMsieve.py example.n" [COLOR="darkred"]1. CPU: i7 3.60GHz, RAM: 8G, GTX 745, elapsed time 7 hours[/COLOR] [COLOR="darkred"]2. CPU: E5-2670 2.60GHz, RAM: 64G, Tesla P100-PCIE-16GB, elapsed time 16 hours[/COLOR] The result confused me. It seems that the GPU is not being used. Why is the Tesla so much slower? Also, if I test only the first step, the command "./msieve -np1 -nps "stage1norm=xeyy stage2norm=uevv x,y" -t 4 -s gpu0polyfile" gives the same result as "./msieve -np1 -nps "stage1norm=xeyy stage2norm=uevv x,y" -g 0 -t 8 -s gpu0polyfile". It seems that neither the GPU nor multithreading has any effect. How should I choose the number of threads (-t number)? |
[QUOTE=jacky;469436]I tried factoring a C100 on two machines with GPUs:
USE_CUDA = True Command: "..\factMsieve.py example.n" [COLOR="darkred"]1. CPU: i7 3.60GHz, RAM: 8G, GTX 745, elapsed time 7 hours[/COLOR] [COLOR="darkred"]2. CPU: E5-2670 2.60GHz, RAM: 64G, Tesla P100-PCIE-16GB, elapsed time 16 hours[/COLOR] The result confused me. It seems that the GPU is not being used. Why is the Tesla so much slower? Also, if I test only the first step, the command "./msieve -np1 -nps "stage1norm=xeyy stage2norm=uevv x,y" -t 4 -s gpu0polyfile" gives the same result as "./msieve -np1 -nps "stage1norm=xeyy stage2norm=uevv x,y" -g 0 -t 8 -s gpu0polyfile". It seems that neither the GPU nor multithreading has any effect. How should I choose the number of threads (-t number)?[/QUOTE] Leave off "-nps". Only the "-np1" step is GPU-enabled. The "-nps" and "-npr" steps are CPU-only and single-threaded. |
[QUOTE=wombatman;469437]Leave off "-nps". Only the "-np1" step is GPU-enabled. The "-nps" and "-npr" steps are CPU-only and single-threaded.[/QUOTE]
Thanks. For the first result, why is the Tesla slower when factoring with the Python script? Is it for the same reason? |
[QUOTE=jacky;469439]Thanks.
For the first result, why is the Tesla slower when factoring with the Python script? Is it for the same reason?[/QUOTE] You won't be able to tell whether the Tesla is actually slower until you run the "-np1" step by itself. If a given CPU is slower, then running the CPU-bound -nps and -npr steps on it would make the overall process slower. |
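To compare the two cards on the GPU-enabled stage alone, one option is to time the "-np1" command separately on each machine. The following is a minimal sketch of such a timing helper; `time_command` is a hypothetical name, and the msieve command line in the comment is copied from this thread, so the path and flags may need adjusting for your build.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return (exit_code, elapsed_seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(cmd)
    elapsed = time.perf_counter() - start
    return completed.returncode, elapsed


if __name__ == "__main__":
    # Placeholder command so the sketch runs anywhere. To time the GPU stage,
    # replace it with the command from this thread, for example:
    # ["./msieve", "-np1", "-g", "0", "-t", "4", "-s", "gpu0polyfile"]
    demo_cmd = [sys.executable, "-c", "pass"]
    code, seconds = time_command(demo_cmd)
    print(f"exit code {code}, {seconds:.2f} s")
```

Running the same command on both machines and comparing the elapsed times keeps the CPU-only -nps and -npr stages out of the measurement.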
I don't think a GPU is any faster than CPU for a 100-digit input. That job is so small that invoking the GPU at all is a waste of effort; you may not be able to tell which is faster when the GPU works for something like 2 minutes.
GPU-enhanced poly select doesn't become notably more powerful than CPU until 140 digits or so. |
[QUOTE=VBCurtis;469442]I don't think a GPU is any faster than CPU for a 100-digit input. That job is so small that invoking the GPU at all is a waste of effort; you may not be able to tell which is faster when the GPU works for something like 2 minutes.
GPU-enhanced poly select doesn't become notably more powerful than CPU until 140 digits or so.[/QUOTE] Thanks. Based on your answer, I think I have found the problem. I am now testing -np1 on a 155-digit number on both machines with the same command, "msieve.exe -np1 -g 0 -t 4 -s gpu0polyfile". Is this the right way to check whether the GPU is working? How many hours does -np1 usually take? |
Now the Tesla P100 is 4 times faster than the GTX 745. That still seems short of what I expected; I think the Tesla should be much faster. Should I use more threads?
|
2 Attachment(s)
Here is the log from my run to generate a poly for a C167, which took 7:19:19. Note I only ran -npr on the best 200 lines output by -np1 -nps. Limiting the range of leading coefficients to search would speed it up.
I've also attached the Perl script I ran to generate it. It's designed to work on Linux and requires the UNIX utilities sort and tail, but it might work in a UNIX-like environment under Windows. You would need to update the paths to resources in it, particularly GPUSORT and GPUPTX. I hope you find it useful. Chris |
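For readers without sort and tail, the "keep only the best 200 lines" step can be approximated in Python. This is a rough sketch, not the attached Perl script: `best_lines` is a hypothetical helper, and both the score column (assumed here to be the last whitespace-separated field) and the ranking direction (assumed smaller is better) are assumptions to check against your own output file.

```python
def best_lines(lines, keep=200, column=-1, smaller_is_better=True):
    """Return the `keep` best lines, ranked by a numeric whitespace-separated column."""
    scored = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        try:
            score = float(fields[column])
        except (ValueError, IndexError):
            continue  # skip lines without a numeric value in that column
        scored.append((score, line))
    # Sort ascending when smaller scores are better, descending otherwise.
    scored.sort(key=lambda pair: pair[0], reverse=not smaller_is_better)
    return [line for _, line in scored[:keep]]


if __name__ == "__main__":
    # Invented sample data, only to show the call; real lines come from the
    # file written by the -np1 -nps run.
    sample = ["a 3.0", "b 1.0", "", "c 2.0", "d x"]
    print(best_lines(sample, keep=2))
```

The surviving lines would then be written to the file that the -npr step reads.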
[QUOTE=chris2be8;469484]Here is the log from my run to generate a poly for a C167, which took 7:19:19. Note I only ran -npr on the best 200 lines output by -np1 -nps. Limiting the range of leading coefficients to search would speed it up.
I've also attached the Perl script I ran to generate it. It's designed to work on Linux and requires the UNIX utilities sort and tail, but it might work in a UNIX-like environment under Windows. You would need to update the paths to resources in it, particularly GPUSORT and GPUPTX. I hope you find it useful. Chris[/QUOTE] Thank you very much. It's very useful to me. I have one question, though: if I limit the range of leading coefficients to search, could polynomial selection fail? |
[QUOTE=jacky;469523]Thank you very much. It's very useful to me. I have one question, though: if I limit the range of leading coefficients to search, could polynomial selection fail?[/QUOTE]
Nope. |
That run searched for leading coefficients between 1 and 8000. Look at the LCmin and LCmax variables in the script. Or put them into the .n file as follows: [code]
lcmin: 1
lcmax: 8000
[/code] LCstep will be ignored in a GPU run; it's only useful if you have several systems searching for a poly by CPU. The time taken to search varies with the leading coefficient. Searching from 8000 to 16000 should take less time than 1 to 8000, so you can search a wider range if the coefficients are larger. What range of leading coefficients to search for a given size of number is another issue. I've not had enough experience to offer advice. Chris |
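If several systems share a CPU search, the leading-coefficient range has to be divided between them. The sketch below shows one simple way to do that and to print the matching lcmin/lcmax lines; `split_lc_range` and `lc_lines` are hypothetical helpers, not part of msieve or the attached script, and the equal-width split is an assumption made for simplicity.

```python
def split_lc_range(lc_min, lc_max, parts):
    """Split [lc_min, lc_max] into consecutive, non-overlapping ranges."""
    if parts < 1 or lc_max < lc_min:
        raise ValueError("need parts >= 1 and lc_max >= lc_min")
    total = lc_max - lc_min + 1
    parts = min(parts, total)  # never create empty ranges
    base, extra = divmod(total, parts)
    ranges = []
    start = lc_min
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # spread the remainder
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def lc_lines(lc_min, lc_max):
    """Format one range in the lcmin/lcmax style shown in the post above."""
    return f"lcmin: {lc_min}\nlcmax: {lc_max}\n"


if __name__ == "__main__":
    for low, high in split_lc_range(1, 8000, 4):
        print(lc_lines(low, high))
```

Because search time varies with the leading coefficient, equal-width ranges will not necessarily take equal time; wider ranges at the high end may balance the work better.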