mersenneforum.org

jacky 2017-10-09 03:00

Weird results of Msieve
 
I tried factoring a C100 on two machines, each with a GPU:

USE_CUDA = True

Command: "..\factMsieve.py example.n"

[COLOR="darkred"]1. CPU: i7 3.60GHz, RAM: 8G, GPU: GTX 745, elapsed time: 7 hours[/COLOR]

[COLOR="darkred"]2. CPU: E5-2670 2.60GHz, RAM: 64G, GPU: Tesla P100-PCIE-16GB, elapsed time: 16 hours[/COLOR]


The result confused me. It seems that the GPU is not being used. Why is the Tesla so much slower?

And when I test only the first step, the command "./msieve -np1 -nps "stage1norm=xeyy stage2norm=uevv x,y" -t 4 -s gpu0polyfile" gives the same result as "./msieve -np1 -nps "stage1norm=xeyy stage2norm=uevv x,y" -g 0 -t 8 -s gpu0polyfile". It seems that neither the GPU nor the extra threads have any effect. How should I choose the number of threads (-t number)?

wombatman 2017-10-09 03:04

Leave off "-nps". Only the "-np1" step is GPU-enabled. The "-nps" and "-npr" steps are CPU-only and single-threaded.
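
To make the split concrete, here is a minimal Python sketch that builds a separate msieve command line for each polynomial-selection stage, so that -g and -t are only passed where they are said to matter. It uses only flags that appear in this thread (-np1, -nps, -npr, -g, -t, -s). The helper name stage_command and the "./msieve" path are hypothetical, and the save-file name is simply the one from the first post.

```python
def stage_command(stage, savefile, gpu=None, threads=None, msieve="./msieve"):
    """Build an argv list for one polynomial-selection stage.

    stage is one of "np1", "nps", "npr". Following the advice above, -g and
    -t are only added for np1, because nps and npr are described as
    CPU-only and single-threaded.
    """
    if stage not in ("np1", "nps", "npr"):
        raise ValueError(f"unknown stage: {stage}")
    argv = [msieve, f"-{stage}"]
    if stage == "np1":
        if gpu is not None:
            argv += ["-g", str(gpu)]
        if threads is not None:
            argv += ["-t", str(threads)]
    argv += ["-s", savefile]
    return argv


if __name__ == "__main__":
    # Print one command line per stage for inspection; nothing is executed.
    for stage in ("np1", "nps", "npr"):
        print(" ".join(stage_command(stage, "gpu0polyfile", gpu=0, threads=4)))
```

The function only assembles arguments, so it can be checked on a machine without msieve installed.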

jacky 2017-10-09 03:21

Thanks.
For the first result, why is the Tesla machine slower when factoring via the Python script? Is it for the same reason?

wombatman 2017-10-09 04:07

You won't be able to tell if the Tesla is actually slower until you run the "-np1" step by itself. If a given CPU is slower, then running the CPU-bound -nps and -npr steps would make the overall process slower.
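
One way to compare the machines stage by stage is to time each command on its own. The sketch below is a generic wall-clock timer in Python and is not part of msieve or factMsieve.py; the function name time_command is hypothetical. The placeholder command only starts a Python interpreter so that the sketch runs anywhere; substitute the real "-np1" command line on each machine.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv to completion and return (exit_code, elapsed_seconds)."""
    start = time.perf_counter()
    result = subprocess.run(argv)
    return result.returncode, time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder command; replace with the msieve -np1 argv to measure it.
    code, seconds = time_command([sys.executable, "-c", "pass"])
    print(f"exit code {code}, {seconds:.2f} s")
```

If a stage runs up to a time limit rather than to completion, elapsed time alone says little; in that case compare how much output each machine produced in the same time.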

VBCurtis 2017-10-09 04:18

I don't think a GPU is any faster than CPU for a 100-digit input. That job is so small that invoking the GPU at all is a waste of effort; you may not be able to tell which is faster when the GPU works for something like 2 minutes.

GPU-enhanced poly select doesn't become notably more powerful than CPU until 140 digits or so.
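
As a rough illustration of that rule of thumb, a script could check the size of the input before bothering with the GPU. This is only a sketch: the function name should_try_gpu is hypothetical, and the 140-digit default is the approximate figure given above, not a measured cutoff.

```python
def should_try_gpu(n, min_digits=140):
    """Return True if n has at least min_digits decimal digits."""
    return len(str(abs(n))) >= min_digits


if __name__ == "__main__":
    c100 = 10**99    # smallest 100-digit number, stands in for a C100
    c155 = 10**154   # smallest 155-digit number, stands in for a C155
    print(should_try_gpu(c100), should_try_gpu(c155))
```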

jacky 2017-10-09 06:57

Thanks. Based on your answer, I think I have found the problem.
Now I am testing -np1 on a 155-digit number on both machines with the same command: "msieve.exe -np1 -g 0 -t 4 -s gpu0polyfile". I want to know whether the GPU is being used.
Is this the right command? How many hours does -np1 usually take?

jacky 2017-10-09 07:27

Now the Tesla P100 is 4 times faster than the GTX 745. That is less than I hoped for; I think the Tesla should be much faster. Should I use more threads?

chris2be8 2017-10-09 16:08

2 Attachment(s)
Here is the log from my run to generate a poly for a C167, which took 7:19:19. Note I only ran -npr on the best 200 lines output by -np1 -nps. Limiting the range of leading coefficients to search would speed it up.

I've also attached the Perl script I ran to generate it. It's designed to work on Linux and requires the UNIX utilities sort and tail, but it might work in a UNIX-like environment under Windows. You would need to update the paths to resources in it, particularly GPUSORT and GPUPTX.
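
For readers without sort and tail, the "keep only the best 200 lines" step can be sketched in Python. This is not the attached Perl script. It assumes each candidate line carries a numeric score in one whitespace-separated column; which column that is, and whether smaller or larger is better, are assumptions to check against your own -nps output. The function name best_lines and its parameters are hypothetical.

```python
def best_lines(lines, keep=200, column=-1, smaller_is_better=True):
    """Return the `keep` best lines, ranked by the numeric field in `column`.

    Lines that are blank or have no numeric value in that column are skipped.
    """
    scored = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        try:
            score = float(fields[column])
        except (ValueError, IndexError):
            continue  # no usable score on this line
        scored.append((score, line))
    # Stable sort on the score only, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=not smaller_is_better)
    return [line for _, line in scored[:keep]]


if __name__ == "__main__":
    sample = ["1 2 3.5e10", "4 5 1.2e10", "", "7 8 2.0e10"]
    for line in best_lines(sample, keep=2):
        print(line)
```

The selected lines would then be written to the file that the -npr step reads.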

I hope you find it useful.

Chris

jacky 2017-10-10 01:41

Thank you very much. It's very useful to me. But I have a question: will limiting the range of leading coefficients to search cause polynomial selection to fail?

VBCurtis 2017-10-10 05:47

Nope.

chris2be8 2017-10-10 16:09

That run searched for leading coefficients between 1 and 8000. Look at the LCmin and LCmax variables in the script. Or put them into the .n file as follows: [code]
lcmin: 1
lcmax: 8000
[/code]
LCstep will be ignored in a GPU run; it's only useful if you have several systems searching for a poly by CPU.

The time taken to search varies with the leading coefficient. Searching from 8000 to 16000 should take less time than 1 to 8000. So you can search a wider range if the coefficients are larger.
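
If the search is split into several runs, the lcmin/lcmax pairs can be generated rather than typed by hand. The Python sketch below cuts a range into consecutive, non-overlapping chunks and formats each as the two .n-file lines shown above. The function names lc_ranges and n_file_lines are hypothetical, and equal-width chunks are a simplification: since higher ranges are said to search faster, equal widths will not give equal run times.

```python
def lc_ranges(lc_min, lc_max, width):
    """Yield consecutive (lcmin, lcmax) pairs covering lc_min..lc_max."""
    if width <= 0:
        raise ValueError("width must be positive")
    lo = lc_min
    while lo <= lc_max:
        hi = min(lo + width - 1, lc_max)
        yield lo, hi
        lo = hi + 1


def n_file_lines(lo, hi):
    """Format one range in the lcmin/lcmax syntax used in the .n file."""
    return f"lcmin: {lo}\nlcmax: {hi}\n"


if __name__ == "__main__":
    for lo, hi in lc_ranges(1, 16000, 8000):
        print(n_file_lines(lo, hi))
```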

What range of leading coefficients to search for a given size of number is another issue. I've not had enough experience to offer advice.

Chris


All times are UTC.