20110520, 09:27  #1 
Mar 2010
110011011_{2} Posts 
To CPU or to GPU: stage1 polyselect research
As the topic suggests, I did this little research on stage one of polynomial selection. Jason suggests doing only stage 1 on GPU in the latest Msieve NFS manual, instead of combined search.
You will find all the data in the table below(which is actually a picture). Now, here are the numbers used: Code:
c90: 69547303316492118149408022547436991785350544259733986981963408781389041467965565432357913 c100: 776701169123392332156259995503621920821842983186779441829856164218194104805531020576372063083759743 c110: 9852238794222235462151171006209225112694534204051316346901796414666349372723740059709872068581469631350831777 c120: 227026401879198463827454062763506967433946202931364283264305901265796403304094997028726671231101499391384899872706432273 c130: 1142067639655668244421675659299472228892561443039817892634518492620626571274348874167990804267595223355216327568784472374242262183 c140: 2333435938791497921844814519438006187812846866005520174850758490445958886619248051390445682925499819800889470401806186619479812725646985161 Code:
1. 1,10000 2. 1000000,1010000 3. 1000000000,1000010000 Some specs: Code:
CPU: Core 2 Quad Q6600 @ 3.0 Ghz GPU: Nvidia GeForce GTX 480 @ 800/950 Mhz with 270.61 ForceWare OS: W7 x64 with SP1 Code:
CPU  64 bit binary from latest SVN(thanks Jeff!) GPU  64 bit CUDAenabled binary from latest SVN(thanks Brian!) Some conclusions: 1. On C90 and C100, CUDAenabled Msieve performs better than the one which uses the CPU only. 2. Using some kind of script to utilize all available hardware would be the best way to do stage1 of polynomial selection(sort of obvious). 
20110520, 12:06  #2 
Jun 2003
Ottawa, Canada
491_{16} Posts 
Interesting, so even at a C140 level, the GPU version still produces more candidates than the CPU version in the range but takes a lot longer.
When you get up to a C232 (for the RSA768) the GPU version produces many less candidates in the range (50k range at 90M). In this case the GPU and CPU took the same amount of time. GPU: using GPU 1 (Tesla T10 Processor) elapsed time 25:47:14 500 candidates found CPU: using a Core i5 760 elapsed time 25:47:11 11456 candidates found 
20110520, 15:27  #3  
Mar 2010
411_{10} Posts 
Quote:
Intresting, then using even the latest and greatest GPUs for composites like C232 is a bad idea due to poor candidates generation. 

20110520, 17:30  #4 
Tribal Bullet
Oct 2004
110111001110_{2} Posts 
Kleinjung's algorithm for stage 1 poly selection can be expressed as either memorybound or computebound. The memorybound version uses the birthday paradox to become efficient as the problem size goes up, and the computebound version uses GPU hardware to perform a large number of tests at once but the amount of work to do increases quadratically as the problem size increases. For small problems, the memory bound version cannot build a large enough hashtable to become efficient, and the GPU code gets more results.
Incidentally, you may also want to include the number of stage 1 hits per second as a more accurate measure of efficiency, because msieve checks for time limit exceeded less often in the GPU code, so that especially for small problems you spend much more time running the GPU version. You can also just run stage 1 and count the number of lines in the .m file that are produced, to avoid being misled by the large amount of debug output the library currently produces. Running stage 2 will confuse the timings, because the CPU and GPU versions will both switch off at random times to possibly spend quite a while running stage 2. The two versions will not produce the same answers. 
20111110, 03:48  #5 
Sep 2008
Kansas
2^{2}×811 Posts 
What is the root (no pun intended) cause/concern why a polynomial optimization technique can be employed for poly selection? Or is this more of a TSP (Traveling Salesman Problem)?

20111110, 12:05  #6 
Tribal Bullet
Oct 2004
2·3·19·31 Posts 
NFS polynomial selection is divided into two phases; the first is a combinatorial search that is essentially clever brute force, the second involves a range of techniques from applied mathematics, including multivariate minimization. Each of the stages is an active research area, and we're going over techniques for the first stage.
Incidentally, the cuda11_test branch of the msieve SVN repository has a lot of work by jrk that uses GPUs for highperformance sorting, and this makes GPUs drastically better for large problems than they used to be. 
20111111, 04:14  #7 
Sep 2008
Kansas
2^{2}·811 Posts 

20111112, 19:35  #8 
Mar 2010
3·17 Posts 
Curious about the cuda11_test branch in svn. I have the regular msieve149 version running on a
GTX 465 Nvidia card and has been running for about 24 hours. So far that has not really given me much. About 20 poly candidates. So I compiled the cuda11_test brach on linux 64bit and am running that on a GT8800 Nvidia card. That version gave me about 80 candidates in 2 hours with the best shown below. It's way faster. Just curious why this branch is so much faster. [root@msievelinux]# cat number.poly1 n: 8570085955425991975416682393322092672178022622762559843295498103055560639891369587 621880789145959722489390679513189319822121048710982502573214202944915989 # norm 6.077880e15 alpha 6.591518 e 3.228e12 rroots 5 skew: 14813998.57 c0: 271830330224374546052564805914266259840 c1: 928034253578711160424890240906 c2: 26681915219460396213323881 c3: 680113183828316262 c4: 141225440666 c5: 84 Y0: 2521959333809067094313594330127 Y1: 53520081350873741 type: gnfs Last fiddled with by sleigher on 20111112 at 19:36 
20111112, 19:46  #9 
Tribal Bullet
Oct 2004
2×3×19×31 Posts 
jrk has made a great deal of progress improving the GPU code in the last few months. That branch will get folded into the trunk when he feels it's ready.

Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Base method of montgomery polyselect implementation  csg  Homework Help  2  20171213 07:04 
Collaborative Research  a1call  Information & Answers  7  20170129 17:55 
Playing with CADO's individual polyselect binaries  Dubslow  CADONFS  22  20160704 19:54 
PhD Research Proposal  pinhodecarlos  Soap Box  2  20120818 22:01 
Research topics  Robert Holmes  Miscellaneous Math  28  20100624 19:09 