![]() |
![]() |
#1 |
Mar 2010
6338 Posts |
![]()
As the topic suggests, I did this little research on stage one of polynomial selection. Jason suggests doing only stage 1 on GPU in the latest Msieve NFS manual, instead of combined search.
You will find all the data in the table below(which is actually a picture). Now, here are the numbers used: Code:
c90: 69547303316492118149408022547436991785350544259733986981963408781389041467965565432357913 c100: 776701169123392332156259995503621920821842983186779441829856164218194104805531020576372063083759743 c110: 9852238794222235462151171006209225112694534204051316346901796414666349372723740059709872068581469631350831777 c120: 227026401879198463827454062763506967433946202931364283264305901265796403304094997028726671231101499391384899872706432273 c130: 1142067639655668244421675659299472228892561443039817892634518492620626571274348874167990804267595223355216327568784472374242262183 c140: 2333435938791497921844814519438006187812846866005520174850758490445958886619248051390445682925499819800889470401806186619479812725646985161 Code:
1. 1,10000 2. 1000000,1010000 3. 1000000000,1000010000 Some specs: Code:
CPU: Core 2 Quad Q6600 @ 3.0 Ghz GPU: Nvidia GeForce GTX 480 @ 800/950 Mhz with 270.61 ForceWare OS: W7 x64 with SP1 Code:
CPU - 64 bit binary from latest SVN(thanks Jeff!) GPU - 64 bit CUDA-enabled binary from latest SVN(thanks Brian!) Some conclusions: 1. On C90 and C100, CUDA-enabled Msieve performs better than the one which uses the CPU only. 2. Using some kind of script to utilize all available hardware would be the best way to do stage1 of polynomial selection(sort of obvious). |
![]() |
![]() |
![]() |
#2 |
Jun 2003
Ottawa, Canada
3×17×23 Posts |
![]()
Interesting, so even at a C140 level, the GPU version still produces more candidates than the CPU version in the range but takes a lot longer.
When you get up to a C232 (for the RSA-768) the GPU version produces many less candidates in the range (50k range at 90M). In this case the GPU and CPU took the same amount of time. GPU: using GPU 1 (Tesla T10 Processor) elapsed time 25:47:14 500 candidates found CPU: using a Core i5 760 elapsed time 25:47:11 11456 candidates found |
![]() |
![]() |
![]() |
#3 | |
Mar 2010
3×137 Posts |
![]() Quote:
Intresting, then using even the latest and greatest GPUs for composites like C232 is a bad idea due to poor candidates generation. |
|
![]() |
![]() |
![]() |
#4 |
Tribal Bullet
Oct 2004
2·13·137 Posts |
![]()
Kleinjung's algorithm for stage 1 poly selection can be expressed as either memory-bound or compute-bound. The memory-bound version uses the birthday paradox to become efficient as the problem size goes up, and the compute-bound version uses GPU hardware to perform a large number of tests at once but the amount of work to do increases quadratically as the problem size increases. For small problems, the memory bound version cannot build a large enough hashtable to become efficient, and the GPU code gets more results.
Incidentally, you may also want to include the number of stage 1 hits per second as a more accurate measure of efficiency, because msieve checks for time limit exceeded less often in the GPU code, so that especially for small problems you spend much more time running the GPU version. You can also just run stage 1 and count the number of lines in the .m file that are produced, to avoid being misled by the large amount of debug output the library currently produces. Running stage 2 will confuse the timings, because the CPU and GPU versions will both switch off at random times to possibly spend quite a while running stage 2. The two versions will not produce the same answers. |
![]() |
![]() |
![]() |
#5 |
Sep 2008
Kansas
2×33×71 Posts |
![]()
What is the root (no pun intended) cause/concern why a polynomial optimization technique can be employed for poly selection? Or is this more of a TSP (Traveling Salesman Problem)?
|
![]() |
![]() |
![]() |
#6 |
Tribal Bullet
Oct 2004
2×13×137 Posts |
![]()
NFS polynomial selection is divided into two phases; the first is a combinatorial search that is essentially clever brute force, the second involves a range of techniques from applied mathematics, including multivariate minimization. Each of the stages is an active research area, and we're going over techniques for the first stage.
Incidentally, the cuda11_test branch of the msieve SVN repository has a lot of work by jrk that uses GPUs for high-performance sorting, and this makes GPUs drastically better for large problems than they used to be. |
![]() |
![]() |
![]() |
#7 |
Sep 2008
Kansas
383410 Posts |
![]() |
![]() |
![]() |
![]() |
#8 |
Mar 2010
3×17 Posts |
![]()
Curious about the cuda11_test branch in svn. I have the regular msieve-149 version running on a
GTX 465 Nvidia card and has been running for about 24 hours. So far that has not really given me much. About 20 poly candidates. So I compiled the cuda11_test brach on linux 64-bit and am running that on a GT8800 Nvidia card. That version gave me about 80 candidates in 2 hours with the best shown below. It's way faster. Just curious why this branch is so much faster. [root@msieve-linux]# cat number.poly-1 n: 8570085955425991975416682393322092672178022622762559843295498103055560639891369587 621880789145959722489390679513189319822121048710982502573214202944915989 # norm 6.077880e-15 alpha -6.591518 e 3.228e-12 rroots 5 skew: 14813998.57 c0: 271830330224374546052564805914266259840 c1: -928034253578711160424890240906 c2: -26681915219460396213323881 c3: 680113183828316262 c4: 141225440666 c5: 84 Y0: -2521959333809067094313594330127 Y1: 53520081350873741 type: gnfs Last fiddled with by sleigher on 2011-11-12 at 19:36 |
![]() |
![]() |
![]() |
#9 |
Tribal Bullet
Oct 2004
2×13×137 Posts |
![]()
jrk has made a great deal of progress improving the GPU code in the last few months. That branch will get folded into the trunk when he feels it's ready.
|
![]() |
![]() |
![]() |
Thread Tools | |
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Base method of montgomery polyselect implementation | csg | Homework Help | 2 | 2017-12-13 07:04 |
Collaborative Research | a1call | Information & Answers | 7 | 2017-01-29 17:55 |
Playing with CADO's individual polyselect binaries | Dubslow | CADO-NFS | 22 | 2016-07-04 19:54 |
PhD Research Proposal | pinhodecarlos | Soap Box | 2 | 2012-08-18 22:01 |
Research topics | Robert Holmes | Miscellaneous Math | 28 | 2010-06-24 19:09 |