mersenneforum.org  

Go Back   mersenneforum.org > Factoring Projects > Msieve

Reply
 
Thread Tools
Old 2011-05-20, 09:27   #1
Karl M Johnson
 
Karl M Johnson's Avatar
 
Mar 2010

6338 Posts
Default To CPU or to GPU: stage1 polyselect research

As the topic suggests, I did this little research on stage one of polynomial selection. Jason suggests doing only stage 1 on GPU in the latest Msieve NFS manual, instead of combined search.

You will find all the data in the table below(which is actually a picture).

Now, here are the numbers used:
Code:
c90: 69547303316492118149408022547436991785350544259733986981963408781389041467965565432357913
c100: 776701169123392332156259995503621920821842983186779441829856164218194104805531020576372063083759743
c110: 9852238794222235462151171006209225112694534204051316346901796414666349372723740059709872068581469631350831777
c120: 227026401879198463827454062763506967433946202931364283264305901265796403304094997028726671231101499391384899872706432273
c130: 1142067639655668244421675659299472228892561443039817892634518492620626571274348874167990804267595223355216327568784472374242262183
c140: 2333435938791497921844814519438006187812846866005520174850758490445958886619248051390445682925499819800889470401806186619479812725646985161
And the ranges for all of them:
Code:
1. 1,10000
2. 1000000,1010000
3. 1000000000,1000010000
Now, apart from measuring the running time, I also measured how much strings(lines) were produced by searching all 3 ranges on a certain number, for both GPU and CPU, in msieve.dat.m file. The more the better. I did a sort -u before counting.

Some specs:
Code:
CPU: Core 2 Quad Q6600 @ 3.0 Ghz
GPU: Nvidia GeForce GTX 480 @ 800/950 Mhz with 270.61 ForceWare
OS: W7 x64 with SP1
Msieve versions used:
Code:
CPU - 64 bit binary from latest SVN(thanks Jeff!)
GPU - 64 bit CUDA-enabled binary from latest SVN(thanks Brian!)
I used timethis from windows 2000 resource kit to measure the running time of binaries for each range.


Some conclusions:
1. On C90 and C100, CUDA-enabled Msieve performs better than the one which uses the CPU only.
2. Using some kind of script to utilize all available hardware would be the best way to do stage1 of polynomial selection(sort of obvious).
Attached Thumbnails
Click image for larger version

Name:	research.png
Views:	298
Size:	31.7 KB
ID:	6621  
Karl M Johnson is offline   Reply With Quote
Old 2011-05-20, 12:06   #2
Jeff Gilchrist
 
Jeff Gilchrist's Avatar
 
Jun 2003
Ottawa, Canada

3×17×23 Posts
Default

Interesting, so even at a C140 level, the GPU version still produces more candidates than the CPU version in the range but takes a lot longer.

When you get up to a C232 (for the RSA-768) the GPU version produces many less candidates in the range (50k range at 90M). In this case the GPU and CPU took the same amount of time.

GPU:
using GPU 1 (Tesla T10 Processor)
elapsed time 25:47:14
500 candidates found

CPU:
using a Core i5 760
elapsed time 25:47:11
11456 candidates found
Jeff Gilchrist is offline   Reply With Quote
Old 2011-05-20, 15:27   #3
Karl M Johnson
 
Karl M Johnson's Avatar
 
Mar 2010

3×137 Posts
Default

Quote:
Originally Posted by Jeff Gilchrist View Post
When you get up to a C232 (for the RSA-768) the GPU version produces many less candidates in the range (50k range at 90M).
I wonder why.

Intresting, then using even the latest and greatest GPUs for composites like C232 is a bad idea due to poor candidates generation.
Karl M Johnson is offline   Reply With Quote
Old 2011-05-20, 17:30   #4
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

2·13·137 Posts
Default

Kleinjung's algorithm for stage 1 poly selection can be expressed as either memory-bound or compute-bound. The memory-bound version uses the birthday paradox to become efficient as the problem size goes up, and the compute-bound version uses GPU hardware to perform a large number of tests at once but the amount of work to do increases quadratically as the problem size increases. For small problems, the memory bound version cannot build a large enough hashtable to become efficient, and the GPU code gets more results.

Incidentally, you may also want to include the number of stage 1 hits per second as a more accurate measure of efficiency, because msieve checks for time limit exceeded less often in the GPU code, so that especially for small problems you spend much more time running the GPU version. You can also just run stage 1 and count the number of lines in the .m file that are produced, to avoid being misled by the large amount of debug output the library currently produces. Running stage 2 will confuse the timings, because the CPU and GPU versions will both switch off at random times to possibly spend quite a while running stage 2. The two versions will not produce the same answers.
jasonp is offline   Reply With Quote
Old 2011-11-10, 03:48   #5
RichD
 
RichD's Avatar
 
Sep 2008
Kansas

2×33×71 Posts
Default

What is the root (no pun intended) cause/concern why a polynomial optimization technique can be employed for poly selection? Or is this more of a TSP (Traveling Salesman Problem)?
RichD is offline   Reply With Quote
Old 2011-11-10, 12:05   #6
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

2×13×137 Posts
Default

NFS polynomial selection is divided into two phases; the first is a combinatorial search that is essentially clever brute force, the second involves a range of techniques from applied mathematics, including multivariate minimization. Each of the stages is an active research area, and we're going over techniques for the first stage.

Incidentally, the cuda11_test branch of the msieve SVN repository has a lot of work by jrk that uses GPUs for high-performance sorting, and this makes GPUs drastically better for large problems than they used to be.
jasonp is offline   Reply With Quote
Old 2011-11-11, 04:14   #7
RichD
 
RichD's Avatar
 
Sep 2008
Kansas

383410 Posts
Default

Quote:
Originally Posted by jasonp View Post
... the first is a combinatorial search that is essentially clever brute force, ...
That would have been my follow-up question. Thanks for your reply.

Now, back to the drawing thinking board. :-)
Oh, what's the use...
RichD is offline   Reply With Quote
Old 2011-11-12, 19:35   #8
sleigher
 
Mar 2010

3×17 Posts
Default

Curious about the cuda11_test branch in svn. I have the regular msieve-149 version running on a
GTX 465 Nvidia card and has been running for about 24 hours. So far that has not really given me much.
About 20 poly candidates.

So I compiled the cuda11_test brach on linux 64-bit and am running that on a GT8800 Nvidia card.
That version gave me about 80 candidates in 2 hours with the best shown below. It's way faster.

Just curious why this branch is so much faster.


[root@msieve-linux]# cat number.poly-1
n: 8570085955425991975416682393322092672178022622762559843295498103055560639891369587
621880789145959722489390679513189319822121048710982502573214202944915989
# norm 6.077880e-15 alpha -6.591518 e 3.228e-12 rroots 5
skew: 14813998.57
c0: 271830330224374546052564805914266259840
c1: -928034253578711160424890240906
c2: -26681915219460396213323881
c3: 680113183828316262
c4: 141225440666
c5: 84
Y0: -2521959333809067094313594330127
Y1: 53520081350873741
type: gnfs

Last fiddled with by sleigher on 2011-11-12 at 19:36
sleigher is offline   Reply With Quote
Old 2011-11-12, 19:46   #9
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

2×13×137 Posts
Default

jrk has made a great deal of progress improving the GPU code in the last few months. That branch will get folded into the trunk when he feels it's ready.
jasonp is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Base method of montgomery polyselect implementation csg Homework Help 2 2017-12-13 07:04
Collaborative Research a1call Information & Answers 7 2017-01-29 17:55
Playing with CADO's individual polyselect binaries Dubslow CADO-NFS 22 2016-07-04 19:54
PhD Research Proposal pinhodecarlos Soap Box 2 2012-08-18 22:01
Research topics Robert Holmes Miscellaneous Math 28 2010-06-24 19:09

All times are UTC. The time now is 23:53.


Thu Mar 30 23:53:55 UTC 2023 up 224 days, 21:22, 1 user, load averages: 0.75, 0.88, 0.82

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2023, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.

≠ ± ∓ ÷ × · − √ ‰ ⊗ ⊕ ⊖ ⊘ ⊙ ≤ ≥ ≦ ≧ ≨ ≩ ≺ ≻ ≼ ≽ ⊏ ⊐ ⊑ ⊒ ² ³ °
∠ ∟ ° ≅ ~ ‖ ⟂ ⫛
≡ ≜ ≈ ∝ ∞ ≪ ≫ ⌊⌋ ⌈⌉ ∘ ∏ ∐ ∑ ∧ ∨ ∩ ∪ ⨀ ⊕ ⊗ 𝖕 𝖖 𝖗 ⊲ ⊳
∅ ∖ ∁ ↦ ↣ ∩ ∪ ⊆ ⊂ ⊄ ⊊ ⊇ ⊃ ⊅ ⊋ ⊖ ∈ ∉ ∋ ∌ ℕ ℤ ℚ ℝ ℂ ℵ ℶ ℷ ℸ 𝓟
¬ ∨ ∧ ⊕ → ← ⇒ ⇐ ⇔ ∀ ∃ ∄ ∴ ∵ ⊤ ⊥ ⊢ ⊨ ⫤ ⊣ … ⋯ ⋮ ⋰ ⋱
∫ ∬ ∭ ∮ ∯ ∰ ∇ ∆ δ ∂ ℱ ℒ ℓ
𝛢𝛼 𝛣𝛽 𝛤𝛾 𝛥𝛿 𝛦𝜀𝜖 𝛧𝜁 𝛨𝜂 𝛩𝜃𝜗 𝛪𝜄 𝛫𝜅 𝛬𝜆 𝛭𝜇 𝛮𝜈 𝛯𝜉 𝛰𝜊 𝛱𝜋 𝛲𝜌 𝛴𝜎𝜍 𝛵𝜏 𝛶𝜐 𝛷𝜙𝜑 𝛸𝜒 𝛹𝜓 𝛺𝜔