mersenneforum.org


Ken_g6 2010-09-04 21:37

TPSieve CUDA Testing Thread
 
You asked for it, and I've finally made it. Download TPSieve-CUDA [url=https://sites.google.com/site/kenscode/prime-programs/tpsieve-cuda.zip?attredirects=0]here[/url]. :smile:

I haven't done extensive testing on twin primes, so somebody should probably sieve a short range (100G? Perhaps 1T?) with TPSieve-CUDA and check that it finds the same factors as earlier sieving.
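A check like this can be scripted. Below is a minimal Python sketch, assuming each result file contains one `p | k*2^n+1` or `p | k*2^n-1` line per factor, as in the outputs posted in this thread. Progress and banner lines are skipped. The script name and file names in the usage comment are hypothetical.

```python
import re
import sys

# Matches factor lines such as "710005185071411 | 5012115*2^481782+1".
FACTOR_RE = re.compile(r"^\s*(\d+)\s*\|\s*(\d+)\*2\^(\d+)([+-])1\s*$")


def parse_factors(lines):
    """Return a set of (p, k, n, sign) tuples, skipping progress and banner lines."""
    found = set()
    for line in lines:
        m = FACTOR_RE.match(line)
        if m:
            p, k, n, sign = m.groups()
            found.add((int(p), int(k), int(n), sign))
    return found


def compare(old_lines, new_lines):
    """Return (missing, extra): factors found only by the old run, and only by the new run."""
    old, new = parse_factors(old_lines), parse_factors(new_lines)
    return sorted(old - new), sorted(new - old)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python compare_factors.py reference.txt tpsieve_cuda.txt
    with open(sys.argv[1]) as ref, open(sys.argv[2]) as gpu:
        missing, extra = compare(ref, gpu)
    print(f"missing from new run: {len(missing)}; extra in new run: {len(extra)}")
    for f in missing + extra:
        print(f)
```

Both files should cover the same p range. Otherwise, factors outside the overlap will show up as missing or extra.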

I hope it works well for everyone!

Karl M Johnson 2010-09-04 22:18

Could you please give an example to run? I see it needs the cudart library; that file comes with the CUDA SDK. Was it compiled with CUDA toolkit 3.1?

Ken_g6 2010-09-04 22:25

OK. Supposing you downloaded the 480000-484999_30aug2010.txt sieve file, if you run:

[code]./tpsieve-cuda-x86_64-linux -i 480000-484999_30aug2010.txt -p 710005180000000 -P 710005200000000[/code]

It should output:
[code]710005185071411 | 5012115*2^481782+1
710005192340203 | 4018161*2^483419-1[/code]

very quickly. (I tested this on the emulator, so it runs really slowly for me!) Expand the range, and you should get more of [url=http://www.sendspace.com/file/4frdyp]Mdettweiler's results[/url].

[code]./tpsieve-cuda-x86_64-linux -i 480000-484999_30aug2010.txt -p 710T -P 715T[/code]

would produce all of them, for instance.
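Each result line reads `p | k*2^n+1` (or `-1`), meaning p divides that candidate. Any such line can be double-checked independently with modular exponentiation, without ever building the roughly 145,000-digit number. This is a minimal Python sketch of that check. It is not how TPSieve itself sieves.

```python
def divides(p: int, k: int, n: int, sign: int) -> bool:
    """True if p divides k*2^n + sign, where sign is +1 or -1."""
    # pow(2, n, p) reduces mod p after every squaring step, so the
    # ~145,000-digit number k*2^n+/-1 is never actually built.
    return (k * pow(2, n, p) + sign) % p == 0


def check_line(line: str) -> bool:
    """Check one result line of the form 'p | k*2^n+1' or 'p | k*2^n-1'."""
    p_str, expr = (part.strip() for part in line.split("|"))
    k_str, rest = expr.split("*2^")
    sign = 1 if rest.endswith("+1") else -1
    return divides(int(p_str), int(k_str), int(rest[:-2]), sign)


for line in ["710005185071411 | 5012115*2^481782+1",
             "710005192340203 | 4018161*2^483419-1"]:
    print(line, "->", "factor" if check_line(line) else "NOT a factor")
```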

Edit: Compiled with the 2.3 toolkit. You can get the appropriate libcudart.so [url=http://www.primegrid.com/download/libcudart.so.2.32bit]here (32-bit)[/url] or [url=http://www.primegrid.com/download/libcudart.so.2.64bit]here (64-bit)[/url].

Karl M Johnson 2010-09-05 06:50

:no:
Where do I get that fancy file?
I don't need that libcudart.so file; I'm on Windows.

Karl M Johnson 2010-09-05 08:47

Ah, I got it working.
I found that fancy file in [URL="http://mersenneforum.org/showthread.php?t=12260"]this thread[/URL].
Here's the output:
[code]
tpsieve-cuda>tpsieve-cuda-x86-windows.exe -i 480000-484999_30aug2010.txt -p 710T -P 715T
tpsieve version cuda-0.1.5b (testing)
Found K's from 3 to 9999999.
Found N's from 480000 to 484999.
nstart=480000, nstep=27, gpu_nstep=27
Read 18013513 terms from NewPGen format input file `480000-484999_30aug2010.txt'
ppsieve initialized: 3 <= k <= 9999999, 480000 <= n <= 484999
Sieve started: 710000000000000 <= p < 715000000000000
Thread 0 starting
Detected GPU 0: GeForce GTX 285
Detected compute capability: 1.3
Detected 30 multiprocessors.
710001064441429 | 1473435*2^480477+1
710001781836203 | 3090555*2^482969+1
710002017069043 | 1947711*2^484889-1
710002639870109 | 7153191*2^483771+1
710003699276149 | 5489211*2^481645-1
710004831474721 | 9156609*2^482469+1
710005185071411 | 5012115*2^481782+1
710005192340203 | 4018161*2^483419-1
710005390472317 | 3240861*2^484861+1
710005916032213 | 5469669*2^482131+1
710006212449883 | 9438471*2^480253+1
710006478541837 | 942801*2^484681-1
p=710007273971713, 121.2M p/sec, 0.34 CPU cores, 0.1% done. ETA 05 Sep 23:11
710007380861971 | 3067731*2^482247-1
710007392845019 | 7483995*2^483443-1
710007480582299 | 1724049*2^480073-1
710008202353481 | 5813421*2^481371-1
710008811001043 | 9322383*2^480292-1
710008912579171 | 6024705*2^482149-1
710009562402587 | 5037609*2^482129-1
710010162887723 | 6614673*2^481762+1
710010987465557 | 6749691*2^483663+1
710011016356171 | 1349535*2^480408-1
710011368918931 | 7281273*2^482722-1
710011521417881 | 8617299*2^483945+1
710013019046899 | 3562503*2^481238-1
710013536554247 | 2683773*2^482840-1
p=710013762297857, 108.1M p/sec, 0.46 CPU cores, 0.3% done. ETA 05 Sep 23:50
710013880633081 | 4357815*2^480333+1
710013961546411 | 6488649*2^484015+1
710014319798129 | 1676877*2^480670-1
710014611723727 | 3195591*2^483289+1
710015165703751 | 1844445*2^483863+1
710016591664817 | 2155857*2^482100+1
710017445315627 | 9930375*2^480732-1
710017473222427 | 8642289*2^480555+1
710018153777579 | 5008965*2^484938-1
710018465445529 | 9185721*2^480167+1
p=710019807338497, 100.7M p/sec, 0.50 CPU cores, 0.4% done. ETA 06 Sep 00:21
710020260919457 | 1584663*2^483746-1
[/code]Now, would compiling x64 Windows binaries cause trouble?
I've had an "out of memory" error, even though I had about 1GB out of 4GB free.


P.S.
It's not using the GPU fully.
Peak GPU usage is reported as 40%.
But I guess you already know that?

amphoria 2010-09-05 09:16

I tried it on a GTX465 under 64-bit Linux, using a range I had already tested so that I could compare the results. However, I didn't get very far before hitting an error.

[QUOTE]./tpsieve-cuda-x86_64-linux -i 480000-484999_19jun2010.txt -p 510T -P 515T
tpsieve version cuda-0.1.5b (testing)
Compiled Sep 4 2010 with GCC 4.3.3
Found K's from 3 to 9999999.
Found N's from 480000 to 484999.
nstart=480000, nstep=26, gpu_nstep=26
Read 18977477 terms from NewPGen format input file `480000-484999_19jun2010.txt'
ppsieve initialized: 3 <= k <= 9999999, 480000 <= n <= 484999
Sieve started: 510000000000000 <= p < 515000000000000
Thread 0 starting
Detected GPU 0: GeForce GTX 465
Detected compute capability: 2.0
Detected 11 multiprocessors.
510000064759291 | 604839*2^481707-1
510000994356869 | 2198475*2^482446+1
510001808585051 | 6049827*2^482948+1
510001965458981 | 9867039*2^480087-1
510002179900517 | 3334131*2^481253+1
510002930897567 | 8814495*2^481041+1
510003018137897 | 7665489*2^480401-1
510003129240001 | 4959981*2^480291+1
510003356427241 | 2391561*2^483615-1
510003644411923 | 7580307*2^484486-1
510003728553343 | 8313309*2^482255-1
510003886955161 | 3607413*2^482256-1
510004210312339 | 5073345*2^483515-1
Cuda error: cudaStreamCreate: out of memory
[/QUOTE]

Karl M Johnson 2010-09-05 09:27

[B]amphoria[/B], that's exactly the same error I had on x86 Windows.
Here it pops up again:
[code]
tpsieve-cuda-x86-windows.exe -i 480000-484999_30aug2010.txt -p 900T -P 901T
tpsieve version cuda-0.1.5b (testing)
Found K's from 3 to 9999999.
Found N's from 480000 to 484999.
nstart=480000, nstep=27, gpu_nstep=27
Read 18013513 terms from NewPGen format input file `480000-484999_30aug2010.txt'
ppsieve initialized: 3 <= k <= 9999999, 480000 <= n <= 484999
Sieve started: 900000000000000 <= p < 901000000000000
Thread 0 starting
Detected GPU 0: GeForce GTX 285
Detected compute capability: 1.3
Detected 30 multiprocessors.
900000899028509 | 3182751*2^483513-1
900001860603749 | 9998469*2^481563+1
900001934059139 | 1853133*2^482022-1
900002540407273 | 8064075*2^482811+1
900002726446853 | 5749455*2^480565-1
900003355059173 | 3695019*2^484373-1
900003556591063 | 9754467*2^480376-1
900003917464219 | 7522179*2^481393-1
900004723972547 | 5306133*2^484306+1
900005287423111 | 6879159*2^482887-1
900007745466833 | 451935*2^481504+1
900009608245457 | 5425383*2^480786+1
p=900010489954305, 87.42M p/sec, 0.50 CPU cores, 1.0% done. ETA 05 Sep 15:17
900010638873601 | 378417*2^481830-1
900011291258897 | 6507645*2^482813+1
900011626245037 | 1340685*2^481238+1
900012104179271 | 645705*2^484085-1
900016125631741 | 2968161*2^483961+1
900016501038581 | 8124711*2^484951+1
900016817068751 | 75363*2^484216+1
900017662186813 | 525711*2^480789+1
900018252281867 | 6892521*2^484727-1
900020059012663 | 8598285*2^481068+1
900021150322181 | 4615461*2^482939+1
900021336561331 | 8746389*2^484435+1
900021361527311 | 3408945*2^482966-1
p=900021998075905, 95.90M p/sec, 0.54 CPU cores, 2.2% done. ETA 05 Sep 15:08
900022619382521 | 6958245*2^482800-1
900022833913493 | 6580995*2^483721+1
900022917366103 | 4560555*2^482723-1
900023322448907 | 1472211*2^480431-1
900024288211007 | 4808679*2^480371+1
900026935242913 | 3056079*2^482407-1
900028117404131 | 5600343*2^481214-1
900029413721059 | 815793*2^483818-1
900029829750299 | 4354917*2^483802-1
900030812047093 | 5639913*2^483592+1
p=900033017561089, 91.82M p/sec, 0.53 CPU cores, 3.3% done. ETA 05 Sep 15:08
900033789053611 | 7495449*2^481653-1
900034220560883 | 9094419*2^483205-1
900034570657763 | 9890505*2^480175-1
900035606160989 | 5867385*2^481157-1
900037077781057 | 8390829*2^481741+1
900037229605601 | 1863285*2^484553-1
900038990324497 | 3815157*2^482054+1
900040739108881 | 3513243*2^482350+1
900041542191221 | 6049533*2^482774-1
900042730035877 | 9304977*2^481916+1
900043309201403 | 136581*2^482397+1
p=900044056969217, 91.99M p/sec, 0.55 CPU cores, 4.4% done. ETA 05 Sep 15:08
900044321638183 | 8388129*2^484645-1
900044489973593 | 8240649*2^483659+1
900044550938063 | 7226823*2^484696-1
900045358508729 | 7763775*2^483076-1
900047216136989 | 2338305*2^482753-1
900047780897267 | 4008369*2^483695+1
900048470300299 | 2963115*2^481453-1
900048762025013 | 383355*2^480270-1
900049228276043 | 8622855*2^483971-1
900049467999349 | 660627*2^481816-1
900049796295679 | 2937537*2^483980-1
900052042582919 | 385575*2^484714+1
900052572323899 | 7711221*2^484603+1
900053267475361 | 7173609*2^483949+1
900053714040401 | 633879*2^480079+1
900053996550817 | 6894867*2^480856-1
p=900055633248257, 96.46M p/sec, 0.54 CPU cores, 5.6% done. ETA 05 Sep 15:06
900055866972487 | 6849789*2^483481-1
900056741014807 | 2245995*2^482732-1
900056770768759 | 814365*2^482000-1
900057523274303 | 642045*2^480196+1
900057941699027 | 3370071*2^480999+1
900058480102739 | 9883737*2^484374+1
900060511991023 | 8680035*2^484611-1
900060730024969 | 7366341*2^482195+1
900060738679177 | 1099155*2^483395-1
900063136569923 | 6597225*2^483763-1
900063669798383 | 5873829*2^481137-1
900064551341591 | 9219153*2^483872+1
900064734779653 | 7558803*2^483916-1
900065290605601 | 7338225*2^482126-1
900065587257671 | 7356405*2^481242-1
900065728724587 | 8091525*2^484942-1
p=900067529342977, 99.13M p/sec, 0.51 CPU cores, 6.8% done. ETA 05 Sep 15:04
900067553916287 | 8588259*2^483407+1
900068224207921 | 5907333*2^480414-1
900068309721587 | 4858185*2^483053+1
900069742623089 | 7249299*2^483067-1
900071614223911 | 974289*2^484133-1
900072154118867 | 3615069*2^480585-1
900072931824211 | 6749313*2^480668-1
900073013900513 | 1479111*2^482079-1
900073241850151 | 3667035*2^484867-1
900075811775299 | 5091681*2^482559+1
900076383783517 | 6995187*2^481406-1
p=900079152807937, 96.86M p/sec, 0.52 CPU cores, 7.9% done. ETA 05 Sep 15:03
900079180930459 | 8088465*2^482743+1
900080177837117 | 9137745*2^481706+1
900081068828399 | 116547*2^480962+1
900082664855509 | 4331577*2^481696-1
900084606014311 | 7923375*2^480228+1
900084897625079 | 7498953*2^482154+1
900085127281819 | 3059145*2^480229-1
900086877470243 | 2313279*2^483107+1
900087308304337 | 5166585*2^482543-1
p=900090529857537, 94.80M p/sec, 0.52 CPU cores, 9.1% done. ETA 05 Sep 15:03
900090831293629 | 9965295*2^481322-1
900091902233021 | 9990753*2^481446-1
900095462990077 | 9537003*2^480320+1
900096420949717 | 8525847*2^481988-1
900096832377143 | 1048245*2^483598+1
900096929852677 | 2153943*2^481358-1
900098509267721 | 7751367*2^481256+1
900099157340237 | 9244893*2^480360-1
900099669905143 | 9687819*2^484633+1
900101450465951 | 1940013*2^484300+1
p=900101851332609, 94.34M p/sec, 0.52 CPU cores, 10.2% done. ETA 05 Sep 15:03
900102028546621 | 2525739*2^482461+1
900102230642357 | 9699093*2^482344+1
900102319841591 | 8400777*2^481706-1
900102426091157 | 3881955*2^483157-1
900102488675867 | 337989*2^481711+1
900102580103633 | 9216783*2^482100+1
900102741563621 | 2272611*2^480277+1
900103553433571 | 7722345*2^483866-1
900104117029049 | 505821*2^480631-1
900105270926371 | 8850651*2^483739-1
900105302568581 | 7921695*2^482577+1
900106241542903 | 5146383*2^482750-1
900107926468921 | 5710305*2^481576-1
900110050114909 | 8376111*2^480199+1
900110665560263 | 7689909*2^483029-1
p=900112516399105, 88.77M p/sec, 0.52 CPU cores, 11.3% done. ETA 05 Sep 15:04
900113522958017 | 4037511*2^483349+1
900113818670537 | 2881989*2^483381+1
900114293440121 | 9168045*2^484941-1
900114895651987 | 1452225*2^484402+1
900116209588091 | 9696105*2^483814-1
900119156145683 | 1042815*2^481830+1
900120506278387 | 7095909*2^480787+1
900120924004501 | 8100075*2^480374-1
900121363886917 | 7647465*2^481616-1
900122553451477 | 5630019*2^482963+1
900122578082093 | 4954053*2^480218-1
900122941358243 | 6183075*2^482076+1
p=900123264303105, 89.57M p/sec, 0.55 CPU cores, 12.3% done. ETA 05 Sep 15:05
900124366202023 | 5674563*2^484186-1
900126758233421 | 757953*2^481946+1
900127178299367 | 2843865*2^484173-1
900128009292293 | 5166717*2^484434-1
900128886707417 | 4666179*2^484269+1
900129038645509 | 4965093*2^481026-1
900129732119477 | 8266305*2^480961-1
900131300857937 | 9063171*2^481141-1
900131738429219 | 7244415*2^480666+1
900131757667309 | 9087471*2^481013-1
900132376847051 | 8236677*2^480080+1
900133053419231 | 4099683*2^480928-1
p=900134169493505, 90.88M p/sec, 0.57 CPU cores, 13.4% done. ETA 05 Sep 15:05
900136707397699 | 4107933*2^480032+1
900138484885813 | 9160035*2^484905-1
900139177590013 | 8492325*2^481494+1
900141735781361 | 2119167*2^480722+1
900141821615489 | 5879169*2^480907+1
900143923881347 | 8023179*2^481613+1
900144031193809 | 1315365*2^482686+1
p=900144518938625, 86.23M p/sec, 0.60 CPU cores, 14.5% done. ETA 05 Sep 15:06
900146872538153 | 6797847*2^483152+1
900146934657221 | 31053*2^482810-1
900148138109243 | 6732345*2^484116-1
900149419577411 | 7471065*2^483931+1
900150206766011 | 4495635*2^480957-1
900152425932013 | 2111517*2^480778+1
900152581520117 | 8415135*2^481699-1
900153500000561 | 4769493*2^483890-1
900153813027347 | 4079283*2^481164+1
p=900154149060609, 80.25M p/sec, 0.58 CPU cores, 15.4% done. ETA 05 Sep 15:08
900155149794317 | 9979257*2^481892+1
900155755904123 | 2521533*2^483994+1
900158445390817 | 3743577*2^480768-1
900158816255227 | 6612759*2^481673-1
900159262356971 | 7420557*2^482234-1
900159600424079 | 7586655*2^481304+1
900159986271683 | 943605*2^481215+1
Cuda error: cudaStreamCreate: out of memory

[/code]

Ken_g6 2010-09-05 15:45

I get the feeling I have a severe memory leak on the GPU that I didn't know I had. Someone helped me with the stream synchronization code, and it worked, but I'm starting to suspect that [B]each[/B] event and stream that is created also has to be destroyed. I'll fix it in the next release.
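The suspected bug is creating a CUDA stream or event per batch without a matching `cudaStreamDestroy`/`cudaEventDestroy`. It can be illustrated with a toy model. This Python sketch makes no CUDA calls. `ToyDevice` and its fixed stream budget are hypothetical stand-ins for the driver's limited resources. They are there only to show why such a leak eventually surfaces as an out-of-memory error from a create call, like the `cudaStreamCreate: out of memory` messages above.

```python
from contextlib import contextmanager


class OutOfMemory(Exception):
    pass


class ToyDevice:
    """Hypothetical device that can hold only `capacity` live streams at once."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.live = 0

    def stream_create(self):
        if self.live >= self.capacity:
            raise OutOfMemory("streamCreate: out of memory")
        self.live += 1

    def stream_destroy(self):
        self.live -= 1


@contextmanager
def stream(dev: ToyDevice):
    """Pair every create with a destroy, even if the work in between fails."""
    dev.stream_create()
    try:
        yield
    finally:
        dev.stream_destroy()


def run_batches(dev: ToyDevice, batches: int, destroy: bool) -> int:
    """Process batches; return how many completed before running out of memory."""
    done = 0
    try:
        for _ in range(batches):
            if destroy:
                with stream(dev):
                    done += 1  # kernel launches would go here
            else:
                dev.stream_create()  # leaked: never destroyed
                done += 1
    except OutOfMemory:
        pass
    return done


print(run_batches(ToyDevice(capacity=100), 1000, destroy=False))  # 100
print(run_batches(ToyDevice(capacity=100), 1000, destroy=True))   # 1000
```

The leaky version gets only as far as the budget allows, however long the range is. This matches the pattern above, where both runs printed a batch of factors before failing.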

Ken_g6 2010-09-06 17:20

v0.1.6 of both PPSieve and TPSieve is released. Many changes and fixes are included.

- Faster on the GPU than 0.1.5b (though about the same as 0.1.5c)
- Uses less CPU
- A huge memory leak on the GPU should be fixed.
- Input files are more often read correctly.
- Many other bugfixes and tweaks.

Get it at the usual URL, in the first post.

Edit: P.S. I forgot to post [url=http://github.com/Ken-g6/PSieve-CUDA]the source location[/url]!

amphoria 2010-09-07 17:24

[QUOTE=Ken_g6;228689]v0.1.6, of both PPSieve and TPSieve, is released. Many changes and fixes are included.[/QUOTE]

I have completed sieving 510-515T and the factors match those I previously found. I got 138M p/sec on a GTX465, using 0.41 CPU on a single core of a Core i7 @ 3.6GHz. As that core was not maxed out, I tried running 2 instances on it (the other 3 cores were running LLR). With 2 instances I got a combined throughput of 210M p/sec using 0.68 CPU. This suggests that a single instance wasn't maxing out the GTX465 either.
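Those figures can be turned into rough sieving times. This is a minimal sketch with a few assumptions. It treats the reported p/sec as the rate at which the sieve advances through the p range, which is consistent with the "% done" figures in the logs above. It also ignores the rate drift visible in those logs. "T" is read as 10^12, as the "Sieve started" lines show.

```python
T = 10**12  # "510T" on the command line means 510 * 10^12


def hours_for_range(p_start: int, p_end: int, rate: float) -> float:
    """Hours to sieve [p_start, p_end) at a constant rate in p/sec."""
    return (p_end - p_start) / rate / 3600


single = 138e6  # p/sec, one instance on the GTX465
double = 210e6  # p/sec, two instances combined

print(f"510T-515T, one instance:  {hours_for_range(510 * T, 515 * T, single):.1f} h")
print(f"510T-515T, two instances: {hours_for_range(510 * T, 515 * T, double):.1f} h")
print(f"gain from the second instance: {double / single - 1:.0%}")
```

This prints about 10.1 h for one instance, 6.6 h for two, and a 52% throughput gain from the second instance.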

Ken_g6 2010-09-07 18:28

[QUOTE=amphoria;228861]I have completed sieving 510-515T and the factors match those I previously found.[/quote]Good! :smile:

[QUOTE=amphoria;228861]With 2 instances I got a combined throughput of 210M p/sec with 0.68 CPU used. This would suggest that the GTX465 wasn't maxed out either with a single instance.[/QUOTE]
Interesting! Try fiddling with the -m option (probably going up from [i]8[/i] in increments of 1), and see if you can make a single instance do any better.
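One way to run that experiment is to time short runs over the same small p range for several -m values. This Python sketch only assembles and times command lines built from options used in this thread (-i, -p, -P, -m). What -m controls isn't explained here, so the sketch just sweeps it. The 10G range, the values 8 to 12, and the input file name are illustrative choices. The binary must be present, and the sweep only runs when the script is given `--run`, a flag of this sketch and not of TPSieve.

```python
import subprocess
import sys
import time

BINARY = "./tpsieve-cuda-x86_64-linux"
SIEVE_FILE = "480000-484999_30aug2010.txt"


def build_command(m: int, p_min: int, p_max: int) -> list:
    """Command line for one trial run with a given -m value."""
    return [BINARY, "-i", SIEVE_FILE, "-p", str(p_min), "-P", str(p_max), "-m", str(m)]


def time_run(cmd: list) -> float:
    """Run one trial, discarding its factor output, and return elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


if __name__ == "__main__" and "--run" in sys.argv:
    p_min, p_max = 710_000_000_000_000, 710_010_000_000_000  # a 10G slice of 710T
    for m in range(8, 13):
        print(f"-m {m}: {time_run(build_command(m, p_min, p_max)):.1f} s")
```

The p/sec figure the program prints itself may be an easier number to compare than wall-clock time, since it excludes startup and file reading.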


All times are UTC.