#1
Jun 2005
lehigh.edu
2000₈ Posts
Quote:
Code:
using GPU 0 (Tesla C2050)

An hour or so into a search on a c176 I have

Code:
grep c5 msieve.dat.p | uniq -c
      1 c5: 120120
     23 c5: 180060
      1 c5: 120120
      3 c5: 180060

With

Code:
grep e-13 msieve.dat.p | wc -l
18
grep e-14 msieve.dat.p | wc -l
10

from

Code:
# norm 1.900665e-17 alpha -7.051647 e 9.850e-14 rroots 5

to

Code:
# norm 2.235712e-17 alpha -8.116017 e 1.092e-13 rroots 3

values among the best so far. Thanks to Greg for walking me through everything, from loading drivers on the new linux machine, to fiddling with the Makefile, to getting the 10 ptx's into the correct directory; and to Serge for locating the c176 on the 3- extension.

-Bruce

Last fiddled with by bdodson on 2010-09-04 at 15:47. Reason: clarity/info
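A side note on the 'grep c5 ... | uniq -c' output above: 'uniq -c' only collapses *adjacent* duplicate lines, which is why "1 c5: 120120" shows up twice. A minimal sketch (made-up input, not the actual msieve.dat.p) of how adding a 'sort' to the pipe gives one count per coefficient:

```shell
# uniq -c counts runs of identical adjacent lines; sorting first merges
# the interleaved runs so each coefficient is counted exactly once.
printf 'c5: 120120\nc5: 180060\nc5: 180060\nc5: 120120\n' | sort | uniq -c
```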
#2
Tribal Bullet
Oct 2004
3,541 Posts
Paul and I were also a little surprised that the best polynomials did not automatically have many real roots; the roots modulo small primes can make up for a lot of deficiencies in the polynomial size.
To get a list of the best polynomials from the temporary output of a poly selection run, I use

Code:
grep norm msieve.dat.p | sort -gk7 | tail -20
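For readers puzzling over the sort key: in a "# norm ... alpha ... e ... rroots ..." line, the Murphy E score is the 7th whitespace-separated field, so 'sort -g -k7' orders candidates by score. A small sketch using score lines quoted in this thread (the sample filename is made up):

```shell
# Field 7 of a '# norm' line is the Murphy E score; '-g' is general
# numeric sort, so 9.850e-14 orders below 1.141e-13, and tail keeps
# the highest-scoring candidates.
cat > sample.dat.p <<'EOF'
# norm 2.419345e-17 alpha -9.198820 e 1.141e-13 rroots 3
# norm 1.900665e-17 alpha -7.051647 e 9.850e-14 rroots 5
# norm 2.235712e-17 alpha -8.116017 e 1.092e-13 rroots 3
EOF
grep norm sample.dat.p | sort -gk7 | tail -1
```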
#3
Jun 2005
lehigh.edu
400₁₆ Posts
Quote:
Turns out that both of my first two jobs are running on the first board. I now have a 3rd job running, using '-g 1' this time, and got

Code:
using GPU 1 (Tesla C2050)

Code:
grep e-13 msieve.dat.p | grep -v ' e 1.0'
# norm 2.419345e-17 alpha -9.198820 e 1.141e-13 rroots 3

will most likely be useful, now that I've got both boards running. Serge also has a script; very subtle, as usual. Thanks for the gpu code!

-Bruce

Last fiddled with by bdodson on 2010-09-04 at 21:29
#4
May 2008
3·5·73 Posts
Bruce, what does your CPU usage look like on those msieve jobs? And which SVN version did you pull?
I've been wondering if the CPU code might start becoming a bottleneck on these really fast GPUs...
#5
Jun 2005
lehigh.edu
2¹⁰ Posts
Quote:
Code:
http://msieve.svn.sourceforge.net/viewvc/msieve/trunk.tar.gz?view=tar

Not sure where to find the svn version number; of the files listed by 'ls -ltrd' in the trunk directory, the latest dates are

Code:
-rw-r--r-- 1 bad0 355 11760 Aug 12 17:50 Makefile
-rw-r--r-- 1 bad0 355 91577 Aug 15 22:16 Changes
-rw-r--r-- 1 bad0 355 15396 Aug 25 03:10 demo.c

I've no idea how I might separate gpu usage from cpu usage. I took readings from 'ps -ef' an hour apart and got

Code:
272 Sep 5 06:51 psef.sun07.txt
272 Sep 5 07:51 psef.sun08.txt

with psef.sun07.txt:

Code:
bad0 15286 7721 84 Sep04 pts/1 17:26:39 ./msieve -v -np 120001,180000
bad0 15373 7721 39 Sep04 pts/1 08:03:57 ./msieve -l msieveg1.log -v -np 180001,200000
bad0 18347 7721 51 Sep04 pts/1 07:12:16 ./msieve -g 1 -l msieveg1a.log -v -np 100001,120000

and psef.sun08.txt:

Code:
bad0 15286 7721 85 Sep04 pts/1 18:26:21 ./msieve -v -np 120001,180000
bad0 15373 7721 39 Sep04 pts/1 08:24:13 ./msieve -l msieveg1.log -v -np 180001,200000
bad0 18347 7721 51 Sep04 pts/1 07:41:07 ./msieve -g 1 -l msieveg1a.log -v -np 100001,120000

The first job accumulated pretty much 60 minutes in 60 minutes, walltime. The second job, which also claims to have been running on '-g 0', appears to have accumulated c. 20 minutes; and the third job, claiming to be running on '-g 1', had c. 29 minutes. That sounds more like the 2nd and 3rd jobs both ran on '-g 1'; and perhaps the 'ps' reading is just showing walltime, rather than gputime (is that a word? For certain the 'ps' isn't showing cputime, as most of the polynomial searching time is supposed to be in the stage 1 that runs on the card).

Hmm. That's not good! The first job hadn't spit out any new polynomial in c. 12 hrs, so I decided to take that one off of '-g 0', hoping to leave just the 2nd one running there. But now that I'm checking, kill -TERM took the only job running on '-g 0' off (I wasn't sure, but -TERM still works on the gpu); and during the past hour the 2nd and 3rd jobs are showing 30 minutes each, at least, as reported by ps, which sounds like confirmation that they're both running at half-time on just one of the cards. Sigh. I'll try -TERMing the two that are running, and resubmit one job each to -g 0 and -g 1. Not sure whether the 2nd logfile's report that the 2nd job was also on -g 0 was false, and omitting -g successfully put one job on each card; but I'll go with -g and hope to get one job on each card (at last!).

The machine has 6 cpus and the 2 cards (I lost a vote to buy 4 cards ...), and there are six boinc jobs running along with the gpu jobs, if that complicates figuring out the answer to your question. In any case, here's the logfile from the first job (it's short)

Code:
Sat Sep 4 10:15:28 2010  Msieve v. 1.47
Sat Sep 4 10:15:28 2010  random seeds: acd21bbb b1aed647
Sat Sep 4 10:15:28 2010  factoring 14517736555533692118889909159833393968550085834163333454915555335170241047519041347752785206333409644870931919952725763124805234129065460641071418096710862391316053652459514131 (176 digits)
Sat Sep 4 10:15:30 2010  no P-1/P+1/ECM available, skipping
Sat Sep 4 10:15:30 2010  commencing number field sieve (176-digit input)
Sat Sep 4 10:15:30 2010  commencing number field sieve polynomial selection
Sat Sep 4 10:15:30 2010  searching leading coefficients from 120001 to 180000
Sat Sep 4 10:15:30 2010  using GPU 0 (Tesla C2050)
Sun Sep 5 10:13:32 2010  polynomial selection complete
Sun Sep 5 10:13:32 2010  R0: -10379984941552728453972444081504096
Sun Sep 5 10:13:32 2010  R1: 9877593410632453061
Sun Sep 5 10:13:32 2010  A0: 255772912651460129632948684999205042316044455
Sun Sep 5 10:13:32 2010  A1: 1954830018665003941391142926871053678
Sun Sep 5 10:13:32 2010  A2: -471066766530980506862524849539
Sun Sep 5 10:13:32 2010  A3: 1778379708038918136760
Sun Sep 5 10:13:32 2010  A4: 97672503713942
Sun Sep 5 10:13:32 2010  A5: 120480
Sun Sep 5 10:13:32 2010  skew 76104896.21, size 2.678e-17, alpha -6.876, combined = 1.233e-13 rroots = 5
Sun Sep 5 10:13:32 2010  elapsed time 23:58:04

walltime, rather than either cputime or gputime(?).

-Bruce
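On separating the times: the TIME column that 'ps -ef' prints is accumulated CPU time, not wall-clock time, and ps knows nothing about time spent on the card. A sketch (using the shell's own pid, just so it runs anywhere) of asking ps for both the wall-clock age and the CPU time of one process:

```shell
# ETIME is how long the process has existed (wall clock); TIME is the
# CPU time it has actually consumed. For a GPU-bound job TIME stays far
# below ETIME, since time on the card accrues to neither column.
ps -o pid=,etime=,time= -p $$
```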
#6
Jun 2005
lehigh.edu
1024₁₀ Posts
Quote:
... not have been a good idea. (And perhaps I'd better update to the most recent client.)

Code:
200 Sep 6 07:07 psef.mon07.txt
200 Sep 6 11:09 psef.mon11.txt

After 19 hrs, the ps only shows c. 22:45 elapsed:

Code:
psef.mon07:
10:56:44 ./msieve -g 0 -l msieveg0b.log -v -np 122401,180000
11:48:30 ./msieve -g 1 -l msieveg1b.log -v -np 187201,200000
---------
22:45

like both jobs on one card (I probably approximated c. 20 = c. 20, instead of 19 and 22:45). So I was about to consider adding a 3rd job to make sure that something was running on each card; but since we're thinking about cputime -vs- "time on the gpu", I tried cutting back to four boinc jobs, leaving two cores idle, thinking that maybe one or both cards might have been waiting for an _idle_ cpu. So three hours later (with two free cores)

Code:
psef.mon11:
13:00:02 ./msieve -g 0 -l msieveg0b.log -v -np 122401,180000
13:47:44 ./msieve -g 1 -l msieveg1b.log -v -np 187201,200000
----
26:47

up from 22:45 after 19 hrs. Of course, I'd be happiest demonstrating two well-performing cards with new best polynomials; but there's no reason to believe that the two cards weren't happy running three jobs. Since the reset with g0/g1, there are just three new polynomials in the top10

Code:
grep norm msieve.dat.p | sort -gk7 | tail
# norm 2.425973e-17 alpha -7.915881 e 1.132e-13 rroots 5
# norm 2.419345e-17 alpha -9.198820 e 1.141e-13 rroots 3
# norm 2.410989e-17 alpha -7.817632 e 1.143e-13 rroots 5  *new*
# norm 2.461850e-17 alpha -8.392563 e 1.145e-13 rroots 5  *new*
# norm 2.435641e-17 alpha -6.972853 e 1.169e-13 rroots 3
# norm 2.486423e-17 alpha -7.092835 e 1.173e-13 rroots 3
# norm 2.499984e-17 alpha -8.999410 e 1.177e-13 rroots 3  *new*
# norm 2.496365e-17 alpha -8.494013 e 1.180e-13 rroots 3  [last of 1st 3]
# norm 2.490282e-17 alpha -7.085114 e 1.182e-13 rroots 3
# norm 2.677511e-17 alpha -6.875759 e 1.233e-13 rroots 5

The best polyn from before the reset was

Code:
save 2.235712e-17 -8.1160 186600946.85 1.092049e-13 rroots 3

On the topic of cpu-use by msieve_gpu's polyn search, I'm just now seeing some from top

Code:
 8881 39 19 1605m 765m 752 R 100.1 3.2  19:14.98 lasievef_1.07_x
 8926 39 19 1606m 766m 740 R 100.1 3.2  16:01.03 lasievef_1.07_x
30759 25  0  115m  86m 19m R 100.1 0.4 871:35.74 msieve
 8970 39 19 1606m 766m 752 R  99.8 3.2  12:37.21 lasievef_1.07_x
 9054 39 19 1606m 765m 744 R  99.5 3.2   5:33.25 lasievef_1.07_x
30752 17  0  115m  86m 19m R  44.2 0.4 799:28.80 msieve

msieve is running without nicing ("0"). So that's 100% on one core for pid 30759, and 44.2% on one of the other cores for pid 30752. Does anyone believe it possible that the timings on these two jobs show ONLY the cputime; so 871 minutes for pid 30759 and 799 minutes for pid 30752? These are jobs that started 1465 minutes ago, and that would say that notably more than half of the time was spent on the cards, waiting for the cpu to report back.

-Bruce

PS --- (1) So that would have the mon07 reading saying 22:45 hrs out of 2*19 = 38 hrs spent on the cpu-bound stage. Top often shows the msieve_gpu's not listed ... NO!, it's a rare 10-sec interval without one of them showing, and most often not at 0%, like

Code:
64.2 0.4 803:11.88 msieve
51.6 0.4 875:58.57 msieve

(more than 12 10-sec readings) to get

Code:
0.3 0.4 803:53.40 msieve
0.3 0.4 877:04.59 msieve

Almost every 10-sec cycle shows either a msieve_gpu running, or just finishing (starting?) a run at < 1%. Maybe I've accidentally answered jrk's question?

(2) And keeping two idle cores, which raised the cputime to 4 hrs out of 2-times-3 hrs walltime, brings the percentage of cputime up to 66% (4 of 6), with the lower percentage indicating the process having both the card and the cpu waiting?
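One way to make the top readings above less anecdotal: save a snapshot and let awk average the %CPU column for the msieve processes. A sketch using the two msieve rows quoted in this post (the snapshot filename is made up; field 8 is %CPU and field 11 the command name, matching the columns quoted above):

```shell
# Average the %CPU that the msieve rows show in a saved top snapshot.
cat > top.snapshot <<'EOF'
30759 25 0 115m 86m 19m R 100.1 0.4 871:35.74 msieve
30752 17 0 115m 86m 19m R 44.2 0.4 799:28.80 msieve
EOF
awk '$11 == "msieve" { sum += $8; n++ }
     END { printf "%d msieve procs, %.1f%% CPU on average\n", n, sum / n }' top.snapshot
```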
#7
(loop (#_fork))
Feb 2006
Cambridge, England
6419₁₀ Posts
I usually find things like

Code:
Sat Sep 4 09:23:01 2010  time limit set to 37.65 hours
Sat Sep 4 09:23:02 2010  using GPU 0 (GeForce GTX 275)
Mon Sep 6 04:07:58 2010  polynomial selection complete
Mon Sep 6 04:07:58 2010  elapsed time 42:44:58

where the elapsed time is 15% more than the time limit, if the msieve process got a CPU to itself; and if it had to share with sievers I see things like

Code:
Tue Aug 31 08:01:45 2010  time limit set to 73.00 hours
Sat Sep 4 01:21:42 2010  elapsed time 89:20:00

So the elapsed time is always a bit more than the time limit, and if the process isn't running on a whole CPU it's quite a lot more. The timings from 'top' show CPU-time only.

Last fiddled with by fivemack on 2010-09-06 at 17:42
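fivemack's "15% more" can be checked with a little awk arithmetic on the two figures from the first log above (the exact overhead comes out around 13-14% for that run):

```shell
# Convert the hh:mm:ss elapsed time to hours and compare with the limit.
echo "42:44:58 37.65" | awk '{
    split($1, t, ":")
    hrs = t[1] + t[2]/60 + t[3]/3600
    printf "elapsed %.2f h vs %.2f h limit: %.1f%% over\n", hrs, $2, 100*(hrs/$2 - 1)
}'
```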
#8
Jun 2005
lehigh.edu
2¹⁰ Posts
Quote:
It's a 32-nm Xeon X5650. Seems a shame to keep all six cores idle; two idle doesn't seem too bad. Maybe I ought to be trying 6-threaded Msieve/Lanczos. Since I'm using -np, I don't get a "time limit". The number's a c176, and I'd like to see what fermi (x2) comes up with.

-Bruce
#9
Jun 2005
lehigh.edu
2000₈ Posts
Quote:
Code:
# norm 2.490282e-17 alpha -7.085114 e 1.182e-13 rroots 3
# norm 2.750309e-17 alpha -8.147588 e 1.217e-13 rroots 5  [*new*]
# norm 2.677511e-17 alpha -6.875759 e 1.233e-13 rroots 5

to be had; and confirm that the 1.2e-13 wasn't unique. I have 12 search ranges like

Code:
searching leading coefficients from 100001 to 120000
coeff 100020-102360 2787800423 3066580465 3066580466 3373238512
...
searching leading coefficients from 180001 to 200000
coeff 180060-182400 2955052791 3250558070 3250558071 3575613878

partially searched ranges, and 2 more currently running. Not sure how much of 1 to 200000 is worth including as a test case (perhaps as a guide to objectives for gnfs181 and/or gnfs187?). For this gnfs176, it looks like the 2 boards are searching coefficients from a range of c. 20000/day, in which case half of 1-to-200000 takes five days.

Timings for the cpu portion of the msieve_gpu searches seem to be all over. In the most recent 11 hrs, the job on g0 accumulated 6:40 hrs of cputime (as per ps), while the job on g1 accumulated just 3:10 hrs. Guess I can pitch the boinc jobs for a day, to see what happens. Equal access to an idle core isn't the only issue: over the nearly two days of the present run, g0 found 12997 polynomials and saved 350 of them, while g1 found 16686 and saved 285. Hard to tell whether this is the expected variation, or maybe one of the cards is getting better cpu access.

-Bruce
#10
Tribal Bullet
Oct 2004
3,541 Posts
Stage 2 still runs on the CPU, so a stage 1 hit will cause the GPU to stop running while stage 2 completes. Greg has seen stage 2 jobs that take many hours, so this should account for the difference in GPU time.
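A toy sketch (plain shell, not msieve code, with hypothetical stage1/stage2 stand-ins) of the serialization described here: the next stage 1 range cannot start until the CPU-side stage 2 call returns, so a long stage 2 leaves the card idle for its whole duration:

```shell
# stage1 stands in for the GPU search, stage2 for the CPU-only
# optimization; run sequentially, the "GPU" idles during every stage2.
stage1() { echo "stage1 (GPU): hit found in range $1"; }
stage2() { echo "stage2 (CPU): optimizing hit from range $1"; }

for range in 1 2; do
    stage1 "$range"
    stage2 "$range"   # the card is idle until this call returns
done
```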
#11
Jun 2005
lehigh.edu
2000₈ Posts
Quote:
only one new top10 entry, in the 8th spot:

Code:
# norm 2.482413e-17 alpha -8.632319 e 1.163e-13 rroots 5  [*new*]
# norm 2.435641e-17 alpha -6.972853 e 1.169e-13 rroots 3
# norm 2.486423e-17 alpha -7.092835 e 1.173e-13 rroots 3
# norm 2.499984e-17 alpha -8.999410 e 1.177e-13 rroots 3
# norm 2.496365e-17 alpha -8.494013 e 1.180e-13 rroots 3
# norm 2.490282e-17 alpha -7.085114 e 1.182e-13 rroots 3
# norm 2.750309e-17 alpha -8.147588 e 1.217e-13 rroots 5
# norm 2.677511e-17 alpha -6.875759 e 1.233e-13 rroots 5

So Stage 1 finds the leading coefficient c5, on the algebraic side, while Stage 2 finds the "coeff" that gives the leading term Y1 on the rational side? The search range 180001-to-200000 finished before noon yesterday, and I started on 1-to-60000. (Previous experience suggests that small c5's give some plausible candidates that are otherwise missed.) So, if I'm reading correctly, fermi-g1 promptly located c5 = 240 at like 2 minutes before noon; and ever since, one of the cores has been running through Y1's for that c5, at 800 sec max/coeff. That would be 16:45 hrs of straight cputime. Today's new 8th-place candidate is one of those:

Code:
poly 33 p 1818638683 q 2106911057 coeff 3831709949900617931
poly 30 p 1819259809 q 2106812507 coeff 3832839319083631163
poly 3  p 1819590263 q 2106375457 coeff 3832740271779375191  <--***
---
save 2.482413e-17 -8.6323 1287394159.87 1.163205e-13 rroots 5
---
# norm 2.482413e-17 alpha -8.632319 e 1.163e-13 rroots 5
skew: 1287394159.87
c0: -1042633613686720734290044400807825528939776194759
c1: 98429999145482704955198799209301519657
c2: 17523303038112215463835268730871
c3: 11356224849022898693143
c4: -11842426602752
c5: 240  <---*
Y0: -36002941945870027187523624257231332
Y1: 3832740271779375191  <--- ***

Off Topic PS: In other local news, the matrix for 2p1043 at 17M^2 is due on Saturday; the matrix for 7M707 just started at 12M^2 and needs another week or so; and sieving for 2L2370 is already halfway done.
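To pull the interesting fields back out of a saved polynomial block like the one above, a short awk sketch (the input filename is made up; the record lines are taken from this post):

```shell
# Extract the algebraic leading coefficient (c5) and the rational
# leading term (Y1) from a saved polynomial record.
cat > best.poly <<'EOF'
skew: 1287394159.87
c5: 240
Y0: -36002941945870027187523624257231332
Y1: 3832740271779375191
EOF
awk '$1 == "c5:" || $1 == "Y1:" { print $1, $2 }' best.poly
```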
Similar Threads

| Thread | Thread Starter | Forum | Replies | Last Post |
| Tweaking polynomial search for C197 | fivemack | Msieve | 38 | 2011-07-08 08:12 |
| 109!+1 polynomial search | fivemack | Factoring | 122 | 2009-02-24 07:03 |
| 5^421-1 polynomial search | fivemack | Factoring | 61 | 2008-07-21 11:16 |
| 6^383+1 by GNFS (polynomial search; now complete) | fivemack | Factoring | 20 | 2007-12-26 10:36 |
| GNFS polynomial search tools | JHansen | Factoring | 0 | 2004-11-07 12:15 |