c176 polynomial search
[QUOTE=jasonp;221521]It looks like it's running okay. Do I read the output correctly that you are using a new Tesla card? If so, it's nice to know that the code runs on the latest and greatest cards.
...[/QUOTE] A preliminary report, I have two Msieve v. 1.47 jobs running, both of which say [code] > using GPU 0 (Tesla C2050) [/code] not sure whether I set two jobs on the same board or not (there are two). An hour or so into a search on a c176 I have [code] grep c5 msieve.dat.p | uniq -c 1 c5: 120120 23 c5: 180060 1 c5: 120120 3 c5: 180060 [/code] Two polyn from one range, and a flair (minor, at least) from the other. With [code] grep e-13 msieve.dat.p | wc -l 18 grep e-14 msieve.dat.p | wc -l 10 from norm 1.900665e-17 alpha -7.051647 e 9.850e-14 rroots 5 to # norm 2.235712e-17 alpha -8.116017 e 1.092e-13 rroots 3 [/code] I'm slightly surprised to see 1, 3 and 5 real roots; with different values among the best so far. Thanks to Greg for walking me through from loading drivers on the new linux machine, to fiddling with the Makefile, to getting the 10 ptx's in the correct directory; and to Serge for locating the c176 on the 3- extension. -Bruce |
Paul and I were also a little surprised that the best polynomials did not automatically have many real roots; the roots modulo small primes can make up for a lot of deficiencies in the polynomial size.
To get a list of the best polynomials given the temporary output from a poly selection run, I use
[code]
grep norm msieve.dat.p | sort -gk7 | tail -20
[/code]
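To get a quick sense of how a run is trending, the same file can also be bucketed by order of magnitude of the Murphy E score (field 7 of the lines being sorted on). A small awk sketch, assuming the '# norm ... alpha ... e ... rroots ...' line format shown in this thread:

```shell
# Count saved polynomials per order of magnitude of the Murphy E score.
# Assumes .p-file comment lines of the form:
#   # norm 2.235712e-17 alpha -8.116017 e 1.092e-13 rroots 3
grep norm msieve.dat.p | awk '{
    split($7, part, "e")            # "1.092e-13" -> part[1]="1.092", part[2]="-13"
    count["1e" part[2]]++
} END {
    for (mag in count) print mag ": " count[mag]
}' | sort
```

This replaces eyeballing separate `grep e-13 | wc -l` and `grep e-14 | wc -l` counts with one pass over the file.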
[QUOTE=jasonp;228413]Paul and I were also a little surprised that the best polynomials did not automatically have many real roots; the roots modulo small primes can make up for a lot of deficiencies in the polynomial size. ... I use
[code]
grep norm msieve.dat.p | sort -gk7 | tail -20
[/code][/QUOTE] That would be PaulZ? Or xilman/PaulL? Yes, the small primes; that looks right. Turns out that both of my first two jobs are running on the first board. I now have a 3rd job running, using '-g 1' this time, and got
[code]
using GPU 1 (Tesla C2050)
[/code]
Not much change in the largest so far (with two jobs on one of the boards ...)
[code]
grep e-13 msieve.dat.p | grep -v ' e 1.0'
# norm 2.419345e-17 alpha -9.198820 e 1.141e-13 rroots 3
[/code]
(since, by inspection of the e-13's, most were 'e 1.0'-s). The above will most likely be useful, now that I've got both boards running. Serge also has a script; very subtle, as usual. Thanks for the gpu code! -Bruce
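The `grep e-13 | grep -v ' e 1.0'` workaround can be avoided by comparing the score numerically instead of textually; a sketch, again assuming field 7 of the '# norm ...' lines holds the combined Murphy E value:

```shell
# Print only candidates whose Murphy E exceeds a threshold, best last.
# awk coerces "1.141e-13" to a number, so the comparison is numeric,
# not a string match on the exponent.
awk '$2 == "norm" && $7 + 0 > 1.1e-13' msieve.dat.p | sort -gk7
```

Raising the threshold as the run progresses keeps the listing short without any exponent-pattern juggling.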
Bruce, what does your CPU usage look like on those msieve jobs? And which SVN version did you pull?
I've been wondering if the CPU code might start becoming a bottleneck on these really fast GPUs...
[QUOTE=jrk;228477]Bruce, what does your CPU usage look like on those msieve jobs? And which SVN version did you pull?
I've been wondering if the CPU code might start becoming a bottleneck on these really fast GPUs...[/QUOTE] I picked up [code] http://msieve.svn.sourceforge.net/viewvc/msieve/trunk.tar.gz?view=tar [/code] (an address from one of Greg's posts re: people not on svn), and am not sure where to find the svn; of the files replying to -ltrd in the trunk directory, the latest dates are [code] -rw-r--r-- 1 bad0 355 11760 Aug 12 17:50 Makefile -rw-r--r-- 1 bad0 355 91577 Aug 15 22:16 Changes -rw-r--r-- 1 bad0 355 15396 Aug 25 03:10 demo.c [/code] Best I can do for locating a range in which the svn can be. Likewise, I've no idea about how I might separate gpu usage from cpu usage. I took an hour's reading from 'ps -ef' and got [code] 272 Sep 5 06:51 psef.sun07.txt 272 Sep 5 07:51 psef.sun08.txt with psef.sun07.txt: bad0 15286 7721 84 Sep04 pts/1 17:26:39 ./msieve -v -np 120001,180000 bad0 15373 7721 39 Sep04 pts/1 08:03:57 ./msieve -l msieveg1.log -v -np 180001,200000 bad0 18347 7721 51 Sep04 pts/1 07:12:16 ./msieve -g 1 -l msieveg1a.log -v -np 100001,120000 and psef.sun08.txt: bad0 15286 7721 85 Sep04 pts/1 18:26:21 ./msieve -v -np 120001,180000 bad0 15373 7721 39 Sep04 pts/1 08:24:13 ./msieve -l msieveg1.log -v -np 180001,200000 bad0 18347 7721 51 Sep04 pts/1 07:41:07 ./msieve -g 1 -l msieveg1a.log -v -np 100001,120000 [/code] So the first job submitted to '-g 0' is shown in ps to have gotten pretty much 60 minutes in 60 minutes, walltime. The second job, which also claims to have been running on '-g 0' appears to have accumulated c. 20 minutes; and the third job, claiming to be running on '-g 1' had c. 39 minutes. That sounds more like the 2nd and 3rd jobs both ran on '-g 1'; and perhaps the 'ps' reading is just showing walltime, rather than gputime (that's a word? for certain the 'ps' isn't showing cputime, as most of the polyn searching time is supposed to be in the stage 1 that runs on the card). Hmm. That's not good! The first job hadn't spit out any new polyn in c. 
12hrs, so I decided to take that one off of '-g 0'; hoping to leave just the 2nd one running there. But now that I'm checking, kill -TERM took the only job running on '-g 0' off (I wasn't sure, but -TERM still works on the gpu); and during the past hour the 2nd and 3rd job are showing 30 minutes each, at least, as reported by ps, which sounds like confirmation that they're both running at half-time on just one of the cards. Sigh. I'll try -TERMing the two running; and resubmit one job each to -g 0 and -g 1. Not sure whether the 2nd logfile report that the 2nd job was also on -g 0 was false; and omitting -g successfully put one job on each card, but I'll go with -g and hope to get one job on each card (at last!). The machine has 6 cpus and the 2 cards (I lost a vote to buy 4 cards ...), and there are six bonic jobs running along with the gpu jobs; if that complicates figuring out the answer to your question. In any case, here's the logfile from the first job (it's short) [code] Sat Sep 4 10:15:28 2010 Sat Sep 4 10:15:28 2010 Sat Sep 4 10:15:28 2010 Msieve v. 
1.47 Sat Sep 4 10:15:28 2010 random seeds: acd21bbb b1aed647 Sat Sep 4 10:15:28 2010 factoring 14517736555533692118889909159833393968550085834163333 45491555533517024104751904134775278520633340964487093191995272576312480523412906546064107 1418096710862391316053652459514131 (176 digits) Sat Sep 4 10:15:30 2010 no P-1/P+1/ECM available, skipping Sat Sep 4 10:15:30 2010 commencing number field sieve (176-digit input) Sat Sep 4 10:15:30 2010 commencing number field sieve polynomial selection Sat Sep 4 10:15:30 2010 searching leading coefficients from 120001 to 180000 Sat Sep 4 10:15:30 2010 using GPU 0 (Tesla C2050) Sun Sep 5 10:13:32 2010 polynomial selection complete Sun Sep 5 10:13:32 2010 R0: -10379984941552728453972444081504096 Sun Sep 5 10:13:32 2010 R1: 9877593410632453061 Sun Sep 5 10:13:32 2010 A0: 255772912651460129632948684999205042316044455 Sun Sep 5 10:13:32 2010 A1: 1954830018665003941391142926871053678 Sun Sep 5 10:13:32 2010 A2: -471066766530980506862524849539 Sun Sep 5 10:13:32 2010 A3: 1778379708038918136760 Sun Sep 5 10:13:32 2010 A4: 97672503713942 Sun Sep 5 10:13:32 2010 A5: 120480 Sun Sep 5 10:13:32 2010 skew 76104896.21, size 2.678e-17, alpha -6.876, combined = 1.23 3e-13 rroots = 5 Sun Sep 5 10:13:32 2010 elapsed time 23:58:04 [/code] which has the current best polyn. The time here seems clearly to be walltime, rather than either cputime or gputime(?). -Bruce |
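One low-tech way to confirm which card each job actually grabbed is to pull the "using GPU" line out of every job's logfile; a sketch, assuming the `-l msieve*.log` naming used in the jobs above:

```shell
# Report the GPU each msieve job claimed at startup, one line per logfile.
for log in msieve*.log; do
    printf '%s: %s\n' "$log" "$(grep -m1 -o 'using GPU.*' "$log")"
done
```

This only shows what each process *reported*, not what the driver actually scheduled, but it catches the "two jobs both say GPU 0" situation at a glance.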
54.6% of the time spent on the cpu??
[QUOTE=bdodson;228541] ...
I've no idea about how I might separate gpu usage from cpu usage. I took an hour's reading from 'ps -ef' and got
[code]
272 Sep 5 06:51 psef.sun07.txt
272 Sep 5 07:51 psef.sun08.txt
...
[/code]
... I'll go with -g and hope to get one job on each card (at last!). ... The machine has 6 cpus and the 2 cards (I lost a vote to buy 4 cards ...), and there are six boinc jobs running along with the gpu jobs; ...The time here seems clearly to be walltime, rather than either cputime or gputime(?). -Bruce[/QUOTE] After looking at the first of two ps timings, the six boinc jobs may not have been a good idea. (And perhaps I'd better update to the most recent client.)
[code]
200 Sep 6 07:07 psef.mon07.txt
200 Sep 6 11:09 psef.mon11.txt
[/code]
has c. 19hrs of walltime (from start, noon yesterday, to mon07), but the ps only shows c. 22:45 elapsed:
[code]
psef.mon07:
10:56:44 ./msieve -g 0 -l msieveg0b.log -v -np 122401,180000
11:48:30 ./msieve -g 1 -l msieveg1b.log -v -np 187201,200000
---------
   22:45
[/code]
Not looking very closely (at 7am, before class), this looked a lot like both jobs on one card (I probably approximated c. 20 = c. 20, instead of 19 and 22:45). So I was about to consider adding a 3rd job to make sure that something was running on each card; but since we're thinking about cputime -vs- "time on the gpu", I tried cutting back to four boinc jobs, leaving two cores idle, thinking that maybe one or both cards might have been waiting for an _idle_ cpu? So three hours later (with two free cores)
[code]
psef.mon11:
13:00:02 ./msieve -g 0 -l msieveg0b.log -v -np 122401,180000
13:47:44 ./msieve -g 1 -l msieveg1b.log -v -np 187201,200000
----
   26:47
[/code]
shows 26:47 - 22:45, which is up 4 hrs after 3 hrs; a lot better than 22:45 after 19 hrs. Of course, I'd be happiest demonstrating two well-performing cards with new best polynomials; but there's no reason to believe that the two cards weren't happy running three jobs.
Since the reset with g0/g1, there are just three new polynomials in the top10
[code]
grep norm msieve.dat.p | sort -gk7 | tail
# norm 2.425973e-17 alpha -7.915881 e 1.132e-13 rroots 5
# norm 2.419345e-17 alpha -9.198820 e 1.141e-13 rroots 3
# norm 2.410989e-17 alpha -7.817632 e 1.143e-13 rroots 5  *new*
# norm 2.461850e-17 alpha -8.392563 e 1.145e-13 rroots 5  *new*
# norm 2.435641e-17 alpha -6.972853 e 1.169e-13 rroots 3
# norm 2.486423e-17 alpha -7.092835 e 1.173e-13 rroots 3
# norm 2.499984e-17 alpha -8.999410 e 1.177e-13 rroots 3  *new*
# norm 2.496365e-17 alpha -8.494013 e 1.180e-13 rroots 3  [last of 1st 3]
# norm 2.490282e-17 alpha -7.085114 e 1.182e-13 rroots 3
# norm 2.677511e-17 alpha -6.875759 e 1.233e-13 rroots 5
[/code]
(Actually, the longest running of the three jobs, over 25hrs, had best polynomial from
[code]
save 2.235712e-17 -8.1160 186600946.85 1.092049e-13 rroots 3
[/code]
which is the current 21st best; six are from the 1st job of 3.) On the topic of cpu-use by msieve_gpu's polynomial search, I'm just now seeing some from top
[code]
 8881 39 19 1605m 765m 752 R 100.1 3.2  19:14.98 lasievef_1.07_x
 8926 39 19 1606m 766m 740 R 100.1 3.2  16:01.03 lasievef_1.07_x
30759 25  0  115m  86m 19m R 100.1 0.4 871:35.74 msieve
 8970 39 19 1606m 766m 752 R  99.8 3.2  12:37.21 lasievef_1.07_x
 9054 39 19 1606m 765m 744 R  99.5 3.2   5:33.25 lasievef_1.07_x
30752 17  0  115m  86m 19m R  44.2 0.4 799:28.80 msieve
[/code]
in which the boinc jobs are at lowest priority ("19"), while msieve_gpu is running without nicing ("0"). So that's 100% on one core for pid 30759 and 44.2% on one of the other cores for pid 30752. Does anyone believe it possible that the timings on these two jobs are showing ONLY the cputime; so 871 minutes for pid 30759 and 799 minutes for pid 30752? These are jobs that started c. 25.5 hours ago, and that would say that notably past half of the time was spent with the cards waiting for the cpu to report back.
-Bruce

PS --- (1) So that would have the mon07 reading saying 22:45 hrs out of 2*19 = 38hrs spent on the cpu-bound stage. Top often shows the msieve_gpu's not listed ... NO!, it's a rare 10sec interval without one of them showing, and most often not at 0%, like
[code]
64.2 0.4 803:11.88 msieve
51.6 0.4 875:58.57 msieve
[/code]
64% and 51% of the cpu. Mmmpf, I waited quite a long time (2 min? more than 12 10sec readings) to get
[code]
0.3 0.4 803:53.40 msieve
0.3 0.4 877:04.59 msieve
[/code]
both < 1.0%. That seems to be the lowest (the 0.4 is % of memory use); almost every 10sec cycle shows either a msieve_gpu running or just finishing (starting?) a run at < 1%. Maybe I've accidentally answered jrk's question? (2) And keeping two idle cores raised the cputime to 4 hours out of 2-times-3hrs walltime, bringing the percentage of cputime up to 66% (4 of 6), with the lower earlier percentage indicating that the process had both the card and the cpu waiting?
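The hourly-snapshot arithmetic above can be collapsed into a one-liner on Linux, since procps 'ps' can report both elapsed wall-clock seconds and accumulated CPU seconds for a pid. A sketch; the `etimes`/`times` field names are procps-specific (older or BSD ps may not have them), and the pid is just the example from the top listing:

```shell
# CPU-bound fraction of a running process: cumulative CPU seconds
# ('times') divided by elapsed wall-clock seconds ('etimes').
pid=30759   # example pid from the 'top' output above
ps -o etimes= -o times= -p "$pid" |
    awk '{ printf "%.1f%% of walltime on the CPU\n", 100 * $2 / $1 }'
```

A value near 100% means the process is CPU-bound; for a GPU stage-1 job one would hope to see it well below that.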
I usually find things like
[code]
Sat Sep 4 09:23:01 2010  time limit set to 37.65 hours
Sat Sep 4 09:23:02 2010  using GPU 0 (GeForce GTX 275)
Mon Sep 6 04:07:58 2010  polynomial selection complete
Mon Sep 6 04:07:58 2010  elapsed time 42:44:58
[/code]
where the elapsed time is 15% more than the time limit, if the msieve process got a CPU to itself; and if it had to share with sievers I see things like
[code]
Tue Aug 31 08:01:45 2010  time limit set to 73.00 hours
Sat Sep 4 01:21:42 2010  elapsed time 89:20:00
[/code]
So the elapsed time is always a bit more than the time limit, and if the process isn't running on a whole CPU it's quite a lot more. The timings from 'top' show CPU-time only.
[QUOTE=fivemack;228693]I usually find things like
... where the elapsed time is 15% more than the time limit, if the msieve process got a CPU to itself, and if it had to share with sievers I see things like ... So the elapsed time is always a bit more than the time limit, and if the process isn't running on a whole CPU it's quite a lot more. The timings from 'top' show CPU-time only[/QUOTE] Thanks. This is a Westmere hexacore, 6 cores (rather than quadcore); 32-nm, Xeon X5650. Seems a shame to keep all six cores idle; two idle doesn't seem too bad. Maybe I ought to be trying 6-threaded Msieve/Lanczos. Since I'm using -np, I don't get a "time limit". The number's a C176; and I'd like to see what fermi (x2) comes up with. -Bruce
[QUOTE=bdodson;228710]Thanks. ... Seems a shame to keep all six cores idle; two
idle doesn't seem too bad. ... Since I'm using -np, I don't get a "time limit". ... The number's a C176; and I'd like to see what fermi (x2) comes up with. -Bruce[/QUOTE] Only one change in the best10, a new 2nd best
[code]
# norm 2.490282e-17 alpha -7.085114 e 1.182e-13 rroots 3
# norm 2.750309e-17 alpha -8.147588 e 1.217e-13 rroots 5  [*new*]
# norm 2.677511e-17 alpha -6.875759 e 1.233e-13 rroots 5
[/code]
On the "time limit": we were wondering whether there's a 1.3e-13 to be had, and confirming that the 1.2e-13 wasn't unique. I have 12 search ranges like
[code]
searching leading coefficients from 100001 to 120000
coeff 100020-102360 2787800423 3066580465 3066580466 3373238512
...
searching leading coefficients from 180001 to 200000
coeff 180060-182400 2955052791 3250558070 3250558071 3575613878
[/code]
that were completely searched (800 sec/coeff deadline), with 3 more partially searched ranges and 2 more currently running. Not sure how much of 1 to 200000 is worth including as a test case (perhaps as a guide to objectives for gnfs181 and/or gnfs187?). For this gnfs176, it looks like the 2 boards are searching the coefficients from a range of c. 20000/day; in which case half of 1-to-200000 takes five days. Timings for the cpu portion of the msieve_gpu searches seem to be all over. In the most recent 11 hrs, the job on g0 accumulated 6:40 hrs of cputime (as per ps), while the job on g1 accumulated just 3:10 hrs. Guess I can pitch the boinc jobs for a day, to see what happens. Equal access to an idle core isn't the only issue. Over the nearly two days of the present run, g0 found 12997 polys and saved 350 of them, while g1 found 16686 polys and saved 285. Hard to tell whether this is the expected variation, or maybe one of the cards is getting better cpu access. -Bruce
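The found/saved tally per card can be pulled from each job's .p file, since stage 1 writes 'poly ...' lines and stage 2 writes 'save ...' lines (as in the excerpts later in this thread). A sketch; the single default file name here is illustrative, and with several jobs one would point each at its own output file:

```shell
# Compare stage-1 hits ('poly' lines) with stage-2 keepers ('save' lines)
# for each output file, to see whether the cards are keeping pace.
for f in msieve.dat.p; do
    printf '%s: %d found, %d saved\n' "$f" \
        "$(grep -c '^poly ' "$f")" "$(grep -c '^save ' "$f")"
done
```

A large found/saved ratio difference between two cards could then be checked against their CPU access rather than guessed at from daily totals.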
Stage 2 still runs on the CPU, so a stage 1 hit will cause the GPU to stop running while stage 2 completes. Greg has seen stage 2 jobs that take many hours, so this should account for the difference in GPU time.
[QUOTE=jasonp;228931]Stage 2 still runs on the CPU, so a stage 1 hit will cause the GPU to stop running while stage 2 completes. Greg has seen stage 2 jobs that take many hours, so this should account for the difference in GPU time.[/QUOTE]
This is quite timely, as I think that I can match that. First, an update: only one new top10 entry, in the 8th spot:
[code]
# norm 2.482413e-17 alpha -8.632319 e 1.163e-13 rroots 5  [*new*]
# norm 2.435641e-17 alpha -6.972853 e 1.169e-13 rroots 3
# norm 2.486423e-17 alpha -7.092835 e 1.173e-13 rroots 3
# norm 2.499984e-17 alpha -8.999410 e 1.177e-13 rroots 3
# norm 2.496365e-17 alpha -8.494013 e 1.180e-13 rroots 3
# norm 2.490282e-17 alpha -7.085114 e 1.182e-13 rroots 3
# norm 2.750309e-17 alpha -8.147588 e 1.217e-13 rroots 5
# norm 2.677511e-17 alpha -6.875759 e 1.233e-13 rroots 5
[/code]
Next, a clarification on the stages: does Stage 1 find the "leading" coeff c5, on the algebraic side, while Stage 2 finds the "coeff" that gives the leading term Y1 on the rational side? The search range 180001-to-200000 finished before noon yesterday, and I started on 1-to-60000. (Previous experience suggests that small c5's give some plausible candidates that are otherwise missed.) So, if I'm reading correctly, fermi-g1 promptly locates c5 = 240 at like 2 minutes before noon; and ever since, one of the cores has been running through Y1's for that c5, at 800sec max/coeff. That would be 16:45 hrs straight cputime.
Today's new 8th place candidate is one of those:
[code]
poly 33 p 1818638683 q 2106911057 coeff 3831709949900617931
poly 30 p 1819259809 q 2106812507 coeff 3832839319083631163
poly 3  p 1819590263 q 2106375457 coeff 3832740271779375191  <--***
---
save 2.482413e-17 -8.6323 1287394159.87 1.163205e-13 rroots 5
---
# norm 2.482413e-17 alpha -8.632319 e 1.163e-13 rroots 5
skew: 1287394159.87
c0: -1042633613686720734290044400807825528939776194759
c1: 98429999145482704955198799209301519657
c2: 17523303038112215463835268730871
c3: 11356224849022898693143
c4: -11842426602752
c5: 240  <---*
Y0: -36002941945870027187523624257231332
Y1: 3832740271779375191  <--- ***
[/code]
-Bruce

Off Topic PS: In other local news, the matrix for 2p1043, at 17M^2, is due on Saturday; the matrix 7M707 just started at 12M^2, and needs another week-or-so; and sieving for 2L2370 is already halfway done.
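The matching done by eye above (the Y1 of a saved polynomial equals the 'coeff' of the stage-1 hit that produced it) can be scripted over the .p-file format excerpted in this post; a small sketch:

```shell
# For a saved polynomial's Y1 value, find the stage-1 hit that produced it:
# stage 1 records 'poly ... coeff <N>' and the saved entry re-uses <N> as Y1.
y1=3832740271779375191
grep "coeff $y1\$" msieve.dat.p
```

Running it for each Y1 in the saved entries gives the stage-1 provenance (and hence the card/job) of every keeper.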