mersenneforum.org > Factoring Projects > Msieve
Old 2010-09-04, 15:43   #1
bdodson (Jun 2005, lehigh.edu)

c176 polynomial search

Quote:
Originally Posted by jasonp
It looks like it's running okay. Do I read the output correctly that you are using a new Tesla card? If so, it's nice to know that the code runs on the latest and greatest cards.
...
A preliminary report: I have two Msieve v1.47 jobs running, both of which say
Code:
> using GPU 0 (Tesla C2050)
I'm not sure whether I set both jobs on the same board (there are two).
An hour or so into a search on a c176 I have
Code:
 grep c5 msieve.dat.p | uniq -c
      1 c5:  120120
     23 c5:  180060
      1 c5:  120120
      3 c5:  180060
Two polynomials from one range, and a (minor, at least) flurry from the other.
With
Code:
grep e-13 msieve.dat.p | wc -l
18

grep e-14 msieve.dat.p | wc -l
10

from norm 1.900665e-17 alpha -7.051647 e 9.850e-14 rroots 5

to # norm 2.235712e-17 alpha -8.116017 e 1.092e-13 rroots 3
I'm slightly surprised to see 1, 3, and 5 real roots, with different
values among the best so far.
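Counts like these grep/wc passes can also be collapsed into a single awk tally over the score field. A sketch with sample lines taken from this thread (this assumes the e value is field 7 of the `# norm ...` comment lines, which matches the output shown here):

```shell
# Tally saved polynomials by the exponent of their Murphy E score
# (field 7 of a "# norm ... alpha ... e ... rroots ..." line).
printf '%s\n' \
  '# norm 1.900665e-17 alpha -7.051647 e 9.850e-14 rroots 5' \
  '# norm 2.235712e-17 alpha -8.116017 e 1.092e-13 rroots 3' \
  '# norm 2.419345e-17 alpha -9.198820 e 1.141e-13 rroots 3' |
awk '{ split($7, a, "e-"); tally["e-" a[2]]++ }
     END { for (b in tally) print tally[b], b }' | sort
```

On a real run, replace the `printf` feed with `grep norm msieve.dat.p`.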

Thanks to Greg for walking me through everything from loading drivers on
the new Linux machine, to fiddling with the Makefile, to getting
the 10 ptx's into the correct directory; and to Serge for locating
the c176 on the 3- extension. -Bruce

Last fiddled with by bdodson on 2010-09-04 at 15:47 Reason: clarity/info
Old 2010-09-04, 18:13   #2
jasonp (Tribal Bullet, Oct 2004)

Paul and I were also a little surprised that the best polynomials did not automatically have many real roots; the roots modulo small primes can make up for a lot of deficiencies in the polynomial size.

To get a list of the best polynomials given the temporary output from a poly selection run, I use
Code:
grep norm msieve.dat.p | sort -gk7 | tail -20
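A small reproducible demo of that pipeline, using invented sample lines in the same format: `sort -g -k7` does a general-numeric sort keyed on the 7th whitespace-delimited field (the Murphy E score), so `tail` keeps the highest scores.

```shell
# Rank polynomial lines by the e score (field 7); tail shows the best.
printf '%s\n' \
  '# norm 2.419345e-17 alpha -9.198820 e 1.141e-13 rroots 3' \
  '# norm 1.900665e-17 alpha -7.051647 e 9.850e-14 rroots 5' \
  '# norm 2.677511e-17 alpha -6.875759 e 1.233e-13 rroots 5' |
sort -g -k7 | tail -1
```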
Old 2010-09-04, 21:28   #3
bdodson (Jun 2005, lehigh.edu)

Quote:
Originally Posted by jasonp
Paul and I were also a little surprised that the best polynomials did not automatically have many real roots; the roots modulo small primes can make up for a lot of deficiencies in the polynomial size. ... I use
Code:
grep norm msieve.dat.p | sort -gk7 | tail -20
That would be PaulZ? Or xilman/PaulL? Yes, the small primes; looks right.
Turns out that both of my first two jobs are running on the first board.
I now have a 3rd job running, using '-g 1' this time, and got
Code:
 using GPU 1 (Tesla C2050)
Not much change in the largest so far (with two jobs on one of the boards ...)
Code:
grep e-13 msieve.dat.p | grep -v ' e 1.0'
# norm 2.419345e-17 alpha -9.198820 e 1.141e-13 rroots 3
(since, by inspection of the e-13's, most were 'e 1.0'-s). The above
will most likely be useful, now that I've got both boards running. Serge
also has a script; very subtle, as usual. Thanks for the gpu code! -Bruce

Last fiddled with by bdodson on 2010-09-04 at 21:29
Old 2010-09-05, 03:17   #4
jrk (May 2008)

Bruce, what does your CPU usage look like on those msieve jobs? And which SVN version did you pull?

I've been wondering if the CPU code might start becoming a bottleneck on these really fast GPUs...
Old 2010-09-05, 16:14   #5
bdodson (Jun 2005, lehigh.edu)

Quote:
Originally Posted by jrk
Bruce, what does your CPU usage look like on those msieve jobs? And which SVN version did you pull?

I've been wondering if the CPU code might start becoming a bottleneck on these really fast GPUs...
I picked up
Code:
http://msieve.svn.sourceforge.net/viewvc/msieve/trunk.tar.gz?view=tar
(an address from one of Greg's posts, re: people not on svn), and am
not sure where to find the svn revision; of the files, per 'ls -ltrd', in the
trunk directory, the latest dates are
Code:
-rw-r--r--  1 bad0 355  11760 Aug 12 17:50 Makefile
-rw-r--r--  1 bad0 355  91577 Aug 15 22:16 Changes
-rw-r--r--  1 bad0 355  15396 Aug 25 03:10 demo.c
That's the best I can do to bracket the range the svn revision could be in. Likewise,
I've no idea how I might separate gpu usage from cpu usage.
I took an hour's reading from 'ps -ef' and got
Code:
  272 Sep  5 06:51 psef.sun07.txt
  272 Sep  5 07:51 psef.sun08.txt

with

psef.sun07.txt:

bad0     15286  7721 84 Sep04 pts/1    17:26:39 ./msieve -v -np 120001,180000
bad0     15373  7721 39 Sep04 pts/1    08:03:57 ./msieve -l msieveg1.log -v -np 180001,200000
bad0     18347  7721 51 Sep04 pts/1    07:12:16 ./msieve -g 1 -l msieveg1a.log -v -np 100001,120000

and

psef.sun08.txt:

bad0     15286  7721 85 Sep04 pts/1    18:26:21 ./msieve -v -np 120001,180000
bad0     15373  7721 39 Sep04 pts/1    08:24:13 ./msieve -l msieveg1.log -v -np 180001,200000
bad0     18347  7721 51 Sep04 pts/1    07:41:07 ./msieve -g 1 -l msieveg1a.log -v -np 100001,120000
So the first job submitted to '-g 0' is shown in ps to have gotten pretty
much 60 minutes in 60 minutes, walltime. The second job, which also
claims to have been running on '-g 0' appears to have accumulated c.
20 minutes; and the third job, claiming to be running on '-g 1' had
c. 39 minutes. That sounds more like the 2nd and 3rd jobs both ran
on '-g 1'; and perhaps the 'ps' reading is just showing walltime, rather
than gputime (that's a word? for certain the 'ps' isn't showing cputime,
as most of the polyn searching time is supposed to be in the stage 1
that runs on the card).
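One way to separate the two, assuming a Linux-style `ps`: the `time` output column is accumulated cputime, while `etime` is wall-clock time since the process started, so comparing them per PID shows how CPU-bound a job is. A sketch (applied here to the current shell rather than an actual msieve PID):

```shell
# TIME is accumulated CPU time; ELAPSED is wall-clock time since the
# process started. For an msieve job, TIME much smaller than ELAPSED
# would suggest most of the run is on the card (or waiting), not the cpu.
ps -o pid,etime,time,comm -p $$
```

For the jobs above, substitute the msieve PIDs (e.g. `-p 15286,15373,18347`).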

Hmm. That's not good! The first job hadn't spit out any new polyn
in c. 12hrs, so I decided to take that one off of '-g 0'; hoping to leave
just the 2nd one running there. But now that I'm checking, kill -TERM
took the only job running on '-g 0' off (I wasn't sure, but -TERM still
works on the gpu); and during the past hour the 2nd and 3rd job are
showing 30 minutes each, at least, as reported by ps, which sounds
like confirmation that they're both running at half-time on just one
of the cards. Sigh. I'll try -TERMing the two running; and resubmit
one job each to -g 0 and -g 1. Not sure whether the 2nd logfile report
that the 2nd job was also on -g 0 was false, or whether omitting -g would
have put one job on each card; I'll go with -g and hope to get one job on
each card (at last!).

The machine has 6 cpus and the 2 cards (I lost a vote to buy 4 cards ...),
and there are six boinc jobs running along with the gpu jobs; if that
complicates figuring out the answer to your question. In any case,
here's the logfile from the first job (it's short)
Code:
Sat Sep  4 10:15:28 2010  
Sat Sep  4 10:15:28 2010  
Sat Sep  4 10:15:28 2010  Msieve v. 1.47
Sat Sep  4 10:15:28 2010  random seeds: acd21bbb b1aed647
Sat Sep  4 10:15:28 2010  factoring 14517736555533692118889909159833393968550085834163333
45491555533517024104751904134775278520633340964487093191995272576312480523412906546064107
1418096710862391316053652459514131 (176 digits)
Sat Sep  4 10:15:30 2010  no P-1/P+1/ECM available, skipping
Sat Sep  4 10:15:30 2010  commencing number field sieve (176-digit input)
Sat Sep  4 10:15:30 2010  commencing number field sieve polynomial selection
Sat Sep  4 10:15:30 2010  searching leading coefficients from 120001 to 180000
Sat Sep  4 10:15:30 2010  using GPU 0 (Tesla C2050)
Sun Sep  5 10:13:32 2010  polynomial selection complete
Sun Sep  5 10:13:32 2010  R0: -10379984941552728453972444081504096
Sun Sep  5 10:13:32 2010  R1:  9877593410632453061
Sun Sep  5 10:13:32 2010  A0:  255772912651460129632948684999205042316044455
Sun Sep  5 10:13:32 2010  A1:  1954830018665003941391142926871053678
Sun Sep  5 10:13:32 2010  A2: -471066766530980506862524849539
Sun Sep  5 10:13:32 2010  A3:  1778379708038918136760
Sun Sep  5 10:13:32 2010  A4:  97672503713942
Sun Sep  5 10:13:32 2010  A5:  120480
Sun Sep  5 10:13:32 2010  skew 76104896.21, size 2.678e-17, alpha -6.876, combined = 1.233e-13 rroots = 5
Sun Sep  5 10:13:32 2010  elapsed time 23:58:04
which has the current best polyn. The time here seems clearly to be
walltime, rather than either cputime or gputime(?). -Bruce
Old 2010-09-06, 17:17   #6
bdodson (Jun 2005, lehigh.edu)

54.6% of the time spent on the cpu??

Quote:
Originally Posted by bdodson
...
I've no idea about how I might separate gpu usage from cpu usage.
I took an hour's reading from 'ps -ef' and got
Code:
  272 Sep  5 06:51 psef.sun07.txt
  272 Sep  5 07:51 psef.sun08.txt
     ...
... I'll go with -g and hope to get one job on each card (at last!).
...
The machine has 6 cpus and the 2 cards (I lost a vote to buy 4 cards ...),
and there are six bonic jobs running along with the gpu jobs;
...The time here seems clearly to be walltime, rather than either cputime
or gputime(?). -Bruce
After looking at the first of two ps timings, the six boinc jobs may
not have been a good idea. (And perhaps I'd better update to the
most recent client.)
Code:
 200 Sep  6 07:07 psef.mon07.txt
 200 Sep  6 11:09 psef.mon11.txt
has c. 19hrs of walltime (from start, noon yesterday to mon07), but
the ps only shows c. 22:45 elapsed:
Code:
psef.mon07:

 10:56:44 ./msieve -g 0 -l msieveg0b.log -v -np 122401,180000
 11:48:30 ./msieve -g 1 -l msieveg1b.log -v -np 187201,200000
---------
 22:45
Not looking very closely (at 7am, before class), this looked a lot
like both jobs on one card (I probably approximated c. 20 = c. 20,
instead of 19 and 22:45). So I was about to consider adding on
a 3rd job to make sure that something was running on each card;
but since we're thinking about cputime -vs- "time on the gpu",
I tried cutting back to four boinc jobs, leaving two cores idle.
Thinking that maybe one-or-both cards might have been waiting
for an _idle_ cpu?

So three hours later (with two free cores)
Code:
psefmon11:

 13:00:02 ./msieve -g 0 -l msieveg0b.log -v -np 122401,180000
 13:47:44 ./msieve -g 1 -l msieveg1b.log -v -np 187201,200000
----
  26:47
shows 26:47 - 22:45, which is up 4 hrs after 3 hrs; a lot better
than 22:45 after 19 hrs.

Of course, I'd be happiest demonstrating two well-performing cards
with new best polyn; but there's no reason to believe that the two
cards weren't happy running three jobs. Since the reset with g0/g1,
there are just three new polyn in the top10
Code:
 grep norm msieve.dat.p | sort -gk7 | tail 
# norm 2.425973e-17 alpha -7.915881 e 1.132e-13 rroots 5
# norm 2.419345e-17 alpha -9.198820 e 1.141e-13 rroots 3  
# norm 2.410989e-17 alpha -7.817632 e 1.143e-13 rroots 5  *new*
# norm 2.461850e-17 alpha -8.392563 e 1.145e-13 rroots 5  *new*
# norm 2.435641e-17 alpha -6.972853 e 1.169e-13 rroots 3
# norm 2.486423e-17 alpha -7.092835 e 1.173e-13 rroots 3
# norm 2.499984e-17 alpha -8.999410 e 1.177e-13 rroots 3  *new*
# norm 2.496365e-17 alpha -8.494013 e 1.180e-13 rroots 3     [last of 1st 3]
# norm 2.490282e-17 alpha -7.085114 e 1.182e-13 rroots 3
# norm 2.677511e-17 alpha -6.875759 e 1.233e-13 rroots 5
(Actually, the longest running of the three jobs, over 25hrs, had
best polyn from
Code:
save 2.235712e-17 -8.1160 186600946.85 1.092049e-13 rroots 3
which is the current 21st best; six are from the 1st job of 3.)

On the topic of cpu-use by msieve_gpu's polyn search, I'm just
now seeing some from top
Code:
 8881   39  19 1605m 765m  752 R 100.1  3.2  19:14.98 lasievef_1.07_x   
 8926   39  19 1606m 766m  740 R 100.1  3.2  16:01.03 lasievef_1.07_x   
30759   25   0  115m  86m  19m R 100.1  0.4 871:35.74 msieve            
 8970   39  19 1606m 766m  752 R 99.8  3.2  12:37.21 lasievef_1.07_x    
 9054   39  19 1606m 765m  744 R 99.5  3.2   5:33.25 lasievef_1.07_x    
30752  17   0  115m  86m  19m R 44.2  0.4 799:28.80 msieve
in which the boinc jobs are at lowest priority ("19"), while msieve_gpu
is running without nicing ("0"). So that's 100% on one core for pid
30759 and 44.2% on one of the other cores for pid 30752. Does anyone
believe it possible that the timings on these two jobs are showing
ONLY the cputime; so 871 minutes for pid 30759 and 799 minutes for
pid 30752? These are jobs that started 1465 minutes ago, and that would
say that notably more than half of the time was spent on the cards waiting
for the cpu to report back. -Bruce

PS --- (1) So that would have the mon07 reading saying 22:45 hrs out of
2*19 = 38hrs spent on the cpu-bound stage. Top often shows the
msieve_gpu's not listed ... NO!, it's a rare 10sec interval without one
of them showing, and most often not at 0% like
Code:
 64.2  0.4 803:11.88 msieve             
 51.6  0.4 875:58.57 msieve
64% and 51% of the cpu. Mmmpf, I waited quite a long time (2 min?
more than 12 10sec readings) to get
Code:
  0.3  0.4 803:53.40 msieve             
  0.3  0.4 877:04.59 msieve
both < 1.0%. That seems to be the lowest (the 0.4 is % of memory use);
almost every 10sec cycle shows either a msieve_gpu running or just
finishing (starting?) a run at < 1%. Maybe I've accidentally answered
jrk's question?

(2) And keeping two cores idle, which raised the cputime to
4 hours out of 2-times-3-hrs walltime, brings the percentage
of cputime up to 66% (4 of 6); with the lower earlier percentage
indicating the process had both the card and the cpu waiting?
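The back-of-the-envelope percentages above can be checked mechanically. A small awk sketch using the mon07 numbers quoted earlier in this post (22:45 hrs of cputime across the two jobs, against 2 × 19 hrs of available walltime):

```shell
# cpu-bound fraction = total accumulated cputime / (jobs x walltime).
awk 'BEGIN {
    cpu  = 22 + 45/60;   # 22:45 of cputime, both jobs combined
    wall = 2 * 19;       # two jobs, c. 19 hrs walltime each
    printf "%.0f%%\n", 100 * cpu / wall
}'
```

Swapping in the post-reset numbers (4 hrs out of 2 × 3 hrs) gives the 66% figure the same way.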
Old 2010-09-06, 17:41   #7
fivemack ((loop (#_fork)), Feb 2006, Cambridge, England)

I usually find things like

Sat Sep 4 09:23:01 2010 time limit set to 37.65 hours
Sat Sep 4 09:23:02 2010 using GPU 0 (GeForce GTX 275)
Mon Sep 6 04:07:58 2010 polynomial selection complete
Mon Sep 6 04:07:58 2010 elapsed time 42:44:58

where the elapsed time is 15% more than the time limit, if the msieve process got a CPU to itself, and if it had to share with sievers I see things like

Tue Aug 31 08:01:45 2010 time limit set to 73.00 hours
Sat Sep 4 01:21:42 2010 elapsed time 89:20:00

So the elapsed time is always a bit more than the time limit, and if the process isn't running on a whole CPU it's quite a lot more.

The timings from 'top' show CPU-time only.

Last fiddled with by fivemack on 2010-09-06 at 17:42
Old 2010-09-06, 20:13   #8
bdodson (Jun 2005, lehigh.edu)

Quote:
Originally Posted by fivemack
I usually find things like
...
where the elapsed time is 15% more than the time limit, if the msieve process got a CPU to itself, and if it had to share with sievers I see things like
...
So the elapsed time is always a bit more than the time limit, and if the process isn't running on a whole CPU it's quite a lot more.

The timings from 'top' show CPU-time only
Thanks. This is a Westmere hexacore, 6 cores (rather than quadcore);
32-nm Xeon X5650. Seems a shame to keep all six cores idle; two
idle doesn't seem too bad. Maybe I ought to be trying 6-threaded
Msieve/Lanczos. Since I'm using -np, I don't get a "time limit". The
number's a C176; and I'd like to see what fermi (x2) comes up with.
-Bruce
Old 2010-09-07, 13:45   #9
bdodson (Jun 2005, lehigh.edu)

Quote:
Originally Posted by bdodson
Thanks. ... Seems a shame to keep all six cores idle; two
idle doesn't seem too bad. ... Since I'm using -np, I don't get a "time limit".
... The number's a C176; and I'd like to see what fermi (x2) comes up with.
-Bruce
Only one change in the best10, a new 2nd best
Code:
# norm 2.490282e-17 alpha -7.085114 e 1.182e-13 rroots 3
# norm 2.750309e-17 alpha -8.147588 e 1.217e-13 rroots 5  [*new*]
# norm 2.677511e-17 alpha -6.875759 e 1.233e-13 rroots 5
On the "time limit": we were wondering whether there's a 1.3e-13
to be had, and wanted to confirm that the 1.2e-13 wasn't unique. I have 12
search ranges like
Code:
searching leading coefficients from 100001 to 120000
coeff 100020-102360 2787800423 3066580465 3066580466 3373238512
  ...
searching leading coefficients from 180001 to 200000
coeff 180060-182400 2955052791 3250558070 3250558071 3575613878
that were completely searched (800 sec/coef deadline), with 3 more
partially searched ranges and 2 more currently running. Not sure how
much of 1 to 200000 is worth including as a test case (perhaps as a
guide to objectives for gnfs181 and/or gnfs187?). For this gnfs176, it
looks like the 2 boards are searching coefficients at a rate of c. 20000/day;
in which case half of 1-to-200000 takes five days.
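The five-day figure follows directly from that rate; a one-liner sanity check:

```shell
# Two boards covering c. 20000 leading coefficients per day:
# half of the 1-to-200000 range is 100000/20000 days of searching.
awk 'BEGIN { print 100000 / 20000, "days" }'
```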

Timings for the cpu portion of the msieve_gpu searches seem to be all
over the place. In the most recent 11 hrs, the job on g0 accumulated 6:40 hrs
of cputime (as per ps); while the job on g1 accumulated just 3:10 hrs.
Guess I can pitch the boinc jobs for a day, to see what happens. Equal
access to an idle core isn't the only issue. Over the nearly two days of
the present run, g0 found 12997 poly, and saved 350 of them; while
g1 found 16686 poly, and saved 285. Hard to tell whether this is the
expected variation; or maybe one of the cards is getting better cpu
access. -Bruce
Old 2010-09-07, 21:30   #10
jasonp (Tribal Bullet, Oct 2004)

Stage 2 still runs on the CPU, so a stage 1 hit will cause the GPU to stop running while stage 2 completes. Greg has seen stage 2 jobs that take many hours, so this should account for the difference in GPU time.
Old 2010-09-08, 10:55   #11
bdodson (Jun 2005, lehigh.edu)

Quote:
Originally Posted by jasonp
Stage 2 still runs on the CPU, so a stage 1 hit will cause the GPU to stop running while stage 2 completes. Greg has seen stage 2 jobs that take many hours, so this should account for the difference in GPU time.
This is quite timely; as I think that I can match that. First, an update,
only one new top10 entry; in the 8th spot:
Code:
# norm 2.482413e-17 alpha -8.632319 e 1.163e-13 rroots 5   [*new*]
# norm 2.435641e-17 alpha -6.972853 e 1.169e-13 rroots 3
# norm 2.486423e-17 alpha -7.092835 e 1.173e-13 rroots 3
# norm 2.499984e-17 alpha -8.999410 e 1.177e-13 rroots 3
# norm 2.496365e-17 alpha -8.494013 e 1.180e-13 rroots 3
# norm 2.490282e-17 alpha -7.085114 e 1.182e-13 rroots 3
# norm 2.750309e-17 alpha -8.147588 e 1.217e-13 rroots 5
# norm 2.677511e-17 alpha -6.875759 e 1.233e-13 rroots 5
Next, a clarification on the stages: does Stage 1 find the "leading"
coeff c5, on the algebraic side; while Stage 2 finds the "coef" that
gives the leading term Y1 on the rational side?

The search range 180001-to-200000 finished before noon yesterday;
and I started on 1-to-60000. (Previous experience suggesting that
small c5's give some plausible candidates that are otherwise missed.)
So, if I'm reading correctly, fermi-g1 promptly locates c5 = 240 at like
2 minutes before noon; and ever since one of the cores has been
running through Y1's for that c5, at 800sec max/coeff. That would be
16:45 hrs straight cputime. Today's new 8th place candidate is one
of those:
Code:
poly 33 p 1818638683 q 2106911057 coeff 3831709949900617931
poly 30 p 1819259809 q 2106812507 coeff 3832839319083631163
poly  3 p 1819590263 q 2106375457 coeff 3832740271779375191  <--***
---
save 2.482413e-17 -8.6323 1287394159.87 1.163205e-13 rroots 5
---
# norm 2.482413e-17 alpha -8.632319 e 1.163e-13 rroots 5
skew: 1287394159.87
c0: -1042633613686720734290044400807825528939776194759
c1:  98429999145482704955198799209301519657
c2:  17523303038112215463835268730871
c3:  11356224849022898693143
c4: -11842426602752
c5:  240   <---*
Y0: -36002941945870027187523624257231332
Y1:  3832740271779375191  <--- ***
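Under that reading (stage 1 fixes the algebraic leading coefficient c5, stage 2 produces the rational leading term Y1), both can be pulled out of a saved polynomial block with a short awk filter; a sketch using two of the lines above:

```shell
# Extract the algebraic leading coefficient (c5) and the rational
# leading term (Y1) from an msieve polynomial block.
printf '%s\n' 'c5:  240' \
              'Y0: -36002941945870027187523624257231332' \
              'Y1:  3832740271779375191' |
awk '/^c5:/ { print "c5 =", $2 } /^Y1:/ { print "Y1 =", $2 }'
```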
-Bruce

Off Topic PS: In other local news, the matrix for 2p1043, at 17M^2
is due on Saturday; the matrix 7M707 just started at 12M^2, and
needs another week-or-so; and sieving for 2L2370 is already halfway
done.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.