mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Msieve (https://www.mersenneforum.org/forumdisplay.php?f=83)
-   -   c176 polynomial search (https://www.mersenneforum.org/showthread.php?t=13905)

bdodson 2010-09-04 15:43

c176 polynomial search
 
[QUOTE=jasonp;221521]It looks like it's running okay. Do I read the output correctly that you are using a new Tesla card? If so, it's nice to know that the code runs on the latest and greatest cards.
...[/QUOTE]

A preliminary report, I have two Msieve v. 1.47 jobs running, both of which say
[code]
> using GPU 0 (Tesla C2050)
[/code]
not sure whether I set two jobs on the same board or not (there are two).
An hour or so into a search on a c176 I have
[code]
grep c5 msieve.dat.p | uniq -c
1 c5: 120120
23 c5: 180060
1 c5: 120120
3 c5: 180060 [/code]
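(Side note on why the same c5 shows up in more than one group: uniq -c only merges *adjacent* duplicate lines, so piping through sort first collapses them into one count per value. A quick sketch on a made-up sample file, standing in for msieve.dat.p:)

```shell
# uniq -c only counts *adjacent* duplicate lines, which is why the same
# c5 value can appear in several groups.  Sorting first collapses them.
# (/tmp/sample.dat.p is a made-up stand-in for msieve.dat.p)
cat > /tmp/sample.dat.p <<'EOF'
c5: 120120
c5: 180060
c5: 180060
c5: 120120
c5: 180060
EOF

grep c5 /tmp/sample.dat.p | uniq -c          # four groups, as in the output above
grep c5 /tmp/sample.dat.p | sort | uniq -c   # two groups, one per distinct c5
```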
Two polyn from one range, and a (minor, at least) flare from the other.
With
[code]
grep e-13 msieve.dat.p | wc -l
18

grep e-14 msieve.dat.p | wc -l
10

from norm 1.900665e-17 alpha -7.051647 e 9.850e-14 rroots 5

to # norm 2.235712e-17 alpha -8.116017 e 1.092e-13 rroots 3
[/code]

I'm slightly surprised to see 1, 3 and 5 real roots, with differing
values among the best so far.

Thanks to Greg for walking me through from loading drivers on
the new linux machine, to fiddling with the Makefile, to getting
the 10 ptx's in the correct directory; and to Serge for locating
the c176 on the 3- extension. -Bruce

jasonp 2010-09-04 18:13

Paul and I were also a little surprised that the best polynomials did not automatically have many real roots; the roots modulo small primes can make up for a lot of deficiencies in the polynomial size.

To get a list of the best polynomials given the temporary output from a poly selection run, I use
[code]
grep norm msieve.dat.p | sort -gk7 | tail -20
[/code]
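(For anyone following along: field 7 of those lines is the combined e-score, and the -g flag is what lets sort parse the e-13 style exponents that a plain numeric sort would mangle. A sketch on a few made-up lines in the same format:)

```shell
# Each saved polynomial gets a line of the form
#   # norm <size> alpha <alpha> e <combined-score> rroots <n>
# so the combined score is field 7.  sort -g understands 1.141e-13 style
# exponents; sort -n would compare only the leading "1.141" part.
# (made-up sample lines in the msieve.dat.p format)
cat > /tmp/poly.dat.p <<'EOF'
# norm 2.419345e-17 alpha -9.198820 e 1.141e-13 rroots 3
# norm 2.677511e-17 alpha -6.875759 e 1.233e-13 rroots 5
# norm 2.235712e-17 alpha -8.116017 e 1.092e-13 rroots 3
EOF

grep norm /tmp/poly.dat.p | sort -gk7 | tail -20   # best score ends up last
```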

bdodson 2010-09-04 21:28

[QUOTE=jasonp;228413]Paul and I were also a little surprised that the best polynomials did not automatically have many real roots; the roots modulo small primes can make up for a lot of deficiencies in the polynomial size. ... I use
[code]
grep norm msieve.dat.p | sort -gk7 | tail -20
[/code][/QUOTE]

That would be PaulZ? Or xilman/PaulL? Yes, the small primes; looks right.
Turns out that both of my first two jobs are running on the first board.
I now have a 3rd job running, using '-g 1' this time, and got
[code]
using GPU 1 (Tesla C2050) [/code]
Not much change in the largest so far (with two jobs on one of the boards ...)
[code]
grep e-13 msieve.dat.p | grep -v ' e 1.0'
# norm 2.419345e-17 alpha -9.198820 e 1.141e-13 rroots 3
[/code]
(since, by inspection of the e-13's, most were 'e 1.0'-s). The above
will most likely be useful, now that I've got both boards running. Serge
also has a script; very subtle, as usual. Thanks for the gpu code! -Bruce

jrk 2010-09-05 03:17

Bruce, what does your CPU usage look like on those msieve jobs? And which SVN version did you pull?

I've been wondering if the CPU code might start becoming a bottleneck on these really fast GPUs...

bdodson 2010-09-05 16:14

[QUOTE=jrk;228477]Bruce, what does your CPU usage look like on those msieve jobs? And which SVN version did you pull?

I've been wondering if the CPU code might start becoming a bottleneck on these really fast GPUs...[/QUOTE]

I picked up
[code]
http://msieve.svn.sourceforge.net/viewvc/msieve/trunk.tar.gz?view=tar
[/code]
(an address from one of Greg's posts re: people not on svn), and am
not sure where to find the svn revision; of the files listed by 'ls -ltrd' in the
trunk directory, the latest dates are
[code]
-rw-r--r-- 1 bad0 355 11760 Aug 12 17:50 Makefile
-rw-r--r-- 1 bad0 355 91577 Aug 15 22:16 Changes
-rw-r--r-- 1 bad0 355 15396 Aug 25 03:10 demo.c [/code]

Best I can do toward locating a range in which the svn revision might lie. Likewise,
I've no idea about how I might separate gpu usage from cpu usage.
I took an hour's reading from 'ps -ef' and got
[code]
272 Sep 5 06:51 psef.sun07.txt
272 Sep 5 07:51 psef.sun08.txt

with

psef.sun07.txt:

bad0 15286 7721 84 Sep04 pts/1 17:26:39 ./msieve -v -np 120001,180000
bad0 15373 7721 39 Sep04 pts/1 08:03:57 ./msieve -l msieveg1.log -v -np 180001,200000
bad0 18347 7721 51 Sep04 pts/1 07:12:16 ./msieve -g 1 -l msieveg1a.log -v -np 100001,120000

and

psef.sun08.txt:

bad0 15286 7721 85 Sep04 pts/1 18:26:21 ./msieve -v -np 120001,180000
bad0 15373 7721 39 Sep04 pts/1 08:24:13 ./msieve -l msieveg1.log -v -np 180001,200000
bad0 18347 7721 51 Sep04 pts/1 07:41:07 ./msieve -g 1 -l msieveg1a.log -v -np 100001,120000
[/code]
So the first job submitted to '-g 0' is shown in ps to have gotten pretty
much 60 minutes in 60 minutes, walltime. The second job, which also
claims to have been running on '-g 0' appears to have accumulated c.
20 minutes; and the third job, claiming to be running on '-g 1' had
c. 39 minutes. That sounds more like the 2nd and 3rd jobs both ran
on '-g 1'; and perhaps the 'ps' reading is just showing walltime, rather
than gputime (is that a word?). For certain the 'ps' isn't showing cputime,
as most of the polyn searching time is supposed to be in the stage 1
that runs on the card.
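(For what it's worth, ps can report both quantities side by side: the TIME column is accumulated cputime, while ETIME is walltime since the process started. A sketch using a throwaway sleep, which accrues walltime but essentially no cputime; the msieve invocation in the comment is just how one would point it at the real jobs:)

```shell
# TIME is cumulative CPU time; ETIME is wall-clock time since start.
# For the msieve jobs this would be something like:
#   ps -o pid,etime,time,args -C msieve
# Demonstrated here on a throwaway sleep, which burns walltime but
# almost no cputime:
sleep 2 &
pid=$!
sleep 1                               # let some walltime accrue
cputime=$(ps -o time= -p "$pid")      # stays at/near 00:00:00
walltime=$(ps -o etime= -p "$pid")    # grows with the clock
echo "cputime=$cputime walltime=$walltime"
wait "$pid"
```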

Hmm. That's not good! The first job hadn't spit out any new polyn
in c. 12hrs, so I decided to take that one off of '-g 0'; hoping to leave
just the 2nd one running there. But now that I'm checking, kill -TERM
took the only job running on '-g 0' off (I wasn't sure, but -TERM still
works on the gpu); and during the past hour the 2nd and 3rd job are
showing 30 minutes each, at least, as reported by ps, which sounds
like confirmation that they're both running at half-time on just one
of the cards. Sigh. I'll try -TERMing the two running jobs, and resubmit
one job to each of -g 0 and -g 1. Not sure whether the 2nd logfile's report
that the 2nd job was also on -g 0 was false, or whether omitting -g would
have put one job on each card; but I'll go with explicit -g and hope to get
one job on each card (at last!).

The machine has 6 cpus and the 2 cards (I lost a vote to buy 4 cards ...),
and there are six boinc jobs running along with the gpu jobs; if that
complicates figuring out the answer to your question. In any case,
here's the logfile from the first job (it's short)
[code]
Sat Sep 4 10:15:28 2010
Sat Sep 4 10:15:28 2010
Sat Sep 4 10:15:28 2010 Msieve v. 1.47
Sat Sep 4 10:15:28 2010 random seeds: acd21bbb b1aed647
Sat Sep 4 10:15:28 2010 factoring 14517736555533692118889909159833393968550085834163333454915555335170241047519041347752785206333409644870931919952725763124805234129065460641071418096710862391316053652459514131 (176 digits)
Sat Sep 4 10:15:30 2010 no P-1/P+1/ECM available, skipping
Sat Sep 4 10:15:30 2010 commencing number field sieve (176-digit input)
Sat Sep 4 10:15:30 2010 commencing number field sieve polynomial selection
Sat Sep 4 10:15:30 2010 searching leading coefficients from 120001 to 180000
Sat Sep 4 10:15:30 2010 using GPU 0 (Tesla C2050)
Sun Sep 5 10:13:32 2010 polynomial selection complete
Sun Sep 5 10:13:32 2010 R0: -10379984941552728453972444081504096
Sun Sep 5 10:13:32 2010 R1: 9877593410632453061
Sun Sep 5 10:13:32 2010 A0: 255772912651460129632948684999205042316044455
Sun Sep 5 10:13:32 2010 A1: 1954830018665003941391142926871053678
Sun Sep 5 10:13:32 2010 A2: -471066766530980506862524849539
Sun Sep 5 10:13:32 2010 A3: 1778379708038918136760
Sun Sep 5 10:13:32 2010 A4: 97672503713942
Sun Sep 5 10:13:32 2010 A5: 120480
Sun Sep 5 10:13:32 2010 skew 76104896.21, size 2.678e-17, alpha -6.876, combined = 1.233e-13 rroots = 5
Sun Sep 5 10:13:32 2010 elapsed time 23:58:04
[/code]
which has the current best polyn. The time here seems clearly to be
walltime, rather than either cputime or gputime(?). -Bruce

bdodson 2010-09-06 17:17

54.6% of the time spent on the cpu??
 
[QUOTE=bdodson;228541] ...
I've no idea about how I might separate gpu usage from cpu usage.
I took an hour's reading from 'ps -ef' and got
[code]
272 Sep 5 06:51 psef.sun07.txt
272 Sep 5 07:51 psef.sun08.txt
...
[/code]
... I'll go with -g and hope to get one job on each card (at last!).
...
The machine has 6 cpus and the 2 cards (I lost a vote to buy 4 cards ...),
and there are six boinc jobs running along with the gpu jobs;
...The time here seems clearly to be walltime, rather than either cputime
or gputime(?). -Bruce[/QUOTE]
After looking at the first of two ps timings, the six boinc jobs may
not have been a good idea. (And perhaps I'd better update to the
most recent client.)
[code]
200 Sep 6 07:07 psef.mon07.txt
200 Sep 6 11:09 psef.mon11.txt [/code]
has c. 19 hrs of walltime per job (from the start, noon yesterday, to mon07),
but the ps shows only c. 22:45 accumulated for the two together:
[code]
psef.mon07:

10:56:44 ./msieve -g 0 -l msieveg0b.log -v -np 122401,180000
11:48:30 ./msieve -g 1 -l msieveg1b.log -v -np 187201,200000
---------
22:45 [/code]
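(The 22:45 is just the sum of the two TIME columns; converting hh:mm:ss to seconds makes that easy to check:)

```shell
# Sum the two cumulative CPU times from the ps listing above
# (hh:mm:ss -> seconds, then back to hh:mm).
to_sec() { echo "$1" | awk -F: '{ print $1*3600 + $2*60 + $3 }'; }

total=$(( $(to_sec 10:56:44) + $(to_sec 11:48:30) ))
printf '%d:%02d\n' $((total / 3600)) $((total % 3600 / 60))   # 22:45
```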
Not looking very closely (at 7am, before class), this looked a lot
like both jobs on one card (I probably approximated c. 20 = c. 20,
instead of 19 and 22:45). So I was about to consider adding on
a 3rd job to make sure that something was running on each card;
but since we're thinking about cputime -vs- "time on the gpu",
I tried cutting back to four boinc jobs, leaving two cores idle.
Thinking that maybe one-or-both cards might have been waiting
for an _idle_ cpu?

So three hours later (with two free cores)
[code]
psefmon11:

13:00:02 ./msieve -g 0 -l msieveg0b.log -v -np 122401,180000
13:47:44 ./msieve -g 1 -l msieveg1b.log -v -np 187201,200000
----
26:47
[/code]
shows 26:47 - 22:45, which is up 4 hrs after 3 hrs; a lot better
than 22:45 after 19 hrs.

Of course, I'd be happiest demonstrating two well-performing cards
with new best polyn; but there's no reason to believe that the two
cards weren't happy running three jobs. Since the reset with g0/g1,
there are just three new polyn in the top10
[code]
grep norm msieve.dat.p | sort -gk7 | tail
# norm 2.425973e-17 alpha -7.915881 e 1.132e-13 rroots 5
# norm 2.419345e-17 alpha -9.198820 e 1.141e-13 rroots 3
# norm 2.410989e-17 alpha -7.817632 e 1.143e-13 rroots 5 *new*
# norm 2.461850e-17 alpha -8.392563 e 1.145e-13 rroots 5 *new*
# norm 2.435641e-17 alpha -6.972853 e 1.169e-13 rroots 3
# norm 2.486423e-17 alpha -7.092835 e 1.173e-13 rroots 3
# norm 2.499984e-17 alpha -8.999410 e 1.177e-13 rroots 3 *new*
# norm 2.496365e-17 alpha -8.494013 e 1.180e-13 rroots 3 [last of 1st 3]
# norm 2.490282e-17 alpha -7.085114 e 1.182e-13 rroots 3
# norm 2.677511e-17 alpha -6.875759 e 1.233e-13 rroots 5
[/code]
(Actually, the longest running of the three jobs, over 25hrs, had
best polyn from
[code]
save 2.235712e-17 -8.1160 186600946.85 1.092049e-13 rroots 3 [/code]
which is the current 21st best; six are from the 1st job of 3.)

On the topic of cpu-use by msieve_gpu's polyn search, I'm just
now seeing some readings from top
[code]
8881 39 19 1605m 765m 752 R 100.1 3.2 19:14.98 lasievef_1.07_x
8926 39 19 1606m 766m 740 R 100.1 3.2 16:01.03 lasievef_1.07_x
30759 25 0 115m 86m 19m R 100.1 0.4 871:35.74 msieve
8970 39 19 1606m 766m 752 R 99.8 3.2 12:37.21 lasievef_1.07_x
9054 39 19 1606m 765m 744 R 99.5 3.2 5:33.25 lasievef_1.07_x
30752 17 0 115m 86m 19m R 44.2 0.4 799:28.80 msieve
[/code]
in which the boinc jobs are at lowest priority ("19"), while msieve_gpu
is running without nicing ("0"). So that's 100% on one core for pid
30759 and 44.2% on one of the other cores for pid 30752. Does anyone
believe it possible that the timings on these two jobs are showing
ONLY the cputime; so 871 minutes for pid 30759 and 799 minutes for
pid 30752? These are jobs that started 1465 minutes ago, which would
say that notably more than half of the time was spent on the cpu-bound
stage, with the cards waiting for the cpu to report back. -Bruce

PS --- (1) So that would have the mon07 reading saying 22:45 hrs out of
2*19 = 38hrs spent on the cpu-bound stage. Top often shows the
msieve_gpu's not listed ... NO! It's a rare 10sec interval without one
of them showing, and most often not at 0% like
[code]
64.2 0.4 803:11.88 msieve
51.6 0.4 875:58.57 msieve [/code]
64% and 51% of the cpu. Mmmpf, I waited quite a long time (2 min?
more than 12 10sec readings) to get
[code]
0.3 0.4 803:53.40 msieve
0.3 0.4 877:04.59 msieve [/code]
both < 1.0%. That seems to be the lowest (the 0.4 is % of memory use);
almost every 10sec cycle shows either a msieve_gpu running or just
finishing (starting?) a run at < 1%. Maybe I've accidentally answered
jrk's question?

(2) And keeping two cores idle, which raised the cputime to
4 hours out of 2-times-3 hrs walltime, brings the percentage
of cputime up to 66% (4 of 6); with the lower earlier percentage
indicating that the process had both the card and the cpu waiting?

fivemack 2010-09-06 17:41

I usually find things like

[code]
Sat Sep 4 09:23:01 2010 time limit set to 37.65 hours
Sat Sep 4 09:23:02 2010 using GPU 0 (GeForce GTX 275)
Mon Sep 6 04:07:58 2010 polynomial selection complete
Mon Sep 6 04:07:58 2010 elapsed time 42:44:58
[/code]

where the elapsed time is 15% more than the time limit if the msieve process got a CPU to itself; if it had to share with sievers I see things like

[code]
Tue Aug 31 08:01:45 2010 time limit set to 73.00 hours
Sat Sep 4 01:21:42 2010 elapsed time 89:20:00
[/code]

So the elapsed time is always a bit more than the time limit, and if the process isn't running on a whole CPU it's quite a lot more.

The timings from 'top' show CPU-time only.

bdodson 2010-09-06 20:13

[QUOTE=fivemack;228693]I usually find things like
...
where the elapsed time is 15% more than the time limit, if the msieve process got a CPU to itself, and if it had to share with sievers I see things like
...
So the elapsed time is always a bit more than the time limit, and if the process isn't running on a whole CPU it's quite a lot more.

The timings from 'top' show CPU-time only.[/QUOTE]

Thanks. This is a Westmere hexacore, 6 cores (rather than quadcore);
32-nm Xeon X5650. Seems a shame to keep all six cores idle; two
idle doesn't seem too bad. Maybe I ought to be trying 6-threaded
Msieve/Lanczos. Since I'm using -np, I don't get a "time limit". The
number's a C176; and I'd like to see what fermi (x2) comes up with.
-Bruce

bdodson 2010-09-07 13:45

[QUOTE=bdodson;228710]Thanks. ... Seems a shame to keep all six cores idle; two
idle doesn't seem too bad. ... Since I'm using -np, I don't get a "time limit".
... The number's a C176; and I'd like to see what fermi (x2) comes up with.
-Bruce[/QUOTE]

Only one change in the best10, a new 2nd best
[code]
# norm 2.490282e-17 alpha -7.085114 e 1.182e-13 rroots 3
# norm 2.750309e-17 alpha -8.147588 e 1.217e-13 rroots 5 [*new*]
# norm 2.677511e-17 alpha -6.875759 e 1.233e-13 rroots 5
[/code]

On the "time limit", we were wondering whether there's a 1.3e-13
to be had; and want to confirm that the 1.2e-13 wasn't unique. I have 12
search ranges like
[code]
searching leading coefficients from 100001 to 120000
coeff 100020-102360 2787800423 3066580465 3066580466 3373238512
...
searching leading coefficients from 180001 to 200000
coeff 180060-182400 2955052791 3250558070 3250558071 3575613878
[/code]
that were completely searched (800 sec/coef deadline), with 3 more
partially searched ranges and 2 more currently running. Not sure how
much of 1 to 200000 is worth including as a test case (perhaps as a
guide to objectives for gnfs181 and/or gnfs187?). For this gnfs176, it
looks like the 2 boards are searching leading coefficients at a rate of
c. 20000/day; in which case half of 1-to-200000 takes five days.
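(That five-day figure is just the rate estimate applied to the remaining range; with the rough numbers from this post:)

```shell
# Back-of-envelope check of the estimate above; both numbers are the
# rough figures quoted in the post, not measurements.
range=100000   # "half of 1-to-200000" leading coefficients
rate=20000     # c. 20000 coefficients/day across the two boards
echo "$((range / rate)) days"   # 5 days
```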

Timings for the cpu portion of the msieve_gpu searches seem to be all
over the place. In the most recent 11 hrs, the job on g0 accumulated 6:40 hrs
of cputime (as per ps); while the job on g1 accumulated just 3:10 hrs.
Guess I can pitch the boinc jobs for a day, to see what happens. Equal
access to an idle core isn't the only issue. Over the nearly two days of
the present run, g0 found 12997 poly, and saved 350 of them; while
g1 found 16686 poly, and saved 285. Hard to tell whether this is the
expected variation; or maybe one of the cards is getting better cpu
access. -Bruce

jasonp 2010-09-07 21:30

Stage 2 still runs on the CPU, so a stage 1 hit will cause the GPU to stop running while stage 2 completes. Greg has seen stage 2 jobs that take many hours, so this should account for the difference in GPU time.

bdodson 2010-09-08 10:55

[QUOTE=jasonp;228931]Stage 2 still runs on the CPU, so a stage 1 hit will cause the GPU to stop running while stage 2 completes. Greg has seen stage 2 jobs that take many hours, so this should account for the difference in GPU time.[/QUOTE]
This is quite timely, as I think I can match that. First, an update,
only one new top10 entry; in the 8th spot:
[code]
# norm 2.482413e-17 alpha -8.632319 e 1.163e-13 rroots 5 [*new*]
# norm 2.435641e-17 alpha -6.972853 e 1.169e-13 rroots 3
# norm 2.486423e-17 alpha -7.092835 e 1.173e-13 rroots 3
# norm 2.499984e-17 alpha -8.999410 e 1.177e-13 rroots 3
# norm 2.496365e-17 alpha -8.494013 e 1.180e-13 rroots 3
# norm 2.490282e-17 alpha -7.085114 e 1.182e-13 rroots 3
# norm 2.750309e-17 alpha -8.147588 e 1.217e-13 rroots 5
# norm 2.677511e-17 alpha -6.875759 e 1.233e-13 rroots 5 [/code]
Next, a clarification on the stages: does Stage 1 find the "leading"
coeff c5, on the algebraic side; while Stage 2 finds the "coeff" that
gives the leading term Y1 on the rational side?

The search range 180001-to-200000 finished before noon yesterday;
and I started on 1-to-60000. (Previous experience suggesting that
small c5's give some plausible candidates that are otherwise missed.)
So, if I'm reading correctly, fermi-g1 promptly locates c5 = 240 at like
2 minutes before noon; and ever since one of the cores has been
running through Y1's for that c5, at 800sec max/coeff. That would be
16:45 hrs straight cputime. Today's new 8th place candidate is one
of those:
[code]
poly 33 p 1818638683 q 2106911057 coeff 3831709949900617931
poly 30 p 1819259809 q 2106812507 coeff 3832839319083631163
poly 3 p 1819590263 q 2106375457 coeff 3832740271779375191 <--***
---
save 2.482413e-17 -8.6323 1287394159.87 1.163205e-13 rroots 5
---
# norm 2.482413e-17 alpha -8.632319 e 1.163e-13 rroots 5
skew: 1287394159.87
c0: -1042633613686720734290044400807825528939776194759
c1: 98429999145482704955198799209301519657
c2: 17523303038112215463835268730871
c3: 11356224849022898693143
c4: -11842426602752
c5: 240 <---*
Y0: -36002941945870027187523624257231332
Y1: 3832740271779375191 <--- *** [/code]

-Bruce

Off Topic PS: In other local news, the matrix for 2p1043, at 17M^2
is due on Saturday; the matrix 7M707 just started at 12M^2, and
needs another week-or-so; and sieving for 2L2370 is already halfway
done.

