Go Back   mersenneforum.org > Factoring Projects > Msieve

Old 2013-07-01, 03:48   #23
VBCurtis
If you didn't mess with the stage-1 norm, it picks a random subspace to search, so that multiple people can search the same coeff without (much) overlap. For this number, the default search region was broken into 7 pieces. If you watch the output fly by, on the line where a new coeff starts you'll see something like "searching #4 of 7 random sets".

I learned toward the end of this search that I was restricting stage1 norm so much that I was forcing it to search just a small part of the first of those subspaces. So, I left stage1 at default for my C163 search.
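For intuition, here is a toy sketch of that partitioning idea in Python. This is illustrative only, not msieve's actual code: the function name, the range, and the chunking scheme are all made up, and only the "7 pieces, pick one at random" behavior is taken from the post.

```python
import random

def pick_subspace(search_range, pieces=7):
    """Toy version of the idea: split [lo, hi) into `pieces` chunks
    and search a randomly chosen one, so independent searchers of
    the same coeff rarely overlap."""
    lo, hi = search_range
    width = (hi - lo) // pieces
    i = random.randrange(pieces)
    print(f"searching #{i + 1} of {pieces} random sets")
    # The last chunk absorbs any remainder from the integer division.
    end = hi if i == pieces - 1 else lo + (i + 1) * width
    return lo + i * width, end

sub_lo, sub_hi = pick_subspace((0, 7_000_000))
```

Two people searching the same coeff then only duplicate work when they happen to draw the same piece.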
Old 2013-07-01, 13:02   #24
wombatman
Ah, ok. I forgot about the random seeding, and I had my Stage 1 restricted slightly. Regardless, an excellent find!
Old 2013-07-01, 21:32   #25
VBCurtis
C155 conclusions: I run np1, nps, npr all separately.
Tight stage-1 norms do not produce a higher quantity of quality stage 2 hits. I used a few choices from 1.5e22 to 8e22 for stage 1, finding that the rate of GPU hits varies quite a lot, but the fraction of those hits that produce an nps size of 3.5e20 or lower is no better; in fact, it was slightly worse for very tight stage-1 norms.

With stage1 set to 1.5e22, increasing the number of threads on my 460M resulted in faster data production: 2 threads was 30% faster than 1, 3 was 10% faster than 2, 4 was 5% better than 3, and 5 and 6 threads matched 4.

With stage1 set to 1.8e22, stage 1 hits were produced 45% faster than at 1.5e22. Again, -t 2 was 30% faster, while 3 and 4 threads matched 2 threads in hit rate. I did not test thread counts at the default stage-1 norm; I'll do that on the C157s posted in the other thread.

Summary: Use at least two threads for searches below 160 digits, and do not set a tight stage 1 bound.
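For reference, compounding the incremental speedups quoted above gives rough relative throughputs (a back-of-the-envelope sketch, only as precise as the quoted measurements themselves):

```python
# Relative stage-1 throughput on the 460M at stage1 norm 1.5e22,
# compounding the incremental percentages quoted in the post.
rate = {1: 1.0}
rate[2] = rate[1] * 1.30   # 2 threads: 30% faster than 1
rate[3] = rate[2] * 1.10   # 3 threads: 10% faster than 2
rate[4] = rate[3] * 1.05   # 4 threads: 5% faster than 3
rate[5] = rate[4]          # 5 and 6 threads matched 4
rate[6] = rate[4]

for t, r in sorted(rate.items()):
    print(f"{t} threads: {r:.2f}x")   # 1.00x, 1.30x, 1.43x, 1.50x, 1.50x, 1.50x
```

So the gain flattens out at roughly 1.5x the single-thread rate by 4 threads.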

Last fiddled with by VBCurtis on 2013-07-01 at 21:35 Reason: corrected percentages
Old 2013-07-01, 21:36   #26
wombatman
Interesting. Thanks for the information!
Old 2013-07-03, 02:23   #27
VBCurtis
My best C163 find:
Code:
# norm 1.006380e-015 alpha -7.184964 e 1.015e-012 rroots 5
skew: 1125907.72
c0: 13552563320177965549083201722855195952
c1: 56748107658641416546177074853658
c2: -129936131458372038527876113
c3: -234012107772195153040
c4: 121278479274204
c5: 26142480
Y0: -8362756448659493213350626044999
Y1: 126296599858935253
This happened to come from the 1GB file that had -np1 done separately. There was also a 9.85e-13 from the same flare/coeff.
I am done with these numbers, but willing to help with any C170 or larger searches.
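As a quick sanity check on the "rroots 5" in the header, one can count the real roots of the posted algebraic polynomial numerically. This is a sketch using numpy; double precision is comfortably enough for a degree-5 polynomial with roots this well separated.

```python
import numpy as np

# Coefficients of the algebraic polynomial from the post,
# highest degree first (c5 .. c0).
coeffs = [
    26142480,
    121278479274204,
    -234012107772195153040,
    -129936131458372038527876113,
    56748107658641416546177074853658,
    13552563320177965549083201722855195952,
]

roots = np.roots(coeffs)
# Count roots whose imaginary part is negligible relative to their magnitude.
real_roots = [r.real for r in roots if abs(r.imag) < 1e-6 * abs(r)]
print(len(real_roots))  # → 5, matching "rroots 5" in the msieve header
```

The real roots all sit on the scale of the skew (~1.1e6), which is what you'd expect for a well-skewed degree-5 GNFS polynomial.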
Old 2013-07-03, 09:01   #28
schickel
Thank you all for the time spent on these. Unfortunately, it will probably take most of this summer to churn my way through the current queue, depending on the length of the current heat wave.

I had to throttle my hex core down to 2 or 3 active cores (depending on the job) to keep the system at a tolerable temp during the day.
Old 2013-07-03, 11:32   #29
lorgix
Quote:
Originally Posted by VBCurtis
C155 conclusions: I run np1, nps, npr all separately.
Tight stage-1 norms do not produce a higher quantity of quality stage 2 hits. I used a few choices from 1.5e22 to 8e22 for stage 1, finding that the rate of GPU hits varies quite a lot, but the fraction of those hits that produce an nps size of 3.5e20 or lower is no better; in fact, it was slightly worse for very tight stage-1 norms.

With stage1 set to 1.5e22, increasing the number of threads on my 460M resulted in faster data production: 2 threads was 30% faster than 1, 3 was 10% faster than 2, 4 was 5% better than 3, and 5 and 6 threads matched 4.

With stage1 set to 1.8e22, stage 1 hits were produced 45% faster than at 1.5e22. Again, -t 2 was 30% faster, while 3 and 4 threads matched 2 threads in hit rate. I did not test thread counts at the default stage-1 norm; I'll do that on the C157s posted in the other thread.

Summary: Use at least two threads for searches below 160 digits, and do not set a tight stage 1 bound.
Could it be that, given a loose enough stage-1 norm or a hard enough job, there is a benefit to running as many threads as there are SMs (streaming multiprocessors) in the GPU? The 460M has four. By that reasoning, a GTX460 could use seven, and a GTX560 could use eight.
I would probably subtract one from that number if I had a monitor connected to it at the same time. It worked something like this when I tried trial factoring Mersennes on my GPU.
Old 2013-07-03, 12:10   #30
jasonp
This problem is different from Mersenne trial factoring: we use a sorting library that automatically chooses how much work to give the card, and the amount chosen nearly saturates the card every time, because each block of work is large and a kernel launch has hundreds or thousands of blocks.

I think it's just a coincidence that the best thread count equals the number of SMs here.

Last fiddled with by jasonp on 2013-07-03 at 12:10
Old 2013-07-03, 12:23   #31
lorgix
Quote:
Originally Posted by jasonp
This problem is different from Mersenne trial factoring: we use a sorting library that automatically chooses how much work to give the card, and the amount chosen nearly saturates the card every time, because each block of work is large and a kernel launch has hundreds or thousands of blocks.

I think it's just a coincidence that the best thread count equals the number of SMs here.
I see. Thanks for the info.

The law of small numbers strikes again.
Old 2013-10-05, 03:07   #32
schickel
Quote:
Originally Posted by VBCurtis
My best C163 find:
Code:
# norm 1.006380e-015 alpha -7.184964 e 1.015e-012 rroots 5
skew: 1125907.72
c0: 13552563320177965549083201722855195952
c1: 56748107658641416546177074853658
c2: -129936131458372038527876113
c3: -234012107772195153040
c4: 121278479274204
c5: 26142480
Y0: -8362756448659493213350626044999
Y1: 126296599858935253
This happened to come from the 1GB file that had -np1 done separately. There was also a 9.85e-13 from the same flare/coeff.
I am done with these numbers, but willing to help with any C170 or larger searches.
I've started a job using this poly. (Best I could find with some days of local search was ~9.6e-13, so thank you for this find!) ETA is around the first week in November.....

Last fiddled with by schickel on 2013-10-05 at 03:08 Reason: adding note
Old 2013-11-05, 19:11   #33
schickel
Quote:
Originally Posted by schickel
I've started a job using this poly. (Best I could find with some days of local search was ~9.6e-13, so thank you for this find!) ETA is around the first week in November.....
Code:
Tue Nov 05 11:06:15 2013  matrix is 7330605 x 7330831 (2155.5 MB) with weight 554942282 (75.70/col)
Tue Nov 05 11:06:15 2013  sparse part has weight 491749598 (67.08/col)
Tue Nov 05 11:06:15 2013  using block size 65536 for processor cache size 6144 kB
Tue Nov 05 11:06:42 2013  commencing Lanczos iteration (4 threads)
Tue Nov 05 11:06:42 2013  memory use: 1909.6 MB
Tue Nov 05 11:07:40 2013  linear algebra at 0.0%, ETA 75h 3m
Tue Nov 05 11:07:59 2013  checkpointing every 100000 dimensions
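The per-column figures in the log are just total weight divided by the column count; a quick check (assuming msieve divides by the 7330831 columns reported):

```python
# Numbers taken from the Lanczos log above.
cols = 7330831
total_weight = 554942282   # full matrix weight
sparse_weight = 491749598  # sparse part

print(f"{total_weight / cols:.2f}/col")   # 75.70/col, matching the log
print(f"{sparse_weight / cols:.2f}/col")  # 67.08/col, matching the log
```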