mersenneforum.org  

Old 2010-09-12, 15:50   #23
Ken_g6
 
Jan 2005
Caught in a sieve

5·79 Posts

Looks to be CPU bound. Can you try:

1. Create a file named tpsieve.txt with only the following line:
Code:
blocksize=1M
2. If that doesn't help enough, keep doubling it, up to 8M.
3. Try running two instances at once (in different directories).

Oh, and when one or more of those works, you might even try increasing the threads per multiprocessor again!
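
A minimal sketch of steps 1 to 3, assuming tpsieve reads tpsieve.txt from its working directory (which would explain the advice to use different directories for concurrent instances). Only the file name and the blocksize= line come from the post; the helper write_config and the directory names are hypothetical.

```python
import tempfile
from pathlib import Path

def write_config(directory, blocksize):
    """Write a one-line tpsieve.txt (file name and key as given in the post)."""
    path = Path(directory) / "tpsieve.txt"
    path.write_text(f"blocksize={blocksize}\n")
    return path

# Step 2: the doubling sequence 1M, 2M, 4M, 8M.
blocksizes = [f"{2 ** i}M" for i in range(4)]

# Step 3: one directory per instance (placeholder names under a temp dir).
root = Path(tempfile.mkdtemp())
configs = []
for name in ("instance1", "instance2"):
    instance_dir = root / name
    instance_dir.mkdir()
    configs.append(write_config(instance_dir, blocksizes[0]))

print(blocksizes)
for cfg in configs:
    print(cfg, "->", cfg.read_text().strip())
```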
Old 2010-09-14, 18:29   #24
amphoria
 
"Dave"
Sep 2005
UK

AD8₁₆ Posts

Increasing blocksize had no significant effect. I then tried running 2 instances on 2 cores. This gave me a combined total of 451M p/sec using 1.82 CPU cores. I didn't experiment with increasing threads per multiprocessor as this would already be a combined total of 49,152.
Old 2010-09-17, 03:15   #25
mdettweiler
A Sunny Moo
 
Aug 2007
USA

2·47·67 Posts

I did some benchmarks on a GTX 460 with the 710T-715T test range and various -m values:
Code:
24064: 199M p/sec., .80 CPU usage
24448: 201M p/sec., .81 CPU usage
24576: 201M p/sec., .81 CPU usage
24960: 202M p/sec., .82 CPU usage
25344: 203M p/sec., .82 CPU usage
25728: 205M p/sec., .82 CPU usage
26368: 207M p/sec., .83 CPU usage
27648: 211M p/sec., .85 CPU usage
30208: 221M p/sec., .88 CPU usage
32768: 232M p/sec., .92 CPU usage
35328: 241M p/sec., .95 CPU usage
37888: 242M p/sec., .96 CPU usage
38016: 242M p/sec., .96 CPU usage
38144: 242M p/sec., .96 CPU usage
38272: 243M p/sec., .96 CPU usage
38400: 243M p/sec., .96 CPU usage
38528: 0 p/sec., 1.00 CPU usage (?)
39168: 0 p/sec., 1.00 CPU usage (?)
40448: 0 p/sec., 1.00 CPU usage (?)
Observations:

-The "sweet spot" seems to be at -m 38400 on this GPU. Would it be correct to assume that the same will hold true for higher p-values? Also, should -m 38400 be optimal for ppsieve as well, or will that be totally different?

-Any -m value of 38528 or greater resulted in 0 p/sec. and 1.00 CPU usage. It still printed its progress every minute as usual, but it was effectively frozen: it didn't respond to Ctrl-C within 10-15 seconds, and I ended up having to send SIGKILL to stop tpsieve.

-A diff of tpfactors.txt vs. the relevant portion of my original factors for this range shows no discrepancies.
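
For anyone who wants to pick the sweet spot programmatically, here is a rough sketch that just re-reads the table above. The numbers are copied from the benchmark; the variable names are illustrative and nothing here calls tpsieve itself.

```python
# (-m value, million p/sec, CPU cores used), copied from the benchmark table.
runs = [
    (24064, 199, 0.80), (24448, 201, 0.81), (24576, 201, 0.81),
    (24960, 202, 0.82), (25344, 203, 0.82), (25728, 205, 0.82),
    (26368, 207, 0.83), (27648, 211, 0.85), (30208, 221, 0.88),
    (32768, 232, 0.92), (35328, 241, 0.95), (37888, 242, 0.96),
    (38016, 242, 0.96), (38144, 242, 0.96), (38272, 243, 0.96),
    (38400, 243, 0.96), (38528, 0, 1.00), (39168, 0, 1.00),
    (40448, 0, 1.00),
]

# Ignore the runs that stalled at 0 p/sec.
working = [run for run in runs if run[1] > 0]
best_rate = max(rate for _, rate, _ in working)

# Several -m values can tie at the reported precision, so keep all of them.
best_m = [m for m, rate, _ in working if rate == best_rate]
stalled = [m for m, rate, _ in runs if rate == 0]

print(f"best rate: {best_rate}M p/sec at -m {best_m}")
print(f"stalled at -m {stalled}")
```

At the precision reported, 38272 and 38400 tie, so the table alone does not separate them.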

Last fiddled with by mdettweiler on 2010-09-17 at 03:17
Old 2010-09-17, 04:18   #26
Ken_g6
 
Jan 2005
Caught in a sieve

5·79 Posts

Quote:
Originally Posted by mdettweiler
Would it be correct to assume that the same will hold true for higher p-values? Also, should -m 38400 be optimal for ppsieve as well, or will that be totally different?
Probably, and definitely not. Actually, the necessity of this -m is more related to this particular K and N range (N being sort of too small and K sort of too big). The best -m on the PrimeGrid PPSE range seems to be 2048 for this card. I wonder in this case whether the large -m is more related to being CPU-bound by the small prime sieve? By the way, I have no idea what would cause that freezing above -m 38400; but I never considered a -m that high when I added the option.

Other ideas to try:
- Increase blocksize in the config file, as above. (or -B)
- Try setting a qmax. (-Q) Perhaps 10e6. That sends a few composites to the GPU, but might speed things up overall?
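
To illustrate the qmax trade-off in general terms: if the CPU only sieves with primes up to qmax, every composite whose prime factors all exceed qmax slips through alongside the real primes. The toy sketch below shows that effect on a small range. It is not tpsieve's code; the function names and the range are made up for the example.

```python
import math

def small_primes(limit):
    """Primes <= limit via a plain sieve of Eratosthenes (limit >= 2)."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(limit) + 1):
        if flags[i]:
            flags[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i, flag in enumerate(flags) if flag]

def survivors(lo, hi, qmax):
    """Numbers in [lo, hi) with no prime factor <= qmax (other than themselves)."""
    alive = bytearray([1]) * (hi - lo)
    for q in small_primes(qmax):
        # First multiple of q inside the range, never q itself.
        first = max(q * q, -(-lo // q) * q)
        for n in range(first, hi, q):
            alive[n - lo] = 0
    return [lo + i for i, flag in enumerate(alive) if flag]

lo, hi = 1_000_000, 1_010_000
exact = survivors(lo, hi, math.isqrt(hi - 1))  # full sieve: primes only
loose = survivors(lo, hi, 100)                 # reduced qmax: some composites pass

print(len(exact), "primes;", len(loose) - len(exact), "extra composites with qmax=100")
```

The smaller qmax does less sieving work per block but hands more candidates to the next stage, which is the balance the -Q option adjusts.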
Old 2010-09-17, 04:36   #27
mdettweiler
A Sunny Moo
 
Aug 2007
USA

2×47×67 Posts

Quote:
Originally Posted by Ken_g6
Probably, and definitely not. Actually, the necessity of this -m is more related to this particular K and N range (N being sort of too small and K sort of too big). The best -m on the PrimeGrid PPSE range seems to be 2048 for this card. I wonder in this case whether the large -m is more related to being CPU-bound by the small prime sieve? By the way, I have no idea what would cause that freezing above -m 38400; but I never considered a -m that high when I added the option.
Interesting. I'll keep that in mind when I try out ppsieve on various NPLB and CRUS sieves (which are all over the spectrum in terms of k/n ratio).
Quote:
Other ideas to try:
- Increase blocksize in the config file, as above. (or -B)
- Try setting a qmax. (-Q) Perhaps 10e6. That sends a few composites to the GPU, but might speed things up overall?
Okay, I'll try that on the 1010T-1015T "live" range the GPU is working on now:

-blocksize=512k, qmax=sqrt(pmax) (defaults): 233M p/sec., .96 CPU usage
-blocksize=1297k (CPU L2 cache is 8MB), qmax=10e6: 49M p/sec., .97 CPU usage

Yowch! Okay, let's try each of the changes individually:

-blocksize=1297k, qmax=sqrt(pmax): 46M p/sec., .89 CPU usage
-blocksize=512k, qmax=10e6: 279M p/sec., .93 CPU usage

So it seems that blocksize should stay as is, but lowering qmax helps quite a bit. I'll play around with some even lower qmax sizes and post the results in a little while.
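
As a quick cross-check of the four runs, here is a small sketch that expresses each one relative to the default configuration. The figures are the ones listed above; the dictionary layout is just for illustration.

```python
# Throughput in million p/sec on the 1010T-1015T range, keyed by (blocksize, qmax).
results = {
    ("512k", "sqrt(pmax)"): 233,   # defaults
    ("1297k", "10e6"): 49,         # both changes together
    ("1297k", "sqrt(pmax)"): 46,   # blocksize change only
    ("512k", "10e6"): 279,         # qmax change only
}

baseline = results[("512k", "sqrt(pmax)")]
relative = {key: rate / baseline for key, rate in results.items()}

for (blocksize, qmax), ratio in relative.items():
    print(f"blocksize={blocksize:>5}  qmax={qmax:<10}  {ratio:.2f}x baseline")
```

Changing only qmax gives roughly a 20% gain, while the larger blocksize cuts throughput to about a fifth of the baseline with either qmax.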
Old 2010-09-17, 05:05   #28
mdettweiler
A Sunny Moo
 
Aug 2007
USA

2×47×67 Posts

Results with various qmax values:
Code:
sqrt(pmax): 233M p/sec., .96 CPU usage
10e7: 230M p/sec., .96 CPU usage
5e7: 231M p/sec., .96 CPU usage
25e6: 243M p/sec., .96 CPU usage
11e6: 277M p/sec., .93 CPU usage
10e6: 279M p/sec., .93 CPU usage
9e6: 278M p/sec., .92 CPU usage
5e6: 273M p/sec., .85 CPU usage
10e5: 238M p/sec., .73 CPU usage
Wouldn't you know, the 10e6 estimate for qmax was spot-on.

To summarize for the benefit of anyone else wanting to run a GTX 460 on the TPS variable-n sieve: the optimal settings for this GPU seem to be -m 38400, -Q 10e6.
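
A note on the notation, with a hedged sketch: read as standard scientific notation, 10e6 is 10 × 10^6 (ten million), not one million, and that reading matches the descending order of the table. Whether tpsieve parses -Q exactly this way is an assumption here, as is taking T to mean 10^12 when estimating the default sqrt(pmax).

```python
import math

# qmax labels and throughput (million p/sec), copied from the table above.
sweep = [("10e7", 230), ("5e7", 231), ("25e6", 243), ("11e6", 277),
         ("10e6", 279), ("9e6", 278), ("5e6", 273), ("10e5", 238)]

# Standard scientific notation: "10e6" means 10 * 10**6.
parsed = [(float(label), rate) for label, rate in sweep]
best_qmax, best_rate = max(parsed, key=lambda pair: pair[1])

# Default qmax is sqrt(pmax); pmax is taken as 1015T = 1015e12 for this range.
default_qmax = math.sqrt(1015e12)

print(f"best qmax = {best_qmax:,.0f} at {best_rate}M p/sec")
print(f"default sqrt(pmax) is about {default_qmax:,.0f}")
```

Under those assumptions the default works out to about 31.9 million, which sits between the 25e6 and 5e7 rows, in line with its 233M p/sec result.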

Last fiddled with by mdettweiler on 2010-09-17 at 05:06 Reason: typo
Old 2010-09-17, 05:32   #29
Oddball
 
May 2010

1F3₁₆ Posts

Quote:
Originally Posted by mdettweiler
-blocksize=512k, qmax=sqrt(pmax) (defaults): 233M p/sec., .96 CPU usage
-blocksize=1297k (CPU L2 cache is 8MB), qmax=10e6: 49M p/sec., .97 CPU usage

Yowch! Okay, let's try each of the changes individually:

-blocksize=1297k, qmax=sqrt(pmax): 46M p/sec., .89 CPU usage
-blocksize=512k, qmax=10e6: 279M p/sec., .93 CPU usage

So it seems that blocksize should stay as is
Would reducing blocksize give you better results? Just wondering.
Old 2010-09-17, 05:47   #30
mdettweiler
A Sunny Moo
 
Aug 2007
USA

2×47×67 Posts

Quote:
Originally Posted by Oddball
Would reducing blocksize give you better results? Just wondering.
It doesn't seem so:
Code:
600k: 257M p/sec., .85 CPU usage
512k: 279M p/sec., .93 CPU usage
450k: 273M p/sec., .94 CPU usage
384k: 251M p/sec., .90 CPU usage
256k: 247M p/sec., .94 CPU usage
I also threw in a blocksize=600k run to see how that would work (just in case the maximum was somewhere between the previously tried 512k and 1297k), but it seems again that the original figure of 512k is spot-on.
Old 2010-09-17, 16:11   #31
Ken_g6
 
Jan 2005
Caught in a sieve

5×79 Posts

One more thing you can try: running two processes at once. Today I was surprised to see that a member of my TeAm posted that running two SETI instances at once on a 460 is faster than one. So maybe it'll work for TPSieve too?

By the way, that 10e6 figure was the default Geoff set in the original TPSieve. So that's probably why it's so spot-on.
Old 2010-09-17, 16:23   #32
mdettweiler
A Sunny Moo
 
Aug 2007
USA

2×47×67 Posts

Quote:
Originally Posted by Ken_g6
One more thing you can try: running two processes at once. Today I was surprised to see that a member of my TeAm posted that running two SETI instances at once on a 460 is faster than one. So maybe it'll work for TPSieve too?

By the way, that 10e6 figure was the default Geoff set in the original TPSieve. So that's probably why it's so spot-on.
Okay, I'll give it a try with two instances.

BTW, I noticed something a little odd at the end of the 1010T-1015T range I did last night. Instead of the usual bevy of timing information, the last minute of the program's output looked like this:
Code:
p=1014987735506945, 279.3M p/sec, 0.93 CPU cores, 99.8% done. ETA 17 Sep 04:11  
1014987789919531 | 9144435*2^480245+1
1014988064291101 | 2077017*2^481584+1
1014990041606131 | 6128685*2^484245-1
1014990936303989 | 6899217*2^480650-1
1014992033887231 | 8498901*2^484297+1
1014994988986481 | 5370849*2^483115-1
1014996004433129 | 3063291*2^482229-1
1014997040516873 | 7940673*2^480424-1
1014998718470129 | 7147809*2^480033+1
Found 5192 factors
The CPU version of tpsieve always outputs the full timing information when it stops whether by Ctrl-C or at the end of the range. tpsieve-CUDA outputs that information when I hit Ctrl-C, but apparently not at the end of a range. Is this how it's written in the code, or did the last few lines get eaten by the tee command I was piping this into?
Old 2010-09-17, 16:27   #33
Ken_g6
 
Jan 2005
Caught in a sieve

5×79 Posts

Quote:
Originally Posted by mdettweiler
or did the last few lines get eaten by the tee command I was piping this into?
That's probably the cause. Some output goes to stderr (so that BOINC will save it, and/or because I'm lazy and haven't sorted out what shouldn't go to stderr). Try a 2>&1 before your tee.
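
For illustration, the same effect can be reproduced from Python: a pipe only carries stdout unless stderr is explicitly merged into it, which is what 2>&1 does in the shell. The child process below is a stand-in that writes one line to each stream; the strings are placeholders, not actual tpsieve output.

```python
import subprocess
import sys

# Stand-in child process: one line on stdout, one on stderr.
child = [
    sys.executable, "-c",
    "import sys; print('progress line'); print('timing summary', file=sys.stderr)",
]

# Pipe stdout only, like `command | tee log`. stderr is discarded here;
# in a real shell it would reach the terminal but bypass tee.
stdout_only = subprocess.run(
    child, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, text=True
).stdout

# Merge stderr into stdout, like `command 2>&1 | tee log`.
merged = subprocess.run(
    child, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
).stdout

print("stdout only:", stdout_only.split())
print("merged:     ", merged.split())
```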