#23
Jan 2005
Caught in a sieve
5·79 Posts
Looks to be CPU bound. Can you try:
1. Create a file named tpsieve.txt with only the following line:
Code:
blocksize=1M
3. Try running two instances at once (in different directories; see the sketch below).

Oh, and when one or more of those works, you might even try increasing the threads per multiprocessor again!
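A sketch of the two-instance setup, in case it helps (assuming the binary sits one level up, and "[your usual options]" stands in for whatever arguments you normally pass):
Code:
# one directory per instance, each with its own tpsieve.txt
mkdir -p inst1 inst2
echo "blocksize=1M" > inst1/tpsieve.txt
echo "blocksize=1M" > inst2/tpsieve.txt
# start both from their own directories so they don't clobber each other's files
(cd inst1 && ../tpsieve-cuda [your usual options]) &
(cd inst2 && ../tpsieve-cuda [your usual options]) &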
#24
"Dave"
Sep 2005
UK
2776₁₀ Posts
Increasing blocksize had no significant effect. I then tried running 2 instances on 2 cores, which gave me a combined total of 451M p/sec using 1.82 CPU cores. I didn't experiment with increasing threads per multiprocessor, as the two instances already add up to a combined total of 49,152 (2 × 24,576).
#25
A Sunny Moo
Aug 2007
USA
2·47·67 Posts
I did some benchmarks on a GTX 460 with the 710T-715T test range and various -m values:
Code:
24064: 199M p/sec., .80 CPU usage
24448: 201M p/sec., .81 CPU usage
24576: 201M p/sec., .81 CPU usage
24960: 202M p/sec., .82 CPU usage
25344: 203M p/sec., .82 CPU usage
25728: 205M p/sec., .82 CPU usage
26368: 207M p/sec., .83 CPU usage
27648: 211M p/sec., .85 CPU usage
30208: 221M p/sec., .88 CPU usage
32768: 232M p/sec., .92 CPU usage
35328: 241M p/sec., .95 CPU usage
37888: 242M p/sec., .96 CPU usage
38016: 242M p/sec., .96 CPU usage
38144: 242M p/sec., .96 CPU usage
38272: 243M p/sec., .96 CPU usage
38400: 243M p/sec., .96 CPU usage
38528: 0 p/sec., 1.00 CPU usage (?)
39168: 0 p/sec., 1.00 CPU usage (?)
40448: 0 p/sec., 1.00 CPU usage (?)

- The "sweet spot" seems to be at -m 38400 on this GPU. Would it be correct to assume that the same will hold true for higher p-values? Also, should -m 38400 be optimal for ppsieve as well, or will that be totally different?
- Any -m value of 38528 or greater resulted in 0 p/sec. and 1.00 CPU usage; while it output its progress every minute as usual, it seemed effectively frozen, since Ctrl-C didn't stop it (at least, it didn't respond to Ctrl-C within 10-15 seconds). I ended up having to use a -SIGKILL to stop tpsieve.
- A diff of tpfactors.txt vs. the relevant portion of my original factors for this range shows no discrepancies.
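In case anyone wants to repeat this sweep on another card, it scripts easily (just a sketch; "[your usual options]" stands in for the test-range arguments, and the same loop pattern works for -B or -Q values too):
Code:
# run a short fixed test range once per -m value, logging each run
for m in 24064 24576 26368 32768 38400; do
    echo "== -m $m =="
    ./tpsieve-cuda -m "$m" [your usual options] > "bench-m$m.log" 2>&1
done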
Last fiddled with by mdettweiler on 2010-09-17 at 03:17
#26
Jan 2005
Caught in a sieve
5×79 Posts
Quote:
Other ideas to try:
- Increase blocksize in the config file, as above. (or -B)
- Try setting a qmax (-Q). Perhaps 10e6. That sends a few composites to the GPU, but might speed things up overall?
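Something like this, for instance (just a sketch; I'm assuming -B accepts the same 1M-style value as the blocksize= config line, and the rest of the command line is whatever you normally run):
Code:
./tpsieve-cuda -B 1M -Q 10e6 [your usual options]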
#27
A Sunny Moo
Aug 2007
USA
2×47×67 Posts
Quote:
- blocksize=512k, qmax=sqrt(pmax) (defaults): 233M p/sec., .96 CPU usage
- blocksize=1297k (CPU L2 cache is 8MB), qmax=10e6: 49M p/sec., .97 CPU usage

Yowch! Okay, let's try each of the changes individually:

- blocksize=1297k, qmax=sqrt(pmax): 46M p/sec., .89 CPU usage
- blocksize=512k, qmax=10e6: 279M p/sec., .93 CPU usage

So it seems that blocksize should stay as is, but lowering qmax helps quite a bit. I'll play around with some even lower qmax sizes and post the results in a little while.
#28
A Sunny Moo
Aug 2007
USA
6298₁₀ Posts
Results with various qmax values:
Code:
sqrt(pmax): 233M p/sec., .96 CPU usage
10e7: 230M p/sec., .96 CPU usage
5e7: 231M p/sec., .96 CPU usage
25e6: 243M p/sec., .96 CPU usage
11e6: 277M p/sec., .93 CPU usage
10e6: 279M p/sec., .93 CPU usage
9e6: 278M p/sec., .92 CPU usage
5e6: 273M p/sec., .85 CPU usage
10e5: 238M p/sec., .73 CPU usage

To summarize for the benefit of anyone else wanting to run a GTX 460 on the TPS variable-n sieve: the optimal settings for this GPU seem to be -m 38400, -Q 10e6.
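For reference, that works out to a command line along these lines (a sketch; everything besides -m and -Q is whatever you'd normally pass):
Code:
./tpsieve-cuda -m 38400 -Q 10e6 [your usual options]

Last fiddled with by mdettweiler on 2010-09-17 at 05:06 Reason: typo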
#29
May 2010
499 Posts
Quote:
#30
A Sunny Moo
Aug 2007
USA
189A₁₆ Posts
Quote:
Code:
600k: 257M p/sec., .85 CPU usage
512k: 279M p/sec., .93 CPU usage
450k: 273M p/sec., .94 CPU usage
384k: 251M p/sec., .90 CPU usage
256k: 247M p/sec., .94 CPU usage
#31
Jan 2005
Caught in a sieve
395₁₀ Posts
One more thing you can try: running two processes at once. Today I was surprised to see that a member of my TeAm posted that running two SETI instances at once on a 460 is faster than one. So maybe it'll work for TPSieve too?
By the way, that 10e6 figure was the default Geoff set in the original TPSieve. So that's probably why it's so spot-on.
#32
A Sunny Moo
Aug 2007
USA
2·47·67 Posts
Quote:
BTW, I noticed something a little odd at the end of the 1010T-1015T range I did last night. Instead of the usual bevy of timing information, the last minute of the program's output looked like this:
Code:
p=1014987735506945, 279.3M p/sec, 0.93 CPU cores, 99.8% done. ETA 17 Sep 04:11
1014987789919531 | 9144435*2^480245+1
1014988064291101 | 2077017*2^481584+1
1014990041606131 | 6128685*2^484245-1
1014990936303989 | 6899217*2^480650-1
1014992033887231 | 8498901*2^484297+1
1014994988986481 | 5370849*2^483115-1
1014996004433129 | 3063291*2^482229-1
1014997040516873 | 7940673*2^480424-1
1014998718470129 | 7147809*2^480033+1
Found 5192 factors
#33
Jan 2005
Caught in a sieve
5·79 Posts
Quote:
Try a 2>&1 before your tee.
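For example (a sketch; 2>&1 merges stderr into stdout so that tee captures both streams in one log):
Code:
./tpsieve-cuda [your usual options] 2>&1 | tee tpsieve.log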