2015-12-08, 00:23   #12
Gordon

Quote:
Originally Posted by kladner
Unlike one of my 580s, which is currently factoring at a pathetic 505 GHz-days/day, running at 76 °C at the moment.
...and you get to burn 68% more electricity (244 W vs 145 W) for the privilege of getting 20% more throughput
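
For what it's worth, a quick back-of-the-envelope comparison using only the figures quoted in this exchange (505 vs 420 GHz-days/day, 244 W vs 145 W); a minimal Python sketch:

Code:
# Per-watt comparison from the figures quoted above.
cards = {
    "GTX 580 (overclocked)": (505, 244),   # GHz-days/day, watts
    "GTX 970":               (420, 145),
}

for name, (ghzdays_per_day, watts) in cards.items():
    print(f"{name}: {ghzdays_per_day / watts:.2f} GHz-days/day per watt")

print(f"throughput advantage of the 580: {505 / 420 - 1:.0%}")   # ~20%
print(f"extra power drawn by the 580:    {244 / 145 - 1:.0%}")   # ~68%

By that measure the 970 delivers roughly 2.9 GHz-days/day per watt against about 2.1 for the overclocked 580.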
2015-12-08, 00:43   #13
kladner

Quote:
Originally Posted by Gordon
...and you get to burn 68% more electricity (244 W vs 145 W) for the privilege of getting 20% more throughput
Too true. Actually, it's more than that: the 580 is overclocked. Unfortunately, my Kill A Watt bit the dust, so I can't say just how much.
2015-12-08, 08:50   #14
fivemack

Quote:
Originally Posted by Gordon
Maybe, but it's superb at factoring - 420 GHz-days/day
The GTX970 is also very decent at ECM (16 hours for 416 curves at 4e8 vs 24 hours for 512 curves at 4e8 on GTX580), and not much slower at GNFS polynomial search.
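
Put on a common footing, that works out as follows (a small sketch using only the timings quoted above; 4e8 is presumably the B1 bound):

Code:
# Curves per hour from the ECM timings quoted above.
gtx970 = 416 / 16   # 26.0 curves/hour
gtx580 = 512 / 24   # ~21.3 curves/hour
print(f"GTX 970: {gtx970:.1f} curves/h")
print(f"GTX 580: {gtx580:.1f} curves/h")
print(f"ratio:   {gtx970 / gtx580:.2f}x in the 970's favour")   # ~1.22x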
2015-12-08, 16:23   #15
Gordon

Quote:
Originally Posted by fivemack
The GTX970 is also very decent at ECM (16 hours for 416 curves at 4e8 vs 24 hours for 512 curves at 4e8 on GTX580), and not much slower at GNFS polynomial search.
As long as the exponent is small enough - i.e. the number is less than 2^1018?
2015-12-08, 18:26   #16
fivemack

Quote:
Originally Posted by Gordon
As long as the exponent is small enough - i.e. the number is less than 2^1018?
Yes, of course; I'm using it for numbers that I plan to GNFS afterwards, so that's not really a restriction for my use case :)
2017-12-29, 16:37   #17
kriesel

Quote:
Originally Posted by fivemack
I would be readier to replace my GTX580 with a GTX970 if the GTX970 ran cudalucas at all - all the intermediate residues come out as zeroes when I run cudalucas -d 1.
How far have you investigated or tuned the setup of the 970 in CUDALucas?

Use the May 5, 2016 beta of CUDALucas, which catches these and other errors.

If you haven't already, check what thread count is being used; some card types and thread counts don't mix. (Check the ini file, the threads file, and any command-line parameters.)

Various errors will produce a repeating residue series, with the reported residue stuck at zero, 0xfffffffffffffffd, or two (0x0000000000000002). A real Mersenne prime has a residue series in which only the last residue is zero. These errors sometimes also lead to quicker-than-expected iteration times. Such runs should be abandoned as early as possible and the issue resolved. An erroneous run may look something like this:

Using threads: square 1024, splice 128.
Starting M80381387 fft length = 4320K
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Mar 27 21:50:33 | M80381387 10000 0xfffffffffffffffd | 4320K 0.11852 0.6739 6.73s | 15:02:43 0.01% |
| Mar 27 21:50:40 | M80381387 20000 0xfffffffffffffffd | 4320K 0.10074 0.6607 6.60s | 14:53:48 0.02% |

whereas the expected ms/iter value for this FFT length is much larger. The fft.txt file for this GPU contained the following record, indicating an expected iteration time of about 8.8 ms, not 0.7 ms:
4374 80879779 8.8042
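
As a quick arithmetic check of why that discrepancy matters (reading the record above as fft length, maximum exponent, and ms per iteration is my interpretation of fft.txt):

Code:
# An LL test of M80381387 needs roughly p - 2 squarings, i.e. about 80.4
# million iterations.  Compare the expected and observed per-iteration times.
p = 80_381_387
expected_ms = 8.8042   # from the fft.txt record above
observed_ms = 0.6739   # from the erroneous run above

print(f"expected full test: ~{p * expected_ms / 1000 / 86400:.1f} days")   # ~8.2 days
print(f"erroneous-run ETA:  ~{p * observed_ms / 1000 / 3600:.0f} hours")   # ~15 h
print(f"discrepancy:        ~{expected_ms / observed_ms:.0f}x too fast")   # ~13x

The roughly 15-hour ETA shown in the erroneous log above is exactly what a 0.67 ms iteration time predicts, which is another way to spot the problem early.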

Relaunching after correcting the problem, but without deleting the bad checkpoint files, produces bad residues from that point on: garbage in, garbage out, now at the expected speed.

Using threads: square 256, splice 128.
Continuing M80381387 @ iteration 246202 with fft length 4320K, 0.31% done
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Mar 28 00:22:29 | M80381387 250000 0x0000000000000002 | 4320K 0.00005 8.2352 31.27s | 17:22:30 0.31% |
| Mar 28 00:23:51 | M80381387 260000 0x0000000000000002 | 4320K 0.00005 8.2653 82.65s | 23:46:47 0.32% |
| Mar 28 00:25:14 | M80381387 270000 0x0000000000000002 | 4320K 0.00005 8.2623 82.62s | 1:05:42:21 0.33% |

Normal function output for the same GPU looks like this:

| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Mar 05 01:30:03 | M78157153 15410000 0xe5e7a3b4c1deab80 | 4320K 0.14337 8.2636 82.62s | 6:00:08:24 19.71% |
| Mar 05 01:31:26 | M78157153 15420000 0x32f8bddda17ba94e | 4320K 0.14230 8.2600 82.60s | 6:00:07:01 19.72% |
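
A hypothetical helper along these lines (not a CUDALucas feature; it merely scans console output in the format of the samples above for the symptoms just described, and the file name and 50% threshold are arbitrary):

Code:
import re

# Interim residues that, when repeated, mark a broken run (the values listed above).
BAD_RESIDUES = {"0x0000000000000000", "0xfffffffffffffffd", "0x0000000000000002"}

# Matches interim-report lines in the format of the samples above, e.g.
# | Mar 28 00:22:29 | M80381387 250000 0x0000000000000002 | 4320K 0.00005 8.2352 31.27s | ...
LINE = re.compile(r"\|\s*(M\d+)\s+(\d+)\s+(0x[0-9a-fA-F]{16})\s*\|\s*\S+\s+[\d.]+\s+([\d.]+)")

def check_log(lines, expected_ms_per_iter):
    """Warn about repeated known-bad residues and implausibly fast iterations."""
    prev_residue = None
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue
        test, iteration, residue, ms_per_iter = m.groups()
        if residue in BAD_RESIDUES and residue == prev_residue:
            print(f"{test} @ {iteration}: repeated bad residue {residue} - abandon the run")
        if float(ms_per_iter) < 0.5 * expected_ms_per_iter:
            print(f"{test} @ {iteration}: {ms_per_iter} ms/iter, far below the expected "
                  f"{expected_ms_per_iter} - check the thread settings")
        prev_residue = residue

# e.g. check_log(open("cudalucas_console.txt"), expected_ms_per_iter=8.8042)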

Some cards seem to produce bad residues when 1024 squaring threads are used, and work correctly otherwise. A symptom of that case, visible in thread-benchmarking console output, is a sharp discontinuity in per-iteration timings:

fft = 4320K, ave time = 1.2804 ms, square: 32, splice: 128
fft = 4320K, ave time = 1.2819 ms, square: 64, splice: 128
fft = 4320K, ave time = 1.4082 ms, square: 128, splice: 128
fft = 4320K, ave time = 1.4996 ms, square: 256, splice: 128
fft = 4320K, ave time = 1.5713 ms, square: 512, splice: 128
fft = 4320K, ave time = 0.6497 ms, square: 1024, splice: 128
fft = 4320K, ave time = 0.6513 ms, square: 1024, splice: 32
fft = 4320K, ave time = 0.6498 ms, square: 1024, splice: 64
fft = 4320K, ave time = 0.6498 ms, square: 1024, splice: 128
fft = 4320K, ave time = 0.6500 ms, square: 1024, splice: 256
fft = 4320K, ave time = 0.6498 ms, square: 1024, splice: 512
fft = 4320K, ave time = 0.6533 ms, square: 1024, splice: 1024
fft = 4320K, min time = 0.6498 ms, square: 1024, splice: 64

Such a card should not be benchmarked for, or used with, 1024 squaring threads: it will produce incorrect residues, and benchmarking will write an incorrect threads file. See the m parameter of -threadbench for excluding 1024-thread settings from benchmarking. A similar issue exists with 32 threads on some cards.
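
A sketch of the timing-discontinuity check, again hypothetical: it parses -threadbench output in the format shown above and flags any squaring-thread count whose best time is far below the others (the 60% threshold is arbitrary, chosen for illustration):

Code:
import re
from statistics import median

# Matches -threadbench lines like:
#   fft = 4320K, ave time = 1.4996 ms, square: 256, splice: 128
BENCH = re.compile(r"ave time = ([\d.]+) ms, square:\s*(\d+), splice:\s*(\d+)")

def suspicious_square_counts(lines, factor=0.6):
    """Return squaring-thread counts whose best timing is implausibly fast."""
    best = {}   # squaring-thread count -> fastest observed ms/iteration
    for line in lines:
        m = BENCH.search(line)
        if m:
            ms, square = float(m.group(1)), int(m.group(2))
            best[square] = min(ms, best.get(square, ms))
    if len(best) < 2:
        return []
    typical = median(best.values())
    return [sq for sq, ms in best.items() if ms < factor * typical]

# With the timings quoted above, only square = 1024 (~0.65 ms versus
# ~1.3-1.6 ms for every other setting) would be flagged.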
2018-02-02, 17:18   #18
kriesel

gtx970 and zeroes

Quote:
Originally Posted by fivemack
I would be readier to replace my GTX580 with a GTX970 if the GTX970 ran cudalucas at all - all the intermediate residues come out as zeroes when I run cudalucas -d 1.
Is the GTX970 known for that? I don't recall that. Are you running CUDA 9? See http://www.mersenneforum.org/showpos...2&postcount=42
If so, try an older driver if you can; that would put it back at CUDA 8 (or earlier).

Last fiddled with by kriesel on 2018-02-02 at 17:18