mersenneforum.org  

Old 2016-05-03, 03:03   #408
bverka86
 
May 2011

1000₂ Posts
Default

Quote:
Originally Posted by bverka86 View Post
So for example, if 1536K wasn't good enough for M35861153, 2048K performs at least 3x better than 1728K. I would not mind digging into this algorithm, but I've been sick for quite some time and haven't dug in too deeply. Is the entire FFT length selection done in clLucas.cpp?
*****
For a precise example, using M38542223:
Auto-Select:
Iteration 300000 M( 38542223 )C, 0xfd1aedd779d62f55, n = 2240K, clLucas v1.04 err = 0.0308 (33:27 real, 20.0766 ms/iter, ETA 213:02:07)
Manual Select:
Iteration 300000 M( 38542223 )C, 0xfd1aedd779d62f55, n = 2560K, clLucas v1.04 err = 0.0019 (9:39 real, 5.7906 ms/iter, ETA 61:26:39)
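
To put the two runs side by side, here is a minimal Python sketch. It assumes "K" means 1024 FFT elements, and `bits_per_word` is a hypothetical helper name (not a clLucas function); the numbers are copied from the two log lines above.

```python
# Figures copied from the two log lines above; "K" is assumed to mean 1024.
EXPONENT = 38542223

runs = {
    "auto (2240K)": {"fft_k": 2240, "ms_per_iter": 20.0766},
    "manual (2560K)": {"fft_k": 2560, "ms_per_iter": 5.7906},
}


def bits_per_word(exponent: int, fft_k: int) -> float:
    """Average number of residue bits packed into each FFT element."""
    return exponent / (fft_k * 1024)


for name, run in runs.items():
    density = bits_per_word(EXPONENT, run["fft_k"])
    print(f"{name}: {density:.2f} bits/word, {run['ms_per_iter']} ms/iter")

# Ratio of per-iteration times: how much faster the manual choice runs.
speedup = runs["auto (2240K)"]["ms_per_iter"] / runs["manual (2560K)"]["ms_per_iter"]
print(f"manual selection is about {speedup:.2f}x faster per iteration")
```

The larger length packs fewer bits into each element, which fits the lower err value in the second log line, and here it is also the faster of the two.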
Old 2016-05-03, 03:12   #409
axn
 
 
Jun 2003

2³·683 Posts
Default

Quote:
Originally Posted by bverka86 View Post
*****
For a precise example, using M38542223:
Auto-Select:
Iteration 300000 M( 38542223 )C, 0xfd1aedd779d62f55, n = 2240K, clLucas v1.04 err = 0.0308 (33:27 real, 20.0766 ms/iter, ETA 213:02:07)
Manual Select:
Iteration 300000 M( 38542223 )C, 0xfd1aedd779d62f55, n = 2560K, clLucas v1.04 err = 0.0019 (9:39 real, 5.7906 ms/iter, ETA 61:26:39)
Could you also try the following FFTs and see if any of them are faster than 2560?
2187
2304
2401
2500
2592

Last fiddled with by axn on 2016-05-03 at 03:15
Old 2016-05-03, 08:31   #410
bverka86
 
May 2011

1000₂ Posts
Default

Quote:
Originally Posted by axn View Post
Could you also try the following FFTs and see if any of them are faster than 2560?
2187
2304
2401
2500
2592
I've got to reboot the box running that number anyway (SuperMicro's built-in PWM is terribly annoying; I'm switching back to my home-grown PWM program). I'll back up my current checkpoints and run those to 30K iterations (assuming each FFT listed works under 1.04).

**Update - Results of requested FFTs**
2187K:
Starting M38542223 fft length = 2187K
FFT length error.

2304K (thanks for the tip!):
Iteration 30000 M( 38542223 )C, 0xdbd7530295df2924, n = 2304K, clLucas v1.04 err = 0.0175 (0:52 real, 5.1583 ms/iter, ETA 55:10:47)

2401K:
Starting M38542223 fft length = 2401K
FFT length error.

2500K (still faster than 2560K, but not as good as 2304):
Iteration 30000 M( 38542223 )C, 0xdbd7530295df2924, n = 2500K, clLucas v1.04 err = 0.0028 (0:57 real, 5.6499 ms/iter, ETA 60:26:18)

2592K (faster than 2560K, but only slightly):
Iteration 30000 M( 38542223 )C, 0xdbd7530295df2924, n = 2592K, clLucas v1.04 err = 0.0016 (0:57 real, 5.7399 ms/iter, ETA 61:24:04)
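
The three successful runs can be compared mechanically. The sketch below parses the log lines with a regular expression written only for the line shape shown in this post (an assumption, not an official clLucas log format), checks that every FFT length produced the same residue at iteration 30000, and ranks the lengths by ms/iter.

```python
import re

# The three successful log lines from this post, copied verbatim.
LOG = """\
Iteration 30000 M( 38542223 )C, 0xdbd7530295df2924, n = 2304K, clLucas v1.04 err = 0.0175 (0:52 real, 5.1583 ms/iter, ETA 55:10:47)
Iteration 30000 M( 38542223 )C, 0xdbd7530295df2924, n = 2500K, clLucas v1.04 err = 0.0028 (0:57 real, 5.6499 ms/iter, ETA 60:26:18)
Iteration 30000 M( 38542223 )C, 0xdbd7530295df2924, n = 2592K, clLucas v1.04 err = 0.0016 (0:57 real, 5.7399 ms/iter, ETA 61:24:04)
"""

# Pattern inferred from the lines above; treat it as an assumption.
LINE_RE = re.compile(
    r"Iteration (?P<iteration>\d+) M\( (?P<exponent>\d+) \)C, "
    r"0x(?P<residue>[0-9a-f]{16}), n = (?P<fft_k>\d+)K, "
    r".*? err = (?P<err>[\d.]+) "
    r"\(.*?, (?P<ms>[\d.]+) ms/iter, ETA (?P<eta>[\d:]+)\)"
)


def parse_log(text):
    """Return one dict per matching log line."""
    rows = []
    for m in LINE_RE.finditer(text):
        rows.append({
            "iteration": int(m["iteration"]),
            "residue": m["residue"],
            "fft_k": int(m["fft_k"]),
            "err": float(m["err"]),
            "ms": float(m["ms"]),
        })
    return rows


rows = parse_log(LOG)

# Different FFT lengths must agree on the residue at the same iteration.
residues = {r["residue"] for r in rows}
print("residues agree:", len(residues) == 1)

# Fastest length first.
ranked = sorted(rows, key=lambda r: r["ms"])
for r in ranked:
    print(f"{r['fft_k']}K: {r['ms']} ms/iter, err = {r['err']}")
```

Matching residues are a useful sanity check before trusting a timing: a faster length that produced a different residue would not be a valid alternative.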

clFFT-2.12.0 (if it matters, I have the sources to build 2.10.1 & 2.10.2 if you think those would show further improvement)

Also, since 2187K and 2401K are odd multiples of K, I tried 2176K, 2402K and 2410K (2048+128, 2304+98 and 2304+106; I could see any of them as a reasonable typo), but those returned an error of a different nature:
Starting M38542223 fft length = 2176K
OPENCL_V_THROWERROR< CLFFT_NOTIMPLEMENTED > (1753): Failed to clfftCreateDefaultPlan.
terminate called after throwing an instance of 'std::runtime_error'
what(): OPENCL_V_THROWERROR< CLFFT_NOTIMPLEMENTED > (1753): Failed to clfftCreateDefaultPlan.
Aborted

Starting M38542223 fft length = 2402K
OPENCL_V_THROWERROR< CLFFT_NOTIMPLEMENTED > (1753): Failed to clfftCreateDefaultPlan.
terminate called after throwing an instance of 'std::runtime_error'
what(): OPENCL_V_THROWERROR< CLFFT_NOTIMPLEMENTED > (1753): Failed to clfftCreateDefaultPlan.
Aborted

Starting M38542223 fft length = 2410K
OPENCL_V_THROWERROR< CLFFT_NOTIMPLEMENTED > (1753): Failed to clfftCreateDefaultPlan.
terminate called after throwing an instance of 'std::runtime_error'
what(): OPENCL_V_THROWERROR< CLFFT_NOTIMPLEMENTED > (1753): Failed to clfftCreateDefaultPlan.
Aborted

Last fiddled with by bverka86 on 2016-05-03 at 09:28 Reason: Added Results
Old 2016-05-03, 09:43   #411
axn
 
 
Jun 2003

2³×683 Posts
Default

Quote:
Originally Posted by bverka86 View Post
Also, given 2187K & 2401K weren't even in terms of K, I tried 2176K & 2402K (2048+128 & 2304+92, could see it as a reasonable typo, couldn't figure an alternate for 2401K), but that returned an error of a different nature:
They weren't typos, but I wasn't sure they'd work in clLucas. Basically, I was looking for all FFTs between 2048 and 4096 that are of the form 2^x*p^y (x, y >= 0, p in {3,5,7}). These tend to have efficient implementations in clFFT. 2187 = 3^7, 2401 = 7^4.

FWIW, 2304 = 2^8*3^2, 2500 = 2^2*5^4, 2592 = 2^5*3^4

Here is the full list of even FFTs that follow the same pattern (between 2048K and 4096K):
2048=2^11
2304=2^8*3^2
2500=2^2*5^4
2560=2^9*5^1
2592=2^5*3^4
2744=2^3*7^3
2916=2^2*3^6
3072=2^10*3^1
3136=2^6*7^2
3200=2^7*5^2
3456=2^7*3^3
3584=2^9*7^1
3888=2^4*3^5
4000=2^5*5^3
4096=2^12
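
The list above can be regenerated with a short script. This is a sketch of the enumeration as described in the post (sizes of the form 2^x*p^y with a single odd prime p from {3, 5, 7}); the function name `single_prime_smooth_lengths` is made up for illustration, and whether clLucas or clFFT accepts a given length still has to be tested.

```python
def single_prime_smooth_lengths(lo, hi, primes=(3, 5, 7), even_only=True):
    """Map each n in [lo, hi] of the form 2^x * p^y to its (x, p, y)."""
    found = {}
    for p in primes:
        power_of_p, y = 1, 0
        while power_of_p <= hi:
            n, x = power_of_p, 0
            while n <= hi:
                is_odd = n % 2 == 1
                if n >= lo and not (even_only and is_odd):
                    # Pure powers of two show up for every p; keep the first.
                    found.setdefault(n, (x, p, y))
                n *= 2
                x += 1
            power_of_p *= p
            y += 1
    return dict(sorted(found.items()))


lengths = single_prime_smooth_lengths(2048, 4096)
for n, (x, p, y) in lengths.items():
    if y == 0:
        print(f"{n}=2^{x}")
    else:
        print(f"{n}=2^{x}*{p}^{y}")
```

Passing `even_only=False` also returns the odd candidates such as 2187 and 2401, which is where the two lengths that failed under clLucas v1.04 came from.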
Old 2016-05-03, 16:52   #412
bverka86
 
May 2011

2³ Posts
Default

I'll keep that list of FFTs in mind.

The breakdown into each FFT's "smoothness" is helpful, so thank you!

Sometimes I question using these R9 270X's for LL tests, but then I consider things like M72895313...

R9 270X:
Iteration 18000000 M( 72895313 )C, 0xebe0881ebbc59556, n = 4096K, clLucas v1.04 err = 0.1094 (13:33 real, 8.1293 ms/iter, ETA 123:44:44)

Dual Xeon E5450 (6/8 cores):
[Work thread May 3 10:47] Iteration: 3400000 / 72895313 [4.66%], ms/iter: 21.244, ETA: 17d 02:05
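
As a rough cross-check of the two ETAs, the remaining time can be estimated from the iteration count and ms/iter alone. The sketch assumes the current ms/iter holds to the end of the run and that the iteration counter equals the number of squarings done; `remaining_seconds` is a hypothetical helper, and both programs average their timings differently, so the results only approximate the logged ETAs.

```python
def remaining_seconds(exponent, iteration, ms_per_iter):
    """Estimated time left, assuming the current ms/iter holds to the end."""
    # A Lucas-Lehmer test of M(p) takes p - 2 squarings in total.
    return (exponent - 2 - iteration) * ms_per_iter / 1000.0


EXPONENT = 72895313

# Iteration counts and timings copied from the two log lines above.
gpu_left = remaining_seconds(EXPONENT, 18_000_000, 8.1293)  # R9 270X
cpu_left = remaining_seconds(EXPONENT, 3_400_000, 21.244)   # dual Xeon E5450

ratio = 21.244 / 8.1293

print(f"GPU: about {gpu_left / 3600:.1f} hours left")
print(f"CPU: about {cpu_left / 86400:.2f} days left")
print(f"per-iteration speed ratio: {ratio:.2f}x in favour of the GPU")
```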
Old 2016-05-14, 04:34   #413
hurrican50
 
Apr 2016

2 Posts
Default Why does the ETA increase when I increase my GPU fan speed?

On my 290, why does the clLucas ETA go from about 150 hours to 180 hours when I increase the fan speed of the GPU? The ETA is correctly adjusted based on the ms/iter, but why does it take a little longer?
All other programs I have seen are either quicker or not affected when I increase the fan speed.

Note: Increasing the fan speed does not affect the speed of the memory, but it does allow the GPU to run faster, longer.
Old 2016-05-14, 08:03   #414
hurrican50
 
Apr 2016

2 Posts
Default

When I increase the fan speed on my 290, why does the ETA increase from about 150 hours to 180 hours in clLucas for an LL test? With all other programs this either has no effect, or it makes the program run faster because it allows the GPU to run closer to full speed.
Old 2016-05-16, 02:28   #415
LaurV
Romulan Interpreter
 
 
"name field"
Jun 2011
Thailand

41·251 Posts
Default

Quote:
Originally Posted by hurrican50 View Post
When I increase the fan speed on my 290, why does the ETA increase from about 150 hours to 180 hours in clLucas for an LL test? With all other programs this either has no effect, or it makes the program run faster because it allows the GPU to run closer to full speed.
Power limit. The fan draws more power, so less is available for computing. The card is clever enough to adjust its clocks to stay within its power limit. For my Titans, I used to tune the fan curve to get about 10% more computation. [edit: this was long ago when I was running on air; with water cooling we don't have this problem anymore, and the fan setting stays at minimum because there is no fan connected].

Last fiddled with by LaurV on 2016-05-16 at 02:31
Old 2016-06-12, 22:14   #416
airsquirrels
 
 
"David"
Jul 2015
Ohio

11·47 Posts
Default

I did some testing with clFFT 2.12.1 and the latest Crimson drivers.

Nothing of significant note to report, although as of 2.10 Fury X cards can run > 16384K FFTs...

Iteration 10000 M( 332213573 )C, 0x216c5a1819bd595d, n = 19200K, clLucas v1.04 err = 0.1367 (4:42 real, 28.1697 ms/iter, ETA 2599:26:26)
Iteration 20000 M( 332213573 )C, 0xebc18094b00fad87, n = 19200K, clLucas v1.04 err = 0.1406 (4:42 real, 28.1144 ms/iter, ETA 2594:15:35)

108 days for a 100M test on a Fury X. ($629, 200/220W during test)
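
The 108-day figure follows directly from the logged ETA. The sketch below redoes that conversion and adds a rough energy estimate; it assumes the card draws a constant 200 W or 220 W for the whole run (the two figures quoted above), and `eta_to_hours` is a hypothetical helper name.

```python
def eta_to_hours(eta):
    """Convert an 'H:MM:SS' ETA string into fractional hours."""
    hours, minutes, seconds = (int(part) for part in eta.split(":"))
    return hours + minutes / 60 + seconds / 3600


eta_hours = eta_to_hours("2599:26:26")  # ETA from the first log line above
days = eta_hours / 24

# Energy in kWh at each of the two quoted power draws (assumed constant).
energy_kwh = [watts / 1000 * eta_hours for watts in (200, 220)]

print(f"{days:.1f} days")
print(f"{energy_kwh[0]:.0f} to {energy_kwh[1]:.0f} kWh over the run")
```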
Old 2016-06-13, 09:48   #417
axn
 
 
Jun 2003

2³·683 Posts
Default

Quote:
Originally Posted by airsquirrels View Post
108 days for a 100M test on a Fury X. ($629, 200/220W during test)
clLucas doesn't necessarily choose the best FFT sizes. There might be larger FFTs that can give much higher performance. I would, at minimum, try these alternate FFTs: 18432k, 19208k, 19600k, 20000k, 20480k, 20736k to see if there is a potential for higher performance.
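
To see why these particular lengths are worth trying, it helps to factor them. The sketch below uses a hypothetical helper, `factor_small`, that strips out the primes 2, 3, 5 and 7 and returns whatever is left over; a leftover of 1 means the length is built only from those primes. The 19200K entry is the length from airsquirrels' log lines.

```python
def factor_small(n, primes=(2, 3, 5, 7)):
    """Return ({prime: exponent}, cofactor) after dividing out small primes."""
    exponents = {}
    for p in primes:
        while n % p == 0:
            n //= p
            exponents[p] = exponents.get(p, 0) + 1
    return exponents, n


# 19200K is the logged length; the rest are the suggested alternatives.
candidates = [19200, 18432, 19208, 19600, 20000, 20480, 20736]

for size in candidates:
    exponents, cofactor = factor_small(size)
    parts = "*".join(f"{p}^{e}" for p, e in exponents.items())
    note = "" if cofactor == 1 else f" (cofactor {cofactor})"
    print(f"{size}K = {parts}{note}")
```

All seven lengths factor completely over {2, 3, 5, 7}. Most use a single odd prime, while 19200 and 19600 mix two of them.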

Last fiddled with by axn on 2016-06-13 at 09:56
Old 2016-06-17, 05:59   #418
LaurV
Romulan Interpreter
 
 
"name field"
Jun 2011
Thailand

24063₈ Posts
Default

Yes, correct! We would also suggest trying 32768K, which is the next power of two (although that is getting close to the limit here), to see how the iteration times behave.
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
