mersenneforum.org  

Go Back   mersenneforum.org > Prime Search Projects > Riesel Prime Search

Reply
 
Thread Tools
Old 2007-10-10, 19:48   #1
MooooMoo
Apprentice Crank
 
MooooMoo's Avatar
 
Mar 2006

2×227 Posts
Default Weird LLR FFT timings

A while ago, I tested k=19 for 601800 < n < 748200. The FFT length that LLR was using was 40960, and it took 0.983 ms per iteration.

Later, I tested k=19 for 748200 < n < 891100. The FFT length that LLR was using was 49152, which was a 20% jump from the earlier FFT length of 40960. As expected, the time per iteration increased 20% to 1.182 ms per iteration.

The testing of k=19 for 891100 < n < 1036200 increased the FFT length to 57344, a 16.7% jump from the earlier FFT length of 49152. Once again, the time per iteration increased about 16.7% to 1.427 ms per iteration.

A few days ago, I tested n > 1036200 for this same k. The FFT length increased to 65536, a 14.3% jump from the earlier FFT length of 57344. However, the time per iteration increased less than 8%, and the time per iteration was only 1.540 ms instead of the expected 1.631ms. Does anyone know why?
MooooMoo is offline   Reply With Quote
Old 2007-10-11, 04:25   #2
axn
 
axn's Avatar
 
Jun 2003

22×7×193 Posts
Default

Quote:
Originally Posted by MooooMoo View Post
Does anyone know why?
Power-of-two FFTs are more efficient than the others. It's the nature of FFT algorithm.
axn is offline   Reply With Quote
Old 2007-10-11, 08:56   #3
fivemack
(loop (#_fork))
 
fivemack's Avatar
 
Feb 2006
Cambridge, England

144668 Posts
Default

This is certainly accepted wisdom. I'm working in crystallography, and at one point I tested a large number of X*Y*Z three-dimensional FFTs using FFTW and found that 128*128*128, which I'd expected to be obviously the fastest, was 30% slower per voxel than 128*126*125.

Code:
There are fairly clearly four speed buckets; if you group by 'number
of length-127 axes' that gives a lower bound on the speed (IE there
are no fastest-buckets with any length 127, no second-fastest with two
lengths 127).

Actually, the fastest bucket looks to be just 'lengths all 125, 126 or
128'.  It is bad if the short axis is 128, which is presumably the
cache issue.  It looks as if it's best for the axes to be in increasing
order with speed, but that's not completely certain from the data.
I suspect this is more an issue with the transpositions required for working on 3D FFTs than with the 1D FFT algorithm itself; large transpositions can be horribly cache-unfriendly.
fivemack is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Gimme Atom timings! nuggetprime Software 5 2011-02-21 08:28
New proggy and timings axn Operation Billion Digits 1 2009-02-06 16:14
Need GMP trial-division timings ewmayer Factoring 7 2008-12-11 22:12
321 LLR timings paulunderwood 3*2^n-1 Search 14 2008-04-17 22:27
AMD64 opcode timings Prime95 Software 16 2005-03-04 17:48

All times are UTC. The time now is 21:46.


Mon Sep 26 21:46:06 UTC 2022 up 39 days, 19:14, 0 users, load averages: 2.29, 1.73, 1.70

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2022, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.

≠ ± ∓ ÷ × · − √ ‰ ⊗ ⊕ ⊖ ⊘ ⊙ ≤ ≥ ≦ ≧ ≨ ≩ ≺ ≻ ≼ ≽ ⊏ ⊐ ⊑ ⊒ ² ³ °
∠ ∟ ° ≅ ~ ‖ ⟂ ⫛
≡ ≜ ≈ ∝ ∞ ≪ ≫ ⌊⌋ ⌈⌉ ∘ ∏ ∐ ∑ ∧ ∨ ∩ ∪ ⨀ ⊕ ⊗ 𝖕 𝖖 𝖗 ⊲ ⊳
∅ ∖ ∁ ↦ ↣ ∩ ∪ ⊆ ⊂ ⊄ ⊊ ⊇ ⊃ ⊅ ⊋ ⊖ ∈ ∉ ∋ ∌ ℕ ℤ ℚ ℝ ℂ ℵ ℶ ℷ ℸ 𝓟
¬ ∨ ∧ ⊕ → ← ⇒ ⇐ ⇔ ∀ ∃ ∄ ∴ ∵ ⊤ ⊥ ⊢ ⊨ ⫤ ⊣ … ⋯ ⋮ ⋰ ⋱
∫ ∬ ∭ ∮ ∯ ∰ ∇ ∆ δ ∂ ℱ ℒ ℓ
𝛢𝛼 𝛣𝛽 𝛤𝛾 𝛥𝛿 𝛦𝜀𝜖 𝛧𝜁 𝛨𝜂 𝛩𝜃𝜗 𝛪𝜄 𝛫𝜅 𝛬𝜆 𝛭𝜇 𝛮𝜈 𝛯𝜉 𝛰𝜊 𝛱𝜋 𝛲𝜌 𝛴𝜎𝜍 𝛵𝜏 𝛶𝜐 𝛷𝜙𝜑 𝛸𝜒 𝛹𝜓 𝛺𝜔