mersenneforum.org  

Old 2023-02-18, 07:35   #1
Jurzal
 
Jan 2023
Riga, Latvia

2²×3×5 Posts
What is Arch, Pass 1, Pass 2, clm?

Hi, the p95 benchmark has an option to run a full-pass benchmark to determine the optimal setup for my PC.
Can somebody explain what the fields in the output below mean? Thanks!

FFTlen=3360K, Type=3, Arch=4, Pass1=448, Pass2=7680, clm=4 (12 cores, 2 workers): 1.95, 1.86 ms. Throughput: 1050.86 iter/sec.
FFTlen=3360K, Type=3, Arch=4, Pass1=448, Pass2=7680, clm=2 (12 cores, 2 workers): 1.88, 1.86 ms. Throughput: 1068.46 iter/sec.
FFTlen=3360K, Type=3, Arch=4, Pass1=448, Pass2=7680, clm=1 (12 cores, 2 workers): 2.00, 1.92 ms. Throughput: 1020.24 iter/sec.
FFTlen=3360K, Type=3, Arch=4, Pass1=896, Pass2=3840, clm=4 (12 cores, 2 workers): 2.07, 1.94 ms. Throughput: 997.99 iter/sec.
FFTlen=3360K, Type=3, Arch=4, Pass1=896, Pass2=3840, clm=2 (12 cores, 2 workers): 1.93, 1.88 ms. Throughput: 1050.31 iter/sec.
FFTlen=3360K, Type=3, Arch=4, Pass1=896, Pass2=3840, clm=1 (12 cores, 2 workers): 1.93, 1.86 ms. Throughput: 1055.57 iter/sec.
FFTlen=3456K, Type=3, Arch=4, Pass1=384, Pass2=9216, clm=4 (12 cores, 2 workers): 1.97, 1.89 ms. Throughput: 1036.81 iter/sec.
FFTlen=3456K, Type=3, Arch=4, Pass1=384, Pass2=9216, clm=2 (12 cores, 2 workers): 1.93, 1.91 ms. Throughput: 1043.26 iter/sec.
FFTlen=3456K, Type=3, Arch=4, Pass1=384, Pass2=9216, clm=1 (12 cores, 2 workers): 1.94, 1.91 ms. Throughput: 1040.41 iter/sec.
FFTlen=3456K, Type=3, Arch=4, Pass1=768, Pass2=4608, clm=4 (12 cores, 2 workers): 2.06, 1.93 ms. Throughput: 1004.03 iter/sec.
FFTlen=3456K, Type=3, Arch=4, Pass1=768, Pass2=4608, clm=2 (12 cores, 2 workers): 1.93, 1.88 ms. Throughput: 1051.41 iter/sec.
FFTlen=3456K, Type=3, Arch=4, Pass1=768, Pass2=4608, clm=1 (12 cores, 2 workers): 1.89, 1.86 ms. Throughput: 1067.57 iter/sec.
FFTlen=3456K, Type=3, Arch=4, Pass1=1536, Pass2=2304, clm=4 (12 cores, 2 workers): 2.38, 2.17 ms. Throughput: 880.39 iter/sec.
FFTlen=3456K, Type=3, Arch=4, Pass1=1536, Pass2=2304, clm=2 (12 cores, 2 workers): 2.14, 1.94 ms. Throughput: 983.53 iter/sec.
FFTlen=3456K, Type=3, Arch=4, Pass1=1536, Pass2=2304, clm=1 (12 cores, 2 workers): 2.04, 1.95 ms. Throughput: 1003.56 iter/sec.
Old 2023-02-18, 15:28   #2
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

17217₈ Posts

From the prime95 source file commonb.c (beginning at line 9604 in v30.8b15); similar code also occurs elsewhere:
Code:
				sprintf (buf,
					 "FFTlen=%lu%s%s, Type=%d, Arch=%d, Pass1=%lu, Pass2=%lu, clm=%lu",
					 (fftlen & 0x3FF) ? fftlen : fftlen / 1024,
					 (fftlen & 0x3FF) ? "" : "K",
					 plus1 ? " all-complex" : "",
					 lldata.gwdata.FFT_TYPE, lldata.gwdata.ARCH,
					 fftlen / (lldata.gwdata.PASS2_SIZE ? lldata.gwdata.PASS2_SIZE : 1),
					 lldata.gwdata.PASS2_SIZE,
					 lldata.gwdata.PASS1_CACHE_LINES / ((CPU_FLAGS & CPU_AVX512F) ? 8 : ((CPU_FLAGS & CPU_AVX) ? 4 : 2)));
Arch = a numeric code for the CPU architecture
Pass1 = size of the first FFT pass across the data (printed as FFT length / Pass2)
Pass2 = size of the second FFT pass across the data
clm = cache line multiplier
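
To make the derived fields concrete, here is a minimal sketch of the arithmetic in the excerpt above. The struct, enum and function names are hypothetical stand-ins, not the real prime95 types, and the " all-complex" suffix is left out for brevity.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the handful of fields the excerpt reads.
   This is NOT the real prime95 gwdata structure. */
struct fft_info {
    unsigned long fftlen;            /* FFT length in elements */
    int fft_type;                    /* printed as Type */
    int arch;                        /* printed as Arch */
    unsigned long pass2_size;        /* printed as Pass2; 0 is treated as 1 */
    unsigned long pass1_cache_lines; /* raw value that clm is derived from */
};

/* Divisor the excerpt picks from the CPU flags:
   8 with AVX-512F, 4 with AVX, otherwise 2. */
enum simd_divisor { DIV_OTHER = 2, DIV_AVX = 4, DIV_AVX512 = 8 };

/* Pass1 is not read from its own field in the excerpt; it is fftlen / Pass2. */
unsigned long pass1_size(const struct fft_info *f)
{
    return f->fftlen / (f->pass2_size ? f->pass2_size : 1);
}

/* clm is the stored cache-line count divided by the CPU-dependent divisor. */
unsigned long clm_value(const struct fft_info *f, enum simd_divisor d)
{
    return f->pass1_cache_lines / (unsigned long)d;
}

/* Builds the same style of line as the excerpt and returns the
   character count reported by snprintf. */
int format_fft_line(char *buf, size_t n, const struct fft_info *f,
                    enum simd_divisor d)
{
    int whole_k;

    assert(buf != NULL && n > 0);
    whole_k = (f->fftlen & 0x3FF) == 0;   /* exact multiple of 1024? */
    return snprintf(buf, n,
                    "FFTlen=%lu%s, Type=%d, Arch=%d, Pass1=%lu, Pass2=%lu, clm=%lu",
                    whole_k ? f->fftlen / 1024 : f->fftlen,
                    whole_k ? "K" : "",
                    f->fft_type, f->arch,
                    pass1_size(f), f->pass2_size, clm_value(f, d));
}
```

With fftlen = 3360 × 1024, pass2_size = 7680 and an assumed pass1_cache_lines of 16 on an AVX machine (divisor 4), this gives Pass1=448 and clm=4, the same shape as the first benchmark line. The value 16 is only an illustration; the thread does not show the raw field.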

See also, at least, the source files cpuid.c and cpuid.h regarding CPU architecture.

(George is on vacation. If I've botched the above, he may correct it after returning)

Last fiddled with by kriesel on 2023-02-18 at 15:29
Old 2023-02-18, 15:43   #3
Jurzal
 
Jan 2023
Riga, Latvia

2²×3×5 Posts

Thanks!
It is becoming clearer, but more questions arise: what is a cache line multiplier, and what is a pass across the data? :D
Old 2023-02-18, 18:07   #4
slandrum
 
Jan 2021
California

560₁₀ Posts

Quote:
Originally Posted by Jurzal View Post
Thanks!
It is becoming clearer, but more questions arise: what is a cache line multiplier, and what is a pass across the data? :D

The cache line multiplier is a low-level detail in the implementation of prime95/mprime. Its value can be 1, 2 or 4, and on different architectures one of the settings may be slightly faster than the others.

Do you understand what an FFT is? Very large FFTs are used to perform the multiplication/modular arithmetic. The FFT processing in prime95/mprime is done in two passes: effectively, pass 1 works on X-size pieces of the FFT, and pass 2 takes Y pieces and combines them together. If you multiply the two pass size numbers together you'll get the full size of the FFT. Some programs break the FFT processing into 3 passes. It's an implementation detail to make the code more efficient in terms of how it uses memory and cache.
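
As a quick check of the "multiply the two pass sizes" rule, here is a small sketch using the Pass1/Pass2 pairs from the benchmark output in the first post. The struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One Pass1/Pass2 split; fftlen_k is the FFT length in units of 1024. */
struct pass_split {
    unsigned long fftlen_k;
    unsigned long pass1;
    unsigned long pass2;
};

/* Pairs copied from the benchmark lines in the first post. */
static const struct pass_split benchmark_rows[] = {
    {3360,  448, 7680},
    {3360,  896, 3840},
    {3456,  384, 9216},
    {3456,  768, 4608},
    {3456, 1536, 2304},
};

/* Returns 1 when Pass1 * Pass2 equals the full FFT length. */
int split_is_consistent(const struct pass_split *s)
{
    assert(s != NULL);
    return s->pass1 * s->pass2 == s->fftlen_k * 1024UL;
}

/* Counts how many of the benchmark rows satisfy the rule. */
size_t count_consistent_rows(void)
{
    size_t i, ok = 0;
    size_t n = sizeof benchmark_rows / sizeof benchmark_rows[0];

    for (i = 0; i < n; i++)
        ok += (size_t)split_is_consistent(&benchmark_rows[i]);
    return ok;
}
```

All five rows pass: for example 448 × 7680 = 3,440,640 = 3360 × 1024.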

ETA: You can use the search features on the forum and probably find much better answers than the ones I've given here.

Last fiddled with by slandrum on 2023-02-18 at 18:13
Old 2023-02-18, 21:05   #5
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

7,823 Posts

Some further reading:
https://mersenneforum.org/showpost.p...78&postcount=5
https://www.mersenneforum.org/showpo...21&postcount=7
Old 2023-02-18, 21:21   #6
Jurzal
 
Jan 2023
Riga, Latvia

60₁₀ Posts

Thanks, will look into it!
Last question: how can I calculate the size in MB of an FFT calculation?
To understand when my 32 MB of cache is exhausted and RAM kicks in, can I calculate the largest FFT that fits within 32 MB of cache?
Thanks!
Old 2023-02-18, 22:34   #7
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1E8F₁₆ Posts

I suspect pass 1 fitting in L1 or L2 is more likely and more significant.
As a ballpark computation, if the FFT data are changed in place: L3 cache size / #workers / (8 bytes per DP word)
= 32Mi / 2 / 8 = 2048Ki, i.e. roughly a 2048K FFT length. That's smaller than the FFT lengths at the wavefront of DC (double-check) work.
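
The ballpark above, and its inverse (the data size of a given FFT length), can be sketched as follows. This assumes 8 bytes per element, in-place data, and nothing else competing for the cache; the function names are made up, and real runs also need space for tables and other data, so treat the results as rough upper bounds.

```c
#include <assert.h>
#include <stdint.h>

#define BYTES_PER_DP_WORD 8u   /* one double-precision word, as in the post */

/* Largest FFT length, in K (1024) elements, whose data fits in each
   worker's share of the cache: cache bytes / workers / 8 / 1024. */
uint64_t max_fft_len_k(uint64_t cache_bytes, unsigned workers)
{
    assert(workers > 0);
    return cache_bytes / workers / BYTES_PER_DP_WORD / 1024u;
}

/* Approximate size in bytes of the FFT data for a length given in K. */
uint64_t fft_data_bytes(uint64_t fftlen_k)
{
    return fftlen_k * 1024u * BYTES_PER_DP_WORD;
}
```

For a 32 MiB cache, max_fft_len_k gives 2048 with 2 workers sharing it and 4096 with 1 worker. Going the other way, fft_data_bytes(3360) is 27,525,120 bytes, about 26.25 MiB, for the 3360K FFT in the benchmark.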
Old 2023-02-19, 07:33   #8
Jurzal
 
Jan 2023
Riga, Latvia

2²×3×5 Posts

Quote:
Originally Posted by kriesel View Post
I suspect pass 1 fitting in L1 or L2 is more likely and more significant.
As a ballpark computation, if the FFT data are changed in place: L3 cache size / #workers / (8 bytes per DP word)
= 32Mi / 2 / 8 = 2048Ki, i.e. roughly a 2048K FFT length. That's smaller than the FFT lengths at the wavefront of DC (double-check) work.

I have a 5900X with 2×32 MB of L3 cache, so with 32Mi / 1 / 8, I should fit within the 4,096K range?
Something like that, I assume: 2 workers, each using its own 32 MB of cache.

Last fiddled with by Jurzal on 2023-02-19 at 07:34
Old 2023-02-19, 09:30   #9
Mark Rose
 
"/X\(‘-‘)/X\"
Jan 2013
https://pedan.tech/

3184₁₀ Posts

Quote:
Originally Posted by Jurzal View Post
I have a 5900X with 2×32 MB of L3 cache, so with 32Mi / 1 / 8, I should fit within the 4,096K range?
Something like that, I assume: 2 workers, each using its own 32 MB of cache.

Yes, about.

Though your benchmarks in the benchmark thread seem a bit odd. Was there anything else running at the same time?

Someone ran benchmarks of the 5800X3D, and you can see it doesn't show a big jump in timings when the FFT data exceeds 32 MB, unlike the other CPUs with 32 MB per chiplet. I'm on mobile and too lazy to dig through the thread.
Old 2023-02-19, 13:23   #10
Jurzal
 
Jan 2023
Riga, Latvia

2²×3×5 Posts

The GPU was running in the background, along with Discord, HWiNFO64 and other background tasks.
The 5800X3D has 96 MB of L3 cache, while the 5900X has 2×32 MB of L3 cache.

I can test some other parameters, but I don't think I would come up with different results.
All times are UTC. The time now is 14:01.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
