mersenneforum.org

nopaynoget 2018-08-20 13:46

About clm in Prime95
 
I saw clm=1, clm=2 and clm=4.
I don't know what it means in Prime95 or how to set it.
I need some help.

Prime95 2018-08-20 19:26

This is prime95 internals, not something for the end user to control. CLM stands for "cache line multiplier". It refers to the number of cache lines processed simultaneously in pass 1. Higher clms put more pressure on the L2 cache, but should be a little faster in carry propagation. Prime95 version 29.4 automatically benchmarks different clm values to select the fastest one for your machine.
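The "benchmark each candidate and keep the fastest" approach can be sketched generically. The Python below is a minimal illustration of autotuning, not prime95's code. The three `kernels` are hypothetical stand-ins (sleeps of different lengths) for pass-1 implementations built with clm=1, 2 and 4. `autotune` times each one several times and keeps whichever has the lowest best-of-N wall-clock time.

```python
import time
from typing import Callable, Dict


def time_once(fn: Callable[[], None]) -> float:
    """Return the wall-clock seconds taken by one call of fn."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def autotune(kernels: Dict[int, Callable[[], None]], repeats: int = 3) -> int:
    """Return the key (here a clm value) whose kernel has the lowest best-of-N time.

    Taking the minimum over several repeats filters out one-off
    slowdowns caused by the OS or other programs.
    """
    best_times = {
        key: min(time_once(fn) for _ in range(repeats))
        for key, fn in kernels.items()
    }
    return min(best_times, key=best_times.get)


# Hypothetical stand-ins for three pass-1 implementations; a real program
# would time the actual FFT code built with clm=1, 2 and 4.
kernels = {
    1: lambda: time.sleep(0.030),
    2: lambda: time.sleep(0.001),
    4: lambda: time.sleep(0.050),
}

if __name__ == "__main__":
    print("fastest clm:", autotune(kernels))
```

Best-of-N is used instead of a single timing because, as the benchmark logs later in this thread show, one run can be noticeably slower than the others.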

nopaynoget 2018-08-21 19:39

[QUOTE=Prime95;494319]This is prime95 internals, not something for the end user to control. CLM stands for "cache line multiplier". It refers to the number of cache lines processed simultaneously in pass 1. Higher clms put more pressure on the L2 cache, but should be a little faster in carry propagation. Prime95 version 29.4 automatically benchmarks different clm values to select the fastest one for your machine.[/QUOTE]

Thank you.
Yes, even the same CPU will show different clms.
Can you tell me under what circumstances the clms will all be the same number, e.g. all clm=2 or all some other value?
Thanks again.

Prime95 2018-08-22 01:29

There is no hard-and-fast rule for determining which clm is best for each FFT size. Clm is always 1, 2, or 4. I think that smaller FFT sizes usually use a clm of 2 or 4 (pass 1 easily fits in the L2 cache), whereas larger FFT sizes usually use a clm of 1 or 2.

Sorry to be so vague.

nopaynoget 2018-08-22 20:05

[QUOTE=Prime95;494416]There is no hard-and-fast rule for determining which clm is best for each FFT size. Clm is always 1, 2, or 4. I think that smaller FFT sizes usually use a clm of 2 or 4 (pass 1 easily fits in the L2 cache), whereas larger FFT sizes usually use a clm of 1 or 2.

Sorry to be so vague.[/QUOTE]
That's a lot of help for me.
Thank you so much. :tu:

harlee 2018-09-03 17:46

[QUOTE=Prime95;494319]Prime95 version 29.4 automatically benchmarks different clm values to select the fastest one for your machine.[/QUOTE]

I've been doing P-1 testing in the 19M range on exponents where B1=B2. I've seen the clm tests, but I'm not sure the best one is being selected. Here are the test results from the results.txt file:

[Sat Aug 25 05:59:24 2018]
FFTlen=1024K, Type=3, Arch=4, Pass1=128, Pass2=8192, clm=4 (1 core, 1 worker): 6.29 ms. Throughput: 159.00 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=128, Pass2=8192, clm=2 (1 core, 1 worker): 6.02 ms. Throughput: 166.14 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=128, Pass2=8192, clm=1 (1 core, 1 worker): 6.27 ms. Throughput: 159.54 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=256, Pass2=4096, clm=4 (1 core, 1 worker): 6.01 ms. Throughput: 166.32 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=256, Pass2=4096, clm=2 (1 core, 1 worker): 5.96 ms. Throughput: 167.72 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=256, Pass2=4096, clm=1 (1 core, 1 worker): 6.02 ms. Throughput: 166.14 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=512, Pass2=2048, clm=4 (1 core, 1 worker): 6.45 ms. Throughput: 154.97 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=512, Pass2=2048, clm=2 (1 core, 1 worker): 5.86 ms. Throughput: 170.78 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=512, Pass2=2048, clm=1 (1 core, 1 worker): 5.97 ms. Throughput: 167.55 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=1024, Pass2=1024, clm=4 (1 core, 1 worker): 6.86 ms. Throughput: 145.81 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=1024, Pass2=1024, clm=2 (1 core, 1 worker): 5.99 ms. Throughput: 166.91 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=1024, Pass2=1024, clm=1 (1 core, 1 worker): 6.02 ms. Throughput: 166.00 iter/sec.

[Mon Aug 27 20:59:23 2018]
FFTlen=1024K, Type=3, Arch=4, Pass1=128, Pass2=8192, clm=4 (1 core, 1 worker): 6.34 ms. Throughput: 157.68 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=128, Pass2=8192, clm=2 (1 core, 1 worker): 5.99 ms. Throughput: 167.04 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=128, Pass2=8192, clm=1 (1 core, 1 worker): 6.43 ms. Throughput: 155.44 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=256, Pass2=4096, clm=4 (1 core, 1 worker): 6.01 ms. Throughput: 166.44 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=256, Pass2=4096, clm=2 (1 core, 1 worker): 5.90 ms. Throughput: 169.46 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=256, Pass2=4096, clm=1 (1 core, 1 worker): 5.97 ms. Throughput: 167.40 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=512, Pass2=2048, clm=4 (1 core, 1 worker): 6.43 ms. Throughput: 155.53 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=512, Pass2=2048, clm=2 (1 core, 1 worker): 5.79 ms. Throughput: 172.58 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=512, Pass2=2048, clm=1 (1 core, 1 worker): 5.89 ms. Throughput: 169.83 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=1024, Pass2=1024, clm=4 (1 core, 1 worker): 6.84 ms. Throughput: 146.11 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=1024, Pass2=1024, clm=2 (1 core, 1 worker): 5.99 ms. Throughput: 166.86 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=1024, Pass2=1024, clm=1 (1 core, 1 worker): 5.99 ms. Throughput: 166.85 iter/sec.

[Tue Aug 28 17:57:30 2018]
FFTlen=1024K, Type=3, Arch=4, Pass1=128, Pass2=8192, clm=4 (1 core, 1 worker): 6.70 ms. Throughput: 149.29 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=128, Pass2=8192, clm=2 (1 core, 1 worker): 6.44 ms. Throughput: 155.21 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=128, Pass2=8192, clm=1 (1 core, 1 worker): 6.69 ms. Throughput: 149.55 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=256, Pass2=4096, clm=4 (1 core, 1 worker): 6.77 ms. Throughput: 147.66 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=256, Pass2=4096, clm=2 (1 core, 1 worker): 6.81 ms. Throughput: 146.94 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=256, Pass2=4096, clm=1 (1 core, 1 worker): 6.97 ms. Throughput: 143.38 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=512, Pass2=2048, clm=4 (1 core, 1 worker): 7.15 ms. Throughput: 139.85 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=512, Pass2=2048, clm=2 (1 core, 1 worker): 6.42 ms. Throughput: 155.88 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=512, Pass2=2048, clm=1 (1 core, 1 worker): 6.44 ms. Throughput: 155.33 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=1024, Pass2=1024, clm=4 (1 core, 1 worker): 7.42 ms. Throughput: 134.79 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=1024, Pass2=1024, clm=2 (1 core, 1 worker): 6.49 ms. Throughput: 154.04 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=1024, Pass2=1024, clm=1 (1 core, 1 worker): 6.40 ms. Throughput: 156.28 iter/sec.

[Wed Aug 29 14:59:22 2018]
FFTlen=1024K, Type=3, Arch=4, Pass1=128, Pass2=8192, clm=4 (1 core, 1 worker): 6.25 ms. Throughput: 160.11 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=128, Pass2=8192, clm=2 (1 core, 1 worker): 5.99 ms. Throughput: 166.81 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=128, Pass2=8192, clm=1 (1 core, 1 worker): 6.27 ms. Throughput: 159.51 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=256, Pass2=4096, clm=4 (1 core, 1 worker): 6.00 ms. Throughput: 166.54 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=256, Pass2=4096, clm=2 (1 core, 1 worker): 5.89 ms. Throughput: 169.68 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=256, Pass2=4096, clm=1 (1 core, 1 worker): 6.01 ms. Throughput: 166.44 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=512, Pass2=2048, clm=4 (1 core, 1 worker): 6.49 ms. Throughput: 154.01 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=512, Pass2=2048, clm=2 (1 core, 1 worker): 5.79 ms. Throughput: 172.61 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=512, Pass2=2048, clm=1 (1 core, 1 worker): 5.92 ms. Throughput: 168.93 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=1024, Pass2=1024, clm=4 (1 core, 1 worker): 6.90 ms. Throughput: 144.94 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=1024, Pass2=1024, clm=2 (1 core, 1 worker): 6.01 ms. Throughput: 166.28 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=1024, Pass2=1024, clm=1 (1 core, 1 worker): 5.97 ms. Throughput: 167.43 iter/sec.

[Thu Aug 30 11:59:21 2018]
FFTlen=1024K, Type=3, Arch=4, Pass1=128, Pass2=8192, clm=4 (1 core, 1 worker): 6.27 ms. Throughput: 159.61 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=128, Pass2=8192, clm=2 (1 core, 1 worker): 6.01 ms. Throughput: 166.47 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=128, Pass2=8192, clm=1 (1 core, 1 worker): 6.27 ms. Throughput: 159.54 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=256, Pass2=4096, clm=4 (1 core, 1 worker): 5.99 ms. Throughput: 166.90 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=256, Pass2=4096, clm=2 (1 core, 1 worker): 5.91 ms. Throughput: 169.30 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=256, Pass2=4096, clm=1 (1 core, 1 worker): 5.95 ms. Throughput: 167.98 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=512, Pass2=2048, clm=4 (1 core, 1 worker): 6.39 ms. Throughput: 156.51 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=512, Pass2=2048, clm=2 (1 core, 1 worker): 5.79 ms. Throughput: 172.74 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=512, Pass2=2048, clm=1 (1 core, 1 worker): 5.88 ms. Throughput: 169.99 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=1024, Pass2=1024, clm=4 (1 core, 1 worker): 6.82 ms. Throughput: 146.69 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=1024, Pass2=1024, clm=2 (1 core, 1 worker): 5.98 ms. Throughput: 167.16 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=1024, Pass2=1024, clm=1 (1 core, 1 worker): 5.98 ms. Throughput: 167.31 iter/sec.

[Fri Aug 31 08:59:20 2018]
FFTlen=1024K, Type=3, Arch=4, Pass1=128, Pass2=8192, clm=4 (1 core, 1 worker): 6.41 ms. Throughput: 155.96 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=128, Pass2=8192, clm=2 (1 core, 1 worker): 6.24 ms. Throughput: 160.18 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=128, Pass2=8192, clm=1 (1 core, 1 worker): 6.42 ms. Throughput: 155.73 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=256, Pass2=4096, clm=4 (1 core, 1 worker): 6.03 ms. Throughput: 165.80 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=256, Pass2=4096, clm=2 (1 core, 1 worker): 5.95 ms. Throughput: 168.20 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=256, Pass2=4096, clm=1 (1 core, 1 worker): 6.00 ms. Throughput: 166.73 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=512, Pass2=2048, clm=4 (1 core, 1 worker): 6.41 ms. Throughput: 155.97 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=512, Pass2=2048, clm=2 (1 core, 1 worker): 5.80 ms. Throughput: 172.37 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=512, Pass2=2048, clm=1 (1 core, 1 worker): 5.90 ms. Throughput: 169.39 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=1024, Pass2=1024, clm=4 (1 core, 1 worker): 6.86 ms. Throughput: 145.86 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=1024, Pass2=1024, clm=2 (1 core, 1 worker): 6.00 ms. Throughput: 166.76 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=1024, Pass2=1024, clm=1 (1 core, 1 worker): 6.00 ms. Throughput: 166.67 iter/sec.

[Sat Sep 1 05:59:19 2018]
FFTlen=1024K, Type=3, Arch=4, Pass1=128, Pass2=8192, clm=4 (1 core, 1 worker): 6.26 ms. Throughput: 159.85 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=128, Pass2=8192, clm=2 (1 core, 1 worker): 5.99 ms. Throughput: 166.89 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=128, Pass2=8192, clm=1 (1 core, 1 worker): 6.26 ms. Throughput: 159.64 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=256, Pass2=4096, clm=4 (1 core, 1 worker): 6.03 ms. Throughput: 165.89 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=256, Pass2=4096, clm=2 (1 core, 1 worker): 5.93 ms. Throughput: 168.50 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=256, Pass2=4096, clm=1 (1 core, 1 worker): 5.99 ms. Throughput: 167.02 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=512, Pass2=2048, clm=4 (1 core, 1 worker): 6.45 ms. Throughput: 154.93 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=512, Pass2=2048, clm=2 (1 core, 1 worker): 5.81 ms. Throughput: 171.99 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=512, Pass2=2048, clm=1 (1 core, 1 worker): 5.92 ms. Throughput: 169.03 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=1024, Pass2=1024, clm=4 (1 core, 1 worker): 6.88 ms. Throughput: 145.26 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=1024, Pass2=1024, clm=2 (1 core, 1 worker): 6.02 ms. Throughput: 166.14 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=1024, Pass2=1024, clm=1 (1 core, 1 worker): 6.01 ms. Throughput: 166.43 iter/sec.

[Sun Sep 2 02:59:18 2018]
FFTlen=1024K, Type=3, Arch=4, Pass1=128, Pass2=8192, clm=4 (1 core, 1 worker): 6.26 ms. Throughput: 159.73 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=128, Pass2=8192, clm=2 (1 core, 1 worker): 6.03 ms. Throughput: 165.84 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=128, Pass2=8192, clm=1 (1 core, 1 worker): 6.30 ms. Throughput: 158.66 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=256, Pass2=4096, clm=4 (1 core, 1 worker): 6.04 ms. Throughput: 165.66 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=256, Pass2=4096, clm=2 (1 core, 1 worker): 5.93 ms. Throughput: 168.62 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=256, Pass2=4096, clm=1 (1 core, 1 worker): 5.94 ms. Throughput: 168.32 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=512, Pass2=2048, clm=4 (1 core, 1 worker): 6.42 ms. Throughput: 155.69 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=512, Pass2=2048, clm=2 (1 core, 1 worker): 5.82 ms. Throughput: 171.95 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=512, Pass2=2048, clm=1 (1 core, 1 worker): 5.89 ms. Throughput: 169.74 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=1024, Pass2=1024, clm=4 (1 core, 1 worker): 6.86 ms. Throughput: 145.75 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=1024, Pass2=1024, clm=2 (1 core, 1 worker): 6.01 ms. Throughput: 166.45 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=1024, Pass2=1024, clm=1 (1 core, 1 worker): 6.00 ms. Throughput: 166.70 iter/sec.
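The comparison being made here can be done mechanically. This is a minimal Python sketch, assuming the benchmark lines keep exactly the format shown above. It parses each line, averages the throughput for every (Pass1, Pass2, clm) combination across all runs, and reports the highest average. `best_implementation` is a hypothetical helper, not part of prime95; the sample lines are copied from the Aug 25 and Aug 27 runs.

```python
import re
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Matches benchmark lines in the format prime95 wrote to results.txt above.
LINE_RE = re.compile(
    r"FFTlen=[^,]+, Type=\d+, Arch=\d+, "
    r"Pass1=(?P<pass1>\d+), Pass2=(?P<pass2>\d+), clm=(?P<clm>\d+) "
    r"\(\d+ cores?, \d+ workers?\): [\d.]+ ms\. "
    r"Throughput: (?P<ips>[\d.]+) iter/sec"
)

Impl = Tuple[int, int, int]  # (Pass1, Pass2, clm)


def best_implementation(text: str) -> Tuple[Impl, float]:
    """Average iter/sec per (Pass1, Pass2, clm) over all runs; return the best."""
    samples: Dict[Impl, List[float]] = defaultdict(list)
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            key = (int(m["pass1"]), int(m["pass2"]), int(m["clm"]))
            samples[key].append(float(m["ips"]))
    if not samples:
        raise ValueError("no benchmark lines found")
    averages = {key: mean(vals) for key, vals in samples.items()}
    best = max(averages, key=averages.get)
    return best, averages[best]


SAMPLE = """\
FFTlen=1024K, Type=3, Arch=4, Pass1=256, Pass2=4096, clm=2 (1 core, 1 worker): 5.96 ms. Throughput: 167.72 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=512, Pass2=2048, clm=2 (1 core, 1 worker): 5.86 ms. Throughput: 170.78 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=256, Pass2=4096, clm=2 (1 core, 1 worker): 5.90 ms. Throughput: 169.46 iter/sec.
FFTlen=1024K, Type=3, Arch=4, Pass1=512, Pass2=2048, clm=2 (1 core, 1 worker): 5.79 ms. Throughput: 172.58 iter/sec.
"""

if __name__ == "__main__":
    print(best_implementation(SAMPLE))
```

Averaging matters: in the Aug 28 run every timing was slower, and the single fastest line in that run was Pass1=1024, Pass2=1024, clm=1 (156.28 iter/sec), so one run on its own can point at a different winner than the long-term average.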

However, Prime95 (Mac OS X 64-bit,Prime95,v29.4,build 7), running on a MacBook Air (Intel(R) Core(TM) i5-5250U CPU @ 1.60GHz), prefers:

Using FMA3 FFT length 1M, Pass1=256, Pass2=4K, clm=2

Based on the clm test results, shouldn't Prime95 be using Pass1=512, Pass2=2048, clm=2 instead?

Prime95 2018-09-03 20:11

[QUOTE=harlee;495265]Based on the clm test results, shouldn't Prime95 be using Pass1=512, Pass2=2048, clm=2 instead?[/QUOTE]

Yes. Can you post or email me the full gwnum.txt file? If you exit and restart prime95 does it still select the wrong clm?

harlee 2018-09-04 00:30

[QUOTE=Prime95;495276]Yes. Can you post or email me the full gwnum.txt file? If you exit and restart prime95 does it still select the wrong clm?[/QUOTE]

[Sep 3 20:22] Optimal P-1 factoring of M19970131 using up to 4096MB of memory.
[Sep 3 20:22] Assuming no factors below 2^65 and 3 primality tests saved if a factor is found.
[Sep 3 20:22] Optimal bounds are B1=350000, B2=9800000
[Sep 3 20:22] Chance of finding a factor is an estimated 6.63%
[Sep 3 20:22] Using FMA3 FFT length 1M, Pass1=256, Pass2=4K, clm=2

harlee 2018-09-04 23:04

Sorry, I can't edit my previous post. Yes, I stopped and restarted prime95, and the messages in the previous post appeared when the application started. Also, I run one worker window on one CPU core.

Prime95 2018-09-05 01:27

I think prime95 is confused because you are not using all the CPU cores -- that is, prime95 is looking for the best benchmark using 2 cores, 1 worker.

As a workaround, try putting NumCPUs=1 in local.txt and see if prime95 now uses the correct FFT implementation.
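The diagnosis can be illustrated with a toy model. The Python below is hypothetical and is not how prime95 stores or searches its benchmark data. It assumes results are keyed by the (cores, workers) configuration they were measured under, and that a built-in default is used when nothing matches. The default is assumed to be the Pass1=256, Pass2=4K, clm=2 implementation that was actually selected; the throughputs are from the Sep 2 run above.

```python
from typing import Dict, Tuple

# Hypothetical model, not prime95's code.
Impl = Tuple[int, int, int]  # (Pass1, Pass2, clm)
Config = Tuple[int, int]     # (cores, workers) a benchmark was run with


def choose(bench: Dict[Config, Dict[Impl, float]], cores: int, workers: int,
           default: Impl) -> Impl:
    """Pick the highest-throughput implementation benchmarked under exactly
    this (cores, workers) configuration, else fall back to the default."""
    results = bench.get((cores, workers))
    if not results:
        return default
    return max(results, key=results.get)  # highest iter/sec wins


# All of the benchmarks above were run as "1 core, 1 worker".
bench = {(1, 1): {(256, 4096, 2): 168.62, (512, 2048, 2): 171.95}}
default = (256, 4096, 2)  # assumed fallback

if __name__ == "__main__":
    # The i5-5250U has 2 cores, so the lookup asks for (2, 1) and misses.
    print(choose(bench, cores=2, workers=1, default=default))
    # With NumCPUs=1 the lookup asks for (1, 1) and finds the benchmarks.
    print(choose(bench, cores=1, workers=1, default=default))
```

In this model, the NumCPUs=1 workaround makes the requested configuration match the one the benchmarks were recorded under, so the faster Pass1=512 implementation is found.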

harlee 2018-09-05 19:31

Thanks! The workaround worked.

[Sep 5 15:29] Using FMA3 FFT length 1M, Pass1=512, Pass2=2K, clm=2


All times are UTC.