mersenneforum.org LLR version 3.8.24 released.

2020-10-18, 22:55   #23
R. Gerbicz

"Robert Gerbicz"
Oct 2005
Hungary

1,459 Posts

Quote:
 Originally Posted by paulunderwood Fermat's Little theorem states b^(N-1) == 1 mod N for prime N (and gcd(b,N)==1). For N=2^n-c this means for b=3 that 3^(2^n-c-1) == 1 (mod 2^n-c). This can be rewritten as 3^(2^n) == 3^(c+1) (mod 2^n-c). The left hand side is just squarings; The right hand side takes ~log(c) iterations. At the moment LLR does ~n multiplications of 3 times an n bit number which adds up to a 3-5% overhead. Can a similar argument hold for k*2^n-c? Yes! (3^k)^(2^n) == 3^(c+1) (mod k*2^n-c).
Basically, the algorithm consists "almost" entirely of squarings, with very few multiplications (only in the error-check computation).

For N=(k*b^n+c)/d, take a^d as the "base". By Fermat's little theorem, if N is prime then:

(a^d)^((k*b^n+c)/d)==a^d mod N but then

a^(k*b^n+c)==a^d mod (d*N) is also true. We do this because reduction mod (d*N), where d*N = k*b^n+c, is easier; note that at this point we're already weaker than a standard Fermat test. From this:

(a^k)^(b^n)==a^(d-c) mod (d*N)

Of course b=2 is fixed to enable the fast checks. The case d-c<0 is not a concern, because in general "a" is small, and you can even divide the left side by the right side (that is, multiply the left side by a^(c-d) and compare with 1) to avoid any modular inverse computation.

And you can also delay the k-th powering with: (a^(b^n))^k.
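
The relation above can be tried out on small numbers. The following is only a minimal sketch using Python's built-in integers, not LLR's actual code; the function name `fermat_relation_holds` and the `delay_k` flag are made up for illustration, and b=2 is assumed throughout.

```python
def fermat_relation_holds(k, n, c, d, a, delay_k=False):
    """Check (a^k)^(2^n) == a^(d-c) mod (d*N), where d*N = k*2^n + c.

    Hypothetical helper for illustration only (b = 2 is fixed).
    With delay_k=True the k-th powering is done last: (a^(2^n))^k.
    """
    M = k * 2**n + c                  # M = d*N; every reduction is done mod M
    if M % d != 0:
        raise ValueError("d must divide k*2^n + c")

    x = a % M if delay_k else pow(a, k, M)
    for _ in range(n):                # the bulk of the work: n plain squarings
        x = x * x % M
    if delay_k:
        x = pow(x, k, M)              # delayed k-th powering

    if d - c >= 0:
        return x == pow(a, d - c, M)
    # d - c < 0: move a^(c-d) to the left side, so no inverse is needed
    return x * pow(a, c - d, M) % M == 1


# N = (2^7 + 1)/3 = 43 is prime, so the relation holds for a = 5
print(fermat_relation_holds(1, 7, 1, 3, 5))
# N = (2^9 + 1)/3 = 171 is composite, and the relation fails for a = 5
print(fermat_relation_holds(1, 9, 1, 3, 5))
```

As with any Fermat-style test, a `True` result only marks N as a probable prime for that base.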

Last fiddled with by R. Gerbicz on 2020-10-18 at 22:59 Reason: typos

2020-10-28, 19:54   #24
Happy5214

"Alexander"
Nov 2008
The Alamo City

577 Posts

Quote:
 Originally Posted by henryzz Still only 32-bit for windows.
Still waiting on a 64-bit Windows binary. I want to run timing tests on a laptop I'm trying to bring online, but they'll be off with 32-bit or an old version (more with the latter I assume).

2020-10-29, 11:48   #25
henryzz
Just call me Henry

"David"
Sep 2007
Cambridge (GMT/BST)

13342₈ Posts

Quote:
 Originally Posted by Happy5214 Still waiting on a 64-bit Windows binary. I want to run timing tests on a laptop I'm trying to bring online, but they'll be off with 32-bit or an old version (more with the latter I assume).
Is there a possibility of running WSL?

2020-10-29, 19:39   #26
Happy5214

"Alexander"
Nov 2008
The Alamo City

241₁₆ Posts

Quote:
 Originally Posted by henryzz Is there a possibility of running WSL?
Not at the moment. It has Windows 8.1 on it, and I haven't gotten around to upgrading it to 10.

2020-11-02, 20:39   #27
Jean Penné

May 2004
FRANCE

5²·23 Posts

Switching to 3.8.24 build 3

Hi,

Jeff Gilchrist warned me about an issue in LLR 3.8.24 build 2:

The default options for the Fermat PRP testing are:
Gerbicz error checking (-oErrorChecking=1)
and
Random shift on "a" at the beginning (-oShifting=1)

Gerbicz is OK for these tests, but the random shift can work only for k*2^n+c numbers if k==1 AND c == +1 or -1.

The bug is that my code did require k == 1, but did not test for c before setting this random shift...

To avoid any misunderstanding, I renamed all the links in the index.html files, but you do have to download all the files again to be up to date!

Please excuse the inconvenience, and
Best Regards,
Jean

2020-11-02, 21:07   #28
paulunderwood

Sep 2002
Database er0rr

3×17×71 Posts

Thanks Jean.

The latest LLR with GEC seems much slower, compared to older versions. I understand there is computational overhead for GEC and welcome better error detection.

This is what I am seeing:

Code:
^C3322955+15, bit: 540000 / 3322956 [16.25%]. Time per bit: 0.272 ms.
Code:
^Ceration: 1050000 / 3322955 [31.10%], ms/iter: 0.334, ETA: 00:12:39

Last fiddled with by paulunderwood on 2020-11-02 at 21:09

2020-11-02, 21:32   #29
R. Gerbicz

"Robert Gerbicz"
Oct 2005
Hungary

1,459 Posts

Quote:
 Originally Posted by paulunderwood Thanks Jean. The latest LLR with GEC seems much slower, compared to older versions. I understand there is computational overhead for GEC and welcome better error detection. This is what I am seeing: Code: ^C3322955+15, bit: 540000 / 3322956 [16.25%]. Time per bit: 0.272 ms. Code: ^Ceration: 1050000 / 3322955 [31.10%], ms/iter: 0.334, ETA: 00:12:39
You shouldn't see such a slowdown from the checking. Look at what gpuowl uses: L=400 or L=1000, and with that the slowdown is "only" a 2/L fraction of the total running time. p95 does not use a fixed L but changes it dynamically: if you see an error you decrease L; if you don't see errors for a while you increase it, up to a maximum of L=1000 [as far as I can remember].

The theoretical limit for my check is a slowdown of only about a 2/sqrt(n) fraction for a given N=(k*2^n+c)/d, but that is not recommended, because in that case you would do only one error check in the whole run [so even a fixed L=400 or 1000 is much better, giving a pretty small slowdown but more checks]. For a smallish n, though, say n<1e6, you could use a smaller L, because L^2>n is suboptimal.
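
To make the 2/L figure concrete, here is a minimal sketch of the check for a pure squaring chain, written with plain Python integers. It is not the code used by LLR, gpuowl, or p95. The name `checked_squarings` and the `fault_at` test hook are hypothetical, there is no rollback, and any tail of fewer than L*L squarings is left unchecked. It keeps a running product d of the residues at every L-th squaring and, every L*L squarings, verifies that d_new == u0 * d_old^(2^L).

```python
def checked_squarings(u0, n, M, L, fault_at=None):
    """Return u0^(2^n) mod M, verifying the work every L*L squarings.

    Illustration only. fault_at is a hypothetical test hook: it corrupts
    the residue right after that many squarings to show the check firing.
    """
    u = u0 % M
    d = u                              # running product u_0 * u_L * u_2L * ...
    for i in range(1, n + 1):
        u = u * u % M                  # the real work: one squaring
        if i == fault_at:
            u = (u + 1) % M            # simulated hardware error
        if i % L == 0:
            d_prev = d
            d = d * u % M              # one extra multiplication per L squarings
            if i % (L * L) == 0:
                # Verification costs about L extra squarings per L*L iterations
                if d != u0 * pow(d_prev, 1 << L, M) % M:
                    raise RuntimeError(f"error detected at iteration {i}")
    return u


M = 2**61 - 1                          # arbitrary prime modulus for the demo
print(checked_squarings(3, 64, M, 4) == pow(3, 1 << 64, M))
```

Per L*L squarings this costs L multiplications for the running product plus about L squarings for the verification, so roughly 2L extra operations out of L*L, which is the 2/L fraction mentioned above. Calling `checked_squarings(3, 64, M, 4, fault_at=8)` raises `RuntimeError` at the first verification point, iteration 16.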

2020-11-05, 23:35   #30
diep

Sep 2006
The Netherlands

2⁶·11 Posts

Which LLR version is advised for old AMD Magny-Cours processors?

I have a 48-core box operational with 2.2 GHz Magny-Cours processors. Very happy about it.

LLR gives a CPU fault. It is the same LLR version that's on the Intel Xeons here, where it has run fine for years.

diep@thegathering:/home/69/test3$ ./sllr64 -v
LLR Program - Version 3.8.21, using Gwnum Library Version 28.14

Which LLR version is advised?

Many thanks in advance,
Vincent

2020-11-06, 06:39   #31
Happy5214

"Alexander"
Nov 2008
The Alamo City

577 Posts

Quote:
 Originally Posted by diep Which LLR version is advised for old AMD Magny-Cours processors? I have a 48-core box operational with 2.2 GHz Magny-Cours processors. Very happy about it. LLR gives a CPU fault. It is the same LLR version that's on the Intel Xeons here, where it has run fine for years. diep@thegathering:/home/69/test3$ ./sllr64 -v LLR Program - Version 3.8.21, using Gwnum Library Version 28.14 Which LLR version is advised? Many thanks in advance, Vincent
I have an Intel Core 2 Quad from 2009 (which is even older than Magny-Cours, according to Wikipedia), and the latest LLR versions work fine on it. Have you tried building it from source or using a more recent version? I don't know if the 3.8.21 build had any particular issues with older CPUs, since I don't think I ever used that particular version. I know for sure that 3.8.23 and 3.8.24 both work on my box.

Last fiddled with by Happy5214 on 2020-11-06 at 06:57 Reason: More detail

2020-11-23, 15:29   #32
diep

Sep 2006
The Netherlands

2⁶×11 Posts

Good Morning!

On a NUMA system I notice a huge performance issue with LLR. Let me first explain the simple problem.

Probably we all know what NUMA systems are. A core that needs memory attached to another socket (or, in the case of Threadripper, to another CPU module, as it is 8 modules of course) has to fetch it via a crossbar. Not only is that slower, it also kind of burns up the crossbar, which has limited bandwidth and does not last forever.

I have a 4-socket board with one broken socket (and 2 with junk in it), which after much work I managed to get working. So I have a 3-socket system, with very noticeable differences in latency between local and remote memory.

Now in a perfect world, if the kernel schedules LLR on the correct socket (that would be 12 cores in the case of this Magny-Cours), then everything goes fine. Regrettably the Linux kernel is not so clever there, and it has no knowledge of LLR, so it's logical that this problem happens. So I get huge timing differences.

Now with taskset you can set your program to execute on different cores. Regrettably that's a useless command because it doesn't migrate the memory.

So for example, if I start LLR with 4 threads and let the system place it, odds are very high that it starts on the wrong socket. Say that's socket 0. Now it allocates memory on socket 0 and starts executing. Then I want to run it on a different CPU, say socket number 2. As there are 36 cores in this system, that's

taskset -p 0xfff000000 4000

in case the process ID is 4000. In total 9 hexadecimal characters, as 9 x 4 = 36 cores. (In fact I would then probably give it taskset -p 0xfc0000000, as each CPU is in itself again a double CPU of 2x6 cores.)

So memory already allocated on socket number 0 then gets accessed from cores on CPU 2. This is duck slow.

The problem seems to be: the kernel doesn't migrate the memory already allocated by the process to socket number 2. It stays on socket number 0.

That causes huge timing differences, sometimes nearly a factor of 2 here. Some process that should take no more than 7500 - 9500 seconds really takes 17000 seconds here. And a few of them actually do. Yet the ones scheduled wrong right from the start are screwed forever.

So if LLR had a command-line option for which set of CPU cores (like 0-5 or 29-35) to bind to and allocate RAM from, that would improve LLR times significantly and heat up the HT links less.

2020-11-23, 16:27   #33
axn

Jun 2003

2·2,459 Posts

You can start a command using taskset, not just set the affinity of an existing process.
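
As a rough sketch of that suggestion, the helpers below build the affinity mask and the command line needed to start a program already pinned, so its memory is allocated after the affinity is in place. The helper names are made up for illustration, and cores 24-35 are assumed to be socket 2, as the mask in post #32 implies. The `numactl` variant is an assumption that goes beyond what was suggested in the thread: it binds both the CPUs and the memory allocations to one NUMA node. LLR's own arguments are left out on purpose.

```python
def core_mask(first, last):
    """Bit mask with bits first..last set, in the form taskset accepts."""
    return ((1 << (last - first + 1)) - 1) << first


def taskset_argv(first, last, command):
    """argv that starts `command` already pinned to cores first..last."""
    return ["taskset", hex(core_mask(first, last))] + list(command)


def numactl_argv(node, command):
    """argv that binds both CPUs and memory of `command` to one NUMA node."""
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}"] + list(command)


print(hex(core_mask(24, 35)))          # the 12-core mask from post #32
print(hex(core_mask(30, 35)))          # the 6-core mask from post #32
print(taskset_argv(24, 35, ["./sllr64"]))
print(numactl_argv(2, ["./sllr64"]))
```

Either list can be passed to `subprocess.run(...)` once the usual LLR arguments have been appended to the command.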
