mersenneforum.org Mlucas v19 available

 2021-01-13, 15:17 #67 pvn   Nov 2020 22 Posts tdulcet, ah, this is very helpful. I spent a good bit of time yesterday doing something similar, building essentially a barebones version of this to build Docker images. For Intel, I just built multiple binaries (for SSE, AVX, AVX2, AVX-512) and use an entrypoint script to determine at runtime what hardware is available and run the right binary. I have had some trouble with the build on ARM, though, so for now I'm just using the precompiled binaries, but I use a similar routine in the entrypoint script to run the nosimd/c2simd binary as needed. The Docker image is at pvnovarese/mlucas_v19:latest (it's a multi-arch image, with both aarch64 and x86_64). Dockerfile etc. can be found here: https://github.com/pvnovarese/mlucas_v19 I will review your script as well, it looks like you've thought a lot more about this than I have :)
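For illustration, a minimal sketch of such an entrypoint dispatch; the binary names and paths here are hypothetical, not necessarily those in pvn's actual image:
Code:
#!/bin/bash
# Pick the most capable prebuilt binary based on the host CPU's feature flags.
# Order matters: the avx2 pattern would also match the "avx" substring.
flags=$(grep -m1 '^flags' /proc/cpuinfo)
case "$flags" in
    *avx512f*) exec /mlucas/Mlucas_avx512 "$@" ;;
    *avx2*)    exec /mlucas/Mlucas_avx2 "$@" ;;
    *avx*)     exec /mlucas/Mlucas_avx "$@" ;;
    *)         exec /mlucas/Mlucas_sse2 "$@" ;;
esac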
2021-01-14, 00:09   #68
ewmayer
∂2ω=0

Sep 2002
República de California

2·7·829 Posts

Quote:
 Originally Posted by tdulcet My install script for Linux currently follows the recommended instructions on the Mlucas README for each architecture to hopefully provide the best performance for most users, but I would be interested in adding this feature to automatically try different combinations of CPU cores/threads and then pick the one with the best performance, although I am not sure what the correct procedure is to do this for each architecture and CPU, or how the -DUSE_THREADS compile flag factors in. The script's goal is to automate the entire download, build, setup and run process for Mlucas, so I think this could be an important component of that. I have not received any feedback on the script so far, so I am also not even sure if there is any interest in this feature or what percentage of systems it would affect.
-DUSE_THREADS is needed to enable multithreaded build mode; without it you get a single-threaded build, which would only be useful if all you ever wanted to do is run one such 1-thread job per core. Even in that case, the core-affinity stuff (the -cpu argument) is not available for such builds, so you'd basically be stuck firing up a bunch of executable images, each from its own run directory (with a unique worktodo.ini file and its own copy of mlucas.cfg and the primenet.py script to manage work for that directory), and hoping the OS does a good job managing the core affinities.

(Basically, there's just no good reason to omit the above flag anymore).
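A hedged sketch of what a threaded manual build looks like, roughly along the lines of the README's pattern; the SIMD define varies by target (e.g. -DUSE_ARM_V8_SIMD on ARMv8), so -DUSE_AVX2 below is just one example:
Code:
# The key parts are -DUSE_THREADS at compile time and -lpthread -lrt at link time.
mkdir obj && cd obj
gcc -c -O3 -DUSE_THREADS -DUSE_AVX2 -mavx2 ../src/*.c > build.log 2>&1
grep -i error build.log
gcc -o Mlucas *.o -lm -lpthread -lrt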

Re. some kind of script to automate the self-testing using various suitable candidate -cpu arguments, that would indeed be useful. George uses the freeware hwloc library in his Prime95 code to suss out the topology of the machine running the code - I'd considered using it for my own as well in the past, but had seen a few too many threads that boiled down to "hwloc doesn't work properly on my machine" and needing some intervention re. that library by George for my taste. In any event, let me think on it more, and perhaps some playing-around with that library by those of you interested in this aspect would be a good starting point.
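For anyone who wants to experiment with that, hwloc ships command-line tools that expose what the library reports, so a script can play with its output before anything is wired into Mlucas itself:
Code:
# Quick looks at machine topology via hwloc's CLI tools (if installed):
lstopo --no-io --of console             # package/core/PU hierarchy as text
hwloc-calc --number-of core machine:0   # count of physical cores
# Without hwloc, lscpu's parseable mode gives the logical-CPU -> core map:
lscpu -p=CPU,CORE,SOCKET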

2021-01-14, 13:21   #69
tdulcet

"Teal Dulcet"
Jun 2018

3×7 Posts

Quote:
 Originally Posted by pvn Dockerfile etc can be found here: https://github.com/pvnovarese/mlucas_v19 I will review your script as well, it looks like you've thought a lot more about this than I have :)
 Nice! Thanks. With my script you should be able to compile Mlucas on demand; since it uses a parallel Makefile with one job for each CPU thread, it should only take a couple of minutes or less to compile on most systems. It uses the -march=native compile flag on x86 systems, so the resulting binaries should also be slightly faster, although they are generally not portable. What was the issue you had building on ARM?

There is a longstanding issue with 32-bit ARM, where compiling the mi64.c file with GCC hangs. If you remove the -O3 optimization, you instead get these errors:
Quote:
 ../src/mi64.c: In function ‘mi64_shl_short’:
 ../src/mi64.c:1038:2: error: unknown register name ‘rsi’ in ‘asm’
  __asm__ volatile (\
  ^~~~~~~
 ../src/mi64.c:1038:2: error: unknown register name ‘rcx’ in ‘asm’
 ../src/mi64.c:1038:2: error: unknown register name ‘rbx’ in ‘asm’
 ../src/mi64.c:1038:2: error: unknown register name ‘rax’ in ‘asm’
 ../src/mi64.c: In function ‘mi64_shrl_short’:
 ../src/mi64.c:1536:2: error: unknown register name ‘rsi’ in ‘asm’
  __asm__ volatile (\
  ^~~~~~~
 ../src/mi64.c:1536:2: error: unknown register name ‘rcx’ in ‘asm’
 ../src/mi64.c:1536:2: error: unknown register name ‘rbx’ in ‘asm’
 ../src/mi64.c:1536:2: error: unknown register name ‘rax’ in ‘asm’
Quote:
 Originally Posted by ewmayer (Basically, there's just no good reason to omit the above flag anymore).
OK, thanks for the info. That is what I thought. I just wanted to make sure that there was not some edge case where my script should omit the flag.

Quote:
 Originally Posted by ewmayer Re. some kind of script to automate the self-testing using various suitable candidate -cpu arguments, that would indeed be useful. George uses the freeware hwloc library in his Prime95 code to suss out the topology of the machine running the code - I'd considered using it for my own as well in the past, but had seen a few too many threads that boiled down to "hwloc doesn't work properly on my machine" and needing some intervention re. that library by George for my taste. In any event, let me think on it more, and perhaps some playing-around with that library by those of you interested in this aspect would be a good starting point.
OK, I was just thinking that there was some procedure my script could use, given the CPU vendor (Intel, AMD or ARM), the number of CPU cores and the number of CPU threads, to generate all the candidate combinations for the -cpu argument that could realistically give the best performance. It could then try the different candidate combinations (as described in the two examples of your previous post) and pick the one with the best performance.

Based on the "Advanced Users" and "Advanced Usage" sections of the Mlucas README, for an example 8 core/16 thread system, this is my best guess of the candidate combinations to try with the -cpu argument:

Intel
Code:
0     (1-threaded)
0,8     (2 threads per core, 1-threaded) (current default)
0:3,8:11     (2 threads per core, 4-threaded)

AMD

Code:
0     (1-threaded)
0:1     (2 threads per core, 1-threaded) (current default)
0:15     (2 threads per core, 8-threaded)

ARM
Code:
0     (1-threaded)
0:3     (4-threaded) (current default)
0:7     (8-threaded)
I am not sure if these are all the combinations worth testing or if we could rule any of them out.
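A minimal Bash sketch that prints those candidate lists, hardcoded to this 8c/16t example (a real script would derive the core/thread counts from the system):
Code:
#!/bin/bash
# Candidate -cpu arguments for an example 8-core/16-thread system.
# Intel numbers hyperthread siblings core+8; AMD numbers them 2*core, 2*core+1.
case "$1" in
    intel) printf '%s\n' 0 0,8 0:3,8:11 ;;
    amd)   printf '%s\n' 0 0:1 0:15 ;;
    arm)   printf '%s\n' 0 0:3 0:7 ;;
    *)     echo "usage: $0 intel|amd|arm" >&2; exit 1 ;;
esac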

Last fiddled with by tdulcet on 2021-01-14 at 13:22

2021-01-16, 21:06   #70
ewmayer
∂2ω=0

Sep 2002
República de California

2·7·829 Posts

@tdulcet: Extremely busy this past month working on a high-priority 'intermediate' v19.1 release (this will restore Clang/llvm buildability on Arm, problem was first IDed on the new Apple M1 CPU but is more general), alas no time to give the automation of best-total-throughput-finding the attention it deserves. But that's where folks like you come in. :)

First off - the mi64.c compile issue has been fixed in the as-yet-unreleased v19.1 code. As the mods in that file are small I will attach it here; I suggest you save a copy of the old one so you can diff and see the changes for yourself. Briefly, a big chunk of x86_64 inline-asm needed extra wrapping inside a '#ifdef YES_ASM' preprocessor directive. That flag is def'd (or not) in mi64.h like so:
Code:
#if(defined(CPU_IS_X86_64) && defined(COMPILER_TYPE_GCC) && (OS_BITS == 64))
  #define YES_ASM
#endif
Re. your core/thread-combos-to-try on an example 8c/16t system, those look correct. The remaining trick, though, is figuring out which of the most promising c/t combos gives the best total throughput on the user's system. For example - sticking to just 1-thread-per-physical-core for the moment - we expect 1t to run roughly 2x slower than 2t. Say the ratio is 1.8, and the user has an 8-core system. The real question is, how does the total throughput compare for 8x1t jobs versus 4x2t?

Similarly, we usually see a steep dropoff in parallel (||) scaling beyond 4 cores - but that need not imply that running two 4-thread jobs is better than one 8-thread one. If said dropoff is due to the workload saturating the memory bandwidth, we might well see a similar performance hit with two 4-thread jobs.
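To make that comparison concrete, total throughput can be computed from the per-job timings like so; the msec/iter values below are made-up numbers, purely for illustration:
Code:
# Total iters/sec for a set of concurrently-running jobs, given msec/iter each.
total_throughput() {
    awk 'BEGIN { for (i = 1; i < ARGC; i++) sum += 1000 / ARGV[i]; print sum }' "$@"
}
total_throughput 9 9 9 9 9 9 9 9   # 8 x 1-thread jobs at 9.0 msec/iter -> ~888.9
total_throughput 5 5 5 5           # 4 x 2-thread jobs at 5.0 msec/iter -> ~800.0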
Attached Files
 mi64.c.bz2 (75.6 KB, 24 views)

 2021-01-16, 22:23   #71
ewmayer
∂2ω=0

Sep 2002
República de California

2×7×829 Posts

Addendum: OK, I think the roadmap needs to look something like this - abbreviation-wise, 'c' refers to physical cores, 't' to threadcount:

1. Based on the user's HW topology, identify a set of 'most likely to succeed' core/thread combos, like tdulcet did in his above post. For x86 this needs to take into account the different core-numbering conventions used by Intel and AMD;

2. For each combo in [1], run the automated self-tests, and save the resulting mlucas.cfg file under a unique name, e.g. for 4c/8t call it mlucas.cfg.4c.8t;

3. The various cfg-files hold the best FFT-radix combo to use at each FFT length for the given c/t combo, i.e. in terms of maximizing total throughput on the user's system we can focus on just those. So let's take a hypothetical example: say on my 8c/16t AMD processor the round of self-tests in [1] has shown that, using just 1c, 1c2t is 10% faster than 1c1t. We now need to see how 1c2t scales to all physical cores, across the various FFT lengths in the self-test. E.g. at FFT length 4096K, say the best radix combo found for 1c2t is 64,32,32,32 (note the product of those = 2048K rather than 4096K because, to match general GIMPS convention, "FFT length" refers to #doubles, but Mlucas uses an underlying complex FFT, so the individual radices are complex and refer to pairs-of-doubles). So we next want to fire up 8 separate 1c2t jobs at 4096K, each using that radix combo and running on a distinct physical core; thus our 8 jobs would use -cpu flags 0:1, 2:3, 4:5, 6:7, 8:9, 10:11, 12:13 and 14:15, respectively (I used AMD for my example to avoid the confusion Intel's convention would cause here). I would further like to specify the foregoing radix combo via the -radset flag, but here we hit a small snag: at present, there is no way to specify an actual radix combo. Instead one must find the target FFT length in the big case() table in get_fft_radices.c and match the desired radix combo to a case index. For 4096K, we see 64,32,32,32 maps to 'case 7', so we'd use -radset 7 for each of our 8 launch-at-same-time jobs. I may need to do some code-fiddling to make that less awkward. Anyhow, since we're now using just 1 radix combo at each FFT length and we want a decent timing sample not dominated by start-up init and thread-management overhead, we might use -iters 1000 for each of our 8 jobs. Launched at more-or-less the same time, they will have a range of msec/iter timings t0-t7, which we convert into total throughput in iters/sec via 1000*(1/t0 + 1/t1 + 1/t2 + 1/t3 + 1/t4 + 1/t5 + 1/t6 + 1/t7). Repeat for each FFT length of interest, generating a set of total-throughput numbers.

4. Repeat [3] for each c/t combo in [1]. It may well prove the case that a single c/t combo does not give best total throughput across all FFT lengths, but for a first cut it seems best to somehow generate some kind of weighted average-across-all-FFT-lengths for each c/t combo and pick the best one. In [3] we generated total-throughput iters/sec numbers at each FFT length; maybe multiply each by its corresponding FFT length and sum over all FFT lengths.

Last fiddled with by ewmayer on 2021-01-16 at 22:24
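A Bash sketch of what step 3's launch-and-measure phase might look like, using the 8c/16t AMD example above; the per-job run directories are hypothetical, and the mlucas.cfg parsing assumes timing lines of the form "4096  msec/iter =    5.23 ...":
Code:
#!/bin/bash
# Sketch of step 3: one 1c2t timing job per physical core, launched together.
FFT=4096; ITERS=1000; RADSET=7              # values from the example above
cpus=(0:1 2:3 4:5 6:7 8:9 10:11 12:13 14:15)
for i in "${!cpus[@]}"; do
    mkdir -p "run$i"                        # hypothetical per-job run directories
    ( cd "run$i" && ../Mlucas -fftlen $FFT -iters $ITERS -radset $RADSET \
          -cpu "${cpus[$i]}" > test.log 2>&1 ) &
done
wait
# Total throughput = 1000*(1/t0 + ... + 1/t7); the cfg-line format is assumed,
# so adjust the field parsing to the actual file if it differs.
grep -h 'msec/iter' run*/mlucas.cfg | awk '{ for (f = 1; f <= NF; f++)
    if ($f == "msec/iter") sum += 1000 / $(f + 2) } END { print sum, "iters/sec" }'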
2021-01-17, 16:05   #72
tdulcet

"Teal Dulcet"
Jun 2018

3×7 Posts

Quote:
 Originally Posted by ewmayer @tdulcet: Extremely busy this past month working on a high-priority 'intermediate' v19.1 release (this will restore Clang/llvm buildability on Arm, problem was first IDed on the new Apple M1 CPU but is more general), alas no time to give the automation of best-total-throughput-finding the attention it deserves. But that's where folks like you come in. :)
No problem. I look forward to your new v19.1 release.

Quote:
 Originally Posted by ewmayer First off - the mi64.c compile issue has been fixed in the as-yet-unreleased 19.1 code, as the mods in that file are small I will attach it here
Thanks for the fix. This had been preventing me from running Mlucas on my Raspberry Pis for a couple of years, so it is great that it will now work.

Quote:
 Originally Posted by ewmayer Re. your core/thread-combos-to-try on an example 8c/16t system, those look correct. The remaining trick, though, is figuring out which of the most promising c/t combos gives the best total throughput on the user's system. For example - sticking to just 1-thread-per-physical-core for the moment - we expect 1t to run roughly 2x slower than 2t. Say the ratio is 1.8, and the user has an 8-core system. The real question is, how does the total throughput compare for 8x1t jobs versus 4x2t?
Yes, this will be difficult. I implemented a preliminary version that follows the instructions on the Mlucas README. Specifically, it will multiply the 4x2t msec/iter times by 1.5 before comparing them with the 8x1t times. Multiplying by 2 would of course produce different results in this case.
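For example, the adjustment amounts to nothing more than this (times_4x2t.txt is a hypothetical file of msec/iter values, one per line):
Code:
FACTOR=1.5   # README-suggested adjustment; 2.0 would assume perfect 2x scaling
awk -v f="$FACTOR" '{ printf "%.2f\n", $1 * f }' times_4x2t.txt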

Quote:
 Originally Posted by ewmayer Addendum: OK, I think the roadmap needs to look something like this
Wow, thanks for the detailed roadmap, it is very helpful!

1. OK, I wrote Bash code to automatically generate the combinations from my previous post above, for the user's CPU and number of CPU cores/threads. It will generate a nice table like one of these for example:
Code:
The CPU is Intel.
#  Workers/Runs  Threads         -cpu arguments
1  1             16, 2 per core  0:15
2  1             8, 1 per core   0:7
3  2             4, 1 per core   0:3 4:7
4  4             2, 1 per core   0:1 2:3 4:5 6:7
5  8             1, 1 per core   0 1 2 3 4 5 6 7
6  2             4, 2 per core   0:3,8:11 4:7,12:15
7  4             2, 2 per core   0:1,8:9 2:3,10:11 4:5,12:13 6:7,14:15
8  8             1, 2 per core   0,8 1,9 2,10 3,11 4,12 5,13 6,14 7,15

The CPU is AMD.
#  Workers/Runs  Threads         -cpu arguments
1  1             16, 2 per core  0:15
2  1             8, 1 per core   0:15:2
3  2             4, 1 per core   0:7:2 8:15:2
4  4             2, 1 per core   0:3:2 4:7:2 8:11:2 12:15:2
5  8             1, 1 per core   0 2 4 6 8 10 12 14
6  2             4, 2 per core   0:7 8:15
7  4             2, 2 per core   0:3 4:7 8:11 12:15
8  8             1, 2 per core   0:1 2:3 4:5 6:7 8:9 10:11 12:13 14:15

The CPU is ARM.
#  Workers/Runs  Threads  -cpu arguments
1  1             8        0:7
2  2             4        0:3 4:7
3  4             2        0:1 2:3 4:5 6:7
4  8             1        0 1 2 3 4 5 6 7
The combinations are the same as my previous post above, except I added a 2-threaded combination for ARM and the ordering is different.

2. Done.
3./4. Interesting, this is going to be a lot more complex to implement than I originally thought. The switch statement in get_fft_radices.c is too big to store in my script, and creating an awk command to extract the case number based on the FFT length and radix combo would obviously be extremely difficult, particularly because there are nested switch statements. I am going to have to think about how best to do this... I welcome suggestions from anyone who is reading this. In the meantime, I wrote code to directly compare the adjusted msec/iter times from the mlucas.cfg files from step #2. This of course does not account for any of the scaling issues that @ewmayer described. It will generate two tables (the fastest combination and the rest of the combinations tested) like these for my 6 core/12 thread Intel system for example:
Code:
Fastest
#  Workers/Runs  Threads        First -cpu argument  Adjusted msec/iter times
6  6             1, 2 per core  0,6                  8.47  9.69  10.72  12.26  12.71  14.53  14.76  16.54  16.1  18.89  20.94  23.94  26.39  28.85  29.16  32.98

Mean/Average faster     #  Workers/Runs  Threads         First -cpu argument  Adjusted msec/iter times
3.248 ± 0.101 (324.8%)  1  1             12, 2 per core  0:11                 28.92  31.74  33.78  38.64  42.66  44.52  46.56  51.06  52.26  61.8  70.14  79.38  88.62  92.94  97.26  106.2
3.627 ± 0.146 (362.7%)  2  1             6, 1 per core   0:5                  34.14  34.8  39.66  45.48  47.28  51.18  51.3  56.64  60.78  66.24  73.02  87.78  96.12  102.6  108.12  116.22
2.607 ± 0.068 (260.7%)  3  2             3, 1 per core   0:2                  22.98  25.53  27.3  30.12  33.63  37.83  36.72  42.15  42.69  48.9  54.66  61.68  71.19  76.44  77.49  87.36
1.736 ± 0.029 (173.6%)  4  6             1, 1 per core   0                    14.41  17.1  18.46  20.88  22.72  24.99  25.36  28.38  28.82  32.67  36.06  40.67  46.12  49.87  51.32  58.26
1.816 ± 0.047 (181.6%)  5  2             3, 2 per core   0:2,6:8              16.11  18.09  19.32  21.99  23.64  25.41  25.92  29.85  30.57  33.78  38.7  42.42  48.57  51.12  52.53  59.19
The two tables show, for example, that 1-threaded with 2 threads per core is ~1.7 times faster than 1-threaded with 1 thread per core.
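The speedup column in such a comparison can be computed by pairing two adjusted time lists line by line; times_a.txt and times_b.txt here are hypothetical files with one msec/iter value per line:
Code:
# Mean ratio of combo B's times to combo A's (A faster => ratio > 1).
paste times_a.txt times_b.txt |
    awk '{ r = $2 / $1; sum += r; n++ }
         END { printf "mean speedup: %.3f (%.1f%%)\n", sum / n, 100 * sum / n }'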

2021-01-19, 13:46   #74
tdulcet

"Teal Dulcet"
Jun 2018

21₁₀ Posts

Quote:
 Originally Posted by ewmayer @tdulcet - How about I add support in v19.1 for the -radset flag to take either an index into the big table, or an actual set of comma-separated FFT radices?
That would be very helpful to automate this!

Quote:
 Originally Posted by ewmayer Edit: Why make people wait - here is a modified version of Mlucas.c which supports the above-described -radset argument.
Wow, thanks for doing it so quickly! This will be very helpful. I committed and pushed the changes I described in my previous post to GitHub here, which basically implements steps #1 and #2 and part of #4. I will now get started on step #3 and the rest of #4 using your new version of Mlucas.c.

In my previous post, on an example 8c/16t system, I said it would multiply the 4x2t msec/iter times by 1.5 before comparing them to the 8x1t times, following the instructions on the Mlucas README. After doing more testing, I was getting unexpected results with that formula ((CPU cores / workers) - 0.5), so it will now multiply the times by (CPU cores / workers), which is 2 for this example. This should be irrelevant once I implement step 3.
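Spelled out for this 8c/16t example with 4 workers, the two adjustment factors are:
Code:
CORES=8; WORKERS=4
awk -v c="$CORES" -v w="$WORKERS" 'BEGIN {
    print "old factor:", c / w - 0.5   # 1.5, the README-style adjustment
    print "new factor:", c / w         # 2, what the script now uses
}'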

I thought I should note that some systems, like the Intel Xeon Phi, can have more than two CPU threads per CPU core. The Mlucas README does not mention this case, but my script should correctly handle it for Intel and AMD x86 systems. For example, on a 64 core/256 thread Intel Xeon Phi system it would try these combinations (only showing the first -cpu argument for brevity):
Code:
#   Workers/Runs  Threads          -cpu arguments
1   1             64, 1 per core   0:63
2   2             32, 1 per core   0:31
3   4             16, 1 per core   0:15
4   8             8, 1 per core    0:7
5   16            4, 1 per core    0:3
6   32            2, 1 per core    0:1
7   64            1, 1 per core    0
8   1             128, 2 per core  0:63,64:127
9   2             64, 2 per core   0:31,64:95
10  4             32, 2 per core   0:15,64:79
11  8             16, 2 per core   0:7,64:71
12  16            8, 2 per core    0:3,64:67
13  32            4, 2 per core    0:1,64:65
14  64            2, 2 per core    0,64
15  1             256, 4 per core  0:63,64:127,128:191,192:255
16  2             128, 4 per core  0:31,64:95,128:159,192:223
17  4             64, 4 per core   0:15,64:79,128:143,192:207
18  8             32, 4 per core   0:7,64:71,128:135,192:199
19  16            16, 4 per core   0:3,64:67,128:131,192:195
20  32            8, 4 per core    0:1,64:65,128:129,192:193
21  64            4, 4 per core    0,64,128,192
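For what it's worth, those first -cpu arguments can be generated for any threads-per-core count with a small Bash helper, assuming this system's Intel-style numbering where hyperthread sibling k of core c is logical CPU c + k*64:
Code:
#!/bin/bash
CORES=64
cpu_arg() {   # $1 = cores per worker, $2 = threads per core
    local parts=() k
    for ((k = 0; k < $2; k++)); do
        parts+=("$((k * CORES)):$((k * CORES + $1 - 1))")
    done
    (IFS=,; echo "${parts[*]}")   # join the per-sibling ranges with commas
}
cpu_arg 64 4   # -> 0:63,64:127,128:191,192:255 (row 15 above)
cpu_arg 8 2    # -> 0:7,64:71 (row 11 above)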

Last fiddled with by tdulcet on 2021-01-19 at 13:52

 2021-01-19, 20:19 #75
ewmayer
∂2ω=0

Sep 2002
República de California

11606₁₀ Posts

@tdulcet: Glad to be of service to someone else who wants to be of service, or something. :)

o Re. KNL, yes, I have a barebones one sitting next to me and running a big 64M-FFT primality test, 1 thread on each of physical cores 0:63. On KNL I've never found any advantage from running this kind of code with more than 1 thread per physical core.

o One of your timing samples above mentioned getting nearly 2x speedup from running 2 threads on 1 physical core, with the other cores unused. I suspect that may be the OS actually putting 1 thread on each of 2 physical cores. Remember, those pthread affinity settings are treated as *hints* to the OS; we hope that under heavy load the OS will respect them because there are no otherwise-idle physical cores it can bounce threads to.

o You mentioned the mi64.c missing-x86-preprocessor-flag-wrapper was keeping you from building on your Raspberry Pi - was that even with -O3? And did you as a result just use the precompiled Arm/Linux binaries on that machine?
 2021-01-20, 07:46 #76
joniano

Jan 2021

1₁₀ Posts

Possible Symptoms of a Bug on ARM64 Build - Running too fast

Hello Folks - I recently got Mlucas running on a Raspberry Pi 4 with 8GB of RAM, running Ubuntu, and I am doing PRP tests on large Mersenne numbers. I'm assuming either Mlucas is extremely fast and consistent, or I'm running into some sort of a bug. If you look at a few lines of the ".stat" file for one of my recent exponents, you'll see that every few seconds I blast through 10,000 iterations at exactly the same msec/iter speed, and it seems to take under a day to fully PRP-test a new number.

Code:
[2021-01-19 21:42:45] M110899639 Iter# = 110780000 [99.89% complete] clocks = 00:15:20.953 [ 92.0954 msec/iter] Res64: 461E323B49699D73. AvgMaxErr = 0.000000000. MaxErr = 0.000000000. Residue shift count = 13555775.
[2021-01-19 21:42:48] M110899639 Iter# = 110790000 [99.90% complete] clocks = 00:15:20.953 [ 92.0954 msec/iter] Res64: 461E323B49699D73. AvgMaxErr = 0.000000000. MaxErr = 0.000000000. Residue shift count = 13555775.
[2021-01-19 21:42:50] M110899639 Iter# = 110800000 [99.91% complete] clocks = 00:15:20.953 [ 92.0954 msec/iter] Res64: 461E323B49699D73. AvgMaxErr = 0.000000000. MaxErr = 0.000000000. Residue shift count = 13555775.
[2021-01-19 21:42:52] M110899639 Iter# = 110810000 [99.92% complete] clocks = 00:15:20.953 [ 92.0954 msec/iter] Res64: 461E323B49699D73. AvgMaxErr = 0.000000000. MaxErr = 0.000000000. Residue shift count = 13555775.
[2021-01-19 21:42:55] M110899639 Iter# = 110820000 [99.93% complete] clocks = 00:15:20.953 [ 92.0954 msec/iter] Res64: 461E323B49699D73. AvgMaxErr = 0.000000000. MaxErr = 0.000000000. Residue shift count = 13555775.
[2021-01-19 21:42:57] M110899639 Iter# = 110830000 [99.94% complete] clocks = 00:15:20.953 [ 92.0954 msec/iter] Res64: 461E323B49699D73. AvgMaxErr = 0.000000000. MaxErr = 0.000000000. Residue shift count = 13555775.
[2021-01-19 21:42:59] M110899639 Iter# = 110840000 [99.95% complete] clocks = 00:15:20.953 [ 92.0954 msec/iter] Res64: 461E323B49699D73. AvgMaxErr = 0.000000000. MaxErr = 0.000000000. Residue shift count = 13555775.
[2021-01-19 21:43:01] M110899639 Iter# = 110850000 [99.96% complete] clocks = 00:15:20.953 [ 92.0954 msec/iter] Res64: 461E323B49699D73. AvgMaxErr = 0.000000000. MaxErr = 0.000000000. Residue shift count = 13555775.
[2021-01-19 21:43:04] M110899639 Iter# = 110860000 [99.96% complete] clocks = 00:15:20.953 [ 92.0954 msec/iter] Res64: 461E323B49699D73. AvgMaxErr = 0.000000000. MaxErr = 0.000000000. Residue shift count = 13555775.
[2021-01-19 21:43:06] M110899639 Iter# = 110870000 [99.97% complete] clocks = 00:15:20.953 [ 92.0954 msec/iter] Res64: 461E323B49699D73. AvgMaxErr = 0.000000000. MaxErr = 0.000000000. Residue shift count = 13555775.
[2021-01-19 21:43:08] M110899639 Iter# = 110880000 [99.98% complete] clocks = 00:15:20.953 [ 92.0954 msec/iter] Res64: 461E323B49699D73. AvgMaxErr = 0.000000000. MaxErr = 0.000000000. Residue shift count = 13555775.
[2021-01-19 21:43:10] M110899639 Iter# = 110890000 [99.99% complete] clocks = 00:15:20.953 [ 92.0954 msec/iter] Res64: 461E323B49699D73. AvgMaxErr = 0.000000000. MaxErr = 0.000000000. Residue shift count = 13555775.
[2021-01-19 21:43:13] M110899639 Iter# = 110899639 [100.00% complete] clocks = 00:15:20.953 [ 95.5445 msec/iter] Res64: 461E323B49699D73. AvgMaxErr = 0.000000000. MaxErr = 0.000000000. Residue shift count = 13555775.
M110899639 is not prime. Res64: 243C3E785D7D8345. Program: E19.0.
Final residue shift count = 13555775
M110899639 mod 2^35 - 1 = 20387533375
M110899639 mod 2^36 - 1 = 12983321457
Does this look suspicious to anyone else? I also run Prime95 on a seemingly much more powerful Core i7-7700, and that takes about 14 days to PRP-test a single number, which is what is making me question this. I'm glad to provide more detail if it would help troubleshoot.
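One quick way to confirm something is wrong: interim Res64 values should essentially never repeat in a healthy run, so counting duplicates in the .stat file (the filename here is assumed from Mlucas's p&lt;exponent&gt;.stat convention) flags the problem immediately:
Code:
# Count how often each interim Res64 value appears; any count > 1 is suspect.
awk '/Res64:/ { for (f = 1; f <= NF; f++) if ($f == "Res64:") seen[$(f + 1)]++ }
    END { for (r in seen) if (seen[r] > 1) print r, "repeats", seen[r], "times" }' p110899639.stat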
2021-01-20, 09:28   #77
LaurV
Romulan Interpreter

Jun 2011
Thailand

5×17×109 Posts

Quote:
 Originally Posted by joniano Does this look suspicious to anyone else?
Yep. Very. The residues are the same, which is close to impossible: like one in k chances, where k is much larger than the number of particles in the universe.
Unfortunately I can't help; I'm not the Linux guy, nor the Mlucas guy.

