Forum: GpuOwl 2021-11-19, 14:49 Replies: 5 Views: 520 Posted By preda I don't understand what happened. The proof residues are written only once; afterwards they are only read. The check of residues at startup is done CPU-side only (it's a very simple checksum over the...
Forum: Software 2021-11-04, 09:26 Replies: 10 Views: 1,054 Posted By preda P-1 only "eliminates" an exponent if a factor is found. I suggest using bounds of at least B1=1M, B2=20M (I use B1=3M and B2=60M myself).
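To make the B1 bound concrete, here is a minimal Python sketch of P-1 stage 1 on a Mersenne number. It assumes base 3 and the common trick of folding 2p into the exponent (every factor of M_p has the form 2kp+1). The helper names `primes_up_to` and `pm1_stage1` are hypothetical. Stage 2, which is where B2 comes in, is not shown, and real implementations use FFT arithmetic rather than Python's `pow`.

```python
from math import gcd


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n (assumes n >= 2)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]


def pm1_stage1(p, b1, base=3):
    """P-1 stage 1 on M_p = 2^p - 1.

    Returns gcd(x - 1, M_p): 1 if no factor was found, M_p if all were caught.
    """
    m = (1 << p) - 1
    # Any factor q of M_p satisfies q = 2kp + 1, so 2p always divides q - 1.
    x = pow(base, 2 * p, m)
    for q in primes_up_to(b1):
        # Raise to the largest power of q that does not exceed B1.
        qk = q
        while qk * q <= b1:
            qk *= q
        x = pow(x, qk, m)
    return gcd(x - 1, m)
```

For example, `pm1_stage1(11, 2)` returns 23: the exponent is 44, and 23 - 1 = 22 divides it, while 89 - 1 = 88 needs 2^3 and is not caught with B1 = 2. A factor is found only when q - 1 is smooth enough for the chosen bounds, which is why larger bounds find more factors.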
Forum: Software 2021-09-24, 08:31 Replies: 7 Views: 994 Posted By preda In my understanding, VIRT in "top" for a process indicates "virtual" memory that is not mapped to physical memory. That would happen, for example, after a malloc() but before writing anything to the...
 Forum: Lounge 2021-09-06, 19:12 Replies: 1,842 Views: 196,011 Posted By preda Ivan Patzaichin Ivan Patzaichin was a legendary Romanian canoeist: https://en.wikipedia.org/wiki/Ivan_Patzaichin
Forum: mersenne.ca 2021-06-14, 19:20 Replies: 16 Views: 1,792 Posted By preda Yes, AFAIK the new formula is correct: fancySum(a, b) = a + b * (1 - a) == a + b - a * b
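Reading a as the probability that a first event happens, and b as the probability that a second event happens when the first did not, a + b * (1 - a) is the probability that at least one of them happens. A minimal sketch with hypothetical Python names, showing that the two forms of the formula agree:

```python
def fancy_sum(a, b):
    """P(first or second) = a + b * (1 - a).

    a: probability that the first event happens.
    b: probability that the second event happens, given that the first did not.
    """
    return a + b * (1 - a)


def fancy_sum_expanded(a, b):
    # Algebraically identical: a + b - a * b == 1 - (1 - a) * (1 - b).
    return a + b - a * b
```

Note that the result never exceeds 1, unlike the naive a + b.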
Forum: mersenne.ca 2021-06-14, 06:03 Replies: 16 Views: 1,792 Posted By preda Let's consider a story: on a dangerous trip, somebody must first cross a lake, and afterwards the forest. In the lake there's an alligator that would eat him with 90% probability. In the unlikely event...
Forum: Software 2021-06-02, 13:58 Replies: 54 Views: 11,471 Posted By preda There's one more interesting factoid about the difference between "classic" FFT and NTT: when working with complex numbers in the classic FFT, the inverse transform is equal to the conjugate of...
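A standard identity of this kind (not necessarily exactly what the truncated post goes on to say) is that the inverse DFT can be computed with the forward transform plus conjugation: IDFT(X) = conj(DFT(conj(X))) / N. A naive O(N^2) Python check, with hypothetical helper names:

```python
import cmath


def dft(x):
    """Forward DFT: X_j = sum_k x_k * exp(-2*pi*i*j*k/n)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]


def idft(X):
    """Inverse DFT computed directly, with the conjugate root and a 1/n scale."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * j * k / n) for k in range(n)) / n
            for j in range(n)]


def idft_via_conj(X):
    """Inverse DFT using only the forward transform and complex conjugation."""
    n = len(X)
    return [v.conjugate() / n for v in dft([complex(c).conjugate() for c in X])]
```

In an NTT over Z/pZ there is no complex conjugation to exploit, so the inverse transform is built from the inverse root of unity ω^(-1) and a final scaling by N^(-1) mod p.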
Forum: Software 2021-06-02, 13:33 Replies: 54 Views: 11,471 Posted By preda In the same setup (FFT 4M, Radeon VII, exponent around 107M) I gained about 33% performance by tweaking (with inline assembly) the low-level modular primitives (add, sub, mul). So now the performance...
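For readers unfamiliar with the NTT side, these are the three modular primitives in plain form. This is a minimal Python sketch that uses M61 = 2^61 - 1 purely as an illustrative modulus; the post's actual prime and its inline-assembly versions are not shown here, and the function names are hypothetical. With M61, reduction after a multiply is cheap because 2^61 ≡ 1 (mod M61).

```python
P = (1 << 61) - 1  # M61, chosen only for illustration


def add_mod(a, b):
    """(a + b) mod P for 0 <= a, b < P: one conditional subtraction."""
    s = a + b
    return s - P if s >= P else s


def sub_mod(a, b):
    """(a - b) mod P for 0 <= a, b < P: one conditional addition."""
    d = a - b
    return d + P if d < 0 else d


def mul_mod(a, b):
    """(a * b) mod P for 0 <= a, b < P, using 2^61 == 1 (mod P)."""
    t = a * b                    # up to 122 bits
    t = (t & P) + (t >> 61)      # fold the high part onto the low part; t < 2P
    return t - P if t >= P else t
```

On a GPU the same operations have to be spelled out in 32-bit pieces with explicit carries, which is where hand-tuning the primitives can pay off.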
Forum: Software 2021-05-27, 07:03 Replies: 54 Views: 11,471 Posted By preda I was testing on a Radeon VII.
Forum: Software 2021-05-26, 20:33 Replies: 54 Views: 11,471 Posted By preda I did some preliminary performance measurements, and the initial results are a bit disappointing: the NTT is about 3x slower than the equivalent FP64. It seems the code is compute-bound now (vs....
Forum: Software 2021-05-25, 06:50 Replies: 54 Views: 11,471 Posted By preda Yes, this is one option I considered. It's tricky to do the "shift every (couple of) FFT levels" well, because if it's done the conservative way (shift every level) it's wasteful on the precision...
Forum: Software 2021-05-25, 06:41 Replies: 54 Views: 11,471 Posted By preda There is a difference of 3 bits between M61 and the above 64-bit prime, which explains 3/2 = 1.5 bits of the difference. Another important element that enables 25 bits is that now I'm using...
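One way to see the factor 3/2, as a rough bound under simplifying assumptions (a length-N cyclic convolution, input words of at most w bits, ignoring the constant-factor gains from balanced digits): each output coefficient of the squaring is a sum of N products of two w-bit words, and an NTT modulo p only recovers it exactly if it stays below p. At fixed N, every extra bit of p therefore buys only half a bit per input word:

```latex
N \cdot 2^{2w} < p
\;\Longrightarrow\;
w < \frac{\log_2 p - \log_2 N}{2},
\qquad
\Delta w = \frac{\Delta \log_2 p}{2} = \frac{3}{2} = 1.5 \text{ bits.}
```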
Forum: Software 2021-05-25, 00:49 Replies: 54 Views: 11,471 Posted By preda Lately I've been experimenting with some non-FP64 FFT transforms. Briefly, these are some directions I've looked into for representing the values that the FFT operates on: 1. a set of 4 SP...
Forum: Math 2021-05-17, 12:17 Replies: 2 Views: 903 Posted By preda On the GPU, we are limited by the small number of "VGPRs" (registers) per workgroup that are available. Because we're operating at the upper limit of VGPRs, there's not much room to operate on two...
Forum: GpuOwl 2021-05-13, 17:54 Replies: 4 Views: 723 Posted By preda It's fine to run without a config.txt if you don't need it. It's just a way to put in a file the flags that you would otherwise pass on the command line. The format is exactly what you'd put on...
Forum: Hardware 2021-04-29, 10:05 Replies: 77 Views: 12,010 Posted By preda TF (trial factoring) does not use FFTs. For primality testing of Mersenne numbers, there are LL and PRP; the two are very similar from an implementation perspective. They both require squaring very...
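To illustrate the similarity, here is a minimal Python sketch of both tests, with Python's big integers standing in for the FFT-based squaring that a real implementation uses. The PRP variant shown is a Fermat test to base 3 written as p repeated squarings that checks 3^(2^p) ≡ 9 (mod M_p). This is equivalent to 3^(M_p - 1) ≡ 1 because 2^p = M_p + 1. The function names are hypothetical.

```python
def lucas_lehmer(p):
    """Lucas-Lehmer: for an odd prime p, M_p is prime iff s_(p-2) == 0."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m   # one squaring per iteration, minus 2
    return s == 0


def prp3(p):
    """Fermat PRP to base 3 via p squarings: probable prime iff 3^(2^p) == 9 mod M_p."""
    m = (1 << p) - 1
    x = 3
    for _ in range(p):
        x = (x * x) % m       # one plain squaring per iteration
    return x == 9 % m         # reduce 9 too, since M_3 = 7 < 9
```

The loops differ only in the "- 2" and in the final check, so the expensive part, the modular squaring of a huge number, is shared.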
Forum: Software 2021-04-27, 11:20 Replies: 54 Views: 11,471 Posted By preda The cost of small multiplication (sorry for the below being so trivial) At the core of an FFT there are "small multiplications", word-size or some small multiple of word-size, and I've been thinking a bit about their cost. ...
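As a concrete baseline for counting "small multiplications", here is a hypothetical Python sketch of schoolbook multiplication on little-endian arrays of 32-bit words. Multiplying an n-word number by an n-word number costs n^2 word-by-word (32x32 -> 64-bit) products, so a 64x64 -> 128-bit product built from 32-bit words costs 4. Faster schemes such as Karatsuba trade some of these multiplies for additions and are not shown.

```python
def to_words(x, n, w=32):
    """Split x into n little-endian w-bit words."""
    mask = (1 << w) - 1
    return [(x >> (w * i)) & mask for i in range(n)]


def from_words(words, w=32):
    """Reassemble an integer from little-endian w-bit words."""
    return sum(v << (w * i) for i, v in enumerate(words))


def schoolbook_mul(a, b, w=32):
    """Multiply word arrays a and b; return (product words, number of word products)."""
    mask = (1 << w) - 1
    out = [0] * (len(a) + len(b))
    muls = 0
    for i, ai in enumerate(a):
        carry = 0
        for j, bj in enumerate(b):
            # One w x w -> 2w product; the sum still fits in 2w bits.
            t = ai * bj + out[i + j] + carry
            muls += 1
            out[i + j] = t & mask
            carry = t >> w
        out[i + len(b)] = carry
    return out, muls
```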
Forum: Software 2021-04-27, 10:20 Replies: 54 Views: 11,471 Posted By preda Some interesting threads on the topic: https://www.mersenneforum.org/showthread.php?t=19486 https://www.mersenneforum.org/showthread.php?t=22622
Forum: Software 2021-04-23, 06:51 Replies: 52 Views: 9,693 Posted By preda Thank you for the explanation of Shoup's mul-mod. In GCN (AMD GPU), there is a 32-bit mul_hi instruction, but there is no 64-bit mul_hi. "Emulating" the 64-bit mul_hi is slow, almost as slow as the...
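A minimal Python sketch of the two pieces under discussion. The first is a 64-bit mul_hi emulated from 32-bit halves, using four 32x32 -> 64-bit partial products plus carry handling. The second is Shoup's modular multiplication by a fixed operand w, using a precomputed w' = floor(w * 2^64 / p). Python integers model the hardware words; on GCN, each 32x32 -> 64 partial product would itself take a mul_lo/mul_hi pair. The function names and the choice of M61 as the test modulus are illustrative assumptions, and the method as written needs p < 2^63.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def mul_hi64(a, b):
    """High 64 bits of the 128-bit product a * b, using only 32x32 -> 64 products."""
    a_lo, a_hi = a & MASK32, a >> 32
    b_lo, b_hi = b & MASK32, b >> 32
    ll = a_lo * b_lo
    lh = a_lo * b_hi
    hl = a_hi * b_lo
    hh = a_hi * b_hi
    # Bits 32..63 of the full product; anything above bit 31 here carries into bit 64.
    middle = (ll >> 32) + (lh & MASK32) + (hl & MASK32)
    return hh + (lh >> 32) + (hl >> 32) + (middle >> 32)


def shoup_precompute(w, p):
    """w' = floor(w * 2^64 / p), computed once per fixed multiplier w (0 <= w < p)."""
    return (w << 64) // p


def shoup_mulmod(x, w, w_shoup, p):
    """x * w mod p for 0 <= x < 2^64 and p < 2^63, with no 128-bit division."""
    q = mul_hi64(x, w_shoup)          # quotient estimate, at most 1 below floor(x*w/p)
    r = (x * w - q * p) & MASK64      # true value is in [0, 2p), so 64 bits suffice
    return r - p if r >= p else r     # a single correction step
```

The per-multiply cost is therefore dominated by the emulated mul_hi64 plus two low-half products, which is why a missing native 64-bit mul_hi hurts.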
Forum: Software 2021-04-18, 18:12 Replies: 52 Views: 9,693 Posted By preda So, does this mean that p=M31 has all the required roots-of-two for the IBDWT for a Z/pZ NTT? So, is NTT(M31) a viable alternative to FGT? Or, is the problem with M31 that it doesn't have the...
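Here is one way to check both parts of the question numerically. This is a hedged sketch, not an answer to the whole viability question. In Z/pZ with p = M31, 2 has multiplicative order 31 (because 2^31 ≡ 1), so for any power-of-two length N an N-th root of two exists, namely 2^(N^-1 mod 31). Power-of-two roots of unity, however, are limited by the power of two dividing p - 1, which is only 2 for M31, while p^2 - 1 is divisible by 2^32. That is the usual motivation for moving to GF(p^2), as in the FGT. The function names are hypothetical, and `pow(n, -1, m)` needs Python 3.8+.

```python
P31 = (1 << 31) - 1  # M31


def two_adic_valuation(n):
    """Largest k such that 2^k divides n (n > 0)."""
    k = 0
    while n % 2 == 0:
        n //= 2
        k += 1
    return k


def root_of_two(n):
    """An n-th root of 2 modulo M31; valid whenever gcd(n, 31) == 1 (e.g. n a power of two)."""
    # 2 has order 31 mod M31, so (2^k)^n == 2 whenever k * n == 1 (mod 31).
    k = pow(n, -1, 31)
    return pow(2, k, P31)
```

The largest power-of-two order of a root of unity in Z/pZ is 2^v, where v = `two_adic_valuation(p - 1)`. For M31 that order is just 2, which rules out a plain power-of-two-length NTT directly over Z/M31.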
Forum: GPU Computing 2021-04-15, 17:48 Replies: 2 Views: 1,020 Posted By preda From what I understand, OpenCL 3.0 is closer to OpenCL 1.x than to OpenCL 2.0. I.e., 3.0 is not "more" than 2.0; instead, it reduces the mandatory feature set to the level of 1.x and offers...
Forum: GpuOwl 2021-04-03, 17:12 Replies: 16 Views: 2,387 Posted By preda Nice, I like it! The owl has a foxy look :)
Forum: Information & Answers 2021-03-31, 11:55 Replies: 7 Views: 976 Posted By preda I guess it's because of the "chmod 777 expand.py". Do you need that? (expand.py already has permissions 775.)
Forum: Hardware 2021-03-29, 07:43 Replies: 16 Views: 1,766 Posted By preda The need for the general MUL vs. MUL-3 only appears when changing the "L" step dynamically during a test. This is something GpuOwl does not support (and thus it gets away with using MUL-3), but prime95...
