mersenneforum.org  

Showing results 1 to 25 of 1000
Search: Posts Made By: preda
Forum: GpuOwl 2021-11-19, 14:49
Replies: 5
Views: 419
Posted By preda
I don't understand what happened. The proof...

I don't understand what happened. The proof residues are written only once; afterwards they are only read. The check of residues at startup is done CPU-side only (it's a very simple checksum over the...
Forum: Software 2021-11-04, 09:26
Replies: 10
Views: 823
Posted By preda
P-1 only "eliminates" an exponent if a factor is...

P-1 only "eliminates" an exponent if a factor is found.

I suggest using bounds of at least B1=1M,B2=20M. (I use B1=3M and B2=60M myself).
Forum: Software 2021-09-24, 08:31
Replies: 7
Views: 883
Posted By preda
In my understanding, VIRT in "top" for a process...

In my understanding, VIRT in "top" for a process counts its whole virtual address space, including "virtual" memory that is not mapped to physical memory. That would happen for example after a malloc() but before writing anything to the...
Forum: Lounge 2021-09-06, 19:12
Replies: 1,812
RIP
Views: 189,908
Posted By preda
Ivan Patzaichin

Ivan Patzaichin was a legendary Romanian canoeist: https://en.wikipedia.org/wiki/Ivan_Patzaichin
Forum: Math 2021-07-06, 17:27
Replies: 24
Views: 3,044
Posted By preda
Maybe related, please see the thread about what I...

Maybe related, please see the thread about what I coined "PRP-1": https://www.mersenneforum.org/showthread.php?t=23628
Forum: mersenne.ca 2021-06-14, 19:20
Replies: 16
Views: 1,637
Posted By preda
Yes AFAIK the new formula is correct. ...

Yes AFAIK the new formula is correct.

fancySum(a, b) = a + b * (1 - a) == a + b - a * b
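
A minimal Python sketch of the identity above, assuming a and b are read as probabilities (a is the chance of the first event, b the chance of the second event given that the first did not happen). The name fancy_sum and the helper below are illustrative only, not taken from any existing code.

```python
def fancy_sum(a: float, b: float) -> float:
    """Chance that at least one of two events happens:
    the first with probability a, otherwise the second with probability b."""
    return a + b * (1.0 - a)


def fancy_sum_expanded(a: float, b: float) -> float:
    """Same quantity, written in the expanded form a + b - a*b."""
    return a + b - a * b


def complement_form(a: float, b: float) -> float:
    """Same quantity again, as 1 minus the chance that neither event happens."""
    return 1.0 - (1.0 - a) * (1.0 - b)


if __name__ == "__main__":
    a, b = 0.9, 0.5
    print(fancy_sum(a, b), fancy_sum_expanded(a, b), complement_form(a, b))
```

All three forms agree up to floating-point rounding, and the result never exceeds 1 when a and b are both in [0, 1].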
Forum: mersenne.ca 2021-06-14, 06:03
Replies: 16
Views: 1,637
Posted By preda
Let's consider a story: on a dangerous trip,...

Let's consider a story: on a dangerous trip, somebody must first cross a lake, and afterwards the forest. In the lake there's an alligator that would eat him with a 90% chance. In the unlikely event...
Forum: Software 2021-06-02, 13:58
Replies: 54
Views: 11,148
Posted By preda
There's one more interesting factoid about the...

There's one more interesting factoid about the difference between "classic" FFT and NTT:

when working with complex numbers in the classic FFT, the inverse transform is equal to the conjugate of...
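
The post is cut off in this listing, so the following is not a reconstruction of what it goes on to say. It is only a sketch of one standard identity for the complex DFT: the inverse transform can be computed as the conjugate of the forward transform of the conjugated input, divided by the length. The function names are illustrative, and a naive O(n^2) DFT is used instead of a real FFT to keep the example short.

```python
import cmath


def dft(x):
    """Naive forward DFT: X[k] = sum_j x[j] * exp(-2*pi*i*j*k/n)."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def idft_direct(X):
    """Naive inverse DFT, using the opposite sign in the exponent and a 1/n scale."""
    n = len(X)
    return [
        sum(X[k] * cmath.exp(2j * cmath.pi * j * k / n) for k in range(n)) / n
        for j in range(n)
    ]


def idft_via_conjugate(X):
    """Inverse DFT built only from the forward DFT: conj(dft(conj(X))) / n."""
    n = len(X)
    forward = dft([v.conjugate() for v in X])
    return [v.conjugate() / n for v in forward]


if __name__ == "__main__":
    x = [1 + 2j, -3 + 0.5j, 0.25 - 1j, 4 + 0j]
    X = dft(x)
    print(idft_direct(X))
    print(idft_via_conjugate(X))
```

Both inverse routines should return the original input up to rounding error.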
Forum: Software 2021-06-02, 13:33
Replies: 54
Views: 11,148
Posted By preda
In the same setup (FFT 4M, Radeon VII, exponent...

In the same setup (FFT 4M, Radeon VII, exponent around 107M) I gained about 33% performance by tweaking (with inline assembly) the low-level modular primitives (add, sub, mul). So now the performance...
Forum: Software 2021-05-27, 07:03
Replies: 54
Views: 11,148
Posted By preda
I was testing on a Radeon VII.

I was testing on a Radeon VII.
Forum: Software 2021-05-26, 20:33
Replies: 54
Views: 11,148
Posted By preda
I did some preliminary performance measurements,...

I did some preliminary performance measurements, and the initial results are a bit disappointing -- the NTT being about 3x slower than the equivalent FP64. It seems the code is compute-bound now (vs....
Forum: Software 2021-05-25, 06:50
Replies: 54
Views: 11,148
Posted By preda
Yes this is one option I considered. It's tricky...

Yes this is one option I considered. It's tricky to do the "shift every (couple of) FFT levels" well, because if it's done the conservative way (shift every level) it's wasteful on the precision...
Forum: Software 2021-05-25, 06:41
Replies: 54
Views: 11,148
Posted By preda
There is a difference of 3 bits between M61 and...

There is a difference of 3 bits between M61 and the above 64-bit prime, which explains 3/2=1.5 bits of the difference.

Another important element that enables 25bits is that now I'm using...
Forum: Software 2021-05-25, 00:49
Replies: 54
Views: 11,148
Posted By preda
Lately I've been experimenting with some non-FP64...

Lately I've been experimenting with some non-FP64 FFT transforms.

These are briefly some directions I've looked into, for representing the values that the FFT operates on:

1. a set of 4 SP...
Forum: Math 2021-05-17, 12:17
Replies: 2
Views: 840
Posted By preda
On the GPU, we are limited by the small number of...

On the GPU, we are limited by the small number of "VGPRs" (registers) per workgroup that are available. Because we're operating at the upper limit of VGPRs, there's not much room to operate on two...
Forum: GpuOwl 2021-05-13, 17:54
Replies: 4
Views: 652
Posted By preda
It's fine to run without a config.txt if you...

It's fine to run without a config.txt if you don't need it. It's just a facility to put the flags that you would otherwise pass on the command line, in a file. The format is exactly what you'd put on...
Forum: Hardware 2021-04-29, 10:05
Replies: 77
Views: 11,608
Posted By preda
TF (trial factoring) does not use FFTs. For...

TF (trial factoring) does not use FFTs.

For primality testing Mersenne numbers, there is LL and PRP; the two are very similar from an implementation perspective. They both require squaring very...
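
To illustrate how similar the two tests are, here is a toy Python sketch for small exponents. It assumes the textbook Lucas-Lehmer recurrence and a base-3 Fermat PRP check computed as p squarings; the function names are illustrative, and real implementations do the squarings with FFT-based multiplication rather than Python big integers.

```python
def lucas_lehmer(p: int) -> bool:
    """LL test for an odd prime exponent p: M_p = 2^p - 1 is prime
    exactly when the sequence s -> s*s - 2 (mod M_p), started at 4,
    reaches 0 after p - 2 steps."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def prp_base3(p: int) -> bool:
    """Base-3 Fermat PRP test for an odd prime exponent p.
    Squaring 3 exactly p times gives 3^(2^p) = 3^(M_p + 1) (mod M_p),
    which equals 9 exactly when 3^(M_p - 1) == 1 (mod M_p)."""
    m = (1 << p) - 1
    x = 3
    for _ in range(p):
        x = (x * x) % m
    return x == 9 % m


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13, 17, 19):
        print(p, lucas_lehmer(p), prp_base3(p))
```

In both functions the loop body is dominated by one modular squaring of a number about p bits long, which is why the two tests share almost all of their implementation.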
Forum: Software 2021-04-27, 11:20
Replies: 54
Views: 11,148
Posted By preda
The cost of small multiplication

(sorry for the below being so trivial)

At the core of an FFT there are "small multiplications", word-size or some small multiple of word-size, and I've been thinking a bit about their cost.
...
Forum: Software 2021-04-27, 10:20
Replies: 54
Views: 11,148
Posted By preda
Some interesting threads on the topic: ...

Some interesting threads on the topic:

https://www.mersenneforum.org/showthread.php?t=19486
https://www.mersenneforum.org/showthread.php?t=22622
Forum: Software 2021-04-23, 06:51
Replies: 52
Views: 9,347
Posted By preda
Thank you for the explanation of Shoup's mul-mod....

Thank you for the explanation of Shoup's mul-mod.
In GCN (AMD GPU), there is a 32-bit mul_hi instruction, but there is no 64-bit mul_hi. "Emulating" the 64-bit mul_hi is slow, almost as slow as the...
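
As a rough illustration of why the emulation is costly, here is a Python sketch of a 64-bit mul_hi built from 32-bit halves. This is the generic schoolbook decomposition, not GpuOwl's actual code; the names are illustrative, and Python integers are unbounded, so the masks only mimic what fixed-width registers would hold.

```python
MASK32 = 0xFFFFFFFF


def mul_hi64(a: int, b: int) -> int:
    """Upper 64 bits of the 128-bit product of two unsigned 64-bit values,
    computed from four 32x32 -> 64-bit partial products."""
    a_lo, a_hi = a & MASK32, a >> 32
    b_lo, b_hi = b & MASK32, b >> 32

    ll = a_lo * b_lo  # contributes at bit 0
    lh = a_lo * b_hi  # contributes at bit 32
    hl = a_hi * b_lo  # contributes at bit 32
    hh = a_hi * b_hi  # contributes at bit 64

    # Sum the column at bits 32..63 to find the carry into the high word.
    mid = (ll >> 32) + (lh & MASK32) + (hl & MASK32)

    return hh + (lh >> 32) + (hl >> 32) + (mid >> 32)


if __name__ == "__main__":
    a = 0xFEDCBA9876543210
    b = 0x0123456789ABCDEF
    print(hex(mul_hi64(a, b)), hex((a * b) >> 64))
```

The sketch needs four multiplications plus several additions and shifts to produce one high word, compared with a single instruction when a native mul_hi is available.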
Forum: Software 2021-04-18, 18:12
Replies: 52
Views: 9,347
Posted By preda
So, does this mean that p=M31 has all the...

So, does this mean that p=M31 has all the required roots-of-two for the IBDWT for Z/pZ NTT? If so, is NTT(M31) a viable alternative to FGT?

Or, the problem with M31 is that it doesn't have the...
Forum: GPU Computing 2021-04-15, 17:48
Replies: 2
Views: 925
Posted By preda
From what I understand, OpenCL 3.0 is closer to...

From what I understand, OpenCL 3.0 is closer to OpenCL 1.x than to OpenCL 2.0. I.e. 3.0 is not "more" than 2.0, but instead it reduces the mandatory feature-set to the level of 1.x and offers...
Forum: GpuOwl 2021-04-03, 17:12
Replies: 16
Views: 2,255
Posted By preda
Nice, I like it! The owl has a foxy look :)

Nice, I like it! The owl has a foxy look :)
Forum: Information & Answers 2021-03-31, 11:55
Replies: 7
Views: 918
Posted By preda
I guess it's because of the "chmod 777...

I guess it's because of the "chmod 777 expand.py". Do you need that? (expand.py already has rights 775)
Forum: Hardware 2021-03-29, 07:43
Replies: 16
Views: 1,662
Posted By preda
The need for the general-MUL vs. MUL-3 only...

The need for the general-MUL vs. MUL-3 only appears when changing the "L" step dynamically during a test. This is something GpuOwl does not support (and thus gets away with using MUL-3), but prime95...

 
All times are UTC. The time now is 09:24.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.