mersenneforum.org

Riesel Prime Search > 69 * 2^n - 1 (https://www.mersenneforum.org/showthread.php?t=18600)

diep 2013-09-25 17:54

Holy smoke!

Even if I had the time to get something going on the Teslas here, I wouldn't reach those ranges any time soon :)

Toying with the alphabet now, especially (un)abcd :)

diep 2013-09-25 23:43

Moved it from sieving to testing.

I'm using sllr64 right now on CPU hardware (Xeon L5420);
in my tests it was the fastest option on that hardware.

I remember Jean Penne was busy with some GPGPU software. How has that progressed lately? Does Riesel Prime Search already have a public version of it?

I've got some Teslas here. They're idle right now :)

VBCurtis 2013-09-26 04:04

CUDA-LLR is available, and in my experience stable. It only uses power-of-2 FFT sizes, and speed improves with larger exponents. The main FFT jump we care about is just over 3M for k=69, so your Teslas would be most useful in the upper 2M range, or over 5M (relative to CPU workers, that is).

Check in the hardware/GPU computing forum. I didn't see the thread when I glanced, but I've been running the program for over a year and even found a prime for k=5 with it in the 3-megabit range.
-Curtis
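A minimal Python sketch of why power-of-2-only FFT sizes give such big jumps, assuming a fixed, hypothetical 18 bits per word (real FFT code lowers this slowly as the length grows to keep round-off error in check). The helper `smallest_fft_length` is illustrative only and is not part of CUDA-LLR or LLR: with only powers of two the length doubles at each step, while also allowing 3, 5 or 7 times a power of two gives much smaller steps.

```python
import math


def smallest_fft_length(n_bits: int, bits_per_word: float, radices=(1,)) -> int:
    """Smallest length r * 2**j (r in radices) whose words can hold n_bits.

    Hypothetical model: every word carries bits_per_word bits. Real FFT
    code lowers this slightly as the length grows, so the thresholds
    produced here are only rough.
    """
    words_needed = math.ceil(n_bits / bits_per_word)
    best = None
    for r in radices:
        length = r
        while length < words_needed:  # double until all the words fit
            length *= 2
        if best is None or length < best:
            best = length
    return best


for n in (2_750_000, 3_200_000, 5_000_000):
    pow2 = smallest_fft_length(n, 18)
    mixed = smallest_fft_length(n, 18, radices=(1, 3, 5, 7))
    print(f"n={n:>9,}: power-of-2 length {pow2:>7,}, mixed length {mixed:>7,}")
```

In this toy model a power-of-2-only code jumps exactly when n passes 2^j * 18 bits (e.g. 2^18 * 18 = 4,718,592 bits); the real jump points depend on each program's own bits-per-word limits.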

diep 2013-09-27 01:19

Thanks Curtis, I downloaded it. I'll try to get it working!

Is the power-of-2 restriction the only 'disadvantage' compared with the SSE2 IBDWT I'm currently running?
I seem to remember that my own FFT implementation, which also used power-of-2 sizes, had a few other disadvantages (to put it politely) :)
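For reference, a small Python sketch of the word layout behind an IBDWT, in the Crandall-Fagin form for a Mersenne modulus 2^q - 1 (how LLR adapts the idea to k*2^n - 1 is not shown here and may differ). Each of the N words holds either floor(q/N) or ceil(q/N) bits and gets the weight 2^(ceil(qj/N) - qj/N), so N words hold exactly q bits with no zero-padding, for whatever length N the FFT code supports. `ibdwt_layout` is a hypothetical helper name.

```python
def ibdwt_layout(q: int, length: int):
    """Word sizes and weights of the Crandall-Fagin IBDWT for 2**q - 1."""
    sizes, weights = [], []
    for j in range(length):
        start = -(-q * j // length)      # ceil(q*j/length): first bit of word j
        end = -(-q * (j + 1) // length)  # ceil(q*(j+1)/length): first bit of word j+1
        sizes.append(end - start)
        weights.append(2.0 ** (start - q * j / length))  # always in [1, 2)
    return sizes, weights


sizes, weights = ibdwt_layout(31, 8)
print(sizes)  # [4, 4, 4, 4, 4, 4, 4, 3]
```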

The Teslas I have here are rated at 0.5 Tflop in theory (of course, that's always 2x more than they can do in terms of instructions, since the rating assumes you can use multiply-add, and I'm not sure whether this FFT can). Looking forward to benchmarking them with this code!
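The "2x" above is plain arithmetic: vendor peak figures count a multiply-add as two flops. A minimal Python sketch, where the card figures (448 units at 1.15 GHz, double precision at half the single-precision rate) are hypothetical examples, not the specs of the Teslas mentioned here:

```python
def peak_gflops(units: int, clock_ghz: float, fma: bool = True, rate: float = 1.0) -> float:
    """Theoretical peak GFLOPS = units * clock * flops per cycle * rate.

    A multiply-add (FMA) is counted as two flops, which is where the 2x in
    vendor peak figures comes from. `rate` scales for precision, e.g. 0.5
    if double precision runs at half the single-precision rate.
    """
    flops_per_cycle = 2 if fma else 1
    return units * clock_ghz * flops_per_cycle * rate


# Hypothetical card: 448 units at 1.15 GHz, double precision at half rate.
with_fma = peak_gflops(448, 1.15, fma=True, rate=0.5)      # about 515 GFLOPS
without_fma = peak_gflops(448, 1.15, fma=False, rate=0.5)  # about 258 GFLOPS
print(f"{with_fma:.1f} vs {without_fma:.1f} GFLOPS")
```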

Note that on Nvidia hardware it would be possible to run a different code stream on each SIMD unit. I don't know whether it can still deliver 0.5 Tflop that way, but if it can, shouldn't it be easier to get rid of the power-of-2 FFT sizes? Maybe?

VBCurtis 2013-09-27 02:55

I don't recall what msft (user name, not company) said about the limitations of his code. I believe he stopped development shortly after he got it working, in favor of an OpenCL version for the other half of the GPUniverse.

I happen to have plenty of work available near 3M, so I haven't considered alternatives.

diep 2014-01-05 15:03

Hi,

I found a prime; maybe some of you want to verify that it's prime.
How do I report it properly?

69 * 2^2649939 - 1 was found prime here!

Thanks,
Vincent

[email]diep@xs4all.nl[/email] in case I don't respond quickly on the forum.
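For anyone curious what such a verification involves, a minimal Python sketch of the Lucas-Lehmer-Riesel test for N = k*2^n - 1 (odd k < 2^n), using Rödseth's method to pick P with (P-2 / N) = +1 and (P+2 / N) = -1, then u_0 = V_k(P, 1) mod N and u_i = u_{i-1}^2 - 2 mod N; N is prime exactly when u_{n-2} ≡ 0 (mod N). It uses plain Python integers, so it is only practical for small n; an exponent like 2649939 needs the FFT-based arithmetic of LLR or pfgw. The function names are illustrative.

```python
def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a / n) for odd n > 0."""
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:  # pull out factors of 2
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a  # quadratic reciprocity
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def lucas_v(k: int, p: int, n: int) -> int:
    """V_k(P, 1) mod n via a binary ladder on the pair (V_m, V_{m+1})."""
    a, b = 2 % n, p % n  # (V_0, V_1)
    for bit in bin(k)[2:]:
        if bit == "1":  # (V_m, V_{m+1}) -> (V_{2m+1}, V_{2m+2})
            a, b = (a * b - p) % n, (b * b - 2) % n
        else:  # (V_m, V_{m+1}) -> (V_{2m}, V_{2m+1})
            a, b = (a * a - 2) % n, (a * b - p) % n
    return a


def llr_is_prime(k: int, n: int, max_p: int = 1000) -> bool:
    """Lucas-Lehmer-Riesel test for N = k * 2**n - 1 with odd k < 2**n."""
    if k % 2 == 0 or k >= 2 ** n:
        raise ValueError("need odd k < 2**n")
    N = k * 2 ** n - 1
    # Rödseth: find P with (P-2 / N) = +1 and (P+2 / N) = -1.
    for p in range(3, max_p):
        j_minus, j_plus = jacobi(p - 2, N), jacobi(p + 2, N)
        if 0 in (j_minus, j_plus) and p + 2 < N:
            return False  # gcd(P +/- 2, N) is a proper factor of N
        if j_minus == 1 and j_plus == -1:
            break
    else:
        raise ValueError("no suitable P found below max_p")
    u = lucas_v(k, p, N)  # u_0 = V_k(P, 1) mod N
    for _ in range(n - 2):  # u_i = u_{i-1}^2 - 2 mod N
        u = (u * u - 2) % N
    return u == 0
```

For example, `llr_is_prime(69, 7)` checks 69 * 2^7 - 1 = 8831, which is prime.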

Kosmaj 2014-01-05 15:56

Hi diep

Congratulations!
To report it, please create a new prover code that includes RPS, Psieve, Srsieve and the software you used to prove it prime, such as LLR.

Thanks!

diep 2014-01-05 21:11

I've tried all that; let me know if it worked out OK. Thanks!

Paul Underwood has meanwhile verified it with pfgw and confirms it.

pinhodecarlos 2014-01-05 21:24

It's correct.
[url]http://primes.utm.edu/primes/page.php?id=116841[/url]

diep 2014-01-05 22:20

Thanks for verifying!

diep 2014-01-14 17:21

On the L5420 Xeon machines I have here at home, I saw a pretty big jump in testing time when moving up from roughly 2.74 Mbit to 2.76 Mbit.

Testing times increased from roughly 6123 seconds to 7689 seconds (about 26% longer).

Each CPU has 12 MB of L2 cache,
so roughly 3 MB per core.
It seems the transform is causing it, not the hardware.

I'm not sure what transform size is used internally.

If it stores 2.75M bits, assuming 18 bits per double, it would require an array of
2.75 Mbit / 18 bits per double * 8 bytes per double = 2.75 * 8 / 18 ≈ 1.2 MB.

Even double that would easily fit in L2.
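The same back-of-envelope estimate as a minimal Python sketch. It counts only the residue itself (one 8-byte double per word at a hypothetical 18 bits per word) and ignores the extra arrays an FFT implementation keeps (weights, twiddle factors, scratch space), so it is a lower bound rather than the real working set. `residue_size_mb` is a hypothetical helper.

```python
import math


def residue_size_mb(n_bits: int, bits_per_word: float, bytes_per_word: int = 8) -> float:
    """Size in MB (10**6 bytes) of an n_bits residue stored as one double per word."""
    words = math.ceil(n_bits / bits_per_word)
    return words * bytes_per_word / 1e6


size = residue_size_mb(2_750_000, 18)  # about 1.22 MB
per_core_l2_mb = 12 / 4                # 12 MB L2 shared by 4 cores, as in the post
print(f"{size:.2f} MB residue, {2 * size:.2f} MB doubled, {per_core_l2_mb:.0f} MB L2 per core")
```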

At what size (in Mbit) can I expect another big jump like that?

Would that be at double this size, around 5.5 Mbit?


All times are UTC.