mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Hardware (https://www.mersenneforum.org/forumdisplay.php?f=9)
-   -   The prime-crunching on dedicated hardware FAQ (https://www.mersenneforum.org/showthread.php?t=10275)

jasonp 2008-08-28 14:35

Mini-Geek: the limit of double precision is far larger than anyone will want to use for an LL test. I don't know that anyone's tried it, but you could almost certainly test exponents around a billion.

Folks: if Jason can avoid trolling, you can avoid baiting him. Flamewars can move to Misc Math Threads if you're really keen on starting one.

wblipp 2008-08-28 16:04

[QUOTE=Mini-Geek;140169]when does double precision reach its limit?[/QUOTE]
See Msg 17 in this thread for ewmayer's information on how to figure this. For double precision it looks like memory will be a limit before word size for quite a while. A transform size of 1G should handle exponents up to 17G. A transform of size 1T would handle exponents up to 14 T.
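As a rough cross-check of the transform-size-to-exponent figures (a sketch, not anything from ewmayer's table): the maximum testable exponent is approximately the transform length times the bits carried per word. The bits-per-word values below are assumptions for a Prime95-style balanced-digit transform at large sizes, not measured values.

```python
# Sanity check of the "1G transform handles ~17G exponents" figure.
# Assumption (not from the thread): large balanced-digit DWT transforms
# carry roughly 16-18 bits per double; the exact figure is empirical.

def max_exponent(fft_length, bits_per_word):
    """Largest exponent an FFT of fft_length words can roughly handle."""
    return fft_length * bits_per_word

N = 2**30                      # a "1G" transform
for b in (16, 17, 18):         # plausible bits-per-word assumptions
    print(f"{b} bits/word -> max exponent ~{max_exponent(N, b) / 1e9:.1f}G")
```

At 16 bits per word this lands at about 17G, consistent with the figure quoted above.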


[QUOTE=Mini-Geek;140169]but can run k*2^n+-1 for relatively small numbers (n needs to be larger than ~340K to place in the top 5000 list), could it be faster than LLR is for those sizes? Maybe for even smaller numbers?[/QUOTE]
YES, I think there are many interesting projects for smaller numbers with fast single precision FFTs. That's what I had in mind when I said "There are some interesting and useful things that could be done with that." The table in Msg 17 indicates you would need single precision transforms of size about 64K to handle n in the 340K range. Is that in the range of practicality for GPUs?

Extending the table to smaller sizes, it appears a single precision transform size 128 would be sufficient for numbers up to 337 digits, and size 256 would take you to 644 digit numbers. I'm hoping somebody will build a kick-ass ECM at these sizes that would be of interest to factoring projects like the Cunningham project and oddperfect.org.
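A crude single-precision analogue of the usual FFT round-off bound lands in the same ballpark as the digit counts quoted above (which come from ewmayer's table and a sharper model). This is a sketch: the effective mantissa of 23 bits (one bit of safety below SP's 24) and the simple bound 2b + log2(N) <= m are assumptions, not the thread's actual model.

```python
import math

# Crude SP bound: each convolution term must fit the mantissa, so
# 2*b + log2(N) <= m, with m = 23 assumed (24-bit SP mantissa minus
# one safety bit). Result size taken as (N + 1) * b bits.

def sp_limit_digits(k, m=23):
    b = (m - k) // 2                  # bits per term for FFT length 2**k
    bits = (2**k + 1) * b             # max result size in bits
    return int(bits * math.log10(2))  # convert bits to decimal digits

print(sp_limit_digits(7))   # FFT length 128 -> ~310 digits (vs ~337 quoted)
print(sp_limit_digits(8))   # FFT length 256 -> ~541 digits (vs ~644 quoted)
```

The gap to the quoted figures is plausibly the extra bit or so gained by balanced-digit representations.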

William

jasonp 2008-08-28 16:16

[QUOTE=wblipp;140204]Extending the table to smaller sizes, it appears a single precision transform size 128 would be sufficient for numbers up to 337 digits, and size 256 would take you to 644 digit numbers. I'm hoping somebody will build a kick-ass ECM at these sizes that would be of interest to factoring projects like the Cunningham project and oddperfect.org.
[/QUOTE]
Stage 2 of ECM is more memory-bound than compute-bound (for useful size bounds), but stage 1 could benefit. Note that when the transforms get small, general-purpose processors get to use wide execution units too, and they still have a 6-10x clock speed advantage over a GPU. If the numbers involved are small enough to allow single precision FP only, and the available integer instructions and throughput are not dismal, I can see a benefit to using special-purpose hardware.

retina 2008-08-29 07:29

[QUOTE=Mini-Geek;140169]So, if single precision reaches it before the 10M digit range, when does double precision reach its limit?[/QUOTE]Well I don't know the real answer but below are some heuristics I computed many years ago with some FFT code I was creating. It was not great code (as can be seen from the maximum bit sizes being less than those achieved with Prime95), but I don't think that is important. The important thing to note is that the potential max'd-out numbers are very large by today's standards.

Note:
- DP = double precision (53-bit mantissa)
- EP = extended precision (63+1-bit mantissa)
- result sizes are bit lengths, so multiply by log10(2) for base-10 sizes if you like.
[code]FFT_length Term_bits_(DP) Result_size_(DP) Term_bits_(EP) Result_size_(EP)
2^1 25 75 35 105
2^2 25 125 35 175
2^3 24 216 34 306
2^4 24 408 34 578
2^5 23 759 33 1089
2^6 23 1495 33 2145
2^7 22 2838 32 4128
2^8 22 5654 32 8224
2^9 21 10773 31 15903
2^10 21 21525 31 31775
2^11 20 40980 30 61470
2^12 20 81940 30 122910
2^13 19 155667 29 237597
2^14 19 311315 29 475165
2^15 18 589842 28 917532
2^16 18 1179666 28 1835036
2^17 17 2228241 27 3538971
2^18 17 4456465 27 7077915
2^19 16 8388624 26 13631514
2^20 16 16777232 26 27263002
2^21 15 31457295 25 52428825
2^22 15 62914575 25 104857625
2^23 14 117440526 24 201326616
2^24 14 234881038 24 402653208
2^25 13 436207629 23 771751959
2^26 13 872415245 23 1543503895
2^27 12 1610612748 22 2952790038
2^28 12 3221225484 22 5905580054
2^29 11 5905580043 21 11274289173
2^30 11 11811160075 21 22548578325
2^31 10 21474836490 20 42949672980
2^32 10 42949672970 20 85899345940
2^33 9 77309411337 19 1.63209E+11
2^34 9 1.54619E+11 19 3.26418E+11
2^35 8 2.74878E+11 18 6.18475E+11
2^36 8 5.49756E+11 18 1.23695E+12
2^37 7 9.62073E+11 17 2.33646E+12
2^38 7 1.92415E+12 17 4.67292E+12
2^39 6 3.29853E+12 16 8.79609E+12
2^40 6 6.59707E+12 16 1.75922E+13
2^41 5 1.09951E+13 15 3.29853E+13
2^42 5 2.19902E+13 15 6.59707E+13
2^43 4 3.51844E+13 14 1.23145E+14
2^44 4 7.03687E+13 14 2.46291E+14
2^45 3 1.05553E+14 13 4.57397E+14
2^46 3 2.11106E+14 13 9.14794E+14
2^47 2 2.81475E+14 12 1.68885E+15
2^48 2 5.6295E+14 12 3.3777E+15
2^49 1 5.6295E+14 11 6.19245E+15
2^50 1 1.1259E+15 11 1.23849E+16
2^51 - - 10 2.2518E+16
2^52 - - 10 4.5036E+16
2^53 - - 9 8.10648E+16
2^54 - - 9 1.6213E+17
2^55 - - 8 2.8823E+17
2^56 - - 8 5.76461E+17
2^57 - - 7 1.00881E+18
2^58 - - 7 2.01761E+18
2^59 - - 6 3.45876E+18
2^60 - - 6 6.91753E+18
2^61 - - 5 1.15292E+19
2^62 - - 5 2.30584E+19
2^63 - - 4 3.68935E+19
2^64 - - 4 7.3787E+19
2^65 - - 3 1.1068E+20
2^66 - - 3 2.21361E+20
2^67 - - 2 2.95148E+20
2^68 - - 2 5.90296E+20
2^69 - - 1 5.90296E+20
2^70 - - 1 1.18059E+21[/code]Since these were simply heuristics I cannot say if these are practical results or not. I only used the code for numbers up to a few million bits so larger values may or may not actually work.
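The DP column of the table above can be reproduced with the standard convolution bound: each product term must fit the mantissa, so roughly 2b + log2(N) <= m for b bits per term. The DP numbers match an effective m = 52 exactly (one bit below the 53-bit mantissa, presumably as safety margin), with result size (N + 1) * b bits. This is a reverse-engineered sketch, not retina's actual code; the EP column appears to use a larger effective precision and is not modelled here.

```python
# Reconstruct the DP column of retina's table. The effective mantissa
# m = 52 and the result-size formula (N + 1) * b are inferred from the
# table values, not taken from the original code.

def dp_row(k, m=52):
    """Term bits and max result bits for an FFT of length 2**k."""
    b = (m - k) // 2            # bits per term from 2*b + k <= m
    return b, (2**k + 1) * b    # (term bits, result size in bits)

for k in (10, 20, 30):
    b, result = dp_row(k)
    print(f"2^{k}: {b} bits/term, result up to {result} bits")
```

Rows 2^10 (21 / 21525), 2^20 (16 / 16777232) and 2^30 (11 / 11811160075) come out matching the table exactly.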

Mini-Geek 2008-08-29 11:08

[quote=retina;140260]Well I don't know the real answer but below are some heuristics I computed many years ago with some FFT code I was creating. It was not great code (as can be seen from the maximum bit sizes being less than those achieved with Prime95), but I don't think that is important. The important thing to note is that the potential max'd-out numbers are very large by today's standards.

Note:
- DP=double precision (53bit mantissa)
- EP=extended precision (63+1bit mantissa)
- and result sizes are bit lengths, so adjust by log10(2) for base 10 sizes if you like.
...
Since these were simply heuristics I cannot say if these are practical results or not. I only used the code for numbers up to a few million bits so larger values may or may not actually work.[/quote]
So roughly 2^(1 quadrillion)-1 (~10^338929672118098, or 339 trillion digits) is the largest Mersenne number DP can check, and roughly 2^(1 sextillion)-1 (~10^355393002580961800000, or 355 quintillion digits) is the largest for EP. That does indeed seem incredibly far off. I don't think we need to worry about it for a while.
Does Prime95 already use EP? It seems to me that it would make it faster, since the word size would be larger.
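The digit counts quoted above follow from the table's largest result sizes by the usual conversion: an n-bit number has about n * log10(2) decimal digits. A quick check:

```python
import math

# Convert the largest DP/EP result sizes from retina's table (in bits)
# to decimal digit counts via digits ~= bits * log10(2).

dp_bits = 1.1259e15   # largest DP result (2^50 transform)
ep_bits = 1.18059e21  # largest EP result (2^70 transform)

print(f"DP limit: ~{dp_bits * math.log10(2):.3e} digits")  # ~3.389e14 (339 trillion)
print(f"EP limit: ~{ep_bits * math.log10(2):.3e} digits")  # ~3.554e20 (355 quintillion)
```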

retina 2008-08-29 11:19

[QUOTE=Mini-Geek;140272]So roughly 2^(1 quadrillion)-1 (~10^338929672118098, or 339 trillion digits) is the largest Mersenne number DP can check, and roughly 2^(1 sextillion)-1 (~10^355393002580961800000, or 355 quintillion digits) is the largest for EP. That does indeed seem incredibly far off. I don't think we need to worry about it for a while.[/QUOTE]Yeah, my numbers could easily be 1 or 2 orders of magnitude off, but I think the net result is still obvious.[QUOTE=Mini-Geek;140272]Does Prime95 already use EP? It seems to me that it would make it faster, since the word size would be larger.[/QUOTE]Oh no, the 10-byte format is not particularly friendly to base-2 hardware, and SSE doesn't support it either.

Mini-Geek 2008-08-29 11:30

[quote=retina;140273]Oh no, the 10 byte variable is not particularly friendly to base-2 hardware, and SSE doesn't support it either.[/quote]
Hm...Aren't quantum computers supposed to be higher than base 2? Besides the fast speeds that quantum computers should have, this would mean they could use EP, right?
BTW could you run those same heuristics for SP so I can see a little more precisely than "before 30 or 40 million" where it stops and compare it to DP and EP?

[URL="http://en.wikipedia.org/wiki/Quantum_computer"]Quantum computer[/URL]
So a 32-bit Quantum computer could be in 2^32 states simultaneously. Not entirely sure how that relates to an equivalent base digital hardware or if it even does, but uh...just throwing that out there.

retina 2008-08-29 12:36

[QUOTE=Mini-Geek;140274]Hm...Aren't quantum computers supposed to be higher than base 2? Besides the fast speeds that quantum computers should have, this would mean they could use EP, right?
BTW could you run those same heuristics for SP so I can see a little more precisely than "before 30 or 40 million" where it stops and compare it to DP and EP?

[URL="http://en.wikipedia.org/wiki/Quantum_computer"]Quantum computer[/URL]
So a 32-bit Quantum computer could be in 2^32 states simultaneously. Not entirely sure how that relates to an equivalent base digital hardware or if it even does, but uh...just throwing that out there.[/QUOTE]My understanding of QCs is that you would need [i]at least[/i] as many qubits as the length of the number you are testing. Currently known algorithms (Shor's factoring algorithm) require roughly twice as many qubits as the number has bits. A QC would not have the concept of DP or EP; they do not work in that manner.

I never did SP FFT in my tests. And I would hate to have to guess the numbers required. When I read the post above about SP good for 30M-40M bits I was very surprised. I thought it would be much less than that. But I bow to the superior knowledge of others here that have used it in real life.

cheesehead 2008-08-29 19:59

[quote=Mini-Geek;140274]Besides the fast speeds that quantum computers should have, this would mean they could use EP, right?[/quote]If we have a quantum computer that has enough qubits for L-L tests of Mersenne numbers with exponents > 10[sup]7 or 8[/sup], it seems to me it'd be preferable to use Shor's algorithm anyway, because Shor's finds an actual factor rather than merely the composite/noncomposite Lucas-Lehmer result.

(I don't know about the expected execution times, though. Note that QC computations may need to be performed multiple times in order to ensure that the probability of a correct measured result is high enough.)

jasong 2008-09-10 19:08

[QUOTE=retina;140170]It is curious how all of your "friends" end up either ill or in general have some sort of problem that prevents them from ever actually proving their claims.[/QUOTE]
There is exactly one friend that I've been referring to all this time. And he's recovering fairly well, considering how sick he was.

As to the hardware crash, one of the problems with RAID arrays is that everything tends to be backed up in the same location. So things like ceiling leaks and fallen over shelves have the potential to destroy enough hard drives in a RAID array to make them unrecoverable.

Anyway, he's back in business, and his latest interest will probably become very apparent in the next 2-3 months. As soon as he can explain his theories well enough for one or more of the 10-15 people who know what's going on to duplicate his methods, it will only be a short time (hours to weeks) before you find out about them.

As Steve Jobs would say, you'll find out when you find out, and no form of bitching (or sucking up, just to cover the bases) will make it happen sooner.

And for those who think I'm a troll: if it's possible for a troll to be correct in what they say and still be a troll, then yes, I am very much a troll. Every dog (or troll) has his day, and I intend to savor that day like a fine wine.

Xyzzy 2008-09-10 19:16

There seems to be a very discriminatory attitude about trolls around here.

We're not sure how our trolls view it, but we bet they don't feel comfortable living with it every day.

Can't we all just get along?

:down:

