mersenneforum.org  

Go Back   mersenneforum.org > New To GIMPS? Start Here! > Information & Answers

Reply
 
Thread Tools
Old 2022-10-11, 17:41   #78
Mysticial
 
Mysticial's Avatar
 
Sep 2016

2·5·37 Posts
Default

Quote:
Originally Posted by kriesel View Post
P-1 factoring on F33 is a 2-years-long undertaking in Mlucas on good hardware, as Ernst has been doing it. Generally a primality test is dozens of times as long as a reasonable bounds P-1 attempt. A Pepin test on F33 is probably a several decades long undertaking on any CPUs reasonably available. Are you prepared to alter Gpuowl to perform a Pepin test, on fft lengths ~512M not yet implemented in gpuowl, on F33 and rent a high end GPU (recent Tesla or AMD Instinct) for some years?
I estimate a billion-digit Mersenne PRP run would take ~6 years in gpuowl on a radeon VII, and primality tests scale as p2.1, so ~6 * (233/3321928171)2.1 ~ 44 years. An A100 may cut that to ~12-15 years. During that time faster GPUs will appear, that could shorten it somewhat.
What's the current state-of-the-art iteration time on F33?

I think this is well into the region where NTTs are gonna win simply based on memory usage. And a power-of-two convolution length does not require any irrational base shenanigans.
Mysticial is offline   Reply With Quote
Old 2022-10-11, 22:23   #79
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

34·7·13 Posts
Default

Not sure what the minimim feasible timing is for current hardware under US$20k system cost.
Or how a block of time using a supercomputer might change things, or what that would cost.
But here are some datapoints for disparate affordable hardware, to place an upper bound on feasible run time.
Ernst probably has a better idea of what's currently possible.
Maybe Drkirkby's dual-28-core system can do better than any of the CPU data below.

Mlucas.cfg excerpt for dual Xeon E5-2697V2, in ubuntu atop WSL, system cost me ~$1.5K years ago:
Code:
    491520  msec/iter = 1452.54  ROE[avg,max] = [0.248991718, 0.281250000]  radices = 960 16 16 32 32  0  0  0  0  0
    524288  msec/iter = 1597.16  ROE[avg,max] = [0.248288750, 0.312500000]  radices = 1024 16 16 32 32  0  0  0  0  0
#above were done with 24 cores, 0:23
At 2^33 iterations, a Pepin test of F33 on 480M - 512M fft would be ~400. to 436. years on that.

If scaling by Geekbench 3 64 bit multicore can be trusted,
e5-2697v2 52150 vs e5-2698v4 73848 implies ~308. years on a dual e5-2698V4.
https://cpu-benchmark.org/cpu/intel-xeon-e5-2697-v2/ https://cpu-benchmark.org/cpu/intel-xeon-e5-2698-v4/

See also end of https://www.mersenneforum.org/showpo...6&postcount=14 for timing on a tenth generation Intel laptop, ~$600, a millenium plus to run.
Comparing via CPUID CPU-Z benchmark, an AMD Ryzen 5950x might be around 120 years.

From https://www.mersenneforum.org/mayer/README.html, a Xeon Phi 7250 can do ~552.5msec/iter at 512M, extrapolating to ~150.5 years for F33 Pepin test. (At one time that would have been a $5K system.)

In 2017, at https://mersenneforum.org/showpost.p...&postcount=178 Ernst wrote of a 32-core Xeon that he estimated could do a gigadigit mersenne primality test in ~20 years. F33 is considerably longer; ~2.5^2.1 ~6.85 times as long, 137. years.

8-core Ryzen: ~58 years for gigadigit Mersenne; extrapolates to ~397. years for F33. https://mersenneforum.org/showpost.p...&postcount=182

Looking a bit beyond the current state of the art:
To attempt F33 with gpuowl, it would need to be extended to handle >32 bit exponents, add 512M ffts, and implement the Pepin test.

Looking at it the other way around, to get F33 PRP run time under 10 years, would require an iteration time < 10 * 365 * 24 * 3600 / 2^33 ~ 0.0367 seconds; 36.7msec.
kriesel is offline   Reply With Quote
Old 2022-10-11, 23:31   #80
Mysticial
 
Mysticial's Avatar
 
Sep 2016

1011100102 Posts
Default

The best implementation I have needs about ~0.65 seconds to do a ~2^33-bit multiply convolution on my 7950X with memory @ 4400.

Squaring would certainly be faster, though I don't have benchmarks for that. Either way, still way too slow.

Not an apples-to-apples comparison anyway since it can't do exactly 2^33-bit convolution length, but it can probably be modified to match it exactly with no overhead to avoid irrational base weights.

And yes, it is 100% memory-bound.
Mysticial is offline   Reply With Quote
Old 2022-10-11, 23:42   #81
Mysticial
 
Mysticial's Avatar
 
Sep 2016

1011100102 Posts
Default

If we want to look at theoreticals, a 2^33-bit convolution using the optimal NTT method will need ~4GB of ram to hold the transform data.

Assuming standard 2-pass approach will full pass-merging on both ends, each iteration will require reading and writing 8 GB of memory.

Zen4 with 6000 memory seems to be around 70 GB copy bandwidth. So we can put a lower-bound of ~120ms using a hypothetical memory-optimal implementation. (which goes to show how far off from optimal my own implementation still is)

Last fiddled with by Mysticial on 2022-10-11 at 23:47
Mysticial is offline   Reply With Quote
Old 2022-10-11, 23:54   #82
chalsall
If I May
 
chalsall's Avatar
 
"Chris Halsall"
Sep 2002
Barbados

11,087 Posts
Default

Quote:
Originally Posted by Mysticial View Post
Assuming standard 2-pass approach will full pass-merging on both ends, each iteration will require reading and writing 8 GB of memory.
Ouch! I'm constrained to 8 KB of D for one of my code paths. Also another 8 KB for I (read: code). Great fun! 8^)
chalsall is offline   Reply With Quote
Old 2022-10-12, 00:04   #83
Mysticial
 
Mysticial's Avatar
 
Sep 2016

2·5·37 Posts
Default

Actually I lied. It's not 4GB to hold the transform data, it's just 2GB. I forgot the factor of two from the wrap-around.

So it's read+write 4GB of memory per iteration, or about ~60ms lower-bound on a Zen4 box.
Mysticial is offline   Reply With Quote
Old 2022-10-12, 00:46   #84
chalsall
If I May
 
chalsall's Avatar
 
"Chris Halsall"
Sep 2002
Barbados

11,087 Posts
Default

Quote:
Originally Posted by Mysticial View Post
Actually I lied.
Please forgive me for this. I can get a little intense sometimes...

You didn't lie. You made a mistake. Mistakes happen. It is how we learn.
chalsall is offline   Reply With Quote
Old 2022-10-13, 14:23   #85
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

737110 Posts
Default dept. of corrections

Oops. (Case of mistaken identity.)
Quote:
Originally Posted by kriesel View Post
See also end of https://www.mersenneforum.org/showpo...6&postcount=14 for timing on a tenth eleventh generation Intel laptop (i7-1165G7) , ~$600 $900, a millenium plus to run.
Comparing via CPUID CPU-Z benchmark, an AMD Ryzen 5950x might be around 120 238. years.
kriesel is offline   Reply With Quote
Old 2022-10-14, 17:13   #86
Stargate38
 
Stargate38's Avatar
 
"Daniel Jackson"
May 2011
14285714285714285714

3·251 Posts
Default

In other words, it would require a large cluster of PCs to get this factored in a reasonable time, say <10 years.
Stargate38 is offline   Reply With Quote
Old 2022-10-14, 21:41   #87
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

DE316 Posts
Default

Ancient history trying P-1 on F31/F33 (start at post #13)
jasonp is offline   Reply With Quote
Old 2022-12-05, 10:52   #88
tanaydin
 
Oct 2016

32 Posts
Default A C code for educational purposes

Hey there,

I've written a C code that uses GMP to check if there is any factor less than sqrt of a Mersenne number.
It's using trial division and couldn't finish checking 2^127-1 in two days so it is SLOW! :)
But maybe someone else needs a similar thing or can find it useful for his/her needs.

You can run it after compile like

./jumper 47

and it will check 2^47-1

I've named it "jumper", because it's jumping -2*p after founds a good point to start.

Any comment/commit is welcome, C is not a language that I've experienced so maybe there could be improvements.

https://github.com/tanaydin/jumper/

Last fiddled with by tanaydin on 2022-12-05 at 10:53
tanaydin is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Do you know this method to factorize? Godzilla Miscellaneous Math 28 2017-10-31 18:14
mathematica7.0 can easily factorize 10^67+1111 aaa120 Factoring 14 2008-12-07 13:14

All times are UTC. The time now is 05:58.


Mon Feb 6 05:58:18 UTC 2023 up 172 days, 3:26, 1 user, load averages: 1.11, 0.90, 0.92

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2023, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.

≠ ± ∓ ÷ × · − √ ‰ ⊗ ⊕ ⊖ ⊘ ⊙ ≤ ≥ ≦ ≧ ≨ ≩ ≺ ≻ ≼ ≽ ⊏ ⊐ ⊑ ⊒ ² ³ °
∠ ∟ ° ≅ ~ ‖ ⟂ ⫛
≡ ≜ ≈ ∝ ∞ ≪ ≫ ⌊⌋ ⌈⌉ ∘ ∏ ∐ ∑ ∧ ∨ ∩ ∪ ⨀ ⊕ ⊗ 𝖕 𝖖 𝖗 ⊲ ⊳
∅ ∖ ∁ ↦ ↣ ∩ ∪ ⊆ ⊂ ⊄ ⊊ ⊇ ⊃ ⊅ ⊋ ⊖ ∈ ∉ ∋ ∌ ℕ ℤ ℚ ℝ ℂ ℵ ℶ ℷ ℸ 𝓟
¬ ∨ ∧ ⊕ → ← ⇒ ⇐ ⇔ ∀ ∃ ∄ ∴ ∵ ⊤ ⊥ ⊢ ⊨ ⫤ ⊣ … ⋯ ⋮ ⋰ ⋱
∫ ∬ ∭ ∮ ∯ ∰ ∇ ∆ δ ∂ ℱ ℒ ℓ
𝛢𝛼 𝛣𝛽 𝛤𝛾 𝛥𝛿 𝛦𝜀𝜖 𝛧𝜁 𝛨𝜂 𝛩𝜃𝜗 𝛪𝜄 𝛫𝜅 𝛬𝜆 𝛭𝜇 𝛮𝜈 𝛯𝜉 𝛰𝜊 𝛱𝜋 𝛲𝜌 𝛴𝜎𝜍 𝛵𝜏 𝛶𝜐 𝛷𝜙𝜑 𝛸𝜒 𝛹𝜓 𝛺𝜔