mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Software

Reply
 
Thread Tools
Old 2021-12-09, 07:06   #133
Prime95
P90 years forever!
 
Prime95's Avatar
 
Aug 2002
Yeehaw, FL

100000010101112 Posts
Default

Quote:
Originally Posted by R. Gerbicz View Post
If the init cost is small compared to the total cost then why wouldn't use more prime:
In this case if you'd include 11 also, then you need to evaluate in 1/11 less points with 10 times more inits (assuming the same parameters), and note that here you'd still use a symmetrical polynom since c and m-c is coprime at the same time. You can pack the points evenly because eulerphi(n)|eulerphi(m) if n|m.
If I understand correctly, you are proposing using D=2310 instead of 1050.

In the example, our memory budget limits us to 523 temps. If we use, D=2310 we'd use 240 temps for the first poly and only 283 temps for the second poly. This would let us evaluate the first poly at 283-2*240+1 points -- clearly won't work.
Prime95 is offline   Reply With Quote
Old 2021-12-09, 08:52   #134
axn
 
axn's Avatar
 
Jun 2003

23·683 Posts
Default

Suggestion for a minor (but essentially free) optimization.

You would be doing something like:
B2 start = Requested B2 / q

where q is the smallest prime not dividing D.

But then actual B2 will be B2 start + k * batch size, k selected such that actual B2 >= Requested B2.

But since actual B2 >= Requested B2, we could adjust B2 start upwards by doing B2 start = Actual B2 / q

Which would then bump Actual B2. Which would then be fed back to higher B2 start, etc... until it converges.

Simply, if x = (Actual B2-Requested B2), then we can adjust B2 start and Actual B2 upwards by x/(q-1).

Concrete example:
D: 8190, 864x3390 polynomial multiplication (batch size = 13619970)
B1=2M, Requested B2=400M
q=11
B2 start = (400M/11) rounded down to the nearest multiple of D = 36363600
num batches = (400M-b2 start)/batch size, rounded up = 27
Actual B2 = B2 start + batch size * 27 = 404102790

The actual matches with what P95 is outputting.

Now, we calculate the excess x = 404102790 - 400M = 4102790
Adjustment = x/(q-1) = 4102790/10 = 410279, rounded down to nearest multiple of D = 409500
New B2start = 36363600+409500 = 36773100
New actual B2 = 404102790 + 409500=404512290

Sanity check = 404512290/11 rounded down is 36773100 !!

In this case, the change is small since Actual B2 was very close to Requested B2 (due to smaller batch size). In cases where batch size is larger, the corresponding change also will be larger. Most importantly, this is free.

Last fiddled with by axn on 2021-12-09 at 08:53
axn is offline   Reply With Quote
Old 2021-12-09, 14:21   #135
R. Gerbicz
 
R. Gerbicz's Avatar
 
"Robert Gerbicz"
Oct 2005
Hungary

110011010012 Posts
Default

Quote:
Originally Posted by Prime95 View Post
If I understand correctly, you are proposing using D=2310 instead of 1050.

In the example, our memory budget limits us to 523 temps. If we use, D=2310 we'd use 240 temps for the first poly and only 283 temps for the second poly. This would let us evaluate the first poly at 283-2*240+1 points -- clearly won't work.
No, I was too compact, and a little imprecise:
let new step=11*1050=11550, then
eulerphi(step)=10*240=2400,

partition the reduced residue system modulo 11550 into 10 equal size sets, where within each set if you see c in a given set, then 11550-c is also in the same set. You can do this, because: 2400 is divisible by 10, and it is true that c and 11550-c is coprime to 11550 at the same time.
Any such partition would be good, say if you put c in a set then put also 11550-c, stop if you reach 240 as a size. At the end all sets will be symmetrical to the midpoint 11550/2.
Call your code for each set/polynom seperately, now the step is 11 times larger what was before, but you eliminated all P, where P is divisible by 11, so you have ten problem/polynom, you can process each polynom 11 times faster what was before.
Each polynom has degree=240/2=120 and you evaulate at once with polymult 403-2*120+1=164 points, "eliminating" 1050*164=172200 basepoints coprime to 11550.
If t is the init time and T is total time (which includes init time) for your original run, then my suggestion will use the same memory and runs in:
10*t+10/11*(T-t) time and this will be smaller than T iff t<T/100, what is definately will be true if memory is fixed and B2->inf since here t is constant (fixed).
R. Gerbicz is offline   Reply With Quote
Old 2021-12-09, 22:55   #136
nordi
 
Dec 2016

127 Posts
Default

My 30.8b4 got stuck while starting stage 2. In worktodo.txt I had

Code:
Pminus1=N/A,1,2,20479,-1,5000000000,5000000000000,123,"2948977,180092327,710908007,26014268911,686893590107252399,6294660677916349147081,310138432659875910149871768953,12124683172430190710498405470618889"
Previously I did literally hundreds of exponents in the 12M region without encountering such issues, so it might be related to the exponent.

mprime's last words were

Code:
[Work thread Dec 9 21:29] M20479 stage 1 complete. 21057674982 transforms. Total time: 22107.028 sec.
[Work thread Dec 9 21:29] Inversion of stage 1 result complete. 5 transforms, 1 modular inverse. Time: 0.001 sec.
[Work thread Dec 9 21:29] Switching to FMA3 FFT length 1280
[Work thread Dec 9 21:29] Using 59400MB of memory.  D: 7657650, 691200x5283889 polynomial multiplication.
[Work thread Dec 9 21:29] Setting affinity to run polymult helper thread on logical CPUs 2 (zero-based)
[Work thread Dec 9 21:29] Setting affinity to run polymult helper thread on logical CPUs 4 (zero-based)
[Work thread Dec 9 21:29] Setting affinity to run polymult helper thread on logical CPUs 6 (zero-based)
Over 2 hours later, CPU usage is at 0%, RAM usage at 6.7GB.

According to strace, the process is waiting for some lock:
Code:
# strace -p 27536 -q
futex(0x7f92d21f69d0, FUTEX_WAIT, 27572, NULL
I left the process running, so I could do some debugging if needed.
nordi is offline   Reply With Quote
Old 2021-12-09, 23:28   #137
Prime95
P90 years forever!
 
Prime95's Avatar
 
Aug 2002
Yeehaw, FL

17×487 Posts
Default

Quote:
Originally Posted by nordi View Post
My 30.8b4 got stuck while starting stage 2..
Nuts. I'll revisit the locking code, no need to keep it running. Try again and it might work.
Prime95 is offline   Reply With Quote
Old 2021-12-10, 00:04   #138
nordi
 
Dec 2016

127 Posts
Default

Quote:
Originally Posted by Prime95 View Post
Try again and it might work.
I tried and it got stuck again.
nordi is offline   Reply With Quote
Old 2021-12-10, 06:19   #139
Prime95
P90 years forever!
 
Prime95's Avatar
 
Aug 2002
Yeehaw, FL

201278 Posts
Default

Quote:
Originally Posted by nordi View Post
I tried and it got stuck again.
I can reproduce the issue. Don't know whats wrong yet.
Prime95 is offline   Reply With Quote
Old 2021-12-10, 10:01   #140
tha
 
tha's Avatar
 
Dec 2002

881 Posts
Default

GHz-days for P-1 needs to be re-adjusted.

A couple of parameters on mersenne.ca as well.
tha is offline   Reply With Quote
Old 2021-12-10, 11:48   #141
lisanderke
 
"Lisander Viaene"
Oct 2020
Belgium

24×7 Posts
Default

Quote:
Originally Posted by lisanderke View Post
14923 I received 2888 GHzDs P-1 credit for this workload while it took me probably less than an hr to complete, perhaps credit given should be recalculated after a full release of 30.8 or higher versions!


Mind you, B2 was a whopping ~1,142,499 times B1!

If (but I assume when) the GHzDs calculation is adjusted I'd definitely like to request for my own results produced with 30.8 to be recalculated!
lisanderke is offline   Reply With Quote
Old 2021-12-10, 13:56   #142
bur
 
bur's Avatar
 
Aug 2020
79*6581e-4;3*2539e-3

10110110102 Posts
Default

I gave it a try for exponents in the 60.5M range it the increase in speed is incredible:

Code:
i9-10900K, 16GB RAM, Ubuntu

60514973	NF-PM1	2021-12-10 13:44	B1=2000000, B2=174051780	41.6883
60507347	NF-PM1	2021-12-10 09:00	B1=2000000, B2=122000000	27.5025
60507143	NF-PM1	2021-12-09 23:29	B1=2000000, B2=122000000	27.5025
60507119	NF-PM1	2021-12-09 14:06	B1=2000000, B2=122000000	27.5025
60505651	NF-PM1	2021-12-09 04:42	B1=2000000, B2=122000000	27.5025
60505337	NF-PM1	2021-12-08 19:14	B1=2000000, B2=122000000	27.5025
It pretty cosistently took 9:30 h per B1=2M B2=122M test. After switching to 30.8 even though B2 was automatically increased to B2=174M, the time dropped to 4:44 h!

That range is just off cat1 DC and that should definitely impact the validation rate.


Who came up with this optimization? Is it a new idea or based on something published elsewhere?

And, just out of curiosity, why wasn't it used before?

Is it specific for Mersenne numbers or could it be used more generally like Proth or even any integer?

And, a last question (for now), could it be applied to P+1 as well?

Last fiddled with by bur on 2021-12-10 at 14:05
bur is offline   Reply With Quote
Old 2021-12-10, 14:16   #143
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

7,823 Posts
Default

Quote:
Originally Posted by bur View Post
Who came up with this optimization? Is it a new idea or based on something published elsewhere?

And, just out of curiosity, why wasn't it used before?

Is it specific for Mersenne numbers or could it be used more generally like Proth or even any integer?

And, a last question (for now), could it be applied to P+1 as well?
So many questions. https://eprint.iacr.org/2021/1462.pdf should provide some answers. (Thanks for finding and posting that ixfd64!)

Last fiddled with by kriesel on 2021-12-10 at 14:19
kriesel is offline   Reply With Quote
Reply



Similar Threads
Thread Thread Starter Forum Replies Last Post
Do not post your results here! kar_bon Prime Wiki 40 2022-04-03 19:05
what should I post ? science_man_88 science_man_88 24 2018-10-19 23:00
Where to post job ad? xilman Linux 2 2010-12-15 16:39
Moderated Post kar_bon Forum Feedback 3 2010-09-28 08:01
Something that I just had to post/buy dave_0273 Lounge 1 2005-02-27 18:36

All times are UTC. The time now is 14:05.


Fri Jul 7 14:05:20 UTC 2023 up 323 days, 11:33, 0 users, load averages: 1.41, 1.18, 1.15

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2023, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.

≠ ± ∓ ÷ × · − √ ‰ ⊗ ⊕ ⊖ ⊘ ⊙ ≤ ≥ ≦ ≧ ≨ ≩ ≺ ≻ ≼ ≽ ⊏ ⊐ ⊑ ⊒ ² ³ °
∠ ∟ ° ≅ ~ ‖ ⟂ ⫛
≡ ≜ ≈ ∝ ∞ ≪ ≫ ⌊⌋ ⌈⌉ ∘ ∏ ∐ ∑ ∧ ∨ ∩ ∪ ⨀ ⊕ ⊗ 𝖕 𝖖 𝖗 ⊲ ⊳
∅ ∖ ∁ ↦ ↣ ∩ ∪ ⊆ ⊂ ⊄ ⊊ ⊇ ⊃ ⊅ ⊋ ⊖ ∈ ∉ ∋ ∌ ℕ ℤ ℚ ℝ ℂ ℵ ℶ ℷ ℸ 𝓟
¬ ∨ ∧ ⊕ → ← ⇒ ⇐ ⇔ ∀ ∃ ∄ ∴ ∵ ⊤ ⊥ ⊢ ⊨ ⫤ ⊣ … ⋯ ⋮ ⋰ ⋱
∫ ∬ ∭ ∮ ∯ ∰ ∇ ∆ δ ∂ ℱ ℒ ℓ
𝛢𝛼 𝛣𝛽 𝛤𝛾 𝛥𝛿 𝛦𝜀𝜖 𝛧𝜁 𝛨𝜂 𝛩𝜃𝜗 𝛪𝜄 𝛫𝜅 𝛬𝜆 𝛭𝜇 𝛮𝜈 𝛯𝜉 𝛰𝜊 𝛱𝜋 𝛲𝜌 𝛴𝜎𝜍 𝛵𝜏 𝛶𝜐 𝛷𝜙𝜑 𝛸𝜒 𝛹𝜓 𝛺𝜔