mersenneforum.org Assigning too much memory slows down P-1 stage 2?

2020-12-12, 10:16   #1
ZFR

Feb 2008
Meath, Ireland

2³·3·7 Posts
Assigning too much memory slows down P-1 stage 2?

Using mprime. I have a machine with 32GB of memory. I'm running PRP/DC tests on 3 workers, but thought I'd assign one worker to P-1.

I assigned 4GB memory initially. For the first exponent, when it reached stage 2, the bounds were as follows:
Quote:
[Worker #2 Dec 10 09:50] Optimal P-1 factoring of M107608939 using up to 4000MB of memory.
[Worker #2 Dec 10 09:50] Assuming no factors below 2^77 and 2 primality tests saved if a factor is found.
[Worker #2 Dec 10 09:50] Optimal bounds are B1=740000, B2=20837000
[Worker #2 Dec 10 09:50] Chance of finding a factor is an estimated 3.51%
And then the times were as follows:

Quote:
[Worker #3 Dec 10 10:06] M108745423 stage 2 is 1.59% complete. Time: 950.778 sec.
[Worker #2 Dec 10 10:06] M107608939 stage 2 is 2.16% complete. Time: 951.162 sec.
[Worker #2 Dec 10 10:22] M107608939 stage 2 is 4.31% complete. Time: 944.688 sec.
Then I thought, why not give it 24GB memory. I only run mprime when the machine is idle anyway.

So for the next exponent, I got this at stage 2:

Quote:
[Worker #2 Dec 11 11:58] Optimal P-1 factoring of M107610803 using up to 24000MB of memory.
[Worker #2 Dec 11 11:58] Assuming no factors below 2^77 and 2 primality tests saved if a factor is found.
[Worker #2 Dec 11 11:58] Optimal bounds are B1=744000, B2=21803000
[Worker #2 Dec 11 11:58] Chance of finding a factor is an estimated 3.55%
And the timings:
Quote:
[Worker #2 Dec 11 12:32] M107610803 stage 2 is 0.06% complete.
[Worker #2 Dec 11 12:49] M107610803 stage 2 is 0.93% complete. Time: 1031.830 sec.
[Worker #2 Dec 11 13:06] M107610803 stage 2 is 1.80% complete. Time: 1043.622 sec.
or

Quote:
[Worker #2 Dec 11 15:45] M107610803 stage 2 is 3.40% complete. Time: 961.250 sec.
[Worker #2 Dec 11 16:01] M107610803 stage 2 is 4.28% complete. Time: 959.943 sec.
After increasing the memory from 4GB to 24GB, the time taken almost doubled, but the bounds and the chance of finding a factor barely increased. Is this really optimal?
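For a back-of-the-envelope check of that intuition: stage 2 does work roughly proportional to the number of primes in (B1, B2], so the two reported bound sets should cost about the same. A throwaway sketch (my own estimate with a plain sieve, not mprime's actual cost model):

```python
def primes_between(lo, hi):
    """Count primes p with lo < p <= hi using a plain Eratosthenes sieve."""
    sieve = bytearray([1]) * (hi + 1)
    sieve[0:2] = b"\x00\x00"                       # 0 and 1 are not prime
    for p in range(2, int(hi ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(range(p * p, hi + 1, p)))
    return sum(sieve[lo + 1:])

# The bounds reported for the 4GB run vs. the 24GB run
work_4gb = primes_between(740_000, 20_837_000)
work_24gb = primes_between(744_000, 21_803_000)
print(work_24gb / work_4gb)   # only about 1.05, i.e. ~5% more stage-2 primes
```

So if both runs had really used their reported bounds, the second should have taken only a few percent longer, not twice as long.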

Last fiddled with by ZFR on 2020-12-12 at 10:16

2020-12-12, 15:39   #2
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

5166₁₀ Posts

Never seen that happen; it looks like memory access has slowed for the larger memory footprint. Is this a dual-socket / NUMA box? What OS?
2020-12-12, 16:00   #3
ZFR

Feb 2008
Meath, Ireland

2³·3·7 Posts

Quote:
 Originally Posted by VBCurtis Never seen that happen; it looks like memory access has slowed for the larger memory footprint. Is this a dual-socket / NUMA box? What OS?
Linux Mint 19.

Quote:
 Machine: Type: Desktop Mobo: Gigabyte model: Z97X-Gaming GT v: x.x serial: BIOS: American Megatrends v: F7 date: 09/19/2015
CPU: Topology: Quad Core model: Intel Core i7-4790K bits: 64 type: MT MCP arch: Haswell rev: 3 L2 cache: 8192 KiB
flags: lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 64003
Speed: 1682 MHz min/max: 800/4000 MHz Core speeds (MHz): 1: 1625 2: 1355 3: 1515 4: 1469 5: 1867 6: 1494 7: 1342 8: 1886
Memory: RAM: total: 31.22 GiB
Array-1: capacity: 32 GiB slots: 4 EC: None
Device-1: ChannelA-DIMM0 size: 8 GiB speed: 1333 MT/s
Device-2: ChannelA-DIMM1 size: 8 GiB speed: 1333 MT/s
Device-3: ChannelB-DIMM0 size: 8 GiB speed: 1333 MT/s
Device-4: ChannelB-DIMM1 size: 8 GiB speed: 1333 MT/s

2020-12-12, 16:14   #4
axn

Jun 2003

2²×1,319 Posts

Something funky is going on. This is not supposed to happen. However...

Your "before" data shows two Stage 2 running in parallel. Is that right? How many stage 2 are running in parallel in the "after" scenario?

Also, you're not showing how much memory is actually being used and how many relative primes are being processed. Can you provide those details as well (preferably for both before & after)?

Finally, are you seeing any harddisk activity indicating potential thrashing of memory?
2020-12-12, 16:32   #5
ZFR

Feb 2008
Meath, Ireland

2³×3×7 Posts

Quote:
 Originally Posted by axn Something funky is going on. This is not supposed to happen. However... Your "before" data shows two Stage 2 running in parallel. Is that right? How many stage 2 are running in parallel in the "after" scenario?
Yes, in the before scenario, one of the PRP threads was doing a P-1 on its exponent too.

In the "after" it's only this one that's doing P-1.

Quote:
 Originally Posted by axn Also, you're not showing how much memory is actually being used and how many relative primes are being processed. Can you provide those details as well (preferably for both before & after).
Sure. I've got the output log.

Here is the full output of the "before". I've removed the other workers.
Quote:
[Worker #2 Dec 10 09:50] Setting affinity to run worker on CPU core #2
[Worker #2 Dec 10 09:50] Optimal P-1 factoring of M107608939 using up to 4000MB of memory.
[Worker #2 Dec 10 09:50] Assuming no factors below 2^77 and 2 primality tests saved if a factor is found.
[Worker #2 Dec 10 09:50] Optimal bounds are B1=740000, B2=20837000
[Worker #2 Dec 10 09:50] Chance of finding a factor is an estimated 3.51%
[Worker #2 Dec 10 09:50] Using FMA3 FFT length 5760K, Pass1=1280, Pass2=4608, clm=4
[Worker #2 Dec 10 09:50] Ignoring suggested B1 value, using B1=610000 from the save file
[Worker #2 Dec 10 09:50] Ignoring suggested B2 value, using B2=7022000 from the save file
[Worker #2 Dec 10 09:50] Available memory is 1925MB.
[Worker #2 Dec 10 09:50] Using 1909MB of memory. Processing 37 relative primes (0 of 480 already processed).
[Worker #2 Dec 10 09:50] M107608939 stage 2 is 0.00% complete.
[Worker #2 Dec 10 10:06] M107608939 stage 2 is 2.16% complete. Time: 951.162 sec.
[Worker #2 Dec 10 10:22] M107608939 stage 2 is 4.31% complete. Time: 944.688 sec.
And the one from "after"

Quote:
[Worker #2 Dec 11 11:12] Optimal P-1 factoring of M107610803 using up to 24000MB of memory.
[Worker #2 Dec 11 11:12] Assuming no factors below 2^77 and 2 primality tests saved if a factor is found.
[Worker #2 Dec 11 11:12] Optimal bounds are B1=744000, B2=21803000
[Worker #2 Dec 11 11:12] Chance of finding a factor is an estimated 3.55%
[Worker #2 Dec 11 11:12] Using FMA3 FFT length 5760K, Pass1=1280, Pass2=4608, clm=4
[Worker #2 Dec 11 11:12] M107610803 stage 1 is 99.85% complete.
[Worker #2 Dec 11 11:14] M107610803 stage 1 complete. 4336 transforms. Time: 113.176 sec.
[Worker #2 Dec 11 11:14] Starting stage 1 GCD - please be patient.
[Worker #2 Dec 11 11:15] Stage 1 GCD complete. Time: 48.280 sec.
[Worker #2 Dec 11 11:15] Available memory is 23742MB.
[Worker #2 Dec 11 11:15] Using 22960MB of memory. Processing 480 relative primes (0 of 480 already processed).
[Worker #2 Dec 11 11:44] M107610803 stage 2 is 0.00% complete. Time: 1771.636 sec.
[Worker #2 Dec 11 11:46] Worker stopped.
[Worker #2 Dec 11 11:58] Waiting 5 seconds to stagger worker starts.
[Worker #2 Dec 11 11:58] Worker starting
[Worker #2 Dec 11 11:58] Setting affinity to run worker on CPU core #2
[Worker #2 Dec 11 11:58] Optimal P-1 factoring of M107610803 using up to 24000MB of memory.
[Worker #2 Dec 11 11:58] Assuming no factors below 2^77 and 2 primality tests saved if a factor is found.
[Worker #2 Dec 11 11:58] Optimal bounds are B1=744000, B2=21803000
[Worker #2 Dec 11 11:58] Chance of finding a factor is an estimated 3.55%
[Worker #2 Dec 11 11:58] Using FMA3 FFT length 5760K, Pass1=1280, Pass2=4608, clm=4
[Worker #2 Dec 11 11:58] Available memory is 23742MB.
[Worker #2 Dec 11 11:58] Using 22960MB of memory. Processing 480 relative primes (0 of 480 already processed).
[Worker #2 Dec 11 12:32] M107610803 stage 2 is 0.06% complete.
[Worker #2 Dec 11 12:49] M107610803 stage 2 is 0.93% complete. Time: 1031.830 sec.
[Worker #2 Dec 11 13:06] M107610803 stage 2 is 1.80% complete. Time: 1043.622 sec.
I don't think there is any thrashing going on.

As I said, I keep the full output logs. So if you need me to post more, let me know.

Last fiddled with by ZFR on 2020-12-12 at 16:37

2020-12-12, 17:43   #6
ZFR

Feb 2008
Meath, Ireland

A8₁₆ Posts

Is it normal that whenever I start mprime I get "Processing 480 relative primes (0 of 480 already processed)"?
2020-12-12, 18:40   #7
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006

2×41×61 Posts

Quote:
 Originally Posted by ZFR Is it normal that whenever I start mprime I get "Processing 480 relative primes (0 of 480 already processed)."
It's a function of how much RAM you allocate.
In all my experience, the bigger this number, the faster it completes:

Processing 480 relative primes
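The 480 is not arbitrary: it matches φ(2310), the count of residues coprime to 2·3·5·7·11 = 2310, which appears to be the wheel modulus mprime's stage 2 is using here (my inference from the log, not from the source). The memory setting decides how many of those 480 residue classes fit in one pass, which is why the 4GB run processed 37 at a time and the 24GB run all 480. A quick check of the count:

```python
from math import gcd

D = 2 * 3 * 5 * 7 * 11   # 2310, the presumed stage-2 wheel modulus
count = sum(1 for r in range(1, D) if gcd(r, D) == 1)
print(count)             # 480, matching "(0 of 480 already processed)" in the log
```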

2020-12-12, 19:20   #8
ZFR

Feb 2008
Meath, Ireland

2³×3×7 Posts

Is there a way to restart stage 2 only, while keeping the stage 1 result? I want to test different memory values to see what it looks like.

I have m107610803, m107610803.bu, m107610803.bu2, and m107610803.write files. If I remove them, it starts from P-1 stage 1.
2020-12-12, 20:30   #9
ZFR

Feb 2008
Meath, Ireland

2³×3×7 Posts

By the way, not sure if it matters, but P-1 stage 1 was stopped midway before doing stage 2. Basically, initially I only had around 300MB assigned:

Did stage 1 of exponent1.
Started stage 1 of exponent2 (I wondered why stage 2 of exponent1 didn't start, checked the logs, and saw there was too little memory).
Increased to 4GB; stage 2 of exponent1 starts. Normal speed. At the same time, stage 2 of one of the PRPs starts. Normal speed.
Stage 2 of exponent1 finishes.
Stage 1 of exponent2 completes. Stage 2 of exponent2 starts. Slow speed.

Anyway, I've removed those m107610803 files and am restarting this. Let me see how it goes.
2020-12-12, 22:35   #10
ZFR

Feb 2008
Meath, Ireland

250₈ Posts

OK, that looks like the reason. The first one used the original bounds for its stage 2: the ones in effect at the end of its stage 1. The second one switched bounds in the middle of its stage 1. So while the second one is slower, it has much bigger bounds.

I'm not 100% familiar with the two-stage P-1 algorithm. Are you allowed to change bounds in the middle of stage 1? When I change the memory allocation during stage 1, the bounds and speed change accordingly. But when I change it during stage 2, I get the message "Ignoring optimal bounds, using bounds from save file". Is this correct?
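For reference, the two-stage idea in a nutshell: stage 1 computes b = a^E mod N, where E packs every prime power up to B1; stage 2 then succeeds if p − 1 is B1-powersmooth apart from one extra prime q up to B2. A toy sketch (none of mprime's prime pairing or relative-prime memory tricks), demonstrated on 2^67 − 1, whose factor p = 193707721 has p − 1 = 2³·3³·5·67·2677, so B1 = 100 alone misses it but stage 2 with B2 = 3000 catches the lone prime 2677:

```python
from math import gcd, log

def primes_up_to(n):
    """All primes <= n by a simple sieve."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]

def pm1(N, B1, B2, a=3):
    # Stage 1: b = a^E mod N, where E is the product of all prime powers <= B1.
    b = a
    for p in primes_up_to(B1):
        b = pow(b, p ** int(log(B1, p)), N)   # largest p^k <= B1 (rough float use)
    g = gcd(b - 1, N)
    if 1 < g < N:
        return g          # p - 1 was already B1-powersmooth
    # Stage 2: if p - 1 = (B1-powersmooth part) * q for one prime q in (B1, B2],
    # then b^q == 1 (mod p), so p divides b^q - 1; accumulate and take one gcd.
    acc = 1
    for q in primes_up_to(B2):
        if q > B1:
            acc = acc * (pow(b, q, N) - 1) % N
    g = gcd(acc, N)
    return g if 1 < g < N else None

# 193707721 - 1 = 2^3 * 3^3 * 5 * 67 * 2677, so stage 2 nabs it via q = 2677.
print(pm1(2 ** 67 - 1, B1=100, B2=3000))   # -> 193707721
```

This also suggests why changing B1 mid-stage-1 is fine (the exponent E is just extended or truncated as it is accumulated), whereas stage 2 commits to the bounds recorded in the save file.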
2020-12-13, 02:38   #11
axn

Jun 2003

2²×1,319 Posts

Code:
[Worker #2 Dec 10 09:50] Ignoring suggested B1 value, using B1=610000 from the save file
[Worker #2 Dec 10 09:50] Ignoring suggested B2 value, using B2=7022000 from the save file
Mystery solved. The first one was not doing the B1/B2 it claimed to be doing.

Last fiddled with by axn on 2020-12-13 at 02:38
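That also squares with the timings: with the save-file bounds, the first run only covered primes in (610000, 7022000], versus (744000, 21803000] for the second. A crude x/ln x estimate (my own sanity check, not mprime's cost model) puts the ratio of stage-2 primes at about 3, in the same ballpark as the observed ~440 s versus ~1190 s per percentage point, especially since the 24GB run also needed fewer passes over the relative primes:

```python
from math import log

def approx_prime_count(x):
    # x / ln x: crude prime-count estimate, fine for a work-ratio comparison
    return x / log(x)

old_work = approx_prime_count(7_022_000) - approx_prime_count(610_000)
new_work = approx_prime_count(21_803_000) - approx_prime_count(744_000)
print(new_work / old_work)   # roughly 3
```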
