mersenneforum.org Question on going deep and using cores

2009-01-08, 20:54   #1
MercPrime

Jan 2009

1/n 15₁₆ Posts

Question on going deep and using cores

Hey all,

I've checked the forums and the help, but I can't seem to find an answer (quite possibly because I don't know the "right" question to ask), so if anyone could point me in the right direction, it would be much appreciated.

Questions:

1. To go "deep" on a particular exponent with no factor found, let's say 100000001 (just an example, but a 9-digit exponent): when one says "67 bit depth", is that when you set p95 under manual P-1 with the parameters
a. 67 x 2^(100000001) with B1=100000 and B2=0?
And is that equivalent to
b. 1 x 2^(100000001) with B1=100000 and B2=0?
So I guess the question is: does a. = b.? And if not, which one should I do to prep an exponent as much as possible before the LL test?

2. I have a computer with 4 cores. Is there any way to tweak p95 to use all 4 cores on the processing of 1 exponent? I checked undoc.txt and see you can manually edit this, but I don't want to futz something up. As it stands right now, doing a P-1 on an exponent only uses 25% of the CPU's capability.

Any light anyone can shed on the above would be much appreciated!

thanks!
2009-01-09, 00:55   #2
Uncwilly
6809 > 6502

"""""""""""""""""""
Aug 2003
101×103 Posts

10,009 Posts

Bit depth and P-1 are oranges and limes. Bit depth refers to how far an exponent has been trial factored. P-1 has bounds, B1 & B2. Normally you want to trial factor an exponent to a predetermined bit depth (based upon the size of the exponent) or until a factor is found, then do P-1. (George has changed this a little: factor to goal_bit-2, do P-1, then continue to factor to goal_bit.) If you are looking at exponents in the 332,000,000 range, the factor goal_bit level is 77.

If you are in a rush to get a particular exponent done (and are willing to possibly waste some effort), you can have multiple cores work on the exponent. Here is how I would structure a worktodo.txt file to get 332,000,0001 (a hypothetical example) from a pre-existing bit depth of 61 up to 74 on a 2 core machine.
Code:
[worker #1]
factor=332000000,61,62
factor=332000000,63,64
factor=332000000,65,66
factor=332000000,67,68
factor=332000000,69,70
factor=332000000,71,72
factor=332000000,73,74

[worker #2]
factor=332000000,62,63
factor=332000000,64,65
factor=332000000,66,67
factor=332000000,68,69
factor=332000000,70,71
factor=332000000,72,73
You would then need to check on the program to make sure that, if a factor is found, you stop the program and get it working on a new number.
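To make "bit depth" concrete, here is a minimal Python sketch of trial factoring a Mersenne number 2^p - 1 between two bit levels. It relies on two standard facts: any factor q of 2^p - 1 (p an odd prime) has the form q = 2kp + 1 and satisfies q ≡ ±1 (mod 8). The function name `trial_factor` and its parameters are hypothetical, chosen for illustration; this is not how Prime95 is implemented (it sieves candidates and is far faster), only the same idea at toy scale.

```python
def trial_factor(p: int, from_bits: int, to_bits: int):
    """Look for a factor q of 2**p - 1 with 2**from_bits <= q < 2**to_bits.

    Illustrative helper (not Prime95 code). Returns the first factor
    found in the range, or None if the range contains no factor.
    """
    lo, hi = 2 ** from_bits, 2 ** to_bits
    # Smallest k >= 1 such that q = 2*k*p + 1 is at least lo.
    k = max(1, (lo + 2 * p - 2) // (2 * p))
    q = 2 * k * p + 1
    while q < hi:
        # Only candidates with q = 1 or 7 (mod 8) can divide 2**p - 1.
        # q divides 2**p - 1 exactly when 2**p = 1 (mod q).
        if q % 8 in (1, 7) and pow(2, p, q) == 1:
            return q
        q += 2 * p
    return None


# 2**11 - 1 = 2047 = 23 * 89, so a factor shows up below 2**5.
print(trial_factor(11, 0, 5))   # 23
# 2**29 - 1 has no factor below 2**7; the smallest one is 233 (< 2**8).
print(trial_factor(29, 0, 7))   # None
print(trial_factor(29, 7, 8))   # 233
```

Saying an exponent is "factored to 67 bits" means a search like this has covered every candidate below 2^67 without success.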
2009-01-09, 02:24   #3
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006

2⁵·149 Posts

Quote:
 Originally Posted by MercPrime 2. I have a computer with 4 cores. Is there any way to tweak p95 to use all 4 cores on the processing of 1 exponent? I checked undoc.txt and see you can manually edit this, but I don't want to futz something up. As it stands right now, doing a P-1 on an exponent only uses 25% of the CPU's capability.
In most cases, when you assign multiple cores to one "test" you lose some potential throughput due to contention for shared resources (memory? bus? etc.).

For example, on my 4-core Q9550 each core can complete an LL test on an exponent in the 47M range in about 35 days. If I configure it to use 2 cores on one exponent, that test takes about 20 days rather than the ideal 17.5. Going to 3 or 4 cores scales even worse.

However, in a recent test on my 2-core E6600 I found that an 18M double check ran almost exactly twice as fast with two cores.

Last fiddled with by petrw1 on 2009-01-09 at 02:25

2009-01-09, 04:12   #4
MercPrime

Jan 2009

1/n 21₁₀ Posts

Thanks Uncwilly! The syntax for the worktodo file helps a lot. (There isn't a menu for "bit depth" in the Advanced tab, so with the whole "bit depth" thing everyone talks about, I suppose everyone assumes you know the syntax from the "old days" or something.)

Petrw1 - Yeah, on my particular machine, if I use 3 or more cores there is this weird jump in the 4096k range, then it goes down by double, and I've seen the same thing when using 3+ cores for an LL test. Definitely agree - I seem to get the best results running 2 cores.

MP

Last fiddled with by MercPrime on 2009-01-09 at 04:13
2009-01-09, 07:42   #5
Freightyard

Nov 2008
San Luis Obispo CA

10000000₂ Posts

Quote:
 Originally Posted by MercPrime on my particular machine if I do 3 or more cores there is this weird jump in the 4096k range, then goes down by double, and I've seen the same thing with using 3+ cores for LL test. Definitely agree - I seem to get best results with running 2 x cores
Most of the Core 2 Quad chips are actually two dual-core dies in one package. Each dual-core die has its own independent L2 cache, so running three or four cores on one exponent means a core often needs data that sits in the other (non-local) cache.

2009-01-09, 10:03   #6
lycorn

"GIMFS"
Sep 2002
Oeiras, Portugal

2·3²·83 Posts

Quote:
 Originally Posted by petrw1 For example on my Q9550 4-core each core can complete a LL test on an exponent in the 47M range in about 35 days.
I was expecting it to be faster. :surprised. What is the iteration time?

2009-01-09, 15:29   #7
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006

2⁵·149 Posts

Quote:
 Originally Posted by lycorn With the computer crunching 24/7? I was expecting it to be faster. :surprised. What is the iteration time?
First point: I have NOT overclocked.

With only one core doing an LL at 47.7M it was about 0.058 seconds per iteration.
With 3 cores doing LL at that level and 1 core doing P-1, I was averaging about 0.066 seconds per iteration while P-1 was in Phase 1 and about 0.068 when P-1 was in Phase 2.
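As a sanity check, these iteration times can be converted into whole-test durations. A Lucas-Lehmer test of 2^p - 1 performs p - 2 squarings, so the estimate is just iterations times seconds per iteration. The exponent below is a round stand-in for "47.7M", not an actual assignment, and `ll_days` is an illustrative name.

```python
SECONDS_PER_DAY = 86_400


def ll_days(exponent: int, seconds_per_iteration: float) -> float:
    """Estimated wall-clock days for an LL test of 2**exponent - 1."""
    iterations = exponent - 2  # an LL test performs p - 2 squarings
    return iterations * seconds_per_iteration / SECONDS_PER_DAY


EXPONENT = 47_700_000  # assumption: round stand-in for "47.7M"

alone = ll_days(EXPONENT, 0.058)  # one core busy, others idle
busy = ll_days(EXPONENT, 0.066)   # three LL cores plus one P-1 core

print(f"one LL core only:     {alone:.1f} days")
print(f"all four cores busy:  {busy:.1f} days")
```

This gives roughly 32 days with one core active and roughly 36 days with all four cores loaded, in line with the "about 35 days" quoted earlier in the thread.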

Related to that, my P-1 core is working in the 50M range using 1024M RAM. It is taking about 21 hours for Phase 1 and about 36 hours for Phase 2, for a total of 57 hours. The Test... Status menu keeps telling me the entire P-1 should finish in 50 hours.

If these numbers are out of line then I need to check into it.

2009-01-09, 23:00   #8
TheJudger

"Oliver"
Mar 2005
Germany

11×101 Posts

Quote:
 Originally Posted by Uncwilly If you are in a rush to get a particular exponent done (and are willing to possibly waste some effort), you can have multiple cores work on the exponent. Here is how I would structure a worktodo.txt file to get 332,000,0001 (a hypothetical example), from a pre-existing bit depth of 61, up to 74, on a 2 core machine. Code: [worker #1] factor=332000000,61,62 factor=332000000,63,64 factor=332000000,65,66 factor=332000000,67,68 factor=332000000,69,70 factor=332000000,71,72 factor=332000000,73,74 [worker #2] factor=332000000,62,63 factor=332000000,64,65 factor=332000000,66,67 factor=332000000,68,69 factor=332000000,70,71 factor=332000000,72,73 You would then need to check on the program to make sure that if a factor is found that you stop the program and get it working a new number.
Shouldn't it be
Code:
[worker #1]
factor=332000000,61,73

[worker #2]
factor=332000000,73,74
since factoring from 0 to 73 bits takes approximately as long as factoring from 73 bits to 74 bits?
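The "approximately as long" claim follows from a geometric sum. The sketch below assumes the simplified cost model used in this thread, where each bit level takes twice as long as the one before it; real trial-factoring timings deviate from this somewhat. `level_cost` is an illustrative name and its unit is arbitrary.

```python
def level_cost(bit: int) -> int:
    """Relative cost of factoring from 2**(bit-1) to 2**bit (assumed doubling model)."""
    return 2 ** bit


# Worker #1 in the split above: levels 61->62 up through 72->73.
worker1 = sum(level_cost(b) for b in range(62, 74))
# Everything from "0 bits" up to 73 bits, as in the wording above.
all_below = sum(level_cost(b) for b in range(1, 74))
# Worker #2: the single level 73->74.
worker2 = level_cost(74)

print(worker1 / worker2)    # just under 1
print(all_below / worker2)  # even closer to 1
```

Under this model all the lower levels together never quite add up to the single top level, so the two workers finish at nearly the same time.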

2009-01-10, 00:47   #9
Uncwilly
6809 > 6502

"""""""""""""""""""
Aug 2003
101×103 Posts

10,009 Posts

Quote:
 Originally Posted by TheJudger Shouldn't it be Code: [worker #1] factor=332000000,61,73 [worker #2] factor=332000000,73,74 since factoring from 0 to 73 bits needs approximately as long as factoring from 73 bits to 74 bits?
It depends on whether you want it done in the absolute shortest time possible, or about as fast as possible while avoiding some wasted effort. If one were running 3 cores, then sure, do the high bit on one and interleave the others (or even use a smarter interleave).

2009-01-10, 11:07   #10
S485122

"Jacob"
Sep 2006
Brussels, Belgium

1744₁₀ Posts

Quote:
 Originally Posted by Uncwilly It depends on whether you want it done in the absolute shortest time possible, or about as fast as possible while avoiding some wasted effort. If one were running 3 cores, then sure, do the high bit on one and interleave the others (or even use a smarter interleave).
Interleaving could cause problems: if for one reason or another you send results out of order, the server will reject them with the "Result not needed" error. This would mean you have to communicate manually at least, and perhaps even submit the results via the web site rather than via the program.

Even if you do the lower levels on one core and the highest bit on another, you could find a factor early on in the highest bit, meaning that, first of all, you are not sure you have found the smallest factor*, but also that you have to rearrange all your work to stop the now unnecessary testing.

As for the speed-up, it will be relative: each level takes double the time of the preceding one, which means that working sequentially you use at most twice the time of the longest level. But the minimum time (when no factor is found) is the time needed by the highest bit level. That means that you have to put all the other bit levels on the other core. Bringing in more cores will not help to shorten the time. (In your example of interleaving, Worker #1 would need twice the time Worker #2 needs to complete its assignment.) All in all, the method proposed by TheJudger makes the most sense if working out of sequence.
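These scheduling claims can be checked with a small model. The sketch assumes, as stated above, that each bit level costs twice the previous one and that no factor is found, so every worker runs its whole list. All names (`loads`, `makespan`, the plan variables) are illustrative.

```python
def level_cost(bit: int) -> int:
    """Relative cost of factoring from 2**(bit-1) to 2**bit (assumed doubling model)."""
    return 2 ** bit


def loads(plan):
    """Total cost assigned to each worker in a plan (list of lists of levels)."""
    return [sum(level_cost(b) for b in worker) for worker in plan]


def makespan(plan):
    """Time until the slowest worker finishes, assuming no factor is found."""
    return max(loads(plan))


LEVELS = list(range(62, 75))  # levels 61->62 ... 73->74, named by upper bit

sequential = [LEVELS]
interleaved = [LEVELS[0::2], LEVELS[1::2]]  # 62,64,...,74 and 63,65,...,73
split = [LEVELS[:-1], LEVELS[-1:]]          # all lower levels vs. the top level
three_cores = [LEVELS[:-2], LEVELS[-2:-1], LEVELS[-1:]]

for name, plan in [("interleaved", interleaved),
                   ("split", split),
                   ("three cores", three_cores)]:
    print(f"{name:12s} speed-up over sequential: "
          f"{makespan(sequential) / makespan(plan):.3f}")
```

In this model the interleaved plan gives about a 1.5x speed-up with Worker #1 carrying twice Worker #2's load, the split plan gives about 2x, and a third core adds nothing because the top level alone sets the finishing time.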

I would stick to sequential trial factoring and use the other cores to work on other exponents ;-)

Jacob

* Prime95 will stop at the first factor found, so the factor reported is not always the smallest, but this is within one bit level.

Last fiddled with by S485122 on 2009-01-10 at 11:22 Reason: the interleaving schemes

2009-01-10, 15:28   #11
cheesehead

"Richard B. Woods"
Aug 2002
Wisconsin USA

2²×3×641 Posts

Because the GIMPS database currently has no way to record that TF on some exponent skipped some bit levels (e.g., 0-63, 66-67, without any TF from 2^63 to 2^66, will be recorded simply as having been TFed to 67), I urge all TF to be strictly sequential. (But see OTOH below.)

Not even
Code:
[worker #1]
factor=332000000,61,73

[worker #2]
factor=332000000,73,74
, because very few users are likely to see the importance of completing the range to 2^73 if worker #2 finds a factor. (Not to mention the gap left if factors are found at, for instance, both the 68th and 74th bit levels.)

Of course, it would be better to institute a way to accurately record discontinuous, partial and fractional TF-range completions.

On The Other Hand: We currently know for sure that the GIMPS recording of simply the highest TF bit-level is misleading and incomplete, and thus are not leaving as solid a legacy of comprehensive sweep there as we are in L-L testing. This argues against my previous admonition of strictly-sequential TF, since even those cases must and will be viewed as suspect in the future, as long as our current recording of results is flawed as it is.

- - -

Just to quantify the inefficiency of
Code:
[worker #1]
factor=332000000,61,73

[worker #2]
factor=332000000,73,74
:
Worker #1 has a factor-finding chance of 1/62 + 1/63 + ... + 1/73. Worker #2's chance is only 1/74 even though it requires slightly (~ 0.024%) more time than worker #1's assignment.

It's far more likely that worker #1 will find a factor before worker #2 does (or finishes) than that worker #2 will find a factor before worker #1 does (or finishes). Worker #2's effort is more likely to be wasted than worker #1's.

Last fiddled with by cheesehead on 2009-01-10 at 16:09
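The figures in the post above can be reproduced directly. The sketch takes the post's heuristic at face value: a chance of roughly 1/b of finding a factor between 2^(b-1) and 2^b, with the per-level chances simply added. It also assumes the doubling cost model used earlier in the thread. Variable names are illustrative.

```python
from fractions import Fraction

# Heuristic from the post: chance of a factor in level (b-1)->b is about 1/b.
worker1_chance = sum(Fraction(1, b) for b in range(62, 74))  # levels up to 73
worker2_chance = Fraction(1, 74)                              # level 73->74 only

# Doubling cost model: a level ending at bit b costs 2**b units.
worker1_time = sum(2 ** b for b in range(62, 74))
worker2_time = 2 ** 74
extra_time = Fraction(worker2_time, worker1_time) - 1

print(f"worker #1 chance: {float(worker1_chance):.3f}")
print(f"worker #2 chance: {float(worker2_chance):.4f}")
print(f"worker #2 extra time: {float(extra_time):.3%}")
```

Worker #1's summed chance comes out near 0.18 against about 0.0135 for worker #2, while worker #2 needs 1/4095 (about 0.024%) more time, matching the post.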
