mersenneforum.org  

Old 2009-01-08, 20:54   #1
MercPrime
 
 
Jan 2009
1/n

3·7 Posts
Question on going deep and using cores

Hey All,

I've checked the forums and the help files, but I can't find an answer (it could well be that I don't know the "right" question to ask), so if anyone could point me in the right direction, it would be much appreciated.

Questions:

1. To go "deep" on a particular exponent with no factor found, let's say 100000001 (just an example, but a 9-digit exponent): when one says "67 bit depth", is that when you set p95 under manual P-1 with the parameters:

a. 67 x 2 (100000001) with B1=100000 and B2=0?

and is that equivalent to

b. 1 x 2 (100000001) with B1=100000 and B2=0?

So I guess the question is: does a. = b.? And if not, which one should I do to prep an exponent as much as possible before the LL test?

2. I have a computer with 4 cores. Is there any way to tweak p95 to use all 4 cores on the processing of 1 exponent? I checked undoc.txt and see you can manually edit this, but I don't want to futz something up. As it stands right now, doing a P-1 on an exponent only uses 25% of the CPU's capability.

Any light anyone can shed on the above would be much appreciated! Thanks!
Old 2009-01-09, 00:55   #2
Uncwilly
6809 > 6502
 
 
"""""""""""""""""""
Aug 2003
101×103 Posts

2·3·1,597 Posts

Bit depth and P-1 are oranges and limes.

Bit depth refers to how far an exponent has been trial factored.
P-1 has bounds, B1 & B2.
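To show what the B1 bound controls, here is a toy Python sketch of P-1 stage 1 on a small Mersenne number. A factor q is found when q - 1 is a product of prime powers no larger than B1, times 2p (which always divides q - 1 for a factor of 2^p - 1). The names `primes_up_to` and `pminus1_stage1` and the choice of base 3 are assumptions made for illustration; this is not how Prime95 implements it, and stage 2 (the B2 bound) is left out entirely.

```python
from math import gcd


def primes_up_to(n):
    """Return all primes <= n using a plain sieve of Eratosthenes."""
    is_prime = [True] * (n + 1)
    is_prime[0:2] = [False, False]
    for i in range(2, n + 1):
        if is_prime[i]:
            for j in range(i * i, n + 1, i):
                is_prime[j] = False
    return [i for i in range(2, n + 1) if is_prime[i]]


def pminus1_stage1(p, b1, base=3):
    """Toy P-1 stage 1 on 2**p - 1; returns a proper factor or None."""
    n = 2**p - 1
    # Every factor q of 2**p - 1 has q - 1 divisible by 2*p, so include it.
    x = pow(base, 2 * p, n)
    for r in primes_up_to(b1):
        power = r
        while power * r <= b1:      # largest power of r not exceeding B1
            power *= r
        x = pow(x, power, n)
    g = gcd(x - 1, n)
    return g if 1 < g < n else None


print(pminus1_stage1(29, 8))
```

For p = 29 and B1 = 8 this finds 233, since 233 - 1 = 2^3 · 29 is covered by the exponent. No bit depth appears anywhere in the parameters, which is the point: B1 and B2 describe the smoothness of q - 1, not the size of q.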

Normally you want to trial factor an exponent to a predetermined bit depth (based upon the size of the exponent) or until a factor is found, then do P-1.
(George has changed this a little, factor to goal_bit-2, P-1, continue to factor to goal_bit.)

If you are looking at exponents in the 332,000,000 range, the factor goal_bit level is 77.
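To make "bit depth" concrete, here is a minimal Python sketch of trial factoring 2^p - 1 between two bit levels. It uses the standard facts that any factor q of a Mersenne number with prime exponent p has the form 2kp + 1 and is 1 or 7 mod 8. The function name `trial_factor` is hypothetical, and this naive loop is only practical for tiny exponents; it is not Prime95's sieve.

```python
def trial_factor(p, bit_lo, bit_hi):
    """Look for a factor q of 2**p - 1 with 2**bit_lo < q < 2**bit_hi.

    Returns the first factor found, or None if the range is clean.
    """
    k = max(1, (2**bit_lo) // (2 * p))
    while True:
        q = 2 * k * p + 1            # candidates have the form 2kp + 1
        if q >= 2**bit_hi:
            return None              # exponent is now "factored to bit_hi bits"
        if q > 2**bit_lo and q % 8 in (1, 7) and pow(2, p, q) == 1:
            return q                 # 2**p == 1 (mod q), so q divides 2**p - 1
        k += 1


print(trial_factor(29, 7, 8))    # factor between 2^7 and 2^8
print(trial_factor(29, 8, 10))   # nothing between 2^8 and 2^10
```

Saying an exponent has been taken "to 67 bits" just means this kind of search has covered every candidate below 2^67 without success.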


If you are in a rush to get a particular exponent done (and are willing to possibly waste some effort), you can have multiple cores work on the exponent. Here is how I would structure a worktodo.txt file to get 332,000,000 (a hypothetical example) from a pre-existing bit depth of 61 up to 74 on a 2-core machine.
Code:
[worker #1]
factor=332000000,61,62
factor=332000000,63,64
factor=332000000,65,66
factor=332000000,67,68
factor=332000000,69,70
factor=332000000,71,72
factor=332000000,73,74

[worker #2]
factor=332000000,62,63
factor=332000000,64,65
factor=332000000,66,67
factor=332000000,68,69
factor=332000000,70,71
factor=332000000,72,73
You would then need to check on the program so that, if a factor is found, you can stop it and get it working on a new number.
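Typing those lines by hand is error-prone, so here is a small Python sketch that generates the same interleaved layout. The `factor=exponent,from,to` line format and the `[worker #N]` headers are copied from the example above, not taken from the Prime95 documentation, so check them against your version before using the output. The function name `interleaved_worktodo` is made up for illustration.

```python
def interleaved_worktodo(exponent, start_bit, end_bit, workers=2):
    """Deal one-bit trial factoring ranges out to workers round-robin."""
    sections = [[] for _ in range(workers)]
    for i, lo in enumerate(range(start_bit, end_bit)):
        sections[i % workers].append(f"factor={exponent},{lo},{lo + 1}")

    out = []
    for number, lines in enumerate(sections, start=1):
        out.append(f"[worker #{number}]")
        out.extend(lines)
        out.append("")               # blank line between sections
    return "\n".join(out).rstrip() + "\n"


print(interleaved_worktodo(332000000, 61, 74, workers=2))
```

With two workers this reproduces the listing above: worker #1 gets 61-62, 63-64, and so on up to 73-74, and worker #2 gets the levels in between.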
Old 2009-01-09, 02:24   #3
petrw1
1976 Toyota Corona years forever!
 
 
"Wayne"
Nov 2006
Saskatchewan, Canada

2·3·773 Posts

Quote:
Originally Posted by MercPrime
2. I have a computer with 4 cores. Is there any way to tweak p95 to use all 4 cores on the processing of 1 exponent? I checked undoc.txt and see you can manually edit this, but I don't want to futz something up. As it stands right now, doing a P-1 on an exponent only uses 25% of the CPU's capability.
In most cases, when you assign multiple cores to one "test" you lose some potential throughput due to contention for shared resources (possibly memory, bus, etc.).

For example, on my 4-core Q9550 each core can complete an LL test on an exponent in the 47M range in about 35 days. If I configure it to use 2 cores for an exponent, it will take about 20 days. Going to 3 or 4 cores gets even worse.

However, in a recent test on my 2-core E6600 I found that an 18M double check ran almost exactly twice as fast with two cores.

Last fiddled with by petrw1 on 2009-01-09 at 02:25
Old 2009-01-09, 04:12   #4
MercPrime
 
 
Jan 2009
1/n

25₈ Posts

Thanks Uncwilly! The syntax for the worktodo file helps a lot. (There isn't a menu for "bit depth" in the Advanced tab, so with the whole "bit depth" thing everyone talks about, I suppose everyone assumes you know the syntax from the "old days" or something.)


Petrw1 - Yeah, on my particular machine, if I use 3 or more cores there is this weird jump in the 4096K range, then it goes down by double, and I've seen the same thing when using 3+ cores for an LL test. Definitely agree: I seem to get the best results running 2 cores.

MP

Last fiddled with by MercPrime on 2009-01-09 at 04:13
Old 2009-01-09, 07:42   #5
Freightyard
 
Nov 2008
San Luis Obispo CA

27 Posts

Quote:
Originally Posted by MercPrime
on my particular machine, if I use 3 or more cores there is this weird jump in the 4096K range, then it goes down by double, and I've seen the same thing when using 3+ cores for an LL test. Definitely agree: I seem to get the best results running 2 cores.
Most of the Core 2 Quad chips are actually two dual-core dies in one package. Thus, each dual-core die has its own independent L2 cache. Running three or four cores on one exponent causes data to be needed that is in the other (non-local) cache.
Old 2009-01-09, 10:03   #6
lycorn
 
 
Sep 2002
Oeiras, Portugal

3·487 Posts

Quote:
Originally Posted by petrw1
For example, on my 4-core Q9550 each core can complete an LL test on an exponent in the 47M range in about 35 days.
With the computer crunching 24/7?
I was expecting it to be faster. :surprised. What is the iteration time?
Old 2009-01-09, 15:29   #7
petrw1
1976 Toyota Corona years forever!
 
 
"Wayne"
Nov 2006
Saskatchewan, Canada

4638₁₀ Posts

Quote:
Originally Posted by lycorn
With the computer crunching 24/7?
I was expecting it to be faster. :surprised. What is the iteration time?
First point: I have NOT overclocked.

With only one core doing an LL at 47.7M it was about 0.058 seconds per iteration.
With 3 cores doing LL at that level and 1 core doing P-1, I was averaging about 0.066 seconds while P-1 was in Phase 1 and about 0.068 when P-1 was in Phase 2.
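As a sanity check, those iteration times can be converted to days per test, given that a Lucas-Lehmer test of 2^p - 1 needs p - 2 squaring iterations. A quick Python sketch (the helper name `ll_days` is just for illustration, and it assumes the machine runs 24/7 at a constant iteration time):

```python
SECONDS_PER_DAY = 86_400


def ll_days(exponent, seconds_per_iteration):
    """Estimated wall-clock days for one LL test at a steady iteration time."""
    iterations = exponent - 2        # LL test of 2**p - 1 takes p - 2 iterations
    return iterations * seconds_per_iteration / SECONDS_PER_DAY


for label, secs in [("1 core busy", 0.058),
                    ("4 busy, P-1 phase 1", 0.066),
                    ("4 busy, P-1 phase 2", 0.068)]:
    print(f"{label:20s} {ll_days(47_700_000, secs):5.1f} days")
```

That works out to roughly 32 days with one core busy and 36 to 38 days with all four busy, which brackets the "about 35 days" figure quoted earlier.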

Related to that, my P-1 core is working in the 50M range using 1024M RAM. It is taking about 21 hours for Phase 1 and about 36 hours for Phase 2, for a total of 57 hours. The Test... Status menu keeps telling me the entire P-1 should finish in 50 hours.

If these numbers are out of line then I need to check into it.
Old 2009-01-09, 23:00   #8
TheJudger
 
 
"Oliver"
Mar 2005
Germany

11·101 Posts

Quote:
Originally Posted by Uncwilly
If you are in a rush to get a particular exponent done (and are willing to possibly waste some effort), you can have multiple cores work on the exponent. Here is how I would structure a worktodo.txt file to get 332,000,000 (a hypothetical example) from a pre-existing bit depth of 61 up to 74 on a 2-core machine.
Code:
[worker #1]
factor=332000000,61,62
factor=332000000,63,64
factor=332000000,65,66
factor=332000000,67,68
factor=332000000,69,70
factor=332000000,71,72
factor=332000000,73,74

[worker #2]
factor=332000000,62,63
factor=332000000,64,65
factor=332000000,66,67
factor=332000000,68,69
factor=332000000,70,71
factor=332000000,72,73
You would then need to check on the program so that, if a factor is found, you can stop it and get it working on a new number.
Shouldn't it be
Code:
[worker #1]
factor=332000000,61,73

[worker #2]
factor=332000000,73,74
since factoring from 0 to 73 bits takes approximately as long as factoring from 73 bits to 74 bits?
Old 2009-01-10, 00:47   #9
Uncwilly
6809 > 6502
 
 
"""""""""""""""""""
Aug 2003
101×103 Posts

2×3×1,597 Posts

Quote:
Originally Posted by TheJudger
Shouldn't it be
Code:
[worker #1]
factor=332000000,61,73

[worker #2]
factor=332000000,73,74
since factoring from 0 to 73 bits takes approximately as long as factoring from 73 bits to 74 bits?
Depends on whether you want it done in the absolute shortest time possible, or about as fast as possible while avoiding some wasted effort. If one were running 3 cores, sure, do the high bit on one and interleave the others (even a smarter interleave).
Old 2009-01-10, 11:07   #10
S485122
 
 
Sep 2006
Brussels, Belgium

2×5×167 Posts

Quote:
Originally Posted by Uncwilly
Depends on whether you want it done in the absolute shortest time possible, or about as fast as possible while avoiding some wasted effort. If one were running 3 cores, sure, do the high bit on one and interleave the others (even a smarter interleave).
Interleaving could cause problems: if for one reason or another you send results out of order, the server will reject them with the "Result not needed" error. This would mean you have to communicate manually at least, and perhaps even communicate the results via the web site rather than via the program.

Even if you do the lower bit levels on one core and the highest bit level on another, you could find a factor early on in the highest bit level, meaning that, first of all, you are not sure you have found the smallest factor*, and also that you have to rearrange all your work to stop the now unnecessary testing.

As for the speed-up, it will be relative: each level takes double the time of the preceding one, which means that working sequentially you use at most twice the time of the longest level. But the minimum time (when no factor is found) is the time needed by the highest bit level. That means that you have to put all the other bit levels on the other core. Bringing in more cores will not help to shorten the time. (In your example of interleaving, Worker #1 would need twice the time Worker #2 needs to complete the assignment.) All in all, the method proposed by TheJudger makes the most sense if working out of sequence.
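The timing argument can be checked with a short Python sketch. It assumes the simple model used in this thread, that each bit level costs twice the previous one, and that no factor is found. Costs are in units of "time to go from 61 to 62 bits"; the helper names `level_cost` and `wall_time` are made up for illustration.

```python
def level_cost(lo_bit, base_bit=61):
    """Relative cost of one bit level, doubling with every level."""
    return 2 ** (lo_bit - base_bit)


def wall_time(assignment):
    """Elapsed time for a list of per-worker lists of (from_bit, to_bit)."""
    return max(
        sum(level_cost(b) for lo, hi in ranges for b in range(lo, hi))
        for ranges in assignment
    )


sequential = [[(61, 74)]]
interleaved = [[(b, b + 1) for b in range(61, 74, 2)],   # worker #1
               [(b, b + 1) for b in range(62, 74, 2)]]   # worker #2
high_bit_split = [[(61, 73)], [(73, 74)]]                # TheJudger's layout

print(wall_time(sequential))       # 8191
print(wall_time(interleaved))      # 5461
print(wall_time(high_bit_split))   # 4096
```

Under that model the interleaved layout finishes in about two thirds of the sequential time, while the 61-73 / 73-74 split takes almost exactly half, which is the best two cores can do because the top bit level alone costs 4096 units.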

I would stick to sequential trial factoring and use the other cores to work on other exponents ;-)

Jacob

* Prime95 will stop at the first factor found, so the factor reported is not always the smallest, but this is within one bit level.

Last fiddled with by S485122 on 2009-01-10 at 11:22 Reason: the interleaving schemes
Old 2009-01-10, 15:28   #11
cheesehead
 
 
"Richard B. Woods"
Aug 2002
Wisconsin USA

2^2×3×641 Posts

Because the GIMPS database currently has no way to record that TF on some exponent skipped some bit levels (e.g., 0-63, 66-67, without any TF from 2^63 to 2^66, will be recorded simply as having been TFed to 67), I urge all TF to be strictly sequential. (But see OTOH below.)

Not even
Code:
[worker #1]
factor=332000000,61,73

[worker #2]
factor=332000000,73,74
, because very few users are likely to see the importance of completing the range to 2^73 if worker #2 finds a factor. (Not to mention the gap left if factors are found at, for instance, both the 68th and 74th bit levels.)

Of course, it would be better to institute a way to accurately record discontinuous, partial and fractional TF-range completions.

On The Other Hand: We currently know for sure that the GIMPS recording of simply the highest TF bit-level is misleading and incomplete, and thus are not leaving as solid a legacy of comprehensive sweep there as we are in L-L testing. This argues against my previous admonition of strictly-sequential TF, since even those cases must and will be viewed as suspect in the future, as long as our current recording of results is flawed as it is.

- - -

Just to quantify the inefficiency of
Code:
[worker #1]
factor=332000000,61,73

[worker #2]
factor=332000000,73,74
:

Worker #1 has a factor-finding chance of 1/62 + 1/63 + ... + 1/73. Worker #2's chance is only 1/74 even though it requires slightly (~ 0.024%) more time than worker #1's assignment. It's far more likely that worker #1 will find a factor before worker #2 does (or finishes) than that worker #2 will find a factor before worker #1 does (or finishes). Worker #2's effort is more likely to be wasted than worker #1's.
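Those two figures are easy to reproduce in Python with exact fractions. The sketch takes the estimate used above as given (a chance of about 1/b of a factor at the b-th bit level, simply added up across levels) and the doubling-per-level cost model; neither is exact.

```python
from fractions import Fraction

# Worker #1 covers bit levels 62 through 73, worker #2 only level 74.
chance_w1 = sum(Fraction(1, b) for b in range(62, 74))
chance_w2 = Fraction(1, 74)

# Work is proportional to the width of the range searched.
work_w1 = 2**73 - 2**61
work_w2 = 2**74 - 2**73
extra_time = Fraction(work_w2, work_w1) - 1

print(f"worker #1 chance: {float(chance_w1):.3f}")
print(f"worker #2 chance: {float(chance_w2):.4f}")
print(f"worker #2 extra time: {float(extra_time):.4%}")
```

Worker #1's chance comes to about 0.18 against about 0.0135 for worker #2, and the extra time is exactly 1/4095, i.e. the ~0.024% quoted above.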

Last fiddled with by cheesehead on 2009-01-10 at 16:09

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.