mersenneforum.org (https://www.mersenneforum.org/index.php)
-   PrimeNet (https://www.mersenneforum.org/forumdisplay.php?f=11)
-   -   P-1 before LL (https://www.mersenneforum.org/showthread.php?t=18370)

Rodrigo 2013-07-17 05:23

P-1 before LL
 
Not sure if this question belongs here or in the Software subforum:

Is there a way to set a worker to do P-1 prior to LL on the same exponent, or are we limited to setting each worker to do one [B]or[/B] the other?

Sorry if this question has been answered already, but nothing jumped out at me during a visual scan, and the terms are common enough that Search might yield thousands of results to wade through. At least, I didn't come up with rare enough search terms.

Thanks!

Rodrigo

prgamma10 2013-07-17 06:25

You might be able to edit the worktodo file (if you were assigned the P-1 task) and force manual communication (to avoid duplicating work).

ET_ 2013-07-17 07:36

[QUOTE=Rodrigo;346503]Not sure if this question belongs here or in the Software subforum:

Is there a way to set a worker to do P-1 prior to LL on the same exponent, or are we limited to setting each worker to do one [B]or[/B] the other?

Sorry if this question has been answered already, but nothing jumped out at me during a visual scan, and the terms are common enough that Search might yield thousands of results to wade through. At least, I didn't come up with rare enough search terms.

Thanks!

Rodrigo[/QUOTE]

IIRC, this is the standard configuration: every exponent chosen for an LL test should undergo factorization up to a defined limit, and P-1, before the LL starts, unless:

a - Someone already did the factorization work stand-alone, or
b - The PC does not have enough RAM to do P-1 comfortably.

Luigi

If you didn't

LaurV 2013-07-17 07:49

Do you want to [U]duplicate[/U] P-1 work, or just to be sure that you [U]don't miss[/U] the P-1 work which has to be done before any LL, in case it was NOT done? Those are two different things. (Well, the question is a bit odd, as you duplicate a lot of work anyway: checkpoint files are not stored for P-1, so you start from scratch every time.)

It would be nice if, depending on your memory settings for P95, the program would calculate B1 and B2 limits and do some more P-1 whenever those limits exceed the "already done" values. But real life is different. What you can do is set a lot of memory for P-1 stage 2, and modify the "worktodo.txt" to replace the last ",1" with ",0" in all [URL="http://www.mersennewiki.org/index.php/Worktodo.txt"]"Test=" and "DoubleCheck=" lines[/URL]. That would convince P95 to always do "heavy" P-1 on all expos, regardless of whether the work was done before or not (or done poorly, partially, etc.). Also, to get rid of the burden, you may take exponents from GPU72, which are well TF-ed and P-1-ed.

Also, see the "SequentialWorkToDo=0" in undoc.txt file, that may be what you want.

Rodrigo 2013-07-17 15:56

[QUOTE=ET_;346511]IIRC, this is the standard configuration: every exponent chosen for an LL test should undergo factorization up to a defined limit, and P-1, before the LL starts, unless:

a - Someone already did the factorization work stand-alone, or
b - The PC does not have enough RAM to do P-1 comfortably.

Luigi

If you didn't[/QUOTE]
Luigi,

Something seems to be missing at the end there. :smile:

Rodrigo

Rodrigo 2013-07-17 16:09

[QUOTE=LaurV;346512]Do you want to [U]duplicate[/U] P-1 work, or just to be sure that you [U]don't miss[/U] the P-1 work which has to be done before any LL, in case it was NOT done? Those are two different things. (Well, the question is a bit odd, as you duplicate a lot of work anyway: checkpoint files are not stored for P-1, so you start from scratch every time.)

It would be nice if, depending on your memory settings for P95, the program would calculate B1 and B2 limits and do some more P-1 whenever those limits exceed the "already done" values. But real life is different. What you can do is set a lot of memory for P-1 stage 2, and modify the "worktodo.txt" to replace the last ",1" with ",0" in all [URL="http://www.mersennewiki.org/index.php/Worktodo.txt"]"Test=" and "DoubleCheck=" lines[/URL]. That would convince P95 to always do "heavy" P-1 on all expos, regardless of whether the work was done before or not (or done poorly, partially, etc.). Also, to get rid of the burden, you may take exponents from GPU72, which are well TF-ed and P-1-ed.

Also, see the "SequentialWorkToDo=0" in undoc.txt file, that may be what you want.[/QUOTE]

From the two choices in your first paragraph, I'd prefer to do the latter.

I guess that my question has to do with the fact that, [B]sometimes,[/B] one of my workers will perform P-1 before starting an LL. Therefore I was wondering if there is a way to set things so that the worker [B]always[/B] picks out exponents that need P-1 work, and then (if applicable) goes on to do the LL on that exponent.

A (maybe) separate item: the long paragraph in your reply (the part dealing with setting the last value to 0) appears to involve duplicating a lot of work previously done, would that be right?

Will experiment with the "SequentialWorkToDo=0" setting and see how it works out.

Thanks!

Rodrigo

ET_ 2013-07-17 16:31

[QUOTE=Rodrigo;346538]Luigi,

Something seems to be missing at the end there. :smile:

Rodrigo[/QUOTE]

It was not missing, it was exceeding... :smile:

LaurV 2013-07-18 05:49

[QUOTE=Rodrigo;346540]From the two choices in your first paragraph, I'd prefer to do the latter.
[/QUOTE]
Indeed, that was my guess. Still, my question was not very clear: it did not specify (for the second case) whether you wanted [U]to do[/U] or [U]to avoid[/U] doing the P-1. But your next paragraph clarified it perfectly.
[QUOTE]
I guess that my question has to do with the fact that, [B]sometimes,[/B] one of my workers will perform P-1 before starting an LL. Therefore I was wondering if there is a way to set things so that the worker [B]always[/B] picks out exponents that need P-1 work, and then (if applicable) goes on to do the LL on that exponent.
[/QUOTE]
That was intended, and it was "normal" behavior long ago, before we became faster at doing the P-1 stuff with GPUs. If you only want to get exponents which have not had enough P-1 done, then you had better pick them from the "seventies" or "eighties" (the 70-80M range), where the P-1 front has not reached yet. Caution: this is only suitable if you have an old computer with a lot of RAM; new computers are better at doing LL tests. Otherwise you had better let the GPUs do the P-1, which is faster, or leave it to the guys with old computers who would have to wait too long to finish LL tests and prefer to do P-1 (otherwise you take the bread from their mouths; well, not really, there is enough work for everybody, but figuratively speaking).
[QUOTE]
A (maybe) separate item: the long paragraph in your reply (the part dealing with setting the last value to 0) appears to involve duplicating a lot of work previously done, would that be right?[/QUOTE]
Yes indeed, this was intended for the case when you WANT to do P-1 regardless of its status (case 1 in the first paragraph). This makes sense (as I said) in case you have [U][B]a lot[/B][/U] of RAM allocated for stage 2, [B][U]and[/U][/B] you want to extend the B1/B2 limits for P-1. A lot of people do this, hoping to find record factors, and indeed they duplicate lots of effort, as they re-calculate the b^E for stage 1 every time.

The design has been like that from the inception; it is neither "good" nor "bad", it is just that the PrimeNet server does not store the checkpoint files for P-1. Those are huge files; storing them would need lots of space, would generate lots of traffic, and may not be "safe" anyhow (some mechanism would be needed to stop bad guys from uploading wrong residue files). So every worker who wants to extend the B1/B2 limits will [U][B]duplicate[/B][/U] the work done by the former workers (and do some additional work). With the new GPU toy it would be possible to store the "end of stage 1" files; when a larger B1 is supplied, the program would only "extend" the calculation and store the new file too. Some guys (me included!) will keep "databases" of those "EOS1" files for people interested in extending P-1 bounds. How to share them, and how to check that they are "real", may be discussed in the future.

Rodrigo 2013-07-23 23:27

I just realized I never got back to LaurV for his thorough explanation. Thank you very much!

At this stage (so to speak) it sounds like it'll be simpler just to assign specific workers to focus on P-1 and let others do LL separately.

I'll tune in to this channel periodically to stay up to date on developments.

Rodrigo

Mark Rose 2013-10-25 19:26

[QUOTE=LaurV;346607]That was intended, and it was "normal" behavior long ago, before we became faster at doing the P-1 stuff with GPUs. If you only want to get exponents which have not had enough P-1 done, then you had better pick them from the "seventies" or "eighties" (the 70-80M range), where the P-1 front has not reached yet. Caution: this is only suitable if you have an old computer with a lot of RAM; new computers are better at doing LL tests. Otherwise you had better let the GPUs do the P-1, which is faster, or leave it to the guys with old computers who would have to wait too long to finish LL tests and prefer to do P-1 (otherwise you take the bread from their mouths; well, not really, there is enough work for everybody, but figuratively speaking).[/QUOTE]

So I have an Athlon II X4 640 (4 cores @ 3.0 GHz, 512 KB L2) with 16 GB of RAM... is P-1 the best use of this machine? Or is LL still a better use of it?

Uncwilly 2013-10-26 00:01

[QUOTE=Mark Rose;357433]So I have an Athlon II X4 640 (4 cores @ 3.0 GHz, 512 KB L2) with 16 GB of RAM... is P-1 the best use of this machine? Or is LL still a better use of it?[/QUOTE] If you ran 4 threads of P-1 with up to 2.5 GB each, that would help. There is a need for CPUs doing P-1.

kracker 2013-10-26 00:37

Scaling from two cores up with P-1 is terrible on P95

TheMawn 2013-10-26 01:05

I was actually wondering about something similar myself. My system has a fast processor and 16 GB of RAM, of which 14GB is basically sitting doing absolutely nothing. Depending on the efficiency of P-1 versus LL or DC, I might consider switching a worker to P-1. It made me wonder how much better, say, one worker with 12 GB is versus two workers with 6 GB. Or how efficient 3 LL's and 1 P-1 @ 12 GB is versus 2 LL's and 2 P-1 @ 6 GB each.

kracker 2013-10-26 01:34

What I mean (and maybe you already got that) is that running 1 P-1 on 4 cores is bad compared to 4 P-1 runs on 4 cores.

Mark Rose 2013-10-26 01:48

In stage 1, I'm getting ~10% less per-worker performance when running 4 workers (4 cores) vs 1 worker (1 core).

We'll see how it goes with stage 2 tomorrow.

lycorn 2013-10-31 12:06

[QUOTE=TheMawn;357468] It made me wonder how much better, say, one worker with 12 GB is versus two workers with 6 GB. Or how efficient 3 LL's and 1 P-1 @ 12 GB is versus 2 LL's and 2 P-1 @ 6 GB each.[/QUOTE]

From my experience, if you are focused on P-1 work, it is way more efficient to run 2 P-1 @ 6 GB than 1 @ 12 GB (6 GB is already a pretty large amount of memory, and you won't notice anything special by giving another 6 GB to one single test).
The impact of running P-1 tests on one or two cores on the LL tests running on the others is not significant on a system with decently fast (1600 MHz+) memory. I mean, the bottleneck effects described on SB/IB/Haswell systems depend more on the number of cores running FFT calculations than on the type of test they are used for (P-1 or LL).
On the other hand, the project is in need of P-1 firepower (or at least it used to be the case, I think it still is), so if I were you, I would be running 2 LL tests and 2 P-1 @ 6 GB each (basically allocate 12 GB to Prime95, assign two cores to P-1, one test on each core, and let the program choose the amount of memory to allocate to each one).
My 2 cents...:smile:
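For reference, the "allocate 12 GB to Prime95" part of the advice above maps onto the memory setting in Prime95's local.txt (the value is in megabytes). The fragment below is only a sketch; the exact keyword and the day/night form should be verified against the readme.txt of your Prime95 version:

```ini
; local.txt fragment (a sketch; check your version's readme.txt)
; let Prime95 use up to 12 GB for P-1 stage 2, shared among workers:
Memory=12288
; or, with the day/night form described in the readme:
; Memory=8192 during 7:30-23:30 else 12288
```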

chalsall 2013-10-31 18:53

[QUOTE=lycorn;358015]On the other hand, the project is in need of P-1 firepower (or at least it used to be the case, I think it still is)...[/QUOTE]

Actually, thanks to Carl's GPU P-1 program, this is no longer the case.

In fact, while TFing to 74 is keeping ahead of the LLing (we're gaining approximately one day of lead time for every ten days of real-time), we're currently having a bit of trouble keeping up with the P-1ing.

At the moment, occasionally candidates are released for P-1'ing at "only" 73 bits. Not the end of the world; they are still kept for TFing to 74 if a factor isn't found.

But, to put it on the table: for those using Carl's GPU program to P-1, it would help the project if you did at least one TF run to 74 (or two to 73, or four to 72) in parallel for every P-1 run done.

Not in any way trying to take away from Carl's excellent work. And, as always, everyone is free to do whatever they want with their own kit/time/money. This is simply a meta perspective of our current situation.

Mark Rose 2013-10-31 21:12

It would be excellent if that information were posted in a single place on some page: a simple table to say if you have X kind of hardware to use, put it to Y use.

chalsall 2013-10-31 22:05

[QUOTE=Mark Rose;358069]It would be excellent if that information were posted in a single place on some page: a simple table to say if you have X kind of hardware to use, put it to Y use.[/QUOTE]

Yeah... You're correct. Sorry...

The truth is we are all volunteers here. And most of us also have "real-work" to take care of (which pays for our hobbies, such as this). We tend not to be that good at documentation (we find it boring).

In the short form, if you have a GPU, do TF'ing (if you're willing). It's the best thing for GIMPS at the moment (subject to correction or clarification by other very knowledgeable observers).

LaurV 2013-11-01 02:58

[QUOTE=chalsall;358054]
At the moment, occasionally candidates are released for P-1'ing at "only" 73 bits. Not the end of the world; they are still kept for TFing to 74 if a factor isn't found.[/QUOTE]
And why is this bad? In fact, playing a bit with the pencil and the eraser, it seems that the "breaking point" between P-1 and TF for a GTX 580 in the ~63M-65M range would be around 73.7 bits or so. Therefore the optimum path would be:
1. TF to 73
2. P-1 to B1=590K, B2~=13M (or B1=B2=4.9M)
3. TF to 74.

Remember, George himself recommended "TF to x bits, then P-1, then TF to x+1 bits". At that time, before GPUs stepped into the game, x was much lower, but the philosophy is the same: back then it was CPU against CPU, now it is GPU against GPU, so assuming your GPU can do both TF [U]and[/U] P-1, most probably you will be better off doing the "last bit" after the P-1. This depends heavily on the GPU you have. Note that I "carefully" selected the B1 and B2 above for roughly a 4% chance of finding a factor; in that case steps 2 and 3 of the "algorithm" have almost the same chance of eliminating an exponent in an almost identical (fixed) period of time, with step 2 slightly better. For other cards where the FFT is not so fast, step 2 can be much slower at eliminating factors than step 3.

So, in the end, it is a matter of the system you have, and of how mfaktc.exe evolves against cudaPm1.exe in the coming period :razz:

Mark Rose 2013-11-01 03:11

[QUOTE=chalsall;358075]Yeah... You're correct. Sorry...

The truth is we are all volunteers here. And most of us also have "real-work" to take care of (which pays for our hobbies, such as this). We tend not to be that good at documentation (we find it boring).[/QUOTE]

I understand completely.

[QUOTE]
In the short form, if you have a GPU, do TF'ing (if you're willing). It's the best thing for GIMPS at the moment (subject to correction or clarification by other very knowledgeable observers).[/QUOTE]

Thanks!

And I do: 9,546 GHz-days TF in the last month according to GPU72 :)

Prime95 2013-11-01 14:05

[QUOTE=LaurV;358100]Remember, George himself recommended "TF to x bits, then P-1, then TF to x+1 bits". At that time, before GPUs stepped into the game, x was much lower, but the philosophy is the same: back then it was CPU against CPU, now it is GPU against GPU, so assuming your GPU can do both TF [U]and[/U] P-1, most probably you will be better off doing the "last bit" after the P-1[/QUOTE]

That advice may not hold for the GPU. The breakeven point depends on the relative speed of TF and P-1 on the GPU. Choose your next task (P-1 or TF) based on which will find factors more efficiently.

Say the P-1 job with a 4% chance of success takes 8 hours -- that's one factor every 8.33 days. Say TF to the next bit level takes 2 hours, with roughly a 1-in-74 chance -- that's one factor every 74 × 2 / 24 = 6.17 days. So TF to the next bit level is best.

Plug in your GPU's actual P-1 and TF times to reach the correct conclusion for your own GPU.

BTW, this advice can be superseded by other factors. If GIMPS as a whole has a shortage of P-1 GPUs and an excess of TF GPUs, then it is better to do the next TF level, as it reduces the amount of P-1 work to be done.
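The arithmetic in the post above generalizes to a one-line formula: expected days per factor is hours per attempt, divided by the success probability, divided by 24. A small sketch (the 4%/8h and 1-in-74/2h figures are the illustrative ones from the post, not measurements):

```python
def days_per_factor(hours_per_attempt, success_probability):
    """Expected wall-clock days of GPU time per factor found."""
    return hours_per_attempt / success_probability / 24.0

# Figures from the post: a P-1 run with a 4% chance takes 8 hours;
# TF to bit level 74 (roughly a 1-in-74 chance) takes 2 hours.
p1_days = days_per_factor(8.0, 0.04)    # 8.33 days per factor
tf_days = days_per_factor(2.0, 1 / 74)  # 6.17 days per factor
# tf_days < p1_days, so on this hypothetical GPU the next TF level wins
```

Swapping in your own GPU's timings and success odds gives the per-card answer the post asks for.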

LaurV 2013-11-01 17:30

[QUOTE=Prime95;358131]That advice may not hold for the GPU. The breakeven point depends on the relative speed of TF and P-1 on the GPU. Choose your next task (P-1 or TF) based on which will find factors more efficiently.

Say the P-1 job with a 4% chance of success takes 8 hours -- that's one factor every 8.33 days. Say TF to the next bit level takes 2 hours, with roughly a 1-in-74 chance -- that's one factor every 74 × 2 / 24 = 6.17 days. So TF to the next bit level is best.

Plug in your GPU's actual P-1 and TF times to reach the correct conclusion for your own GPU.

BTW, this advice can be superseded by other factors. If GIMPS as a whole has a shortage of P-1 GPUs and an excess of TF GPUs, then it is better to do the next TF level, as it reduces the amount of P-1 work to be done.[/QUOTE]

Sure, that was exactly what I said: everybody has to do his homework.

Except that I put "realistic" numbers into the calculation (~160 minutes per P-1 assignment, about 80 minutes per stage). If it took 6 days to eliminate one exponent, by either TF or P-1, then you would be better off doing LL directly. The 580 needs under 100 hours for a front-range LL.

What we still can discuss is the balance between B1 and B2, especially if we would have the possibility to "extend" a B1. But that is another story.

For the last paragraph of your post, see where I said I do P-1 because I am afraid a few bad guys will push me out of the lifetime top 100 :razz: I just stepped up 6 places today, and I am going to stop when I get into the 100th percentile. Which will be in about 10 days. Can you see a better reason? :P

TheMawn 2013-11-01 20:42

Regarding documentation:

I too have many times had questions which could easily have been answered by a little document saying here's how many exponents are where and here's how many were brought to and from these stages in the last week, or whatever.

It would be a lot of work, and I personally am not volunteering for the job :razz:

Also the fact that P-1 isn't done at a specific stage makes things a bit tougher, because you have 74 bits P-1, 74 bits no P-1, 73 bits P-1, 73 bits no P-1, etc.

Best just keep regularly checking that things are progressing at relatively smooth rates. I.e., if P-1 is getting one day ahead of TF every twenty days, then it practically never becomes a real issue.

LaurV 2013-11-02 04:06

You mean something like [URL="http://www.mersenne.info/trial_factored_tabular_data/0/0/"]this[/URL], or [URL="http://www.mersenne.info/trial_factored_tabular_delta_7/0/0/"]this[/URL]? (see also James' site). Or something specifically related to P-1?

TheMawn 2013-11-02 04:51

I am aware of those tables. But, yeah, something with some P-1 information would be cool, though I am completely aware of the effort required to make something "just because it would be neat." I think there is just too much information and not enough dimensions to have it all graphed in one place, so it's a lot of jumping back and forth to try to make sense of the data.

Ah well. I think it's good enough for me if someone like Chris can just give us a heads-up when something starts to fall behind or whatever.

TheMawn 2013-11-02 05:01

Looking further into those graphs, I can see the remains of some of my work. A week and a bit ago I went and grabbed 50,000 exponents starting at 400M to trial factor. I've done this sort of thing before; I dunno, I just get the bug. I brought 50,000 exponents from 66 to 68 bits (the same ones I brought to 66 from 65 a while back), which took a bit longer and generated fewer lines of results. Also, staying away from the 65-to-66 wavefront helped avoid poaching, as I can tell the 200M range is getting a good bit of activity.

You can see a big, weird, slightly-below-50,000 count of exponents TF'ed to 68 in the 400M range. That would be me. Even on the PrimeNet summary there is a slightly higher number of factors found at 400M and 401M.

It's interesting to think that all that work found 1000 factors out of 50,000 candidates. It took about a week, and that was the easiest 2% of the remaining candidates to clear. For a range of two million exponents.

Man, this project is massive.

lycorn 2013-11-02 16:32

I hear what you said about the "bug". I get a different one from time to time. It's nearly 12 years since I joined the project, so I really need to change the type of work now and then. I had my DC phase, P-1 phase, TF phase, etc., most of the time mixed with some first-time LLs. Over the last few months I got a new bug, one I had never had: working at the "lower end of the spectrum", i.e. TFing exponents that are below 65 bits, and doing ECM on small exponents. Yeah, I know most of it is quite useless, and some years ago I was strongly against this "beating the dead horse" exercise, but hey, that's life. For a couple more months I think I will stick with it, using some old hardware parts. It's one of the good things about these massive projects: plenty to choose from...

cheesehead 2013-11-07 22:59

[QUOTE=Mark Rose;358069]It would be excellent if that information were posted in a single place on some page: a simple table to say if you have X kind of hardware to use, put it to Y use.[/QUOTE]
[QUOTE=TheMawn;358152]Regarding documentation:

I too have many times had questions which could easily have been answered by a little document saying here's how many exponents are where and here's how many were brought to and from these stages in the last week, or whatever.

It would be a lot of work <snip>[/QUOTE]I recommend that anyone volunteering to make such documentation place it on mersennewiki.org.

