mersenneforum.org

P-1 before LL (https://www.mersenneforum.org/showthread.php?t=18370)

kracker 2013-10-26 00:37

Scaling from two cores up with P-1 is terrible on P95

TheMawn 2013-10-26 01:05

I was actually wondering about something similar myself. My system has a fast processor and 16 GB of RAM, of which 14 GB is basically sitting doing absolutely nothing. Depending on the efficiency of P-1 versus LL or DC, I might consider switching a worker to P-1. It made me wonder how much better, say, one worker with 12 GB is versus two workers with 6 GB. Or how efficient 3 LL's and 1 P-1 @ 12 GB is versus 2 LL's and 2 P-1 @ 6 GB each.

kracker 2013-10-26 01:34

What I mean (and maybe you already got that) is that running 1 P-1 on 4 cores is bad compared to 4 P-1 on 4 cores.

Mark Rose 2013-10-26 01:48

In stage 1, I'm getting ~10% less per-worker performance when running 4 workers (4 cores) vs 1 worker (1 core).

We'll see how it goes with stage 2 tomorrow.

lycorn 2013-10-31 12:06

[QUOTE=TheMawn;357468] It made me wonder how much better, say, one worker with 12 GB is versus two workers with 6 GB. Or how efficient 3 LL's and 1 P-1 @ 12 GB is versus 2 LL's and 2 P-1 @ 6 GB each.[/QUOTE]

From my experience, if you are focused on P-1 work, it is way more efficient to run 2 P-1 @ 6 GB than 1 @ 12 GB (6 GB is already a pretty large amount of memory, and you won't notice anything special by giving another 6 GB to one single test).
The impact of running P-1 tests in one or two cores on the LL tests running on the others is not significant, on a system with decently fast (1600MHz+) memory. I mean, the bottleneck effects described on SB/IB/Haswell systems depend more on the number of cores running FFT calculations than on the type of test they are used for (P-1 or LL).
On the other hand, the project is in need of P-1 firepower (or at least it used to be the case, I think it still is), so if I were you, I would be running 2 LL tests and 2 P-1 @ 6 GB each (basically allocate 12 GB to Prime95, assign two cores to P-1 - one test on each core - and let the system choose the amount of memory to be allocated to each one).
My 2 cents...:smile:

chalsall 2013-10-31 18:53

[QUOTE=lycorn;358015]On the other hand, the project is in need of P-1 firepower (or at least it used to be the case, I think it still is)...[/QUOTE]

Actually, thanks to Carl's GPU P-1 program, this is no longer the case.

In fact, while TFing to 74 is keeping ahead of the LLing (we're gaining approximately one day of lead time for every ten days of real-time), we're currently having a bit of trouble keeping up with the P-1ing.

At the moment, occasionally candidates are released for P-1'ing at "only" 73 bits. Not the end of the world; they are still kept for TFing to 74 if a factor isn't found.

But, to put it on the table: for those using Carl's GPU program to P-1, it would help the project if you did at least one TF run to 74 (or two to 73 (or four to 72)) in parallel for every P-1 run done.
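
The "one to 74, two to 73, four to 72" equivalence can be sketched as follows. This is a minimal illustration, assuming that the work for a single TF bit level doubles with each level; the function name is hypothetical and not part of any GIMPS tool.

```python
def tf_runs_equivalent(target_bit: int, lower_bit: int) -> int:
    """Number of single-bit-level TF runs ending at lower_bit that cost
    about as much as one single-bit-level run ending at target_bit.

    Assumption: each bit level takes twice the work of the one below it.
    """
    if lower_bit > target_bit:
        raise ValueError("lower_bit must not exceed target_bit")
    return 2 ** (target_bit - lower_bit)


for bit in (74, 73, 72):
    print(f"runs to {bit} bits per P-1 run: {tf_runs_equivalent(74, bit)}")
```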

Not in any way trying to take away from Carl's excellent work. And, as always, everyone is free to do whatever they want with their own kit/time/money. This is simply a meta perspective of our current situation.

Mark Rose 2013-10-31 21:12

It would be excellent if that information were posted in a single place on some page: a simple table to say if you have X kind of hardware to use, put it to Y use.

chalsall 2013-10-31 22:05

[QUOTE=Mark Rose;358069]It would be excellent if that information were posted in a single place on some page: a simple table to say if you have X kind of hardware to use, put it to Y use.[/QUOTE]

Yeah... You're correct. Sorry...

The truth is we are all volunteers here. And most of us also have "real-work" to take care of (which pays for our hobbies, such as this). We tend not to be that good at documentation (we find it boring).

In the short form, if you have a GPU, do TF'ing (if you're willing). It's the best thing for GIMPS at the moment (subject to correction or clarification by other very knowledgeable observers).

LaurV 2013-11-01 02:58

[QUOTE=chalsall;358054]
At the moment, occasionally candidates are released for P-1'ing at "only" 73 bits. Not the end of the world; they are still kept for TFing to 74 if a factor isn't found.[/QUOTE]
And why is this bad? In fact, playing a bit with the pencil and the eraser, it seems that the "breaking point" between P-1 and TF for a GTX 580 in the ~63M-65M range would be around 73.7 bits or so. Therefore the optimum path would be:
1. TF to 73
2. P-1 to B1=590K, B2~=13M (or B1=B2=4M9)
3. TF to 74.

Remember, George himself recommended "TF to x bits, then P-1, then TF to x+1 bits". At that time, before GPUs stepped into the game, x was much lower, but the philosophy is the same: then it was CPU against CPU, now it is GPU against GPU. So, assuming your GPU can do both TF [U]and[/U] P-1, you will most probably be better off doing the "last bit" after P-1. This depends heavily on the GPU you have. Note that I "carefully" selected the B1 and B2 above for a chance of finding a factor of about 4%; in this case step 2 and step 3 of the "algorithm" would have almost the same chance of eliminating an exponent in an almost identical (fixed) period of time, with step 2 slightly better. For other cards where the FFT is not so fast, step 2 can be much slower at eliminating factors than step 3.

So, in the end, it is a matter of the system you have, and of how mfaktc.exe evolves against cudaPm1.exe in the coming period :razz:

Mark Rose 2013-11-01 03:11

[QUOTE=chalsall;358075]Yeah... You're correct. Sorry...

The truth is we are all volunteers here. And most of us also have "real-work" to take care of (which pays for our hobbies, such as this). We tend not to be that good at documentation (we find it boring).[/QUOTE]

I understand completely.

[QUOTE]
In the short form, if you have a GPU, do TF'ing (if you're willing). It's the best thing for GIMPS at the moment (subject to correction or clarification by other very knowledgeable observers).[/QUOTE]

Thanks!

And I do: 9,546 GHz-days TF in the last month according to GPU72 :)

Prime95 2013-11-01 14:05

[QUOTE=LaurV;358100]Remember, George himself recommended "TF to x bits, then P-1, then TF to x+1 bits". At that time, before GPUs stepped into the game, x was much lower, but the philosophy is the same: then it was CPU against CPU, now it is GPU against GPU. So, assuming your GPU can do both TF [U]and[/U] P-1, you will most probably be better off doing the "last bit" after P-1[/QUOTE]

That advice may not hold for the GPU. The breakeven point depends on the relative speed of TF and P-1 on the GPU. Choose your next task (P-1 or TF) based on which will find factors more efficiently.

Say the P-1 job with 4% chance of success takes 8 hours -- that's one factor every 8/0.04 = 200 hours, or 8.33 days. Say TF to the next bit level (74) takes 2 hours, with roughly a 1-in-74 chance of a factor -- that's one factor every 74/12 = 6.17 days. So TF to next bit level is best.

Plug in your GPU's actual P-1 and TF times to reach the correct conclusion for your own GPU.
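
The comparison above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the TF success chance uses the rule of thumb implied by the 74/12 figure, namely roughly a 1-in-b chance of a factor when going from b-1 to b bits.

```python
def days_per_factor(hours_per_run: float, success_chance: float) -> float:
    """Expected wall-clock days spent per factor found."""
    return hours_per_run / success_chance / 24.0


def tf_success_chance(target_bit: int) -> float:
    """Assumed rule of thumb: about 1/b chance of a factor in the last bit level."""
    return 1.0 / target_bit


# Example figures from the post; replace with your own GPU's timings.
p1_days = days_per_factor(hours_per_run=8.0, success_chance=0.04)
tf_days = days_per_factor(hours_per_run=2.0, success_chance=tf_success_chance(74))

print(f"P-1: one factor every {p1_days:.2f} days")
print(f"TF:  one factor every {tf_days:.2f} days")
print("Better next task:", "TF" if tf_days < p1_days else "P-1")
```

With the example figures this prints 8.33 days for P-1 and 6.17 days for TF, so TF is the better next task.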


BTW, this advice can be superseded by other factors. If GIMPS as a whole has a shortage of P-1 GPUs and an excess of TF GPUs, then it is better to do the next TF level as it reduces the amount of P-1 work to be done.


All times are UTC.