#12
"Mr. Meeseeks"
Jan 2012
California, USA
41708 Posts
Scaling P-1 from two cores upward is terrible on P95.
#13
May 2013
East. Always East.
11·157 Posts
I was actually wondering about something similar myself. My system has a fast processor and 16 GB of RAM, of which 14 GB is basically sitting doing absolutely nothing. Depending on the efficiency of P-1 versus LL or DC, I might consider switching a worker to P-1. It made me wonder how much better, say, one P-1 worker with 12 GB is versus two workers with 6 GB each. Or how efficient 3 LLs and 1 P-1 @ 12 GB is versus 2 LLs and 2 P-1 @ 6 GB each.
#14
"Mr. Meeseeks"
Jan 2012
California, USA
2³·271 Posts
What I mean (and maybe you already got that) is that running one P-1 test on 4 cores is bad compared to running four P-1 tests, one per core.
#15
"/X\(‘-‘)/X\"
Jan 2013
2·5·293 Posts
In stage 1, I'm getting ~10% less per-worker performance when running 4 workers (4 cores) vs 1 worker (1 core).
We'll see how it goes with stage 2 tomorrow.
Last fiddled with by Mark Rose on 2013-10-26 at 01:49
#16
"GIMFS"
Sep 2002
Oeiras, Portugal
2·11·67 Posts
Quote:
The impact of running P-1 tests on one or two cores on the LL tests running on the others is not significant on a system with decently fast (1600 MHz+) memory. I mean, the bottleneck effects described on SB/IB/Haswell systems depend more on the number of cores running FFT calculations than on the type of test those cores run (P-1 or LL). On the other hand, the project is in need of P-1 firepower (or at least it used to be the case; I think it still is), so if I were you, I would run 2 LL tests and 2 P-1 @ 6 GB each (basically allocate 12 GB to Prime95, assign two cores to P-1, one test on each core, and let the program choose the amount of memory to allocate to each one). My 2 cents...
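That "allocate 12 GB and let the program split it" advice would amount to a one-line setting in Prime95's local.txt — a hypothetical fragment, not a complete file; the `Memory=` value is in megabytes, and the exact option names are worth double-checking against the undoc.txt that ships with your Prime95 version:

```
Memory=12288
```

With workers assigned to P-1 from the GUI (or worktodo.txt), Prime95 then decides how much of that 12 GB each stage-2 run gets.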
#17
If I May
"Chris Halsall"
Sep 2002
Barbados
2·67·73 Posts
Quote:
In fact, while TFing to 74 is keeping ahead of the LLing (we're gaining approximately one day of lead time for every ten days of real time), we're currently having a bit of trouble keeping up with the P-1ing. At the moment, candidates are occasionally released for P-1'ing at "only" 73 bits. Not the end of the world; they are still kept for TFing to 74 if a factor isn't found. But, to put it on the table: for those using Carl's GPU program to P-1, it would help the project if you did at least one TF run to 74 (or two to 73, or four to 72) in parallel for every P-1 run done. Not in any way trying to take away from Carl's excellent work. And, as always, everyone is free to do whatever they want with their own kit/time/money. This is simply a meta perspective on our current situation.
Last fiddled with by chalsall on 2013-10-31 at 19:35 Reason: Smelling mistake.
#18
"/X\(‘-‘)/X\"
Jan 2013
2·5·293 Posts
It would be excellent if that information were posted in a single place on some page: a simple table to say if you have X kind of hardware to use, put it to Y use.
#19
If I May
"Chris Halsall"
Sep 2002
Barbados
2×67×73 Posts
Quote:
The truth is we are all volunteers here. And most of us also have "real work" to take care of (which pays for our hobbies, such as this). We tend not to be that good at documentation (we find it boring). In short form: if you have a GPU, do TF'ing (if you're willing). It's the best thing for GIMPS at the moment (subject to correction or clarification by other very knowledgeable observers).
#20
Romulan Interpreter
Jun 2011
Thailand
2⁶·151 Posts
Quote:
1. TF to 73
2. P-1 to B1=590K, B2≈13M (or B1=B2=4.9M)
3. TF to 74

Remember, George himself recommended "TF to x bits, then P-1, then TF to x+1 bits". At that time, before GPUs stepped into the game, x was much lower, but the philosophy is the same: then it was CPU against CPU, now it is GPU against GPU. So, assuming your GPU can do both TF and P-1, you will most probably be better off doing the "last bit" after P-1. Much depends on the GPU you have. Note that I "carefully" selected the B1 and B2 above for a chance of finding a factor of about 4%; in this case steps 2 and 3 of the "algorithm" would have almost the same chance of eliminating an exponent in an almost identical (fixed) period of time, with step 2 slightly better. For other cards where FFT is not so fast, step 2 can be much slower at eliminating factors than step 3. So, in the end, it is a matter of the system you have, and of how mfaktc.exe evolves against cudaPm1.exe in the following period.
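For concreteness, the three steps could be queued roughly like the worktodo entries below. This is a hedged sketch: 56000077 is a made-up placeholder exponent, the `Factor=` lines follow the mfaktc worktodo format (exponent, bit level from, bit level to), and the `Pminus1=` line is my recollection of the Prime95-style syntax (k, b, n, c, B1, B2), so verify the exact formats against each program's README before use:

```
Factor=56000077,72,73
Pminus1=1,2,56000077,-1,590000,13000000
Factor=56000077,73,74
```

The point is only the ordering: the P-1 run with the bounds above sits between the two TF bit levels.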
#21
"/X\(‘-‘)/X\"
Jan 2013
101101110010₂ Posts
Quote:
Quote:
And I do: 9,546 GHz-days TF in the last month according to GPU72 :)
#22
P90 years forever!
Aug 2002
Yeehaw, FL
19×397 Posts
Quote:
Say the P-1 job with a 4% chance of success takes 8 hours -- that's one factor every 8/0.04/24 = 8.33 days. Say TF to the next bit level takes 2 hours, with roughly a 1-in-74 chance of finding a factor -- that's one factor every 74×2/24 = 6.17 days. So TF to the next bit level is best. Plug in your GPU's actual P-1 and TF times to reach the correct conclusion for your own GPU. BTW, this advice can be superseded by other factors. If GIMPS as a whole has a shortage of P-1 GPUs and an excess of TF GPUs, then it is better to do the next TF level, as it reduces the amount of P-1 work to be done.
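As a sanity check, that expected-days-per-factor arithmetic can be written as a tiny script. This is only a sketch: the 8 h / 4% and 2 h / 1-in-74 figures are the assumed numbers from the post above, not measurements, and you would substitute your own GPU's timings.

```python
def days_per_factor(hours_per_attempt: float, success_prob: float) -> float:
    """Expected wall-clock days per factor found, assuming
    independent attempts with a fixed per-attempt success chance."""
    return hours_per_attempt / success_prob / 24.0

# Assumed inputs from the post:
pm1 = days_per_factor(8, 0.04)     # P-1: 8 hours, ~4% chance
tf = days_per_factor(2, 1 / 74)    # TF one bit level: 2 hours, ~1/74 chance

print(f"P-1: {pm1:.2f} days/factor")  # 8.33
print(f"TF : {tf:.2f} days/factor")   # 6.17
```

Whichever line prints the smaller number is the better use of that particular GPU, all else being equal.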