mersenneforum.org  

Old 2013-10-26, 00:37   #12
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

4170₈ Posts

P-1 scales terribly beyond two cores on P95.
Old 2013-10-26, 01:05   #13
TheMawn

May 2013
East. Always East.

11·157 Posts

I was actually wondering about something similar myself. My system has a fast processor and 16 GB of RAM, of which 14 GB is basically sitting doing absolutely nothing. Depending on the efficiency of P-1 versus LL or DC, I might consider switching a worker to P-1. It made me wonder how much better, say, one worker with 12 GB is versus two workers with 6 GB. Or how efficient 3 LL's and 1 P-1 @ 12 GB is versus 2 LL's and 2 P-1 @ 6 GB each.
Old 2013-10-26, 01:34   #14
kracker

"Mr. Meeseeks"
Jan 2012
California, USA

2³·271 Posts

What I mean (and maybe you already got that) is that running 1 P-1 worker across 4 cores performs badly compared to running 4 P-1 workers on 4 cores.
Old 2013-10-26, 01:48   #15
Mark Rose

"/X\(‘-‘)/X\"
Jan 2013

2·5·293 Posts

In stage 1, I'm getting ~10% less per-worker performance when running 4 workers (4 cores) vs 1 worker (1 core).

We'll see how it goes with stage 2 tomorrow.
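
To put that stage 1 figure in aggregate terms, here is a back-of-the-envelope sketch. The function name is purely illustrative, and it assumes the ~10% loss applies equally to all four workers:

```python
def aggregate_throughput(workers: int, per_worker_loss: float) -> float:
    """Total throughput, in units of one unloaded single-core worker."""
    return workers * (1 - per_worker_loss)


# Figures from the post: 4 workers, each ~10% slower than a lone worker.
total = aggregate_throughput(4, 0.10)
print(f"4 workers at ~10% loss each: about {total:.1f}x one worker")
```

Under that assumption, four single-core workers still deliver well over three times the output of one worker, which is the baseline a single multi-core worker would have to beat.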

Last fiddled with by Mark Rose on 2013-10-26 at 01:49
Old 2013-10-31, 12:06   #16
lycorn

"GIMFS"
Sep 2002
Oeiras, Portugal

2·11·67 Posts

Quote:
Originally Posted by TheMawn View Post
It made me wonder how much better, say, one worker with 12 GB is versus two workers with 6 GB. Or how efficient 3 LL's and 1 P-1 @ 12 GB is versus 2 LL's and 2 P-1 @ 6 GB each.
From my experience, if you are focused on P-1 work, it is way more efficient to run 2 P-1 @ 6 GB than 1 @ 12 GB (6 GB is already a pretty large amount of memory, and you won't notice anything special by giving another 6 GB to a single test).
The impact of running P-1 tests on one or two cores on the LL tests running on the others is not significant on a system with decently fast (1600MHz+) memory. I mean, the bottleneck effects described on SB/IB/Haswell systems depend more on the number of cores running FFT calculations than on the type of test they are used for (P-1 or LL).
On the other hand, the project is in need of P-1 firepower (or at least it used to be the case, I think it still is), so if I were you, I would be running 2 LL tests and 2 P-1 @ 6 GB each (basically allocate 12 GB to Prime95, assign two cores to P-1 - one test on each core - and let the system choose the amount of memory to be allocated to each one).
My 2 cents...
Old 2013-10-31, 18:53   #17
chalsall
If I May

"Chris Halsall"
Sep 2002
Barbados

2·67·73 Posts

Quote:
Originally Posted by lycorn View Post
On the other hand, the project is in need of P-1 firepower (or at least it used to be the case, I think it still is)...
Actually, thanks to Carl's GPU P-1 program, this is no longer the case.

In fact, while TFing to 74 is keeping ahead of the LLing (we're gaining approximately one day of lead time for every ten days of real-time), we're currently having a bit of trouble keeping up with the P-1ing.

At the moment, occasionally candidates are released for P-1'ing at "only" 73 bits. Not the end of the world; they are still kept for TFing to 74 if a factor isn't found.

But, to put it on the table: for those using Carl's GPU program to do P-1, it would help the project if you did at least one TF run to 74 (or two to 73, or four to 72) in parallel for every P-1 run done.
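
The one/two/four pattern follows from TF effort roughly doubling with each bit level, since there are about twice as many candidate factors to test. A minimal sketch of that ratio, where the function name is hypothetical and the strict doubling is an assumption:

```python
def equivalent_tf_runs(bit_level: int, reference_level: int = 74) -> int:
    """Number of TF runs ending at bit_level that cost about as much as one
    run ending at reference_level, assuming effort doubles per bit level."""
    if bit_level > reference_level:
        raise ValueError("bit_level must not exceed reference_level")
    return 2 ** (reference_level - bit_level)


# Suggested TF contribution per P-1 run, by the bit level you TF to.
for level in (74, 73, 72):
    print(f"TF to {level}: {equivalent_tf_runs(level)} run(s) per P-1 run")
```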

Not in any way trying to take away from Carl's excellent work. And, as always, everyone is free to do whatever they want with their own kit/time/money. This is simply a meta perspective of our current situation.

Last fiddled with by chalsall on 2013-10-31 at 19:35 Reason: Smelling mistake.
Old 2013-10-31, 21:12   #18
Mark Rose

"/X\(‘-‘)/X\"
Jan 2013

2·5·293 Posts

It would be excellent if that information were posted in a single place on some page: a simple table to say if you have X kind of hardware to use, put it to Y use.
Old 2013-10-31, 22:05   #19
chalsall
If I May

"Chris Halsall"
Sep 2002
Barbados

2×67×73 Posts

Quote:
Originally Posted by Mark Rose View Post
It would be excellent if that information were posted in a single place on some page: a simple table to say if you have X kind of hardware to use, put it to Y use.
Yeah... You're correct. Sorry...

The truth is we are all volunteers here. And most of us also have "real-work" to take care of (which pays for our hobbies, such as this). We tend not to be that good at documentation (we find it boring).

In the short form, if you have a GPU, do TF'ing (if you're willing). It's the best thing for GIMPS at the moment (subject to correction or clarification by other very knowledgeable observers).
Old 2013-11-01, 02:58   #20
LaurV
Romulan Interpreter

Jun 2011
Thailand

2⁶·151 Posts

Quote:
Originally Posted by chalsall View Post
At the moment, occasionally candidates are released for P-1'ing at "only" 73 bits. Not the end of the world; they are still kept for TFing to 74 if a factor isn't found.
And why is this bad? In fact, playing a bit with pencil and eraser, it seems that the "breaking point" between P-1 and TF for a GTX 580 in the ~63M-65M range would be around 73.7 bits or so. Therefore the optimum path would be:
1. TF to 73
2. P-1 to B1=590K, B2~=13M (or B1=B2=4M9)
3. TF to 74.

Remember, George himself recommended "TF to x bits, then P-1, then TF to x+1 bits". At that time, before GPUs stepped into the game, x was much lower, but the philosophy is the same: back then it was CPU against CPU, now it is GPU against GPU. So assuming your GPU can do both TF and P-1, you will most probably be better off doing the "last bit" after P-1. This depends heavily on the GPU you have. Note that I "carefully" selected the B1 and B2 above for a chance of finding a factor of about 4%; in this case, step 2 and step 3 of the "algorithm" would have almost the same chance to eliminate an exponent in an almost identical (fixed) period of time, with step 2 slightly better. For other cards where the FFT is not so fast, step 2 can be much slower at eliminating factors than step 3.

So, in the end, it is a matter of the system you have, and of how mfaktc.exe evolves against cudaPm1.exe in the coming period.
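
One way to sanity-check the comparison between step 2 and step 3 is to ask how much longer a P-1 run may take than the TF run and still find factors at the same rate. The sketch below is only illustrative: the function name is made up, the 4% is the figure quoted above, and the roughly 1-in-b chance of a factor at bit level b is the usual rule of thumb, taken here as an assumption.

```python
def p1_breakeven_ratio(p1_chance: float, tf_bit_level: int) -> float:
    """How many times longer than the TF run a P-1 run may take and still
    find factors at the same rate, using the ~1/b heuristic for TF."""
    tf_chance = 1 / tf_bit_level  # assumed rule of thumb, not a measurement
    return p1_chance / tf_chance


# 4% P-1 chance versus TF from 73 to 74 bits.
ratio = p1_breakeven_ratio(0.04, 74)
print(f"P-1 breaks even if it takes at most {ratio:.2f}x the TF-to-74 time")
```

If the P-1 run on a given card takes longer than that multiple of the TF time, TF to the next bit is the better use of the card under these assumptions.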
Old 2013-11-01, 03:11   #21
Mark Rose

"/X\(‘-‘)/X\"
Jan 2013

101101110010₂ Posts

Quote:
Originally Posted by chalsall View Post
Yeah... You're correct. Sorry...

The truth is we are all volunteers here. And most of us also have "real-work" to take care of (which pays for our hobbies, such as this). We tend not to be that good at documentation (we find it boring).
I understand completely.

Quote:
In the short form, if you have a GPU, do TF'ing (if you're willing). It's the best thing for GIMPS at the moment (subject to correction or clarification by other very knowledgeable observers).
Thanks!

And I do: 9,546 GHz-days TF in the last month according to GPU72 :)
Old 2013-11-01, 14:05   #22
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

19×397 Posts

Quote:
Originally Posted by LaurV View Post
Remember, George himself recommended "TF to x bits, then P-1, then TF to x+1 bits". At that time, before GPUs stepped into the game, x was much lower, but the philosophy is the same: back then it was CPU against CPU, now it is GPU against GPU. So assuming your GPU can do both TF and P-1, you will most probably be better off doing the "last bit" after P-1
That advice may not hold for the GPU. The breakeven point depends on the relative speed of TF and P-1 on the GPU. Choose your next task (P-1 or TF) based on which will find factors more efficiently.

Say the P-1 job with a 4% chance of success takes 8 hours -- that's one factor every 25 runs, or 200 hours = 8.33 days. Say TF to the next bit level takes 2 hours -- with roughly a 1-in-74 chance per run, that's one factor every 74/12 = 6.17 days. So TF to the next bit level is best.

Plug in your GPU's actual P-1 and TF times to reach the correct conclusion for your own GPU.
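
As a rough sketch of that arithmetic (not anything taken from Prime95 itself): expected time per factor is the run time divided by the chance of success. The function name is hypothetical, and the 1-in-74 chance for the TF level is an assumption read off the 74/12 figure.

```python
def days_per_factor(hours_per_run: float, chance_of_factor: float) -> float:
    """Expected wall-clock days to find one factor.

    On average, 1 / chance_of_factor runs are needed per factor found.
    """
    if not 0 < chance_of_factor <= 1:
        raise ValueError("chance_of_factor must be in (0, 1]")
    return hours_per_run / chance_of_factor / 24


# Example figures: P-1 with a 4% chance taking 8 hours,
# TF to the next bit level (about a 1-in-74 chance) taking 2 hours.
p1_days = days_per_factor(8, 0.04)
tf_days = days_per_factor(2, 1 / 74)

print(f"P-1: one factor every {p1_days:.2f} days")
print(f"TF:  one factor every {tf_days:.2f} days")
print("Better next task:", "TF" if tf_days < p1_days else "P-1")
```

Replace the two run times and the two probabilities with measurements from your own card; whichever task has the lower days-per-factor figure is the more efficient next step.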


BTW, this advice can be superseded by other factors. If GIMPS as a whole has a shortage of P-1 GPUs and an excess of TF GPUs, then it is better to do the next TF level as it reduces the amount of P-1 work to be done.
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.