#34
|
Sep 2003
5·11·47 Posts
Quote:
Like I said earlier, for any given Mersenne factor, you can figure out what B1 and B2 bounds would have found that factor by P−1 testing. Some large factors are easy to find by P−1, some small ones are hard or impossible to find by P−1; it all depends on the individual factor. Formulate a precise question and you might get a precise answer.
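A minimal sketch (my illustration, not from the thread) of the point above: for a known factor q of M_p = 2^p − 1, the B1/B2 bounds that P−1 needs are fixed by the factorization of q − 1. P−1 with bounds (B1, B2) finds q when every prime-power factor of q − 1 is ≤ B1, except at most one prime ≤ B2 handled in stage 2. (q − 1 is always divisible by 2p, which GIMPS P−1 implementations exploit.)

```python
from collections import Counter

def factorize(n):
    """Trial division; fine for demo-sized numbers."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def required_bounds(q):
    """Smallest (B1, B2) that would find factor q by P-1, assuming the
    largest factor of q - 1 occurs to the first power (stage 2 covers
    a single prime, not a prime power)."""
    counts = Counter(factorize(q - 1))
    prime_powers = sorted(p ** e for p, e in counts.items())
    b2 = prime_powers[-1]
    b1 = prime_powers[-2] if len(prime_powers) > 1 else b2
    return b1, b2

# 1103 divides M_29 (= 233 * 1103 * 2089); 1103 - 1 = 2 * 19 * 29,
# so B1 = 19 with B2 = 29 suffices (B1 = 29 alone would also find
# it in stage 1).
print(required_bounds(1103))  # -> (19, 29)
```

This is why, as the post says, it depends entirely on the individual factor: a huge q whose q − 1 happens to be smooth is easy, while a small q with one large prime in q − 1 can be out of reach.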
#36

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2×7×383 Posts
As far as I can determine, it's not primenet that determines B1, B2, d, e, or NRP and dictates them to the applications. Rather, most applications optimize the bounds and other parameters themselves (unless the user specifies them), and afterward report to primenet, in the results record, which parameters were selected and used.
The applications (mprime, prime95, and CUDAPm1, but not gpuowl v5.0's PRP-1), unless the user specifies otherwise, try to optimize the probable savings in total computing time for the exponent, based on the computed probability of finding a P-1 factor over combinations of many B1 values and several B2 values.
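A toy sketch of that optimization follows. This is NOT prime95's actual code: the probability and cost functions are crude made-up placeholders (the real estimate uses Dickman's function), and only the overall shape of the search, maximizing expected net savings over a grid of (B1, B2) pairs, reflects what the post describes. Costs are measured in squarings modulo M_p.

```python
import math

def p1_success_prob(b1, b2):
    """Placeholder: grows slowly with B1, more slowly with B2/B1.
    NOT the Dickman-based estimate real software uses."""
    return 0.005 * math.log10(b1) + 0.003 * math.log10(max(b2 / b1, 1))

def p1_cost(b1, b2):
    """Placeholder: stage 1 ~ 1.44*B1 squarings; stage 2 roughly one
    multiplication per prime in (B1, B2] (prime number theorem)."""
    return 1.44 * b1 + (b2 - b1) / math.log(b2)

def best_bounds(exponent, tests_saved):
    """Grid-search (B1, B2) maximizing expected net savings:
    prob * tests_saved * test_cost - p1_cost."""
    test_cost = exponent          # one primality test ~ p squarings
    best = None
    for b1 in (2e5, 5e5, 1e6, 2e6, 5e6):
        for b2 in (10 * b1, 20 * b1, 30 * b1, 40 * b1):
            saving = (p1_success_prob(b1, b2) * tests_saved * test_cost
                      - p1_cost(b1, b2))
            if best is None or saving > best[0]:
                best = (saving, int(b1), int(b2))
    return best

print(best_bounds(89787821, tests_saved=2))
```

The key structural point survives the toy models: the chosen bounds depend on the exponent and on how many primality tests a found factor would save, which is why different assignments get different B1/B2.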
From experiments with prime95 on somewhat larger exponents, it appears the optimization calculation also occurs during prime95 Test > Status output generation, which shows considerable lag for P-1 work compared to other computation types; there appears to be no caching of previously computed optimal P-1 bounds. In my experience, prime95 status output without a stack of P-1 work assignments is essentially instantaneous, while this example takes 5 seconds, even immediately after a preceding one. With larger P-1 exponents or more P-1 assignments (deeper work caching, or more complete dedication of a system to P-1 work than the 1/4 in my example), I think that 5 seconds will increase.

prime95.log:
Code:
Got assignment [aid redacted]: P-1 M89787821
Sending expected completion date for M89787821: Dec 05 2018
...
[Thu Dec 06 09:17:24 2018 - ver 29.4]
Sending result to server: UID: Kriesel/emu, M89787821 completed P-1, B1=730000, B2=14782500, E=12, Wg4: 123E2311, AID: redacted
PrimeNet success code with additional info:
CPU credit is 7.3113 GHz-days.

Worktodo entry:
Code:
Pfactor=[aid],1,2,89794319,-1,76,2

It's there to read in the source codes also.

CUDAPm1 example; worktodo entry from manual assignment:
Code:
PFactor=[aid],1,2,292000031,-1,81,2

Console output:
Code:
CUDAPm1 v0.20
------- DEVICE 1 -------
name                GeForce GTX 480
Compatibility       2.0
clockRate (MHz)     1401
memClockRate (MHz)  1848
totalGlobalMem      zu
totalConstMem       zu
l2CacheSize         786432
sharedMemPerBlock   zu
regsPerBlock        32768
warpSize            32
memPitch            zu
maxThreadsPerBlock  1024
maxThreadsPerMP     1536
multiProcessorCount 15
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      65535,65535,65535
textureAlignment    zu
deviceOverlap       1

CUDA reports 1426M of 1536M GPU memory free.
Index 91
Using threads: norm1 256, mult 128, norm2 32.
Using up to 1408M GPU memory.
Selected B1=1830000, B2=9607500, 2.39% chance of finding a factor
Starting stage 1 P-1, M292000031, B1 = 1830000, B2 = 9607500, fft length = 16384K

Gpuowl's PRP-1 implementation takes a somewhat different approach and requires user selection of B1. It defaults to B2=p but allows another B2 to be user-specified. See https://www.mersenneforum.org/showth...=22204&page=70, posts 765-767, for Preda's description of gpuowl v5.0 P-1 handling. (See posts 694-706 for his earlier B1-only development; https://www.mersenneforum.org/showth...=22204&page=64.) (Code authors are welcome to weigh in re any errors, omissions, nuances, etc.)

Last fiddled with by kriesel on 2018-12-07 at 16:47
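For reference, the fields in the Pfactor= worktodo lines above are (after the optional assignment ID) k, b, n, c, the trial-factoring depth in bits, and the number of primality tests saved if a factor is found, describing the candidate k·b^n + c (here 1·2^n − 1). A small parser sketch, a hypothetical helper, not part of prime95 or CUDAPm1:

```python
def parse_pfactor(line):
    """Parse a Pfactor= worktodo line into its named fields."""
    key, _, rest = line.partition("=")
    if key.strip().lower() != "pfactor":
        raise ValueError("not a Pfactor line: " + line)
    fields = [f.strip() for f in rest.split(",")]
    aid = None
    if len(fields) == 7:              # assignment ID present
        aid, fields = fields[0], fields[1:]
    k, b, n, c, tf_bits, tests_saved = fields
    return {
        "aid": aid,
        "k": int(k), "b": int(b), "n": int(n), "c": int(c),
        "tf_bits": int(tf_bits),        # trial-factored this far
        "tests_saved": int(tests_saved)  # primality tests a factor saves
    }

entry = parse_pfactor("PFactor=[aid],1,2,292000031,-1,81,2")
print(entry["n"], entry["tf_bits"], entry["tests_saved"])  # -> 292000031 81 2
```

The case-insensitive keyword check covers both spellings seen above ("Pfactor=" and "PFactor="). Note that tf_bits and tests_saved are exactly the inputs the bounds optimizer needs.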
#37

Jul 2018
3110 Posts
Personally, regarding TF vs. P-1: I find with my hardware that, in terms of maximizing d(probability of finding a factor)/dt, I should not TF to a higher level than around 74 bits. For exponents near 90M, a given one of my cards takes about half an hour to run through the 73-74 bit level, with success probability ~1.35%. That same card can do a P-1 with about a 3.6% probability of success (using whatever bounds the software defaults to) in an hour and a half, thrice the time; going to 75 bits would be too much. So if I want to maximize factors found per unit time in a range near 90M that has already been TF'd to 74 bits or more, I should do P-1 work. In that sense, it's possible 76 bits is too high... on the other hand, my cards have a lot of memory, which probably pushes the TF/P-1 boundary down somewhat. But d(probability of success under default parameters given available memory)/d(available memory) is not that big -- I don't know enough yet about the memory requirements and how p(success) varies with B1 and B2 to say.
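The arithmetic behind that cutoff can be checked directly (using the post's round numbers, not exact measurements): each successive TF bit level has roughly the same success probability but double the run time, so its per-hour rate halves.

```python
# Probability of finding a factor per GPU-hour, for exponents near 90M,
# using the figures quoted in the post (approximate, one card).

def rate(prob, hours):
    """Factor-probability earned per hour of GPU time."""
    return prob / hours

tf_73_74 = rate(0.0135, 0.5)   # ~1.35% chance in half an hour
tf_74_75 = rate(0.0135, 1.0)   # next bit level: same odds, double the time
p1       = rate(0.036, 1.5)    # ~3.6% in an hour and a half

print(f"TF 73-74: {tf_73_74:.4f}/h, TF 74-75: {tf_74_75:.4f}/h, "
      f"P-1: {p1:.4f}/h")
# TF through 74 bits narrowly beats P-1 per hour, but the next bit
# level does not: hence stop TF around 74 and switch to P-1.
```

With these numbers, TF 73-74 yields about 0.027/h, P-1 about 0.024/h, and TF 74-75 only about 0.0135/h, which is exactly the "74 bits, then P-1" ordering argued above.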
In terms of optimal work reduction, I think how many factors TF might find that P-1 would miss matters less than the per-time probability of finding a factor. You could treat this as a multi-armed bandit problem where each action is a pair (factoring method, device) with some time cost and some factor-probability reward. It's somewhat complicated by the fact that failure to find a factor for a given exponent also returns a small amount of information ("no factors under 2^75") which influences the future factor-probability estimate for a given (method, device) on that exponent. (Not that this makes the allocation problem easier, but at least there is a framework one could use to analyze it...)

Of course, optimal work reduction isn't the only metric; one might be interested in, e.g., maximizing coverage in a given range, in which case the best strategy might be different, but this modeling approach would probably still find it. One might also be interested in maximizing the rate of Mersenne prime yield, which might also involve admitting "LL" as an action. Hopefully the current "economic cross-over point" analysis matches whatever this would come up with.

Last fiddled with by penlu on 2018-12-08 at 07:17
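The bandit framing above can be sketched as follows. This is my toy, not a worked-out allocator: arms are (method, device) pairs with a fixed time cost and an unknown success probability, the made-up probabilities stand in for real measurements, and a simple epsilon-greedy rule exploits the highest estimated probability-per-hour (ignoring the extra "no factor below 2^75" information the post notes).

```python
import random

# Arms: (method, device) pairs.  "hours" is the time cost per pull;
# "true_p" is the hidden per-attempt factor probability (made up,
# loosely matching the numbers earlier in the thread).
ARMS = {
    ("TF to 74", "gpu0"): {"hours": 0.5, "true_p": 0.0135},
    ("TF to 75", "gpu0"): {"hours": 1.0, "true_p": 0.0135},
    ("P-1",      "gpu0"): {"hours": 1.5, "true_p": 0.036},
}

def run(trials=20000, eps=0.1, seed=1):
    """Epsilon-greedy bandit maximizing estimated factors per hour."""
    random.seed(seed)
    stats = {a: {"tries": 0, "hits": 0} for a in ARMS}
    for _ in range(trials):
        if random.random() < eps:
            arm = random.choice(list(ARMS))      # explore
        else:
            # exploit: highest estimated factors-per-hour, with an
            # optimistic 1-hit/1-try prior for barely-explored arms
            arm = max(stats, key=lambda a: ((stats[a]["hits"] + 1)
                      / (stats[a]["tries"] + 1)) / ARMS[a]["hours"])
        stats[arm]["tries"] += 1
        if random.random() < ARMS[arm]["true_p"]:
            stats[arm]["hits"] += 1
    return stats

stats = run()
most_pulled = max(stats, key=lambda a: stats[a]["tries"])
print(most_pulled)
```

The "returns information on failure" complication would enter here by letting a pull update the probability estimates of *other* arms for the same exponent, which plain bandit algorithms don't model; the framework still applies, the update rule just gets richer.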
#38

If I May
"Chris Halsall"
Sep 2002
Barbados
10011000000010₂ Posts
Quote:
You are in a somewhat unique situation, in that you are willing and able to target your "firepower" optimally. Few are as focused on the optimal deployment of cycles. Primenet and GPU72 are somewhat constrained in what they can assign for optimal throughput, because each user tends to fetch only a single type of work for each piece of their kit. To put it on the table: we are currently over-powered in (GPU) TF'ing and (CPU and GPU) P-1'ing; we are years ahead of the LL'ers. What will come soon is the time to LLTF to 77 "bits", but possibly only after a P-1 run. Any advice anyone has on how to optimally manage this would be most welcome.