mersenneforum.org > Data Deep dive TF

2019-01-08, 02:51   #1
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

16306₈ Posts

Deep dive TF

Hi, I'm looking for a few intrepid volunteers, or more than a few, to take some scattered but strategically placed exponents up to the full GPUto72 individual goal bit levels, or toward them, well distributed from 100 million to the mersenne.org upper limit of a billion. You can think of this as creating full-TF islands above the prevailing water line of bulk TF effort. The main purpose is to provide some already TF-complete candidates for P-1 factoring software testing. Whoever does the TF gets the computing credit and credit for any factors found along the way. Reserving exponents is highly recommended, as is reasonably prompt completion.

Consider the exponents from 100 million to 101 million as a bin. Strategic TF would focus on the first 1000, that is, 100,000,000 to 100,001,000, and aim to complete TF on at least two exponents that have no P-1 or primality test done and no factor found by TF, and so are in need of P-1 and perhaps primality testing. The known location within the bin makes them easy to find via https://www.mersenne.org/manual_gpu_assignment/ or https://www.mersenne.org/report_exponent/

It would be good to have multiple closely spaced bins, each containing an island of two or more full-depth TF exponents without a P-1 result or primality result. Having multiple closely spaced bins allows using one set of well spaced bins to test CUDAPm1, another well spaced set for prime95, another for gpuOwL PRP-1 or any future Preda P-1, etc. Also, occasionally software has an issue in a small range of exponents but is ok on either side of the trouble spot. (I've seen CUDAPm1 have trouble with one gpu but not another, even sometimes the same model, at 84M, 128M, and 171M.)

Having more than one fully TF-completed exponent per island is insurance against finding a stage 1 factor and so being unable to test stage two, and the extra one could act as a spare in case a nearby island is a trouble spot for one of the applications. At the same time, staggering the bins a bit between applications or versions means a slightly wider distribution of exponents tested. For example, and from now on giving bin identifications as millions (for example 100 instead of 100 million):

P95  owl  CUDAPm1 (left, right)
100  101  102  103
120  121  122  123
150  151  152  153
200  201  202  203
250  251  252  253
300  301  302  303
350  351  352  353
400  401  402  403
500  501  452  453
600  601
700  701
800  801
900  901

After running CUDAPm1 on a given gpu model on several widely spaced exponents (which are usually chosen about 50M or 100M apart so they plot nicely), I often find, as in CUDAPm1 v0.20, that some exponents cannot be run successfully to completion on a given gpu or on any gpu. Then I start doing a binary search to see what the limits are. More closely spaced islands would be useful for that later. When I need to TF-qualify the exponents myself, it really slows down the testing of P-1 limits, since I'm using the same gpus for both.

I'm getting ready to start the testing and limit mapping of several gpu models on CUDAPm1 v0.22, and have started testing in prime95. Any helpful TF island building would be appreciated. The end result is tabulation of run times, plotting of run time scaling, and documentation of limits, NRP trends, software issues encountered, etc., as in https://www.mersenneforum.org/showthread.php?t=23389

Users like mikr and rudimeier have already done some of this deeper TF at the front of a million bin. That is very useful when I prequalify a few exponents for P-1 software testing on high exponents on gpu, because I have less left to finish myself to reach gputo72 factoring goal levels. Thank you to the pioneers who have already done some of this, for example to or near primenet goal bit levels several years ago.

The higher ones will represent a considerable amount of total work per exponent. A few examples of computing effort per exponent to full GPUto72 TF depth:

https://www.mersenne.ca/exponent/101000117  114 GHz-days to go to full gputo72 bit level (76)
https://www.mersenne.ca/exponent/171000043  346 GHz-days to go to full gputo72 bit level (78)
https://www.mersenne.ca/exponent/371000039  1.2 THz-days to go to full gputo72 bit level (81)
https://www.mersenne.ca/exponent/919000001  8.4 THz-days to go to full gputo72 bit level (85)
https://www.mersenne.ca/exponent/999000061  15.6 THz-days to go to full gputo72 bit level (86)
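
For anyone who wants to see what sits at the front of a bin before looking things up, here is a minimal Python sketch that lists the prime exponents in the first 1000 integers of a million-wide bin. The function names are mine, not from any GIMPS tool, and the sketch only checks that the exponent is prime. Whether an exponent already has a P-1 or primality result, or a known factor, still has to be checked on mersenne.org.

```python
def is_prime(n: int) -> bool:
    """Deterministic trial division; adequate for exponents up to about 1e9."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def island_candidates(bin_millions: int, window: int = 1000) -> list[int]:
    """Prime exponents in the first `window` integers of a million-wide bin.

    `bin_millions` follows the thread's convention: 100 means 100,000,000.
    """
    start = bin_millions * 1_000_000
    return [p for p in range(start, start + window) if is_prime(p)]


if __name__ == "__main__":
    # Show the first few candidate exponents at the front of the 100M bin.
    print(island_candidates(100)[:5])
```

The window of 1000 matches the "first 1000" suggestion above; widening it is just a matter of passing a larger `window`.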
2019-01-08, 04:26   #2
potonono

Jun 2005
USA, IL

193 Posts

I can volunteer some TF, but I'm not sure about what bit levels any particular range should be taken to. Are the 'full GPUto72 individual goal bit levels' posted somewhere?

2019-01-08, 05:19   #3
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006

12233₈ Posts

Quote:
 Originally Posted by potonono I can volunteer some TF, but I'm not sure about what bit levels any particular range should be taken to. Are the 'full GPUto72 individual goal bit levels' posted somewhere?
I understand it to be the yellow boundary line here:
https://www.mersenne.ca/status/tf/0/0/1/0

Click on any line to drill down for finer limits.

2019-01-08, 05:25   #4
Uncwilly
6809 > 6502

"""""""""""""""""""
Aug 2003
101×103 Posts

10,891 Posts

James H had posted a chart a while back (based upon performance data). That and Chris mentioned that GPUs should do about 3 bits deeper than Prime95's default. See this post of mine and James' response:
https://mersenneforum.org/showthread.php?p=389094
and
https://mersenneforum.org/showthread.php?p=490542

2019-01-08, 05:50   #5
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·29·127 Posts

Quote:
 Originally Posted by potonono I can volunteer some TF, but I'm not sure about what bit levels any particular range should be taken to. Are the 'full GPUto72 individual goal bit levels' posted somewhere?
Super! What I usually go by is individual lookups, since it gives both start and end level for the particular exponent, as well as whether LL, PRP, or P-1 have been run or assigned yet, for example:

https://www.mersenne.ca/exponent/101000117 after

https://www.mersenne.org/report_expo...1001000&full=1
Be careful about relying on mersenne.ca for current status, since it lags a bit until it syncs overnight from mersenne.org.

If you're just after the TF level to go up to, going by the red curve for first LL on charts like https://www.mersenne.ca/cudalucas.ph...=100&mmax=1000 is not bad. You can get to those by clicking on any gpu in the list at https://www.mersenne.ca/cudalucas.php, and the low and high exponent limits are 50M to 300M by default but can be adjusted as shown in the URL above.

Or, I suppose I could add a target TF column. Lots of choices.

TFH  P95  owl  CUDAPm1 (left, right)
 76  100  101  102  103
 77  120  121  122  123
 77  150  151  152  153
 79  200  201  202  203
 79  250  251  252  253
 80  300  301  302  303
 81  350  351  352  353
 81  400  401  402  403
 82  500  501  452  453
 83  600  601
 84  700  701
 85  800  801
 85  900  901
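
To get a rough feel for how the effort grows with the target level, here is a small Python sketch. It assumes only the standard fact that candidate factors of a Mersenne number with exponent p have the form q = 2kp + 1, so the number of candidates between two bit levels is about (2^to - 2^from) / (2p) before any sieving. The result is a relative count, not GHz-days, and the function name is mine.

```python
def relative_tf_work(exponent: int, bit_from: int, bit_to: int) -> float:
    """Approximate count of candidates q = 2*k*p + 1 with 2**bit_from <= q < 2**bit_to.

    This is a relative measure only: it ignores sieving and the speed
    differences between kernels at different bit levels.
    """
    if bit_to < bit_from:
        raise ValueError("bit_to must not be below bit_from")
    return (2.0 ** bit_to - 2.0 ** bit_from) / (2 * exponent)


if __name__ == "__main__":
    # Each extra bit level costs about as much as all lower levels combined.
    one_bit = relative_tf_work(100_000_000, 75, 76)
    next_bit = relative_tf_work(100_000_000, 76, 77)
    print(next_bit / one_bit)
    # At the same bit level, a larger exponent has fewer candidates to test.
    print(relative_tf_work(900_000_000, 75, 76) / one_bit)
```

This is why starting points of 70 or 71 bits matter far less than the last bit or two of the target.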

Last fiddled with by kriesel on 2019-01-08 at 06:29

2019-01-09, 04:27   #6
potonono

Jun 2005
USA, IL

301₈ Posts

Thanks for the links and list everyone. Yes, that makes sense. Will it help your efforts more to work on any specific bins first, like smallest to largest, or just anything as available?

2019-01-09, 05:27   #7
LaurV
Romulan Interpreter

"name field"
Jun 2011
Thailand

2²·7·367 Posts

Make a worktodo file (list of exponents with bitlevels, which I can summarily edit and paste to my rig) and pass it to me.

2019-01-09, 11:58   #8
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1CC6₁₆ Posts

Quote:
 Originally Posted by potonono Thanks for the links and list everyone. Yes, that makes sense. Will it help your efforts more to work on any specific bins first, like smallest to largest, or just anything as available?
I suggest picking a column or portion of a column, describing it unambiguously in a post here so others can choose differently, and doing one exponent per bin, smallest first, then repeating for a second exponent per bin.
I usually go from small to large, in the first part of testing, because it gives a quick feel for scaling and more rapidly and efficiently explores limits.
TF in the same order seems like it would work well along with that.
Examples of description: "entire left cudapm1 column"; "gpu column up to 401"; "p95 column 400 to 900"; an actual exponent list would work too.
I am running testing on different applications on different gear in parallel;
prime95 on cpus, gpuowl on AMD gpus, CUDAPm1 on NVIDIA gpus. So no particular priority between columns.

2019-01-09, 12:26   #9
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·29·127 Posts

Quote:
 Originally Posted by LaurV Make a worktodo file (list of exponents with bitlevels, which I can summarily edit and paste to my rig) and pass it to me.
How about this, for the CUDAPm1 right column:
Factor=153000277,75,77
Factor=153000349,75,77
Factor=203000101,73,79
Factor=203000117,73,79

Factor=253000937,76,79
Factor=303000119,70,80
Factor=353000047,77,81
Factor=403000067,71,81
Factor=453000013,78,82

Factor=253000079,74,79
Factor=303000227,70,80
Factor=353000101,72,81
Factor=403000069,71,81
Factor=453000029,71,82
(Please reserve them to avoid duplications especially at 70 or 71 bit starting points)
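
If you want to sort or filter a list like this before pasting it into a rig, a minimal Python sketch for reading the lines follows. It handles only the three-field `Factor=exponent,from,to` form shown in this thread. Any other line layout is outside what this sketch covers, and the names `TFAssignment` and `parse_worktodo` are mine.

```python
from typing import NamedTuple


class TFAssignment(NamedTuple):
    exponent: int
    bit_from: int
    bit_to: int


def parse_worktodo(text: str) -> list[TFAssignment]:
    """Parse 'Factor=exponent,from,to' lines; blank and other lines are skipped."""
    assignments = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("Factor="):
            continue
        fields = line[len("Factor="):].split(",")
        if len(fields) != 3:
            raise ValueError(f"expected three fields in: {line!r}")
        exponent, bit_from, bit_to = (int(f) for f in fields)
        if bit_from >= bit_to:
            raise ValueError(f"nothing to do in: {line!r}")
        assignments.append(TFAssignment(exponent, bit_from, bit_to))
    return assignments


if __name__ == "__main__":
    sample = "Factor=153000277,75,77\nFactor=303000119,70,80\n"
    # List the lowest starting bit levels first, since those are the
    # ones most at risk of duplicated effort if left unreserved.
    for a in sorted(parse_worktodo(sample), key=lambda a: a.bit_from):
        print(a)
```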

Last fiddled with by kriesel on 2019-01-09 at 12:27

2019-01-09, 15:49   #10
chalsall
If I May

"Chris Halsall"
Sep 2002

11074₁₀ Posts

Quote:
 Originally Posted by Uncwilly James H had posted a chart a while back (based upon performance data). That and Chris mentioned that GPUs should do about 3 bits deeper than Prime95's default.
Actually, the "GPUto72 individual goal bit levels" phrase is giving credit where it's not due.

GPU72's targets are guided by James' "economic cross-over" analysis, which has been peer reviewed by many very knowledgeable people.

The exact "optimal" TF'ing depth is a function of the range (candidate size) and the particular card's abilities (specifically, the "compute version"). For example, an RTX 2080 Ti (c.v. 7.5) should TF deeper than a GTX 580 (c.v. 2.0).

Please keep in mind that James' analysis is based on comparing what will "clear" a candidate faster (using statistical heuristics) ***using the same kit*** running either mfaktc or a CUDA LL'er. Note that some people TF (slightly) beyond the optimal economic cross-over point because they just like finding factors, or can't be bothered to switch between the different software.
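
As a toy illustration of the cross-over idea (not James' actual analysis), here is a Python sketch. It assumes the commonly quoted rule of thumb that the chance of finding a factor between 2^b and 2^(b+1) is about 1/b, that each extra bit level costs twice the previous one, and that a factor saves a fixed number of primality tests. All costs are hypothetical and must be in the same time unit on the same card.

```python
def worth_one_more_bit(bit: int, tf_cost: float, test_cost: float,
                       tests_saved: float = 2.0) -> bool:
    """True if TF from 2**bit to 2**(bit+1) is expected to save time.

    Assumes a factor turns up in this bit level with probability about
    1/bit, and that a factor saves `tests_saved` primality tests.
    """
    expected_saving = (1.0 / bit) * tests_saved * test_cost
    return tf_cost < expected_saving


def crossover_bit(start_bit: int, first_bit_cost: float, test_cost: float,
                  tests_saved: float = 2.0, max_bit: int = 100) -> int:
    """Bit level at which one more level of TF stops paying for itself."""
    bit, cost = start_bit, first_bit_cost
    while bit < max_bit and worth_one_more_bit(bit, cost, test_cost, tests_saved):
        bit += 1
        cost *= 2.0  # each bit level has about twice as many candidates
    return bit


if __name__ == "__main__":
    # Hypothetical numbers: 1 hour for the 70-to-71 level, 1000 hours per test.
    print(crossover_bit(70, 1.0, 1000.0))
```

A faster TF card lowers `first_bit_cost` relative to `test_cost`, which pushes the cross-over deeper; that is the dependence on the card described above.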

2019-01-09, 15:53   #11
Uncwilly
6809 > 6502

"""""""""""""""""""
Aug 2003
101×103 Posts

10,891 Posts

I suggest that you use James H's worktodo.txt balancer. Try to make each chunk posted as close as possible to the same GHz-days. Here is what it looks like as balanced as it can be:

Code:
[Worker #1]
Factor=353000101,72,81
Factor=453000013,78,82
Factor=453000029,71,82
[Worker #2]
Factor=153000349,75,77
Factor=253000937,76,79
Factor=203000117,73,79
Factor=303000227,70,80
Factor=403000069,71,81
Factor=353000047,77,81
[Worker #3]
Factor=153000277,75,77
Factor=253000079,74,79
Factor=203000101,73,79
Factor=303000119,70,80
Factor=403000067,71,81

This breaks down to:
• Worker #1 = 5,572.802 GHz-days
• Worker #2 = 4,489.192 GHz-days
• Worker #3 = 3,233.924 GHz-days

No one has to take on a whole chunk; you can just reprocess the list for the next batch.
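
For a quick offline approximation of that kind of split, here is a minimal Python sketch using a greedy "largest job to the least loaded worker" rule. This is not James H's balancer and I don't know what method it uses. The work estimate here is a relative candidate count, (2^to - 2^from) / exponent, not real GHz-days, so the chunks will not match the figures above.

```python
import heapq

Assignment = tuple[int, int, int]  # (exponent, bit_from, bit_to)


def relative_work(exponent: int, bit_from: int, bit_to: int) -> float:
    """Relative TF effort: candidate count up to a constant factor."""
    return (2.0 ** bit_to - 2.0 ** bit_from) / exponent


def balance(assignments: list[Assignment], workers: int) -> list[list[Assignment]]:
    """Greedy split: hand each job, largest first, to the least loaded worker."""
    heap = [(0.0, i) for i in range(workers)]  # (current load, worker index)
    heapq.heapify(heap)
    buckets: list[list[Assignment]] = [[] for _ in range(workers)]
    for job in sorted(assignments, key=lambda a: relative_work(*a), reverse=True):
        load, i = heapq.heappop(heap)
        buckets[i].append(job)
        heapq.heappush(heap, (load + relative_work(*job), i))
    return buckets


def render(buckets: list[list[Assignment]]) -> str:
    """Format the buckets in the [Worker #n] layout used in this thread."""
    lines = []
    for n, bucket in enumerate(buckets, start=1):
        lines.append(f"[Worker #{n}]")
        lines.extend(f"Factor={p},{lo},{hi}" for p, lo, hi in bucket)
    return "\n".join(lines)


if __name__ == "__main__":
    jobs = [(353000101, 72, 81), (453000013, 78, 82), (153000349, 75, 77),
            (303000227, 70, 80), (203000117, 73, 79)]
    print(render(balance(jobs, 3)))
```

Greedy assignment is simple and usually close, but it is not guaranteed to give the most even split.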
