[QUOTE=Prime95;389040]A prototype of the PrimeNet web page is: [URL]http://mersenne.org/manual_gpu_assignment/[/URL]
[/QUOTE] Suggestion for change: Preferred work range: "TF for [B]100M-digit[/B] exponents".
[QUOTE=Mark Rose;389033]
Until recently, AMD cards had a penalty factoring beyond 73 bits (I don't know if the version of mfakto with the newer kernel has been released). [/QUOTE] I wanted to release this version before LaurV passed me in the overall GPU72 stats, but I lost that race :down:

AMD GCN cards will still have some slowdown from 73 to 74 bits, another one from 74 to 75, one from 70 to 71, one from 82 to 83, and so on. It's a bit like LL tests and different FFT lengths: bigger problems require bigger effort. The "penalty" comes from the GHz-days calculation, which does not take that into account for TF - and it would be hard to do so, given the hardware differences.

[QUOTE=Prime95;389040] A prototype of the PrimeNet web page is: [URL]http://mersenne.org/manual_gpu_assignment/[/URL] Feel free to click on getting assignments, it will display work without making any real reservations. Note that the assignments returned are not what you'll eventually get, since GPU72 has nearly all the relevant DC and LL exponents reserved.[/QUOTE] Seeing that the AMD cards are not available in the "GPU info" selection makes me wonder whether it really matters which card is running the TF. If so, would some (imaginary) hardware that cannot run any type of LL test need to do TF up to bitlevel = exponent - 1? Or, put differently, why care about the GPU-specific crossover point if LL is done on different hardware anyway?

Even if all TF were done on a single type of GPU, the crossover point is meaningless compared to the ratio of LL power vs. TF power that people are willing to spend - at least until PrimeNet also balances the worktype by hardware (TF vs. LL, or even P-1, for that matter), which some contributors may dislike. I think the [B]average[/B] crossover point can only be taken as general guidance when suggesting switching cards between TF and LL. Preferably, the cards with a lower crossover point switch to LL first, but it's a user decision.

The resulting firepower should decide how far to factor - quite independently of the GPU type. Using the "GPU info" to account for the above-mentioned steps in TF efficiency could be useful, but hard to implement and maintain.
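To make the GHz-days point above concrete, here is a rough sketch (not mfakto's actual code, and with an illustrative non-prime exponent) of why the work per bit level roughly doubles: factors of M(p) have the form q = 2kp + 1, so the number of candidates grows with the width of the bit range.

```python
# Rough sketch: factors of M(p) have the form q = 2*k*p + 1, so the
# number of candidate k values for bit level b (factors between 2^(b-1)
# and 2^b) is about 2^(b-2)/p, i.e. it doubles with each bit level.
# The GHz-days formula assumes constant cost per candidate; when a wider
# (slower) kernel kicks in at some bit level, the real cost per candidate
# rises but the credit does not - hence the apparent "penalty".

def candidates(p, b):
    """Approximate count of q = 2kp+1 with 2^(b-1) <= q < 2^b (pre-sieve)."""
    k_min = (1 << (b - 1)) // (2 * p)
    k_max = (1 << b) // (2 * p)
    return k_max - k_min

p = 54000001  # illustrative value in the ~54M range mentioned above
for b in range(70, 76):
    print(b, candidates(p, b))  # roughly doubles with each bit level
```

In practice a sieve removes most candidates first, but the doubling per bit level survives sieving, which is why credit per level also doubles.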
[QUOTE=Bdot;389056]
Seeing that the AMD cards are not available in the "GPU info" selection makes me wonder whether it really matters which card is running the TF. If so, would some (imaginary) hardware that cannot run any type of LL test need to do TF up to bitlevel = exponent - 1? Or, put differently, why care about the GPU-specific crossover point if LL is done on different hardware anyway?[/QUOTE] The only time the GPU info comes into play now is if you select "smallest exponent". Without the GPU info my GTX 570 would have been assigned an M54,xxx,xxx to 2^75. My GPU would be better off doing LLs in the 54M range. With the GPU info, I now get 60M exponents; a GTX 770 would get a 65M exponent.

Since we are nowhere near having enough GPU firepower, the "what makes sense" choice does not care about the GPU info. If the AMD (or Nvidia) barrett kernels are significantly worse going up one particular bit level, then I could modify the SQL query to prefer the faster bit levels for that GPU.
[QUOTE=Bdot;389056]
Or, put differently, why care about the GPU-specific crossover point if LL is done on different hardware anyway? Even if all TF were done on a single type of GPU, the crossover point is meaningless compared to the ratio of LL power vs. TF power that people are willing to spend - at least until PrimeNet also balances the worktype by hardware (TF vs. LL, or even P-1, for that matter), which some contributors may dislike. I think the [B]average[/B] crossover point can only be taken as general guidance when suggesting switching cards between TF and LL. Preferably, the cards with a lower crossover point switch to LL first, but it's a user decision. The resulting firepower should decide how far to factor - quite independently of the GPU type. [/QUOTE] +1 to this, especially to the last part.
[QUOTE=Bdot;389056]I wanted to release this version before LaurV passed me in the overall GPU72 stats, but I lost that race :down:
AMD GCN cards will still have some slowdown from 73 to 74 bits, another one from 74 to 75, one from 70 to 71, one from 82 to 83, and so on. It's a bit like LL tests and different FFT lengths: bigger problems require bigger effort. The "penalty" comes from the GHz-days calculation, which does not take that into account for TF - and it would be hard to do so, given the hardware differences.[/quote] How much of it is the GHz-d calculation and how much of it is extra math? I haven't looked very much at mfakto's code. I'm curious.

mfaktc's barrett76 kernel needs only 5 32-bit ints and 9 multiplies for a 76-bit x 76-bit product, but the barrett77 kernel requires 6 ints and 12 multiplies for a 77-bit x 77-bit product. There's about a 20% drop in performance going from 76 to 77 bits, before taking into account the GHz-d formula penalty for higher bit levels.
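The multiply counts above follow from the limb arithmetic: a 76-bit operand fits in three 32-bit limbs, and a full schoolbook 3x3-limb product is exactly 9 hardware multiplies; 12 multiplies is what a 3x4-limb product would cost, consistent with needing a wider intermediate (that is my reading, not a statement about mfaktc's actual kernel code). A minimal sketch of schoolbook multi-limb multiplication with a multiply counter:

```python
# Minimal schoolbook multi-limb multiply on 32-bit limbs, illustrating
# why kernel cost jumps with operand width: an n-limb x m-limb product
# needs n*m hardware 32x32->64 multiplies.

MASK32 = 0xFFFFFFFF

def to_limbs(x, n):
    """Split x into n little-endian 32-bit limbs."""
    return [(x >> (32 * i)) & MASK32 for i in range(n)]

def from_limbs(limbs):
    return sum(limb << (32 * i) for i, limb in enumerate(limbs))

def schoolbook_mul(a_limbs, b_limbs):
    """Multiply two limb arrays; returns (product_limbs, multiply_count)."""
    out = [0] * (len(a_limbs) + len(b_limbs))
    muls = 0
    for i, a in enumerate(a_limbs):
        carry = 0
        for j, b in enumerate(b_limbs):
            muls += 1                      # one 32x32->64 hardware multiply
            t = out[i + j] + a * b + carry
            out[i + j] = t & MASK32
            carry = t >> 32
        out[i + len(b_limbs)] += carry
    return out, muls

a = (1 << 76) - 1                          # a 76-bit value: three limbs
p, muls = schoolbook_mul(to_limbs(a, 3), to_limbs(a, 3))
print(muls, from_limbs(p) == a * a)        # 9 True
```

The GPU kernels avoid some of these partial products with Barrett-specific truncation tricks, but the limb count still sets the baseline cost, which is why throughput steps down at word-size boundaries.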
[QUOTE=Prime95;389063]The only time the GPU info comes into play now is if you select "smallest exponent". Without the GPU info my GTX 570 would have been assigned a M54,xxx,xxx to 2^75. My GPU would be better off doing LLs in the 54M range. With the GPU info, I now get 60M exponents, a GTX770 would get a 65M exponent.
[/quote] I think the current menu may actually confuse users whose cards aren't listed. For instance, I have two GT 520s and two GT 430s crunching DCTF (I got started with these cards, and others will too). Also, all the 4xx and 5xx hardware (except the GT 405), and [i]some[/i] of the GT 6xx and GT 7xxM models, have the Fermi architecture and will have a similar crossover point. The current list also breaks down for the 7xx series: the GTX 750 (Maxwell) is much worse at LL than the GTX 760/770/780 (Kepler).

Would it make more sense to have a list of architectures? Fermi, Kepler, Maxwell, and "Don't know", which aliases to the lowest crossover point, or something like that? I actually still have a Tesla-architecture card, but it's truly not worth bothering with at this point: it gets about 3 GHz-d/d and steals CPU from mprime since it can't do GPU sieving.

[quote]Since we are nowhere near having enough GPU firepower, the "what makes sense" choice does not care about the GPU info. If the AMD (or Nvidia) barrett kernels are significantly worse going up one particular bit level then I could modify the SQL query to prefer the faster bit levels for that GPU.[/QUOTE] There's a big ~20% performance hit beyond 76 bits for all Nvidia cards.
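The architecture-menu idea could be sketched roughly like this; the crossover exponents here are placeholders for illustration, not real measurements.

```python
# Sketch of the architecture-based menu suggested above: map each
# architecture to a crossover exponent, and alias unknown cards to the
# lowest (most conservative) crossover. All numbers are placeholders.

ARCH_CROSSOVER = {          # lowest exponent still worth TF on this arch
    "fermi":   60_000_000,  # placeholder
    "kepler":  65_000_000,  # placeholder
    "maxwell": 55_000_000,  # placeholder
}

def crossover(arch):
    """Unknown architectures alias to the lowest crossover point."""
    return ARCH_CROSSOVER.get(arch.lower(), min(ARCH_CROSSOVER.values()))

print(crossover("Kepler"))     # 65000000
print(crossover("dont-know"))  # 55000000, the conservative default
```

A handful of architecture entries would cover whole card families at once, instead of maintaining a per-model list that goes stale with every release.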
Whatever you make, please remember the users who need to reserve and submit work manually.
:spot:
[QUOTE=Mark Rose;389079]I think the current menu may actually confuse users whose cards aren't listed.
Would it make more sense to have a list of architectures? Fermi, Kepler, Maxwell, and "Don't know", which aliases to the lowest crossover point, or something like that? [/quote] That might be a good idea -- or I could delete the GPU info completely and assume the lowest crossovers, since 99% of the time the information will not be used. Or see below for where I might use this info more often...

[quote]There's a big ~20% performance hit beyond 76 bits for all Nvidia cards.[/QUOTE] 20% may well be worth taking into consideration. This would delay when Primenet starts handing out 76-bit factoring.

Am I right in thinking I need 3 different tables: mfaktc, mfakto on GCN, and mfakto on non-GCN? For each, I need to know the bit levels at which the program uses a slower kernel and how much slower that kernel is than the previous one. I need this data for bit levels 69 to ~84. Right now I have one datapoint: mfaktc, 76 bits, 20% slower.
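The three tables described here could take a shape like the following sketch: per program/architecture, the bit levels where a slower kernel takes over and the slowdown relative to the previous kernel. Every entry except the single mfaktc datapoint quoted in this thread is a made-up placeholder.

```python
# Sketch of the three slowdown tables: bit level at which a slower kernel
# takes over -> fractional slowdown vs. the previous kernel. Only the
# mfaktc entry reflects a number from this thread (~20% beyond 76 bits);
# the rest are placeholders.

SLOWDOWN_TABLES = {
    "mfaktc":         {77: 0.20},                       # the known datapoint
    "mfakto-gcn":     {71: 0.05, 74: 0.05, 75: 0.05, 83: 0.05},  # placeholders
    "mfakto-non-gcn": {74: 0.10},                       # placeholder
}

def relative_speed(program, bit_level):
    """Speed at bit_level relative to the fastest kernel (~69 bits).
    Each threshold at or below bit_level compounds its slowdown."""
    speed = 1.0
    for threshold, slowdown in SLOWDOWN_TABLES[program].items():
        if bit_level >= threshold:
            speed *= (1.0 - slowdown)
    return speed

# The server could then prefer assignments whose next bit level keeps
# relative_speed() above some cutoff for the reporting GPU.
print(round(relative_speed("mfaktc", 76), 2))  # still on the fast kernel
print(round(relative_speed("mfaktc", 77), 2))  # the ~20% step
```

Compounding the steps multiplicatively keeps each table entry local to one kernel transition, which is the format the benchmarks in this thread naturally produce.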
[QUOTE=Prime95;389040]I'm asking about 100M digits or M332192000. I think the chart you refer to is for M100000000.
[/QUOTE] [QUOTE=James Heinrich;389042]Correct. My performance charts are based on a simple one-dimensional measurement; they do not scale appropriately where different bit levels or kernels are invoked. Something I should probably look at in the future, I guess.[/QUOTE] This post contains a chart that may be helpful: [url]http://mersenneforum.org/showpost.php?p=324680&postcount=413[/url]
[QUOTE=Uncwilly;389094]This post contains a chart that may be helpful[/QUOTE]That has essentially the same information as is shown (different presentation now) on [url=http://www.mersenne.ca/cudalucas.php?model=9]this page[/url], but it's still based on the flawed assumption that all GPUs of any particular compute level perform identically across all bitlevels. I should take into account things like the aforementioned performance drop above 2[sup]76[/sup] but I don't.
Once we gather the stats for various architectures at various bitlevels for George, I'll see about incorporating that new information into my charts and graphs as well.
[QUOTE=Prime95;389091]That might be a good idea -- or I could delete the gpu info completely and assume the lowest crossovers since 99% of the time the information will not be used. Or see below for where I might use this info more often...
20% may well be worth taking into consideration. This would delay when Primenet starts handing out 76 bit factoring. Am I right in thinking I need 3 different tables: mfaktc, mfakto on GCN, mfakto on non-GCN. For each I need to know the bit levels at which the program uses a slower kernel and how much slower the kernel is than the previous kernel. I need this data for bit levels 69 to ~84. Right now I have one datapoint: mfaktc, 76 bits, 20% slower[/QUOTE] Trial factoring anything up to 76 bits is fast with mfaktc. Trial factoring to 77 bits is slower. Here are some GHz-d/day numbers for M39467291 on a GTX 580 (at 1544 MHz):

69,70: 426.02
70,71: 426.02
71,72: 424.85
72,73: 424.52
73,74: 424.32
74,75: 424.56
75,76: 424.39
76,77: 423.28
77,78: 414.38 (okay, not as bad as I remembered!)
78,79: 414.24
79,80: 414.32

The 20% I remembered was from [url=http://mersenneforum.org/showpost.php?p=306572&postcount=1824]this post[/url]. The new barrett76 kernel is only usable up to 76 bits (77 overflows), so a less efficient kernel must be used beyond that. I don't have time to do more benchmarking at the moment.
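From the benchmark numbers above, the actual step at the barrett76 boundary works out to about 2%, not 20%; a quick check:

```python
# Quick check of the measured step from the GTX 580 numbers above:
# the drop going from the 76,77 bit level to the 77,78 bit level.

rates = {  # GHz-d/day for M39467291, as benchmarked in the post above
    (69, 70): 426.02, (70, 71): 426.02, (71, 72): 424.85,
    (72, 73): 424.52, (73, 74): 424.32, (74, 75): 424.56,
    (75, 76): 424.39, (76, 77): 423.28, (77, 78): 414.38,
    (78, 79): 414.24, (79, 80): 414.32,
}

drop = 1 - rates[(77, 78)] / rates[(76, 77)]
print(f"{drop:.1%}")  # prints 2.1% - far from the remembered 20%
```

So for the server-side tables, the mfaktc datapoint would be roughly "77 bits, ~2% slower" rather than the 20% figure quoted earlier in the thread.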