Is there much point going past 72 on TF work?
|
[QUOTE=tului;376862]Is there much point going past 72 on TF work?[/QUOTE]
This has been the topic of much debate. The slippery answer is, "It depends," on the exponent range in particular. In the current active LL range, 74 is generally the target, but it may be adjusted downward if LL workers are running short of material. The tables [URL="http://www.gpu72.com/reports/available/"]here[/URL] show the current aim points based on the break-even levels of TF vs LL effort required. (The above can probably be better stated, and likely will be by those more mathematically capable than me.) EDIT: It may also depend on whether one is running AMD or nVidia GPUs. EDIT2: ATM, the page linked above is not showing me any data in the fields. I can't say if this is a problem on my end or at GPU72. |
[QUOTE=tului;376862]Is there much point going past 72 on TF work?[/QUOTE]
Yes. Depending on your GPU and the exponent, you may be much better off doing TF to 74 (even 75), or doing LL instead. See [URL="http://www.mersenne.ca/cudalucas.php?model=12"]here[/URL] for details; click on the cards you have (or any cards) to see the graphs. |
[QUOTE=kladner;376872]This has been the topic of much debate.[/QUOTE]
Indeed! :wink:
[QUOTE=kladner;376872]The slippery answer is, "It depends," on the exponent range in particular. In the current active LL range, 74 is generally the target, but may be adjusted downward if LL workers are running short of material.[/QUOTE]
James put a lot of effort into providing a definitive analysis of where the "curves cross" (peer reviewed by many very smart and knowledgeable people). This is defined as the point (and the bit level) up to which it is more efficient to "TF" than to "LL" (or "DC") [B][U]ON THE SAME COMPUTING DEVICE[/U][/B]. His analysis [URL="http://www.mersenne.ca/cudalucas.php?model=12"]is here[/URL]. Note importantly that the cross-over is a function not only of the candidate's size, but also of the GPU core. You can click on the cards listed below the graph to see where the cross-over point is for that particular device (it varies slightly).
[QUOTE=kladner;376872]EDIT: It may also depend on whether one is running AMD or nVidia GPUs.[/QUOTE]
Definitely. Since OpenCL can't currently do LL testing, technically any depth Makes Sense [SUP](TM)[/SUP] (and is why no AMD cards are listed in the table below the graph). Bdot and others can definitely give you advice on what depth it makes the most sense for AMD cards to TF to (and where). I do code, not math...
[QUOTE=kladner;376872]The tables [URL="http://www.gpu72.com/reports/available/"]here[/URL] show the current aim points based on the break even levels of TF vs LL effort required. EDIT2: ATM, the page linked above is not showing me any data in the fields. I can't say if this is a problem on my end or at GPU72.[/QUOTE]
And these tables show what GPU72 is currently aiming for, based on James' analysis and our currently available firepower. The yellow cells show the levels we are aiming for, after which we will release back to Primenet (if a P-1 test has also been done, or Primenet is "hungry"). We should really be going to 75 "bits" for some of our current ranges, but we simply can't do that at the moment.
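The "curves cross" idea can be sketched numerically. This is only a rough illustration in plain C, not James' actual model. It assumes the usual GIMPS rules of thumb (not stated in this thread): the chance of a factor between 2^b and 2^(b+1) is roughly 1/b, each extra bit level takes twice as long as the previous one, and a found factor saves a first-time LL test plus its double check. The function name and the timings fed to it are hypothetical placeholders.

```c
/* Returns the bit level worth factoring to on ONE device.
 * start_bits        : level already factored to (e.g. 72)
 * tf_hours_at_start : hypothetical time to TF from start_bits to start_bits+1
 * ll_hours          : hypothetical time for one LL test on the same device
 * Assumptions (rules of thumb, not taken from the thread):
 *   - P(factor between 2^b and 2^(b+1)) is about 1/b
 *   - each additional bit level costs twice the previous one
 *   - a factor saves two LL tests (first time + double check) */
int breakeven_bits(int start_bits, double tf_hours_at_start, double ll_hours)
{
    int b = start_bits;
    double tf_cost = tf_hours_at_start;

    /* Take the next level while its cost is below the expected LL time saved. */
    while (tf_cost < (2.0 * ll_hours) / (double)b) {
        b++;
        tf_cost *= 2.0;
    }
    return b;
}
```

With made-up timings of 1 hour for the 72-to-73 level and 100 hours per LL test, `breakeven_bits(72, 1.0, 100.0)` returns 74. A slower TF device or a faster LL device moves the answer down, which is why the cross-over differs per card.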
(P.S. The report from GPU is working for me. Please let me know if you continue to see issues.) |
[QUOTE=LaurV;376880]Yes, depending on your GPU, and the exponent, you may be much better doing TF to 74 (even 75), or doing LL. See [URL="http://www.mersenne.ca/cudalucas.php?model=12"]here[/URL] for details, click on the cards you have (or any cards) to see the graphs.[/QUOTE]
Although after 73 bits, production decreases a bit on AMD cards because it switches to a different kernel...
[QUOTE=chalsall;376882] Since OpenCL can't currently do LL testing, technically any depth Makes Sense [SUP](TM)[/SUP] (and is why no AMD cards are listed in the table below the graph).[/QUOTE]
I think a better wording would be "Since an LL testing program hasn't been ported/made to OpenCL..." :razz: Technically, there [URL="http://mersenneforum.org/cllucas"]is[/URL] clLucas, which I run on and off. It is "limited"/fastest on power-of-two FFTs. If my memory is not mistaken, a 7970 does around 3.6 iter/ms on a 2M FFT. |
[QUOTE=tului;376858][URL]https://drive.google.com/file/d/0B0Yq8K5dWh1BWjlqRjAyVjY0elE/edit?usp=sharing[/URL]
[URL]https://drive.google.com/file/d/0B0Yq8K5dWh1BMS1waG5qZXhKVXM/edit?usp=sharing[/URL] Here they go.[/QUOTE]
Thanks a lot. The data shows that mfakto does not need to distinguish between the older and the current GCN generations. I compared your results (adjusted for clock speed and number of compute units) to some older results of a 7770 and a 7850. The R7 260X came in 0.5-1.5% short of the expected results for the 15-bit kernels, and 3.2-3.8% short for the 32-bit kernels. I then compared it to some recent test results of a 7870XT. Here, your card was ahead by 1-1.5% for the 15-bit kernels, and exactly as expected for the 32-bit kernels. I think the new GCN generation has a slight performance improvement except in 32-bit multiplications. The newer drivers add a slight decline to all. The differences are so small that no kernel reordering is required. |
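The "adjusted for clock speed and number of compute units" comparison can be sketched as below. This is a minimal illustration in plain C and assumes throughput scales linearly with clock and compute-unit count, which may not be exactly how the comparison above was done. Both function names are hypothetical.

```c
/* Scale a reference card's measured rate to another card, assuming
 * throughput is proportional to (clock speed) x (compute units). */
double expected_rate(double ref_rate, double ref_mhz, int ref_cus,
                     double mhz, int cus)
{
    return ref_rate * (mhz / ref_mhz) * ((double)cus / (double)ref_cus);
}

/* Percent deviation of a measured rate from the expectation.
 * Negative means the card came in short of the expected result. */
double percent_vs_expected(double measured, double expected)
{
    return 100.0 * (measured - expected) / expected;
}
```

A card measured at 98 units against an expectation of 100 would show as -2%, the same kind of figure as the "short" percentages quoted above.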
[QUOTE=chalsall;376882]Indeed! :wink:
<snip> (P.S. The report from GPU is working for me. Please let me know if you continue to see issues.)[/QUOTE] It now seems to be fine. It must have been a blockage in the particular internet tube through which I am connected. :razz: |
[QUOTE=kracker;376896]Technically, there [URL="http://mersenneforum.org/cllucas"]is[/URL] clLucas which I run on and off. It is "limited"/fastest on powers of two FFT's. If my memory is not mistaken, a 7970 does around 3.6 iter/ms on a 2M FFT.[/QUOTE]
Indeed, with the observation that the unit is milliseconds per iteration, not iterations per millisecond (don't you wish? :razz:).
clLucas is good if you test exponents under (and close to) L1=38492887, which is the last one for which the 2048k FFT can be used, or exponents under (and close to) L2=75846319, which is the last one for which the 4096k FFT can be used. Because the OpenCL FFT is not optimized for non-powers of two, testing, for example, a 40M exponent will be extremely slow if a non-power-of-2 FFT is used; and if the 4096k FFT is used, the iteration takes the same time as for a 75M exponent, which is very slow.
So, if you have a good GCN card, you can choose one of:
- do DC LL for exponents close to and lower than L1
- do first-time LL for exponents close to and lower than L2 (risky if no DC is done in parallel: you don't know if the final result will be right, and in the worst case you can miss a prime, which will be found later by another hunter, and you will feel very sorry :wink:)
- do TF. Of the three, this is the best and easiest: you get the maximum amount of credit, and you make other guys happy. But you can't find primes (not that you will find so many, anyhow :razz:) |
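The FFT-size reasoning above can be written down as a small lookup. This is a sketch in plain C, not clLucas code. The two limits are the ones quoted in the post. The function names and the "utilization" measure are hypothetical, added only to show why exponents close to a limit get the most out of each iteration.

```c
/* Largest exponents testable with each power-of-two FFT, as quoted above. */
#define LIMIT_2048K 38492887u
#define LIMIT_4096K 75846319u

/* Smallest power-of-two FFT length (in K) that fits the exponent,
 * or 0 if the exponent is above both limits quoted in the thread. */
unsigned pow2_fft_k(unsigned exponent)
{
    if (exponent <= LIMIT_2048K) return 2048u;
    if (exponent <= LIMIT_4096K) return 4096u;
    return 0u;
}

/* Fraction of the chosen FFT's capacity that the exponent uses.
 * Values near 1.0 mean little iteration time is wasted. */
double fft_utilization(unsigned exponent)
{
    unsigned k = pow2_fft_k(exponent);
    if (k == 0u) return 0.0;
    unsigned limit = (k == 2048u) ? LIMIT_2048K : LIMIT_4096K;
    return (double)exponent / (double)limit;
}
```

A 40M exponent lands on the 4096k FFT at roughly half utilization, which is the "same time as for a 75M exponent" situation described above.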
Please fix
The Intel compiler is whining, can someone please change the source code for the warnings below -- I don't know how to use git yet.
Also, the mfakto wiki page needs to be updated to the current version 0.14.
[CODE]In file included from :85:
.\barrett15.cl:218:40: warning: operator '>>' has lower precedence than '+'; '+' will be evaluated first
  res->d5 = mad24(a.d3, b.d2, res->d5) + res->d4 >> 15;
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~ ~~
.\barrett15.cl:218:40: note: place parentheses around the '+' expression to silence this warning
  res->d5 = mad24(a.d3, b.d2, res->d5) + res->d4 >> 15;
                                       ^
            (                                   )
.\barrett15.cl:223:40: warning: operator '>>' has lower precedence than '+'; '+' will be evaluated first
  res->d6 = mad24(a.d3, b.d3, res->d6) + res->d5 >> 15;
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~ ~~
.\barrett15.cl:223:40: note: place parentheses around the '+' expression to silence this warning
  res->d6 = mad24(a.d3, b.d3, res->d6) + res->d5 >> 15;
                                       ^
            (                                   )
.\barrett15.cl:227:40: warning: operator '>>' has lower precedence than '+'; '+' will be evaluated first
  res->d7 = mad24(a.d3, b.d4, res->d7) + res->d6 >> 15;
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~ ~~
.\barrett15.cl:227:40: note: place parentheses around the '+' expression to silence this warning
  res->d7 = mad24(a.d3, b.d4, res->d7) + res->d6 >> 15;
                                       ^
            (                                   )[/CODE] |
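For anyone wondering what the warning is about: C and OpenCL C give `+` higher precedence than `>>`, so `x + y >> 15` is evaluated as `(x + y) >> 15`. The plain C sketch below shows the two possible groupings side by side. The function names are made up for illustration, and which grouping barrett15.cl actually intends is not stated here; the parentheses should match whichever one is meant.

```c
/* The grouping the compiler uses for `a + b >> 15`: the sum is shifted. */
unsigned shift_of_sum(unsigned a, unsigned b)
{
    return (a + b) >> 15;
}

/* The other possible reading: only b is shifted, then added to a. */
unsigned sum_with_shifted(unsigned a, unsigned b)
{
    return a + (b >> 15);
}

/* The unparenthesized form from the warning, kept for comparison.
 * Compilers may warn here, just as the Intel compiler does above. */
unsigned unparenthesized(unsigned a, unsigned b)
{
    return a + b >> 15;
}
```

With `a = b = 0x8000`, the unparenthesized form and `shift_of_sum` both give 2, while `sum_with_shifted` gives 32769, so the two readings are far from interchangeable.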
bug reports:
1) The Intel compiler does not like -O3.
2) When clBuildProgram is called with invalid build options, it returns error -43 (CL_INVALID_BUILD_OPTIONS). If verbosity is set to 3, the build log is then requested via clGetProgramBuildInfo; there is none, and some trash characters are output. |
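One way to avoid printing trash for bug 2) is to sanitize the log buffer before it is output. The helper below is a hypothetical sketch in plain C, not mfakto's code. It assumes the caller already has a buffer and the length reported for the log, for example the size returned by clGetProgramBuildInfo with CL_PROGRAM_BUILD_LOG, or 0 when nothing was written.

```c
#include <stddef.h>

/* Hypothetical helper: make a build-log buffer safe to print.
 * buf     : caller's buffer with capacity cap
 * log_len : number of bytes reported as written (may be 0)
 * Returns buf if it holds text, otherwise a fixed placeholder string. */
const char *printable_build_log(char *buf, size_t cap, size_t log_len)
{
    if (buf == NULL || cap == 0) return "(no build log)";
    if (log_len >= cap) log_len = cap - 1;  /* never write past the buffer */
    buf[log_len] = '\0';                    /* cut off any uninitialized tail */
    return (buf[0] != '\0') ? buf : "(no build log)";
}
```

Checking the return code of the log query before printing would be an equally valid approach; the point is only that an uninitialized buffer should never reach the output.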
Please change line 84 of Montgomery.cl to:
r2 += ((r1!=0)? (ulong_v)1UL : (ulong_v)0UL); |