#320
Jun 2005
3×43 Posts
For one, I don't think that M/sec has anything to do with run time per factor. Or at least there's no easy way to map from one to the other. Turn off the siever and your M/sec would go through the roof - as would execution time since you're doing a lot of unnecessary work.

#321
Oct 2011
Maryland
2·5·29 Posts
Quote:

#322
Jun 2005
3·43 Posts
Well yeah, if you record X/sec and X you can work back to seconds, but since time is also given in the output, there's no point in making things more complex than they have to be. But my point was that the rate by itself tells you nothing since you have no idea how much work the GPU is doing, even if you do know what rate it is doing it at.
Last fiddled with by kjaget on 2012-01-12 at 19:59
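To make the point about rate versus run time concrete, here is a minimal sketch. The function name and all numbers are hypothetical and are not taken from mfaktc or mfakto output; it only assumes that run time equals the number of candidates tested divided by the rate.

```python
def seconds_for_work(candidates_millions: float, rate_m_per_sec: float) -> float:
    """Run time in seconds, given candidates tested (in millions) and rate (M/sec)."""
    if rate_m_per_sec <= 0:
        raise ValueError("rate must be positive")
    return candidates_millions / rate_m_per_sec


# Hypothetical numbers, for illustration only.
# Heavier sieving: fewer candidates reach the GPU, at a lower M/sec.
well_sieved = seconds_for_work(candidates_millions=1000.0, rate_m_per_sec=100.0)
# Lighter sieving: a higher M/sec, but many more candidates to test.
lightly_sieved = seconds_for_work(candidates_millions=2000.0, rate_m_per_sec=150.0)

print(f"well sieved:    {well_sieved:.1f} s")
print(f"lightly sieved: {lightly_sieved:.1f} s")
```

With these made-up figures the higher rate still gives the longer run time, which is why the rate alone says nothing unless you also know how much work was done.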

#323
Nov 2010
Germany
255₁₆ Posts
Quote:

#324
Nov 2010
Germany
3·199 Posts
Quote:
The question marks in the "Compute" column could be replaced by the OpenCL version that these cards support: 1.1 for all cards except those with an RVxxx chip with xxx<700. RV700 is the first to support OpenCL 1.1, but I don't know whether the earlier cards supported 1.0 or no OpenCL at all ... And OpenCL 1.1 is required for mfakto, so the same split can be used to find the AMD cards for which mfakto "will not run". That should be the same as marking anything below HD4xxx as "will not run".
Quote:
I now got the barrett mul24 kernel to work (correctly!), which increases the efficiency by ~20-30%. But reaching the "5" divider will be hard ... Maybe with the HD7970 ...

#325
Oct 2011
Maryland
2×5×29 Posts
Quote:
M/s should be constant no matter what the assignment. Time and Candidates change. Thus M/s is better. I think M/s and GPU usage should be sufficient to determine a theoretical maximum GHz-d/d. Of course things like the CPU (which affects SievePrimes) matter too, but I think you can get a theoretical max (independent of system) from those numbers.
Last fiddled with by KyleAskine on 2012-01-13 at 13:28 Reason: Added last line, fixed a typo

#326
"James Heinrich"
May 2004
ex-Northern Ontario
23·149 Posts
Quote:
Code:
UPDATE `gpu` SET `compute` = "1.2" WHERE (`brand` = "A") AND (`model` LIKE "Radeon HD 7%");
UPDATE `gpu` SET `compute` = "1.2" WHERE (`brand` = "A") AND (`model` LIKE "Radeon HD 6%");
UPDATE `gpu` SET `compute` = "1.1" WHERE (`brand` = "A") AND (`model` LIKE "Radeon HD 5%");
UPDATE `gpu` SET `compute` = "1.1" WHERE (`brand` = "A") AND (`model` LIKE "Radeon HD 4%");
UPDATE `gpu` SET `compute` = "1.1" WHERE (`brand` = "A") AND ((`codename` = "Westler") OR (`codename` = "Zacate") OR (`codename` = "Ontario") OR (`codename` = "WinterPark") OR (`codename` = "BeaverCreek"));

Sorry! It's no reflection on your programming, just the design of AMD GPUs. This article illustrates some of the problems with VLIW4 that Graphics Core Next is supposed to remedy. Perhaps it can translate into better mfakto efficiency(?)

It's also hard for mfaktc/NVIDIA. Older v1.x GPUs are pretty close to the current Radeon efficiency, and newer v2.1 GPUs actually take a 33% performance hit compared to v2.0, not quite sure why.

But I still need more benchmark data. I've seen results ranging from 13x to 18x in the few benchmarks I've received so far; I need more data points to figure out what patterns there may be.

#327
Jun 2005
3·43 Posts
Quote:
If you're looking for a theoretical measure, we'd need to hack the code to turn off sieving so as many candidates are fed to the GPU as possible per CPU<->GPU transaction. Run as many copies of these as necessary to max the GPU (or compare this to running 1 instance and scaling it with GPU load to see if it gives the same answer). Then we'd need to run through a pass with sieve primes maxed to see the minimum number of candidates required to test an exponent. This last step would only have to be done once since it's independent of the GPU. Combining the peak candidates per second with the minimum number of candidates per exponent would get us close to the theoretical peak throughput. But I'm not convinced that ignoring the real overhead in real systems is any more accurate a measurement than just seeing how long an exponent takes to run in a real system. |
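The procedure above could be sketched roughly as follows. Everything here is an assumption made for illustration: the function names, the idea that a single-instance rate scales linearly with GPU load, and all of the numbers. None of it comes from mfaktc or mfakto.

```python
SECONDS_PER_DAY = 86400.0


def scaled_peak_rate(observed_m_per_sec: float, gpu_load_fraction: float) -> float:
    """Estimate the peak M/sec by scaling one instance's rate up to 100% GPU load.

    Assumes throughput grows linearly with GPU load, which is a simplification.
    """
    if not 0.0 < gpu_load_fraction <= 1.0:
        raise ValueError("gpu_load_fraction must be in (0, 1]")
    return observed_m_per_sec / gpu_load_fraction


def peak_exponents_per_day(peak_rate_m_per_sec: float,
                           min_candidates_millions: float) -> float:
    """Theoretical exponents per day: peak rate combined with the minimum
    number of candidates (in millions) that one exponent requires."""
    seconds_per_exponent = min_candidates_millions / peak_rate_m_per_sec
    return SECONDS_PER_DAY / seconds_per_exponent


# Hypothetical measurements, for illustration only.
peak = scaled_peak_rate(observed_m_per_sec=150.0, gpu_load_fraction=0.75)
per_day = peak_exponents_per_day(peak, min_candidates_millions=8640.0)
print(f"peak rate: {peak:.1f} M/sec, about {per_day:.0f} exponents/day")
```

This ignores the CPU<->GPU transfer overhead and the sieving cost entirely, so the result is an upper bound rather than what a real system would deliver.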

#328
Jun 2005
3×43 Posts
Quote:
If I understand it correctly, 2.1 removed some compute resources and relies on a better scheduler to try and run more instructions in parallel. But mfaktc instruction parallelism can't be improved by the better scheduler so it gets hit by the reduced resources without any corresponding gain from the better scheduler. |

#329
Jun 2005
201₈ Posts
Quote:
I would like to see the timing info grouped together first (time/class & eta), then sieve primes, then the throughput stuff grouped together last. This orders it roughly by order of importance performance-wise, at least from a user's perspective. I've seen too many people set sieveprimes as low as possible to get a higher candidates/sec number when all that does is kill their run times. Hopefully moving time first will inspire them to minimize that instead of trying to max M/s by making the GPU do unnecessary work. But whatever you do, I'd coordinate with Oliver so you guys keep as much of the code common as possible. Should make it easier later on when it's integrated into Prime95 (I can dream, can't I). |

#330
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3·29·83 Posts
I found that when testing a 200M number, avg. rate dropped from ~195 to ~170, maybe ~165 sometimes. When I went back to 50M, the rate went up again. Could this be due to a higher cost of checking factors?