[QUOTE=Xyzzy;259219]
[code]
╔═════════╤════════╤════════╤════════╤════════╤════════╤══════╗
║instances│cpu_load│gpu_load│ave_rate│cpu_temp│gpu_temp│ time ║
╟─────────┼────────┼────────┼────────┼────────┼────────┼──────╢
║        0│      0%│      0%│     n/a│    29°C│    32°C│   n/a║
║        1│     26%│     52%│  190M/s│    54°C│    52°C│ 9m10s║
║        2│     51%│     92%│  173M/s│    63°C│    62°C│10m08s║
║        3│     76%│     95%│  121M/s│    68°C│    65°C│12m29s║
║        4│    100%│     97%│   97M/s│    71°C│    66°C│14m44s║
╚═════════╧════════╧════════╧════════╧════════╧════════╧══════╝
[/code]
Is it sane to use the average rate to determine overall throughput?
[/QUOTE]
As a rough estimate it might be OK, but you should keep an eye on SievePrimes, too. Higher SievePrimes means more candidates are removed with sieving on CPU, so each class has fewer candidates. A better estimate is the time per class.
[QUOTE=Xyzzy;259219]
[code]
╔═════════╤════════════════╗
║instances│   throughput   ║
╟─────────┼────────────────╢
║        1│190 × 1 = 190M/s║
║        2│173 × 2 = 346M/s║
║        3│121 × 3 = 363M/s║
║        4│ 97 × 4 = 388M/s║
╚═════════╧════════════════╝
[/code]
We interpret the data above to be that the CPU is filling the GPU "bucket" faster than the GPU can empty the "bucket". With 2 or more instances the GPU load is nearly topped out.
[/QUOTE]
Yep, but as mentioned above, with more CPU power you can do more sieving, and this reduces the number of candidates per class.
Oliver |
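To make the difference between the two estimates concrete, here is a minimal Python sketch that uses only the numbers from Xyzzy's tables. It assumes (the thread does not say this) that the "time" column is the wall-clock time each instance needs to finish the same unit of work, so that instances / time measures completed work. The names `runs`, `aggregate_rate`, `rate_speedup` and `time_speedup` are made up for illustration and are not part of mfaktc.

```python
# Measurements copied from the table: instances -> (ave_rate in M/s, time).
runs = {
    1: (190, "9m10s"),
    2: (173, "10m08s"),
    3: (121, "12m29s"),
    4: (97, "14m44s"),
}


def parse_time(text):
    """Convert a string like '9m10s' into seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + int(seconds)


def aggregate_rate(instances):
    """Rate-based estimate: per-instance average rate times instance count."""
    rate, _ = runs[instances]
    return rate * instances


def rate_speedup(instances):
    """Aggregate rate relative to a single instance."""
    return aggregate_rate(instances) / aggregate_rate(1)


def time_speedup(instances):
    """Time-based estimate: work units finished per second, relative to one
    instance. ASSUMPTION: every instance does the same amount of work in the
    listed time."""
    _, single = runs[1]
    _, multi = runs[instances]
    return (instances / parse_time(multi)) / (1 / parse_time(single))


if __name__ == "__main__":
    for n in sorted(runs):
        print(f"{n} instance(s): rate-based x{rate_speedup(n):.2f}, "
              f"time-based x{time_speedup(n):.2f}")
```

Under that assumption the two estimates disagree for 3 and 4 instances (the time-based figure is higher), which is what one would expect if the candidates per class differed between runs, for example because SievePrimes was adjusted automatically.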
[QUOTE]Higher SievePrimes means more candidates are removed with sieving on CPU, so each class has fewer candidates.[/QUOTE]
We have tested raising "SievePrimes" from 25,000 up to 1,000,000 without any difference in performance. The CPU load and CPU memory usage did not increase. Perhaps we are doing something wrong?
|
CUDA 4.0 driver out
1 Attachment(s)
The CUDA 4.0 driver is out (see attachment, driver 270.XX). Could anybody compile mfaktc for Win7 with 4.0? I'd like to test if there's gonna be any speedup.
|
[QUOTE=Brain;259273]The CUDA 4.0 driver is out (see attachment, driver 270.XX). Could anybody compile mfaktc for Win7 with 4.0? I'd like to test if there's gonna be any speedup.[/QUOTE]
Is it enough to update drivers to 4.0 if you still have 3.0 sdk? Luigi |
1 Attachment(s)
[QUOTE=ET_;259281]Is it enough to update drivers to 4.0 if you still have 3.0 sdk?[/QUOTE]
I only updated the driver and was surprised by the mfaktc output... Find attached the updated lib from CUDA 4.0.12 RC2 toolkit. It will be needed when somebody does the recompile. |
[QUOTE=Xyzzy;259251]We have tested raising "SievePrimes" from 25,000 up to 1,000,000 without any difference in performance. The CPU load and CPU memory usage did not increase. Perhaps we are doing something wrong?[/QUOTE]
Unless you've modified the code you can't increase SievePrimes to 1,000,000. In mfaktc 0.16 it is limited to 100,000. Did you set SievePrimesAdjust to 0 for your tests? Otherwise mfaktc will adjust SievePrimes automatically during the run, and both starting values (25k and 100k for SievePrimes) will end up at the same value after some time.
Hint: You can see the actual SievePrimes value in the "per class status output". No matter what you've set SievePrimes to, CPU load will always be 100% of one core. Memory usage does not depend on SievePrimes.
Oliver |
[QUOTE=Brain;259283]I only updated the driver and was surprised by the mfaktc output...[/QUOTE]
Brain - how's your performance? I found a drop of roughly 10% when I upgraded win7 64bit driver to ver 270+
-- Craig |
[QUOTE=TheJudger;259284]Unless you've modified the code you can't increase SievePrimes to 1,000,000. In mfaktc 0.16 it is limited to 100,000. Did you set SievePrimesAdjust to 0 for your tests? Otherwise mfaktc will adjust SievePrimes automatically during the run, and both starting values (25k and 100k for SievePrimes) will end up at the same value after some time.
Hint: You can see the actual SievePrimes value in the "per class status output". No matter what you've set SievePrimes to, CPU load will always be 100% of one core. Memory usage does not depend on SievePrimes. Oliver[/QUOTE]
Oliver: xyzzy is probably no more immune to typos than the rest of us, and it's easy to confuse a 1 followed by 5 zeros with a 1 followed by 6 zeros unless a thousands separator is used.
Remember that you were in the process of making mfaktc do sleeps to the operating system instead of busy-waits, but we don't have that yet, so to first order, if the CPU is keeping the GPU fed, changing SievePrimes will make no difference in performance.
My Windows machine with the slow GPU still needs that upgrade. I still have to get the i7 Ubuntu machine with the GTX570 on order.
Eric |
Hi Eric,
[QUOTE=Christenson;259289] Remember that you were in the process of making mfaktc do sleeps to the operating system instead of busy-waits, but we don't have that yet, so to first order, if the CPU is keeping the GPU fed, changing SievePrimes will make no difference in performance.[/QUOTE]
Depends how you define performance. If your CPU can keep the GPU busy all the time with different values of SievePrimes, the GPU throughput remains the same. But a higher SievePrimes removes more candidates before GPU work is done. If a CPU can keep the GPU busy with a higher SievePrimes, the runtime per class/exponent will be lower while the GPU rate remains the same.
Oliver |
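A rough way to see how much a higher SievePrimes could matter is the following Python sketch. It is a simplified model, not mfaktc's actual sieve: it assumes that each sieve prime p removes about 1/p of the remaining candidates and that the sieve uses the first N odd primes, whereas mfaktc handles the smallest primes through its class structure. The function names are hypothetical. At a constant GPU rate, the runtime per class scales with the surviving fraction.

```python
import math


def first_n_primes(n):
    """Return the first n primes using a sieve of Eratosthenes."""
    if n < 6:
        return [2, 3, 5, 7, 11][:n]
    # Upper bound on the n-th prime, valid for n >= 6.
    limit = int(n * (math.log(n) + math.log(math.log(n)))) + 1
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            # Clear all multiples of i starting at i*i.
            count = len(range(i * i, limit + 1, i))
            is_prime[i * i::i] = bytearray(count)
    primes = [i for i in range(limit + 1) if is_prime[i]]
    return primes[:n]


def surviving_fraction(sieve_primes):
    """Fraction of candidates left after sieving with the first
    `sieve_primes` odd primes (ASSUMPTION: each prime p removes 1/p)."""
    odd_primes = first_n_primes(sieve_primes + 1)[1:]  # drop the prime 2
    fraction = 1.0
    for p in odd_primes:
        fraction *= 1.0 - 1.0 / p
    return fraction


def runtime_ratio(low, high):
    """Runtime per class with SievePrimes=high relative to SievePrimes=low,
    assuming the GPU rate stays the same in both cases."""
    return surviving_fraction(high) / surviving_fraction(low)


if __name__ == "__main__":
    ratio = runtime_ratio(25_000, 100_000)
    print(f"relative runtime per class at 100k vs 25k: {ratio:.3f}")
```

In this model, going from 25,000 to 100,000 sieve primes leaves roughly a tenth fewer candidates for the GPU, so the gain is modest and only shows up in the time per class, not in the reported rate.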
1 Attachment(s)
[QUOTE=nucleon;259286]Brain - how's your performance? I found a drop of roughly 10% when I upgraded win7 64bit driver to ver 270+[/QUOTE]
I cannot see any great differences compared with [URL="http://www.mersenneforum.org/showpost.php?p=255177&postcount=647"]my last benchmark run[/URL]: see attached file. (Win7 64bit) |
Running 2 instances only, I can confirm a slight drop of 5 to 10%.
|