A Balancing Act
I have been running a lot of [I]Prime95[/I] recently, and I was not fond of running it alongside [I]mfaktc[/I]. So, I did a video card swap with my older HP. The HP could not handle the 1080 running at its factory defaults, so I am using [I]MSI Afterburner[/I] to under-clock it.
At the factory defaults, I would get a hard reset after a few seconds. I throttled it back to 65%. At this setting, the temperature is staying around 60°C, and it is still running above 900 GHz-days/day.

Another concern about heat is the yellow-and-black power adapter plug. The wire is too thin for this. It is warm to the touch, but nowhere near what I would consider hot. I increased the speed of the case fan to compensate; it is high capacity and moves a lot of air.

The thing about this HP is that I can give it a job, turn the monitor off, and let it run for days, or even weeks, without looking at it. In this setup, I will check it a few times a day, maybe. After I looked at the attached photo, I saw that I need to do a [U]lot[/U] of cleaning. This brown dust gets into everything.
Just running a vacuum cleaner over it may not be enough. I've recently had one system crash because of dust and fluff built up under the fan, clogging the heat sink. I had to scrape the fluff loose with a straightened-out paperclip so the vacuum cleaner could suck it out.

It would have been easier if I could have taken the fan off the heat sink so I could get at the fluff.

Chris
mfaktc uses 4620 classes = 2 × (2×3×5×7×11)
Why not add ×13? Wouldn't it filter even better that way?
[QUOTE=GP2;503795]mfaktc uses 4620 classes = 2 × (2×3×5×7×11)
Why not add × 13 ? Wouldn't it filter even better that way?[/QUOTE] Yes, but there will be even more overhead in handling those residue classes, too. Of course the overhead could be reduced in some places, but it isn't really worth doing so.

Oliver
[QUOTE=chris2be8;503790]Just running a vacuum cleaner over it may not be enough. I've recently had one system crashing because of dust and fluff built up under the fan clogging the heat sink. I had to scrape the fluff with a straightened out paperclip to get it loose enough for the vacuum cleaner to suck it out.
It would have been easier if I could have taken the fan off the heat sink so I could get at the fluff. Chris[/QUOTE] This is a [U]soon[/U] to-do item. I used canned air first, then a Dust Buster vac. The heat sink on this one is a bit strange, and I'm not exactly sure how the fan comes off. A great thing to use on fan blades, if you can get it out, is an old toothbrush.
[QUOTE=storm5510;503812]This is a [U]soon[/U] to-do item. I used canned air first, then a Dust Buster vac.[/QUOTE]Using the canned air in one hand and the vac in the other can work well. PrimeMonster or CheeseHead used to extol the virtues of this method.
[QUOTE=TheJudger;503806]Yes, but there will be even more overhead in handling those residue classes, too. Ofc the overhead could be reduced in some places but it isn't really worth doing so.[/QUOTE]
What is the biggest source of overhead? Is it the sieving that is done after the classes are selected, something to do with the GPU, or something else?

Currently, 960/4620 means that 20.78% of the classes are "classes_needed", and then I guess sieve_init_class filters out some percentage of the remainder. But if it were possible to go all the way to 2 × (2 × 3 × 5 × 7 × 11 × 13 × 17 × 19 × 23 × 29), then there would be many more classes, but only 15.79% of them would be needed: about one-quarter fewer as a percentage of the total. Presumably in this case sieve_init_class could be configured to do considerably less sieving.

Maybe the number of classes itself could be dynamically adjustable by the program. Presumably you want fewer classes if you are TFing to a smaller number of bits and zipping through each exponent in a couple of seconds, and more classes if you are using much higher TF limits and each exponent may take hours. The code already has the MORE_CLASSES adjustment between 420 and 4620, which is presumably based on such considerations, but it's fixed at compile time.

I'm sure everything above has already been thought about, so I'm hoping to understand better in what areas the overhead increases and in what ways.
[QUOTE=GP2;503820]The code already has the MORE_CLASSES adjustment between 420 and 4620, which is presumably based on such considerations, but it's fixed at compile time.[/QUOTE]For what it's worth, mfakt[b]o[/b] has a similar option, but it's set in .ini (MoreClasses=1) and not at compile-time.
[QUOTE=Uncwilly;503819]Using the canned air in one hand and the vac in the other can work well. PrimeMonster or CheeseHead used to extol the virtues of this method.[/QUOTE]
Not a bad idea. :smile:

I found a GPU power adapter cable on Amazon that is an extension. It does not split into two like the one I am using now, and it is heavier-gauge wire as well. For now, I am running it with the side cover off. I was able to sneak the power consumption up to 70% with no issues.

Something I have not mentioned is PSU heat. The exhaust outlet on the back is relatively cool to the touch.

Some here may think all this is a bit off-topic, but it is [I]mfaktc[/I] that I am running. [I]GPUto72[/I]
Hi,
[QUOTE=GP2;503820]What is the biggest source of overhead? Is it the sieving that is done after the classes are selected, or something to do with the GPU, or something else? Currently, 960/4620 means that 20.78% of the classes are "classes_needed", and then I guess sieve_init_class filters out some percentage of the remainder. But if it was possible to go all the way to 2 × (2 × 3 × 5 × 7 × 11 × 13 × 17 × 19 × 23 × 29), then there would be a lot more classes but only 15.79% of those classes would be needed. It would be one-quarter fewer as a percent of the total classes. Presumably in this case sieve_init_class could be configured to do considerably less sieving. Maybe the number of classes itself could be dynamically adjustable by the program. Presumably you want fewer classes if you are TFing to a smaller number of bits and zipping through each exponent in a couple of seconds, and more classes if you are using much higher TF limits and each exponent may take hours. The code already has the MORE_CLASSES adjustment between 420 and 4620, which is presumably based on such considerations, but it's fixed at compile time. I'm sure everything above has already been thought about, so I'm hoping to understand better in what areas the overhead increases and in what ways.[/QUOTE] One source of overhead is that the GPU queue runs empty after each class. You fire a lot of kernel invocations at your GPU and shouldn't assume that they are executed in that order (it is possible to make sure they run in a specific order). Letting the queue run empty after each class is just simpler and doesn't cost that much.

More classes means the distance between two factor candidates increases; at some point you have to take care of this, too (e.g. 64-bit instead of 32-bit for the difference values).

Another source of overhead is the fact that mfaktc doesn't handle partial blocks: the last block is always fully used, so with more classes the work above the upper limit increases as well.
You can compare the two versions (420 vs. 4620 classes) and estimate when the next prime should be included in the number of residue classes.

Oliver
[QUOTE=TheJudger;503945]One source of overhead is that GPU queue runs empty after each class. You fire alot of kernel invocations to your GPU and shouldn't assume that they are executed in that order (it is possible to make sure they are running in specific order). Let the queue run empty after each class is just simpler and doesn't cost that much.
[...] Another source of overhead is the fact that mfaktc doesn't handle partial blocks, the last block is always fully used - with more classes the work above the upper limit increases aswell. [/QUOTE] Is it somehow possible, at least in principle, to accumulate full-size blocks that may include work from more than one class, or even more than one exponent, and then send that as a batch to the GPU? In other words, how much of the overhead is fundamental to the way the CUDA code works, and how much is just a legacy implementation choice? I know very little about CUDA and am trying to decide if it's worth learning more.

[QUOTE] More classes means the distances between two factor candidates increases - at some point you have to take care about this, too (e.g. 64bit instead of 32bit for difference values).[/QUOTE] But most people will have a 64-bit architecture, I think? Or are you referring to single-precision vs. double-precision speed within the GPU?
Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.