[QUOTE=flashjh;280059]If I let it sleep it probably won't wake up again.
After testing everything, I think it's clear that my CPU and board architecture just can't keep up with the GPUs. Not that it's a bad thing, but I think that for optimum mfakto it's two cores per instance, one instance for each GPU with sieve at 5000 -- thoughts? I attached a screen shot of one of the instances with that setup. CPU sits at about 85%, GPUs are at about 64%.[/QUOTE]

The "time" column is the time per class. Each printed line is a class that was tested. Jumps in the class number mean there were classes that did not need to be tested (because there are no primes in that class, as all candidates are divisible by 11, for instance).

For good throughput I suggest running two instances per GPU, without limiting the affinity. The scheduler should sort out what is available. The siever is single-threaded, but as mfakto (or rather the OpenCL runtime) uses a background thread to drive the GPU, you'd never get the best performance when setting the affinity to just one core: driving the GPU and preparing the next set of factor candidates would be serialized on one core, adding big delays (also seen in the high wait times). When allowing two cores per instance, the cores will not be fully loaded all the time, but sometimes it is just better to let the two threads per instance run in parallel.

Fixing SievePrimes in this mode should not be necessary, but setting it to 5000 may leave some room for one or two prime95 threads. I guess you'd get even more throughput (tests per day) when running 3 instances per GPU, not limiting the affinity, with SievePrimesAutoAdjust on and no prime95.

BTW, when just testing this, you don't need to create extra directories.
You can use the -tf parameter and specify different exponents for each instance in the same directory:

[CODE]start mfakto -d 11 -tf 50017789 69 70
start mfakto -d 11 -tf 50019539 69 70
start mfakto -d 11 -tf 50024621 69 70
start mfakto -d 12 -tf 50030767 69 70
start mfakto -d 12 -tf 50031103 69 70
start mfakto -d 12 -tf 50034529 69 70[/CODE]

They will share mfakto.ini and results.txt, but not use any worktodo file.

Regarding the CPU sleep: mfakto always puts the CPU to sleep when the CPU is waiting for the GPU - the AllowSleep parameter is not used in mfakto. But testing a busy-wait like the one you can choose in mfaktc is on my test wish list ...

And regarding the HD6870 machine that is bound to 11.11: this is not my box, and it is mainly used for gaming ... :smile: On mission-critical systems like this you can't just replace a driver and risk stability issues :grin:
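The class-skipping behavior described above can be sketched in a few lines. This is illustrative code, not mfakto's actual source: it assumes the 4620-class scheme (4620 = 4*3*5*7*11), where candidate factors q = 2kp+1 are grouped by k mod 4620, and a whole class is skipped when every q in it is divisible by a small prime or fails the q ≡ ±1 (mod 8) requirement for Mersenne factors.

```python
def eligible_classes(p, num_classes=4620):
    """Return the classes c (k ≡ c mod num_classes) that can contain
    factors of 2^p - 1 among the candidates q = 2*k*p + 1."""
    eligible = []
    for c in range(num_classes):
        q = 2 * c * p + 1
        # Any factor of 2^p - 1 (p odd prime) is ≡ ±1 (mod 8); since
        # 8 divides 2*num_classes, q mod 8 is the same for the whole class.
        if q % 8 not in (1, 7):
            continue
        # Likewise q mod 3, 5, 7, 11 is fixed within a class, so the whole
        # class is skipped when q is divisible by one of these small primes.
        if any(q % s == 0 for s in (3, 5, 7, 11)):
            continue
        eligible.append(c)
    return eligible

print(len(eligible_classes(50017789)))  # 960 of the 4620 classes need testing
```

Running this for any exponent coprime to 3, 5, 7 and 11 leaves 960 of the 4620 classes, which is why the printed class numbers jump.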
Ooh that post explains a lot. Is that bit about thread design the same for mfaktc?
[QUOTE=flashjh;280097]I can't speak to the multithreaded siever, but if I only assign 1 core per instance, it way underpowers the GPU.
Thanks everyone for the inputs. I was able to max the CPU and push the GPUs fairly hard with three instances, but in the end my CPU doesn't have the ability to maximize the GPUs. All in all, I can run two instances for about 240 M/s along with Prime95. I'm happy with that. mfakto doesn't have a time per class, per se, but with these settings I get the best throughput and fastest times.[/QUOTE]

One other thing you can try, which I did on my system, is to adjust your NumStreams and CPUStreams. I noticed with one core and 3/3 streams that I'd have a way low avg wait, then way high, repeating. I changed it to 4/4 and avg wait stayed low. When I went to 2 instances, I changed it to 5/5. Two instances will now max out both my 560 and 560 Ti, and the changes to streams and locking the sieve seemed to help a lot. The 560 was doing 175 M/s with one mfaktc running and 200 M/s with two, though this did not max the GPU; but setting #1 at a 5k sieve and #2 at a 60k sieve with the 5/5 streams, I got a combined 238 M/s and 99% GPU load.
[QUOTE=Dubslow;280165]Ooh that post explains a lot. Is that bit about thread design the same for mfaktc?[/QUOTE]
No, CUDA does not start another thread; mfaktc is single-threaded. As there is certainly some parallelism required for CUDA as well, I assume the NV driver is hiding that completely.
Seems I responded too soon, but hopefully this is still good info.
[QUOTE=flashjh;280047]Thanks for the explanation... I have another question then. So, I understand on my system that the two 5870s outclass my X9650. But I keep the SievePrimes at 5000 on both mfakto instances because otherwise the M/s drop way down.[/QUOTE]

Again, this is not a good measure of performance. M/s drops because the CPU is filtering out useless work that the GPU would be forced to do at a lower SievePrimes value. The higher M/s comes from forcing the GPU to test factor candidates that would be sieved out and skipped if you let the CPU sieve a bit more. Look at the average time per class instead.

[QUOTE]How do I determine/get maximum throughput?[/QUOTE]

Turn on sieve-primes auto-adjust or whatever it's called. Run one instance, let it stabilize, and see where the per-class time settles. Take the inverse of this - that is the number of classes per second, which directly correlates to exponents per unit time, which is a measure of throughput.

Run two instances on the same exponent and let the per-class times stabilize. Compute (1/time_1) + (1/time_2) - this is the number of classes both combined instances will do per unit time. If it's higher than the one-instance case, you're doing more work overall with two instances.

Repeat with three instances: compute (1/time_1) + (1/time_2) + (1/time_3) and see if this is higher than the other two cases.

Eventually you'll run into the situation where you've loaded up the GPU, so adding more CPU instances gives the same throughput (actually probably a bit less, since there's some overhead in switching between the various runs). That will tell you when to stop adding CPUs. Probably a bit before that, because it's really unlikely that the GPU requires an exact multiple of CPUs to max it out. You'll probably find that something like 2 CPUs gets you 90% of the performance of 3 CPUs, because 2 almost but not quite max out the GPU. In that case, it's a question of whether that last 10% of performance is worth giving up a CPU that could be doing LL tests instead.
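The arithmetic above is easy to automate. A minimal sketch; the stabilized per-class times below are hypothetical placeholders, not measurements from this thread:

```python
def classes_per_second(per_class_times):
    # Each instance completes 1/t classes per second; with several
    # instances running concurrently, their rates simply add.
    return sum(1.0 / t for t in per_class_times)

# Hypothetical stabilized per-class times (seconds) for 1, 2 and 3 instances.
one   = classes_per_second([42.0])
two   = classes_per_second([55.0, 55.0])        # each instance slows down...
three = classes_per_second([80.0, 80.0, 80.0])  # ...but the combined rate may still rise

print(f"{one:.4f} {two:.4f} {three:.4f} classes/s")
# Stop adding instances once the combined rate stops improving noticeably.
```

With these made-up numbers, two instances beat one by a lot, while the third adds only a sliver: exactly the diminishing-returns pattern described above.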
mfakto freezing with multiple instances in Linux - fixed??
I think I may have fixed my earlier problem. My NumStreams was set way too high (it was at 5, don't know why). I lowered it to two, and it has been running two instances for half an hour just fine, which is way longer than it has ever run in the past. I will let it run all day today, and if it goes well, I might bump it up to three instances!
So I guess the lesson is (or may be, if it actually works): more instances --> fewer streams.
Spoke too soon! It crashed again. I may try lowering vectors to two when I get home, but I am officially stumped.
[QUOTE=KyleAskine;280378]Spoke too soon! It crashed again. I may try lowering vectors to two when I get home, but I am officially stumped.[/QUOTE]
Lowering vectors and lowering GridSize will have the same effect: reduced runtime per kernel, but more kernels run. However, lowering GridSize will (almost) keep the efficiency, while lowering vectors will reduce it much more. Still, you can test it, of course.

Anyway, I don't think any of these will be a permanent solution. Did you already check /var/log/messages? The kernel usually logs something when a hang occurs ...

Can you try downclocking the GPU(s)? In case downclocking helps, that could hint at a hardware issue, as the GPU does not live up to its specs.
[QUOTE=Bdot;280514]
Anyway, I don't think any of these will be a permanent solution. Did you already check /var/log/messages? The kernel usually logs something when a hang occurs ... [/QUOTE]

The most recent hang:

[CODE]Nov 29 17:24:09 kyleserv kernel: [40370.931288] [fglrx] ASIC hang happened
Nov 29 17:24:09 kyleserv kernel: [40370.931296] Pid: 5923, comm: mfakto Tainted: P 2.6.32-5-amd64 #1
Nov 29 17:24:09 kyleserv kernel: [40370.931297] Call Trace:
Nov 29 17:24:09 kyleserv kernel: [40370.931434] [<ffffffffa01a000c>] ? firegl_hardwareHangRecovery+0x1c/0x50 [fglrx]
Nov 29 17:24:09 kyleserv kernel: [40370.931483] [<ffffffffa0223344>] ? _ZN18mmEnginesContainer9timestampEP26_QS_MM_TIMESTAMP_PACKET_INP27_QS_MM_TIMESTAMP_PACKET_OUT+0x184/0x1c0 [fglrx]
Nov 29 17:24:09 kyleserv kernel: [40370.931528] [<ffffffffa02393a0>] ? _ZN7PM4Ring9PM4submitEPPjb+0xb0/0x1d0 [fglrx]
Nov 29 17:24:09 kyleserv kernel: [40370.931560] [<ffffffffa01bc222>] ? firegl_trace+0x72/0x1e0 [fglrx]
Nov 29 17:24:09 kyleserv kernel: [40370.931603] [<ffffffffa022e6b3>] ? _ZN8mmEngine9timestampEv+0x63/0x90 [fglrx]
Nov 29 17:24:09 kyleserv kernel: [40370.931644] [<ffffffffa0223330>] ? _ZN18mmEnginesContainer9timestampEP26_QS_MM_TIMESTAMP_PACKET_INP27_QS_MM_TIMESTAMP_PACKET_OUT+0x170/0x1c0 [fglrx]
Nov 29 17:24:09 kyleserv kernel: [40370.931677] [<ffffffffa01bebf0>] ? firegl_cmmqs_TSExpired+0x0/0xd0 [fglrx]
Nov 29 17:24:09 kyleserv kernel: [40370.931716] [<ffffffffa0203e3a>] ? IsThreadTSExpired+0xca/0x110 [fglrx]
Nov 29 17:24:09 kyleserv kernel: [40370.931748] [<ffffffffa01bec4b>] ? firegl_cmmqs_TSExpired+0x5b/0xd0 [fglrx]
Nov 29 17:24:09 kyleserv kernel: [40370.931778] [<ffffffffa01937a3>] ? KCL_WAIT_Add_Exclusive+0x6c/0x74 [fglrx]
Nov 29 17:24:09 kyleserv kernel: [40370.931809] [<ffffffffa01aa9fa>] ? irqmgr_wrap_wait_for_hifreq_interrupt_ex+0xba/0x3d0 [fglrx]
Nov 29 17:24:09 kyleserv kernel: [40370.931840] [<ffffffffa01a8d7b>] ? MCIL_SuspendThread+0xdb/0x120 [fglrx]
Nov 29 17:24:09 kyleserv kernel: [40370.931882] [<ffffffffa020d662>] ? _ZN2OS13suspendThreadEj+0x22/0x30 [fglrx]
Nov 29 17:24:09 kyleserv kernel: [40370.931922] [<ffffffffa020635f>] ? CMMQSWaitOnTsSignal+0xaf/0xd0 [fglrx]
Nov 29 17:24:09 kyleserv kernel: [40370.931963] [<ffffffffa0215a72>] ? _Z8uCWDDEQCmjjPvjS_+0xc32/0x10c0 [fglrx]
Nov 29 17:24:09 kyleserv kernel: [40370.931995] [<ffffffffa01be732>] ? firegl_cmmqs_CWDDE_32+0x332/0x440 [fglrx]
Nov 29 17:24:09 kyleserv kernel: [40370.932027] [<ffffffffa01bd060>] ? firegl_cmmqs_CWDDE32+0x70/0x100 [fglrx]
Nov 29 17:24:09 kyleserv kernel: [40370.932059] [<ffffffffa01bcff0>] ? firegl_cmmqs_CWDDE32+0x0/0x100 [fglrx]
Nov 29 17:24:09 kyleserv kernel: [40370.932088] [<ffffffffa019bc18>] ? firegl_ioctl+0x1e8/0x250 [fglrx]
Nov 29 17:24:09 kyleserv kernel: [40370.932092] [<ffffffff81041298>] ? pick_next_task_fair+0xca/0xd6
Nov 29 17:24:09 kyleserv kernel: [40370.932121] [<ffffffffa01921f6>] ? ip_firegl_unlocked_ioctl+0x9/0xd [fglrx]
Nov 29 17:24:09 kyleserv kernel: [40370.932125] [<ffffffff810fab66>] ? vfs_ioctl+0x21/0x6c
Nov 29 17:24:09 kyleserv kernel: [40370.932127] [<ffffffff810fb0b4>] ? do_vfs_ioctl+0x48d/0x4cb
Nov 29 17:24:09 kyleserv kernel: [40370.932130] [<ffffffff810740cc>] ? sys_futex+0x113/0x131
Nov 29 17:24:09 kyleserv kernel: [40370.932131] [<ffffffff810fb143>] ? sys_ioctl+0x51/0x70
Nov 29 17:24:09 kyleserv kernel: [40370.932134] [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
Nov 29 17:24:09 kyleserv kernel: [40370.932137] pubdev:0xffffffffa03f3d90, num of device:1 , name:fglrx, major 8, minor 88.
Nov 29 17:24:09 kyleserv kernel: [40370.932139] device 0 : 0xffff88022d8e0000 .
Nov 29 17:24:09 kyleserv kernel: [40370.932140] Asic ID:0x6898, revision:0x2, MMIOReg:0xffffc90011940000.
Nov 29 17:24:09 kyleserv kernel: [40370.932142] FB phys addr: 0xd0000000, MC :0xf00000000, Total FB size :0x40000000.
Nov 29 17:24:09 kyleserv kernel: [40370.932144] gart table MC:0xf0f8fd000, Physical:0xdf8fd000, size:0x402000.
Nov 29 17:24:09 kyleserv kernel: [40370.932145] mc_node :FB, total 1 zones
Nov 29 17:24:09 kyleserv kernel: [40370.932147] MC start:0xf00000000, Physical:0xd0000000, size:0xfd00000.
Nov 29 17:24:09 kyleserv kernel: [40370.932149] Mapped heap -- Offset:0x0, size:0xf8fd000, reference count:18, mapping count:0,
Nov 29 17:24:09 kyleserv kernel: [40370.932151] Mapped heap -- Offset:0x0, size:0x1000000, reference count:1, mapping count:0,
Nov 29 17:24:09 kyleserv kernel: [40370.932152] Mapped heap -- Offset:0xf8fd000, size:0x403000, reference count:1, mapping count:0,
Nov 29 17:24:09 kyleserv kernel: [40370.932154] mc_node :INV_FB, total 1 zones
Nov 29 17:24:09 kyleserv kernel: [40370.932155] MC start:0xf0fd00000, Physical:0xdfd00000, size:0x30300000.
Nov 29 17:24:09 kyleserv kernel: [40370.932157] Mapped heap -- Offset:0x302f4000, size:0xc000, reference count:1, mapping count:0,
Nov 29 17:24:09 kyleserv kernel: [40370.932158] mc_node :GART_USWC, total 2 zones
Nov 29 17:24:09 kyleserv kernel: [40370.932160] MC start:0x40100000, Physical:0x0, size:0x50000000.
Nov 29 17:24:09 kyleserv kernel: [40370.932162] Mapped heap -- Offset:0x0, size:0x2000000, reference count:22, mapping count:0,
Nov 29 17:24:09 kyleserv kernel: [40370.932163] mc_node :GART_CACHEABLE, total 3 zones
Nov 29 17:24:09 kyleserv kernel: [40370.932164] MC start:0x10400000, Physical:0x0, size:0x2fd00000.
Nov 29 17:24:09 kyleserv kernel: [40370.932166] Mapped heap -- Offset:0x1c00000, size:0x200000, reference count:1, mapping count:0,
Nov 29 17:24:09 kyleserv kernel: [40370.932168] Mapped heap -- Offset:0x1a00000, size:0x200000, reference count:1, mapping count:0,
Nov 29 17:24:09 kyleserv kernel: [40370.932170] Mapped heap -- Offset:0xb00000, size:0xf00000, reference count:5, mapping count:0,
Nov 29 17:24:09 kyleserv kernel: [40370.932172] Mapped heap -- Offset:0x200000, size:0x900000, reference count:8, mapping count:0,
Nov 29 17:24:09 kyleserv kernel: [40370.932174] Mapped heap -- Offset:0x0, size:0x200000, reference count:10, mapping count:0,
Nov 29 17:24:09 kyleserv kernel: [40370.932176] Mapped heap -- Offset:0xef000, size:0x11000, reference count:1, mapping count:0,
Nov 29 17:24:09 kyleserv kernel: [40370.932177] Mapped heap -- Offset:0x282000, size:0x281000, reference count:1, mapping count:0,
Nov 29 17:24:09 kyleserv kernel: [40370.932180] GRBM : 0x3828, SRBM : 0x200000c0 .
Nov 29 17:24:09 kyleserv kernel: [40370.932183] CP_RB_BASE : 0x401000, CP_RB_RPTR : 0x36a80 , CP_RB_WPTR :0x36a80.
Nov 29 17:24:09 kyleserv kernel: [40370.932186] CP_IB1_BUFSZ:0x0, CP_IB1_BASE_HI:0x0, CP_IB1_BASE_LO:0x40489000.
Nov 29 17:24:09 kyleserv kernel: [40370.932188] last submit IB buffer -- MC :0x40489000,phys:0x223c16000.
Nov 29 17:24:09 kyleserv kernel: [40370.932189] Dump the trace queue.
Nov 29 17:24:09 kyleserv kernel: [40370.932190] End of dump[/CODE]
[QUOTE=KyleAskine;280537]The most recent hang:
[CODE]Nov 29 17:24:09 kyleserv kernel: [40370.931288] [fglrx] ASIC hang happened[/CODE][/QUOTE]

This appears when a kernel locks the GPU for more than about one or two seconds. This should never happen with mfakto, especially on fast cards like yours. Try reducing the memory clock to the minimum (it has almost zero effect on mfakto anyway), and also reduce the GPU clock by 5 or 10% for a test.
[QUOTE=Bdot;280582]This appears when a kernel locks the GPU for more than around one or two seconds. This should never happen with mfakto, especially on fast cards like yours.
Try reducing the memory clock to the minimum (almost zero effect on mfakto anyway), and also reduce the GPU clock by 5 or 10% for a test.[/QUOTE]

Core clock is now 750 and mem clock is 900.