QS postprocessing has not changed in literally years; the LA (linear algebra) is much more efficient now, but only for matrices much larger than those QS would generate. |
[QUOTE=bsquared;225717]It took a while to find one, but I now have a repeatable test case which locks up. I'll start looking into what's causing that.
[/QUOTE] The bug which was causing large SIQS jobs to hang occasionally should now be fixed in [URL="https://sites.google.com/site/bbuhrow/"]version 1.19.2[/URL]. The other issues mentioned were 1.) multi-threaded loses cores, 2.) batch files don't resume as expected.

I'm not sure what to do about 1.), but I'll keep looking into it. Maybe the thread pool architecture needs to be re-thought...

As for 2.), I guess I need to be educated as to how people expect resuming work in a batchfile to behave. Do you want/expect it to pick up where it left off? Would you be ok with the program modifying the batchfile (deleting rows as they are completed, for example)? Right now, manual modification of the batchfile to remove lines which have already been completed is the correct thing to do. |
[quote=bsquared;225864]
As for 2.), I guess I need to be educated as to how people expect resuming work in a batchfile to behave. Do you want/expect it to pick up where it left off? Would you be ok with the program modifying the batchfile (deleting rows as they are completed, for example)? [/quote] I expect the client to pick up the work where it left off, simply by deleting the numbers (rows) completed so far. |
[quote=bsquared;225864]As for 2.), I guess I need to be educated as to how people expect resuming work in a batchfile to behave. Do you want/expect it to pick up where it left off? Would you be ok with the program modifying the batchfile (deleting rows as they are completed, for example)? Right now, manual modification of the batchfile to remove lines which have already been completed is the correct thing to do.[/quote]I don't know how applicable the analogy may be, but the CWI LA "just works". That is, no-one needs to edit anything and the computation continues from the last checkpoint with no human intervention other than re-starting with the relevant checkpoint.
Paul |
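The delete-completed-rows resume behaviour suggested above can be sketched as follows. This is a minimal illustration under assumed conventions (one number per line in the batchfile, a caller-supplied `process` callback standing in for the factoring step), not YAFU's actual code:

```python
import os
import tempfile

def run_batchfile(path, process):
    """Process one number per line; after each success, atomically
    rewrite the batchfile with that row removed, so an interrupted
    run resumes at the first unprocessed line with no manual editing."""
    while True:
        with open(path) as f:
            lines = [ln for ln in f if ln.strip()]
        if not lines:
            break
        process(lines[0].strip())  # e.g. factor the first remaining number
        # rewrite the file without the completed row, via a temp file
        # so a crash mid-write cannot corrupt the batchfile
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
        with os.fdopen(fd, "w") as f:
            f.writelines(lines[1:])
        os.replace(tmp, path)
```

Because the file is rewritten after every completed row, a killed run picks up where it left off just by being restarted, which matches the behaviour requested in the posts above.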
[quote=bsquared;225864]The bug which was causing large SIQS jobs to hang occasionally should now be fixed in [URL="https://sites.google.com/site/bbuhrow/"]version 1.19.2[/URL].
[/quote] I'm sorry but could you upload yafu-1.19.2.zip with the 64-bit versions? |
[QUOTE=em99010pepe;226052]I'm sorry but could you upload yafu-1.19.2.zip with the 64-bit versions?[/QUOTE]
Sorry, I just got them from Brian Gladman today, in fact. I'm still having trouble getting my Visual Studio 2010 Express edition on Win7 to compile 64-bit code. Anyway, they should now be in the 1.19.2 zip file for download. |
Thank you. I'm running some Yafu tasks; I'll let you know more about the losing-cores issue. We need to understand why it happens and fix it.
|
[QUOTE=em99010pepe;226269]Thank you. I'm running some Yafu tasks, I'll let you know more about the losing cores issue. We need to understand why it happens and fix it.[/QUOTE]
Agreed. Thanks, that will help. Here is some interesting data that may be related.

On a machine running Windows Server 2008 and a Nehalem-based CPU (Xeon X5570), the scheduler seems to do a horrible job with yafu and performance really suffers. Looking at the task manager, I see that every core is partially utilized, no matter how many threads I specify.

On a machine running Windows Server 2008 and a Core 2-based CPU (Xeon 5160), the scheduler seems to do a decent job with yafu and performance is fine. Looking at the task manager, I see that every core is partially utilized, same as in the Nehalem case.

Is the problem that Windows doesn't know the difference between a hyperthread and a physical core? Is Linux smarter than this, or just lucky in the way it enumerates cores? |
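On the Linux side, the enumeration question has a concrete answer: the kernel exposes CPU topology explicitly through sysfs. A hedged sketch (Linux-only; the sysfs paths are assumed present) of grouping logical CPUs into physical cores, where any group with more than one member is a hyperthread set:

```python
from collections import defaultdict

def physical_core_map():
    """Group logical CPUs by (package, core) using Linux sysfs.
    Any group with more than one logical CPU is a hyperthread set.
    Linux-only sketch; returns an empty dict if sysfs topology
    files are unavailable (e.g. in some containers)."""
    groups = defaultdict(list)
    cpu = 0
    while True:
        base = f"/sys/devices/system/cpu/cpu{cpu}/topology"
        try:
            with open(f"{base}/core_id") as f:
                core = int(f.read())
            with open(f"{base}/physical_package_id") as f:
                pkg = int(f.read())
        except FileNotFoundError:
            break
        groups[(pkg, core)].append(cpu)
        cpu += 1
    return dict(groups)
```

On Windows, the equivalent information comes from the `GetLogicalProcessorInformation` API, which is presumably what a yafu-side affinity workaround would have to consult.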
As Brian stated earlier, we can fix the scheduling issue for Nehalems by programmatically assigning an affinity mask for the thread. But nothing informs this decision, so what if we assign the thread to an already loaded core? It doesn't seem like a very graceful fix, but the only other fix seems to be inside the Windows scheduler. I'm open to ideas here... can we detect the utilization of a core during runtime in order to inform the affinity mask (i.e. try to hack a scheduler into yafu)? Is there a way to detect in Windows whether a core is a hyperthread or not?
em99010pepe: do you see "core loss" behavior if you disable hyperthreading in the BIOS? |
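For reference, programmatically pinning a worker thread looks like the sketch below. It uses the Linux call `os.sched_setaffinity` as a stand-in for Windows' `SetThreadAffinityMask`, and it deliberately leaves open the hard part discussed above: choosing which `core_id` to pin to.

```python
import os
import threading

def pin_and_run(core_id, work):
    """Run `work` on a thread pinned to one logical CPU.
    Linux-only sketch: os.sched_setaffinity(0, ...) affects the
    calling thread; on Windows the analogous call would be
    SetThreadAffinityMask. Picking core_id sensibly (avoiding
    loaded cores and hyperthread siblings) is the open problem."""
    def runner():
        os.sched_setaffinity(0, {core_id})  # pid 0 = calling thread
        work()
    t = threading.Thread(target=runner)
    t.start()
    t.join()
```

A real scheduler-in-yafu would also need per-core load information to avoid doubling up threads on one core, which is exactly the gap identified above.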
[quote=bsquared;226278]
em99010pepe: do you see "core loss" behavior if you disable hyperthreading in the BIOS?[/quote] My machines don't have hyperthreading. |
[QUOTE=em99010pepe;226279]My machines don't have hyperthreading.[/QUOTE]
oh, sorry. nevermind. |