Little help
I have a Ryzen 3900X with HT off, so I have 12 cores. It has two CCDs, every CCD has two CCXs, and every CCX has 3 cores. I need affinity settings that assign:
- first worker: two cores
- second worker: one core
(first CCX full)
- third worker: two cores
- fourth worker: one core
(second CCX full)
and so on up to the end. So would the correct settings be (under Linux, mprime):
[Worker #1]
Affinity=0,1
[Worker #2]
Affinity=2
etc.? |
Setting affinity should not be necessary
|
[QUOTE=Prime95;553822]Setting affinity should not be necessary[/QUOTE]
Yes, but please answer. I ask because this CPU has 12 cores. Can I use this scheme (under Linux)?
AffinityScramble2=0123456789ABCDEFGHIJKLMNOPQRSTUV
[Worker #1]
Affinity=0,1
[Worker #2]
Affinity=2
[Worker #3]
Affinity=3,4
[Worker #4]
Affinity=5
[Worker #5]
Affinity=6,7
[Worker #6]
Affinity=8
[Worker #7]
Affinity=9,A
[Worker #8]
Affinity=B
Does this look correct? |
[QUOTE=pepi37;553921]Yes, but please answer.
I ask because this CPU has 12 cores. Can I use this scheme (under Linux)?
AffinityScramble2=0123456789ABCDEFGHIJKLMNOPQRSTUV
[Worker #1]
Affinity=0,1
[Worker #2]
Affinity=2
[Worker #3]
Affinity=3,4
[Worker #4]
Affinity=5
[Worker #5]
Affinity=6,7
[Worker #6]
Affinity=8
[Worker #7]
Affinity=9,A
[Worker #8]
Affinity=B
Does this look correct?[/QUOTE]
You might actually get better performance if you cut this back to four workers:
[Worker #1]
Affinity=0,(1,2)
[Worker #2]
Affinity=3,(4,5)
[Worker #3]
Affinity=6,(7,8)
[Worker #4]
Affinity=9,(A,B)
4 workers with 2 helpers each. :smile: |
[QUOTE=pepi37;553921]Yes, but please answer.
I ask because this CPU has 12 cores. Can I use this scheme (under Linux)?
AffinityScramble2=0123456789ABCDEFGHIJKLMNOPQRSTUV
[Worker #1]
Affinity=0,1
[Worker #2]
Affinity=2
[Worker #3]
Affinity=3,4
[Worker #4]
Affinity=5
[Worker #5]
Affinity=6,7
[Worker #6]
Affinity=8
[Worker #7]
Affinity=9,A
[Worker #8]
Affinity=B
Does this look correct?[/QUOTE]
AffinityScramble is deprecated. Your settings are OK, except that "9,A" should be "9,10" and "B" should be "11". |
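The scheme discussed above (per 3-core CCX, one worker on two cores and the next worker on one core, with decimal core numbers as George points out) can be generated mechanically. Below is a minimal Python sketch. It assumes logical CPUs are numbered consecutively within each CCX, which is an assumption and not something guaranteed by the hardware or the OS, so check the topology on the actual machine first. `affinity_lines` is a hypothetical helper and not part of mprime.

```python
def affinity_lines(num_ccx=4, cores_per_ccx=3, split=(2, 1)):
    """Return local.txt-style lines giving each worker a slice of one CCX.

    Assumes cores are numbered consecutively within each CCX, which is
    not guaranteed; verify the topology on the actual machine.
    """
    if sum(split) != cores_per_ccx:
        raise ValueError("split must use every core of a CCX exactly once")
    lines = []
    worker = 1
    for ccx in range(num_ccx):
        core = ccx * cores_per_ccx  # first core of this CCX
        for width in split:
            cores = range(core, core + width)
            lines.append(f"[Worker #{worker}]")
            lines.append("Affinity=" + ",".join(str(c) for c in cores))
            worker += 1
            core += width
    return lines


if __name__ == "__main__":
    print("\n".join(affinity_lines()))
```

With the defaults this produces eight workers, the last two being `Affinity=9,10` and `Affinity=11`, matching the corrected settings in the post above.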
[QUOTE=storm5510;553945]You might actually get better performance if you cut this back to four workers:
[Worker #1]
Affinity=0,(1,2)
[Worker #2]
Affinity=3,(4,5)
[Worker #3]
Affinity=6,(7,8)
[Worker #4]
Affinity=9,(A,B)
4 workers with 2 helpers each. :smile:[/QUOTE]
Since there is no rush doing the CRUS sequence at home, I am concentrating on best output, and 4 workers with 3 cores each does not give that in this case. But thanks for the advice. |
[QUOTE=pepi37;554180]Since there is no rush doing the CRUS sequence at home, I am concentrating on best output, and 4 workers with 3 cores each does not give that in this case. But thanks for the advice.[/QUOTE]
You are most welcome. George is correct. You should not have to use the "Affinity" settings. Closer to the top of [I]local.txt[/I], you [U]might[/U] have something like this:
[QUOTE]WorkerThreads=x
CoresPerTest=x[/QUOTE]
where [I]x[/I] is a number.
I started using [I]Prime95[/I], the Windows equivalent of [I]mprime[/I], in 2005. To date, [I]Prime95[/I] extremely rarely uses more than 50% of a CPU's capability. I have tried to force it in the past without success. You could set [I]WorkerThreads=12[/I] and [I]CoresPerTest=1[/I], but I doubt it would use this much. Something has to be left for the operating system and other background processes.
What I have is an i7: four physical cores and four logical cores. When I put this together, a long-time member here suggested I use one worker thread and four cores per test, the worker being one of the four. This is 50% of the CPU's capacity. I have no "Affinity" settings.
You can experiment with those two settings until you find what you feel works best. If those two settings are not there, you can add them. They are case-sensitive and must be written as I have them above. |
[QUOTE=storm5510;554194]To date, [I]Prime95[/I] extremely rarely uses more than 50% of a CPU's capability.[/QUOTE]
You are very mistaken. Prime95 uses more of a CPU's capability than nearly any other software, period. It is so efficient at using available capacity that using hyperthreads makes the software run slower. Even so, your belief that it's only using 50% of the CPU capability reflects a severe misunderstanding of what logical processors are, a misunderstanding you ought to remedy. To wit: if you have a 4-lane bridge with 8 lanes of highway that merge into the bridge, you claim the bridge is only half used when I close 4 highway lanes and leave 4 lanes open to flow smoothly onto the bridge. Is the bridge half-used? |
3 Attachment(s)
Prime95, mprime, and other GIMPS primality test codes are typically memory bandwidth limited, so say their authors. George has given examples of using MORE instructions to make FEWER memory transfers. TF is less demanding of memory bandwidth. I've found I can run Ernst's Mfactor program on most HT cores alongside, for exponents too large to factor with mfaktx, with modest impact on prime95's primality test throughput, ~15%. Hyper-threading ([URL]https://en.wikipedia.org/wiki/Hyper-threading[/URL]) does not increase memory bandwidth; only certain parts of a core are duplicated. Intel indicated a 15-30% performance increase from HT, not 100%.
I've burned out two motherboards, with an i7-4790 running prime95 and its IGP running mfakto at full tilt at the same time. That seems like rather full utilization to me: enough to take the chip beyond the power rating the board was designed for and would presumably tolerate. The Windows Task Manager display of core utilization, which for n cores with HT shows 2n utilization graphs, can be misleading. Compare the attached Task Manager performance pane screen captures of a 4-core HT i7-4790 and a dual 6-core Xeon X5650 system (no HT), each running prime95 at its optimally benchmarked configuration. On the i7-4790, with 1 worker on 4 cores and no HT use by prime95 primality testing, Task Manager indicates 50% CPU utilization for prime95 and 13% for gpuowl, 63% total. On the X5650s, with 2 workers of 6 cores each (one per chip package), the prime95 process indicates 98-99% CPU utilization. Also, on a Core 2 Duo (no HT), the prime95 process indicates 98% CPU utilization. Prime95 automatically handles the helper thread core affinity for us. |
[QUOTE=storm5510;554194]Something has to be left for the operating system and other background processes.[/QUOTE]Prime95 runs at low priority and is preempted by the OS or assorted user processes. On a HT system, the virtual cores can come into play and lessen the impact. A prime95 worker is preempted as needed. This happens with gpuowl's GCD on a cpu core for example; one prime95 worker yields to it.
|
1 Attachment(s)
With everything above read, explain this (image attached). This is all I can get using multiple configurations. I only use HT on [U]recommended[/U] processes. On others, it seems to have no effect. This is on a P-1, if that makes any difference. So, what is it that I have done wrong all this time?
|
[QUOTE=storm5510;554260] So, what is it that I have done wrong all this time.[/QUOTE]
That looks perfect! It shows each physical core has a prime95 worker on it. |
The way I look at Windows CPU usage reporting is that it shows the thread usage time, which does not have a linear relationship to core usage or throughput when HT is involved.
Consider a 4-core CPU with no HT. If you run that at 100%, all cores are used. Take the same CPU (same clocks) and turn on HT. If you run 4 cores (using 4 threads), you get the same work done, but now Windows reports 50% usage. Now run 8 threads and you see 100% usage. How much more real work is being done at 100% indicated compared to 50% indicated? It depends a lot on the software. Take Cinebench R15 or R20 on modern processors as an example of something that scales relatively well. Running that with 8 threads compared to 4 would give you around 30% more throughput (score). And this is one of the better cases. For Prime95, the gain is, for practical purposes, close to 0%, because it is able to extract the most performance out of the cores without using more than one thread per core. Seeing around 50% usage in Windows already means you are using 100% of your cores. |
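The arithmetic behind the 50% reading described above can be sketched in a few lines of Python. This is a deliberately simplified model, assuming the monitor just divides busy threads by the number of logical processors it sees; real schedulers and monitors are more involved. `reported_utilization` is a hypothetical name, not a Windows API.

```python
def reported_utilization(busy_threads, physical_cores, ht_enabled):
    """Fraction of CPU a Task-Manager-style monitor would report.

    Simplified model: the monitor divides busy threads by the number of
    logical processors it sees (2 per core when HT is on).
    """
    logical = physical_cores * (2 if ht_enabled else 1)
    return min(busy_threads, logical) / logical


if __name__ == "__main__":
    # Same 4 busy threads on the same 4 cores, with HT off and then on.
    print(reported_utilization(4, 4, ht_enabled=False))
    print(reported_utilization(4, 4, ht_enabled=True))
```

In this model the same four busy threads read as full usage with HT off and half usage with HT on, although the four physical cores are equally busy in both cases.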
The most relevant benchmark is time. Take one candidate and run it in the 1*4, 2*2, and 4*1 configurations, and record the time. Then turn HT on and do the same, but now use the 1*8, 2*4, 4*2, and 8*1 configurations, and record the time. Comparing the times is the most accurate benchmark and gives the most useful data.
|
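The benchmark plan in the post above (every workers*threads split of 4 threads, then of 8 with HT on) can be enumerated and compared with a short Python sketch. Both helper names are hypothetical. The comparison assumes each timing is the wall-clock time for one candidate on one worker while all workers of that configuration are running, so throughput is taken as workers divided by time; that is an assumption about how the timings were recorded.

```python
def configurations(total_threads):
    """List (workers, threads_per_worker) pairs whose product is total_threads."""
    return [(w, total_threads // w)
            for w in range(1, total_threads + 1)
            if total_threads % w == 0]


def fastest(timings):
    """Pick the configuration with the highest throughput.

    timings maps (workers, threads_per_worker) to the measured time for
    one candidate on one worker; throughput is workers / time.
    """
    return max(timings, key=lambda cfg: cfg[0] / timings[cfg])


if __name__ == "__main__":
    print(configurations(4))  # HT off on a 4-core CPU
    print(configurations(8))  # HT on, 8 logical processors
```

Fill the `timings` dictionary with your own measurements; no timing values are suggested here.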
[QUOTE=storm5510;554260]what is it that I have done wrong all this time.[/QUOTE]Take Task Manager's display too seriously; expect 4 cores to do the work of 8.
Even Intel, with an ideal process mix, expects no more than ~1.1 to 1.3x, or the equivalent of 4.4 to 5.2 cores on your 4-core hardware, and that's the manufacturer's PR. What you show is 4 fully utilized cores, and that's all you've got in the hardware, except for the HT duplication of registers etc., which helps with incidental other workloads while prime95 runs essentially full tilt on the 4 full cores actually present in hardware. |
[QUOTE=kriesel;554603]Take Task Manager's display too seriously; expect 4 cores to do the work of 8.
Even Intel with an ideal process mix expects no more than ~1.1 to 1.3 x, or 4.4 to 5.2 cores equivalent performance on your 4-core hardware, and that's the manufacturer's PR. What you show is 4 fully utilized cores and that's all you've got in the hardware, except for the HT duplication of registers etc. Which help with incidental other workloads while prime95 runs essentially full tilt on the 4 full cores actually present in hardware.[/QUOTE] I understand what you are saying. The utilization percentage is misleading. I allow [I]Prime95[/I] to pick what it wants to use. Sometimes, one or more will appear as logicals. The associated physicals are not used. In total, never more than 4. 1 from each physical/logical pair. |
[QUOTE=storm5510;554260]With everything above read, explain this (image attached). [/QUOTE]
That image shows a CPU which is used ONE HUNDRED percent. Not 50%. :rofl: |
[QUOTE=LaurV;554873]That image shows a CPU which is used ONE HUNDRED percent. Not 50%. :rofl:[/QUOTE]
With a little guidance, I figured this out. There are four pairs, not eight individuals. A logical may look unused, but its paired physical is at max, or close to it. The reverse is also true. No pair can exceed 100%. I have also seen them mixed with other applications. A physical may be at 60% and its logical at 40%. Again, no more than 100%. :smile: |
[QUOTE=storm5510;554929]No pair can exceed 100%. I have also seen them mixed with other applications. A physical may be at 60% and its logical at 40%. Again, no more than 100%. :smile:[/QUOTE]
That's not true either. Each "logical" core can go to 100% independent of the other, and a pair together can reach 200%, hihi. I can easily make my computer (and any computer) show 100% CPU usage just by launching a few copies of the "usual" programs (Word or Excel running some macros, uVision running a compilation, or any multi-threaded video editor or CAD program I use daily, or just opening 100 threads of "isprime()" in pari/gp with some large number like 50 digits or so). In that case, all bars in Task Manager/Performance, either logical or "illogical" cores (in my computer, I believe all cores are illogical, because they run very slow, and you can not logically reason with them! :razz:) will be glued to the ceiling, and never drop a pixel. Windoze [URL="https://en.wikipedia.org/wiki/List_of_military_slang_terms#FUBAR"]fubars[/URL] that task manager up completely, in the sense that, if HT is enabled in BIOS, it believes your CPU has 2n cores (where n is the number of physical cores). Therefore, if you run n threads of a task (like P95) without using hyperthreads, on a system with n physical cores, it will always show a 50% occupancy, because it sees only n cores running out of the 2n it assumes you have. But this doesn't mean you can not occupy it "100%" (please see the quotes!) if you want.
To continue the analogy made by another user before: you have a highway with 8 lanes, but there is a bridge on it with only 4 lanes. Imagine a police checkpoint on the bridge, but not all cars are checked, only the yellow ones. Most programs run cars of different colors: red, yellow, pink, green, orange, purple and [URL="https://www.youtube.com/watch?v=_6lFvE-21JE"]blue[/URL]... So they can still keep the highway busy while not feeling much of the bottleneck on the bridge.
When you run those programs on your computer, each of them pushes cars over that bridge from time to time, according to its priority. When cars from different lanes try to "merge" into a single lane on the bridge at the same time, the task with higher priority pushes its car first. But there are always gaps between the cars on the lanes, and gaps between the cars on the bridge too, because the programs do other things besides computing: they access peripherals, wait for you to type a key or move the mouse, etc. These things don't push cars onto the bridge (i.e. they don't need CPU resources). That is why the 4 lanes on the bridge could, in theory, be shared by 8 (or more) lanes (tasks) on the highway without the traffic slowing down. This is how HT was born: the CPU makers wanted to use those "dead time" spaces between cars and give more work to the police checkpoint on the bridge, so they allow cars from more lanes to merge onto the bridge and split back, each onto its own lane, afterwards.
What Windoze does (well, not exactly, but to keep the analogy): it has no idea about the bridge, and it only measures the traffic on each of the 8 lanes. With all colors of cars, the bridge won't slow the traffic, and Windoze will believe that the highway is busy enough. Now, P95 only runs yellow cars. They run at low priority, to make sure they don't slow the traffic, so any time they meet another car at the merging point, they make space for that other car; on the other hand, they immediately fill all the spaces between other cars when there is no merging conflict. This way, all 4 lanes on the bridge go at max speed and the police checkpoint works full time, yet it appears to Windoze that the traffic on the highway is only 50% of what the highway can support, in spite of the fact that the police on the bridge work at full capacity.
On the other hand, running two copies of P95, or one copy with HT enabled, is like pushing yellow cars onto all 8 lanes: the poor policemen are 100% busy anyhow with 4 lanes, and you won't get faster output. On the contrary, things will slow down as the drivers of the yellow cars start yelling at each other and competing for an earlier place in line at the checkpoint. Some will even get out of their cars and start fighting, like in the Russian "road rage" videos on YouTube.
This doesn't mean you can't cheat it, in the sense that you can push only green cars across the bridge, without leaving any spaces between them, so the police can safely sleep and do no work, but the highway is full, and no yellow cars (lower priority) will ever pass the bridge. In that case you will see 100% CPU occupancy, yet the CPU doesn't really do any work, and P95 just waits. This is quite easy to do, and people have used this trick to write "messages" on Task Manager's screen, or even play Doom (search YouTube for "Max Holt 896"). |
[QUOTE=storm5510;554194]What I have is an i7. Four physical cores and four logical cores.[/QUOTE]Not really. You have four computing cores, fed by eight logical cores; two logical cores per one computing core.
Everything is physical; all eight logical core instruction feeders are physical, all four computing cores are physical. Logical is not the opposite of physical. [/rant] :smile: |
Okay, I am going to let this go and take everyone's word for what they have written. The last time I studied CPU architecture was in the Computer Dark Ages, the late 1980's. It is working properly and that is all I really need to know.
I thank you all for your time and efforts. :smile: |
in my prime.txt
[QUOTE]SilentVictoryPRP=1
OutputPrimes=1
OutputComposites=1[/QUOTE]
OK, there is no sound when a PRP is found, but there is also no print of the PRP in the main thread? Can the PRP somehow be printed in the main thread, so you don't need to search the log file? |
When I scroll prime95 output it scrolls too much
When I scroll the prime95 output, it scrolls too much.
I have set the scrolling setting to scroll 3 lines per click. |
If more than 31 entries are present in worktodo, only 31 entries are displayed in the status window.
1 Attachment(s)
[QUOTE=kotenok2000;566029]When I scroll the prime95 output, it scrolls too much.
I have set the scrolling setting to scroll 3 lines per click.[/QUOTE] If more than 31 entries are present in worktodo, only 31 entries are displayed in the status window. |
[QUOTE=kotenok2000;566039]If more than 31 entry is present in worktodo, only 31 entry is displayed in status window.[/QUOTE]
I think that's intentional and meant for practical use. At the bottom there is "More...", meaning there is more, but the program didn't display it, because it was told to not make the window too big. The average user doesn't need to know the finish of task 32 or more. If you need that, you can always rearrange the tasks in worktodo.txt. |
[QUOTE=kotenok2000;566039]If more than 31 entry is present in worktodo, only 31 entry is displayed in status window.[/QUOTE]There can be far fewer than 31 entries if you have more than one worker configured, because the window is constrained by a total number of lines.
If you're interested in estimated completion dates of large numbers of assignments, you can see them on your [url=https://www.mersenne.org/workload/]Workload[/url] page. Any bug reports or feature requests should probably be checked against the current version of Prime95 and, if still relevant, posted in the [url=https://www.mersenneforum.org/showthread.php?t=25823]v30.3 thread[/url]. This thread should probably be un-stickied since it's no longer about the current version. |
[QUOTE=James Heinrich;566046]
Any bugreport/feature-requests should probably be checked against the current version of Prime95 and if still relevant posted in the [url=https://www.mersenneforum.org/showthread.php?t=25823]v30.3 thread[/url], this thread should probably be un-stickied since it's no longer the current version.[/QUOTE] The first part was done, the second part not. The version used is obviously 30.x, because of proof lines in the communication sub-window of the Prime95. |
This More... behavior is an issue, overused, since it shows up in place of the sole concealed line per worker. And yes, in V30.3b6. Why not display the line that will fit, instead of the uninformative More... line?
|
I remember a discussion long ago where George said that the limit is not set by the window space (which can easily be made scrollable) but by the computing time. If you had a billion assignments (like P-1 or TF, which we were doing with P95 at the time), the program wasted a lot of time computing the ETAs for all the queued work, and it looked like it was frozen when the window appeared. Therefore, we (the participants in that talk at the time) somehow concluded that most people only have one or two assignments, and the space/time should be limited to that. George changed it accordingly. If you take versions of P95 from 10 or 15 years ago, you will understand what I mean by "freezing" when you open that window.
Processors at the time had two cores, and only "rich people", or those quite dedicated, had 4 cores, :razz:, and LL-ing a 35M exponent still took 30 to 60 days or so... Therefore most people only had 1-2 assignments to look at, and the TF came "automatically": if the exponent you got was not TFed enough, P95 would do the TF for you. Meantime, processors with a million cores appeared... Even so, we are mostly interested in the work queued in the near future. But if you have so many assignments, you could still just go to worktodo.txt and look inside... |
A way to circumvent this "problem" is to run different instances of Prime95... The status window will contain at least your first assignments, opening it will not stall the computer too much...
Jacob |
[QUOTE=LaurV;566130]
Mostly, we are interested in the work queued in the near future. But if you have so many assignments, you still could just go to worktodo.txt and look inside...[/QUOTE]The second worktodo line per worker is the near future, possibly days or hours away. But prime95 will hide its projected completion or identity, even if there are only two entries per worker, if there are enough workers, substituting More... lines for half the worktodo lines. Its computation of when More... is necessary or useful seems to be off by one. On a Xeon Phi, configured as prime95 does by default with 4 cores per worker, there will be 16, 17, or 18 workers depending on Phi model 7210, 7250, or 7290. The status output will look something like
[CODE][Worker thread #1]
M101xxxxxx, PRP, Mon Dec 14 07:45 2020
More...
[Worker thread #2]
M101xxxxxx, PRP, Mon Dec 14 22:11 2020
More...
[Worker thread #3]
M56xxxxxx, Double Check, Mon Dec 17 12:34 2020
More...
(etc through worker thread 16, 17 or 18, hiding half the assignments behind "More...")
[/CODE]Phis are no longer expensive systems; a used 7210 system can be had for US$499 plus shipping and sales tax on eBay.
The delays I've seen on status output relate to P-1 assignments. Prime95 could cache the run time, or at least the bounds selection, and speed that up a lot. It apparently redoes the B1 and B2 optimization computations for each P-1 assignment every time status is displayed for the work list, whether or not the list or the memory available has changed. |
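The caching idea suggested in the post above can be illustrated with plain memoization keyed on the inputs that affect the result. This is a generic Python sketch and not Prime95's code: `select_bounds`, `status_lines`, and the placeholder arithmetic are all hypothetical, and the returned numbers are not real B1/B2 choices.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def select_bounds(exponent, memory_mb):
    """Stand-in for an expensive bounds optimization (hypothetical).

    The returned numbers are placeholders, not real B1/B2 choices.
    Results are cached per (exponent, memory_mb) pair.
    """
    b1 = exponent // 100  # placeholder arithmetic only
    b2 = b1 * 30          # placeholder arithmetic only
    return b1, b2


def status_lines(exponents, memory_mb):
    """Build status text; repeated calls reuse the cached bounds."""
    lines = []
    for p in exponents:
        b1, b2 = select_bounds(p, memory_mb)
        lines.append(f"M{p}: B1={b1}, B2={b2}")
    return lines
```

Because the cache key includes the available memory, a change in either the work list or the memory setting triggers a fresh computation, while an unchanged status display costs only dictionary lookups. `select_bounds.cache_info()` reports the hit and miss counts.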