mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfakto: an OpenCL program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=15646)

flashjh 2012-12-26 05:51

[QUOTE=Rodrigo;322654]Another question, about using mfakto on the GPU along with Prime95 on the CPU:

Is there any way to tell these two programs to use specific cores of the i7 3770? The reason is that yesterday I was using two CPU cores to do LLs while a third core was supporting mfakto, and everything was running smoothly. The time/class for mfakto was at 2.xxx seconds and one exponent was taking about 45 minutes to finish.

Then, I discovered a manually reserved LL that I had forgotten about for 174 days, so I decided to add it to Prime95 in a third worker window. But now, the time/class for mfakto is over [B]4.xxx[/B] seconds. (And yet, according to GPU-Z, the GPU load is at 1%. :unsure:) The per-iteration times for the original two Prime95 worker windows have gone up from 0.020 and 0.019 to 0.025 and 0.026, respectively. Evidently, mfakto and Prime95 are stepping on each other.

A further complication: according to Task Manager (Windows 7) there are 8 available threads (or whatever the right designation is for that) in the quad-core system. When I selected CPUs 2 and 4 for Prime95, it was the second and the fourth of these eight that were busy, so I am not sure if Prime95 was actually using the second and fourth [B]cores,[/B] or merely the second halves of the first two cores. (Have I put my question clearly enough?) Now with the third worker operating and mfakto doing its thing, I've ended up with [B]five[/B] of the eight threads running at or near 100%. I would have guessed one for each of the three Prime95 workers, and one for mfakto (3 + 1 = 4, not 5).

FWIW, that third "emergency" LL is set to "Smart Assignment" CPU selection. When I had it selected to CPU 3, the per-iteration times on the other two shot up to 0.035 and 0.040. I ended up doing Smart Assignment, with ThreadsPerTest set at 2.

Bottom line: I would like to learn how to tell Prime95 to use (say) the first three physical cores (only), and mfakto to use the last core (only). And this while using just one thread, not two, per Prime95 worker.

Suggestions are very welcome...

Rodrigo[/QUOTE]
P95 has settings for the CPU cores in the 'Test' menu under the 'Worker Windows...' option. As for mfakto, use the /affinity option of the start command in a cmd window, like this:
[CODE]start /affinity 0x# "mfakto2" mfakto-win-64 -d 0[/CODE]
Replace the # with the hex code for the core; in your case, use 8:
[CODE]CPU3 CPU2 CPU1 CPU0   Bin    Hex
======================================
OFF  OFF  OFF  ON   = 0001 = 1
OFF  OFF  ON   OFF  = 0010 = 2
OFF  OFF  ON   ON   = 0011 = 3
OFF  ON   OFF  OFF  = 0100 = 4
OFF  ON   OFF  ON   = 0101 = 5
OFF  ON   ON   OFF  = 0110 = 6
OFF  ON   ON   ON   = 0111 = 7
ON   OFF  OFF  OFF  = 1000 = 8
ON   OFF  OFF  ON   = 1001 = 9
ON   OFF  ON   OFF  = 1010 = A
ON   OFF  ON   ON   = 1011 = B
ON   ON   OFF  OFF  = 1100 = C
ON   ON   OFF  ON   = 1101 = D
ON   ON   ON   OFF  = 1110 = E
ON   ON   ON   ON   = 1111 = F[/CODE]
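
The table above covers four logical CPUs, while Rodrigo reports that Task Manager shows eight threads on the i7 3770, so a mask for the higher-numbered ones needs a second hex digit. Here is a minimal Python sketch of the arithmetic, assuming logical CPUs are numbered from 0 as in the table. The helper name `affinity_mask` is made up for illustration and is not part of mfakto or Windows.

```python
def affinity_mask(logical_cpus):
    """Return the hex affinity mask with one bit set per 0-based logical CPU."""
    mask = 0
    for cpu in logical_cpus:
        if cpu < 0:
            raise ValueError("logical CPU numbers start at 0")
        mask |= 1 << cpu  # bit n of the mask stands for logical CPU n
    return format(mask, "X")


print(affinity_mask([3]))     # CPU3 only, matching the "8" row of the table
print(affinity_mask([6, 7]))  # logical CPUs 6 and 7 together
```

Whether logical CPUs 6 and 7 really are the two threads of the last physical core depends on how the OS numbers them, so treat that pairing as an assumption to check on your own machine before putting the result in place of the # in the command above.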

Dubslow 2012-12-26 05:56

When you're running something else that's CPU intensive besides Prime95, it's generally best to assign cores manually. You can set the affinity of Prime95 through its GUI, as you know, and the easiest way to set the mfakto affinity is through the Task Manager. Right click on the process, and affinity or something like it should appear as one of the options on the menu. You would have to remember to do this every time you start mfakto. (If you restart it a lot or use multiple instances, it can be easier to use a batch file -- ask others, e.g. kladner, for assistance there.)

On Windows, threads 1 and 2 form one physical core, 3 and 4 do, etc. (On Linux, it's most likely that 1 and 5 are a pair, 2 and 6, etc)

Edit: ninja'd. flash explained how to set affinities from the command line/batch file.
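
The two numbering conventions described above can be sketched in Python as follows. This only restates the pairing rules from the post (adjacent threads on Windows, threads offset by the core count on Linux); the function name `physical_core` and the scheme labels are hypothetical, and neither scheme is guaranteed for any particular system.

```python
def physical_core(thread, n_cores, scheme):
    """Map a 1-based logical thread number to a 1-based physical core number.

    scheme "adjacent": threads 1 and 2 share core 1, 3 and 4 share core 2, ...
    scheme "offset":   threads 1 and 1 + n_cores share core 1, ...
    """
    if not 1 <= thread <= 2 * n_cores:
        raise ValueError("thread number out of range")
    if scheme == "adjacent":
        return (thread - 1) // 2 + 1
    if scheme == "offset":
        return (thread - 1) % n_cores + 1
    raise ValueError("unknown scheme")


# Print thread number, core under "adjacent", core under "offset"
# for a quad-core CPU with two threads per core.
for t in range(1, 9):
    print(t, physical_core(t, 4, "adjacent"), physical_core(t, 4, "offset"))
```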

Rodrigo 2012-12-26 18:37

Before trying the affinity settings for mfakto, I'm looking to set the best settings for Prime95.

Unfortunately, it looks like Prime95 isn't discovering the threads and cores correctly. When I set the three workers to operate off what Prime95 calls "CPUs" 1, 2, and 3, this is what I got after restarting (with ThreadsPerTest set to 2):

[QUOTE]
Worker #1
Setting affinity to run worker on logical CPU #1
Setting affinity to run helper thread 1 on [B]logical CPU #2[/B]

Worker #2
Setting affinity to run worker on [B]logical CPU #2[/B]
Setting affinity to run helper thread 1 on [U]logical CPU #3[/U]

Worker #3
Setting affinity to run worker on [U]logical CPU #3[/U]
Setting affinity to run helper thread 1 on logical CPU #4
[/QUOTE]
So the upshot is that Worker #2 is overlapping with Worker #1, and Worker #3 is overlapping with Worker #2. Two of these threads are doing double duty. Meanwhile, according to Task Manager, the fifth through eighth threads (two full cores' worth) are sitting idle. For whatever reason, Prime95 seems to view (for example) the second thread of the second core as the "fourth" CPU, and so the workers get assigned that way, incorrectly. It doesn't offer to do work on any "CPUs" 5 to 8.
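
The overlap described above can be checked mechanically. Below is a small Python sketch that takes the logical CPU numbers from the quoted log and reports which ones are claimed by more than one worker and which are left idle. It assumes the log numbers logical CPUs from 1 and that there are eight of them; the function names are invented for this example.

```python
from collections import defaultdict

# Logical CPU numbers copied from the Prime95 log quoted above
# (worker thread first, then its helper thread).
assignments = {
    "Worker #1": [1, 2],
    "Worker #2": [2, 3],
    "Worker #3": [3, 4],
}


def shared_cpus(assignments):
    """Return {logical CPU: [workers]} for CPUs used by more than one worker."""
    users = defaultdict(list)
    for worker, cpus in assignments.items():
        for cpu in cpus:
            users[cpu].append(worker)
    return {cpu: ws for cpu, ws in users.items() if len(ws) > 1}


def idle_cpus(assignments, n_logical=8):
    """Return the 1-based logical CPUs that no worker was assigned to."""
    used = {cpu for cpus in assignments.values() for cpu in cpus}
    return [cpu for cpu in range(1, n_logical + 1) if cpu not in used]


print(shared_cpus(assignments))  # logical CPUs 2 and 3 are double-booked
print(idle_cpus(assignments))    # logical CPUs 5 through 8 are unused
```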

I had tried doing this with ThreadsPerTest set to 1, but the LL per-iteration times were worse.

FWIW, this is Prime95 version 27.7, build 2.

Next I can try setting the Prime95 affinity in Task Manager as Dubslow suggested, but if Prime95 isn't finding the correct CPUs in the first place, I'm not sure what good it'll do. If I set its affinity to "CPUs" 0-5, I have no confidence that it will find and actually use 4 and 5, since they're not being used now.

This all might sound OT because we're in the mfakto thread, but again the idea is that I'm trying to get mfakto and Prime95 to each keep to their assigned cores...

Rodrigo

Dubslow 2012-12-26 19:53

You shouldn't assign one worker to CPU 1 and the next to CPU 2, since you are then assigning two workers to one physical core (1 & 2 are one core, 3 & 4 are one core, etc.). Instead, assign worker 1 to CPU 1, worker 2 to CPU 3, worker 3 to CPU 5, and worker 4 to CPU 7. That way each worker is on a separate physical core. (When you then add mfakto, your Prime95 timings will get worse, because mfakto will have to share one of those cores.) How many instances of mfakto do you run?

Rodrigo 2012-12-26 20:44

Well, that's exactly the problem: in Prime95 there seems to be no way to assign worker 3 to CPU 5, since the choices (in Test --> Worker Windows) end at CPU 4.

Right now I'm not running any instances of mfakto, while this issue gets sorted out. Ideally, I'd like to run Prime95 on three cores (doesn't matter which three), and then use the last core to support mfakto.

BTW, since the previous post I set all three current workers to Smart Assignment, and the per-iteration times dropped precipitously to 0.019-0.022 seconds in all three cases. Now Task Manager shows that CPUs 0, 2, and 4 (that is to say, the first, third, and fifth threads) are taking on the bulk of the work, although the remaining five do have substantial loads as well (three of them as helper threads). But I have yet to find a way to make this assignment manually, so that when mfakto gets thrown into the mix it doesn't interfere with Prime95.

Rodrigo

Dubslow 2012-12-26 20:51

[QUOTE=Rodrigo;322715]Well, that's exactly the problem: in Prime95 there seems to be no way to assign worker 3 to CPU 5, since the choices (in Test --> Worker Windows) end at CPU 4.[/QUOTE]

That is incredibly bizarre.

firejuggler 2012-12-26 20:53

If I remember correctly, there is an option, something like coretouse or scrambleCPU, where you specify which CPU to use, right?

Rodrigo 2012-12-26 20:56

How to report a factor?
 
An unrelated question:

This morning I manually submitted a bundle of 12 TF results completed by mfakto. One of them included a factor found, and that one doesn't seem to be getting through to the PrimeNet server.

When I submitted the results in a bunch, the server seemed to hang and eventually I got dumped to a mostly blank PrimeNet page that only had the sign-in blanks on the left, and nothing at all in the right panel. The same thing happened when I cut-and-pasted the two-line result:

[QUOTE]
M77xxxxxx has a factor: 19xxxxxxxxxxxxxxxxxxxx [TF:70:71*:mfakto 0.12-Win barrett15_75]
found 1 factor for M77xxxxxx from 2^70 to 2^71 (partially tested) [mfakto 0.12-Win barrett15_75_2]
[/QUOTE]
Next I tried submitting just the first of these two lines, and still PrimeNet doesn't seem to be aware of the report as it's not showing up in my Results Details page. (The other 11 are already there.)

What to do?

Rodrigo

Dubslow 2012-12-26 21:00

[QUOTE=Rodrigo;322719]An unrelated question:

This morning I manually submitted a bundle of 12 TF results completed by mfakto. One of them included a factor found, and that one doesn't seem to be getting through to the PrimeNet server.

When I submitted the results in a bunch, the server seemed to hang and eventually I got dumped to a mostly blank PrimeNet page that only had the sign-in blanks on the left, and nothing at all in the right panel. The same thing happened when I cut-and-pasted the two-line result:


Next I tried submitting just the first of these two lines, and still PrimeNet doesn't seem to be aware of the report as it's not showing up in my Results Details page. (The other 11 are already there.)

What to do?

Rodrigo[/QUOTE]

PrimeNet's [URL="http://www.mersenneforum.org/showthread.php?t=17607"]having an "episode"[/URL] right now.

chalsall 2012-12-26 21:02

[QUOTE=Rodrigo;322719]What to do?[/QUOTE]

This is a known bug with the PrimeNet server: memory starvation affecting the tool used to verify the factor.

Wait for George or Scott to reboot the server, or keep trying. Preferably not at the top of the hour -- wait until at least 15 minutes past.

Rodrigo 2012-12-26 21:09

[QUOTE=firejuggler;322718]If I remember correctly, there is an option, something like coretouse or scrambleCPU, where you specify which CPU to use, right?[/QUOTE]
Ah, your clue led me to this in UNDOC.TXT:

[QUOTE]
The program makes its best guess at how the OS maps hyperthreaded logical CPU
numbers to physical CPUs. It also assigns workers and helper threads
to CPUs for optimal speed. However, bugs, new architectures, or situations we
haven't considered may make different affinity settings desirable. In
local.txt set
AffinityScramble2=string
Where the characters in "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz()"
represent 64 logical CPU numbers. For example, let's say you have a system
with 8 logical cores with 4 workers each using a helper thread. Also, assume
your system has logical CPUs 0 & 4 on the same physical CPU core, 1 & 5, etc.
If the program is properly determining which logical CPUs share the same physical
CPU, then the program internally generates an affinity scramble string of "04152637".
The program's default policy is to assign the worker and helper threads to the same
physical CPU. If the program is not properly determining which logical CPUs share the
same physical CPU, or you think a different affinity policy would result in better
performance, then set AffinityScramble2 accordingly. Let's say you think
running the helper threads on a different physical core would be better, then
you might set AffinityScramble2=02134657 to test out your theory.
[/QUOTE]
So if Dubslow is right, then I should add an AffinityScramble2 line to LOCAL.TXT (does it matter where?), and set it to 01234567. What do you think?

Rodrigo
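
To make the AffinityScramble2 string less abstract, here is a Python sketch that builds and decodes such strings using the 64-character alphabet from the UNDOC.TXT excerpt. The reading that the string simply lists logical CPUs core by core, with the threads of one physical core next to each other, is inferred from the "04152637" example in the quote and is not confirmed anywhere in this thread. The function names are hypothetical and are not part of Prime95.

```python
# The 64 characters UNDOC.TXT lists for logical CPU numbers 0..63.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz()"


def decode_scramble(s):
    """Turn a scramble string into the list of logical CPU numbers it names."""
    return [ALPHABET.index(ch) for ch in s]


def build_scramble(core_siblings):
    """Build a scramble string from tuples of logical CPUs sharing a core."""
    return "".join(ALPHABET[cpu] for core in core_siblings for cpu in core)


# Pairing from the UNDOC.TXT example: logical CPUs 0 & 4 share a core, etc.
offset_pairs = [(0, 4), (1, 5), (2, 6), (3, 7)]
# Pairing Dubslow describes for Windows, renumbered from 0 (an assumption).
adjacent_pairs = [(0, 1), (2, 3), (4, 5), (6, 7)]

print(build_scramble(offset_pairs))    # 04152637
print(build_scramble(adjacent_pairs))  # 01234567
print(decode_scramble("02134657"))     # [0, 2, 1, 3, 4, 6, 5, 7]
```

Under the adjacent pairing the sketch produces 01234567, which is the value Rodrigo proposes; whether Prime95 then places workers as hoped would still need to be confirmed from its startup log.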


All times are UTC.