mersenneforum.org

Need some help please (https://www.mersenneforum.org/showthread.php?t=21646)

TrellisCross 2016-10-11 05:01

Need some help please
 
1 Attachment(s)
I need some help configuring the workers on a new machine build: a water-cooled dual Xeon E5-2690 v4 system (SuperMicro X10DAX motherboard) with 32 GB of 2400 MHz RAM for each CPU. I am running Windows 10 Pro, and hyperthreading is turned off in the BIOS. I am willing to pay someone to get Prime95 properly configured to take advantage of both CPUs optimally for exponents in the 70-120M range.

I did a standard install of the latest Prime95 application and only CPU #1 (0) was recognized. I accepted the default suggestions and 4 workers were created. Via the Core Temp 1.3 application, I could see that all 14 cores were being utilized at 90-100% (no other apps were running). With all 4 workers doing an LL double-check of 42.8M exponents, the iteration times were about 15.5 ms/iter for each of the workers.

I installed a second instance of Prime95 and the second CPU (1) was detected and another 4 workers were set up. Again all 14 cores were being utilized. For the same LL double-checks of 42.8M exponents, the initial times were about 6.5 ms/iter for each of the 4 workers, much better than CPU (0). However, after a restart the performance dropped off to match the ~15.5 ms/iter of the CPU (0) workers. Having only 4 workers performing at 15.5 ms/iter for a 42.8M exponent with 14 cores available does not seem right. The Prime95 performance of one of these E5-2690 Xeons with 14 cores running at up to 3200 MHz in turbo mode seems to be no better than another machine with a Core i7 4820 with 4 cores running at 3700 MHz.

I ran a benchmark on the instance running the workers on CPU (1), but I cannot find any reference document that explains how to interpret the figures, which is very frustrating. Attached is the testing.txt file with the benchmarking data. Also, the explanation of how to reset affinity in the undoc.txt file is extremely vague.

I did see the section in undoc.txt that suggests a core assignment approach, but I am reluctant to mess around much without understanding the benchmarking results:

"The program automatically computes the number of CPUs, hyperthreading, and speed.
This information is used to calculate how much work to get.
If the program did not correctly figure out your CPU information,
you can override the info in local.txt:
NumCPUs=n
CpuNumHyperthreads=1 or 2
CpuSpeed=s
Where n is the number of physical CPUs or cores, not logical CPUs created by
hyperthreading. Choose 1 for non-hyperthreaded and 2 for hyperthreaded. Finally,
s is the speed in MHz."
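For anyone who would rather script these overrides than edit the file by hand, here is a minimal Python sketch. The function name set_overrides is made up for illustration. It also assumes that, if local.txt contains [section] headers, global settings such as NumCPUs belong above the first one; check that against your own file before relying on it.

```python
from pathlib import Path


def set_overrides(path, overrides):
    """Set key=value lines in a settings file, keeping all other lines.

    Hypothetical helper: existing keys in the global part of the file are
    updated in place; missing keys are inserted before the first [section]
    header (or at the end if there is none).
    """
    path = Path(path)
    lines = path.read_text().splitlines() if path.exists() else []
    remaining = {k: str(v) for k, v in overrides.items()}
    out = []
    first_section = None
    for line in lines:
        if first_section is None and line.lstrip().startswith("["):
            first_section = len(out)
        key = line.split("=", 1)[0].strip()
        if first_section is None and "=" in line and key in remaining:
            # Replace the existing global setting with the new value.
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    # Insert any settings that were not already present.
    insert_at = len(out) if first_section is None else first_section
    out[insert_at:insert_at] = [f"{k}={v}" for k, v in remaining.items()]
    path.write_text("\n".join(out) + "\n")
    return out
```

A call such as set_overrides("local.txt", {"NumCPUs": 14, "CpuNumHyperthreads": 1, "CpuSpeed": 2900}) would write the three settings quoted above, using the values for a 14-core CPU with hyperthreading off.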

TrellisCross 2016-10-11 05:49

Update
 
I went ahead and put in three lines in the local.txt file of one Prime95 instance as follows (interestingly the other instance seemed to have used the same "local.txt" file even though the instances are in different folders; I do have prime95 set up to run as a service, so maybe that explains it):

NumCPUs=14
CpuNumHyperthreads=1 [not hyperthreaded]
CpuSpeed=2900 [2900 MHz, which is the Intel boost speed, first tier]

After a reboot there are still 4 workers per instance (per CPU), but the iteration speed has improved dramatically: about 6 ms/iter for each of the four workers on one CPU and ~9 ms/iter for each of the four workers on the other CPU, for LL double-checks of 42.8M exponents. Is this the best I can hope for given the on-chip memory limitations described by others on this forum?
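To put the iteration times in perspective: a Lucas-Lehmer test of 2^p - 1 takes p - 2 squaring iterations, so the per-iteration time converts directly into days per test. The sketch below does that arithmetic for the three timings mentioned in this thread. It assumes the ms/iter figure stays constant for the whole run, which a real machine will not quite manage.

```python
def ll_test_days(exponent, ms_per_iter):
    """Approximate wall-clock days for one Lucas-Lehmer test.

    Testing 2**p - 1 takes p - 2 squaring iterations; this assumes a
    constant time per iteration.
    """
    iterations = exponent - 2
    seconds = iterations * ms_per_iter / 1000.0
    return seconds / 86400.0


for ms in (15.5, 9.0, 6.0):
    days = ll_test_days(42_800_000, ms)
    print(f"{ms} ms/iter -> {days:.1f} days per test")
```

For a 42.8M exponent this works out to roughly 7.7 days at 15.5 ms/iter, 4.5 days at 9 ms/iter, and 3.0 days at 6 ms/iter, per worker.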

MattcAnderson 2016-10-11 07:15

Hi TrellisCross,

Let me be the first to welcome you to the mersenneforum. I hope you are able to fully optimize your machine without overheating or destroying it.

Currently, my custom PC is broken. I think it overheated with the summer heat.

Hopefully some other forum members can help with your specific question regarding milliseconds per iteration.

Best of luck,
Matt

firejuggler 2016-10-11 10:11

Look at the worker window (Test tab) and CPU affinity, maybe?

kladner 2016-10-11 15:40

Welcome to the project, and to the forum.

Those are some impressive CPUs!

There has been a lot of discussion of core assignments. "Affinity scramble" is a key phrase. I run AMD, so I don't have a lot of comprehension of these advanced Intel issues.

The internal forum search is very limiting. Use an external search engine, e.g. Google, with the phrasing "site:mersenneforum.org [keywords]". (No quotes or brackets.) I came up with some example results, but be aware that some of the discussion concerns Linux, not Windows.
http://www.mersenneforum.org/showthread.php?t=16779&page=3
http://www.mersenneforum.org/showthread.php?t=20920
http://www.mersenneforum.org/showthread.php?t=21159&page=7
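If you want to bookmark or script such searches, the query can be turned into a URL. This is only a sketch: the helper name site_search_url is invented here, and the Google search endpoint is an assumption that may change.

```python
from urllib.parse import urlencode


def site_search_url(site, keywords):
    """Build a Google search URL restricted to one site (hypothetical helper)."""
    query = f"site:{site} {keywords}"
    return "https://www.google.com/search?" + urlencode({"q": query})


print(site_search_url("mersenneforum.org", "affinity scramble"))
```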

It will take some digging, and filtering of results to find what you need. Meanwhile, there are many people here who may respond with first-hand knowledge.
[QUOTE]I hope you are able to fully optimize your machine without overheating or destroying it.[/QUOTE]
Unless you are in a very hot place, or there are problems in the cooling system, I would not worry about overheating too much. Absolutely DO monitor temps!

Mark Rose 2016-10-11 15:48

Do note Affinity Scramble is likely going away in the next version. No idea when that will be released though.

TrellisCross 2016-10-12 00:53

Thanks
 
Thanks for all the responses. I appreciate the links. Frankly, this was a project to learn about water cooling and also get some experience with server-level motherboards. For those of you who might be interested in the cooling specs, the CPU core temperatures reported by Core Temp 1.3 (highly recommended because it tracks the temperature of every core independently) range from 42-52 C at full Prime95 load on all 28 cores, depending on the room temperature (60-80 F). From what I can gather from internet comments, this is quite good. Interestingly, the temperature of the distilled water coolant has not been above 35 C (95 F). This cooling loop has a 480 mm radiator (4 fans) and another 240 mm radiator (2 fans) in the case. For anyone who wants to try some watercooling, particularly with the large form-factor motherboards, the Phanteks Primo case is really nice. I am hoping that I have all the fittings nice and tight to prevent a rather costly short-circuit "anomaly." So far so good.

Anyway, I will try to better understand the benchmarking outputs, and check out the links. I have always enjoyed looking around the forum, and really appreciate interacting with other people who love prime numbers and searching for them, even if there isn't a clear practical use yet.

kladner 2016-10-12 04:15

With those ambient temps, your CPU temps are outstanding. Such a rig is enviable, too. :smile:

TrellisCross 2016-10-17 19:15

The optimal configuration...
 
...may not be known. But here is what I believe is the best approach for folks like me with Xeon CPUs who don't understand the math well enough to customize the Prime95 client for specific tasks like factoring.

1. Set hyperthreading to "off" in the BIOS if you can and if doing so doesn't seem to affect your other applications. When I had hyperthreading on, the iteration times for the 4 workers per CPU doubled. I have seen an explanation for this somewhere on the forum; if I remember correctly, the two hyperthreads have to compete for the core's registers and memory, which slows down the iterations. You may also be able to insert the line "CpuNumHyperthreads=1" in the local.txt file in the Prime95 directory (use Notepad in Windows 10, found in the Windows Accessories folder in the application start menu) to tell Prime95 not to use hyperthreading even when it is enabled, but I am not sure that works.

2. If Prime95 is not recognizing the proper number of cores in your CPU, insert a statement (again using Notepad) in your local.txt file as follows: NumCPUs=n, where n is the number of actual cores on your CPU (not the number of hyperthreaded logical cores, which is 2x the number of actual cores). NOTE: Prime95 will not necessarily set up a worker for each core on the CPU, because the algorithm is evidently more efficient when multiple physical cores are assigned to one worker. In my case, 4 workers were created for a 14-core CPU.

3. If you have more than one CPU on your motherboard, evidently you often have to install a second "instance" of Prime95 to get both utilized. In my case, the first installation of Prime95 identified the first CPU but not the second, and I had to install a second instance of Prime95. One way to do this is to create a second folder named "Prime95B" or something similar, download the zip file to that folder, then open the zip folder and proceed with installation. I changed the name of the executable from prime95.exe to prime95b.exe* to help me remember that "b" was the second instance when looking in the folder. Once I started the second executable, the second CPU was identified immediately (I had the first instance already running) and 4 more workers were created. I added the "NumCPUs=n" line (in my case n is 14) to the local.txt file in each of the two folders. One problem I haven't solved is that only the original instance of Prime95 will start on boot by itself. I tried adding a script to start both of them at boot, but it doesn't seem to work. You have to pretend that your computer is actually two computers when submitting results, because each instance of Prime95 assumes it is running on an independent machine. I just named my virtual computers A and B to track them in the results log online. [*someone else had this idea in the forum, but I can't find the thread now to give credit]
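As a starting point for launching both instances together, here is a hedged Python sketch. The folder paths are hypothetical, and it assumes (unverified) that each instance looks for its local.txt in its own working directory, which is why each one is started with its install folder as the working directory. It uses no Prime95 command-line options, and it does not solve the start-at-boot problem by itself; something would still have to run it after login.

```python
import subprocess
from pathlib import Path

# Hypothetical install locations; adjust to wherever each instance lives.
INSTANCES = [
    Path(r"C:\Prime95") / "prime95.exe",
    Path(r"C:\Prime95B") / "prime95b.exe",
]


def launch_all(executables, spawn=subprocess.Popen):
    """Start each executable with its own folder as the working directory.

    `spawn` defaults to subprocess.Popen and can be swapped out for testing.
    """
    started = []
    for exe in executables:
        exe = Path(exe)
        # cwd keeps each instance pointed at the settings files in its own folder.
        started.append(spawn([str(exe)], cwd=str(exe.parent)))
    return started
```

Calling launch_all(INSTANCES) starts both programs without waiting for either to exit.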

There is a certain mystery about the speed you get on one CPU versus another, even on the same motherboard. In my case, the 4 workers on the first CPU (logical #0, 14 cores) are running at about 9 milliseconds/iteration for a 48 M exponent, which is good. However, the 4 workers on the second, identical CPU are running at less than 6 milliseconds/iteration for the same size exponent. I suspect that the first CPU is tasked with the operating system housekeeping, so it is slower. But 50% slower? Seems strange. The second, faster CPU does run 2-3 C hotter than the first CPU.

If I made any errors, hopefully someone will set me straight. Maybe this will help someone else at least get started.
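The "50% slower" figure depends on how you measure it. A quick sketch of the arithmetic, taking 9 and 6 ms/iter as round numbers for the two CPUs (the second was reported as slightly under 6):

```python
def iterations_per_second(workers, ms_per_iter):
    """Combined throughput of several workers running at the same speed."""
    return workers * 1000.0 / ms_per_iter


cpu0 = iterations_per_second(4, 9.0)  # first CPU, about 9 ms/iter
cpu1 = iterations_per_second(4, 6.0)  # second CPU, taken as 6 ms/iter

print(f"CPU 0: {cpu0:.0f} iter/s, CPU 1: {cpu1:.0f} iter/s")
print(f"CPU 0 takes {(9.0 / 6.0 - 1) * 100:.0f}% longer per iteration")
print(f"CPU 0 delivers {(1 - cpu0 / cpu1) * 100:.0f}% less throughput")
```

So the first CPU takes about 50% longer per iteration, which is the same as delivering about a third less throughput.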


All times are UTC.