mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Cunningham Tables (https://www.mersenneforum.org/forumdisplay.php?f=51)
-   -   Planning & Coordination for 2,2330L (https://www.mersenneforum.org/showthread.php?t=24292)

R.D. Silverman 2019-07-15 22:09

[QUOTE=swellman;521536]Yoyo kindly killed the rest of the CN queue. I’ll update it tomorrow night with 2,2210M as the focus.[/QUOTE]

It seems that the CN and HC queues have not progressed at all in days.

swellman 2019-07-15 22:49

The server controls the inflow of numbers. Sometimes it can be like watching water boil.

SethTro 2019-07-16 06:30

I don't see many results on taskset with respect to CADO-NFS ([URL="https://www.mersenneforum.org/showthread.php?p=519874#post519874"]other than fivemack's[/URL]) so I'm going to try and duplicate fivemack's testing with a couple of extra dimensions.

My plan is to test for about 6 WU with each of these configs. (I have 2x E5-2650v2, i.e. 16 cores / 32 threads, with 64 GB of RAM, which supports 5 WU at a time but not quite 6.)
Right now I'm confused about how fivemack tested, because the taskset gets reset after each WU with cado-nfs-client.py.

Looking at
[CODE]cat /proc/cpuinfo | grep -v 'flags\|bugs\|bogomips\|wp\|fpu\|family\|model\|MHz\|apicid\|cpuid\|cache_align\|address\|clflush\|vendor_id\|stepping\|microcode'
# and
lstopo[/CODE]

[CODE]
CPU 0-7 are hyperthreaded with CPU 16-23
and
CPU 8-15 are hyperthreaded with CPU 24-31[/CODE]
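
The hyperthread pairing above can also be read straight from sysfs instead of eyeballed out of /proc/cpuinfo. A small sketch (Linux-only; uses only the standard topology files, nothing CADO-NFS specific):

```python
import glob

# List each logical CPU alongside its hyperthread sibling(s), using the
# standard Linux sysfs topology files. Sorted numerically so cpu10 does
# not come before cpu2.
paths = sorted(
    glob.glob("/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"),
    key=lambda p: int(p.split("/cpu")[-1].split("/")[0]),
)
for path in paths:
    cpu = path.split("/")[5]              # e.g. "cpu0"
    siblings = open(path).read().strip()  # e.g. "0,16" in a layout like the one above
    print(cpu, "siblings:", siblings)
```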

[CODE]
1x 4threads, no CPU affinity
1x 4threads, CPU 0-4
---
1x 8threads, CPU 0-7
---
1x 16threads, CPU 0-15
1x 16threads, CPU 0-7,16-23
---
4x 4threads, no CPU affinity
4x 4threads, CPU 0-3,4-7,16-23 (16 threads on 8 cores)
4x 4threads, CPU 0-3,4-7,8-11,12-15 (16 threads on 16 cores)
---
4x 8threads, CPU 0-7,8-15,16-23,24-31
---
2x 16threads, CPU 0-15,16-31 (both on all cores)
2x 16threads, CPU 0-7+16-23,8-15+24-31 (one job on one CPU, one job on the other CPU)
[/CODE]

Results will be updated in place over the next couple of days.

fivemack 2019-07-16 07:14

Tasksets are one of the things a process inherits from the process that started it, so if you do

taskset -c 0-3 python cado-nfs-client.py {parameters}

All the las jobs that it starts will use that taskset
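
This inheritance is easy to verify with a quick sketch (Linux-only; os.sched_getaffinity / os.sched_setaffinity are standard library, nothing CADO-NFS specific):

```python
import os
import subprocess
import sys

# CPU affinity is inherited across fork/exec, which is why running
# `taskset -c 0-3 python cado-nfs-client.py ...` should constrain the
# las jobs the client spawns.
saved = os.sched_getaffinity(0)

# Pin this process to CPU 0 (standing in for `taskset -c 0`), then
# spawn a child and ask it what affinity it ended up with.
os.sched_setaffinity(0, {0})
child = subprocess.run(
    [sys.executable, "-c", "import os; print(sorted(os.sched_getaffinity(0)))"],
    capture_output=True, text=True, check=True,
)
print("child sees:", child.stdout.strip())  # the child inherited {0}

os.sched_setaffinity(0, saved)  # restore the original mask
```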

SethTro 2019-07-16 07:49

[QUOTE=fivemack;521716]Tasksets are one of the things a process inherits from the process that started it, so if you do

taskset -c 0-3 python cado-nfs-client.py {parameters}

All the las jobs that it starts will use that taskset[/QUOTE]

Hmm, this didn't seem to work before; maybe I wasn't setting -c correctly. Thanks for the answer.

[CODE]
time taskset -c 0-4 python3 -c "import gmpy2; import multiprocessing as mp; p = mp.Pool(4); print (sum(p.map(gmpy2.is_prime, range(10**7))))"
664579
[/CODE]

SethTro 2019-07-16 08:32

2 Attachment(s)
[QUOTE=fivemack;521716]Tasksets are one of the things a process inherits from the process that started it, so if you do

taskset -c 0-3 python cado-nfs-client.py {parameters}

All the las jobs that it starts will use that taskset[/QUOTE]

Actually, I tried again on my main server and it still didn't work.

[CODE]
taskset -c 0-7 python cado-nfs-client.py --bindir=build/seven --server="<SERVER>"
[/CODE]
When I look at the process tree, the first child inherits the CPU affinity, but build/seven/sieve/las was reset to full CPU affinity:

[CODE]
htop
3265 | 'python cado-nfs-client...'
3266 |-- /bin/sh -c 'build/seven/sieve/las' ...
3267 |-- build/seven/sieve/las -I 16 ...

pid 3265's current affinity mask: ff
pid 3266's current affinity mask: ff
pid 3267's current affinity mask: ffffffff
[/CODE]

A couple of places in the code (las-parallel.cpp, bind_threads.sh, cpubinding.cpp) might all affect this. I'll have to dig into that :/
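
The widened mask on las is consistent with the binary simply rebinding itself: a process is free to overwrite the affinity mask it inherited by calling sched_setaffinity on itself. A minimal illustration (which code path in CADO-NFS does this is a guess, not verified):

```python
import os

# A process can overwrite the affinity mask it inherited, which is
# presumably what las (or CADO-NFS's cpu-binding code) does when its
# mask widens back out despite taskset on the parent.
allowed = os.sched_getaffinity(0)
os.sched_setaffinity(0, {min(allowed)})  # stand-in for an inherited taskset
print("narrowed:", sorted(os.sched_getaffinity(0)))

os.sched_setaffinity(0, allowed)         # the process rebinds itself
print("rebound:", sorted(os.sched_getaffinity(0)))
```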

xilman 2019-07-16 09:21

[QUOTE=R.D. Silverman;521685]It seems that the CN and HC queues have not progressed at all in days.[/QUOTE]Happens all the time to everyone. Sometimes the GCW queue doesn't move for a week or more and then there is a flurry of activity.

No, I don't know why.

R.D. Silverman 2019-07-16 09:36

[QUOTE=swellman;521687]The server controls the inflow of numbers. Sometimes it can be like watching water boil.[/QUOTE]

These are numbers already running. 2,2210M has been stuck at 8960 curves for days; it is not progressing at all.

xilman 2019-07-16 10:56

[QUOTE=R.D. Silverman;521731]These are numbers already running. 2,2210M has been stuck at 8960 curves for days; It is not progressing at all.[/QUOTE]

My guess: the server only allows a limited number of outstanding tasks for any one number (or queue?) and won't allocate more until the pipeline drains. If this is the case, there must be an expiry limit on any allocated task, beyond which the machine to which it has been allocated is declared MIA and its allocation is given to someone else. What we've been seeing is a blockage in the pipeline, and the server hasn't yet started using [URL="https://en.wikipedia.org/wiki/Dyno-Rod"]Dyno-Rod[/URL].

R.D. Silverman 2019-07-16 11:10

[QUOTE=xilman;521736]My guess: the server only allows a limited number of outstanding tasks for any one number (or queue?) and won't allocate more until the pipeline drains. If this is the case, there must be an expiry limit on any allocated task, beyond which the machine to which it has been allocated is declared MIA and its allocation is given to someone else. What we've been seeing is a blockage in the pipeline, and the server hasn't yet started using [URL="https://en.wikipedia.org/wiki/Dyno-Rod"]Dyno-Rod[/URL].[/QUOTE]

Certainly plausible, but not the best allocation scheme IMO.

yoyo 2019-07-16 12:40

The workunit processing is not fully under the server's control; there are also volunteers in the game. For CN, HC and GCW, 5 curves are put into one BOINC workunit and sent to a volunteer with a deadline of 5 days. If the volunteer doesn't return any result, it takes 5 days until the server recognises that. After those 5 days the workunit is sent to someone else, who also might not return it. So some curves just need time.
Such resends also happen if a workunit comes back with an error; in that case it is likewise resent to someone else.
These resends don't show up as any progress.
In the meantime the server sends out other ECM workunits from its "unsent tasks" list, always oldest first.

If the "unsent tasks" list drains below 1000, it generates new tasks from one of the ECM input queues, choosing the queue of the project which got the least computing power in the last 5 days.

