
mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Cunningham Tables (https://www.mersenneforum.org/forumdisplay.php?f=51)
-   -   Planning & Coordination for 2,2330L (https://www.mersenneforum.org/showthread.php?t=24292)

SethTro 2019-06-11 09:55

10%
 
Also, we're 10% done with sieving!

The ETA has finally stabilized at late August, which is ~76 days out (assuming vebis stays on, otherwise ~150 days).

lukerichards 2019-06-11 11:20

Can someone with access to the raw data check what's happening with lukerichards-pre1 please?

It's been stuck on 9.1% for a while with no updates since 2019-06-10 14:32:13, despite the instance running: 'top' reads ~800% CPU usage, with ~hourly network traffic spikes.

As CADO is running as a cron job, I can't check the shell output, but will check the log files later.
EDIT: Having said that, I can't see anything in the cado folder which is obviously a local log file... does such a thing exist?
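(One workaround, as a sketch: since cron jobs have no attached terminal, redirect the client's output yourself in the crontab entry. This assumes the client is launched via cado-nfs-client.py; the paths and server URL below are placeholders, not the actual setup.)

```shell
# Hypothetical crontab entry: capture stdout/stderr in a log file, since
# cron jobs have no attached terminal. Paths and the server URL are
# placeholders for illustration only.
@reboot cd $HOME/cado && ./cado-nfs-client.py --server=https://example.org:8001 >> $HOME/cado/client.log 2>&1
```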

Thomas11 2019-06-11 12:18

[QUOTE=lukerichards;519103]EDIT: Having said that, I can't see anything in the cado folder which is obviously a local log file... does such a thing exist?[/QUOTE]

I'm not aware of any local log file.
But you may have a look into your work directory: [B]lukerichards-pre1.xxxxxxx.work[/B].
If your client is still running properly, the output file [B]2330L.c207.xxxxx000-xxxxx000.gz[/B] should be updated every few minutes...
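For a quick liveness check, something like this lists output files touched in the last ten minutes (a sketch; the path pattern is an assumption about the default layout, so adjust it to your setup):

```shell
# List relation output files modified within the last 10 minutes under any
# *.work directory; if the client is sieving normally, this should usually
# be non-empty. Run it from the cado working directory.
find . -path '*.work/*' -name '*.gz' -mmin -10
```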

lukerichards 2019-06-11 15:02

[QUOTE=Thomas11;519106]I'm not aware of any local log file.
But you may have a look into your work directory: [B]lukerichards-pre1.xxxxxxx.work[/B].
If your client is still running properly, the output file [B]2330L.c207.xxxxx000-xxxxx000.gz[/B] should be updated every few minutes...[/QUOTE]

[code]{username}@lukerichards-pre1:~/cado$ ls -lt -d *.work
drwxr-xr-x 2 {username} {username} 4096 Jun 11 14:49 localhost.1301f44e.work
drwxr-xr-x 2 {username} {username} 4096 Jun 11 02:06 lukerichards-pre1.682f18fd.work
drwxr-xr-x 2 {username} {username} 4096 Jun 11 01:49 localhost.fdd311f5.work
drwxr-xr-x 2 {username} {username} 4096 Jun 10 21:32 lukerichards-pre1.c858abc2.work
drwxr-xr-x 2 {username} {username} 4096 Jun 10 20:26 localhost.618e02b.work
drwxr-xr-x 2 {username} {username} 4096 Jun 10 17:03 lukerichards-pre1.8bcb75f3.work
drwxr-xr-x 2 {username} {username} 4096 Jun 10 15:29 lukerichards-pre1.500b33f1.work
drwxr-xr-x 2 {username} {username} 4096 Jun 10 13:53 localhost.dcd0720d.work
drwxr-xr-x 2 {username} {username} 4096 Jun 10 11:15 lukerichards-pre1.ba003e6f.work
drwxr-xr-x 2 {username} {username} 4096 Jun 10 10:01 lukerichards-pre1.e4edb53b.work
drwxr-xr-x 2 {username} {username} 4096 Jun 10 08:11 localhost.857e3c54.work
drwxr-xr-x 2 {username} {username} 4096 Jun 7 15:51 lukerichards-pre1.b4526b32.work
drwxr-xr-x 2 {username} {username} 4096 Jun 6 15:34 lukerichards-pre1.23245b81.work
[/code]

Interesting that the instance seems to have started dumping the workfiles into a localhost folder, not a 'lukerichards-pre1' folder.

I literally have not changed a thing since it was working - this change did not start when I began running it from boot in crontab, it started after that. So I've no idea what's gone wrong! Is there any way to check whether my work units are being successfully reported, albeit under localhost? I note there are some 'localhost' entries on the list, with ~7 CPU-days of work, which is about right considering it has been about a day on 8 CPUs...

Thomas11 2019-06-11 15:20

[QUOTE=lukerichards;519115]
Interesting that the instance seems to have started dumping the workfiles into a localhost folder, not a 'lukerichards-pre1' folder.

I literally have not changed a thing since it was working - this change did not start when I began running it from boot in crontab, it started after that. So I've no idea what's gone wrong![/QUOTE]

Don't know what's going wrong there.
But you could force it to use your intended ID by adding the following to the command line:
[CODE]--clientid=lukerichards-pre1[/CODE]
or
[CODE]--clientid=lukerichards-pre1.1
--clientid=lukerichards-pre1.2
--clientid=lukerichards-pre1.3
--clientid=lukerichards-pre1.4
...
[/CODE]
if you're running multiple instances on the same machine. The trailing number just replaces the "random" number generated by cado-nfs.
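For example, a sketch of launching four clients with distinct IDs (the server URL and port are placeholders, not the actual project values):

```shell
# Start four cado-nfs clients on one machine, each with an explicit
# clientid so workunits are credited to the intended host name instead
# of "localhost". The server URL below is a placeholder.
for i in 1 2 3 4; do
  ./cado-nfs-client.py --server=https://example.org:8001 \
      --clientid=lukerichards-pre1.$i &
done
```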

VBCurtis 2019-06-11 17:12

Localhost has been submitting workunits for a few days now; I can get the IP address PMed to you to confirm it's you.

lukerichards 2019-06-11 18:47

[QUOTE=VBCurtis;519123]Localhost has been submitting workunits for a few days now; I can get the IP address PMed to you to confirm it's you.[/QUOTE]

Yes please.

SethTro 2019-06-12 00:31

I'm hungry to add more badges. Can anyone explain (or point me at resources to understand) what Average J and special-q represent?

[CODE]
Info:Lattice Sieving: Total number of relations: 19266332
Info:Lattice Sieving: Average J: 7782.82 for 77739 special-q, max bucket fill: 0.628863[/CODE]

VBCurtis 2019-06-12 04:25

special-q are the actual prime values we sieve over. When you run a workunit of Q from, say, 30200000 to 30202000, you're running the sieve program las over each prime (I think? Are some primes skipped for algorithmic reasons?) within that range.

Quite a lot of the variation in relations found is related to some ranges having more primes than others. So, while we speak in terms of yield of relations per Q-range, it's actually more accurate to speak in terms of relations per special q sieved. Both kinds of yield decrease as Q rises: there are fewer primes in each range, and fewer relations found per prime. If you divide total relations found by total Q searched, our yield is around 9.0 so far. My test-sieving indicates the yield at Q=100M will be in the low 7's, a 20% drop. By 200M, it's 6.0 and sec/rel is around double its current value.

I don't know what J means in CADO context.
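For what it's worth, the log line quoted earlier gives the per-special-q bookkeeping directly; a quick sanity check of the arithmetic (figures taken from that line):

```shell
# 19266332 relations over 77739 special-q, from the log line above:
awk 'BEGIN { printf "%.1f relations per special-q\n", 19266332/77739 }'
# prints "247.8 relations per special-q"
```

(This is relations per special-q sieved, which is why it is far larger than the ~9.0 relations per unit of Q-range quoted above: only a fraction of the integers in a Q-range are prime.)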

SethTro 2019-06-12 05:16

[QUOTE=VBCurtis;519169]special-q are the actual prime values we sieve over. When you run a workunit of Q from, say, 30200000 to 30202000, you're running the sieve program las over each prime (I think? Are some primes skipped for algorithmic reasons?) within that range.

Quite a lot of the variation in relations found is related to some ranges having more primes than others. So, while we speak in terms of yield of relations per Q-range, it's actually more accurate to speak in terms of relations per special q sieved. Both kinds of yield decrease as Q rises: there are fewer primes in each range, and fewer relations found per prime. If you divide total relations found by total Q searched, our yield is around 9.0 so far. My test-sieving indicates the yield at Q=100M will be in the low 7's, a 20% drop. By 200M, it's 6.0 and sec/rel is around double its current value.

I don't know what J means in CADO context.[/QUOTE]

Thanks for the detailed description. It's informative and solidifies my understanding.

Is there anything else you'd like for monitoring? Maybe time since last workunit?

lukerichards 2019-06-13 15:49

[code]
host               # of workunits  Relations             CPU-days  Last workunit submitted
instance-1         970             9236551 (2.8% total)  45.6      2019-06-05 23:27:08,078
lukerichards-pre1  175             3180220 (0.9% total)  21.9      2019-06-13 08:18:30,440
[/code]

Both machines are my VM instances running on Google Cloud. I'm fairly sure both have the same architecture (although they could be different).

One has managed 48% of the CPU-days of the other one but found only 34% of the relations. Is this within expected bounds of variance?

I'm assuming the huge range in # of work units is because of the change in work unit size?
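Here's a rough throughput comparison from the figures in the table above (just a sanity check of my own numbers, not project statistics):

```shell
# Relations per CPU-day for each host, plus the two ratios mentioned above.
awk 'BEGIN {
  printf "instance-1:        %.0f rel/CPU-day\n", 9236551/45.6
  printf "lukerichards-pre1: %.0f rel/CPU-day\n", 3180220/21.9
  printf "CPU-day ratio:     %.0f%%\n", 100*21.9/45.6
  printf "relation ratio:    %.0f%%\n", 100*3180220/9236551
}'
```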

