mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Cloud Computing (https://www.mersenneforum.org/forumdisplay.php?f=134)
-   -   Google Diet Colab Notebook (https://www.mersenneforum.org/showthread.php?t=24646)

chalsall 2019-10-12 13:49

[QUOTE=xx005fs;527794]However, I need to figure out a way to store the checkpoints and worktodo files, and I have yet to find a solution.[/QUOTE]

This is the fundamental issue anyone implementing a Notebook will be facing.

A critical question is how large are the checkpoint files? I was lucky with mfaktc in that the CPs are tiny. The other work types' CPs are quite a bit larger -- on the order of MBs.

It then comes down to an economic curve crossing analysis -- how much work are you willing to lose vs. what is the bandwidth available (and the cost of same).

One thing you might look into is setting up an NFS connection between your instance(s) and a "public-facing" server you control. Or you could simply "scp" the files out every half hour or so (again, to a public-facing server you control).

And while it doesn't work on Kaggle, for Colab you can always attach your Google Drive for persistent storage. Be sure to design your directory structure such that multiple instances can work at the same time, without stepping on each other's toes.
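A minimal sketch of the periodic-copy idea with one backup directory per instance, so that several instances sharing the same storage do not overwrite each other's files. The file patterns (`*.ckp`, `worktodo.txt`), the function names, and the half-hour default are illustrative assumptions; the actual checkpoint file names depend on the program being run.

```python
import shutil
import time
from pathlib import Path


def backup_checkpoints(work_dir, backup_root, instance_id,
                       patterns=("*.ckp", "worktodo.txt")):
    """Copy matching files from work_dir into backup_root/instance_id.

    The patterns are placeholders (assumption); adjust them to whatever
    your program actually writes.
    """
    dest = Path(backup_root) / instance_id
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for pattern in patterns:
        for src in sorted(Path(work_dir).glob(pattern)):
            if not src.is_file():
                continue
            # Copy under a temporary name, then rename, so an interrupted
            # copy never leaves a truncated file under the final name.
            tmp = dest / (src.name + ".tmp")
            shutil.copy2(src, tmp)
            tmp.replace(dest / src.name)
            copied.append(src.name)
    return copied


def backup_loop(work_dir, backup_root, instance_id, interval_s=1800):
    """Back up every interval_s seconds (1800 s = half an hour) until stopped."""
    while True:
        backup_checkpoints(work_dir, backup_root, instance_id)
        time.sleep(interval_s)
```

On Colab, `backup_root` could be a folder inside the mounted Google Drive (the exact mount path is an assumption and depends on how you mount it). Giving each instance its own `instance_id` subdirectory is one simple way to keep simultaneous instances from stepping on each other.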

matzetoni 2019-10-12 14:00

I tried to get LLR running on Kaggle. It runs, but I am having trouble getting the actual results output.



I thought it would be nice to run PRPNet NPLB on those 10 CPUs at Kaggle, but I'm a complete newbie with scripting. Has anyone already tried something like this?

kriesel 2019-10-12 14:18

[QUOTE=chalsall;527809]A critical question is how large are the checkpoint files? I was lucky with mfaktc in that the CPs are tiny. The other work types' CPs are quite a bit larger -- on the order of MBs.[/QUOTE]
TF checkpoints: ~45 bytes, regardless of exponent.

P-1 and primality test checkpoints are on the order of 10[SUP]5[/SUP] to 10[SUP]6[/SUP]+ times larger and increase significantly with exponent (or FFT length):

P-1 (CUDAPm1 v0.20) ~0.125*p + 102 bytes, ~11MB for 89M, 28MB for 230M
LL (CUDALucas v2.06) ~0.122*p bytes; 10MB at 82.6M, 49MB at 402M
PRP (gpuowl) ~0.122*p bytes; 10MB at 84M, 41MB at 335M, 131MB at 1073M
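The approximate formulas above can be turned into a quick estimate of how much data a periodic backup would move. This is only a sketch: the coefficients are the rough figures quoted in this post, and the function names and the 30-minute default interval are illustrative assumptions.

```python
def checkpoint_bytes(work_type, p):
    """Approximate checkpoint size in bytes for exponent p.

    Coefficients are the rough per-program figures quoted in the post above.
    """
    if work_type == "TF":
        return 45.0                  # roughly constant, regardless of exponent
    if work_type == "P-1":
        return 0.125 * p + 102       # CUDAPm1 v0.20
    if work_type in ("LL", "PRP"):
        return 0.122 * p             # CUDALucas v2.06 / gpuowl
    raise ValueError(f"unknown work type: {work_type!r}")


def upload_mb_per_day(work_type, p, interval_minutes=30):
    """Megabytes copied out per day if one checkpoint is saved per interval."""
    saves_per_day = 24 * 60 / interval_minutes
    return checkpoint_bytes(work_type, p) * saves_per_day / 1e6


if __name__ == "__main__":
    for work_type, p in [("TF", 89_000_000), ("P-1", 89_000_000),
                         ("LL", 82_600_000), ("PRP", 84_000_000)]:
        size_mb = checkpoint_bytes(work_type, p) / 1e6
        per_day = upload_mb_per_day(work_type, p)
        print(f"{work_type:4s} p={p}: {size_mb:.4f} MB per checkpoint, "
              f"{per_day:.2f} MB/day at 30-minute saves")
```

Comparing the per-day figure against the bandwidth you have (and the amount of work you are willing to lose between saves) is one way to pick the backup interval.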

xx005fs 2019-10-12 18:12

RIP
 
Well, I got blocked on Kaggle (maybe for having to install another version of the NVIDIA driver on it to make OpenCL work), so I guess I can't recommend that others run such code on the Kaggle VMs.

chalsall 2019-10-12 19:19

[QUOTE=xx005fs;527838]Well, I got blocked on Kaggle (maybe for having to install another version of the NVIDIA driver on it to make OpenCL work), so I guess I can't recommend that others run such code on the Kaggle VMs.[/QUOTE]

When you say blocked, what exactly do you mean?

Were you told you had violated their terms of service? Or did something not work well on the instance you were in, and it stopped working?

If the latter, you should just be able to go to the "Run" menu and select "Restart Session". This will then give you a brand new instance to work with.

pepi37 2019-10-12 19:26

OK, I got it compiled and running. I added my sieve file, but the genefer app cannot open it:


Command line: ./geneferocl_linux64 -x ocl2 -llr 131072.txt
Normal priority change succeeded.
Cannot open '131072.txt'
Fatal error (3). Genefer is terminating.

xx005fs 2019-10-12 19:33

[QUOTE=chalsall;527846]When you say blocked, what exactly do you mean?

Were you told you had violated their terms of service? Or did something not work well on the instance you were in, and it stopped working?

If the latter, you should just be able to go to the "Run" menu and select "Restart Session". This will then give you a brand new instance to work with.[/QUOTE]

When I log in, a blank page pops up that says "Your account has been blocked. Please contact support if you believe this is unjustified." and my account has been completely deactivated. I have contacted their support to hopefully get my account unbanned, since I wasn't told that I had violated their terms of service or anything.

Dylan14 2019-10-12 19:36

[QUOTE=pepi37;527847]OK, I got it compiled and running. I added my sieve file, but the genefer app cannot open it:


Command line: ./geneferocl_linux64 -x ocl2 -llr 131072.txt
Normal priority change succeeded.
Cannot open '131072.txt'
Fatal error (3). Genefer is terminating.[/QUOTE]


Can you show me what the sieve file looks like? Genefer requires the lines to be in a certain format.

pepi37 2019-10-12 20:04

ABC $a^$b+1 // 3238000000000000000
299982944 131072
299982946 131072
299982958 131072
299982966 131072
299982968 131072




The problem is that I cannot "link" the 131072.txt file (the sieve is inside it) with the genefer app: the path is the problem.
The same 131072.txt works without any problem on my Linux box at home.
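Since the line format was raised as a possible cause, here is a small sketch that checks a sieve file against the shape of the sample posted above: an `ABC $a^$b+1` header followed by lines holding two integers. It is based only on that sample, not on Genefer's actual parser, so passing this check does not guarantee that Genefer will accept the file. The function name is illustrative.

```python
import re

# Header as it appears in the sample above, with an optional "// ..." comment.
HEADER_RE = re.compile(r"^ABC \$a\^\$b\+1(\s*//.*)?$")


def check_sieve_lines(lines):
    """Return a list of (line_number, message) problems; empty means none found."""
    problems = []
    # Strip line endings so files saved with Windows (CRLF) endings still parse.
    lines = [line.rstrip("\r\n") for line in lines]
    if not lines or not HEADER_RE.match(lines[0].strip()):
        problems.append((1, "expected a header like 'ABC $a^$b+1'"))
    for number, line in enumerate(lines[1:], start=2):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split()
        if len(fields) != 2 or not all(f.isdigit() for f in fields):
            problems.append((number, "expected two integers, e.g. '299982944 131072'"))
    return problems
```

Usage could look like `check_sieve_lines(open("131072.txt").readlines())`. Note that a "Cannot open" error is reported before any line is read, so a clean result here points back to the file's location rather than its contents.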

Dylan14 2019-10-12 20:21

[QUOTE=pepi37;527854]ABC $a^$b+1 // 3238000000000000000
299982944 131072
299982946 131072
299982958 131072
299982966 131072
299982968 131072




The problem is that I cannot "link" the 131072.txt file (the sieve is inside it) with the genefer app: the path is the problem.
The same 131072.txt works without any problem on my Linux box at home.[/QUOTE]


I got it to work: just remove the -llr flag from your command line, like this:
[CODE]!./geneferocl_linux64 -d 0 -x ocl2 131072.txt[/CODE]
and then it works:


[CODE]Testing 299982944^131072+1...

Using OCL2 transform

Running on platform 'NVIDIA CUDA', device 'Tesla P100-PCIE-16GB', vendor 'NVIDIA Corporation', version 'OpenCL 1.2 CUDA' and driver '418.67'.

56 computeUnits @ 1328MHz, memSize=16280MB, cacheSize=896kB, cacheLineSize=128B, localMemSize=48kB, maxWorkGroupSize=1024.

Starting initialization...

Initialization complete (0.199 seconds).

Estimated time for 299982944^131072+1 is 0:05:45

^Csting 299982944^131072+1... 3092596 steps to go (0:04:49 remaining)

^C caught. Writing checkpoint.[/CODE]
Ignore the ^C; I was just checking whether the change would work. And it did.

pepi37 2019-10-12 20:27

Great, but I still cannot link 131072.txt to the genefer app.
Where should I upload it (and how do I link it)?

/kaggle/working/GFN
Checked out revision 1388.
/kaggle/working/GFN/linux
geneferocl 3.3.4 (Linux/OpenCL/64-bit)
Copyright 2001-2018, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Michael Goetz, Ronald Schneider
Copyright 2011-2018, Iain Bethune
Genefer is free source code, under the MIT license.
Command line: ./geneferocl_linux64 -d 0 -x ocl2 131072.txt
Normal priority change succeeded.
Cannot open '131072.txt'
Fatal error (3). Genefer is terminating.
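The output above shows the command running from /kaggle/working/GFN/linux, so "Cannot open" may simply mean that 131072.txt is not in that directory. A sketch for finding where the file actually ended up, so its full path can be passed on the command line. The helper name and the search roots are assumptions; in particular, where Kaggle places uploaded files should be checked in your own session.

```python
import os
from pathlib import Path


def find_file(name, search_roots):
    """Return the absolute paths of every file called `name` under the given roots."""
    matches = []
    for root in search_roots:
        # os.walk yields nothing for a root that does not exist, so missing
        # directories are skipped silently.
        for dirpath, _dirnames, filenames in os.walk(root):
            if name in filenames:
                matches.append(str(Path(dirpath, name).resolve()))
    return sorted(matches)


if __name__ == "__main__":
    # Assumed locations: uploaded data under /kaggle/input, notebook output
    # under /kaggle/working. Adjust to match your session.
    for path in find_file("131072.txt", ["/kaggle/input", "/kaggle/working"]):
        print(path)
```

If a path is printed, either copy the file next to the geneferocl_linux64 binary or pass that full path in place of the bare file name. Whether Genefer then writes its checkpoint and result files next to the input or in the current directory is something to verify.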

