mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Cloud Computing (https://www.mersenneforum.org/forumdisplay.php?f=134)
-   -   Google Diet Colab Notebook (https://www.mersenneforum.org/showthread.php?t=24646)

bayanne 2019-11-19 07:31

Great, thanks for that :)

LaurV 2019-11-19 08:28

[QUOTE=kracker;530931]Got a T4, decided to try gpuowl on it... expected results, others as reference.
[code]
gpuowl (PRP), 92M exponent, 5M FFT

Tesla K80:
4.68 ms/iter - 66.8 GHz-days/day
430 GHz-days/day (mfaktc)

Tesla T4:
5.96 ms/iter - 52.4 GHz-days/day
~1700 GHz-days/day (mfaktc)

Tesla P100:
1.17 ms/iter - 266 GHz-days/day
~1100 GHz-days/day (mfaktc)
[/code][/QUOTE]

Nice! Thanks for that! As stated before, both K80 and P100 are beasts for LL/PRP. Using them for TF is a waste. If you are lucky enough to get a T4, use it for TF (on this side of the world, we haven't seen one in ages!)

Did you try two instances on K80? (If that is a dual-GPU card, the two chips may not communicate very fast with each other, and "in the cloud" may be different from "locally".)
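
To put the quoted ms/iter figures in perspective: a PRP test of 2^p - 1 takes roughly p iterations, so the run time is about p times the per-iteration time. A rough sketch using only the numbers quoted above (the helper name is made up for illustration, and it ignores any overhead beyond the raw iterations):

```python
# Rough PRP run-time estimate from the per-iteration timings quoted above.
# Assumption: one PRP test of 2^p - 1 needs about p iterations.

EXPONENT = 92_000_000  # the "92M exponent" from the benchmark

MS_PER_ITER = {
    "Tesla K80": 4.68,
    "Tesla T4": 5.96,
    "Tesla P100": 1.17,
}

def prp_days(exponent, ms_per_iter):
    """Approximate days for one PRP test (hypothetical helper)."""
    seconds = exponent * ms_per_iter / 1000.0
    return seconds / 86_400.0

estimates = {gpu: prp_days(EXPONENT, ms) for gpu, ms in MS_PER_ITER.items()}

for gpu, days in estimates.items():
    print(f"{gpu}: about {days:.1f} days per test")
```

On those timings a P100 would finish a 92M test in a little over a day, while a K80 needs about five days and a T4 more than six.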

ATH 2019-11-19 10:32

CUDALucas also needs cufft installed:

!apt-get install -y cuda-cudart-10-0
!apt-get install -y cuda-cufft-dev-10-0

kracker 2019-11-19 12:31

[QUOTE=LaurV;530968]Nice! Thanks for that! As stated before, both K80 and P100 are beasts for LL/PRP. Using them for TF is a waste. If you are lucky enough to get a T4, use it for TF (on this side of the world, we haven't seen one in ages!)

Did you try two instances on K80? (If that is a dual-GPU card, the two chips may not communicate very fast with each other, and "in the cloud" may be different from "locally".)[/QUOTE]

I only have access to one K80 GPU "core" per instance (so half of the physical card). If you mean running two instances of gpuowl on one GPU, I haven't tried that.

bayanne 2019-11-19 12:56

I am now finding that exponents that have been trial factored are not being cleared from the 'worktodo' file, and are being tested again. I really am not sure what I can do to stop this. Is anyone else seeing something similar?

bayanne 2019-11-19 13:44

[QUOTE=bayanne;530000]An exponent that had been allocated to me, 97930517, has been completed by someone else as well, and their result has been accepted. No problem for me, except that this result has not been cleared from the results.txt file, wherever that may be held. Thus it keeps appearing in the results for my instance name.

Where is that file, and can I clear this entry from it?[/QUOTE]

There are now 6 exponents that are stuck in the 'results' file.

How can I clear them out please?

chalsall 2019-11-19 14:37

[QUOTE=bayanne;530985]There are now 6 exponents that are stuck in the 'results' file. How can I clear them out please?[/QUOTE]

Could you please PM me an example?

ric 2019-11-19 15:19

[QUOTE=ATH;530974]CUDALucas also needs cufft installed:

!apt-get install -y cuda-cudart-10-0
!apt-get install -y cuda-cufft-dev-10-0[/QUOTE]

... and the same holds true for CUDAPm1: after adding these two lines to the notebook, everything is fine.

kracker 2019-11-19 15:57

[QUOTE=ric;530992]... and the same holds true for CUDAPm1: after adding these two lines to the notebook, everything is fine.[/QUOTE]

That fixed it for me as well.

[STRIKE]Also, I know I sound like an idiot, but how exactly do I use -cufftbench? CUDAPm1 seems to be ignoring it...[/STRIKE] Never mind, figured it out. :whistle:

ATH 2019-11-19 22:53

I did get some error messages saying a file or folder was locked when trying to install cudart and cufft in the main script, but you do not have to run the installations separately. Adding two delays worked for me:

(reverse ssh code)
.
!sleep 30
!apt-get install -y cuda-cudart-10-0
!sleep 5
!apt-get install -y cuda-cufft-dev-10-0
.
(starting mprime+cudalucas)
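
Instead of fixed sleeps, the installs could also be retried until the lock clears. This is only a sketch and not something tested on Colab: `run_with_retry` and `install_all` are made-up helper names, and the attempt count and delay are guesses. The package names are the two from above.

```python
import subprocess
import time

PACKAGES = ["cuda-cudart-10-0", "cuda-cufft-dev-10-0"]

def run_with_retry(cmd, attempts=5, delay=10.0,
                   runner=subprocess.call, sleep=time.sleep):
    """Run cmd until it exits with status 0 or the attempts run out.

    Returns True on success, False if every attempt failed.
    (Hypothetical helper; attempts/delay are arbitrary defaults.)
    """
    for attempt in range(1, attempts + 1):
        if runner(cmd) == 0:
            return True
        if attempt < attempts:
            sleep(delay)  # give whatever holds the lock time to finish
    return False

def install_all(**kwargs):
    """Install both packages in order, stopping at the first failure."""
    return all(
        run_with_retry(["apt-get", "install", "-y", pkg], **kwargs)
        for pkg in PACKAGES
    )
```

Calling `install_all()` in a notebook cell would take the place of the `!sleep` and `!apt-get` lines, and it returns False if a package still fails after the last attempt.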



[QUOTE=kriesel;530915]Please PM me a session capture of the P100 problem. "buffer overflow detected" is not present in the CUDALucas bug and wish list.
I guess you can try using CUDALucas on K80 and gpuowl on P100 for now.
I've taken to using the following at the very front of Colab scripts, so I can decide whether to go with what the session got, or try again.
!lscpu
!nvidia-smi

I have seen CUDALucas run into problems when run locally, if the span of cufftbench or -threadbench is too large; too many fft lengths for the size of the program's arrays. Threadbench can be run in multiple subranges to avoid that issue.[/QUOTE]

It is an error from Linux, not from CUDALucas; nothing is ever written to my outputcudalucas.txt file:

[CODE]*** buffer overflow detected ***: ./CUDALucas terminated
/bin/bash: line 1: 2509 Aborted (core dumped) ./CUDALucas >> outputcudalucas.txt[/CODE]
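
Until that is sorted out, the script could check which GPU the session got and pick the program accordingly, as suggested in the quote (CUDALucas on K80, gpuowl on P100), plus mfaktc on a T4 as suggested earlier in the thread. A sketch only: the mapping reflects the suggestions in this thread and nothing more official, and the helper names are made up.

```python
import subprocess

# Mapping based only on suggestions made in this thread.
PROGRAM_FOR_GPU = {
    "K80": "CUDALucas",
    "P100": "gpuowl",
    "T4": "mfaktc",
}

def pick_program(gpu_name):
    """Return the suggested program for a GPU name, or None if unknown."""
    for key, program in PROGRAM_FOR_GPU.items():
        if key in gpu_name:
            return program
    return None

def detect_gpu_name():
    """Ask nvidia-smi for the name of the first GPU in the session."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        text=True,
    )
    lines = out.strip().splitlines()
    return lines[0] if lines else ""
```

Something like `pick_program(detect_gpu_name())` at the top of the notebook would then decide which program to start, with None meaning a GPU nobody has reported on here.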

kriesel 2019-11-20 20:57

[QUOTE=ATH;531020]It is an error from Linux, not from CUDALucas; nothing is ever written to my outputcudalucas.txt file:

[CODE]*** buffer overflow detected ***: ./CUDALucas terminated
/bin/bash: line 1: 2509 Aborted (core dumped) ./CUDALucas >> outputcudalucas.txt[/CODE][/QUOTE]

If you don't specify a path for the output text file, where does it go? I've had scripts fail unless I explicitly use ./whatever
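
For reference, a bare filename is resolved against the current working directory of the process that opens it, so `outputcudalucas.txt` and `./outputcudalucas.txt` should normally name the same file. A small sketch to see where that would be (the helper name and the example directory in the comment are hypothetical):

```python
import os

def resolve_output(name, cwd=None):
    """Show the absolute path a relative output filename would resolve to.

    cwd defaults to the current working directory of this process.
    (Hypothetical helper, for illustration only.)
    """
    base = cwd if cwd is not None else os.getcwd()
    return os.path.normpath(os.path.join(base, name))

# Example with a made-up directory:
#   resolve_output("outputcudalucas.txt", "/content/cudalucas")
print(resolve_output("outputcudalucas.txt"))
```

If the printed directory is not the one being inspected afterwards, the output file may exist but in a different place than expected.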


All times are UTC.