mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Cloud Computing (https://www.mersenneforum.org/forumdisplay.php?f=134)
-   -   New Google Colab Notebooks For Primality Testing (https://www.mersenneforum.org/showthread.php?t=26522)

danc2 2021-02-21 03:53

New Google Colab Notebooks For Primality Testing
 
[B]Overview:[/B]
Hello fellow Crunchers,
Cruncher Teal Dulcet (@tdulcet) and I (@danc2) have been working on a project to expedite the search for prime numbers. It lets users run GIMPS programs on Google Colab for primality testing, offering a cost-effective way to contribute GPU and CPU time to GIMPS without paying for expensive hardware. Users may also increase their contribution by upgrading to Google Colab Pro, which offers higher GPU and CPU usage limits than the 12 hours per day of the free version. The project comprises the following noteworthy additions to the GIMPS ecosystem:

[B]1. [URL="https://github.com/tdulcet/Distributed-Computing-Scripts/tree/master/google-colab"]GPU-And-CPU-Powered Colab Jupyter Notebooks[/URL][/B]
Google Colab offers two [B][U]free[/U][/B] Jupyter notebooks per Google Account, each of which can run for a maximum of 12 hours per day without interruption. The [URL="https://www.gpu72.com/"]GPU72 project[/URL] previously created a notebook that utilizes the assigned Nvidia GPU to run [C]mfaktc[/C]. Our CPU-powered notebook runs [C]Prime95[/C], and our GPU-powered notebook runs [C]CUDALucas[/C], with the added bonus of also running [C]Prime95[/C] on that machine's CPU. A [URL="https://www.mersenne.org/thresholds/"]Category 4[/URL] [C]First time LL test[/C] takes about 30 days on average when running CUDALucas on Google Colab. The results from these tests are highly reliable, as Google Cloud uses ECC memory in its backend [URL="https://www.mersenneforum.org/showpost.php?p=525190&postcount=14"](source)[/URL]. Users can run multiple instances of each notebook that share the same CUDALucas and/or MPrime binaries but use separate supporting files (e.g., worktodo files). We also dynamically patch buffer overflows in CUDALucas, allowing it to utilize the P100 and V100 GPUs Google has to offer. Users interact with a form and run the notebooks without needing to know how to code.

[B]2. [URL="https://github.com/tdulcet/Colab-Autorun-and-Connect"]Colab Autorun and Connect Browser Add-on/Extension[/URL][/B]
Our [URL="https://github.com/tdulcet/Colab-Autorun-and-Connect"]Colab Autorun and Connect[/URL] Firefox and Chrome/Chromium add-on/extension will automatically connect, reconnect and run the first cell of notebooks in Google Colab. It can be used to automatically run both our GPU and CPU notebooks when the user opens their browser or wakes their computer. It will soon be published to Addons.mozilla.org (AMO) and possibly the Chrome Web Store.

[B]3. [URL="https://github.com/tdulcet/Distributed-Computing-Scripts/blob/master/primenet.py"]PrimeNet Python Script For Mlucas and CUDALucas[/URL][/B]
Our script may also be used outside of these notebooks for other GIMPS projects. Notably, the script now uses the v5 API for all worktypes. We have also addressed a number of bugs from previous versions of the PrimeNet script.

[B]Differences from the GPU72 Project notebook[/B]
1. Compute work types that are eligible for a [URL="https://www.mersenne.org/legal/#awards"]cash research discovery award of up to $50,000[/URL].
2. Choices are not limited to the trial factoring work type. Choose any LL work type for the GPU, and any LL or PRP work type for the CPU (PRP is not recommended, as current first-time tests need ~3.5 GiB, about 23% of the free Drive space). These work types can be changed later if desired.
3. Assignments are sent from/to the PrimeNet server, without needing a third party intermediary server.
4. One can save state for long running work types (e.g., LL, PRP, etc.) using Google Drive.
5. Automatically obtain and submit jobs with PrimeNet, with little configuration.
6. Can utilize both the GPU and CPU to run CUDALucas and Prime95 (the GPU notebook).
7. Open Source.
8. Builds CUDALucas directly from source code rather than precompiled binaries.
9. Colab GPUs are much faster at primality testing (LL/PRP) than trial factoring (TF) [URL="https://www.mersenneforum.org/showpost.php?p=529686&postcount=497"](source)[/URL].
10. Easier setup than existing Notebook implementations.
11. Includes counts of all previously received CPUs and GPUs.
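As a rough sanity check of the ~3.5 GiB / ~23% figure in point 2 (a minimal sketch, assuming the 15 GB free Drive quota and the per-test size are compared in the same unit, as the post does):

```python
# Back-of-the-envelope check of the Drive-space figures quoted above.
TEST_SIZE = 3.5      # approx. space for one current first-time PRP test (GiB)
DRIVE_QUOTA = 15.0   # free Google Drive quota

fraction = TEST_SIZE / DRIVE_QUOTA
print(f"One first-time PRP test uses ~{fraction:.0%} of the free Drive quota")
# A second simultaneous PRP notebook would roughly double that:
print(f"Two concurrent tests: ~{2 * fraction:.0%}")
```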

[B]How to Get Started[/B]
Use the [URL="https://github.com/tdulcet/Distributed-Computing-Scripts/blob/master/google-colab/GoogleColabGPU.ipynb"]GPU notebook[/URL].
Use the [URL="https://github.com/tdulcet/Distributed-Computing-Scripts/blob/master/google-colab/GoogleColabCPU.ipynb"]CPU notebook[/URL].
Or visit the [URL="https://github.com/tdulcet/Distributed-Computing-Scripts/tree/master/google-colab"]repository[/URL] for more detailed instructions.

Teal and I are available for further contact, questions, and feedback. Happy crunching!

VBCurtis 2021-02-21 04:42

I'm glad you did this project, and starting a new thread for it is the right thing to do.
Please, please don't post links to your new thread in every possibly relevant (or not-so-relevant) thread you can think of. I've removed all those other posts.

Trust that users of this forum are smart enough to notice your new thread. Most users use the "new posts" feature in the menu bar at the top of the forum. Your topic will get plenty of notice there, and all your other links are redundant and clutter up that exact new-post feature.

Uncwilly 2021-02-21 05:53

There is a notebook that Chris of GPU72 set up. I am guessing it does what you think yours does. His automatically uses the CPU and GPU (when available). The thing is, transferring the big files associated with primality tests (and storing them) is not really practical. So he has his set to do P-1 on the CPU.
[url]https://www.mersenneforum.org/showthread.php?t=24875[/url]
[url]https://www.mersenneforum.org/showpost.php?p=525235&postcount=19[/url]

tdulcet 2021-02-21 14:19

[QUOTE=Uncwilly;572121]There is a notebook that Chris of GPU72 set up. I am guessing it does what you think yours does.[/QUOTE]

See the "Differences from the GPU72 Project notebook" section of the first post, which addresses this. In short, his notebook is for trial factoring and our notebooks are for primality testing.

Note that users of the GPU72 notebook or any other notebooks can also use our browser add-on/extension.

kriesel 2021-02-21 15:12

Please do not encourage people to run LL first tests. We're trying to phase LL out entirely as routine first tests. Assignment rules may prevent issuance of LL first test work at some point in the future. PRP with GEC and proof generation is far superior for first tests:
[LIST=1][*]Error detection almost 100% by GEC on PRP, vs. only 50% probability of detection by Jacobi check (and CUDALucas does not have the Jacobi check)[*]Essentially equal effort for first primality test computation[*]Approx. 1% the effort of verification, so less than half the total effort for first-test and verification, more than twice the effective primality testing throughput (~1.01 tests equiv per PRP, proof generation, and Cert completion of an exponent, vs. ~2.04 tests per LL, DC, and occasional TC, QC, 5C, 6C)[*]Superior quality / standard of verification (certs can't be faked, even by the person who did the PRP test, while LL DC result reports could be; a successful cert also shows the primality test was done completely and correctly)[*]Far faster verification of a primality test (hours to weeks for PRP with proof generation via prompt brief Cert priority assignments, vs. several YEARS for LL DC, TC, etc.) Verification issues can be traced back to problem hardware or configuration usefully early in the hardware's useful life, instead of after it's removed from service.[*]Setting up for PRP from the start avoids a transition from LL to PRP later.[/LIST]Those determined to run LL despite PRP's superiority are welcome to help with the approximately 8 year backlog of LL DC. There will be less of an issue with assignment expiration before completion on Google Colab, because LL DC assignments are much quicker than first-time LL assignments; about 1/4 the duration as a result of the nearly 2:1 ratio of exponent values at their respective wavefronts.
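The throughput claim in point 3 can be checked against the post's own figures; a minimal sketch, where the 0.01 and 0.04 terms are the approximations quoted above (proof/Cert overhead, and occasional TC/QC/etc.):

```python
# Effective-throughput comparison, in units of one primality test per exponent.
prp_path = 1.0 + 0.01        # one PRP test + ~1% effort for proof/Cert  ~= 1.01
ll_path = 1.0 + 1.0 + 0.04   # LL + DC + occasional TC, QC, 5C, 6C       ~= 2.04

ratio = ll_path / prp_path
print(f"PRP path: {prp_path:.2f} test-equivalents per verified exponent")
print(f"LL path:  {ll_path:.2f} test-equivalents per verified exponent")
print(f"PRP yields ~{ratio:.2f}x the effective throughput")
```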

chalsall 2021-02-21 16:49

[QUOTE=tdulcet;572140]See the "Differences from the GPU72 Project notebook" section of the first post, which addresses this. In short, his notebook is for trial factoring and our notebooks are for primality testing.[/QUOTE]

Very nice work gentlemen. :tu:

Thanks for the reference in your docs. I had a hoot designing and implementing my particular "proof-of-concept".

And, yes. I made a conscious decision to only do Trial Factoring on the GPUs (although as you said, they are actually better at other work on some of the GPU offerings). I didn't want to get into the whole legal question of what is the situation if an MP was actually found.

Also... I decided very early on not to require the attachment of a Drive. At the time, it wasn't as easy as it is now, and so didn't "scale" well. And so the tiny TF checkpoint files mapped well (only later were the much larger P-1 contexts added).

[QUOTE=tdulcet;572140]Note that users of the GPU72 notebook or any other notebooks can also use our browser add-on/extension.[/QUOTE]

I would advise against this.

The Colab Terms of Service are very clear that automation is *not* acceptable for Notebook startup / restarting. They want a human in the loop (I suspect part of this is actually a sociology experiment... 9-)

But, again... Nice work! I'm sure there are many out there who will use this. :smile:

danc2 2021-02-21 19:50

Response
 
Thank you for all the encouragement, congratulatory words, and constructive feedback. Below is a simple clarification on our intentions, thoughts, and motivations.

[QUOTE]Trust that users of this forum are smart enough to notice your new thread[/QUOTE]
No offense was intended regarding forum users' intellect. Posting in other threads was an attempt to reach users who are still interested in GIMPS but have become less active on the forum, yet still hold email subscriptions to popular threads (i.e., are still interested in the topics discussed therein). Nevertheless, we will honor the moderators' wishes.

[QUOTE]Please do not encourage people to run LL first tests.[/QUOTE]
While we would love for this project to be optimal for PRP testing, Google Drive's size constraints make that difficult, especially if the user's Drive space is used for more than this project. We are not encouraging users to do more LL testing instead of PRP testing on their other machines; rather, we are trying to increase the net contribution using the spare compute time offered by Google Cloud, without causing the project to fill up/bork the user's Drive. The point that this project could speed up DC checks is a very good one. Note also that we started this project before GEC on PRP was part of GIMPS.


[QUOTE]The Colab Terms of Service are very clear that automation is *not* acceptable[/QUOTE]
I could not find anything besides the Pro terms of service, and did not see anything about automation there (I am probably missing it). The warning is noted, however. Personally, I think such a rule would be a bit silly, as the whole point of the project is to advance research, and machines can be disconnected quite often even when another machine is available to reconnect to and continue progress on. This may be a rule aimed more at cryptocurrency miners than anything else (or maybe it is a sociological experiment). There are also a number of existing extensions that attempt to do the same thing (though they do not work), which Google has not barred from running on their machines thus far, perhaps because of their ineffectiveness.

[QUOTE]Nice work! I'm sure there are many out there who will use this.[/QUOTE]
Thanks! We really appreciate viewing and building off of your work!

Note: The link in #1 of overview is incorrect (my fault). The link should be [URL="https://github.com/tdulcet/Distributed-Computing-Scripts/tree/master/google-colab"]GPU-And-CPU-Powered Colab Jupyter Notebooks[/URL].

Chuck 2021-02-22 00:48

I used Chris' notebook setup successfully for several months (even signing up for the paid Colab Pro service), but Colab kept reducing the availability of service with no stated policy. It finally became so restricted that I just dumped the whole thing and cancelled the service.

danc2 2021-02-22 06:16

[QUOTE]It finally became so restricted that I just dumped the whole thing and cancelled the service.[/QUOTE]
I have not noticed any severe availability issues to date. In fact, I am so happy with the service that I have paid for three Colab Pro accounts ($9.99 each). Keep in mind also that some availability issues are mitigated by the Autorun and Connect extension: notebooks reconnect after (tight) usage limits, and reconnect if your particular backend has been reassigned to a Google Cloud customer.

danc2 2021-02-22 06:24

GPU Output
 
1 Attachment(s)
I realize we did not post any output or pictures, just links.

Since we have this dedicated thread, here is example output from a GPU notebook running the Tesla V100-SMX2-16GB (a $6,195.00 GPU according to Amazon).

[ATTACH]24374[/ATTACH]

Uncwilly 2021-02-22 06:25

[QUOTE=danc2;572210]I have not noticed any severe availability issues as of this present date.[/QUOTE]
There are messages that pop up about usage limits.

bayanne 2021-02-22 10:50

Still waiting for Colab Pro to be made available for use outside the US ...

danc2 2021-02-22 16:48

[QUOTE]There are messages that pop up about usage limits.[/QUOTE]
For sure. The key word in my response was [C]severe[/C]: "[no] [C]severe[/C] availability issues". See the original post:
[QUOTE]can be run for a maximum of 12 hours per day without interruption.[/QUOTE]
This is actually for [B]non-pro users[/B], so Pro users may have even longer usage limits, depending on Google's needs/demand. However, I think that $9.99/month for intermittent access to up to four high-end GPUs (not to mention decent CPUs), with no ancillary costs to me (electricity, maintenance, cooling, etc.), is pretty fair; but that is just my opinion.

[QUOTE]Still waiting fo Colab Pro to be made available for use outside the US[/QUOTE]
Bummer! Have you tried using a VPN to make a Google Account and log into the US endpoint? You may find a workaround that way and/or with a US payment method.

S485122 2021-02-22 17:13

[QUOTE=danc2;572245]...
Bummer! Have you tried using a VPN to make a Google Account and log into the US endpoint? You may find a workaround that way and/or with a US payment method.[/QUOTE]Bummer ? Not a nice thing to say to a fellow forum user... And in the UK the word has a different meaning : [url=https://en.wiktionary.org/wiki/bummer]Bummer in Wiktionary[/url].

Then disparaging someone because he does not want to go against the rules set by a provider is, how to say ... special ?

In the meantime, somebody has suggested another explanation for your usage of the word. I might have been wrong in my understanding of what you wrote.

Jacob

kriesel 2021-02-22 17:21

I'm always happy to see someone chip in and contribute to development.
The announcement was well crafted. That one errant URL could be fixed in post 1 with the assistance of a kind moderator on request.

Google Colaboratory has resorted at times to requiring ostensibly human image analysis before authorizing a Colab session.
Three by three arrays of little and sometimes unclear images, with a requirement to select each image that contains bicycles, or palm trees, or hills, or buses, etc. (One object category per challenge session.) Sometimes selected images are replaced with additional until no qualifying images remain; sometimes it's only the initial set of 9. And there have sometimes been child windows specifying it is for human interactive use, not bots, and requiring click confirmation that yes it's a human at the keyboard. (I wonder if Colab free use is where Google coders test their "verify it's a session with a human" algorithms.)

It detects closing the browser tab (or loss of internet connection or operating computer hosting the browser session), and shuts down the Colab VM.
[URL]https://www.mersenneforum.org/showpost.php?p=527364&postcount=201[/URL]

There was the following caution posted: "Be careful using more than one Google account. Apparently people on the LCZero project were banned from using CoLab because they did that."
[URL]https://www.mersenneforum.org/showpost.php?p=525427&postcount=32[/URL] To my knowledge we have not seen such an issue in GIMPS use of Colab.

Gpuowl reportedly is faster than CUDALucas on the same gpu model and exponent task. [URL]https://www.mersenneforum.org/showpost.php?p=525499&postcount=36[/URL]

Google Drive free capacity is 15 GB, including its trash folder. That is sufficient for PRP&proof runs in parallel on cpu and gpu at the current wavefront if used efficiently.
Note that Google offers multiple free mail, storage, etc accounts per person, so one's personal or other email and other cloud storage can be segregated by account, allowing multiple Colab-only accounts to be set up to use the full free 15GB each. Mprime and Gpuowl clean up after themselves.
Cleaning out the trash [URL]https://mersenneforum.org/showpost.php?p=559397&postcount=1025[/URL]
"If you'd like to purchase more Drive space, visit Google Drive. Note that purchasing more space on Drive will not increase the amount of disk available on Colab VMs. Subscribing to Colab Pro will."
[URL]https://research.google.com/colaboratory/faq.html[/URL]
Standard plan Google One (100GB) is $20/year; Advanced (200GB) $30/year; Premium (2TB) $100/year. [URL]https://one.google.com/about#upgrade[/URL]

Nominal Colab Free session max length is 12 hours cpu-only, 10 hours GPU. (TPU irrelevant to GIMPS)
Record longest observed (by me) Colab free session duration >26 hours (with gpu!) [URL]https://mersenneforum.org/showpost.php?p=535260&postcount=829[/URL]
Briefest: ~9 minutes [URL]https://mersenneforum.org/showpost.php?p=535454&postcount=837[/URL]

Nominal Colab Pro session max length is 24 hours.

kruoli 2021-02-22 17:24

[QUOTE=S485122;572253]Bummer ? Not a nice thing to say to a fellow forum user...[/QUOTE]

From what I can recall, I have only heard that saying used when referring to an unfortunate circumstance. As in: "Really unfortunate that Google still has not expanded this to other countries!" I am sure he is not scolding the forum user here.

Edit: Yes, there is also the other meaning, I do not want to deny that. In this case, I assumed "Bummer" as a shorthand for "That's a bummer".

danc2 2021-02-22 18:18

No offense to you or anyone S485122. Kruoli understood my American definition/intention :smile:.

Thank you Kriesel for your insightful comments. If a moderator can help fix that link it would be greatly appreciated!
[QUOTE]Google Colaboratory has resorted at times to requiring ostensibly human image analysis before authorizing a Colab session.[/QUOTE]
Interesting. I've used Colab (free and Pro) for over 3 months and have not seen this, but will keep a look out for it.

[QUOTE]It detects closing the browser tab.[/QUOTE]
Yes, true. Historically, it will also shut down if there is no output to the screen. In the README and in the notebooks, we instruct users to keep their tabs open.

[QUOTE]Be careful using more than one Google account.[/QUOTE]
Fair warning. Thank you.

[QUOTE]Gpuowl reportedly is faster than CUDALucas on the same gpu model and exponent task.[/QUOTE]
Indeed. See the [C]Contributing[/C] section and subsection [C]General[/C] of the [URL="https://github.com/tdulcet/Distributed-Computing-Scripts"]repository.[/URL] Pull requests are welcome.

[QUOTE]Google Drive free capacity is 15 GB..[sufficient] for PRP&proof runs in parallel on cpu and gpu....purchasing more space on Drive will not increase the amount of disk available on Colab VMs[/QUOTE]
Yes, you are right that a user can set up a Google account dedicated to GIMPS, but it depends on how many notebooks the account is running. A user may run two GPU backends/notebooks, and I do not know of a limit on the number of CPU notebooks beyond the CPU usage limits. A user may exceed the 15 GiB Drive limit if they open, say, five notebooks all requesting PRP assignments (at an average of ~3.5 GiB per first-time test). That said, users are more than welcome to purchase more space :thumbs-up:. We were trying to warn the user, but maybe we can rephrase the note so it sounds less like we favor LL tests. I'll talk to Teal about doing so.

I think that Google's comment on Colab VM space is deceptive. To clarify, we do not use the Colab VM space ([C]/sample_data[/C] or perhaps [C]/[/C], I believe), but rather the Google Drive space ([C]/drive[/C]), because it is persistent, unlike the Colab VM space.

[QUOTE]Record longest observed (by me) Colab free session duration >26 hours (with gpu!)[/QUOTE]
Wow! That is amazing! :explode:
Thank you again for lending your expertise on this subject.

kriesel 2021-02-22 18:55

Since the Google Colab free gpu time allocation per account (~10 hours/day) is consumed twice as fast running two notebooks on one account, may as well run one at a time. This helps with crowding of Drive space, retiring one big set of proof generation temporary files before reserving (mprime) or creating (gpuowl) the next. For Colab Pro, multiple gpus available to one account, I'd probably set it up to branch by model so slow-DP models do TF, medium-DP do P-1, and only the fast-DP gpu runs PRP/proof, to conserve space. I have notebooks and drives set up with model-specific paths in Colab free doing something similar to handle latency differences.

With multiple Colab instances, on multiple hosts' browsers, I find it helpful to have a 1:1 mapping between Google account, mprime machine id, gpu app computerid, Google drive, and notebook instance. And easy to mess that up during initial setup. A plan and documentation helps, initially, or during cleanup.

My previous post was based on experience with Google Colab since Oct 2019. Things change.

danc2 2021-02-22 19:19

[QUOTE]Google Colab free gpu time allocation per account (~10 hours/day) is consumed twice as fast...may as well run one at a time.[/QUOTE]
The usage limit is per notebook, not per account. See the [URL="https://research.google.com/colaboratory/faq.html#idle-timeouts"]FAQ[/URL]. However, of course, one can decide to use less of their potential usage to do PRP tests and help the project out.

[QUOTE]With multiple Colab instances, on multiple hosts' browsers, I find it helpful to have a 1:1 mapping[/QUOTE]
Yes, definitely there are probably a lot of different ways to do it. We actually recommend using [URL="https://addons.mozilla.org/en-US/firefox/addon/multi-account-containers/"]Firefox containers[/URL]. There is also a Chrome equivalent called [URL="https://chrome.google.com/webstore/detail/sessionbox-multi-login-to/megbklhjamjbcafknkgmokldgolkdfig?hl=en"]SessionBox[/URL], though we haven't tested this yet.

[QUOTE]My previous post was based on experience with Google Colab since Oct 2019. Things change.[/QUOTE]
Understood. I was only hoping to clarify the status of 2020/21. Your insight is appreciated.

LaurV 2021-02-23 07:01

Testing that right now. It seems to work well, albeit very slowly. I am getting a lot of errors due to deprecated string conversions in cudaLucas, but besides that, it works. "Slow" is because I got a shitty CPU/GPU, not because of the errors. I got a T4 (as you said: Bummer! That card is a waste on LL, and when I run TF from Chris, I almost never get one!) which would need about [U][B]70 hours (2 days and 22 hours)[/B][/U] to LL-DC in the 60M range (that's for comparison and recording). Hopefully it will be able to store the checkpoint residues properly into the drive (6G free space there) and resume properly when the card vaporizes (it never lasts more than a few hours, in this part of the world).

Question: I see you still have the version of mprime which offers ECM work, etc., but that is not accessible in the selection menu. Can you make it so we are able to select it, or input the numbers (work type) directly there? For example, if I like to play with Fermat numbers (yeah, I know, bad example, that's discouraged from the server side... but you get the idea).

Also, you could offer a selection between cudaLucas and mfaktc in case we get a T4 or a P100, etc., mfaktc would run "from the box" there.

I named the computer "tsweet" in your honor :razz: ([URL="https://translate.google.com/?sl=ro&tl=en&text=dulce&op=translate"]because[/URL]).

LaurV 2021-02-23 08:39

Ok, it kicked me out. I set the CPU work to PRP-CF-DC work, which would only take a few hours, so I can do them in 2 or 3 puny sessions (puny because they don't seem to last more than 1-2-3 hours around here).

However, there is another problem: now there is no GPU available for me, and [B][U]the script won't run "cpu only"[/U][/B], and as I said, it is quite a "party time" here when I get a GPU. Even rarer when it is a T4 or P100. Usually it is the P4, which sucks at both LL and TF. I haven't gotten a K80 for a while (still good for LL), and I think they are discontinuing K80s because of their huge hunger for power. The V100 was never seen here on this side of the pond; we don't believe it exists, hehe, there are only conspiracies and lies! :razz:

So, to be functional, you still have to tickle it! Waiting for it!

Let me choose LL or TF with the GPU, if I get a card which I know is better at one or the other, and let me choose "nothing" if I get no GPU. Also, let me select the CPU work I like from the whole mprime list, so I can do "shorter" tasks, like P-1, ECM, etc., which I know will finish in the few hours I can keep the steering wheel in my hands. And leave the PRPs that would take me a month to Ben Delo and Curtis C :razz:; they can do them faster. If I can't finish a 30-day Colab task, because of improper storage, bad resuming, too much headache and manual work, stupidity, laziness, whatever, then the time and resources would be lost, and I won't help the project; moreover, I would keep Colab resources busy when they could be put to better use by other people.

Then we talk.

tdulcet 2021-02-23 16:58

[QUOTE=LaurV;572297]Testing that right now. It seems to work well, albeit very slow. I am getting a lot of errors due to deprecated string conversions in cudaLucas, but beside of it, it works. "Slow" is because I got a shitty CPU/GPU, not because of the errors.[/QUOTE]

Thanks for testing it and for the feedback! Those errors are in the CUDALucas source code, so there is not much we can do about them and they do not cause any known issues. Our GPU notebook just downloads and builds the latest version of CUDALucas, dynamically making a few changes to fix buffer overflow errors with the P100 and V100 GPUs.

[QUOTE=LaurV;572297]Hopefully it will be able to store the checkpoint residues properly into the drive (6G free space there) and resume properly when the card will vaporize (never lasts more than few hours, in this part of the world).
[/QUOTE]

LL DC tests should take less than 50 MiB of your Google Drive storage. First time PRP tests will take about 3.5 GiB.
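A rough estimate of why an LL DC needs so little Drive space: a Lucas-Lehmer savefile is dominated by one residue modulo 2^p - 1, i.e. about p bits. A minimal sketch, assuming file overhead beyond the residue is negligible and a pair of checkpoint files is kept:

```python
# Estimate the Drive footprint of an LL DC savefile from the exponent size.
p = 60_000_000               # exponent in the 60M LL DC range mentioned above
residue_mib = p / 8 / 2**20  # p bits -> bytes -> MiB
files = 2                    # assume a pair of alternating checkpoint files
total_mib = files * residue_mib

print(f"~{residue_mib:.1f} MiB per residue, ~{total_mib:.1f} MiB total")
```

This lands comfortably under the 50 MiB quoted above; first-time PRP tests are far larger because of the proof-generation temporary files, not the checkpoint residue itself.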

[QUOTE=LaurV;572297]Question: I see you still have the version of mprime which offers ECM work, etc, but that is not accessible in the selection menu. Can you make it that we be able to select it, or input directly the numbers (work type) there? For example, if I like to play with Fermat numbers (yeah, I know, bad example, that's discouraged from the server side... but you got the idea).[/QUOTE]

Yes, that would be a trivial change we will consider for the next version of our notebooks.

[QUOTE=LaurV;572297]Also, you could offer a selection between cudaLucas and mfaktc in case we get a T4 or a P100, etc., mfaktc would run "from the box" there.[/QUOTE]

Our notebooks are only designed for primality testing (note the title of this thread). If you want to do trial factoring, I would recommend using the GPU72 notebook. Pull requests are welcome if someone wants to combine our notebooks with the GPU72 notebook.

[QUOTE=LaurV;572304]Ok, it kicked me out. I set the CPU work for PRP-CF-DC work, which would only take few hours, so I can do them in 2 or 3 puny sessions (puny because they don't seem to last more than 1-2-3 hours, here around).

However, there is another problem, now there is no GPU available for me, and [B][U]the script won't run "cpu only"[/U][/B], and as I said, there is quite a "party time" here when I get a GPU. [/QUOTE]

We have a separate CPU-only notebook for this purpose. If users use our GPU notebook for only CPU work, they cannot retry to get a GPU backend, which is why we created the separate CPU-only notebook. If you are using the free Colab, we would recommend running both our "GPU and CPU" and "CPU only" notebooks to get the most throughput.

Note that because of how MPrime is set up, users cannot currently change the CPU type of work after first running the notebooks; they can only change the GPU type of work. You can get around this by creating a new copy of the notebook with a different computer number value. Users can currently create up to 10 copies of each notebook, using computer number values of 0-9.

[QUOTE=LaurV;572304]If I can't finish a 30 day colab task, because of improper storage, bad resuming, too much headache and manual work, stupidity, laziness, whatever, then the time and resources would be lost, and I won't help the project, moreover, I would keep colab resources busy when they could be put to better use by other people.[/QUOTE]

Everything should be completely automated. Note that with Colab Pro, first time primality tests on the GPU will only take around 2-3 days.

danc2 2021-02-23 19:55

~1 Month Results
 
1 Attachment(s)
[QUOTE]Note that with Colab Pro, first time primality tests on the GPU will only take around 2-3 days.[/QUOTE]
To add to Teal's point about how quickly results are returned with the Pro version, attached is a picture of the results from approximately one month and 8 days of testing using Colab Pro. I mostly ran one Colab Pro machine, and later added a second Colab Pro account I purchased. I was using the Colab extension (though I had some issues, which slowed me down; so think: "even more results are possible").

The results from the `Oracle*`, `pdxEmail`, and `Windows` CPUs can be ignored, as they are not from Colab; everything else is. Please also note that I was without power for 5 days and thus missed out on those days of results. In total, the number of primality results returned in the January 16 - February 23 timeframe is [B][SIZE="4"]46[/SIZE][/B]. This is a significant number, due to the fact that most of these are not small DCs or CERTs.

chalsall 2021-02-23 21:20

[QUOTE=danc2;572346]This is a significant number due to the fact that most of these are not small DC or CERTS.[/QUOTE]

If I may please share, I'm really enjoying this experiment. :tu:


1. A reasonable amount of compute can be "harvested" from Colab.


2. There seem to be quite a few "dimensions" to the compute allotments.


3. While those running the GPU72 Notebook were shut out, others were reporting 12 hours or so of GPU.

3.1. The Google Gods (which may simply be Humans directing machines) act in mysterious ways.


4. Recently, those running the GPU72 Notebook have been getting a bit of compute each day.

4.1. My thirteen (13#) instances (spread across five machines in three countries) have to be interacted with, but they always get at least CPU compute for at least 20 minutes.


To be honest, I've been as fascinated with watching the experimenters experiment with the Subjects as much as anything else.

(I'm reminded of Douglas Adams, and the Mice and the Dolphins (or was it the whales)).

Uncwilly 2021-02-23 22:26

[QUOTE=chalsall;572351]To be honest, I've been as fascinated with watching the experimenters experiment with the Subjects as much as anything else.

(I'm reminded of Douglas Adams, and the Mice and the Dolphins (or was it the whales)).[/QUOTE]Not Milgram?

chalsall 2021-02-23 22:48

[QUOTE=Uncwilly;572356]Not Milgram?[/QUOTE]

While seminal, in my opinion "lightweight".

That study didn't bring the profit driver function into the equation (although it might have identified psychopaths as interesting subjects).

tdulcet 2021-02-24 15:54

[QUOTE=kriesel;572256]I'm always happy to see someone chip in and contribute to development.[/QUOTE]

No problem, we are happy to help.

[QUOTE=kriesel;572256]Google Colaboratory has resorted at times to requiring ostensibly human image analysis before authorizing a Colab session.
Three by three arrays of little and sometimes unclear images, with a requirement to select each image that contains bicycles, or palm trees, or hills, or buses, etc. (One object category per challenge session.) Sometimes selected images are replaced with additional until no qualifying images remain; sometimes it's only the initial set of 9. And there have sometimes been child windows specifying it is for human interactive use, not bots, and requiring click confirmation that yes it's a human at the keyboard.[/QUOTE]

I have only seen this once, and only with the free Colab. However, even if a notebook disconnects, our extension will just automatically reconnect it. I added a new optional feature to our extension which will automatically rotate through the user's Colab tabs when their system is idle or locked (similar to a screen saver, but the screen does not need to be on). This should help prevent the notebooks from being perceived as inactive, particularly for users who use a dedicated device such as a Raspberry Pi to run their notebooks.

[QUOTE=kriesel;572256]Gpuowl reportedly is faster than CUDALucas on the same gpu model and exponent task.[/QUOTE]

Yeah, I have seen a few posts that claim this, but I do not think anyone has tested yet with all five GPUs currently available on Colab, and I am not sure exactly what procedure they followed to come to that conclusion. Our GPU notebook (which uses my CUDALucas install script) makes several changes to the Makefile before building CUDALucas, which likely affects the resulting performance, including enabling the [C]-O3[/C] optimization and correctly setting the [C]--generate-code[/C] flag for every GPU available on Colab. We also did the cufftbench and threadbench tuning in advance for all five GPUs, up to the 32768K FFT length, which covers exponents up to 580,225,813. You can see the resulting [C]*fft.txt[/C] and [C]*threads.txt[/C] files in our repository [URL="https://github.com/tdulcet/Distributed-Computing-Scripts/tree/master/google-colab/gpu_optimizations"]here[/URL], which list the ms/iter speeds at every FFT length.
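For anyone curious what those Makefile changes amount to, here is an illustrative fragment; the variable name and exact flag set are assumptions for illustration, not copied from the actual install script. The sm_XX values are the compute capabilities of the five Colab GPUs (K80 3.7, P100 6.0, P4 6.1, V100 7.0, T4 7.5):

```make
# Illustrative only: -O3 plus one --generate-code entry per Colab GPU
# architecture, so one binary runs natively on any assigned GPU.
NVCCFLAGS = -O3 \
  --generate-code arch=compute_37,code=sm_37 \
  --generate-code arch=compute_60,code=sm_60 \
  --generate-code arch=compute_61,code=sm_61 \
  --generate-code arch=compute_70,code=sm_70 \
  --generate-code arch=compute_75,code=sm_75
```

Without the matching [C]--generate-code[/C] entry, the binary would fall back to JIT compilation (or fail outright on newer architectures), which is presumably why setting it "correctly... for every GPU available on Colab" matters.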

kriesel 2021-02-24 17:50

1 Attachment(s)
[QUOTE=tdulcet;572425]I have only seen this once and only with the free Colab. [/QUOTE]In support of chalsall's statement that Google offers Colab for interactive use, not bot use, the image interpretation task used to occur at least daily on one of my several Colab free accounts; same account every time. It doesn't happen often now, but it still comes up.

re gpuowl faster than cudalucas:
[QUOTE]Yeah, I have seen a few posts that claim this, but I do not think anyone has tested yet with all five GPUs currently available on Colab and I am not sure exactly what procedure they followed to come to that conclusion[/QUOTE]I don't have the time now to respond thoroughly to that. But I did enough testing to decide that all my local gpus that could run gpuowl would completely transition from already-established CUDALucas. I had thoroughly tested and tuned for numerous gpu models from Quadro2000 to GTX1080Ti in CUDALucas before that. Here's a recent quick compare on GTX1080.

Compare LL on CUDALucas to PRP on gpuowl. Same exponent, same host, same gpu, same hour, same environmental and clocking conditions, a GTX1080 for this quick benchmark.

CUDALucas v2.06 May 5 2017 version compiled by flashjh; Windows 10 run environment
[CODE]Starting M240110503 fft length = 13824K
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Feb 23 16:14:28 | M240110503 10000 0x5b6b7cbec1bdc015 | 13824K 0.08594 13.4883 134.88s | 37:11:35:46 0.00% |
| Feb 23 16:16:43 | M240110503 20000 0xde34ff2ddb2080a4 | 13824K 0.08789 13.5358 135.35s | 37:13:08:45 0.00% |
| Feb 23 16:18:58 | M240110503 30000 0x14e2c4cd92c29164 | 13824K 0.09180 13.5395 135.39s | 37:13:43:10 0.01% |
| Feb 23 16:21:14 | M240110503 40000 0x5256dd82035447c4 | 13824K 0.08594 13.5488 135.48s | 37:14:08:29 0.01% |
| Feb 23 16:23:29 | M240110503 50000 0xe89ddd5520561b21 | 13824K 0.08594 13.5361 135.36s | 37:14:12:38 0.02% |[/CODE]average ms/it 13.5297
ETA: 240110503 iter * 0.0135297 s/iter / 86400 s/day =~ 37.600 days

Gpuowl v6.11-380 excerpt mid-run of PRP/GEC/proof, 13M fft (1k:13:512):[CODE]
2021-02-23 15:37:13 asr3/gtx1080 240110503 OK 131700000 54.85%; 11875 us/it; ETA 14d 21:36; a5f295da6eddc0a1 (check 5.17s)
2021-02-23 15:47:13 asr3/gtx1080 240110503 OK 131750000 54.87%; 11877 us/it; ETA 14d 21:30; f20a694bd0c842de (check 5.71s)
2021-02-23 15:57:12 asr3/gtx1080 240110503 OK 131800000 54.89%; 11883 us/it; ETA 14d 21:31; 7ddaab01bbd26fcd (check 5.20s)
2021-02-23 16:07:11 asr3/gtx1080 240110503 OK 131850000 54.91%; 11866 us/it; ETA 14d 20:50; 38b6acb7773f3896 (check 5.28s)[/CODE]average ms/it 11.875
ETA start to finish: 240110503 iter * 0.011875 s/iter / 86400 s/day =~ 33.001 days

Raw iteration speed ratio gpuowl PRP / CUDALucas LL = 37.6/33.001 =~ 1.1394

The fft length difference (13.5M CUDALucas vs 13M gpuowl) only accounts for ~4% of the observed ~14% difference favoring gpuowl (like getting 8 days per week!).
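The ETA arithmetic above can be sketched as follows (a minimal illustration, not taken from either program): an LL or PRP test of M(p) takes ~p squarings, so the run time scales linearly with the per-iteration time.

```python
# Approximate full-test run time in days from exponent and ms/iter.
def eta_days(exponent, ms_per_iter):
    return exponent * ms_per_iter / 1000 / 86400  # ms -> s -> days

p = 240110503
print(f"CUDALucas LL: ~{eta_days(p, 13.5297):.3f} days")  # ~37.600
print(f"gpuowl PRP:   ~{eta_days(p, 11.875):.3f} days")   # ~33.001
```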


What's omitted above is the slightly more than 2:1 overall project speed advantage of PRP/GEC/proof over LL, LL DC, and (typically 4% of the time) LL TC, which is lost by using CUDALucas. And the loss of error checking; not even the relatively weaker Jacobi symbol check in CUDALucas, unless you've added it in your builds. The higher the exponent, the longer the run, and the less likely a run will complete correctly without GEC.

In P-1, you could perhaps compare my CUDAPm1 fft and threads file timings and estimate P-1 run times. If you try running P-1 tests on Colab I'd be interested in learning how to resolve the zero-residue issue I ran into. [URL]https://www.mersenneforum.org/showpost.php?p=527928&postcount=5[/URL]

Gpuowl P-1 run time scaling for various gpus including 2 Colab models can be found [URL="https://www.mersenneforum.org/showpost.php?p=525955&postcount=17"]here[/URL]. Benchmarking on V100 has been a nonissue since I don't recall ever encountering one. Lately it's almost entirely T4s, more suitable for TF.

danc2 2021-02-24 19:31

[QUOTE][image interpretation task] doesn't happen often now, but it still comes up.[/QUOTE]
I would be curious if Teal has seen this when using the extension or not. The extension can check (clicks on the play button of the first cell) every 5 seconds IIRC (customizable by the user). With this setup, I've never seen the interpretation task.

GPUOwl stuff:
Yes, it would be great if we could use GPUOwl instead of CUDALucas as it sounds like there is more that can be done, as great as CUDALucas is.

kriesel 2021-02-25 12:46

GTX1060 gpuowl vs. CUDALucas ~58M LL DC
 
Executive summary: Gpuowl 5.8 ms/iter with Jacobi check, CUDALucas 6.25-6.5 ms/iter (no Jacobi check)


Gpuowl v6.11-380 on GTX1060 ~5.806 ms/iter in 58.75M LL DC with Jacobi check:[CODE]2021-02-22 21:04:36 condor/gtx1060 58755607 FFT: 3M 1K:6:256 (18.68 bpw)
2021-02-22 21:04:36 condor/gtx1060 Expected maximum carry32: 50550000
2021-02-22 21:04:36 condor/gtx1060 OpenCL args "-DEXP=58755607u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=6u -DPM1=0 -DMM_CHAIN=1u -DMM2_CHAIN=2u -DMAX_ACCURAC
Y=1 -DWEIGHT_STEP_MINUS_1=0x8.01304be8dc228p-5 -DIWEIGHT_STEP_MINUS_1=-0xc.ce52411c70cep-6 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2021-02-22 21:04:39 condor/gtx1060

2021-02-22 21:04:39 condor/gtx1060 OpenCL compilation in 2.52 s
2021-02-22 21:04:39 condor/gtx1060 58755607 LL 0 loaded: 0000000000000004
2021-02-22 21:06:00 condor/gtx1060 102714151 P2 GCD: no factor
2021-02-22 21:06:00 condor/gtx1060 {"status":"NF", "exponent":"102714151", "worktype":"PM1", "B1":"1000000", "B2":"30000000", "fft-length":"5767168", "program":
{"name":"gpuowl", "version":"v6.11-380-g79ea0cc"}, "user":"kriesel", "computer":"condor/gtx1060", "aid":"7DAA6CA7DFF308D0DF638276AF9B5028", "timestamp":"2021-02
-23 03:06:00 UTC"}
2021-02-22 21:14:20 condor/gtx1060 58755607 LL 100000 0.17%; 5807 us/it; ETA 3d 22:37; 39c251c47f602a3d
2021-02-22 21:24:01 condor/gtx1060 58755607 LL 200000 0.34%; 5807 us/it; ETA 3d 22:28; eb46c0fb8d0e94f8
2021-02-22 21:33:41 condor/gtx1060 58755607 LL 300000 0.51%; 5807 us/it; ETA 3d 22:18; ed993c4bb040ddef
2021-02-22 21:43:22 condor/gtx1060 58755607 LL 400000 0.68%; 5807 us/it; ETA 3d 22:07; 54e2c2904288419d
2021-02-22 21:53:03 condor/gtx1060 58755607 LL 500000 0.85%; 5808 us/it; ETA 3d 21:59; 16657e0fba393f7f
2021-02-22 22:02:43 condor/gtx1060 58755607 LL 600000 1.02%; 5808 us/it; ETA 3d 21:49; 7ca0fe4b4db9c724
2021-02-22 22:02:43 condor/gtx1060 58755607 OK 500000 (jacobi == -1)
2021-02-22 22:12:24 condor/gtx1060 58755607 LL 700000 1.19%; 5808 us/it; ETA 3d 21:40; 22aa1cb83c55294c
...
2021-02-25 05:41:26 condor/gtx1060 58755607 LL 35100000 59.74%; 5805 us/it; ETA 1d 14:09; 7810938d88993295
2021-02-25 05:41:26 condor/gtx1060 58755607 OK 35000000 (jacobi == -1)
2021-02-25 05:51:06 condor/gtx1060 58755607 LL 35200000 59.91%; 5804 us/it; ETA 1d 13:59; 5d55d69ab7ca60a9
2021-02-25 06:00:46 condor/gtx1060 58755607 LL 35300000 60.08%; 5804 us/it; ETA 1d 13:49; 5635fb50dc776ab9
2021-02-25 06:10:27 condor/gtx1060 58755607 LL 35400000 60.25%; 5804 us/it; ETA 1d 13:39; 2ef462f9a00916b2
2021-02-25 06:14:25 condor/gtx1060 Stopping, please wait..
2021-02-25 06:14:25 condor/gtx1060 58755607 LL 35441000 60.32%; 5813 us/it; ETA 1d 13:39; bdb95405e8027916
2021-02-25 06:14:25 condor/gtx1060 waiting for the Jacobi check to finish..
2021-02-25 06:15:12 condor/gtx1060 58755607 OK 35441000 (jacobi == -1)
[/CODE]CUDALucas v2.06 May 5 2017, same everything else, nominally 6.248 ms/iter, but actually higher because of oscillation between the 3136K and 3200K fft lengths;
10:51 / 100k iterations = 6.51 ms/iter, 12% longer than gpuowl, and no Jacobi check:[CODE]Using threads: square 512, splice 128.
Starting M58755607 fft length = 3200K
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Feb 25 06:20:15 | M58755607 50000 0x6b790995614a3aa2 | 3200K 0.19189 6.2483 312.41s | 4:05:53:31 0.08% |
Resettng fft.

Using threads: square 512, splice 32.

Continuing M58755607 @ iteration 50001 with fft length 3136K, 0.09% done

Round off error at iteration = 51500, err = 0.35938 > 0.35, fft = 3136K.
Restarting from last checkpoint to see if the error is repeatable.

Using threads: square 512, splice 32.

Continuing M58755607 @ iteration 50001 with fft length 3136K, 0.09% done

Round off error at iteration = 51500, err = 0.35938 > 0.35, fft = 3136K.
The error persists.
Trying a larger fft until the next checkpoint.

Using threads: square 512, splice 128.

Continuing M58755607 @ iteration 50001 with fft length 3200K, 0.09% done

| Feb 25 06:25:45 | M58755607 100000 0x39c251c47f602a3d | 3200K 0.18750 6.2484 312.41s | 4:05:48:26 0.17% |
Resettng fft.

Using threads: square 512, splice 32.

Continuing M58755607 @ iteration 100001 with fft length 3136K, 0.17% done

Round off error at iteration = 100700, err = 0.35156 > 0.35, fft = 3136K.
Restarting from last checkpoint to see if the error is repeatable.

Using threads: square 512, splice 32.

Continuing M58755607 @ iteration 100001 with fft length 3136K, 0.17% done

Round off error at iteration = 100700, err = 0.35156 > 0.35, fft = 3136K.
The error persists.
Trying a larger fft until the next checkpoint.

Using threads: square 512, splice 128.

Continuing M58755607 @ iteration 100001 with fft length 3200K, 0.17% done

| Feb 25 06:31:06 | M58755607 150000 0x71a49982b1d8c05d | 3200K 0.17969 6.2493 312.46s | 4:05:44:06 0.25% |
Resettng fft.

Using threads: square 512, splice 32.

Continuing M58755607 @ iteration 150001 with fft length 3136K, 0.26% done

Round off error at iteration = 158700, err = 0.375 > 0.35, fft = 3136K.
Restarting from last checkpoint to see if the error is repeatable.

Using threads: square 512, splice 32.

Continuing M58755607 @ iteration 150001 with fft length 3136K, 0.26% done

Round off error at iteration = 158700, err = 0.375 > 0.35, fft = 3136K.
The error persists.
Trying a larger fft until the next checkpoint.[/CODE]CUDALucas was a great program. Had a lot of fun with it. It has been surpassed and is not being actively maintained.

tdulcet 2021-02-25 14:44

[QUOTE=kriesel;572432]In support of chalsall's statement that Google offers Colab for interactive use, not bot use, the image interpretation task used to occur at least daily on one of my several Colab free accounts; same account every time. It doesn't happen often now, but it still comes up.[/QUOTE]

Our extension is not designed to act like a bot and I would actually consider that an abuse of it. It is only meant to assist users with the otherwise tedious task of checking whether their notebooks will connect/reconnect, to help them maximize their runtime. It is also not designed to be used noninteractively. By default, it will display a desktop notification whenever a notebook connects, reconnects or disconnects due to usage limits. Clicking these notifications opens the tab/window with the notebook so the user can easily monitor the progress and, after it connects, check which GPU/CPU they got. Even with our extension installed, I still manually check my Colab tabs at least hourly to monitor the progress and check our notebooks for errors, as often as I would without the extension.

Note that there are existing add-ons that claim to be able to automatically solve these reCAPTCHAs (I have never tried any of them), such as [URL="https://addons.mozilla.org/en-US/firefox/addon/buster-captcha-solver/"]Buster: Captcha Solver for Humans[/URL], which could potentially be used if this ever becomes problematic in Colab.

[QUOTE=kriesel;572432]But I did enough testing to decide that all my local gpus that could run gpuowl would completely transition from already-established CUDALucas. I had thoroughly tested and tuned for numerous gpu models from Quadro2000 to GTX1080Ti in CUDALucas before that.[/QUOTE]

OK, I have no doubt that GpuOwl is faster than CUDALucas on some Nvidia GPUs, and your results show that for your GTX 1080 and GTX 1060. However, I was specifically referring to the Tesla V100, P100, K80, T4 and P4 GPUs available on Colab, using my install script to build CUDALucas. I do not think anyone has tested yet with all of those.

For a wavefront first time primality test (with an exponent up to 115,080,019), here are the ms/iter speeds with CUDALucas on Colab using our GPU notebook (all 6272K FFT length):
[LIST][*]Tesla V100: 1.14 ms/iter[*]Tesla P100: 1.74 ms/iter[*]Tesla K80: 6.66 - 7.36 ms/iter[*]Tesla T4: 7.95 - 8.48 ms/iter[*]Tesla P4: 10.24 ms/iter[/LIST]We would be interested if someone had these ms/iter speeds with GpuOwl on Colab.

[QUOTE=kriesel;572432]And the loss of error checking; not even the relatively weaker Jacobi symbol check in CUDALucas, unless you've added it in your builds. The higher the exponent, the longer the run, and the less likely a run will complete correctly without GEC.[/QUOTE]

All the Tesla GPUs on Colab have ECC memory enabled, so Jacobi and Gerbicz error checking is not needed. You can see this from the [C]ECC Support?[/C] line near the top of the CUDALucas output. Adding Jacobi error checking to CUDALucas is listed in [URL="https://github.com/tdulcet/Distributed-Computing-Scripts#contributing"]the Contributing section[/URL] of the main README, but it would have no effect on Colab.

[QUOTE=kriesel;572508]Gpuowl v6.11-380 on GTX1060 ~5.806 ms/iter in 58.75M LL DC with Jacobi check[/QUOTE]

Note that the [URL="https://github.com/preda/gpuowl/releases"]latest version[/URL] of GpuOwl is v7.2, although it no longer supports any LL tests or the Jacobi error check. This would add a lot of complexity to our GPU notebook, if it were to support GpuOwl, as it would have to download and build both v6 and v7 to support both LL DC and PRP tests respectively and then someone would have to write a wrapper to run the correct version based on the next assignment in the worktodo file.

[QUOTE=kriesel;572508]CUDALucas was a great program. Had a lot of fun with it. It has been surpassed and is not being actively maintained.[/QUOTE]

As Daniel said in [URL="https://www.mersenneforum.org/showpost.php?p=572263&postcount=17"]post #17[/URL], pull requests are welcome!

[QUOTE=danc2;572447]I would be curious if Teal has seen this when using the extension or not. The extension can check (clicks on the play button of the first cell) every 5 seconds IIRC (customizable by the user). With this setup, I've never seen the interpretation task.[/QUOTE]

Yeah, I am not sure whether our extension is the reason I have only seen this once. It dismisses all other popups, so it is possible that it just dismisses this popup as well, which would explain why Daniel and I never see it. I would need to see it again to know for sure, so that I can inspect it.

When our extension is set to automatically run the first cell of the notebook (disabled by default), it will check if the cell is running every minute by default. This is configurable, but I would not recommend that users use a value less than one minute to prevent Google from thinking they/we are [URL="https://en.wikipedia.org/wiki/Denial-of-service_attack"]DoSing[/URL] their servers.

Prime95 2021-02-25 16:59

[QUOTE=tdulcet;572514]All the Tesla GPUs on Colab have ECC memory enabled, so Jacobi and Gerbicz error checking is not needed.[/QUOTE]

There are other sources of hardware error than memory. Thus, Gerbicz error checking is still beneficial.

[quote]GpuOwl is v7.2, no longer supports any LL tests. This would add a lot of complexity to our GPU notebook, if it were to support GpuOwl, as it would have to download and build both v6 and v7 to support both LL DC and PRP tests respectively and then someone would have to write a wrapper to run the correct version based on the next assignment in the worktodo file.[/QUOTE]

The PrimeNet server will happily accept a PRP test with proof for LL-DC work. So, you only need to download one gpuowl version.
Another gpuowl advantage is it will run P-1 if necessary, potentially saving a lengthy PRP test altogether.

Also, in prime95 you can cut the amount of disk space required in half. I'll bet gpuowl has a similar option.

PhilF 2021-02-25 19:04

[QUOTE=Prime95;572522]The PrimeNet server will happily accept a PRP test with proof for LL-DC work.[/QUOTE]

I didn't know that! So, would one just manually reserve a LL-DC exponent, PRP test it, and then manually submit the result?

Prime95 2021-02-25 20:25

[QUOTE=PhilF;572527]I didn't know that! So, would one just manually reserve a LL-DC exponent, PRP test it, and then manually submit the result?[/QUOTE]

Yes, get an LL-DC assignment and then PRP it. Upload your PRP result and proof file as you normally would (either by prime95 or gpuowl's python script).
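A hedged sketch of what that looks like in worktodo.txt terms (the assignment ID below is a placeholder, and the trailing fields are illustrative; the PRP= field layout follows the example kriesel quotes elsewhere in this thread). The first line is the LL-DC assignment as issued; the second is what one would run instead:

```
DoubleCheck=0123456789ABCDEF0123456789ABCDEF,58834309,75,1
PRP=0123456789ABCDEF0123456789ABCDEF,1,2,58834309,-1,75,1
```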

kriesel 2021-02-25 20:27

[QUOTE=tdulcet;572514]All the Tesla GPUs on Colab have ECC memory enabled, so Jacobi and Gerbicz error checking is not needed. You can see this from the [C]ECC Support?[/C] line near the top of the CUDALucas output. Adding Jacobi error checking to CUDALucas is listed in [URL="https://github.com/tdulcet/Distributed-Computing-Scripts#contributing"]the Contributing section[/URL] of the main README, but it would have no effect on Colab.[/QUOTE]To support George's statement that while ECC ram helps it does not make a hardware/software system error-immune, here is an existence proof of CUDALucas error despite ECC ram. Following is from my own bad-residue log. [CODE]2020-05-12 bad ll [M]122743793[/M] manual, condor quadro 2000; diverges May 1 2020 after 107M 87.1% by 108M 88%. run was almost 3 months Feb 21 to May 11. Roundoff error was a very comfortable 0.12; no error messages in the logged console output.CUDALucas v2.06 log excerpts:
| May 01 04:57:06 | M122743793 107000000 0x0f01f93746501744 | 6912K 0.10742 55.9884 559.88s | 10:04:52:44 87.17% |
ok to here
bad from here
| May 01 20:30:15 | M122743793 108000000 0x9b21e398524e0ebe | 6912K 0.11475 55.9881 559.88s | 9:13:19:29 87.98% |
see [URL]https://mersenneforum.org/showpost.php?p=545769&postcount=9[/URL] for interim residues from a matched run
[/CODE]Condor is a dual-Xeon HP Z600 workstation with ECC system ram. Quadro 2000 gpus have ECC gpu ram. The gpu is mounted directly in the PCIe socket (no extender involved). I think ECC ram protects from memory errors, but not from certain firmware bugs (Pentium fdiv, anyone?), PCIe bus transmission errors, excessive fft roundoff error, coding errors in either the gpu application or the libraries it may call, etc.

[QUOTE]Note that the [URL="https://github.com/preda/gpuowl/releases"]latest version[/URL] of GpuOwl is v7.2, although it no longer supports any LL tests or the Jacobi error check.[/QUOTE]I'm [URL="https://www.mersenneforum.org/showthread.php?t=25624&page=3"]well aware[/URL].[QUOTE]This would add a lot of complexity to our GPU notebook, if it were to support GpuOwl, as it would have to download and build both v6 and v7 to support both LL DC and PRP tests respectively and then someone would have to write a wrapper to run the correct version based on the next assignment in the worktodo file.[/QUOTE]Or run ~v6.11-364 with separate P-1 and PRP tasks. That would still outperform CUDALucas about 2:1 overall. The marginal utility of a first LL primality test is about zero: it necessitates either a lengthy LL DC, or a PRP/GEC/proof which will run quicker than the CUDALucas LL or LL DC on the same hardware. There's no reason I know of to believe that the combined efforts of Mihai Preda, George Woltman, and many others (including lots of testers) to make gpuowl's ffts efficient and reliable succeeded on all documented gpu models capable of running it, yet produced worse than half of CUDALucas's iterations/second on the Tesla models that Colab happens to offer, for which documentation of your specified fft length is not yet posted. (Such an fft length is not implemented in gpuowl!) But I have queued up a PRP task for a T4. This is with an older, slower gpuowl version, but it should still make the point that gpuowl PRP/proof on a T4 on Colab free beats CUDALucas LL, LL DC, and occasional LL TC on a T4 on Colab free.

As far as I know, no version of gpuowl has a 6272K fft transform. But a relatively recent version has higher reach with the 6M transform; here for v7.2-53 and similar, an excerpt from the help output:
[CODE]FFT 6M [ 37.75M - 116.51M] 1K:12:256 1K:6:512 1K:3:1K 256:12:1K 512:12:512 512:6:1K 4K:3:256
FFT 6.50M [ 40.89M - 125.95M] 1K:13:256 256:13:1K 512:13:512[/CODE]For PRP=(aid),1,2,115545511,-1,78,0,3,1
and for the 7M fft, the older gpuowl version (probably one of Fan Ming's compiles; Google Drive file date Jan 21 2020) is producing iteration times of 7.75-8.46 ms/iter on a T4. An old [B]less optimized version of gpuowl[/B], running a [B]longer fft length[/B], is still a little [B]faster[/B] than the Colab CUDALucas T4 timings posted [URL="https://www.mersenneforum.org/showpost.php?p=572514&postcount=31"]recently[/URL]: [QUOTE]Tesla T4: 7.95 - 8.48 ms/iter[/QUOTE]UTC time stamped Colab gpuowl log excerpt:[CODE]2021-02-25 18:27:24 config.txt: -user kriesel -cpu colab/TeslaT4 -yield -maxAlloc 15000 -use NO_ASM
2021-02-25 18:27:25 config.txt:
2021-02-25 18:27:25 colab/TeslaT4 115545511 FFT 7168K: Width 256x4, Height 64x8, Middle 7; 15.74 bits/word
2021-02-25 18:27:26 colab/TeslaT4 OpenCL args "-DEXP=115545511u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=7u -DWEIGHT_STEP=0x1.322aaa7d291efp+0 -DIWEIGHT_STEP=0x1.ac1b50a86d588p-1 -DWEIGHT_BIGSTEP=0x1.306fe0a31b715p+0 -DIWEIGHT_BIGSTEP=0x1.ae89f995ad3adp-1 -DNO_ASM=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2021-02-25 18:27:28 colab/TeslaT4

2021-02-25 18:27:28 colab/TeslaT4 OpenCL compilation in 2109 ms
2021-02-25 18:27:46 colab/TeslaT4 115545511 OK 1000 0.00%; 7753 us/sq; ETA 10d 08:50; 947a2638dcd5659d (check 4.25s)
2021-02-25 18:34:34 colab/TeslaT4 115545511 50000 0.04%; 8324 us/sq; ETA 11d 03:03; 2abe8c5a456c9248
2021-02-25 18:40:42 colab/TeslaT4 Stopping, please wait..
2021-02-25 18:40:47 colab/TeslaT4 115545511 OK 93500 0.08%; 8455 us/sq; ETA 11d 07:09; 94321be129778fdc (check 4.62s)
2021-02-25 18:40:47 colab/TeslaT4 Exiting because "stop requested"
2021-02-25 18:40:47 colab/TeslaT4 Bye
2021-02-25 18:48:30 config.txt: -user kriesel -cpu colab/TeslaT4 -yield -maxAlloc 15000 -use NO_ASM
2021-02-25 18:48:30 config.txt:
2021-02-25 18:48:30 colab/TeslaT4 115545511 FFT 7168K: Width 256x4, Height 64x8, Middle 7; 15.74 bits/word
2021-02-25 18:48:30 colab/TeslaT4 OpenCL args "-DEXP=115545511u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=7u -DWEIGHT_STEP=0x1.322aaa7d291efp+0 -DIWEIGHT_STEP=0x1.ac1b50a86d588p-1 -DWEIGHT_BIGSTEP=0x1.306fe0a31b715p+0 -DIWEIGHT_BIGSTEP=0x1.ae89f995ad3adp-1 -DNO_ASM=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2021-02-25 18:48:30 colab/TeslaT4

2021-02-25 18:48:30 colab/TeslaT4 OpenCL compilation in 5 ms
2021-02-25 18:48:49 colab/TeslaT4 115545511 OK 94500 0.08%; 7770 us/sq; ETA 10d 09:11; 802418424467173d (check 4.22s)
2021-02-25 18:49:32 colab/TeslaT4 115545511 100000 0.09%; 7829 us/sq; ETA 10d 11:03; eec0fc882a58923c
2021-02-25 18:56:30 colab/TeslaT4 115545511 150000 0.13%; 8346 us/sq; ETA 11d 03:31; 857fa1746622daba
2021-02-25 19:03:32 colab/TeslaT4 115545511 200000 0.17%; 8442 us/sq; ETA 11d 06:29; 07065de43d5d6667
2021-02-25 19:10:39 colab/TeslaT4 115545511 OK 250000 0.22%; 8445 us/sq; ETA 11d 06:29; a491206a633e11cd (check 4.58s)
2021-02-25 19:17:41 colab/TeslaT4 115545511 300000 0.26%; 8450 us/sq; ETA 11d 06:31; 7dd17f25c99a3c46
2021-02-25 19:18:11 colab/TeslaT4 Stopping, please wait..
2021-02-25 19:18:15 colab/TeslaT4 115545511 OK 303500 0.26%; 8452 us/sq; ETA 11d 06:33; 6154addd71f541a2 (check 4.56s)
2021-02-25 19:18:15 colab/TeslaT4 Exiting because "stop requested"
2021-02-25 19:18:15 colab/TeslaT4 Bye[/CODE]Judging by executable file size and date, this was produced by gpuowl v6.11-11 from November 2019. There's been a lot of optimization since, which would be better represented by say v6.11-366 [URL]https://www.mersenneforum.org/showpost.php?p=555882&postcount=1020[/URL]
And v6.11-366 would run 115M at 6M fft length, per [URL]https://www.mersenneforum.org/showpost.php?p=499636&postcount=9[/URL], picking up additional speed by reducing fft length.

[QUOTE]When our extension is set to automatically run the first cell of the notebook (disabled by default), it will check if the cell is running every minute by default. This is configurable, but I would not recommend that users use a value less than one minute to prevent Google from thinking they/we are [URL="https://en.wikipedia.org/wiki/Denial-of-service_attack"]DoSing[/URL] their servers.[/QUOTE]There's also the possibility that the smart folks at Google are tolerating, for now, activity that is not really allowed by their offer of free use, as a learning exercise; if it becomes a concern to them later, it could morph into a software arms race. I've seen a recent drop in cpu-only session duration, from the 12 hour maximum I was getting, to under 7 hours lately.

Bot: "a computer program that performs automatic repetitive tasks" [URL]https://www.merriam-webster.com/dictionary/bot[/URL]
Seems to me to match the behavior you described for your software: reactivating tabs at regular intervals, dismissing prompts when they appear, etc. Not meaning to be dismissive, pejorative, or otherwise negative, but not ignoring the details, either, of how the provider wants the service to be used.

Gpuowl does indeed support lower proof powers. (Confirmed by both source code inspection and a short test run on a small exponent.) I'm not sure how low a power the PrimeNet server and verification process support. Please use a reasonably high proof power for efficiency. Each reduction of proof power by one doubles the verification effort.
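A rough sketch of that tradeoff (my understanding of the PRP proof scheme, not taken from gpuowl source): verifying a power-k proof costs on the order of p / 2^k squarings for exponent p, so each step down in power doubles the certification work.

```python
# Approximate certification cost in squarings for a power-k PRP proof.
def cert_squarings(exponent, power):
    return exponent // 2 ** power

p = 115545511  # exponent from the log excerpts above
for k in (10, 9, 8):
    print(f"power {k}: ~{cert_squarings(p, k):,} squarings to verify")
```

Even at power 8 the verification is a tiny fraction of the ~p squarings a full LL double check would need, which is where the better-than-2:1 project-wide advantage of PRP/proof comes from.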

danc2 2021-02-26 03:15

[B]Bot and Ethics Discussion[/B]
[QUOTE]Bot: "a computer program that performs automatic repetitive tasks"[/QUOTE]
To go by this definition of a bot, one must also realize that the GIMPS project itself is a bot and, by extension, every computer program that runs on Google Colab and performs some repetitive task (i.e., many computer programs) is allegedly violating Google's terms of service...

However, I think it's important to note that "Bot" is not outlined anywhere in the terms of service as being discouraged (only crypto mining). Please, anyone, quote the terms of service here to dispel any misinformation I am spouting if I am missing this.

Consider the following when considering if an extension breaches an ethical boundary:
✅ Colab is for research use (translation: should be used for research purposes, as often as is allowed)
✅ Colab is unique in that its hardware is not always available (translation: Google wants people to use their machines for research, but also wants to make a higher ROI whenever possible. Reconnecting & auto-starting does not go against this goal.).
✅ Colab has not banned or made public mention about any of the extensions that exist thus far to automatically reconnect or run Colab notebooks (to my knowledge) (translation: Google is unaware, lazy, collecting data for some grand purpose, or does not care about automatically using machines so long as they are used for research purposes)
✅ Colab, if they are doing an experiment/collecting data of some kind, should be thankful to someone who made an extension as they are getting free data. (translation: Google is happy whether an extension is violating an unwritten rule or not)
✅ Colab has never replied to Chris's requests to validate that running the GPU72 project is okay or not (translation: they likely do not care)

[B]GpuOwl[/B]
All this info on Gpuowl is really intriguing. It sounds like Gpuowl is really the preferred way to go for some people. I wonder how long it would take to add Gpuowl to this project; maybe not that long. Though the Gpuowl vs. CUDALucas performance is interesting to talk about, since we already have CUDALucas implemented as the cruncher for GPU work, one may consider that we could [U][I]add[/I][/U] in Gpuowl (as opposed to swapping it out) and give users the ability to decide which cruncher to use (I have not talked to Teal about that yet, though we already said we wanted to add Gpuowl). This would also be nice for testing, as we would be using Colab machines with identical GPUs and a presumably identical environment to test on.

If we didn't make it clear before in the README or elsewhere in the forum, we would love to use Gpuowl. All that constrains us is time and resources, as we both have jobs and other projects. We want our project to be used by as many people as possible and, one day, to find a prime number (or more) and maybe even be on the front page of mersenne. If anyone wants to be involved in an upgrade from CUDALucas to Gpuowl, please contact us here or elsewhere.

kriesel 2021-02-26 12:21

[QUOTE=Prime95;572530]Yes, get an LL-DC assignment and then PRP it. Upload your PRP result and proof file as you normally would (either by prime95 or gpuowl's python script).[/QUOTE]
Manual submission for the gpuowl case yielded this response on [URL]https://www.mersenne.org/manual_result/[/URL]:
[CODE][COLOR=darkgreen]processing: PRP (not-prime) for [URL="https://www.mersenne.org/M58834309"]M58834309[/URL][/COLOR]
[COLOR=orange]Result type (150=PRP_COMPOSITE) inappropriate for the assignment type (101=LL_PRIME). Processing result but not deleting assignment.[/COLOR]
[COLOR=blue]CPU credit is 124.6960 GHz-days.[/COLOR]
[/CODE]https://www.mersenne.org/report_exponent/?exp_lo=58834309&full=1 shows the LL DC assignment marked as expired the day after it was issued.


[QUOTE=danc2;572569][B]Bot and Ethics Discussion[/B]
[/QUOTE]

I believe the posted points are expressly contradicted, or made irrelevant, by Colab's occasional output of the attachment shown in [url]https://mersenneforum.org/showpost.php?p=572432&postcount=28[/url].

And "we're not the only ones doing it or providing a tool for it" is not a credible defense for something Google does not allow. Google offers Colab for interactive use of notebooks by humans, not for automated use by programs/robots. Who or what runs the notebook is the distinction I think they are making.

chalsall 2021-02-26 17:18

[QUOTE=danc2;572569]Please, anyone, quote the terms of service here to dispel any misinformation I am spouting if I am missing this. ...

✅ Colab has never replied to Chris's requests to validate that running the GPU72 project is okay or not (translation: they likely do not care)[/QUOTE]

While it is true that Colab has never replied to any of my (multiple) or anyone else's (multiple) attempts to reach them, that's not really unexpected. Google is famous for not engaging in "Human-to-Human" interaction unless you're spending a *lot* of money with them. Usually, we're the product, not the other way around... :wink:

My personal opinion on the whole automation thing is that because of the multiple "prove you're a human" challenges we've all faced using their instances over the last several months we've been playing with this, clearly, the intent is to have the human (slave) in the loop.

If people want to try to get around this, it's at their own risk. Personally, I just manually restart the instances when I happen to flip to that virtual desktop during my day.

kriesel 2021-02-26 17:45

[QUOTE=Prime95;572530]Yes, get an LL-DC assignment and then PRP it. Upload your PRP result and proof file as you normally would (either by prime95 or gpuowl's python script).[/QUOTE]If someone had the time and inclination, adding a choice to the manual assignment page to generate PRP DC worktodo lines for LLDC candidates would help us humans avoid rewriting the lines and adding our errors.

Uncwilly 2021-02-26 18:19

1 Attachment(s)
[QUOTE=kriesel;572603]If someone had the time and inclination, adding a choice to the manual assignment page to generate PRP DC worktodo lines for LLDC candidates would help us humans avoid rewriting the lines and adding our errors[/QUOTE]
I bet if you ask James real nice, he would add that functionality to Mersenne.ca
Drop in your LL-DC lines from your worktodo, get PRP lines out.
I could set you up with an excel or g-sheets sheet for this.
[edit] I just did it in excel, it imports into g-sheets with no problem and works there. See attachment[/edit]
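For anyone who prefers a script to a spreadsheet, a minimal Python version of the same conversion might look like this. It assumes the usual [C]DoubleCheck=AID,exponent,bits,pm1_done[/C] and [C]PRP=AID,k,b,n,c,bits,tests_saved[/C] line formats; field meanings can vary by client version, so treat it as illustrative, not authoritative:

```python
# Hypothetical sketch: convert LL double-check worktodo lines to PRP lines.
# Assumes DoubleCheck=AID,exponent,bits,pm1_done as input and
# PRP=AID,k,b,n,c,bits,tests_saved as output; formats may vary by version.
def lldc_to_prp(line):
    prefix, _, rest = line.partition("=")
    if prefix.strip() != "DoubleCheck":
        return line  # leave non-LL-DC lines untouched
    aid, exponent, bits, pm1_done = rest.split(",")
    # A Mersenne number M(p) = 1*2^p - 1, so k=1, b=2, c=-1;
    # tests_saved=1 since the PRP replaces one LL double check.
    return f"PRP={aid},1,2,{exponent},-1,{bits},1"

print(lldc_to_prp("DoubleCheck=ABCDEF0123456789,58834309,74,1"))
```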

kriesel 2021-02-26 19:33

Ooppss
 
[QUOTE=kriesel;572531]Quadro 2000 gpus have ECC gpu ram.[/QUOTE]
[URL="https://www.techpowerup.com/gpu-specs/quadro-2000.c900"]Quadro 2000[/URL] and 4000 while designed for pro use do not have ECC; [URL="https://www.techpowerup.com/gpu-specs/quadro-5000.c897"]Quadro [B]5[/B]000[/URL] has ECC vram.

Prime95 2021-02-27 00:29

[QUOTE=kriesel;572603]If someone had the time and inclination, adding a choice to the manual assignment page to generate PRP DC worktodo lines for LLDC candidates would help us humans avoid rewriting the lines and adding our errors[/QUOTE]

Try mersenne.org's manual assignment page

kriesel 2021-02-27 15:52

[QUOTE=Prime95;572622]Try mersenne.org's manual assignment page[/QUOTE]Thanks, George, for making that more efficient and reliable by minimizing the middleman's work. The [URL="https://www.mersenne.org/report_exponent/?exp_lo=58847077&full=1"]first try[/URL] worked fine with gpuowl v7.2-63-ge47361b, dragging and dropping the resulting proof onto prime95 v30.4b9's working folder, followed by a quick Cert completion; the only oddity is the claim that an LLDC assignment expired the day after it was issued, when what was really assigned was a PRP. Is it practical to do a similar PRP substitution for PrimeNet API LL DC candidates, for prime95/mprime v30.3 or above, preferably without requiring a client software modification and end-user software version updates on n systems? (Given that some pending assignments and the occasional P-1 stage 2 will restart from the beginning, wait until those are done before upgrading; be warned, rollouts take weeks.)

tdulcet 2021-03-07 16:51

[QUOTE=tdulcet;572334]Yes, that would be a trivial change we will consider for the next version of our notebooks.[/QUOTE]

As requested by @LaurV, both our notebooks now support all the worktypes for the CPU that MPrime currently supports! Feedback is welcome.

[QUOTE=Prime95;572522]The PrimeNet server will happily accept a PRP test with proof for LL-DC work. So, you only need to download one gpuowl version.
Another gpuowl advantage is it will run P-1 if necessary, potentially saving a lengthy PRP test altogether.

Also, in prime95 you can cut the amount of disk space required in half. I'll bet gpuowl has a similar option.[/QUOTE]

Unfortunately, even half the disk space would still not work for many Colab Pro users or people doing 100 million digit tests. Some Colab Pro users are running eight or more notebooks with a total of 12 or more GIMPS program instances. If they were doing first time PRP tests with proofs (such as eight instances of MPrime and four instances of GpuOwl), that would still require over 21 GiB of space. A free Colab user running two notebooks with a total of three GIMPS program instances (two instances of MPrime and one instance of GpuOwl) doing 100 million digit PRP tests with proofs would still require over 16.5 GiB of space. Remember that most users' accounts are limited to just 15 GiB, which is also shared by the Gmail, Drive and Photos services. This is why our GPU notebook, if it were to support GpuOwl, would need to download and build two versions, to support LL DC and PRP tests respectively.
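For a rough sense of where numbers like these come from, one can estimate the proof storage per test. This is only a back-of-the-envelope sketch under my own simplifying assumption that proof generation keeps about 2^power interim residues of roughly exponent/8 bytes each; the actual requirements vary by program, version and proof power:

```python
# Back-of-the-envelope estimate of PRP proof disk usage. Assumption (not the
# exact behavior of any particular program): ~2^power interim residues are
# kept on disk, each ~exponent/8 bytes.
def proof_disk_bytes(exponent, power=7):
    return (2 ** power) * (exponent // 8)

# A 100-million-digit Mersenne candidate has an exponent around 332,200,000:
per_test = proof_disk_bytes(332_200_000, power=7)
print(f"{per_test / 2**30:.1f} GiB per test")  # roughly 5 GiB under these assumptions
```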

[QUOTE=kriesel;572531]for 7M fft, the older gpuowl version (probably one of Fan Ming's compiles, Google drive file date jan 21 2020) is producing iteration times 7.75-8.46 ms/iter on T4. An old [B]less optimized version of gpuowl[/B], running a [B]longer fft length[/B], still a little [B]faster[/B] than the Colab CUDALucas T4 timings posted [URL="https://www.mersenneforum.org/showpost.php?p=572514&postcount=31"]recently[/URL][/QUOTE]

OK, I was able to compile and run GpuOwl 6.11 [URL="https://github.com/preda/gpuowl/tree/5c5dc6669d748460c57ff1962fdbbbc599bac0d0"]commit 5c5dc6669d748460c57ff1962fdbbbc599bac0d0[/URL] (the last version that successfully compiles with GCC 7) on Colab. On the Tesla V100 GPU, GpuOwl is 638 us/iter and CUDALucas is around 1,140 us/iter, so I can confirm that GpuOwl is about 78.6% faster. Here are the results from 100,000 iterations of an LL test:
[CODE]2021-03-04 06:34:54 Tesla V100-SXM2-16GB-0 106928347 LL 0 loaded: 0000000000000004
2021-03-04 06:35:57 Tesla V100-SXM2-16GB-0 106928347 LL 100000 0.09%; 638 us/it; ETA 0d 18:56; 95920d6941eafe3f
2021-03-04 06:35:57 Tesla V100-SXM2-16GB-0 waiting for the Jacobi check to finish..
2021-03-04 06:36:45 Tesla V100-SXM2-16GB-0 106928347 OK 100000 (jacobi == -1)[/CODE]and 100,000 iterations of a PRP test:
[CODE]2021-03-04 06:37:33 Tesla V100-SXM2-16GB-0 106928347 OK 0 loaded: blockSize 400, 0000000000000003
2021-03-04 06:37:34 Tesla V100-SXM2-16GB-0 106928347 OK 800 0.00%; 638 us/it; ETA 0d 18:57; 7d85dc41e3222beb (check 0.41s)
2021-03-04 06:38:37 Tesla V100-SXM2-16GB-0 Stopping, please wait..
2021-03-04 06:38:38 Tesla V100-SXM2-16GB-0 106928347 OK 100000 0.09%; 639 us/it; ETA 0d 18:59; 4d66b4eed5ea9ab3 (check 0.42s)[/CODE]They were both using the 6M FFT length. We of course still need to test with the other Tesla GPUs available on Colab and with the latest version of GpuOwl. If anyone wants to reproduce these results, here are the commands to download, build and run this version of GpuOwl on Colab:
[CODE]
sudo apt-get update
sudo apt-get install libgmp3-dev -y
wget -nv https://github.com/preda/gpuowl/archive/5c5dc6669d748460c57ff1962fdbbbc599bac0d0.tar.gz
tar -xzvf 5c5dc6669d748460c57ff1962fdbbbc599bac0d0.tar.gz
cd gpuowl-5c5dc6669d748460c57ff1962fdbbbc599bac0d0
sed -i 's/<filesystem>/<experimental\/filesystem>/' *.h *.cpp
sed -i 's/std::filesystem/std::experimental::filesystem/' *.h *.cpp
sed -i 's/-Wall -O2/-Wall -g -O3/' Makefile
make -j "$(nproc)"
./gpuowl -h
./gpuowl -ll 106928347 -iters 100000
./gpuowl -prp 106928347 -iters 100000
[/CODE]Daniel and I have been testing various iterations of our two notebooks and new PrimeNet script for over nine months. They were extremely well tested before Daniel officially announced them in this thread. I expect that adding GpuOwl support would take several more months to implement and thoroughly test. Here is everything that we think would need to be done in order for our GPU notebook to support GpuOwl (let us know if we are missing anything):
[LIST][*]Update our PrimeNet Python script to support GpuOwl.[LIST][*]Update it to report LL/PRP results, GpuOwl seems to use the same JSON format as Mlucas.[*]Update it to [URL="https://www.mersenneforum.org/showthread.php?p=541793#post541793"]report progress[/URL], the format seems to differ slightly based on LL vs PRP (see above) and based on version.[*]Add support for reporting P-1 results, including standalone assignments, when done before an LL test and when combined with a PRP test.[*]Add support for [URL="https://github.com/preda/gpuowl/blob/master/tools/upload.py"]uploading PRP proofs[/URL].[*]Replace the existing [C]-g/--gpu[/C] option with new [C]--cudalucas[/C] and [C]-g/--gpuowl[/C] options.[/LIST] [*]Create new GpuOwl install script.[LIST][*]Needs to handle AMD, Nvidia and Intel GPUs (not necessary for Colab, but still important so that it can be independently tested and used).[*]Needs to download, build and run both the latest version of GpuOwl for PRP tests with proofs and version ~6.11 for LL DC and standalone P-1 tests.[*]Needs to support both the GCC and Clang compilers, detect the compiler version and dynamically modify the GpuOwl code based on that (see first two [C]sed[/C] commands above).[*]Needs to detect if the GNU Multiple Precision (GMP) library and OpenCL are already installed and if not, install them.[/LIST] [*]Create new wrapper script to run the correct version of GpuOwl.[LIST][*]Needs to run the correct version based on the next assignment in the worktodo file, probably by maintaining two worktodo and two results files.[*]For LL assignments, needs to first start a P-1 factoring test if that has not yet been done.[*]Needs to automatically handle any idiosyncrasies between the versions for the user.[/LIST] [*]Update our GPU notebook to use GpuOwl (requires all of the above).[/LIST] As Daniel and I said before, pull requests are welcome if anyone wants to help do any of this!
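As a rough illustration of the wrapper-script item above, a version selector might dispatch on the worktodo line prefix. This is only a sketch under assumptions: the build names ([C]gpuowl-6.11[/C], [C]gpuowl-latest[/C]) and the exact prefix handling are hypothetical, not our final design:

```python
# Hypothetical sketch: choose which GpuOwl build should handle the next
# worktodo entry. Build names and worktodo prefixes are illustrative
# assumptions, not the project's actual layout.
def pick_binary(worktodo_line):
    prefix = worktodo_line.split("=", 1)[0].strip()
    if prefix in ("Test", "DoubleCheck"):  # LL first-time / double-check work
        return "gpuowl-6.11"               # ~v6.11 still supports LL tests
    if prefix in ("Pfactor", "Pminus1"):   # standalone P-1 factoring
        return "gpuowl-6.11"
    if prefix.startswith("PRP"):           # PRP tests with proof generation
        return "gpuowl-latest"
    raise ValueError(f"unrecognized worktodo entry: {worktodo_line}")
```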

We would also obviously need the latest version of GpuOwl to always build successfully on Colab. To achieve this, [URL="https://github.com/preda/gpuowl/pull/217"]I submitted a pull request[/URL] to the GpuOwl repository, which adds Continuous Integration (CI) to automatically build GpuOwl on Linux (with both GCC and Clang) and Windows on every commit and pull request. It was merged a few days ago. This allows users to now see directly at the top of the [URL="https://github.com/preda/gpuowl"]GpuOwl README[/URL] whether the latest version of GpuOwl builds by checking the badges. It should also [I]eventually[/I] eliminate the need for @kriesel to manually build and upload binaries for Windows users. See my pull request for more info.

[QUOTE=Prime95;572622]Try mersenne.org's manual assignment page[/QUOTE]
[QUOTE=kriesel;572666]Is it practical to do a similar PRP substitution for PrimeNet API LL DC candidates, for prime95/mprime v30.3 or above[/QUOTE]

This added a new [C]155[/C] worktype to the manual assignment page. I am curious if the worktype would also work with the PrimeNet API, as we could trivially add support for it to our PrimeNet script if/when we add support for uploading PRP proofs.

Prime95 2021-03-07 18:00

[QUOTE=tdulcet;573177]This added a new [C]155[/C] worktype to the manual assignment page. I am curious if the worktype would also work with the PrimeNet API, as we could trivially add support for it to our PrimeNet script if/when we add support for uploading PRP proofs.[/QUOTE]

It ought to work.

kriesel 2021-03-07 19:46

[QUOTE=tdulcet;573177]
OK, I was able to compile and run GpuOwl 6.11 [URL="https://github.com/preda/gpuowl/tree/5c5dc6669d748460c57ff1962fdbbbc599bac0d0"]commit 5c5dc6669d748460c57ff1962fdbbbc599bac0d0[/URL] (the last version that successfully compiles with GCC 7) on Colab. On the Tesla V100 GPU, GpuOwl is 638 us/iter and CUDALucas is around 1,140 us/iter, so I can confirm that GpuOwl is about 78.6% faster. Here are the results from 100,000 iterations of an LL test:
[CODE]2021-03-04 06:34:54 Tesla V100-SXM2-16GB-0 106928347 LL 0 loaded: 0000000000000004
2021-03-04 06:35:57 Tesla V100-SXM2-16GB-0 106928347 LL 100000 0.09%; 638 us/it; ETA 0d 18:56; 95920d6941eafe3f
2021-03-04 06:35:57 Tesla V100-SXM2-16GB-0 waiting for the Jacobi check to finish..
2021-03-04 06:36:45 Tesla V100-SXM2-16GB-0 106928347 OK 100000 (jacobi == -1)[/CODE]and 100,000 iterations of a PRP test:[/QUOTE]That commit is from June 1 2020, somewhere between V6.11-295 and V6.11-318 (I think v6.11-307). There is somewhat more gpuowl performance to be had by reaching ~v6.11-364 to -380, from V6.11-318 upward. Also a number of fixes.
[CODE]2021-03-04 06:37:33 Tesla V100-SXM2-16GB-0 106928347 OK 0 loaded: blockSize 400, 0000000000000003
2021-03-04 06:37:34 Tesla V100-SXM2-16GB-0 106928347 OK 800 0.00%; 638 us/it; ETA 0d 18:57; 7d85dc41e3222beb (check 0.41s)
2021-03-04 06:38:37 Tesla V100-SXM2-16GB-0 Stopping, please wait..
2021-03-04 06:38:38 Tesla V100-SXM2-16GB-0 106928347 OK 100000 0.09%; 639 us/it; ETA 0d 18:59; 4d66b4eed5ea9ab3 (check 0.42s)[/CODE][/QUOTE]That commit is from June 1 2020, somewhere between V6.11-295 and V6.11-318 (I think v6.11-307). There is somewhat more gpuowl performance to be had by reaching ~v6.11-364 to -380, from V6.11-318 upward. Also a number of fixes.

The description of Colab users running up to a dozen notebooks from one account and on large exponents seems like reaching to create a problem where little exists. Users can (a) use multiple free email accounts and spread their usage across several free Google Drive 15GB allotments, (b) run a mix of large, normal and small exponents (~56M DC could sure use the additional throughput), (c) rent additional Google Drive space, and possibly use other free cloud storage, (d) help out with P-1, which is chronically unable to keep up with the wavefront first-test demand, (e) run unusually low proof power PRP when other measures are not sufficient, (f) combine approaches.

It all seems somewhat moot though, since it has already been well established that the automaton notebook / browser addon approach that is the subject of this thread violates Google Colab's regularly expressed conditions of use, while the GPU72 scripts as provided and used by chalsall, and the notebooks provided by others, run manually/interactively as Google intends and has clearly stated, do not. I suggest a priority item missing from the [URL="https://mersenneforum.org/showpost.php?p=573177&postcount=44"]to-do list[/URL] is to bring the addon into (at least user-optional, if not intrinsic) compliance with the conditions of free Colab use. To continue to decline to, after notice from Google Colab and periodic discussion here in a public forum, would leave you in what seems to me a quite vulnerable legal position should the Google legal team become interested. (Not a lawyer, but I've hired some on occasion. The need is best avoided.) [B]I won't be using your addon while that remains unaddressed.[/B]

[QUOTE]We would also obviously need latest version of GpuOwl to always successfully build on Colab. To achieve this, [URL="https://github.com/preda/gpuowl/pull/217"]I submitted a pull request[/URL] to the GpuOwl repository, which adds Continuous Integration (CI) to automatically build GpuOwl on Linux (with both GCC and Clang) and Windows on every commit and pull request. It was merged a few days ago. This allows users to now see directly on the top of the [URL="https://github.com/preda/gpuowl"]GpuOwl README[/URL] if the latest version of GpuOwl builds by checking the badges. It should also [I]eventually[/I] eliminate the need for @kriesel to have to manually build and upload binaries for Windows users. See my pull request for more info.[/QUOTE]I don't see where the successfully built Ubuntu v20 can be downloaded for use. It's a nice feature for Preda to see quickly what builds successfully and what does not, with no dependence on users to try and report back, or overhead to try it himself on multiple OSes.

Note that from my recent limited testing, and from some others, there appears to be a [URL="https://mersenneforum.org/showpost.php?p=573188&postcount=197"]speed regression in gpuowl[/URL]. Getting boxed in to run only the latest commit despite earlier commits being noticeably faster is not optimal. Some users want version control for performance, stability, or perhaps other reasons.

danc2 2021-03-07 21:35

[QUOTE]it's already been well established, that the automaton notebook / browser addon approach that is the subject of this thread, violates Google Colab's regularly expressed conditions of use,[/QUOTE]

I have a couple clarifications on these statements:
1. The "notebook / browser addon approach" is [B]not [/B] [I]the [/I]subject of this thread. Allowing users to run full primality tests on Google Colab for research purposes is (see title). Any other discussions are important, but ancillary.
2. I think people will have to make their own choices regarding using the extension in conjunction with this project and keep in mind those very good points about potential legal consequences.
3. One can support the notebook, but not the extension.
4. The notebook by itself does not have any automation that could be construed as going against Colab's terms of service IMO.

chalsall 2021-03-07 22:59

[QUOTE=danc2;573198]I have a couple clarifications on these statements:
1. The "notebook / browser addon approach" is [B]not [/B] [I]the [/I]subject of this thread. Allowing users to run full primality tests on Google Colab for research purposes is (see title). Any other discussions are important, but ancillary.[/QUOTE]

Just for clarity... tdulcet (your "partner in crime" (that's meant to be funny)) did say in message number 4 of this thread that your browser add-on could be used by anyone using Colab running any Notebook.

Several of us pointed out that that /might/ not be the best idea.

[QUOTE=danc2;573198]2. I think people will have to make their own choices regarding using the extension in conjunction with this project and keep in mind those very good points about potential legal consequences.[/QUOTE]

The "legal consequences" would be at worst a ban of that user from using Google Colab services.

Personally, I can't take that risk (I use this for other purposes as well as the GPU72 Notebook). Others might not have that constraint.

[QUOTE=danc2;573198]3. One can support the notebook, but not the extension.
4. The notebook by itself does not have any automation that could be construed as going against Colab's terms of service IMO.[/QUOTE]

Completely agree.

You have done good work here gentlemen. And have taken constructive criticism appropriately.

It is great seeing different approaches being taken with these "free" resources. It's particularly cool that you have a Git sharing all of your code. :tu:

danc2 2021-03-08 04:00

[QUOTE]partner in crime[/QUOTE]
Maybe a double-entendre, haha. Either way, I'm glad to have that title :smile:.

[QUOTE]tdulcet [said] that your browser add-on could be used by anyone using Colab running any Notebook.[/QUOTE]

I don't think this conflicts with what I said (the emphasis being that the extension is not [B][U]the[/U][/B] (pronounced thee) point of the project). I think Teal's word choice ("can" instead of "should" or "have to") also supports this point of view.

[QUOTE]The "legal consequences" would be at worst a ban of that user from using Google Colab services.[/QUOTE]

I 100% agree; I think you convinced me of this when you made your point about Google only really inquiring further when consumers are high-paying or are costing them a lot (paraphrased). Though I am not equating low consequences with something being morally good, and I am sure no one in this forum is saying that either.

[QUOTE]It's particularly cool that you have a Git sharing all of your code.[/QUOTE]

Thanks! I feel like Teal is a naturally humble person and always takes less credit than he deserves as a result, but he is a huge brain in this project and is responsible for a lot of the good in these projects and usually if there is a bad, it's a mistake of mine, haha. Other forum users can verify his resourcefulness as well. We were glad to contribute to this cool project and have learned and continue to learn a lot by working on it and interacting with users here!

tdulcet 2021-03-08 17:18

[QUOTE=kriesel;573190]That commit is from June 1 2020, somewhere between V6.11-295 and V6.11-318 (I think v6.11-307). There is somewhat more gpuowl performance to be had by reaching ~v6.11-364 to -380, from V6.11-318 upward. Also a number of fixes.[/QUOTE]

Yeah, my potential GpuOwl install script would need to work with the default GCC version. Colab uses Ubuntu 18.04, which includes GCC 7.5. As I said, that was the last version/commit that successfully compiles with GCC 7. The next commit unfortunately adds C++20 syntax, which requires at least GCC 8. Colab is working on upgrading to Ubuntu 20.04 (see [URL="https://github.com/googlecolab/colabtools/issues/1880"]here[/URL]), which includes GCC 9.3, so hopefully they will do that soon, so that we would be able to use a later commit of GpuOwl v6, as well as the latest version.

[QUOTE=kriesel;573190]The description of Colab users running up to a dozen notebooks from one account and on large exponents seems like reaching to create a problem where little exists.[/QUOTE]

These situations actually apply to both Daniel and me, particularly to Daniel, to whom both apply.

[QUOTE=kriesel;573190]It all seems somewhat moot though, since it's already been well established, that the automaton notebook / browser addon approach that is the subject of this thread, violates Google Colab's regularly expressed conditions of use, while GPU72 scripts as provided and used by chalsall, and notebooks provided by others, ran manually/interactively as Google intends and has clearly stated that intent, do not.[/QUOTE]

As Daniel said, the extension is completely separate from our notebooks and they both can be used independently. Our notebooks by themselves must be run manually/interactively, just like the GPU72 notebook and all other Colab notebooks.

[QUOTE=kriesel;573190]I suggest a priority item missing from the [URL="https://mersenneforum.org/showpost.php?p=573177&postcount=44"]to-do list[/URL] is to bring the addon into (at least user-optional, if not intrinsic) compliance with the conditions of free Colab use.[/QUOTE]

Usage of the extension/add-on is completely optional. Even if people decide to use it, it will NOT run the first cell of their notebooks by default. They must explicitly enable this optional feature. (The extension does not yet have an options page, so users would actually need to set [URL="https://github.com/tdulcet/Colab-Autorun-and-Connect/blob/master/background.js#L5"]a global variable[/URL] in the source code to enable it.) If there is some specific missing feature you would like added, please let us know and we will consider it for the next version.

Please note that both Daniel and I completely 100% disagree with your premise that the extension/add-on is not allowed or is somehow violating Colab's terms of service. Daniel [URL="https://www.mersenneforum.org/showpost.php?p=572569&postcount=36"]explicitly asked[/URL] for anyone to quote the terms of service to refute his arguments and no one has. The only argument that has been made centers on those popups that say, "Colab is for [URL="https://research.google.com/colaboratory/faq.html#usage-limits"]interactive use[/URL]". As I explained [URL="https://www.mersenneforum.org/showpost.php?p=572514&postcount=31"]in great detail[/URL], the extension is NOT designed to be used noninteractively. The "interactive use" link in the popups just goes to the Colab FAQ section on "usage limits", which only says, "GPUs and TPUs are sometimes prioritized for users who use Colab interactively rather than for long-running computations". Daniel and I take this to mean that even noninteractive use of Colab is OK and that users will just get less runtime, but again, the extension is NOT designed for this. We would NOT have created the extension or be using it if we thought we were breaking any rules or risking anything.

As Daniel alluded to, there are currently three other open source Colab browser add-ons, all published on Google's own Chrome Web Store, which would have had to be reviewed and accepted by Google:
[LIST][*][URL="https://chrome.google.com/webstore/detail/colab-auto-reconnect/nbcihfbfamjlfiopdcemmohoojdecjid"]Colab Auto Reconnect[/URL][*][URL="https://chrome.google.com/webstore/detail/colab-alive/eookkckfbbgnhdgcbfbicoahejkdoele"]Colab Alive[/URL][*][URL="https://chrome.google.com/webstore/detail/colab-auto-reconnect/ifilpgffgdbhafnaebocnofaehicbkem"]Colab Auto Reconnect[/URL][/LIST]If Google did not want people using these types of Colab browser add-ons, they would not have accepted them to their Chrome Web Store. However, if you or anyone still feels that the extension is not allowed, then just do not use it. They can still use and benefit from our notebooks.

Note that our extension [URL="https://github.com/tdulcet/Colab-Autorun-and-Connect/blob/master/LICENSE"]is licensed[/URL] with the MPL, which protects Daniel and me from any liability from other people misusing/abusing it.

[QUOTE=kriesel;573190]I don't see where the successfully built Ubuntu v20 can be downloaded for use.[/QUOTE]

It cannot be downloaded currently. The keyword was the "eventually" in italics. I purposely did not enable this feature in my pull request, as I was waiting until the Windows builds compile successfully. See the pull request for more info. Note that the Linux builds are not of much use, since it is much more difficult to transfer Linux binaries between different systems. The Ubuntu 20.04 builds would likely only work on 64-bit x86 systems with Ubuntu 20.04 or higher. I would highly recommend that Linux users build GpuOwl themselves, which is what my potential install script would do.

[QUOTE=kriesel;573190]Note that from my recent limited testing, and from some others, there appears to be a [URL="https://mersenneforum.org/showpost.php?p=573188&postcount=197"]speed regression in gpuowl[/URL]. Getting boxed in to run only the latest commit despite earlier commits being noticeably faster is not optimal. Some users want version control for performance, stability, or perhaps other reasons.[/QUOTE]

Interesting. My potential GpuOwl install script and thus the GPU notebook were going to follow the recommended installation instructions for GpuOwl, which even going by [URL="https://www.mersenneforum.org/showpost.php?p=532454&postcount=21"]your own post[/URL], suggests using the latest commit of GpuOwl. The latest commit also likely has important bug fixes. We are open to suggestions for alternative approaches, although letting users select arbitrary versions could easily break the install script itself and/or our PrimeNet script. It would likely be better for users to report any performance regressions, as you did, so that they can be fixed. Hopefully this performance regression in GpuOwl will be fixed soon.

ewmayer 2021-03-09 20:53

[QUOTE=tdulcet;573177]We would also obviously need latest version of GpuOwl to always successfully build on Colab. To achieve this, [URL="https://github.com/preda/gpuowl/pull/217"]I submitted a pull request[/URL] to the GpuOwl repository, which adds Continuous Integration (CI) to automatically build GpuOwl on Linux (with both GCC and Clang) and Windows on every commit and pull request. It was merged a few days ago. This allows users to now see directly on the top of the [URL="https://github.com/preda/gpuowl"]GpuOwl README[/URL] if the latest version of GpuOwl builds by checking the badges. It should also [I]eventually[/I] eliminate the need for @kriesel to have to manually build and upload binaries for Windows users. See my pull request for more info.[/QUOTE]

Possible issue - how to handle savefile incompatibility between client versions? This is not just an issue for gpuowl, but I got dinged by it when I upgraded my build of that client on one of my multi-GPU systems last December - on restart, every one of my partially-done PRP tests restarted from scratch. My own convention with Mlucas is that such savefile incompatibility may only possibly arise between versions having differing major-rev numbers, and even there, I try to make things upward-compatible whenever possible. E.g. v19 added some fields to the savefiles to support the Gerbicz-check, but v18 LL and DC savefiles were compatible for use by v19. v20 will add some bytes for cumulative counts of several types of errors as well as some stuff needed for p-1 run restart, but will be able to read in v19 LL|PRP-test savefiles. But it seems some kind of agreed-upon convention amongst the authors of the various clients here would be good.

In the presence of some kind of tagging system with regard to savefile-compatibility, one then needs a mechanism for handling work when a new version breaks compatibility, which keeps the older build around to finish any ongoing assignments, and then switches to the newer client.
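A minimal sketch of such a convention, assuming (hypothetically) that each savefile header records the major version of the client that wrote it and that readers stay compatible one major revision back:

```python
# Sketch of a savefile-compatibility gate. Assumes a hypothetical header
# field recording the writing client's major version, and that readers stay
# compatible one major revision back (e.g. v19 reading v18 LL/DC savefiles,
# v20 reading v19 LL|PRP savefiles).
def can_read(client_major, savefile_major, back_compat=1):
    return 0 <= client_major - savefile_major <= back_compat

# A wrapper could use this to keep an older build around to finish any
# ongoing assignments before switching to a newer, incompatible client.
```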

tdulcet 2021-03-10 14:17

[QUOTE=ewmayer;573315]Possible issue - how to handle savefile incompatibility between client versions? This is not just an issue for gpuowl[/QUOTE]

Interesting problem, thanks for bringing it up. The potential new GPU notebook that uses GpuOwl would only download and build GpuOwl the first time it was run, as it would save the binary to the user's Google Drive, just as the notebook currently does with CUDALucas. If the user wanted to upgrade GpuOwl, they would likely need to delete a [C]GIMPS/gpuowl[/C] directory from their Google Drive and rerun the notebook. This is similar to how users will probably need to delete the [C]mlucas_v19.1[/C] directory and rerun the Mlucas install script to upgrade to Mlucas v20.

[QUOTE=ewmayer;573315]In the presence of some kind of tagging system with regard to savefile-compatibility, one then needs a mechanism for handling work when a new version breaks compatibility, which keeps the older build around to finish any ongoing assignments, and then switches to the newer client.[/QUOTE]

Adding an upgrade feature as you described to my three existing install scripts and potential GpuOwl install script would be very nice, but it would obviously add a lot of complexity. An agreed upon convention between GIMPS program developers would of course be the best solution. Prime95/MPrime has a "Quit GIMPS" feature, where it will not download any new assignments, but will still finish and report the existing ones. This would be trivial to add to [URL="https://www.mersenneforum.org/showthread.php?t=26574"]our PrimeNet script[/URL] and I think would in the meantime solve most of the problem you described, as users could then easily finish their current assignments before upgrading the GIMPS program to a version with incompatible savefiles. For GpuOwl in particular, it would also be good if someone wrote a short test suite that could be run by the CI and could check for savefile incompatibilities.

drkirkby 2021-03-13 17:31

[QUOTE=kriesel;572256]
Google Colaboratory has resorted at times to requiring ostensibly human image analysis before authorizing a Colab session.[/QUOTE]


I tried to sign up for an account on Amazon AWS today. I simply cannot get past their captcha.

danc2 2021-03-14 07:20

drkirkby,

This is a notebook made for Google Colab. We have not tested it with other notebook instances, so there are no guarantees. Besides that, it sounds like your issue is with Amazon AWS captchas upon signup, not the notebooks. Those bother me as well... I hope you can get that resolved.

bayanne 2021-03-21 09:15

Currently getting about 10 minutes at a time running gpu ...

LaurV 2021-03-21 16:00

Question: Is it normal under Linux that the performance of the CPU degrades (to about a half) when [STRIKE]gpuowl[/STRIKE] cudaLucas is running? This is what I experience with the two "tricks" that are the subject of the current thread. The one "cpu only" gets some ms per iteration, but when I am lucky enough to get a gpu, the cpu in that case takes almost double the ms/iteration. Locally, under windoze, I remember there was a version of the [STRIKE]AMD[/STRIKE] Nvidia drivers some time ago that caused [STRIKE]gpuOwl[/STRIKE] cudaLucas to steal CPU clocks (like one full core or so). Later on, this was fixed either by [STRIKE]Mihai[/STRIKE] FlashJH (?) Dubslow (?), or by upgrading the drivers, no idea (i.e. I don't remember how it was fixed or by whom), but now, for the local machine(s), we don't experience any millisecond difference in P95, whether [STRIKE]gpuOwl[/STRIKE] cudaLucas runs or not in any of the [STRIKE]Rvii's[/STRIKE] 2080Ti, or in all together. So, is this a "colab" thing? A Linux thing? Can you use different [STRIKE]AMD[/STRIKE] Nvidia drivers in your notebooks? Or is it a "me only" thing? How do I get rid of it?

To be clear, right now, for me, as a "non-US" (therefore can't get the Pro, I tried!), it isn't worth much to run the "CPU and GPU" toy. I never get a GPU which is good for PRP, so every time I connect, I use the script which they give as an example in the intro:
[CODE]gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Select the Runtime > "Change runtime type" menu to enable a GPU accelerator, ')
  print('and then re-execute this cell.')
else:
  print(gpu_info)
[/CODE]to see what GPU they gave me (because this runs much faster than waiting for the other toys to install stuff; especially Chris' gpu72 toy is soooooo slow at doing that! Chris, you should first display the GPU, and only afterwards proceed to download stuff!). In case I got a T4, I will run the gpu72 script. In case I got anything else (which is the half of a k80, or the p4), I will ignore it and run Teal's "CPU only" toy. It isn't worth running the "GPU" version on those GPUs: they are extremely slow, never finish, and they'll slow the CPU too, to half of its speed. So, no go.
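
A slightly tidier way to do the same check (same idea as the snippet above; [C]nvidia-smi[/C]'s query flags are standard, but treat this as a sketch):

```python
import subprocess

def gpu_name():
    """Return the assigned GPU's name via nvidia-smi,
    or None if no GPU is available on this VM."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, check=True)
        return out.stdout.strip() or None
    except (OSError, subprocess.CalledProcessError):
        return None
```
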

danc2 2021-03-21 21:34

[QUOTE]Currently getting about 10 minutes at a time running gpu ...[/QUOTE]
Availability depends largely upon region. If you are getting disconnected a lot and/or would like to make the process automatic, I would try using the extension.

[QUOTE]Is it normal under Linux that the performance of the CPU degrades (to about a half) when gpuowl is running?[/QUOTE]
The current project does not use GpuOwl yet (plans and work to do so have been started). Instead, CUDALucas is used. I am not well-versed in why this would happen, but I can say that the CPU you are offered is not always the same per VM, so it could be that you are getting a lower-quality CPU when being assigned a backend that has a GPU. This could be either intentional (i.e., Google giving high-end GPU machines low-end CPUs with the expectation that a user will use the GPU more often than the CPU) or coincidental. Or I could be completely wrong.

What I would do in your situation is run one "GPU and CPU" notebook (you may also run two at a time if you want) and have 1-4 "CPU only" notebooks running, since Google allows you multiple notebooks. That way you can make progress on multiple assignments at a time, despite whatever hardware idiosyncrasies exist (I assume your goal is to make as much progress as possible).

[QUOTE]...as a "non-US" (therefore can't get the Pro, I tried!)[/QUOTE]
It is really weird that Google has not opened this up to users in other countries. I'm not sure why. You could try using a VPN and signing up for a Gmail/Colab Pro account, but maybe it is based on the location of your payment plan... though that could be risky. I'm sorry this does not work for you.

chalsall 2021-03-21 22:23

[QUOTE=danc2;574315]Availability depends largely upon region. If you are getting disconnected a lot and/or would like to make the process automatic, I would try and use the extension.[/QUOTE]

I, on the other hand, would advise against this.

At least, for anyone who takes their relationship with Google seriously.

Some do; others don't. By definition, sentients have free will, and so make, and are responsible for, their own decisions.

danc2 2021-03-22 00:13

For those interested and who have not followed the thread, see the following posts which mention arguments for and against the use of the extension (mods feel free to add others):


[B]For[/B]:
[URL="https://www.mersenneforum.org/showpost.php?p=573252&postcount=50"]Post #1[/URL]

[B]Against[/B]:
[URL="https://www.mersenneforum.org/showpost.php?p=572256&postcount=15"]Post #1[/URL]

chalsall 2021-03-22 01:27

[QUOTE=danc2;574323]For those interested and who have not followed the thread, see the following posts which mention arguments for and against the use of the extension (mods feel free to add others):[/QUOTE]

Personally, I find this post disingenuous.

For those who want to actually understand the various positions argued, please read this thread from the top.

Various "new phrases" came to mind during this reply. But I deleted them before hitting the "Submit Reply".

P.S. They might have included a certain person who, it has been said, walked on water, doing unimaginable things to themself...

LaurV 2021-03-22 02:30

[QUOTE=danc2;574315]The current project does not use gpuowl yet[/QUOTE]
Sorry. :bow: Total brain fart on my side. Please substitute, throughout my post, "gpuOwl" with "cudaLucas" when you read it. I was talking about cudaLucas and Nvidia drivers. The situation I described (GPU work taking a CPU core) happened with cudaLucas and Nvidia drivers in the past. But I got it messed up somehow. I guess that's what happens when posting at midnight, with a 39°C fever (having been fighting a flu for a few days).

The topic stands. When I am using the "Colab CPU Notebook" I am getting 2.5-3.5 ms/iter, depending on the exponent and CPU given (PRP-CF work), but when I am using the "Colab GPU and CPU Notebook", I am getting 4.5-5.5 ms/iter for the CPU work. In this case, a GPU like the P4 or the "half k80" doesn't help: they are very slow, even for LLDC work; they don't last long; and I don't get them often enough to be able to finish any assignment in time, so the GPU work will not compensate for the lost CPU work. And a T4 is a waste for PRP/LL anyhow, as it is so good for TF.

danc2 2021-03-22 03:18

[QUOTE]Personally, I find this post disingenuous.[/QUOTE]
Although that may be the perception, it was not the intention. I personally gain no benefit from advocating a free extension. Users are busy, myself included, and I was hoping to spare them reading 6+ pages of forum posts in which unrelated topics are also discussed. Because specifics were not discussed in your post either, maybe it was more beneficial than just saying "read from the top", as that adds no meat to the discussion beyond what has already been said. Please feel free to PM me any specific posts if you'd like my previous post to appear more ingenuous.

[QUOTE]The topic stands.[/QUOTE]
Understood. I am not qualified to respond to the hardware specifics beyond what supposition I supplied. I hope someone else may be able to answer your question better!

LaurV 2021-03-22 06:46

Nah, wait, wait, don't go! :smile:

It wasn't anything about hardware. You didn't answer my question: does anybody experience an increase in iteration time for the CPUs on Colab when the GPU is running, with [U]your[/U] "gpu and cpu notebook"? Because if so, then you may be using the wrong driver, or you would need a tweak of it or of cudaLucas (or to download a different/older driver when you install the notebook). Or is this a Linux issue of which I am not aware? Running locally (Windows, again, I have no Linux experience) does not show this behavior. With or without cL running, the P95 iterations take the same amount of time (regardless if 2080Ti, 1080Ti, Titan, 580 Fermi; these are the only cards I have now).

Anybody else care to answer/weigh in? Or is it only me running Teal's toys?

Edit: I edited the brain farting post, just to have it right for the future readers. My problem, as expressed in above post:

When I am using "Colab CPU Notebook" I am getting 2.5-3.5 ms/iter, depending of the exponent and CPU given (PRP-CF work), but when I am using "Colab GPU and CPU Notebook", I am getting 4.5-5.5 ms/iter for the CPU work. In this case, a GPU like P4 or the "half k80" doesn't help, they are very slow, even for LLDC work, they don't last long, and I don't get them so often to be able to finish any assignment in time, so the GPU work will not compensate for the lost CPU work. And a T4 is a waste for PRP/LL anyhow, as that is so good for TF.

So, if you want me to use your "GPU and CPU" toy, you have to fix the (whatever?) to make the GPU work come as additional work, and not as "instead of CPU" work. 101 miles per hour is better that 100 miles per hour, but 101 miles per hour and headache is not. :razz:

LaurV 2021-03-22 07:15

Also, for Chris: maybe you didn't read the ending of my post; there was an argument there that you should display the GPU [U][B]before[/B][/U] proceeding to download all the stuff (which takes ages). The reason is that if I see I don't have the "right" GPU, then I can click the "factory default" button before waiting an eon and a bit more for all the stuff to download, and get a new (possibly better) GPU. This would save a lot of time at startup. In fact, based on Teal's idea, you should check if the stuff is already downloaded, and not download it every time. Keep it in the drive (i.e. not in "home" as it is currently; home is gone when reset).

tdulcet 2021-03-22 12:02

[QUOTE=LaurV;574292]Question: Is it normal under Linux that the performance of the CPU degrades (to about a half) when [STRIKE]gpuowl[/STRIKE] cudaLucas is running?[/QUOTE]

No. All the Colab VMs have one hyperthreaded CPU core with two threads. MPrime does not use the hyperthreading by default when doing LL/PRP tests, so it will only use one of those CPU threads. On the GPU notebook, CUDALucas will use the other CPU thread, which will cause a small performance reduction for MPrime, but definitely not half. As with the GPUs, there is a large range in the performance of the CPUs provided, which I bet accounts for the majority of what you experienced. For wavefront first time primality tests with both notebooks, I have gotten everything from around 25 ms/iter for the AVX-512 CPUs to 48 ms/iter for the FMA3 ones, although I usually get 35-40 ms/iter. I suspect that if you ran both our notebooks for a few weeks, you would find that your average ms/iter times were about the same.
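
To see which CPU a given VM actually has, something like this could be run in a notebook cell (a sketch; Linux only, since it reads [C]/proc/cpuinfo[/C]):

```python
import os
import re

def cpu_summary(cpuinfo_path="/proc/cpuinfo"):
    """Return (logical CPU count, CPU model name) to see which
    CPU Colab assigned. The model name is None if unreadable."""
    threads = os.cpu_count()
    model = None
    try:
        with open(cpuinfo_path) as f:
            m = re.search(r"model name\s*:\s*(.+)", f.read())
            model = m.group(1).strip() if m else None
    except OSError:
        pass
    return threads, model
```
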

[QUOTE=LaurV;574292]I never get a GPU which is good for PRP, so every time I connect, I use the script which they give as example in the intro:[/QUOTE]

Our "GPU and CPU" notebook [URL="https://github.com/tdulcet/Distributed-Computing-Scripts/blob/master/google-colab/GoogleColabGPU.ipynb?short_path=0926445#L154-L161"]includes nearly identical code[/URL] to that and it will output the name of the current GPU on the very first line. Both our notebooks will output the name of the current CPU and [URL="https://github.com/tdulcet/Linux-System-Information"]more system information[/URL]. They will also output counts of all previous GPUs and CPUs respectively.

[QUOTE=LaurV;574292]It doesn't worth to run the "GPU" version in those GPUs, they are extremely slow, never finish, and they'll slow the CPU too, to half of the speed. So, no go.[/QUOTE]

In my experience doing primality testing, even the slowest GPU (the Tesla P4) is over twice as fast as the fastest CPU available (one of the AVX-512 ones).

[QUOTE=LaurV;574332]The situation I described (GPU work taking a CPU core) happened with cudaLucas and Nvidia drivers in the past. [/QUOTE]

Yes, CUDALucas will generally use 100% of one of the two CPU threads on Colab. However, from my limited testing of GpuOwl on Colab, it will also use 100% of one of the CPU threads, so I do not think there is a driver issue. Our GPU notebook does not install any drivers, as that is handled by Colab.

[QUOTE=LaurV;574332]In this case, a GPU like P4 or the "half k80" doesn't help, they are very slow, even for LLDC work, they don't last long, and I don't get them so often to be able to finish any assignment in time, so the GPU work will not compensate for the lost CPU work.[/QUOTE]

For a category 4 LL DC test, you should have [URL="https://www.mersenne.org/thresholds/"]360 days to complete it[/URL] since the notebook is using [URL="https://www.mersenneforum.org/showthread.php?t=26574"]our PrimeNet script[/URL], which uses the PrimeNet API. Regardless, even the slowest GPU on Colab should be able to complete an LL DC test in less than two weeks. The faster GPUs (the Tesla P100 and V100) should be able to do it in less than a day. There also should not be any "lost CPU work", as users can run both notebooks at the same time, so it would just be "additional CPU work".

[QUOTE=LaurV;574344]Or is only me running Teal's toys?[/QUOTE]

FYI, both our notebooks are just as much Daniel's as they are mine, although I appreciate the compliment. :smile: It was actually [URL="https://github.com/tdulcet/Distributed-Computing-Scripts/issues/3"]Daniel's idea[/URL] to create them in the first place and he did most of the initial work.

[QUOTE=LaurV;574344]So, if you want me to use your "GPU and CPU" toy, you have to fix the (whatever?) to make the GPU work come as additional work, and not as "instead of CPU" work.[/QUOTE]

I am not sure what you mean by this... With our "GPU and CPU" notebook, the GPU work is completely separate from the CPU work and users can select the worktypes independently. Both our notebooks are also designed to be used at the same time.

BTW, I have finished most of the [URL="https://www.mersenneforum.org/showpost.php?p=573177&postcount=44"]necessary work[/URL] for our GPU notebook to use GpuOwl. We are just waiting on Colab [URL="https://github.com/googlecolab/colabtools/issues/1880"]to upgrade to Ubuntu 20.04[/URL] so we can start testing and finish updating our PrimeNet script...

LaurV 2021-03-22 13:44

This very nicely answers all my questions.

Summary: cudaLucas taking one full CPU core and reducing the CPU performance to half (that is what I experience, regardless of what you, being in US and using Pro account, say), is a colab thing, and nothing can be done about it (drivers are handled by colab).

The part with using both notebooks in the same time does not apply to me, I can't do that unless I use multiple accounts, and that is what I was referring to when I said "headache". Also, "a gpu will complete whatever work in two weeks" [U]if you get it[/U]. If you get two hours today, two after 3 days, then that work will never complete, and (due to separate CPU pools of assignments) bottleneck the CPU work. That is what I was referring as "combine them together" in one of my first posts in this thread, and if I get a GPU, do that, if not, do this. But use a common pool. Then the GPU work comes as additional, the 101st mile, not as a showstopper.

Anyhow, thanks a lot for the notebooks, and for the answers. Good job.

(my last sentence was intentionally formulated so you will "feel threatened" and reply :razz:)

tdulcet 2021-03-22 15:08

[QUOTE=LaurV;574359]cudaLucas taking one full CPU core and reducing the CPU performance to half [/QUOTE]

The Colab VMs have one CPU core with two CPU threads and CUDALucas only uses one of those threads. As an experiment, you can always try commenting out one of the lines in our GPU notebook that starts CUDALucas and seeing if the performance improves. If you have the [C]output_type[/C] set to "CPU (Prime95)", then temporarily comment out [URL="https://github.com/tdulcet/Distributed-Computing-Scripts/blob/master/google-colab/GoogleColabGPU.ipynb?short_path=0926445#L81"]this line[/URL] and rerun the cell.

[QUOTE=LaurV;574359]that is what I experience, regardless of what you, being in US and using Pro account, say[/QUOTE]

We have tested with both the free Colab and Colab Pro. Colab Pro seems to make no difference on the CPUs assigned, as I get the AVX-512 CPUs as often either way.

[QUOTE=LaurV;574359]The part with using both notebooks in the same time does not apply to me, I can't do that unless I use multiple accounts, and that is what I was referring to when I said "headache".[/QUOTE]

Users can run both notebooks with the free Colab and a single account. I am actually running both notebooks right now with the free Colab. Colab Pro just allows users to consistently run more than one copy of both the notebooks, usually up to four copies of each.

[QUOTE=LaurV;574359]Also, "a gpu will complete whatever work in two weeks" [U]if you get it[/U]. If you get two hours today, two after 3 days, then that work will never complete, and (due to separate CPU pools of assignments) bottleneck the CPU work. That is what I was referring as "combine them together" in one of my first posts in this thread, and if I get a GPU, do that, if not, do this. But use a common pool. Then the GPU work comes as additional, the 101st mile, not as a showstopper.[/QUOTE]

Oh, OK, I understand what you are requesting now. We will consider it for the next version of our notebooks. It would be trivial to implement, but would likely be confusing for users, as they would not be able to run both the notebooks at the same time without selecting different [C]computer_number[/C] values for each. BTW, I am not sure if [URL="https://www.mersenneforum.org/showpost.php?p=573177&postcount=44"]you saw[/URL], but I did implement your last requested change to support all the worktypes for the CPU that MPrime currently supports.

[QUOTE=LaurV;574359]Anyhow, thanks a lot for the notebooks, and for the answers. Good job.[/QUOTE]

Thanks for the feedback! No problem, happy to clear up any confusion.

LaurV 2021-03-22 15:16

[QUOTE=tdulcet;574366]BTW, I am not sure if [URL="https://www.mersenneforum.org/showpost.php?p=573177&postcount=44"]you saw[/URL], but I did implement your last requested change to support all the worktypes for the CPU that MPrime currently supports.
[/QUOTE]
Saw, saw... Big thumb up!

Already reported different types of work completed (including a GPU LLDC done in the 60M). The things are not so bad as I describe them, but if I paint them as minor things, you will never care. Now, if I paint them in black, I will make you angry, and you will try to prove me wrong... :razz:
But we like the toys; otherwise we would just ignore them and not use them. We also learned a thing or two from them.

So, :tu:

chalsall 2021-03-22 21:11

[QUOTE=danc2;574336]I personally gain no benefit from advocating a free extension. Users are busy, including myself, and I was hoping to avoid reading 6+ pages of forum data, in which non-related topics are discussed.[/QUOTE]

To put on the table...

I, like most, am very busy. But I have to read hundreds of pages of language (some human, some deterministic) every single day.

One /possible/ motivation for you promoting your Free Extension, which many of us have argued is against the "spirit" of the Colab Terms of Service, is that it /might/ assist in getting your Notebook to find the next MP, by someone who is using both your Notebook and your extension.

I could, of course, be entirely incorrect in that assessment. I'm simply posting based on my own position, and what I observe.

Personally, I tend to err on the side of caution in situations like this.

LaurV 2021-04-05 07:12

1 Attachment(s)
Hey Teal, Dan,

How do I pass a key from my keyboard to your colab cudaLucas copy? Besides taking the pliers, pulling the key out of the keyboard, and throwing it hard around the globe to reach Google HQ. :smile:

Why do I bother? Well...

Attached here is a digest of the FFT sizes, with times per iteration, for all five cards that Colab offers. The cards are so different, and the optimum FFT sizes for them are different too. If you start an LL test with some card, but later get offered another card, you may lose up to 50% of the speed, because the FFT chosen by the first card is not the optimum value for the second one, and there is no (easy) way to change it.

For example (see the excel file inside the zip), say your K80 just finished a test and starts the next one, which happens to be an exponent in, say, the 112M range. The K80 will start doing this with FFT=6144, as that is the best choice for a K80 at this exponent size, at about 7.25 ms/iter (line 61 in the excel file). Then your time expires, and next time you are extremely lucky and get a P100. The P100 will continue the test with FFT=6144, which is a terribly unlucky choice of size for it, getting about 2.1 ms/iter, when a larger FFT could be used: FFT=6272 at 1.7 ms/iter. If you continue the test with the P100, you take a huge penalty.

This happens the other way around too. If you start a 65M test with a P100, it will choose the size 3584, but after a few minutes you are out, and next time you get a K80, which will continue with this size at about 4.2 ms/iter, when a smaller FFT could be used for this card, at only 3.8 ms/iter.

Another example: say you pay your money to Gugu and get only good cards, and you decide to do a current 100M-digit assignment. You get a P100, which will choose FFT=19683 (line 111 in the table), the smallest and fastest it can use for a 332M exponent, for which it spends about 5.8 ms/iter. Next time you get a V100, which will continue testing at this size, getting about 4.6 ms/iter for the next 20 days (line 286 in the "Threads" table in the excel file, second sheet), when you could use a larger FFT=20736 at 3.73 ms/iter and finish your job in 16 days instead of 20. For the reverse case, I can find a much worse example, but you get the idea.
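
The cost of a suboptimal FFT choice is simple arithmetic: an LL test of M(p) needs p - 2 squarings, so (as a sketch, ignoring downtime, checkpointing, and session limits):

```python
def ll_days(exponent, ms_per_iter):
    """Rough wall-clock days for an LL test of M(exponent):
    (exponent - 2) squarings at a constant ms/iteration."""
    return (exponent - 2) * ms_per_iter / 1000 / 86400

# With the V100 figures above, 4.6 vs 3.73 ms/iter on a ~332M exponent:
slow = ll_days(332_000_000, 4.6)    # roughly 17.7 days of pure compute
fast = ll_days(332_000_000, 3.73)   # roughly 14.3 days
```
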

Now, cudaLucas is very clever: when it runs locally with the "-k" command line switch, we can use the keyboard to increase/decrease the FFT size (and other parameters, like how often to print screen output, how often to save checkpoints, etc.), and we can always choose the best FFT [U][B]on the fly![/B][/U] by pressing a few keys (uppercase F, lowercase f, :smile:). In fact, in the past, before the gpuOwl era, I was using it intensively like that, always trying to push the FFT as low as possible to get the fastest times, and backing off when the rounding got into the dangerous area. Most of the tests can be run with a lower/faster FFT, if you know what you are doing; the limits are "for safety", and to cover strange cases, but in real life, strange cases are few.

So.

Can you implement a similar feature? For example, I could write some text file directly in the drive's folders, from which cudaLucas would read periodically (as it can't read my keyboard) and adjust its parameters. Or offer a way to pass the text I type to it (yes, I can click in the window and type some commands in the square box that appears, but I 'ave no idea where those commands go; if that's actually possible, please enlighten me/us).

Prime95 2021-04-05 07:26

Warning: Server will soon refuse to give first time LL tests. I haven't thought through all the details, most likely a double-check will be assigned.

The server will still accept first-time LL results.

tdulcet 2021-04-05 13:57

[QUOTE=LaurV;575215]Here attached there is a digest of the FFT sizes, with times per iteration, for all the five cards that colab offers.[/QUOTE]

Thanks, your spreadsheet does make it easier to compare the ms/iter times. It looks like you created it from the [C]*fft.txt[/C] and [C]*threads.txt[/C] files [URL="https://github.com/tdulcet/Distributed-Computing-Scripts/tree/master/google-colab/gpu_optimizations"]in our repository[/URL].

[QUOTE=LaurV;575215]The cards are so different, and the optimum FFT sizes for them are different too. If you start a LL test with some card, but later you got offered another card, you may lose up to 50% for the speed, because the FFT chosen by the first card is not the optimum value for the second one, and there is no (easy) way to change it.[/QUOTE]

Yeah, this is a bug with CUDALucas. It does not redetermine the fastest FFT length when the GPU changes. This is actually the only bug we are aware of that is affecting our notebooks. I was going to try to find a solution, but that was around the time Daniel officially announced the notebooks in this thread and people said that GpuOwl was potentially faster. I decided my limited time was better spent [URL="https://www.mersenneforum.org/showpost.php?p=573177&postcount=44"]working on updating our GPU notebook to use GpuOwl[/URL].

Your examples provide another good reason to switch to GpuOwl. We did not initially notice this issue with CUDALucas, since when doing wavefront first time primality tests on Colab Pro, both the P100 and V100 GPUs happen to be optimal at the 6272K FFT length.

[QUOTE=LaurV;575215]Can you implement a similar feature, for example, I can write some text file directly in the drive's folders from which cudaLucas will read (as it can't read my keyboard) periodically, and adjust its parameters? Or offer a way to pass the text I type to it (yes, I can click in the window and type some commands in the square box that appears, but I 'ave no idea where those commands go, if that's actually possible, please enlighten me/us).[/QUOTE]

Yes, you can trivially update the GPU notebook to pass the [C]-k[/C] flag to CUDALucas and then type any keys into that box and press enter. We will include this change with the next version of our notebooks, as it is a good workaround for the issue with CUDALucas for advanced users. Thanks for the feedback!

[QUOTE=Prime95;575218]Warning: Server will soon refuse to give first time LL tests. I haven't thought through all the details, most likely a double-check will be assigned.[/QUOTE]

Thanks for the warning. This would be extremely unfortunate, especially for Colab Pro users...

[QUOTE=Prime95;575218]The server will still accept first-time LL results.[/QUOTE]

I am assuming you are referring to already assigned first time LL tests or do you mean our PrimeNet script could rewrite new first time PRP assignments into LL tests and the server would still accept the results? We completely understand that this is not what you want users to do, but [URL="https://www.mersenneforum.org/showpost.php?p=573177&postcount=44"]as I explained[/URL], unfortunately many Colab Pro users and people doing 100 million digit tests do not have much other choice. Our only other option would be to allow users to set the proof power [URL="https://www.mersenneforum.org/showpost.php?p=572522&postcount=32"]as you suggested[/URL]. However, that would obviously be very unfair to whoever has to do the proof certifications since these users would need to use proof powers of 5 or 6, which is why our notebooks currently do not support it.

Prime95 2021-04-05 16:43

[QUOTE=tdulcet;575241]or do you mean our PrimeNet script could rewrite new first time PRP assignments into LL tests and the server would still accept the results? We completely understand that this is not what you want users to do,
...
Our only other option would be to allow users to set the proof power [URL="https://www.mersenneforum.org/showpost.php?p=572522&postcount=32"]as you suggested[/URL]. However, that would obviously be very unfair to whoever has to do the proof certifications since these users would need to use proof powers of 5 or 6, which is why our notebooks currently do not support it.[/QUOTE]

You read between the lines well. The server cannot prevent someone from taking a PRP assignment and turning it into an LL test.

I'd prefer you do double-checks instead -- first time LL requests will get turned into LL double-check assignments.

Proof power 5 or 6 is still an excellent option for the disk-constrained. A certification at 1/32 or 1/64th the cost of a first time test is still a huge savings.
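
The savings figures quoted here follow directly from how PRP proof certifications scale with the proof power (a sketch of the stated relationship):

```python
def cert_cost_fraction(proof_power):
    """Approximate certification cost relative to a full PRP test:
    a proof of power p needs roughly 1 / 2**p of the squarings."""
    return 1 / 2 ** proof_power

# Powers 5 and 6 give the 1/32 and 1/64 figures mentioned above.
```
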

LaurV 2021-04-06 04:01

[QUOTE=tdulcet;575241]Thanks, your spreadsheet does make it easier to compare the ms/iter times. It looks like you created it from the [C]*fft.txt[/C] and [C]*threads.txt[/C] files [URL="https://github.com/tdulcet/Distributed-Computing-Scripts/tree/master/google-colab/gpu_optimizations"]in our repository[/URL].
[/QUOTE]
Sure, it was only parsing the text in the files you provided, there was no re-run of the -cufftbench on my side. What for? I trust your run :razz:
[QUOTE]
Yeah, this is a bug with CUDALucas.
[/QUOTE]Nope. No bug. It works as intended: it should keep the FFT I tell it to use, and not change it on the fly unless mandatory (keyboard command, rounding error, etc.). Where did you get the idea that I am complaining about a bug in cudaLucas? :razz:

[QUOTE]
Your examples provide another good reason to switch to GpuOwl. We did not initially notice this issue with CUDALucas, since when doing wavefront first time primality tests on Colab Pro, both the P100 and V100 GPUs happen to be optimal at the 6272K FFT length.
[/QUOTE]The issue will remain with gpuOwl. Moreover, gpuOwl doesn't provide a way to switch to another FFT size on the fly.

[QUOTE]
Yes, you can trivially update the GPU notebook to pass the [C]-k[/C] flag to CUDALucas and then type any keys into that box and press enter. We will include this change with the next version of our notebooks, as it is a good workaround for the issue with CUDALucas for advanced users. Thanks for the feedback!
[/QUOTE]Thanks! Waiting for it. I don't know how to do that by myself; my skill there is null. :blush:

[QUOTE=Prime95;575260]A certification at 1/32 or 1/64th the cost of a first time test is still a huge savings.[/QUOTE]
If it is credited accordingly, per time spent. Otherwise, if CERTs take too long, people will prefer to do PRP instead. BTW, how are CERTs credited right now? as PRP-DCs?
And for PRP-CF CERTs? PRP-CF-DC CERTs?

Prime95 2021-04-06 04:44

[QUOTE=LaurV;575292]
If it is credited accordingly, per time spent. Otherwise, if CERTs take too long, people will prefer to do PRP instead. BTW, how are CERTs credited right now? as PRP-DCs?
And for PRP-CF CERTs? PRP-CF-DC CERTs?[/QUOTE]

It is credited as PRP-DC based on the time spent.

tdulcet 2021-04-06 15:43

[QUOTE=Prime95;575260]You read between the lines well. The server cannot prevent someone from taking a PRP assignment and turning it into an LL test.
...
Proof power 5 or 6 is still an excellent option for the disk-constrained. A certification at 1/32 or 1/64th the cost of a first time test is still a huge savings.[/QUOTE]

Thanks for the info. Daniel and I will have to decide what approach to take if/when you make the change...

[QUOTE=LaurV;575292]Nope. No bug. It works as intended: it should keep the FFT I tell it to use, and not change it on the fly unless mandatory (keyboard command, rounding error, etc.). Where did you get the idea that I am complaining about a bug in CUDALucas? :razz:[/QUOTE]

All other GIMPS programs that I have used will automatically redetermine the optimal FFT length when you switch devices, including Prime95/MPrime...

[QUOTE=LaurV;575292]The issue will remain with gpuOwl. Moreover, gpuOwl doesn't provide a way to switch to another FFT size on the fly.[/QUOTE]

Interesting, we have not yet been able to do any testing of GpuOwl on Colab... Hopefully it will be less of an issue with GpuOwl, since there are significantly fewer available FFT lengths.

[QUOTE=LaurV;575292]Thanks! Waiting for it. I don't know how to do that by myself, my skill there is null. :blush:[/QUOTE]

No problem. I updated our GPU notebook with your requested change. As suggested by @Prime95, I also added an option to both notebooks so users can select the PRP proof power. Feedback is welcome.

LaurV 2021-04-11 17:03

Wow! it works! :shock: You (two) are my heroes for this weekend!

Albeit a little bit too complicated. First it didn't work, as I had the "CPU and GPU" output (sure! I want to see what BOTH of them are doing!). Then I looked in the code and saw that you use the "-k" switch only when the output is "GPU Only". So, OK: stop the test, switch to "GPU Only" mode, restart the test, press "f/F/t/T/etc." until "OCD satisfied", then let it run for 20 minutes to see that the output and speed are indeed what I want, stop, switch back to "CPU and GPU" output, restart the test. It works a marvel, as [URL="https://www.youtube.com/channel/UC2DjFE7Xf11URZqWBigcVOQ"]Dave[/URL] would say! Now the tests will be on average ~10% to ~15% faster, if I am clever enough to tune the FFT every time the GPU changes. I didn't want to modify the code, as I didn't understand the implications; it may be an omission on your side, or you may have a very good reason why the "-k" is active only for the "GPU Only" output, but I didn't have the time (and skill) to look deeper into it.

It works. Full stop.

Thanks.

tdulcet 2021-04-12 12:33

[QUOTE=LaurV;575713]Wow! it works! :shock: You (two) are my heroes for this weekend![/QUOTE]

Great! We are glad it works for you.

[QUOTE=LaurV;575713]Albeit a little bit too complicated. First it didn't work, as I had the "CPU and GPU" output (sure! I want to see what BOTH of them are doing!). Then I looked in the code and saw that you use the "-k" switch only when the output is "GPU Only"[/QUOTE]

Yes, sorry, I guess I should have mentioned that. I did not realize anyone was using the "GPU and CPU" output type, as it is very verbose. I added it shortly before we officially announced the notebooks, as I saw it was requested a few times on the main Colab thread and it was easy to implement. When using that option, both CUDALucas and MPrime are run in the background, while the [C]tail -f[/C] command is run in the foreground, so there is no easy way to pass input to CUDALucas.
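For the curious, the mechanics are roughly this (a simplified sketch with hypothetical command and file names, not the actual notebook code):

```shell
# Simplified sketch of the "GPU and CPU" output mode: each worker is
# backgrounded with its output redirected to a log file, and only tail -f
# runs in the foreground -- so keyboard input never reaches CUDALucas.
cpu_worker >> mprime.out    2>&1 &   # stands in for ./mprime -d
gpu_worker >> cudalucas.out 2>&1 &   # stands in for ./CUDALucas ...
tail -f mprime.out cudalucas.out     # interleaves both logs on screen
```

Since `tail -f` owns the terminal, only it can read stdin; to type the `f`/`t` keys into CUDALucas, CUDALucas itself has to be the foreground process, which is why the "-k" option only works with "GPU Only" output.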

I updated [URL="https://www.mersenneforum.org/showthread.php?t=26574"]our PrimeNet script[/URL] on Saturday to support still getting first time LL tests using [URL="https://www.mersenneforum.org/showpost.php?p=575260&postcount=73"]the method[/URL] described by @Prime95 above, so that users can continue using CUDALucas while we work on upgrading our GPU notebook to use GpuOwl. (@LaurV - You will no longer have [URL="https://www.mersenneforum.org/showpost.php?p=575673&postcount=11"]to do this manually[/URL]. :wink:) Anyone who wants to continue doing first time LL tests on the GPU will need to set up their GPU notebooks again after they finish any current assignments.

I also included many of the [URL="https://www.mersenneforum.org/showpost.php?p=573177&postcount=44"]changes needed[/URL] for our PrimeNet script to support GpuOwl, including adding support for reporting LL/PRP and P-1 results. Going forward, we decided to recommend that users do PRP tests, which will be the default, although we will still provide the option of doing LL tests on the GPU for users with very limited Drive space, [URL="https://www.mersenneforum.org/showpost.php?p=573177&postcount=44"]as explained above[/URL]. Prime95/MPrime of course has its PrimeNet functionality built in, so unfortunately there is not much we can do about the CPU for users with limited Drive space. Those users will need to do LL DC tests on the CPU, although as George said, there is "a chance that a new Mersenne prime is hidden in all those double-checks".

moebius 2021-05-07 21:36

[QUOTE=danc2;572211]I realize we did not post any output or pictures, just links.

Since we have this dedicated thread, here is example output from a GPU notebook running the Tesla V100-SMX2-16GB (a $6,195.00 GPU according to Amazon).
[/QUOTE]The LL test runs much slower than with [C]gpuowl -LL[/C] for the same exponent on the same Tesla V100 GPU.

mognuts 2021-07-07 11:04

Colab now using AMD CPUs
 
This is the first time I've ever had an AMD!!

[QUOTE]Previous CPU counts
15 Intel(R) Xeon(R) CPU @ 2.30GHz 63
9 Intel(R) Xeon(R) CPU @ 2.00GHz 85
8 Intel(R) Xeon(R) CPU @ 2.20GHz 79
1 [COLOR="Red"]AMD EPYC[/COLOR] 7B12 49[/QUOTE]
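For anyone who wants to check which CPU their session was assigned, something like this works in a Colab cell (a generic Linux one-liner, not part of the notebooks):

```shell
# Show the CPU model string, plus the family/model number that the
# "Previous CPU counts" lists in this thread refer to (49/63/79/85):
grep -m1 'model name' /proc/cpuinfo              # e.g. "AMD EPYC 7B12"
grep -m1 -E '^model[[:space:]]*:' /proc/cpuinfo  # e.g. "model : 49"
```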

danc2 2021-07-07 18:34

@mognuts
Yeah, I was pretty surprised when I first saw that on my machines also!

[QUOTE]
Previous CPU counts
111 Intel(R) Xeon(R) CPU @ 2.30GHz 63
97 Intel(R) Xeon(R) CPU @ 2.20GHz 79
29 Intel(R) Xeon(R) CPU @ 2.00GHz 85
15 AMD EPYC 7B12 49
[/QUOTE]

PhilF 2021-07-07 20:05

[QUOTE=mognuts;582760]This is the first time I've ever had an AMD!![/QUOTE]

I was told if you snag one of those to throw it back because the performance is lower than the others. But that was a while back, that advice might have been referring to a different AMD model.

chalsall 2021-07-07 21:02

[QUOTE=PhilF;582788]I was told if you snag one of those to throw it back because the performance is lower than the others. But that was a while back, that advice might have been referring to a different AMD model.[/QUOTE]

Busy, but quickly...

The AMD CPUs have been given out for quite a while now. And, at least for P-1'ing, they're faster than all the Intel instances (~20% or so).

Flaukrotist 2021-07-07 21:50

[QUOTE=chalsall;582791]And, at least for P-1'ing, they're faster than all the Intel instances (~20% or so).[/QUOTE]

I cannot confirm that. Using Prime95 v30.4 and exponents in range 104M with bounds determined by Prime95, I get the following ranking for the time needed for P-1 stage 1 + 2 in total:

[CODE]Model 63, Intel(R) Xeon(R) CPU @ 2.30GHz: 36.09 h
Model 79, Intel(R) Xeon(R) CPU @ 2.20GHz: 31.58 h
Model 49, AMD EPYC 7B12:                  31.36 h
Model 85, Intel(R) Xeon(R) CPU @ 2.00GHz: 25.27 h[/CODE]So, the Intel Model 85 is clearly the fastest.

chalsall 2021-07-07 22:18

[QUOTE=Flaukrotist;582793]I cannot confirm that. ...snip... So, the Intel Model 85 is clearly fastest.[/QUOTE]

I could very well be wrong. My observations were subjective. Would be worth collecting hard data on this.

slandrum 2021-07-08 22:06

There are 3 versions of the Intel chipset on Colab (that I've received on free accounts). The 2.30 GHz model 63 is the worst, followed by the 2.20 GHz model 79, and the 2.00 GHz model 85 with AVX512 is by far the best. The AMD chipset's times overlap with the times I get with the 2.00 GHz Intel: the worst times for the 2.00 GHz model 85 Intel are slightly worse than the worst times with the AMD, but the best times with the 2.00 GHz model 85 Intel are much better than the best times with the AMD. This is for running tests with mprime (LL, PRP, PM1, CERT).

For around 110M PRP, iteration times on the 2.30 and 2.20 GHz Intels are around 40ms, ranging from the mid 30s to the mid 40s - timings on the two overlap, but the 2.30 GHz model 63 averages the worst. For the 2.00 GHz model 85 Intel I've seen from 21ms to 32ms. For the AMD I see 26 to 31ms. The iteration times can vary through a 6-12 hour session, sometimes by a lot, but most instances seem to stay pretty close to the same ms/iteration throughout the session. The average times on the model 85 are better than the average times on the AMD model 49.

There are far more 2.30 and 2.20 GHz Intels available to me at any given time than either the 2.00 GHz Intel or the AMD.

