mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   First steps in GPU GIMPS (https://www.mersenneforum.org/showthread.php?t=28394)

Jurzal 2023-01-14 12:36

First steps in GPU GIMPS
 
Hi guys,

I was wondering if you could help me a bit by simplifying how to calculate prime numbers on a GPU.
Currently I am running p95 on my CPU and have completed 4 assignments with it. I was wondering if I can use my GPU for it too. I checked the gpuowl and mfaktc threads, but they seem overly complicated and contradict the information on the main website, mersenne.org. P95 for CPU is very straightforward, but GPU not so much.

I have a 5900X CPU, a 3060 Ti GPU and 32 GB of RAM on a Windows 10 system, and I would like to contribute a bit to this project on days when electricity prices are cheap in my country (some days close to 0.00 EUR/kWh).
If any gentle soul could guide me through the first steps, I would be very grateful!

Thanks!

kriesel 2023-01-14 14:14

[QUOTE=Jurzal;622502]Hi guys,

I was wondering if you could help me a bit by simplifying how to calculate prime numbers on a GPU.
...

I have a 5900X CPU, a 3060 Ti GPU and 32 GB of RAM on a Windows 10 system, and I would like to contribute a bit to this project on days when electricity prices are cheap in my country[/QUOTE]
Welcome to the forum and the project.
The 3060 Ti is an NVIDIA GPU that is much more powerful at TF (trial factoring) than at other types of GIMPS computation.
You'll want mfaktc for that GPU. The reference info thread for mfaktc is here: [url]https://www.mersenneforum.org/showthread.php?t=23386[/url]; its first post includes some getting-started info.
Learn how to run mfaktc with manual assignments and result reporting, then perhaps explore the choices of client management software, GPUto72, etc.
Reference info index page is here: [url]https://mersenneforum.org/showthread.php?t=24607[/url]

moebius 2023-01-14 14:23

some useful links:

[URL="https://www.mersenne.ca/mfaktc.php"]Trial factoring estimated performance[/URL]

[URL="https://www.mersenne.org/manual_assignment/"]get manual assignments here[/URL]

[URL="https://mersenneforum.org/showthread.php?p=621936#post621936"]Download mfaktc 0.21 thread[/URL]
You need to install CUDA toolkit 6.0 to 9.0

[URL="https://download.mersenne.ca/mfaktc/mfaktc-0.21"]Download mfaktc for other CUDA versions[/URL]

Jurzal 2023-01-14 14:36

Thanks for the links!

I will grab a coffee and go through them. I was wondering, though, why GPU tasks are so manual and not automated like p95 is for the CPU. What holds the devs back from making it as easy to use as it is on the CPU?

Thanks!

Jurzal 2023-01-14 15:16

1 Attachment(s)
Hi, so here is the first obstacle in the instructions.

Where should I change the IDs to link my account? The instructions are incomplete.
Another thing: running the self-test is not possible, because the folder does not contain a help.txt file or anything related to it. What should I do?

It seems this GPU p95 is only for very advanced users, not for someone who is just at the enthusiast level.
Thanks!

kriesel 2023-01-14 15:34

[QUOTE=Jurzal;622516]Hi, so here is the first obstacle in the instructions.

Where should I change the IDs to link my account? The instructions are incomplete.
Another thing: running the self-test is not possible, because the folder does not contain a help.txt file or anything related to it. What should I do?

It seems this GPU p95 is only for very advanced users, not for someone who is just at the enthusiast level.
Thanks![/QUOTE]

In a command prompt window whose current directory is the folder where you installed mfaktc, running mfaktc with the option -h and the output redirection >> help.txt CREATES the help file.
(program-name) -h >>(destination-file-name).txt

That's separate from running the selftest.

It's expected that if a user is going to run command-line programs, he will know or learn how to run command-line programs. GPU GIMPS applications are text-mode at the command line, not graphical apps like Windows prime95.

User ID, computer name, etc. are set in the mfaktc.ini file. Use a text editor for that.

See also [url]https://www.mersenneforum.org/showpost.php?p=596919&postcount=8[/url] for basic OS related knowledge needed or desirable.

Jurzal 2023-01-14 15:38

Thanks, I will try again!
New to this, so baby steps for me. :)

retina 2023-01-14 15:51

Be careful with the double greater-than ">>" because that means [u]append[/u]. It creates a new file if one is not already there, and appends to the existing file if it is. So if you run the "-h" command more than once you get multiple copies of the help text in the file.

You can change this behaviour by using a single greater-than ">" instead. Then it will always create a new file and overwrite anything that might already be there.
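
The same append-versus-overwrite behaviour can be tried safely outside the command prompt. The following is a minimal Python sketch, not anything that ships with mfaktc. It assumes that file mode "a" behaves like ">>" and mode "w" like ">"; the helper name, the temporary file, and the help text are made up for illustration.

```python
import os
import tempfile


def redirect(text, path, append):
    """Mimic '>>' (append=True) or '>' (append=False) using Python file modes."""
    mode = "a" if append else "w"
    with open(path, mode) as f:
        f.write(text)


# Hypothetical stand-in for whatever "(program-name) -h" prints.
help_text = "usage: (program-name) [options]\n"

# Work in a throwaway directory so no real log or help file is touched.
path = os.path.join(tempfile.mkdtemp(), "help.txt")

redirect(help_text, path, append=True)   # like '>>': the file is created
redirect(help_text, path, append=True)   # like '>>' again: a second copy is appended
with open(path) as f:
    copies_after_append = f.read().count(help_text)

redirect(help_text, path, append=False)  # like '>': the file is overwritten
with open(path) as f:
    copies_after_overwrite = f.read().count(help_text)

print(copies_after_append, copies_after_overwrite)
```

Running the append step twice leaves two copies of the text in the file, and the overwrite step leaves exactly one, whatever was there before.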

kriesel 2023-01-14 17:21

[QUOTE=retina;622522]Be careful with the ... single greater-than ">" instead. Then it will always create a new file and overwrite anything that might already be there.[/QUOTE]A too-easy way to blow away a large log file from past runs.

Jurzal 2023-01-18 08:49

1 Attachment(s)
Self test passed :smile:

Jurzal 2023-01-18 08:50

CUDA version info
binary compiled for CUDA 11.20
CUDA runtime version 11.20
CUDA driver version 12.0

CUDA device info
name NVIDIA GeForce RTX 3060 Ti
compute capability 8.6
max threads per block 1024
max shared memory per MP 102400 byte
number of multiprocessors 38
clock rate (CUDA cores) 1860MHz
memory clock rate: 7001MHz
memory bus width: 256 bit

Automatic parameters
threads per grid 622592
GPUSievePrimes (adjusted) 82486
GPUsieve minimum exponent 1055144

I used cudart64_110.dll and mfaktc-0.21.win_cuda11.2-2047.zip

Jurzal 2023-01-18 09:46

The first 10 assignments are running at default settings atm. The TF ETA calculation says that one task should take approximately 40 minutes. We will see.
The test info says that the GPU is running at around 2900-3100 GHz-days/day.

How do I pause/continue the work if needed?
With the GUI in p95 it is easy; how does it work here? Are there commands for that?

For example, I might want to pause the calculations to play a game, then resume once I am done gaming.

Thanks a lot for the help!

LaurV 2023-01-18 10:09

CTRL+C (only once!)
wait for the window to close (it will write a checkpoint file on your disk)
next time start again, it will resume properly by itself

if you close from the red X button, or use CTRL+C multiple times (two or more), it will abort immediately, without writing checkpoint file, and next time it will resume from the last saved checkpoint, and not from where it was interrupted

checkpoint files are saved regularly, like every 30 minutes or so (customizable in options/ini file/command line, depends on the program you use), so it will not be a big loss if you accidentally close it or the electricity decides to go around your computer, instead of through it
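
The "finish what you are doing, save a checkpoint, resume later" pattern can be sketched in a few lines. This is only an illustration of the idea in Python, not mfaktc's actual code: the function name, the JSON checkpoint format, and the stop callback are all assumptions made for the example. In a real program the stop callback would typically be a flag set by a Ctrl+C handler.

```python
import json
import os


def run_classes(classes, checkpoint_path, work, stop_requested):
    """Process work units in order, resuming from a checkpoint if one exists.

    Returns True when everything is done, False when stopped early.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        # Resume where the previous run left off.
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]

    for index in range(start, len(classes)):
        work(classes[index])  # the unit in progress is always finished first
        if stop_requested():
            # A stop was requested: record where to resume, then quit.
            with open(checkpoint_path, "w") as f:
                json.dump({"next_index": index + 1}, f)
            return False

    # All units done: the checkpoint is no longer needed.
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)
    return True
```

A run that is stopped after two of four units returns False and leaves a checkpoint behind; calling the function again with the same checkpoint path processes only the remaining two units.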

If you do TF on Windows, we highly recommend MISFIT - once properly set up, it will take care of all assignments, work organizing, reports, etc. It is graphical, blah blah, and extremely useful, especially when you have many GPUs, possibly spread over multiple computers (but it is not easy to set up when you start - there are some tutorials around, read about it, ask here, etc).

welcome to the fray!

kriesel 2023-01-18 10:15

[QUOTE=Jurzal;622812]How do I pause/continue the work if needed? [/QUOTE]
I think there are at least 3 methods to choose from. One implies planning ahead. (Are there more methods?)
1) launch the batch job with lower Windows priority than normal interactive use. (Start /BELOWNORMAL cmd /k batchfile)
2) Ctrl-c in the mfaktc command prompt window. Relaunch later will resume from last saved checkpoint file.
3) Select some characters in the mfaktc command prompt window. Leave them selected during the desired pause period. Unselect them afterward.

Over time you may find #3 is easy to initiate accidentally. And resuming, with either method 2 or 3, is too easy to forget. Some GIMPS software has an interactive key command input option. Mfaktc appears not to.
But hey, we're all volunteers here. Nobody will get fired from the project for forgetting to run a GIMPS app nearly continuously.

Re LaurV's post, whether the window closes in response to Ctrl-C depends on whether it was launched cmd /k, cmd /c, or bare mfaktc. /k will stick around for the user to read error messages, and provides a place to relaunch conveniently (already in the working directory etc. Usually my relaunches are just an up-arrow or two, then enter.)
Ctrl-c once tells mfaktc to finish the current factor class before writing a checkpoint file and terminating. Depending on GPU card speed and assignment, that can take very little time, seconds or less. In unusual extreme cases (high bit levels on quite large exponents, say 92 bits on OBD) a single class can take half an hour or more. The following is from a log for 91-92 bits on M3,321,928,373 on an RTX 2080 Super, showing ~101 minutes per class.
[CODE]Jan 17 13:11 | 2416 52.5% | 6061.3 31d23h | 2241.55 82485 n.a.%
checking for "worktodo.add"... not found
Jan 17 14:52 | 2427 52.6% | 6063.5 31d22h | 2240.71 82485 n.a.%[/CODE]
At the other extreme, several classes may complete per second or per minute.
The following is a ~200M exponent at 74-75 bits on an old ~120 GHz-days/day GPU. Timings would be ~30 times faster on a modern GPU, so 2+ classes per second.
[CODE]Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jan 13 15:49 | 4431 95.9% | 13.636 8m52s | 119.77 82485 n.a.%
Jan 13 15:49 | 4440 96.0% | 13.645 8m39s | 119.69 82485 n.a.%
Jan 13 15:50 | 4443 96.1% | 13.639 8m25s | 119.74 82485 n.a.%
Jan 13 15:50 | 4448 96.3% | 13.641 8m11s | 119.73 82485 n.a.%
Jan 13 15:50 | 4451 96.4% | 13.641 7m57s | 119.73 82485 n.a.%
Jan 13 15:50 | 4455 96.5% | 13.638 7m44s | 119.75 82485 n.a.%
Jan 13 15:50 | 4463 96.6% | 13.636 7m30s | 119.77 82485 n.a.%
Jan 13 15:51 | 4464 96.7% | 13.643 7m17s | 119.71 82485 n.a.%
[/CODE]
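
Both figures quoted above can be checked from the log excerpts. The following Python sketch is only a worked example under stated assumptions: the year 2023 is assumed for the timestamps (the log lines do not carry a year), the "time" column of the second log is taken to be seconds per class, and the function name is made up for illustration.

```python
from datetime import datetime


def minutes_between(stamp_a, stamp_b, year=2023):
    """Minutes elapsed between two 'Mon DD HH:MM' log timestamps (year assumed)."""
    fmt = "%Y %b %d %H:%M"
    a = datetime.strptime(f"{year} {stamp_a}", fmt)
    b = datetime.strptime(f"{year} {stamp_b}", fmt)
    return (b - a).total_seconds() / 60


# First log: class 2416 finished at 13:11, the next class (2427) at 14:52.
minutes_per_class_slow = minutes_between("Jan 17 13:11", "Jan 17 14:52")

# Second log: roughly 13.64 seconds per class on the old GPU.
seconds_per_class_old = 13.64
assumed_speedup = 30  # the "~30 times faster" estimate from the post
classes_per_second_modern = assumed_speedup / seconds_per_class_old

print(minutes_per_class_slow, round(classes_per_second_modern, 1))
```

The first value comes out at 101 minutes, and the second at a little over 2 classes per second, in line with the "2+" estimate.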

Jurzal 2023-01-18 10:58

Thanks, both of you!

I will try your suggestions and take a peek at the MISFIT software for management.

Cheers!

Jurzal 2023-01-18 19:00

2 Attachment(s)
[QUOTE=LaurV;622819]CTRL+C (only once!)

If you do TF on Windows, we highly recommend MISFIT - once properly set up, it will take care of all assignments, work organizing, reports, etc. It is graphical, blah blah, and extremely useful, especially when you have many GPUs, possibly spread over multiple computers (but it is not easy to set up when you start - there are some tutorials around, read about it, ask here, etc).

welcome to the fray![/QUOTE]

Since you suggested it, do you mind helping out with the automation of MISFIT?
I downloaded MISFIT version 2.11.0, but it fails to connect to the website to fetch assignments or upload results. Do you have any idea what the issue might be?

Thanks!

Mark Rose 2023-01-18 19:22

[QUOTE=Jurzal;622866]Since you suggested it, do you mind helping out with the automation of MISFIT?
I downloaded MISFIT version 2.11.0, but it fails to connect to the website to fetch assignments or upload results. Do you have any idea what the issue might be?

Thanks![/QUOTE]

If you're looking to do TF, it's probably trying to fetch assignments from GPU72. You'll need to create an account there as well: [url]https://www.gpu72.com/signup/[/url]

GPU72 coordinates just-in-time Trial Factoring effort.

Jurzal 2023-01-18 19:31

[QUOTE=Mark Rose;622873]If you're looking to do TF, it's probably trying to fetch assignments from GPU72. You'll need to create an account there as well: [url]https://www.gpu72.com/signup/[/url]

GPU72 coordinates just-in-time Trial Factoring effort.[/QUOTE]

I am trying to register, but I am not receiving the authorization email. I wrote to support; maybe I will hear something back.

Thanks!

Jurzal 2023-01-25 15:21

Hi, some updates! :)

I have managed to run TF assignments successfully using MISFIT 2.11.0, linked with the GPU72 site, which automatically fetches assignments for me. So far I have done 295 assignments worth 21,481.357 GHz-days.
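
As a rough sanity check, those totals can be compared with the throughput and the per-task ETA reported earlier in the thread. The sketch below is a back-of-the-envelope calculation in Python, not output from MISFIT or mfaktc. It assumes a steady 3000 GHz-days/day (the midpoint of the 2900-3100 range mentioned earlier) and that all assignments were of similar size; the variable names are made up for illustration.

```python
assignments_done = 295
total_credit_ghz_days = 21481.357

# Average credit per assignment.
avg_credit = total_credit_ghz_days / assignments_done

# Assumed steady throughput: midpoint of the reported 2900-3100 GHz-days/day.
assumed_throughput_per_day = 3000.0

# Implied wall-clock time per assignment, in minutes.
minutes_per_assignment = avg_credit / assumed_throughput_per_day * 24 * 60

print(round(avg_credit, 1), round(minutes_per_assignment))
```

That works out to roughly 73 GHz-days and about 35 minutes per assignment, close to the "approximately 40 minutes" ETA seen on the first batch.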

So far so good. I am now trying to solve an issue with auto-uploading the results. Currently MISFIT has some error when uploading the results: whenever the scheduler triggers an auto-upload, it just fails and does nothing, saying that the underlying connection has failed. I tried the spider from GPU72, but that did nothing either, and I don't know how to adapt their Linux code to the Windows 10 platform. Any ideas?

Thanks for reading and for any advice!

Mark Rose 2023-01-25 17:03

If Misfit is having trouble sending results it's likely a problem with your PrimeNet/mersenne.org credentials.

Jurzal 2023-01-25 18:54

[QUOTE=Mark Rose;623463]If Misfit is having trouble sending results it's likely a problem with your PrimeNet/mersenne.org credentials.[/QUOTE]

Right, you may be right. I re-entered all the credentials very carefully and it worked; the upload was sent automatically.
I had entered them 3 times before and it sometimes did not work correctly, so I may have overlooked Caps Lock or something.

Thanks! Will test how it goes.

Jurzal 2023-01-25 19:20

1 Attachment(s)
[QUOTE=Mark Rose;623463]If Misfit is having trouble sending results it's likely a problem with your PrimeNet/mersenne.org credentials.[/QUOTE]

Well, it seems the upload did not go through; same result as before. I have posted a screenshot.
Where is the issue? Something is blocking it, but I can't imagine what.

Mark Rose 2023-01-25 19:49

I'm afraid I can't help more as I've never used Misfit personally (it's only for Windows).

Jurzal 2023-01-28 08:45

1 Attachment(s)
All working now like a charm, automatic fetching and automatic uploading, while hidden in system tray :)
Thanks all for helping out!

Screenshot with current progress, some tasks have been chugged out. Almost 30k GHz days already done. ^^

Andrew Usher 2023-01-28 14:01

That's great, but did you actually [I]want[/I] to do only TF? It seems no one asked, but if you just want to do the most to help the project, primality tests on your CPU may still be the better choice.

VBCurtis 2023-01-28 16:50

[QUOTE=Andrew Usher;623634]That's great, but did you actually [I]want[/I] to do only TF? It seems no one asked, but if you just want to do the most to help the project, primality tests on your CPU may still be the better choice.[/QUOTE]

This looks like lousy advice. Please explain your reasoning.

You used "may still be", which suggests you actually don't know if your own advice is useful- so why did you even post?

kriesel 2023-01-28 17:10

1 Attachment(s)
Amen. It takes a minute or two to learn what Jurzal or any specific user has been doing in the way of primality testing, provided forum ID = PrimeNet ID. [spoiler](Instead of guessing, wrongly, and offering poor and vague advice.)[/spoiler]
See also [url]https://mersenneforum.org/showpost.php?p=622502&postcount=1[/url];
[url]https://www.mersenne.org/report_LL/?user_id=jurzal[/url] 8 in LL DC 68M this year
[url]https://www.mersenne.org/report_PRP/?user_id=jurzal[/url] 2 in 116M this year
(I don't know if he's also done P-1; there's no P-1 results report for other users, equivalent to the preceding.)

Thank you Jurzal for your contributions, and carry on. TF is the best use for the specific GPU model listed in post one.
(But feel free to experiment/play with other computation types on it too.) And TF or whatever on the GPU will have little or no impact on primality testing or P-1 on the CPU. It's standard to run both simultaneously. Most of my systems run apps on CPU and on GPU(s) simultaneously; in most cases some mix of Mlucas, prime95, gpuowl, mfaktc, mmff. And some of them have Google Colab free cloud computing sessions in a web browser also. Most of the throughput is provided by the GPUs.

Jurzal 2023-01-28 20:02

2 Attachment(s)
[QUOTE=Andrew Usher;623634]That's great, but did you actually [I]want[/I] to do only TF? It seems no one asked, but if you just want to do the most to help the project, primality tests on your CPU may still be the better choice.[/QUOTE]

CPU is chugging too, 2 workers, 6 cores each from 5900X :)

Jurzal 2023-01-28 20:07

[QUOTE=kriesel;623658]Amen. It takes a minute or two to learn what Jurzal or any specific user has been doing in the way of primality testing, provided forum ID = PrimeNet ID. [spoiler](Instead of guessing, wrongly, and offering poor and vague advice.)[/spoiler]
See also [url]https://mersenneforum.org/showpost.php?p=622502&postcount=1[/url];
[url]https://www.mersenne.org/report_LL/?user_id=jurzal[/url] 8 in LL DC 68M this year
[url]https://www.mersenne.org/report_PRP/?user_id=jurzal[/url] 2 in 116M this year
(I don't know if he's also done P-1; there's no P-1 results report for other users, equivalent to the preceding.)

Thank you Jurzal for your contributions, and carry on. TF is the best use for the specific GPU model listed in post one.
(But feel free to experiment/play with other computation types on it too.) And TF or whatever on the GPU will have little or no impact on primality testing or P-1 on the CPU. It's standard to run both simultaneously. Most of my systems run apps on CPU and on GPU(s) simultaneously; in most cases some mix of Mlucas, prime95, gpuowl, mfaktc, mmff. And some of them have Google Colab free cloud computing sessions in a web browser also. Most of the throughput is provided by the GPUs.[/QUOTE]

Thanks! I haven't done P-1 yet. My settings in GIMPS are set to give me the assignments that GIMPS thinks are most useful, and I leave it as is.

Andrew Usher 2023-01-29 02:40

Thanks; I was just asking, but that answers the question. I'd be afraid to run both a CPU and GPU continuously like that, due to possible power or cooling overload; but I might be just out of date as usual.

I see you've found some factors, too; that, rather than GHz-days, is the true measure of factoring production.

Finally, you might get some P-1 work if you increase your prime95 memory allocation enough, as P-1 uses lots of memory, but leaving it this way is also fine.

kracker 2023-01-29 05:34

[QUOTE=Andrew Usher;623699]Thanks; I was just asking, but that answers the question. I'd be afraid to run both a CPU and GPU continuously like that, due to possible power or cooling overload; but I might be just out of date as usual.

I see you've found some factors, too; that, rather than GHz-days, is the true measure of factoring production.

Finally, you might get some P-1 work if you increase your prime95 memory allocation enough, as P-1 uses lots of memory, but leaving it this way is also fine.[/QUOTE]

I'm struggling to not say something a lot more harsh, but the amount of pure assumptions you so often make along with you confidently accepting those assumptions as fact... is honestly astounding to me.

Contributors have (hopefully) the freedom of crunching what/how they want. Maybe given time, they can try different things through experimentation or suggestions from others... but at the end of the day, that's their choice and we're lucky to have them here.

Jurzal 2023-01-29 05:51

[QUOTE=Andrew Usher;623699]Thanks; I was just asking, but that answers the question. I'd be afraid to run both a CPU and GPU continuously like that, due to possible power or cooling overload; but I might be just out of date as usual.

I see you've found some factors, too; that, rather than GHz-days, is the true measure of factoring production.

Finally, you might get some P-1 work if you increase your prime95 memory allocation enough, as P-1 uses lots of memory, but leaving it this way is also fine.[/QUOTE]

Both CPU and GPU are undervolted and fitted into a large, high-airflow case with 5x140 mm intake fans giving enough airflow.
The CPU is lapped, with a 360 mm AIO and a push/pull fan config. The GPU is an ASUS ROG Strix OC; it is so overbuilt that it sits at 45-47 °C while chugging TF. The CPU averages 64-66 °C. Total power consumption at the wall is around 450-500 W, so temperatures are not a problem. Memory allocation is 24 GB of the 32 GB available. So far it uses only a fraction of that, for PRP.

One problem I notice now, though, is that with larger PRP calculations the proof file is rewritten all the time on my SSD, taking a heavy toll on its lifespan. A lot of terabytes have already been written to that SSD by P95. In 9 hours, 43 GB were written to the SSD. I am using an SN850 1 TB drive, which is rated for 600 TBW. I will monitor the SSD usage for a bit, but if the SSD is going to get destroyed like that, I may opt out of PRP work on the CPU.

Mark Rose 2023-01-29 06:16

[QUOTE=Jurzal;623710]Both CPU and GPU are undervolted and fitted into a large, high-airflow case with 5x140 mm intake fans giving enough airflow.
The CPU is lapped, with a 360 mm AIO and a push/pull fan config. The GPU is an ASUS ROG Strix OC; it is so overbuilt that it sits at 45-47 °C while chugging TF. The CPU averages 64-66 °C. Total power consumption at the wall is around 450-500 W, so temperatures are not a problem. Memory allocation is 24 GB of the 32 GB available. So far it uses only a fraction of that, for PRP.

One problem I notice now, though, is that with larger PRP calculations the proof file is rewritten all the time on my SSD, taking a heavy toll on its lifespan. A lot of terabytes have already been written to that SSD by P95. In 9 hours, 43 GB were written to the SSD. I am using an SN850 1 TB drive, which is rated for 600 TBW. I will monitor the SSD usage for a bit, but if the SSD is going to get destroyed like that, I may opt out of PRP work on the CPU.[/QUOTE]

If you want to save power, setting a lower power limit on your GPU is useful. I cap my 270 watt 3070s at 200 watts and get 95% of the performance.
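
To put a number on that trade-off, the sketch below works out the performance-per-watt gain implied by those figures. It is plain arithmetic in Python rather than a measurement, and it assumes the card really draws its full 270 W at stock and exactly 200 W when capped; the function name is made up for illustration.

```python
def perf_per_watt_gain(stock_watts, capped_watts, relative_performance):
    """Ratio of capped efficiency to stock efficiency (performance per watt)."""
    stock_efficiency = 1.0 / stock_watts
    capped_efficiency = relative_performance / capped_watts
    return capped_efficiency / stock_efficiency


# 270 W stock, capped at 200 W, keeping 95% of the performance.
gain = perf_per_watt_gain(270, 200, 0.95)
print(round(gain, 2))
```

Under those assumptions the capped card does about 28% more work per watt while giving up 5% of its throughput.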

You have a good amount of RAM to run P-1 work.

600000 / (43 * (24/9)) / 365 = 14.3 years

Jurzal 2023-01-29 06:26

[QUOTE=Mark Rose;623711]If you want to save power, setting a lower power limit on your GPU is useful. I cap my 270 watt 3070s at 200 watts and get 95% of the performance.

You have a good amount of RAM to run P-1 work.

600000 / (43 * (24/9)) / 365 = 14.3 years[/QUOTE]

Undervolting is wayyyy better for a GPU than power limiting. Also, undervolting reduces the max power draw along with it, so it does the same thing. My 240 W GPU is sitting at 160 W without reaching the power limit: 0.925 V at 1965 MHz boost, memory at 1900 MHz.

What does the formula that gave 14.3 years mean? I have never done P-1, so I am missing the context.

Mark Rose 2023-01-29 07:18

[QUOTE=Jurzal;623712]Undervolting is wayyyy better for a GPU than power limiting. Also, undervolting reduces the max power draw along with it, so it does the same thing. My 240 W GPU is sitting at 160 W without reaching the power limit: 0.925 V at 1965 MHz boost, memory at 1900 MHz.

What does the formula that gave 14.3 years mean? I have never done P-1, so I am missing the context.[/QUOTE]

I haven't played with GPU undervolting since it's not straightforward on Linux.

14.3 years is how long it would take to use up 600TB of endurance at the 43 GB per 9 hours rate you posted.
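
Spelled out step by step, the same estimate looks like the Python sketch below. It only restates the arithmetic of the formula: it assumes the observed write rate stays constant, uses 1 TB = 1000 GB, and ignores both the data already written to the drive and any other writes. The function name is made up for illustration.

```python
def endurance_years(rated_tbw, gb_written, hours_observed):
    """Years until the rated write endurance is used up at a constant write rate."""
    gb_per_day = gb_written * 24 / hours_observed  # 43 GB in 9 h -> GB per day
    rated_gb = rated_tbw * 1000                    # 600 TBW -> 600000 GB
    days = rated_gb / gb_per_day
    return days / 365


years = endurance_years(rated_tbw=600, gb_written=43, hours_observed=9)
print(round(years, 1))
```

At 43 GB per 9 hours the drive sees roughly 115 GB of writes per day, which is where the 14.3 years comes from.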

Jurzal 2023-01-29 07:32

[QUOTE=Mark Rose;623715]I haven't played with GPU undervolting since it's not straightforward on Linux.

14.3 years is how long it would take to use up 600TB of endurance at the 43 GB per 9 hours rate you posted.[/QUOTE]

[url]https://linustechtips.com/topic/1259546-how-to-undervolt-nvidia-gpus-in-linux/[/url] this may help

Thanks for explaining what 14 years meant, yes, maybe I don't need to worry about it :D


All times are UTC.