mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Msieve (https://www.mersenneforum.org/forumdisplay.php?f=83)
-   -   Msieve GPU Linear Algebra (https://www.mersenneforum.org/showthread.php?t=27042)

EdH 2022-05-01 18:50

Thanks! I think it would be too costly to bring the Core2 up to 16 GB, so I'll look at other options. I appreciate all the help!

EdH 2022-05-09 20:09

Sorry to annoy, but I'm having trouble getting an M40 to run. The system sees it, but [C]nvidia-smi[/C] and Msieve do not. This machine runs the K20X and an NVS-510 fine. Do I need to reinstall CUDA with the M40 in place, perhaps?

frmky 2022-05-09 22:08

If nvidia-smi doesn't see it, then msieve won't. Perhaps you need to reinstall the CUDA driver with the M40 installed?
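The check described above can be scripted. This is a minimal sketch, assuming `nvidia-smi` is on the PATH and that `nvidia-smi -L` prints one line per card in the usual `GPU <index>: <name> (UUID: ...)` form. The function names are hypothetical and not part of Msieve.

```python
import re
import subprocess

# One line per card, e.g. "GPU 0: <name> (UUID: GPU-...)".
GPU_LINE = re.compile(r"^GPU (\d+): (.+) \(UUID: ([^)]+)\)$")

def parse_gpu_list(text):
    """Return (index, name) for each GPU line in `nvidia-smi -L` output."""
    gpus = []
    for line in text.splitlines():
        m = GPU_LINE.match(line.strip())
        if m:
            gpus.append((int(m.group(1)), m.group(2)))
    return gpus

def visible_gpus():
    """Ask the driver which GPUs it can see; empty list if none or no nvidia-smi."""
    try:
        result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    except FileNotFoundError:
        return []
    return parse_gpu_list(result.stdout)
```

If `visible_gpus()` does not list the card, the problem is below Msieve (driver, BIOS, or hardware), so rebuilding Msieve is unlikely to help.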

EdH 2022-05-10 01:22

[QUOTE=frmky;605578]If nvidia-smi doesn't see it, then msieve won't. Perhaps you need to reinstall the CUDA driver with the M40 installed?[/QUOTE]
Reinstalled the driver and CUDA in different variations, and no joy. The computer says it's there, but CUDA says it isn't. I put the K20Xm back in and it sees it every time. Both are PCIe x16 v3.0.

Giving up for now. . .

ETA: Msieve compiled for CUDA arch 5.2, but couldn't find the card, as expected.

Thanks for the help.

EdH 2022-05-10 17:52

I guess I have found my answer for the M40:
[code][ 1562.849818] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:01:00.0)
[ 1562.849819] NVRM: The system BIOS may have misconfigured your GPU.
[ 1562.849824] nvidia: probe of 0000:01:00.0 failed with error -1
[ 1562.849839] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 1562.849840] NVRM: None of the NVIDIA devices were initialized.[/code]
And there are no newer BIOS updates addressing any PCI issues.
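The telling line in the log above is BAR1 reported as 0M at address 0x0, meaning the firmware never assigned that memory window to the card. A minimal sketch for picking such lines out of saved kernel log text follows. The regular expression is written against the line format shown in this post only, and the function name is hypothetical.

```python
import re

# Matches driver lines such as "NVRM: BAR1 is 0M @ 0x0 (PCI:0000:01:00.0)".
BAR_LINE = re.compile(
    r"NVRM: BAR(?P<bar>\d+) is (?P<size>\d+)M @ (?P<addr>0x[0-9a-fA-F]+) "
    r"\(PCI:(?P<pci>[0-9a-fA-F:.]+)\)"
)

def find_unassigned_bars(log_text):
    """Return (pci_address, bar_index) for each BAR logged with size 0 or address 0."""
    hits = []
    for m in BAR_LINE.finditer(log_text):
        if int(m.group("size")) == 0 or int(m.group("addr"), 16) == 0:
            hits.append((m.group("pci"), int(m.group("bar"))))
    return hits

# Excerpt of the log posted above.
sample = """[ 1562.849818] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:01:00.0)
[ 1562.849819] NVRM: The system BIOS may have misconfigured your GPU."""

print(find_unassigned_bars(sample))
```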

frmky 2022-05-10 18:13

That's a BIOS issue. Google says to look for options deep in the BIOS menus like PCI Express 64-bit BAR Support, large BARs, or above 4G decoding.
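To see what the firmware actually assigned, Linux exposes each PCI device's memory regions in a sysfs `resource` file, one line per region with start address, end address, and flags in hexadecimal. The sketch below parses that file. The default PCI address is the one from the log earlier in the thread, and the helper names are hypothetical.

```python
from pathlib import Path

def parse_resource(text):
    """Parse a sysfs PCI 'resource' file into (index, start, size_bytes) tuples."""
    regions = []
    for index, line in enumerate(text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue
        start, end, _flags = (int(f, 16) for f in fields)
        size = end - start + 1 if end else 0  # an all-zero line is an unassigned region
        regions.append((index, start, size))
    return regions

def read_regions(pci_address="0000:01:00.0"):
    """Read the regions of one device; returns [] if the device is not present."""
    path = Path("/sys/bus/pci/devices") / pci_address / "resource"
    if not path.exists():
        return []
    return parse_resource(path.read_text())
```

A large region that shows up as all zeros here is consistent with the BIOS not mapping large or 64-bit BARs, which is what the options named above are meant to enable.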

EdH 2022-05-10 20:45

[QUOTE=frmky;605610]That's a BIOS issue. Google says to look for options deep in the BIOS menus like PCI Express 64-bit BAR Support, large BARs, or above 4G decoding.[/QUOTE]
Thank you for all the help with everything. I do appreciate it, but I'm going to leave it sit for now. I did search the BIOS and all I found were two things: a Robust Graphics Booster with an Auto/Fast/Turbo setting, for which there is a red message (for all three settings), "[COLOR=Red]Warning: VGA Graphics card is not guaranteed to operate normally[/COLOR]," and a PCIE frequency adjustment with a warning about setting it above 100MHz. The messages are displayed for the K20Xm as well. I guess I should consider myself lucky that one works.

Thank you, again, for all your help.

EdH 2022-07-15 14:38

A small follow-up:

I now have the Tesla M40 24GB running and am quite pleased. But there is room for improvement: it is throttling due to insufficient cooling. It gets to 87 °C and cuts its processing. I have a push fan and a pull fan, but the airflow is just not there. I will have to pursue an alternate method. Would hate to wait until winter to get the full capability.
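Throttling like this can be watched from a script while the linear algebra runs. This is a minimal sketch, assuming `nvidia-smi` is on the PATH. It uses the `temperature.gpu` and `clocks.sm` query fields, and the 87 °C figure is the one reported in this post. The 3-degree margin and the function names are assumptions.

```python
import subprocess

THROTTLE_TEMP_C = 87  # temperature at which the M40 in this thread cut back

def parse_gpu_status(csv_text):
    """Parse 'temperature.gpu,clocks.sm' rows (csv,noheader,nounits) into (temp_c, sm_mhz)."""
    rows = []
    for line in csv_text.strip().splitlines():
        temp, clock = (field.strip() for field in line.split(","))
        rows.append((int(temp), int(clock)))
    return rows

def query_gpus():
    """Run nvidia-smi once and return one (temp_c, sm_mhz) tuple per GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu,clocks.sm",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_status(out)

def near_throttle(rows, margin_c=3):
    """Indices of GPUs within margin_c degrees of the throttle temperature."""
    return [i for i, (temp, _clock) in enumerate(rows)
            if temp >= THROTTLE_TEMP_C - margin_c]
```

Calling `near_throttle(query_gpus())` every few seconds and logging the SM clock alongside shows whether a change to the cooling keeps the card below the point where the clock drops.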

RichD 2022-07-16 22:51

[QUOTE=frmky;606227]Yep. With the managed memory option, the program stores portions of the sparse matrix blocks in main memory if necessary and moves them to the GPU when they are needed in each iteration. This significantly increases traffic on the PCIe bus. The GPU spends much more time waiting for data, but it can still be faster than running on the CPU.[/QUOTE]
I am thinking of tackling a much larger job where the matrix might be 5-6 times the GPU memory I have on a GTX 1660 (6GB) card. I know it helps on smaller jobs where the memory requirements are less than 2X. Would it be better to utilize the GPU or just go for it and report my results here? (Using use_managed=1)

frmky 2022-07-16 22:58

There's a good chance that won't work. The vectors are always kept on the card and may take most of the GPU memory, leaving little for the matrix blocks and spmv scratch space. Nothing beats experiment, though, so give it a try and see what happens.

RichD 2022-07-31 01:54

[QUOTE=frmky;609666]There's a good chance that won't work. The vectors are always kept on the card and may take most of the GPU memory, leaving little for the matrix blocks and spmv scratch space. Nothing beats experiment, though, so give it a try and see what happens.[/QUOTE]
Attempting a ridiculous LA with the matrix needing more than five times the GPU memory, even trying with [C]use_managed=1[/C], was a no-go as expected.
[CODE]matrix is 33782739 x 33783144 (13141.4 MB) with weight 3041417453 (90.03/col)
sparse part has weight 2904400096 (85.97/col)
using GPU 0 (NVIDIA GeForce GTX 1660)
selected card has CUDA arch 7.5
Nonzeros per block: 1750000000
Storing matrix in managed memory
converting matrix to CSR and copying it onto the GPU
Killed[/CODE]
Maybe a matrix needing 2-3 times the GPU memory won't be so obnoxious. :smile:
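The point quoted above, that the vectors stay on the card and leave little room for matrix blocks, can be put into rough numbers. The sketch below is only an estimate: the column count comes from the log and the 6 GB figure from the earlier post, but the vector width (`vbits`) and the number of resident vectors (`n_vectors`) are illustrative guesses, not values taken from Msieve.

```python
def vector_bytes(n_columns, vbits):
    """Bytes for one block vector: n_columns entries of vbits bits each."""
    return n_columns * vbits // 8

def left_for_matrix(gpu_bytes, n_columns, vbits, n_vectors):
    """GPU bytes remaining once n_vectors block vectors are resident on the card."""
    return gpu_bytes - n_vectors * vector_bytes(n_columns, vbits)

GIB = 1024 ** 3
n_columns = 33783144   # from the log above
gpu_bytes = 6 * GIB    # GTX 1660, as stated earlier in the thread

# vbits and n_vectors are assumptions chosen only to show the trend.
for vbits in (64, 128, 256):
    free = left_for_matrix(gpu_bytes, n_columns, vbits, n_vectors=4)
    print(f"vbits={vbits}: {free / GIB:.2f} GiB left for matrix blocks and scratch")
```

Whatever the real vector count is, the memory left over shrinks linearly with both the matrix dimension and the vector width, so a matrix several times larger than the card leaves very little resident working space.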


All times are UTC.