mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing

2016-01-26, 02:39   #12
bgbeuning
 
Dec 2014

3×5×17 Posts

Quote:
Originally Posted by ixfd64
Slightly off-topic, but do you have one of these?

http://www.amax.com/hpc/productdetai..._id=XG-4802Gk8
My GPU box is a Dell C410x.
It has 16 GPUs (Tesla M2070), 4 PSUs (1400 W, 220 VAC), 8 iPASS ports, and no CPU.

http://www.dell.com/us/business/p/poweredge-c410x/pd

An iPASS port connects by cable to a PCI card in the host computer,
so the box with the CPU, motherboard, and RAM does not hold any of the GPU boards.

It has some kind of PCI switching inside and can be configured as:
2 GPUs per iPASS port (uses all 8 ports)
4 GPUs per iPASS port (uses 4 ports)
8 GPUs per iPASS port (uses 2 ports; this is my setup)
2016-01-26, 02:55   #13
bgbeuning
 
Dec 2014

3×5×17 Posts

Quote:
Originally Posted by TheJudger
P.S. perhaps the system is better used for CUDALucas?
I have never tried CUDALucas. I need to.
When I got all the pieces working together (it really needs the X Window System,
even though none of the GPUs has a video output), I just put it to
work on what I already knew how to set up.

About the loose power connection: the GPUs are inside a metal shell
(apparently the Dell engineers call it a "taco"), kind of like a hot-swap
disk drive tray. I have not tried opening one up yet to look inside;
the power connection will be inside.

The first time I hooked it up, one host saw 7 (of 8) GPUs and
the second host saw 5 (of 8). After rebooting everything,
the second host started seeing 7. I need to try moving boards
around to different slots to see if anything changes. I know
which boards are not working because 2 of the 16 GPUs have a blinking
LED while the rest have a solid LED. When they are idle, all
16 LEDs blink.
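To check how many GPUs a host actually sees (7 of 8 here), the `nvidia-smi -L` listing can be counted. A minimal Python sketch, assuming `nvidia-smi` is on the PATH and prints one `GPU <n>: <name> (UUID: ...)` line per device; `EXPECTED` is a hypothetical value for the 8-per-port setup, not something read from the hardware:

```python
import re
import shutil
import subprocess

# Assumed line format of `nvidia-smi -L`: "GPU 0: Tesla M2070 (UUID: GPU-...)"
GPU_LINE = re.compile(r"^GPU (\d+): (.+?)(?: \(UUID: [^)]*\))?$")


def parse_gpu_list(text: str) -> list[tuple[int, str]]:
    """Return (index, name) for every GPU line in `nvidia-smi -L` output."""
    gpus = []
    for line in text.splitlines():
        m = GPU_LINE.match(line.strip())
        if m:
            gpus.append((int(m.group(1)), m.group(2)))
    return gpus


def visible_gpus() -> list[tuple[int, str]]:
    """Run `nvidia-smi -L` if it is installed; return an empty list otherwise."""
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.run(["nvidia-smi", "-L"], capture_output=True,
                         text=True, errors="replace", check=False).stdout
    return parse_gpu_list(out)


if __name__ == "__main__":
    EXPECTED = 8  # hypothetical: GPUs per host in the 8-per-iPASS-port setup
    found = visible_gpus()
    print(f"{len(found)} of {EXPECTED} GPUs visible, {EXPECTED - len(found)} missing")
```

Running it on each host after a reboot gives a quick before/after count while moving boards between slots.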
2016-01-26, 03:09   #14
bgbeuning
 
Dec 2014

11111111₂ Posts

Quote:
Originally Posted by airsquirrels
Is the system running headless? Linux generally doesn't require an actual VGA display, that's what I do.

I also have some scripts that generate a fake display for the purpose of accessing certain nvidia-settings options.
My day job is C++ programming, so I am not familiar with most of these questions.
Most of my machines do not have a display connected, but I connect one when
I am having trouble. Most of my Linux boxes run Ubuntu Server, so they do not
have the X Window System installed. But to get the right CUDA driver, I needed to install
the desktop edition (with X) on the two machines talking to the GPUs.

Quote:
Also, check nvidia-xconfig/xorg.conf; if it is set up incorrectly, it can block some GPUs from appearing. Better yet, stop X altogether and see if you get the same device count.
nvidia-xconfig said a config file was missing,
and find(1) did not locate an xorg.conf anywhere on the machine.

Quote:
Somewhere in the dmesg log there should be an entry indicating a problem with the card, if any.
I think we may have a winner!

Quote:
[ 116.520549] NVRM: Xid (PCI:0000:10:00): 58, FB memtest failed, so resetting DRAMs
[ 116.578095] NVRM: Xid (PCI:0000:10:00): 58, FB memtest failed, so resetting DRAMs
[ 116.635601] NVRM: Xid (PCI:0000:10:00): 58, FB memtest failed, so resetting DRAMs
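Repeated Xid lines like these are easier to read when grouped by PCI address, which shows at a glance whether one card or several are failing. A minimal Python sketch that tallies `NVRM: Xid` messages per (address, code), assuming the line format shown above; it only reports what the driver logged and does not interpret the codes:

```python
import re
import shutil
import subprocess
from collections import Counter

# Matches lines such as
# "NVRM: Xid (PCI:0000:10:00): 58, FB memtest failed, so resetting DRAMs"
XID_RE = re.compile(r"NVRM: Xid \((PCI:[0-9A-Fa-f:.]+)\): (\d+)")


def count_xids(dmesg_text: str) -> Counter:
    """Count Xid messages per (PCI address, Xid code)."""
    hits: Counter = Counter()
    for line in dmesg_text.splitlines():
        m = XID_RE.search(line)
        if m:
            hits[(m.group(1), int(m.group(2)))] += 1
    return hits


if __name__ == "__main__":
    # Reading the kernel log may require root on some distributions.
    if shutil.which("dmesg"):
        log = subprocess.run(["dmesg"], capture_output=True, text=True,
                             errors="replace", check=False).stdout
        for (addr, code), n in sorted(count_xids(log).items()):
            print(f"{addr}: Xid {code} x{n}")
```

For the three lines above this reports a single entry, `PCI:0000:10:00` with Xid 58 three times, which points at one card rather than at the link as a whole.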
Quote:
Do all 8 show in lspci -vvv? Any differences in link speed?
The output is about 100 KB, so I don't want to post it here.
Since the GPU cards are not in the host PC, I think it will show
the speed of the PCI card connected to the iPASS port.
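Even behind an external switch, the GPUs will likely still enumerate as ordinary PCI devices on the host (the NVRM messages above name a PCI address for one of them), in which case `lspci -vvv` reports a link status per GPU. Rather than posting 100 KB of output, a short filter can pull out just the `LnkSta` speed and width for each device. A sketch, assuming the usual `lspci -vvv` layout (an unindented device header line followed by indented capability lines) and that it runs as root, since lspci hides capability details from ordinary users:

```python
import re
import shutil
import subprocess

# Assumed header format: "10:00.0 3D controller: NVIDIA Corporation ..." (domain optional)
DEV_RE = re.compile(r"^((?:[0-9a-fA-F]{4}:)?[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\s+(.*)$")
# Assumed status format: "LnkSta: Speed 5GT/s, Width x16, ..." (newer lspci adds "(ok)")
LNKSTA_RE = re.compile(r"LnkSta:\s*Speed\s+([0-9.]+GT/s)[^,]*,\s*Width\s+(x\d+)")


def link_status(lspci_vvv: str) -> dict[str, tuple[str, str, str]]:
    """Map PCI address -> (description, negotiated speed, negotiated width)."""
    result = {}
    current = None
    for line in lspci_vvv.splitlines():
        m = DEV_RE.match(line)
        if m:
            current = (m.group(1), m.group(2))
            continue
        m = LNKSTA_RE.search(line)
        if m and current:
            addr, desc = current
            result[addr] = (desc, m.group(1), m.group(2))
    return result


if __name__ == "__main__":
    if shutil.which("lspci"):
        out = subprocess.run(["lspci", "-vvv"], capture_output=True, text=True,
                             errors="replace", check=False).stdout
        for addr, (desc, speed, width) in sorted(link_status(out).items()):
            if "NVIDIA" in desc:
                print(f"{addr}  {speed} {width}  {desc}")
```

Comparing the per-card lines would show whether the missing or failing boards negotiated a slower or narrower link than the rest.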

Quote:
Finally, what host CPU(s)? Is this an external PCIe backplane with a lane switch, or is it using CPU lanes. lspci will give us link speeds for each card.
The GPUs are not in the PC, so I am going to go with "external PCI backplane".
2016-01-31, 17:46   #15
bgbeuning
 
Dec 2014

3·5·17 Posts
CUDALucas-2.05.1-CUDA4.2-linux-x86_64: No such file or directory

Quote:
Originally Posted by TheJudger

P.S. perhaps the system is better used for CUDALucas?
I am trying out CUDALucas but am having a basic Linux problem.
The error is:

Quote:
-bash: ./CUDALucas-2.05.1-CUDA4.2-linux-x86_64: No such file or directory
I have eliminated some simple cases:
1. The file is executable.
2. The file is a 64-bit executable on a 64-bit OS.
3. All the needed shared libraries are found.

Quote:
% ls -l ./CUDALucas-2.05.1-CUDA4.2-linux-x86_64
-rwxr-xr-x 1 bgb bgb 425256 Feb 11 2015 ./CUDALucas-2.05.1-CUDA4.2-linux-x86_64
% file ./CUDALucas-2.05.1-CUDA4.2-linux-x86_64
./CUDALucas-2.05.1-CUDA4.2-linux-x86_64: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=88125348406e20d621c63145faf083ae3829921c, not stripped
% uname -a
Linux c6100a1 3.19.0-25-generic #26~14.04.1-Ubuntu SMP Fri Jul 24 21:16:20 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
% ldd ./CUDALucas-2.05.1-CUDA4.2-linux-x86_64
linux-vdso.so.1 => (0x00007ffed5bc6000)
libcufft.so.4 => /home/bgb/lib/libcufft.so.4 (0x00007f7133ec8000)
libcudart.so.4 => /home/bgb/lib/libcudart.so.4 (0x00007f7133c6a000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f7133964000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f713359f000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f713339b000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f713317d000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f7132e79000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f7132c63000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f7132a5b000)
/lib/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x00007f7135eec000)
I also tried starting CUDALucas in gdb(1), hoping gdb would give a better error message.
It turns out gdb uses bash to start the executable, so that was no help.

Anything else I should check?
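When bash reports "No such file or directory" for a file that `ls` shows is present, one common cause is that the program interpreter (the dynamic loader path stored in the ELF file's PT_INTERP program header) does not exist on the system. The ldd output above maps `/lib/ld-linux-x86-64.so.2` to `/lib64/ld-linux-x86-64.so.2`, which suggests the binary asks for a loader path that may be absent on this install; that is a guess, not something the thread confirms. `readelf -l` prints the same information as "Requesting program interpreter"; the Python sketch below reads it directly, assuming a 64-bit little-endian ELF file:

```python
import os
import struct
import sys

PT_INTERP = 3  # program-header type that names the dynamic loader


def elf_interpreter(data: bytes):
    """Return the PT_INTERP path of a 64-bit little-endian ELF image, or None if it has none."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:  # EI_CLASS=ELFCLASS64, EI_DATA=ELFDATA2LSB
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    (e_phoff,) = struct.unpack_from("<Q", data, 32)           # program header table offset
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)  # entry size and count
    for i in range(e_phnum):
        p_type, _flags, p_offset, _vaddr, _paddr, p_filesz = struct.unpack_from(
            "<IIQQQQ", data, e_phoff + i * e_phentsize
        )
        if p_type == PT_INTERP:
            return data[p_offset:p_offset + p_filesz].rstrip(b"\0").decode()
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        interp = elf_interpreter(f.read())
    if interp is None:
        print("no PT_INTERP (static executable?)")
    else:
        status = "exists" if os.path.exists(interp) else "MISSING"
        print(f"interpreter {interp}: {status}")
```

Usage: `python3 elf_interp.py ./CUDALucas-2.05.1-CUDA4.2-linux-x86_64`. If the interpreter is reported missing, running the loader explicitly (`/lib64/ld-linux-x86-64.so.2 ./CUDALucas-2.05.1-CUDA4.2-linux-x86_64`) is one way to confirm the diagnosis.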

Last fiddled with by bgbeuning on 2016-01-31 at 17:47 Reason: Changed dollar sign prompts to percent so math package doesn't eat them
2016-01-31, 19:58   #16
Mark Rose
 
"/X\(‘-‘)/X\"
Jan 2013

13×239 Posts

I would recompile it.
2016-01-31, 21:34   #17
bgbeuning
 
Dec 2014

11111111₂ Posts

Recompiling worked.

The new build uses CUDA runtime 7.5, while I had been running mfaktc with runtime 4.2.

Thanks!
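Since the rebuilt binary uses a newer CUDA runtime than before, it can help to check which runtime and driver versions a machine reports; in this era a runtime generally needs a driver at least as new. A sketch using Python's `ctypes` to call the CUDA runtime's `cudaDriverGetVersion` and `cudaRuntimeGetVersion`, assuming `libcudart.so` can be found by the dynamic loader (adjust the name or full path otherwise). Both functions report versions encoded as `1000*major + 10*minor`:

```python
import ctypes


def decode_cuda_version(v: int) -> tuple[int, int]:
    """CUDA reports versions as 1000*major + 10*minor, e.g. 7050 -> (7, 5)."""
    return v // 1000, (v % 1000) // 10


def query_cuda_versions(lib_name: str = "libcudart.so"):
    """Return ((driver_major, driver_minor), (runtime_major, runtime_minor)), or None on failure."""
    try:
        cudart = ctypes.CDLL(lib_name)  # assumption: library name/path resolvable by the loader
    except OSError:
        return None
    driver, runtime = ctypes.c_int(0), ctypes.c_int(0)
    # Both calls return 0 (cudaSuccess) on success; no driver installed reports version 0.
    if cudart.cudaDriverGetVersion(ctypes.byref(driver)) != 0:
        return None
    if cudart.cudaRuntimeGetVersion(ctypes.byref(runtime)) != 0:
        return None
    return decode_cuda_version(driver.value), decode_cuda_version(runtime.value)


if __name__ == "__main__":
    versions = query_cuda_versions()
    if versions is None:
        print("could not query the CUDA runtime")
    else:
        drv, rt = versions
        note = "ok" if drv >= rt else "driver older than runtime"
        print(f"driver {drv[0]}.{drv[1]}, runtime {rt[0]}.{rt[1]}: {note}")
```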
2016-01-31, 22:13   #18
Mark Rose
 
"/X\(‘-‘)/X\"
Jan 2013

13·239 Posts

Quote:
Originally Posted by bgbeuning
Recompile worked.

It made a runtime 7.5 version while I had been running mfaktc with runtime 4.2.

Thanks!
What GPU do you have? Do note there is a bug in CUDA 7.0 through 7.5 that causes mfaktc to produce bad results on Maxwell GPUs.
2016-01-31, 22:44   #19
bgbeuning
 
Dec 2014

3×5×17 Posts

They are Tesla M2070 (Fermi).

It does not give time units, but it looks like 1 day 19 hours
for an LL double check of an exponent near 45,000,000.
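For context on that figure: CUDALucas runs the Lucas-Lehmer test, which for M_p = 2^p - 1 starts from s = 4, repeats s = s^2 - 2 (mod M_p) p - 2 times, and declares M_p prime exactly when the final s is 0 (for odd prime p). So 1 day 19 hours (43 hours) for p near 45,000,000 works out to roughly 3.4 ms per iteration. The sketch below shows the recurrence on small exponents with plain Python integers (nothing like the FFT-based squaring a GPU program uses; the ldd output above shows CUDALucas linking libcufft), plus that timing arithmetic; the 43-hour value is just the estimate quoted above:

```python
SECONDS_PER_HOUR = 3600


def lucas_lehmer(p: int) -> bool:
    """Lucas-Lehmer test: for an odd prime p, M_p = 2**p - 1 is prime iff s_{p-2} == 0."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def ms_per_iteration(p: int, hours: float) -> float:
    """Average time per LL iteration implied by a total run time (a test needs p - 2 iterations)."""
    return hours * SECONDS_PER_HOUR * 1000 / (p - 2)


if __name__ == "__main__":
    odd_primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
    print([p for p in odd_primes if lucas_lehmer(p)])
    # 1 day 19 hours = 43 hours for an exponent near 45,000,000
    print(f"{ms_per_iteration(45_000_000, 43):.2f} ms per iteration")
```

This prints `[3, 5, 7, 13, 17, 19, 31]` (the Mersenne prime exponents in that range) and `3.44 ms per iteration`.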
2016-02-05, 19:37   #20
TheJudger
 
"Oliver"
Mar 2005
Germany

2·557 Posts

Quote:
Originally Posted by Mark Rose
What GPU do you have? Do note there is a bug in CUDA 7.0 through 7.5 that causes mfaktc to produce bad results on Maxwell GPUs.
Read: "produce bad results" means it fails the built-in self-test and refuses to do further work.
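Going only by the report above (a problem with CUDA 7.0 through 7.5 on Maxwell GPUs, which mfaktc's self-test catches), a quick check of whether a given card and toolkit fall in that range could look like this. The architecture table follows NVIDIA's compute-capability numbering; the function names are hypothetical, and the 7.0 to 7.5 range is taken from this thread rather than from NVIDIA:

```python
# Compute-capability major version -> architecture name (standard NVIDIA mapping).
ARCH_BY_CC_MAJOR = {2: "Fermi", 3: "Kepler", 5: "Maxwell", 6: "Pascal"}


def architecture(cc_major: int) -> str:
    return ARCH_BY_CC_MAJOR.get(cc_major, "unknown")


def matches_reported_issue(cc_major: int, cuda: tuple[int, int]) -> bool:
    """True for the combination reported in this thread: a Maxwell GPU with CUDA 7.0 through 7.5."""
    return architecture(cc_major) == "Maxwell" and (7, 0) <= cuda <= (7, 5)


if __name__ == "__main__":
    # Tesla M2070 is compute capability 2.0 (Fermi); the rebuild above used CUDA 7.5.
    print(architecture(2), matches_reported_issue(2, (7, 5)))
```

For the Tesla M2070 boards in this thread it prints `Fermi False`.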
2016-05-19, 19:02   #21
anonymous
 

1111100110100₂ Posts

bgbeuning, could you please tell me which motherboard or server you have? I have a similar C410x with M2070s (coincidentally, also purchased from someone who claimed that it came with M2090s), and I would like to use a server that you have found to work correctly.
2016-05-19, 21:35   #22
bgbeuning
 
Dec 2014

3·5·17 Posts

Sounds like we bought from the same guy. I was also expecting M2090s.
Two of the slots in mine are dead: when I move cards around, the same
two slots don't work. The main board is available on eBay for $160, but
it doesn't seem worth it.

First, it requires 220 VAC. Plugged into 120 VAC, it will turn on the
front lights, but it will not run.

Also expect a lot of heat: 16 M2070s at 200 W each is over 3000 W of heat.
Mine is off until next fall, or until I figure out an economical way to cool my basement.
In the winter it heats my house.
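Using the 200 W per card figure above, the heat load works out as follows (1 W is about 3.412 BTU/h, and one ton of air conditioning is 12,000 BTU/h). This counts only the GPUs at that figure; power-supply losses and the host machine add more:

```latex
16 \times 200\,\mathrm{W} = 3200\,\mathrm{W}
  \approx 3200 \times 3.412\,\mathrm{BTU/h}
  \approx 10{,}900\,\mathrm{BTU/h}
  \approx 0.91\ \text{ton of cooling}
```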

I bought a C6100 from eBay to connect to the C410x. It is the only box
certified to work with the C410x, but I would expect other PCs to work
too. Note that all 16 GPUs cannot be connected to one PC. Apparently
each GPU uses 4096 I/O ports, and x86 only has 64K (65,536) I/O ports;
65,536 / 4,096 = 16, which leaves no room for the rest of the system's devices, so there are not enough.

Mine is configured with 8 cards per external PCI cable/bridge.
There is a web interface to configure it, but apparently there are
also jumpers inside the box to configure it.

If you have any more questions, I will do my best to answer them.
All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
