mersenneforum.org  

2019-05-09, 08:40   #1
nomead
 
 
"Sam Laur"
Dec 2018
Turku, Finland

23·41 Posts
Nvidia Jetson Nano

I finally received the Nvidia Jetson Nano devboard ($99, a bit more in the EU with tax and shipping) that I ordered back in March. Well, actually I got it a couple of weeks ago, but only now got around to putting it to use.

It has a cut-down Tegra X1 with four Cortex-A57 cores running at 1.428 GHz, plus 128 CUDA cores in the GPU portion running at 921.6 MHz. Maxwell architecture, so it's compute capability 5.3 - getting a bit old by now.

So, there are three things I tested on it, just for fun I guess.

Mlucas 18.0 timings:
Code:
18.0
      2048  msec/iter =   49.13  ROE[avg,max] = [0.000311985, 0.375000000]  radices = 128 16 16 32  0  0  0  0  0  0
      2304  msec/iter =   55.65  ROE[avg,max] = [0.000273731, 0.375000000]  radices = 144 16 16 32  0  0  0  0  0  0
      2560  msec/iter =   60.10  ROE[avg,max] = [0.000236003, 0.312500000]  radices = 160 16 16 32  0  0  0  0  0  0
      2816  msec/iter =   68.26  ROE[avg,max] = [0.000259256, 0.343750000]  radices = 176 16 16 32  0  0  0  0  0  0
      3072  msec/iter =   75.20  ROE[avg,max] = [0.000267585, 0.375000000]  radices = 192 16 16 32  0  0  0  0  0  0
      3328  msec/iter =   80.51  ROE[avg,max] = [0.000282796, 0.375000000]  radices = 208 32 16 16  0  0  0  0  0  0
      3584  msec/iter =   86.82  ROE[avg,max] = [0.000254826, 0.343750000]  radices = 224 16 16 32  0  0  0  0  0  0
      3840  msec/iter =   93.43  ROE[avg,max] = [0.000247071, 0.312500000]  radices = 240 16 16 32  0  0  0  0  0  0
      4096  msec/iter =  100.19  ROE[avg,max] = [0.000227303, 0.312500000]  radices = 256 16 16 32  0  0  0  0  0  0
      4608  msec/iter =  112.94  ROE[avg,max] = [0.000248429, 0.312500000]  radices = 288 16 16 32  0  0  0  0  0  0
      5120  msec/iter =  128.99  ROE[avg,max] = [0.000234485, 0.281250000]  radices = 160 32 32 16  0  0  0  0  0  0
      5632  msec/iter =  146.71  ROE[avg,max] = [0.000257845, 0.343750000]  radices = 176 32 32 16  0  0  0  0  0  0
      6144  msec/iter =  161.94  ROE[avg,max] = [0.000247003, 0.312500000]  radices = 192 32 32 16  0  0  0  0  0  0
      6656  msec/iter =  172.41  ROE[avg,max] = [0.000266479, 0.375000000]  radices = 208 32 32 16  0  0  0  0  0  0
      7168  msec/iter =  186.51  ROE[avg,max] = [0.000226100, 0.281250000]  radices = 224 32 32 16  0  0  0  0  0  0
      7680  msec/iter =  200.75  ROE[avg,max] = [0.000236377, 0.312500000]  radices = 240 32 32 16  0  0  0  0  0  0
      8192  msec/iter =  221.50  ROE[avg,max] = [0.000237378, 0.312500000]  radices = 256 32 32 16  0  0  0  0  0  0
As a side note, running stuff concurrently on the GPU doesn't affect Mlucas timings at all. I'm running it in the higher power mode, of course (10W) and have installed a 40mm fan on the heat sink. It is PWM controlled so in normal use it doesn't even spin up. But even running CPU+GPU at full load doesn't make the fan run at anywhere close to full RPM, so there's not really any detectable noise.
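
To put the Mlucas timings above in perspective, here is a rough sketch of how long one full primality test would take at a given msec/iter. It assumes a test of exponent p needs about p iterations, and that an exponent near 92 million fits the 5120K FFT length; that pairing is an assumption for illustration, not something stated in this thread. The function name is made up.

```python
SECONDS_PER_DAY = 86_400


def days_per_test(exponent, msec_per_iter):
    """Estimate wall-clock days for one test, assuming ~exponent iterations."""
    return exponent * msec_per_iter / 1000.0 / SECONDS_PER_DAY


# 128.99 msec/iter is the 5120K row of the Mlucas table above.
# Pairing it with exponent 92257213 is an assumption for illustration.
estimate = days_per_test(92_257_213, 128.99)
print(f"{estimate:.1f} days")
```

Under those assumptions the estimate comes out to roughly 138 days for a single test, which is the practical meaning of timings in this range.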

mfaktc 0.21 - really just compiled it and did a short test run, exponent 92257213 on various bit levels (65-66, 66-67, 67-68) seems to give about 29 GHz-d/day.

CUDALucas 2.06beta - I had some trouble compiling it, but the problem was solved by using the stub version of the nvidia-ml library. It runs, but complains a bit on startup. I knew the performance was going to be LOLworthy, so I only ran the cufftbench / threadbench for one FFT size:
2048K fft: 60.4402 ms per iteration (square: 32, splice: 256)

In summary, at that price this isn't a good buy if number crunching is the primary purpose. I got it for other uses though: I've been (ab)using it as a media/browser box in the living room. Works fine with Youtube etc. Much quicker and nicer to use than a Raspberry Pi, but that is hardly surprising. I didn't measure the actual power consumption, though. If it's really staying at 10W then the performance isn't completely terrible, but I suspect it goes a bit higher than that when using both the CPU and GPU at the same time.
2019-05-10, 22:45   #2
ewmayer
2ω=0
 
 
Sep 2002
República de California

9,791 Posts

Cool, especially if you have some 'real purpose' for the board and the GIMPS crunching is a bonus. But can it make phone calls? :) (The Mlucas timings you posted are similar to what I get from all-4-core running on one of my Galaxy S7s.)

By way of another micro-board comparison, the main (4-core A73) processor of my Odroid N2 gets timings ~20% faster than the ones you posted. So somewhat better on a bang-for-buck basis, but similar ballpark overall. And I've made no use of the GPU portion of the N2, no idea if that's useful for any GIMPS work.
2019-05-11, 01:52   #3
nomead
 
 
"Sam Laur"
Dec 2018
Turku, Finland

23·41 Posts

Quote:
Originally Posted by ewmayer
By way of another micro-board comparison, the main (4-core A73) processor of my Odroid N2 gets timings ~20% faster than the ones you posted. So somewhat better on a bang-for-buck basis, but similar ballpark overall. And I've made no use of the GPU portion of the N2, no idea if that's useful for any GIMPS work.
Yeah the A73 is a couple years newer, and faster (by clock speed) than the A57, so it is not surprising. Though the core was optimized for different uses (mobile / power efficiency), and the A72 should actually be a bit faster at the same clocks than the A73. Why do I say this? Well... the A72 has a longer pipeline, wider instruction decode, wider out-of-order dispatch, and finally, one execution port more. But if you compare the Odroid N1 and N2, things get a bit more complicated because of RAM differences, DDR3 vs. DDR4. Things also get complicated when comparing SoCs from big vendors (Qualcomm / Samsung / HiSilicon), because they have the license to make their own modifications to the core to suit their projected purposes.

Now, if the Jetson board was able to run at full Tegra X1 clock speeds (1.9 GHz instead of 1.428), things might be a bit different. But there's a list of available clock speed steps, and they've somehow locked things up so that setting any other speed won't stick. Too bad.
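
As a back-of-the-envelope illustration of what the higher clock might buy, the sketch below scales a measured timing by the clock ratio. It assumes iteration time is inversely proportional to CPU clock, which ignores memory bandwidth and is therefore optimistic. The function name is hypothetical.

```python
def scale_timing(msec_per_iter, clock_ghz, target_clock_ghz):
    """Idealized scaling: time is assumed inversely proportional to clock."""
    return msec_per_iter * clock_ghz / target_clock_ghz


# 49.13 msec/iter is the 2048K Mlucas timing measured at 1.428 GHz.
ideal = scale_timing(49.13, 1.428, 1.9)
print(f"2048K at 1.9 GHz (idealized): {ideal:.2f} msec/iter")
```

That works out to roughly 37 msec/iter at 2048K in the best case, so the locked clock steps leave a fair amount of performance on the table.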

Now that I've seen the performance, it is no wonder that the ARM server chips (Opteron A1100 series) that AMD made a few years back pretty much sank without a trace... they also had four or eight A57 cores.

I have some doubts about the N2 GPU section. The Mali-G52 (from the N2) is listed as 40.8 FP32 GFLOPS per core, so double that. The Maxwell cores still available on the Jetson Nano are 236 GFLOPS total. No idea about INT32 performance though. But probably not worth the effort. I actually lost any interest in adapting mfakto for the VideoCore IV on the Raspberry Pi because of this. That thing is supposed to be just 28.8 GFLOPS on the Pi 3. So it doesn't make any sense to put any effort into it, other than for academic / learning purposes. Maybe if there's nothing at all else to do at some point... but I feel that watching paint dry could be more entertaining.
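
The 236 GFLOPS figure can be reproduced from the specs in the first post, assuming each GPU core retires one fused multiply-add (counted as 2 FLOPs) per clock. The sketch also doubles the quoted per-core Mali-G52 figure as described above. Function and variable names are illustrative only.

```python
def peak_fp32_gflops(cores, clock_ghz, flops_per_cycle=2):
    """Theoretical peak: cores x FLOPs per core per clock x clock (GHz)."""
    return cores * flops_per_cycle * clock_ghz


nano = peak_fp32_gflops(128, 0.9216)  # 128 cores at 921.6 MHz
mali_g52 = 2 * 40.8                   # quoted per-core figure, doubled
videocore_iv = 28.8                   # figure quoted for the Pi 3

print(f"Jetson Nano: {nano:.1f} GFLOPS")
print(f"Nano vs Mali-G52: {nano / mali_g52:.1f}x")
print(f"Nano vs VideoCore IV: {nano / videocore_iv:.1f}x")
```

These are theoretical FP32 peaks only; they say nothing about INT32 throughput, which is what matters for trial factoring.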
All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.