mersenneforum.org Nvidia Jetson Nano

2019-05-09, 08:40   #1
nomead
"Sam Laur"
Dec 2018
Turku, Finland
23·41 Posts

Nvidia Jetson Nano

I finally received the Nvidia Jetson Nano devboard ($99; a bit more in the EU with tax and shipping) that I ordered back in March. Well, actually I got it a couple of weeks ago, but only now got around to putting it to use. It has a cut-down Tegra X1 with four Cortex-A57 cores running at 1.428 GHz, plus 128 CUDA cores in the GPU portion running at 921.6 MHz. Maxwell architecture, so it's compute capability 5.3 - getting a bit old by now.

So, there are three things I tested on it, just for fun I guess.

Mlucas 18.0 timings:
Code:
18.0
2048  msec/iter =  49.13  ROE[avg,max] = [0.000311985, 0.375000000]  radices = 128 16 16 32 0 0 0 0 0 0
2304  msec/iter =  55.65  ROE[avg,max] = [0.000273731, 0.375000000]  radices = 144 16 16 32 0 0 0 0 0 0
2560  msec/iter =  60.10  ROE[avg,max] = [0.000236003, 0.312500000]  radices = 160 16 16 32 0 0 0 0 0 0
2816  msec/iter =  68.26  ROE[avg,max] = [0.000259256, 0.343750000]  radices = 176 16 16 32 0 0 0 0 0 0
3072  msec/iter =  75.20  ROE[avg,max] = [0.000267585, 0.375000000]  radices = 192 16 16 32 0 0 0 0 0 0
3328  msec/iter =  80.51  ROE[avg,max] = [0.000282796, 0.375000000]  radices = 208 32 16 16 0 0 0 0 0 0
3584  msec/iter =  86.82  ROE[avg,max] = [0.000254826, 0.343750000]  radices = 224 16 16 32 0 0 0 0 0 0
3840  msec/iter =  93.43  ROE[avg,max] = [0.000247071, 0.312500000]  radices = 240 16 16 32 0 0 0 0 0 0
4096  msec/iter = 100.19  ROE[avg,max] = [0.000227303, 0.312500000]  radices = 256 16 16 32 0 0 0 0 0 0
4608  msec/iter = 112.94  ROE[avg,max] = [0.000248429, 0.312500000]  radices = 288 16 16 32 0 0 0 0 0 0
5120  msec/iter = 128.99  ROE[avg,max] = [0.000234485, 0.281250000]  radices = 160 32 32 16 0 0 0 0 0 0
5632  msec/iter = 146.71  ROE[avg,max] = [0.000257845, 0.343750000]  radices = 176 32 32 16 0 0 0 0 0 0
6144  msec/iter = 161.94  ROE[avg,max] = [0.000247003, 0.312500000]  radices = 192 32 32 16 0 0 0 0 0 0
6656  msec/iter = 172.41  ROE[avg,max] = [0.000266479, 0.375000000]  radices = 208 32 32 16 0 0 0 0 0 0
7168  msec/iter = 186.51  ROE[avg,max] = [0.000226100, 0.281250000]  radices = 224 32 32 16 0 0 0 0 0 0
7680  msec/iter = 200.75  ROE[avg,max] = [0.000236377, 0.312500000]  radices = 240 32 32 16 0 0 0 0 0 0
8192  msec/iter = 221.50  ROE[avg,max] = [0.000237378, 0.312500000]  radices = 256 32 32 16 0 0 0 0 0 0

As a side note, running stuff concurrently on the GPU doesn't affect Mlucas timings at all. I'm running it in the higher power mode, of course (10 W), and have installed a 40 mm fan on the heat sink. It is PWM-controlled, so in normal use it doesn't even spin up. Even running CPU+GPU at full load doesn't get the fan anywhere close to full RPM, so there's no detectable noise.

mfaktc 0.21 - I really just compiled it and did a short test run; exponent 92257213 at various bit levels (65-66, 66-67, 67-68) seems to give about 29 GHz-d/day.

CUDALucas 2.06beta - I had some trouble compiling it, but the problem was solved by linking against the stub version of the nvidia-ml library. It runs, but complains a bit on startup. I knew the performance was going to be LOLworthy, so I only ran the cufftbench / threadbench for one FFT size:
Code:
2048K fft: 60.4402 ms per iteration (square: 32, splice: 256)

In summary, at that price this won't be good for any number crunching if that's the primary purpose. I got it for other uses, though; I've been (ab)using it as a media/browser box in the living room. Works fine with YouTube etc. Much quicker and nicer to use than a Raspberry Pi, but that is hardly surprising. I didn't measure the actual power consumption; if it really stays at 10 W then the performance isn't completely terrible, but I suspect it goes a bit higher than that when loading both the CPU and GPU at the same time.
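As a rough sanity check on the GPU figures above, Maxwell's peak FP32 throughput is cores × clock × 2 (one fused multiply-add per core per clock). A quick sketch in Python, using only the numbers quoted in this post:

```python
# Peak FP32 estimate for the Jetson Nano's GPU, from the figures
# above: 128 Maxwell CUDA cores at 921.6 MHz, each capable of one
# fused multiply-add (2 FLOPs) per clock.
cuda_cores = 128
clock_ghz = 0.9216
flops_per_core_per_clock = 2  # one FMA = 2 FLOPs
peak_gflops = cuda_cores * clock_ghz * flops_per_core_per_clock
print(f"peak: {peak_gflops:.1f} GFLOPS")  # ~235.9 GFLOPS
```

That ~236 GFLOPS peak is of course a theoretical upper bound; real mfaktc/CUDALucas throughput will land well below it.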
2019-05-10, 22:45   #2
ewmayer
∂2ω=0
Sep 2002
República de California
9,791 Posts

Cool, especially if you have some 'real purpose' for the board and the GIMPS crunching is a bonus. But can it make phone calls? :)

(The Mlucas timings you posted are similar to what I get from all-4-core runs on one of my Galaxy S7s.)

By way of another micro-board comparison, the main (4-core A73) processor of my Odroid N2 gets timings ~20% faster than the ones you posted. So somewhat better on a bang-for-buck basis, but a similar ballpark overall. And I've made no use of the GPU portion of the N2; no idea if that's useful for any GIMPS work.
2019-05-11, 01:52   #3
nomead
"Sam Laur"
Dec 2018
Turku, Finland
23·41 Posts
Quote:
 Originally Posted by ewmayer By way of another micro-board comparison, the main (4-core A73) processor of my Odroid N2 gets timings ~20% faster than the ones you posted. So somewhat better on a bang-for-buck basis, but a similar ballpark overall. And I've made no use of the GPU portion of the N2; no idea if that's useful for any GIMPS work.
Yeah, the A73 is a couple of years newer and faster (at a given clock speed) than the A57, so that's not surprising. The core was optimized for different goals (mobile / power efficiency), though, and the A72 should actually be a bit faster than the A73 at the same clock. Why do I say this? Well... the A72 has a longer pipeline, wider instruction decode, wider out-of-order dispatch, and, finally, one more execution port. But if you compare the Odroid N1 and N2, things get a bit more complicated because of the RAM difference, DDR3 vs. DDR4. Things also get complicated when comparing SoCs from the big vendors (Qualcomm / Samsung / HiSilicon), because they hold licenses that let them modify the core to suit their projected purposes.

Now, if the Jetson board was able to run at full Tegra X1 clock speeds (1.9 GHz instead of 1.428), things might be a bit different. But there's a list of available clock speed steps, and they've somehow locked things up so that setting any other speed won't stick. Too bad.

Now that I've seen the performance, it's no wonder that the ARM server chips AMD made a few years back (the Opteron A1100 series) pretty much sank without a trace... they also had four or eight A57 cores.

I have some doubts about the N2's GPU section. The Mali-G52 (in the N2) is listed at 40.8 FP32 GFLOPS per core, so double that for the two-core part. The 128 Maxwell cores on the Jetson Nano come to about 236 GFLOPS total. No idea about INT32 performance, though. Probably not worth the effort. For the same reason I lost any interest in adapting mfakto for the VideoCore IV on the Raspberry Pi: that thing is supposed to manage just 28.8 GFLOPS on the Pi 3. So it doesn't make any sense to put effort into it, other than for academic / learning purposes. Maybe if there's nothing else at all to do at some point... but I feel that watching paint dry could be more entertaining.
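For what it's worth, the comparison above can be written out as a quick sketch. The GFLOPS figures are the ones quoted in this thread (the per-core Mali-G52 number doubled for the N2's two cores); the dictionary keys are just labels for illustration:

```python
# Peak FP32 figures quoted in this thread, normalized against the
# Raspberry Pi 3's VideoCore IV as the baseline.
gpus = {
    "Jetson Nano (128 Maxwell cores)": 236.0,
    "Odroid N2 (Mali-G52, 2 cores)": 2 * 40.8,   # 40.8 GFLOPS per core
    "Raspberry Pi 3 (VideoCore IV)": 28.8,
}
baseline = gpus["Raspberry Pi 3 (VideoCore IV)"]
for name, gflops in gpus.items():
    print(f"{name}: {gflops:5.1f} GFLOPS ({gflops / baseline:.1f}x Pi 3)")
```

On these numbers the Nano's GPU is roughly 8x the Pi 3 and about 3x the N2, which is why porting trial-factoring code to the smaller GPUs looks like a poor use of effort.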

