#1
"Sam Laur"
Dec 2018
Turku, Finland
317 Posts
I finally received the Nvidia Jetson Nano devboard ($99, a bit more in the EU with tax and shipping) that I ordered back in March. Well, actually I got it a couple of weeks ago, but I only now got around to putting it to use.
It has a cut-down Tegra X1 with four Cortex-A57 cores running at 1.428 GHz, plus 128 CUDA cores in the GPU portion running at 921.6 MHz. Maxwell architecture, so it's compute capability 5.3 - getting a bit old by now. So, there are three things I tested on it, just for fun I guess.
Mlucas 18.0 timings:
Code:
18.0
2048 msec/iter = 49.13 ROE[avg,max] = [0.000311985, 0.375000000] radices = 128 16 16 32 0 0 0 0 0 0
2304 msec/iter = 55.65 ROE[avg,max] = [0.000273731, 0.375000000] radices = 144 16 16 32 0 0 0 0 0 0
2560 msec/iter = 60.10 ROE[avg,max] = [0.000236003, 0.312500000] radices = 160 16 16 32 0 0 0 0 0 0
2816 msec/iter = 68.26 ROE[avg,max] = [0.000259256, 0.343750000] radices = 176 16 16 32 0 0 0 0 0 0
3072 msec/iter = 75.20 ROE[avg,max] = [0.000267585, 0.375000000] radices = 192 16 16 32 0 0 0 0 0 0
3328 msec/iter = 80.51 ROE[avg,max] = [0.000282796, 0.375000000] radices = 208 32 16 16 0 0 0 0 0 0
3584 msec/iter = 86.82 ROE[avg,max] = [0.000254826, 0.343750000] radices = 224 16 16 32 0 0 0 0 0 0
3840 msec/iter = 93.43 ROE[avg,max] = [0.000247071, 0.312500000] radices = 240 16 16 32 0 0 0 0 0 0
4096 msec/iter = 100.19 ROE[avg,max] = [0.000227303, 0.312500000] radices = 256 16 16 32 0 0 0 0 0 0
4608 msec/iter = 112.94 ROE[avg,max] = [0.000248429, 0.312500000] radices = 288 16 16 32 0 0 0 0 0 0
5120 msec/iter = 128.99 ROE[avg,max] = [0.000234485, 0.281250000] radices = 160 32 32 16 0 0 0 0 0 0
5632 msec/iter = 146.71 ROE[avg,max] = [0.000257845, 0.343750000] radices = 176 32 32 16 0 0 0 0 0 0
6144 msec/iter = 161.94 ROE[avg,max] = [0.000247003, 0.312500000] radices = 192 32 32 16 0 0 0 0 0 0
6656 msec/iter = 172.41 ROE[avg,max] = [0.000266479, 0.375000000] radices = 208 32 32 16 0 0 0 0 0 0
7168 msec/iter = 186.51 ROE[avg,max] = [0.000226100, 0.281250000] radices = 224 32 32 16 0 0 0 0 0 0
7680 msec/iter = 200.75 ROE[avg,max] = [0.000236377, 0.312500000] radices = 240 32 32 16 0 0 0 0 0 0
8192 msec/iter = 221.50 ROE[avg,max] = [0.000237378, 0.312500000] radices = 256 32 32 16 0 0 0 0 0 0
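To put those per-iteration timings in perspective, here is a minimal Python sketch that parses lines in the format shown above and estimates the wall-clock time for one full primality test. The helper names (`parse_timings`, `estimated_days`) are made up for illustration, and pairing the 5120K FFT length with exponent 92257213 is an assumption, not something Mlucas reported here. It also assumes roughly one iteration per bit of the exponent.

```python
import re

# Matches lines such as "2048 msec/iter = 49.13 ROE[avg,max] = ..."
TIMING_LINE = re.compile(r"^\s*(\d+)\s+msec/iter\s*=\s*([\d.]+)")

def parse_timings(cfg_text):
    """Return {fft_length_in_K: msec_per_iteration} from mlucas.cfg-style text."""
    timings = {}
    for line in cfg_text.splitlines():
        match = TIMING_LINE.match(line)
        if match:
            timings[int(match.group(1))] = float(match.group(2))
    return timings

def estimated_days(msec_per_iter, iterations):
    """Convert a per-iteration time and an iteration count into days."""
    return msec_per_iter * iterations / 1000.0 / 86400.0

# Two of the lines from the table above, plus the version header.
sample = """18.0
2048 msec/iter = 49.13 ROE[avg,max] = [0.000311985, 0.375000000] radices = 128 16 16 32 0 0 0 0 0 0
5120 msec/iter = 128.99 ROE[avg,max] = [0.000234485, 0.281250000] radices = 160 32 32 16 0 0 0 0 0 0
"""

timings = parse_timings(sample)
exponent = 92257213   # the exponent used in the mfaktc run
fft_k = 5120          # ASSUMPTION: FFT length that would be used for this exponent
days = estimated_days(timings[fft_k], exponent)
print(f"{fft_k}K FFT at {timings[fft_k]} ms/iter: about {days:.0f} days")
```

Under those assumptions a single test at this size would keep the board busy for several months.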
mfaktc 0.21 - really just compiled it and did a short test run. Exponent 92257213 on various bit levels (65-66, 66-67, 67-68) seems to give about 29 GHz-d/day.
CUDALucas 2.06beta - I had some trouble compiling it, but the problem was solved by using the stub version of the nvidia-ml library. It runs, but complains a bit on startup. I knew the performance was going to be LOLworthy, so I only ran the cufftbench / threadbench for one FFT size:
2048K fft: 60.4402 ms per iteration (square: 32, splice: 256)
In summary, this won't be good for any number crunching if that's the primary purpose, not for that price. I got it for other uses though; I've been (ab)using it as a media/browser box in the living room. Works fine with YouTube etc. Much quicker and nicer to use than a Raspberry Pi, but that is hardly surprising. I didn't measure the actual power consumption though. If it's really staying at 10W then the performance isn't completely terrible, but I suspect it goes a bit higher than that when using both the CPU and GPU at the same time.
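A back-of-the-envelope check on those figures, as a small Python sketch. It compares the CUDALucas and Mlucas timings at the 2048K FFT length and works out mfaktc throughput per watt. The 10 W figure is the nominal one mentioned above, not a measurement, and treating the two 2048K timings as directly comparable is an assumption.

```python
# Figures taken from the post above.
CPU_MS_2048K = 49.13         # Mlucas 18.0, msec/iter at 2048K
GPU_MS_2048K = 60.4402       # CUDALucas 2.06beta, ms/iter at 2048K
MFAKTC_GHZD_PER_DAY = 29.0   # approximate mfaktc throughput
NOMINAL_WATTS = 10.0         # ASSUMPTION: nominal power, not measured

def slowdown(gpu_ms, cpu_ms):
    """How many times slower the GPU run is per iteration than the CPU run."""
    return gpu_ms / cpu_ms

def per_watt(throughput, watts):
    """Throughput divided by power draw."""
    return throughput / watts

gpu_vs_cpu = slowdown(GPU_MS_2048K, CPU_MS_2048K)
efficiency = per_watt(MFAKTC_GHZD_PER_DAY, NOMINAL_WATTS)

print(f"CUDALucas is {gpu_vs_cpu:.2f}x slower than Mlucas at 2048K")
print(f"mfaktc: {efficiency:.1f} GHz-d/day per watt at {NOMINAL_WATTS:.0f} W")
```

If the real draw is higher than 10 W with CPU and GPU both loaded, the per-watt figure drops proportionally.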
#2
∂²ω=0
Sep 2002
República de California
2²×2,939 Posts
Cool, especially if you have some 'real purpose' for the board and the GIMPS crunching is a bonus. But can it make phone calls? :) (The Mlucas timings you posted are similar to what I get from all-4-core running on one of my Galaxy S7s.)
By way of another micro-board comparison, the main (4-core A73) processor of my Odroid N2 gets timings ~20% faster than the ones you posted. So somewhat better on a bang-for-buck basis, but similar ballpark overall. And I've made no use of the GPU portion of the N2, no idea if that's useful for any GIMPS work.
#3
"Sam Laur"
Dec 2018
Turku, Finland
317 Posts
Now, if the Jetson board was able to run at full Tegra X1 clock speeds (1.9 GHz instead of 1.428), things might be a bit different. But there's a list of available clock speed steps, and they've somehow locked things up so that setting any other speed won't stick. Too bad. Now that I've seen the performance, it is no wonder that the ARM server chips (Opteron A1100 series) that AMD made a few years back pretty much sank without a trace... they also had four or eight A57 cores.
I have some doubts about the N2 GPU section. The Mali-G52 (from the N2) is listed as 40.8 FP32 GFLOPS per core, so double that. The Maxwell cores still available on the Jetson Nano are 236 GFLOPS total. No idea about INT32 performance though. But probably not worth the effort.
I actually lost any interest in adapting mfakto for the VideoCore IV on the Raspberry Pi because of this. That thing is supposed to be just 28.8 GFLOPS on the Pi 3. So it doesn't make any sense to put any effort into it, other than for academic / learning purposes. Maybe if there's nothing at all else to do at some point... but I feel that watching paint dry could be more entertaining.
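The GFLOPS figures quoted in this post can be roughly reproduced with a few lines of Python. This is only a sketch: it assumes the usual peak-rate convention of 2 FP32 operations per core per cycle (one fused multiply-add), and the function name `peak_fp32_gflops` is made up for illustration. The Mali and VideoCore numbers are taken as quoted, not derived.

```python
def peak_fp32_gflops(cores, clock_ghz, flops_per_cycle=2):
    """Theoretical peak FP32 rate; ASSUMES one FMA (2 flops) per core per cycle."""
    return cores * flops_per_cycle * clock_ghz

# Jetson Nano GPU: 128 Maxwell cores at 921.6 MHz (from the first post).
jetson = peak_fp32_gflops(cores=128, clock_ghz=0.9216)

# Figures as quoted in this post.
mali_g52 = 2 * 40.8   # 40.8 GFLOPS per core, doubled
videocore_iv = 28.8   # Raspberry Pi 3

# CPU clock headroom the Nano does not expose: 1.9 GHz vs 1.428 GHz.
clock_ratio = 1.9 / 1.428

print(f"Jetson Nano GPU: {jetson:.1f} GFLOPS")
print(f"vs Mali-G52:     {jetson / mali_g52:.1f}x")
print(f"vs VideoCore IV: {jetson / videocore_iv:.1f}x")
print(f"Full-speed Tegra X1 CPU clock would be {clock_ratio:.2f}x higher")
```

These are theoretical peaks only. As noted above, INT32 throughput, which matters for trial factoring, may look quite different.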