TX2

System environment

JetPack: v3.0
CUDA: 8.0
cuDNN: 5.1.10

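Building and installing Bazel

Bazel is a build tool developed by Google. It plays a role similar to make and Maven and is notable for its speed; building TensorFlow requires it.

Installing Bazel on the TX2 requires a small change to the Bazel source so that it recognizes the platform. After downloading the source, edit "bazel/src/main/java/com/google/devtools/build/lib/util/CPU.java" as follows: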

public enum CPU {
  X86_32("x86_32", ImmutableSet.of("i386", "i486", "i586", "i686", "i786", "x86")),
  X86_64("x86_64", ImmutableSet.of("amd64", "x86_64", "x64")),
  PPC("ppc", ImmutableSet.of("ppc", "ppc64", "ppc64le")),
-  ARM("arm", ImmutableSet.of("arm", "armv7l")),
+  ARM("arm", ImmutableSet.of("arm", "armv7l", "aarch64")),
  S390X("s390x", ImmutableSet.of("s390x", "s390")),
  UNKNOWN("unknown", ImmutableSet.<String>of());
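
To see what the added alias changes, the lookup can be mimicked outside Bazel. The following is a minimal Python sketch, not Bazel's actual code: it assumes Bazel matches the reported architecture string against these alias sets and that the TX2 reports "aarch64". The alias table is copied from the patched enum above; `CPU_ALIASES` and `canonical_cpu` are hypothetical names.

```python
import platform

# Alias sets copied from the patched CPU.java enum above.
CPU_ALIASES = {
    "x86_32": {"i386", "i486", "i586", "i686", "i786", "x86"},
    "x86_64": {"amd64", "x86_64", "x64"},
    "ppc": {"ppc", "ppc64", "ppc64le"},
    "arm": {"arm", "armv7l", "aarch64"},  # "aarch64" is the alias added by the patch
    "s390x": {"s390x", "s390"},
}


def canonical_cpu(arch, aliases=CPU_ALIASES):
    """Return the canonical CPU name for `arch`, or "unknown" if no alias set lists it."""
    for name, names in aliases.items():
        if arch in names:
            return name
    return "unknown"


if __name__ == "__main__":
    machine = platform.machine()
    print("{} -> {}".format(machine, canonical_cpu(machine)))
```

With the unpatched table (no "aarch64" entry under "arm"), the same lookup falls through to "unknown", which is the situation the patch avoids.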

After making the change, run "compile.sh" in the source directory to build Bazel, then copy the resulting binary into place:

$ sudo cp output/bazel /usr/local/bin

Installing TensorFlow

Downloading the TensorFlow source

At the time of writing, TensorFlow had reached v1.3. Download the release source:

$ wget https://github.com/tensorflow/tensorflow/archive/v1.3.0.tar.gz

Building TensorFlow

Running configure

First, configure the build environment:

nvidia@tegra-ubuntu:~/tensorflow/tensorflow-1.3.0$ ./configure
You have bazel 0.4.5- installed.
Please specify the location of python. [Default is /usr/bin/python]:
Found possible Python library paths:
  /usr/local/lib/python2.7/dist-packages
  /usr/lib/python2.7/dist-packages
Please input the desired Python library path to use.  Default is [/usr/local/lib/python2.7/dist-packages]

Using python library path: /usr/local/lib/python2.7/dist-packages
Do you wish to build TensorFlow with MKL support? [y/N]
No MKL support will be enabled for TensorFlow
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Do you wish to use jemalloc as the malloc implementation? [Y/n]
jemalloc enabled
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N]
No Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N]
No XLA support will be enabled for TensorFlow
Do you wish to build TensorFlow with VERBS support? [y/N]
No VERBS support will be enabled for TensorFlow
Do you wish to build TensorFlow with OpenCL support? [y/N]
No OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] y
CUDA support will be enabled for TensorFlow
Do you want to use clang as CUDA compiler? [y/N]
nvcc will be used as CUDA compiler
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 8.0]: 8.0
Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 6.0]: 5.1.10
Please specify the location where cuDNN 5.1.10 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
./configure: line 669: /usr/local/cuda/extras/demo_suite/deviceQuery: No such file or directory
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: 6.2
Do you wish to build TensorFlow with MPI support? [y/N]
MPI support will not be enabled for TensorFlow
Configuration finished

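The main point worth explaining here is how to choose the "compute capability". The default is "3.5,5.2", but the correct value for the device can be looked up with a program that ships with JetPack: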

nvidia@tegra-ubuntu:~/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery$ ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GP10B"
  CUDA Driver Version / Runtime Version          8.5 / 8.0
  CUDA Capability Major/Minor version number:    6.2
  Total amount of global memory:                 7854 MBytes (8235577344 bytes)
  ( 2) Multiprocessors, (128) CUDA Cores/MP:     256 CUDA Cores
  GPU Max Clock rate:                            1301 MHz (1.30 GHz)
  Memory Clock rate:                             13 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 524288 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.5, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GP10B
Result = PASS

Note the following line in the output above; it gives the "compute capability":

CUDA Capability Major/Minor version number:    6.2
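
If the value needs to be picked up by a script rather than read by eye, the line can be extracted with a regular expression. This is a minimal Python sketch that assumes the deviceQuery output format shown above; `parse_compute_capability` is a hypothetical helper name.

```python
import re


def parse_compute_capability(device_query_output):
    """Return the compute capability as a string such as "6.2", or None if not found."""
    match = re.search(
        r"CUDA Capability Major/Minor version number:\s*(\d+)\.(\d+)",
        device_query_output)
    if match is None:
        return None
    return "{}.{}".format(match.group(1), match.group(2))


if __name__ == "__main__":
    sample = "  CUDA Capability Major/Minor version number:    6.2"
    print(parse_compute_capability(sample))
```

On the device, the input text could come from `subprocess.check_output(["./deviceQuery"])` run in the samples directory and decoded to a string.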

Compiling TensorFlow

Run the following command to build, with CUDA enabled:

nvidia@tegra-ubuntu:~/tensorflow/tensorflow-1.3.0$ bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

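Generating the pip package

When the build finishes, it produces the pip package generation script "./bazel-bin/tensorflow/tools/pip_package/build_pip_package". Run this script to generate the pip package: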

nvidia@tegra-ubuntu:~/tensorflow/tensorflow-1.3.0$ ./bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/tensorflow
Wed Sep 13 14:41:13 UTC 2017 : === Using tmpdir: /tmp/tmp.F109O2nAzd
~/tensorflow/tensorflow-1.3.0/bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles ~/tensorflow/tensorflow-1.3.0
~/tensorflow/tensorflow-1.3.0
/tmp/tmp.F109O2nAzd ~/tensorflow/tensorflow-1.3.0
Wed Sep 13 14:41:20 UTC 2017 : === Building wheel
warning: no files found matching '*.dll' under directory '*'
warning: no files found matching '*.lib' under directory '*'
warning: no files found matching '*.h' under directory 'tensorflow/include/tensorflow'
warning: no files found matching '*' under directory 'tensorflow/include/Eigen'
warning: no files found matching '*' under directory 'tensorflow/include/external'
warning: no files found matching '*.h' under directory 'tensorflow/include/google'
warning: no files found matching '*' under directory 'tensorflow/include/third_party'
warning: no files found matching '*' under directory 'tensorflow/include/unsupported'
~/tensorflow/tensorflow-1.3.0
Wed Sep 13 14:41:52 UTC 2017 : === Output wheel file is in: /home/nvidia/tensorflow

Installing the TensorFlow package

Run the following command to install it:

$ pip install tensorflow-1.3.0-cp27-cp27mu-linux_aarch64.whl
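
The wheel file name encodes which interpreter and platform the package was built for, which helps when pip refuses to install it. The following is a small Python sketch that splits the name according to the wheel naming convention (PEP 427); `parse_wheel_filename` is a hypothetical helper, not part of pip.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its PEP 427 components."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(suffix)].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        name, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected wheel file name: " + filename)
    return {
        "name": name,
        "version": version,
        "build": build_tag,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


if __name__ == "__main__":
    print(parse_wheel_filename("tensorflow-1.3.0-cp27-cp27mu-linux_aarch64.whl"))
```

For the file above this gives the Python tag "cp27" (CPython 2.7), the ABI tag "cp27mu", and the platform tag "linux_aarch64", so the wheel should be installed with the pip that belongs to Python 2.7 on the TX2 itself.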

Handling build errors caused by Eigen

Several problems came up while building TensorFlow, mainly caused by Eigen.

Error 1: Jacobi.h has no member named 'pmul'

...

In file included from external/eigen_archive/Eigen/Jacobi:27:0,
                 from external/eigen_archive/Eigen/Eigenvalues:16,
                 from ./third_party/eigen3/Eigen/Eigenvalues:1,
                 from tensorflow/core/kernels/self_adjoint_eig_v2_op.cc:19:
external/eigen_archive/Eigen/src/Jacobi/Jacobi.h: In instantiation of 'void Eigen::internal::apply_rotation_in_the_plane(Eigen::DenseBase<Derived>&, Eigen::DenseBase<Derived>&, const Eigen::JacobiRotation<OtherScalar>&) [with VectorX = Eigen::Block<Eigen::Map<Eigen::Matrix<std::complex<float>, -1, -1>, 0, Eigen::Stride<0, 0> >, -1, 1, true>; VectorY = Eigen::Block<Eigen::Map<Eigen::Matrix<std::complex<float>, -1, -1>, 0, Eigen::Stride<0, 0> >, -1, 1, true>; OtherScalar = float]':
external/eigen_archive/Eigen/src/Jacobi/Jacobi.h:297:40:   required from 'void Eigen::MatrixBase<Derived>::applyOnTheRight(Eigen::Index, Eigen::Index, const Eigen::JacobiRotation<OtherScalar>&) [with OtherScalar = float; Derived = Eigen::Map<Eigen::Matrix<std::complex<float>, -1, -1>, 0, Eigen::Stride<0, 0> >; Eigen::Index = long int]'
external/eigen_archive/Eigen/src/Eigenvalues/SelfAdjointEigenSolver.h:861:7:   required from 'void Eigen::internal::tridiagonal_qr_step(RealScalar*, RealScalar*, Index, Index, Scalar*, Index) [with int StorageOrder = 0; RealScalar = float; Scalar = std::complex<float>; Index = long int]'
external/eigen_archive/Eigen/src/Eigenvalues/SelfAdjointEigenSolver.h:520:87:   required from 'Eigen::ComputationInfo Eigen::internal::computeFromTridiagonal_impl(DiagType&, SubDiagType&, Eigen::Index, bool, MatrixType&) [with MatrixType = Eigen::Matrix<std::complex<float>, -1, -1>; DiagType = Eigen::Matrix<float, -1, 1>; SubDiagType = Eigen::Matrix<float, -1, 1>; Eigen::Index = long int]'

....

external/eigen_archive/Eigen/src/Jacobi/Jacobi.h:386:35: error: 'struct Eigen::internal::conj_helper<__vector(4) float, Eigen::internal::Packet2cf, false, false>' has no member named 'pmul'
external/eigen_archive/Eigen/src/Jacobi/Jacobi.h:415:22: error: 'struct Eigen::internal::conj_helper<__vector(4) float, Eigen::internal::Packet2cf, false, false>' has no member named 'pmul'
       pstore(px, padd(pm.pmul(pc,xi),pcj.pmul(ps,yi)));
                      ^
external/eigen_archive/Eigen/src/Jacobi/Jacobi.h:415:22: error: 'struct Eigen::internal::conj_helper<__vector(4) float, Eigen::internal::Packet2cf, false, false>' has no member named 'pmul'
external/eigen_archive/Eigen/src/Jacobi/Jacobi.h:416:22: error: 'struct Eigen::internal::conj_helper<__vector(4) float, Eigen::internal::Packet2cf, false, false>' has no member named 'pmul'
       pstore(py, psub(pcj.pmul(pc,yi),pm.pmul(ps,xi)));
                      ^
external/eigen_archive/Eigen/src/Jacobi/Jacobi.h:416:22: error: 'struct Eigen::internal::conj_helper<__vector(4) float, Eigen::internal::Packet2cf, false, false>' has no member named 'pmul'
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 2366.425s, Critical Path: 2221.96s

Error 2: wrong arguments to NonBlockingThreadPoolTempl() in tensorflow/core/lib/core/threadpool.cc

ERROR: /home/nvidia/tensorflow/tensorflow-1.3.0/tensorflow/core/BUILD:1244:1: C++ compilation of rule '//tensorflow/core:lib_internal' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter ... (remaining 115 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
tensorflow/core/lib/core/threadpool.cc: In constructor 'tensorflow::thread::ThreadPool::Impl::Impl(tensorflow::Env*, const tensorflow::ThreadOptions&, const string&, int, bool)':
tensorflow/core/lib/core/threadpool.cc:91:56: error: no matching function for call to 'Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::NonBlockingThreadPoolTempl(int&, bool&, tensorflow::thread::EigenEnvironment)'
             EigenEnvironment(env, thread_options, name)) {}
                                                        ^
In file included from external/eigen_archive/unsupported/Eigen/CXX11/ThreadPool:58:0,
                 from external/eigen_archive/unsupported/Eigen/CXX11/Tensor:72,
                 from ./third_party/eigen3/unsupported/Eigen/CXX11/Tensor:1,
                 from tensorflow/core/lib/core/threadpool.cc:19:
external/eigen_archive/unsupported/Eigen/CXX11/src/ThreadPool/NonBlockingThreadPool.h:22:3: note: candidate: Eigen::NonBlockingThreadPoolTempl<Environment>::NonBlockingThreadPoolTempl(int, Environment) [with Environment = tensorflow::thread::EigenEnvironment]
   NonBlockingThreadPoolTempl(int num_threads, Environment env = Environment())
   ^
external/eigen_archive/unsupported/Eigen/CXX11/src/ThreadPool/NonBlockingThreadPool.h:22:3: note:   candidate expects 2 arguments, 3 provided
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 292.433s, Critical Path: 68.80s

The fix is to use the right version of Eigen: Error 1 is resolved by Eigen v3.3.4, while Error 2 requires the latest Eigen.

To apply the fix, edit "tensorflow/workspace.bzl" and point it at the latest Eigen:

  native.new_http_archive(
      name = "eigen_archive",
      #urls = [
      #    "http://mirror.bazel.build/bitbucket.org/eigen/eigen/get/f3a22f35b044.tar.gz",
      #    "https://bitbucket.org/eigen/eigen/get/f3a22f35b044.tar.gz",
      #],
      #sha256 = "ca7beac153d4059c02c8fc59816c82d54ea47fe58365e8aded4082ded0b820c4",
      #strip_prefix = "eigen-eigen-f3a22f35b044",
      urls = [
          "https://bitbucket.org/eigen/eigen/get/tip.tar.gz",
      ],
      sha256 = "6fe7af8244ab5d9c314a26bc8615adc61269896cfd66f1ae2cce3d6ee91a5b88",
      strip_prefix = "eigen-eigen-034fba127699",
      build_file = str(Label("//third_party:eigen.BUILD")),
  )

The "sha256" and "strip_prefix" values must be updated to match the new Eigen archive.
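
Both values can be read off a downloaded copy of the archive. The following is a minimal Python sketch, assuming the tarball has already been saved locally (the file name "tip.tar.gz" in the usage line is only an example) and that it unpacks into a single top-level directory; `sha256_of_file` and `top_level_dir` are hypothetical helper names.

```python
import hashlib
import tarfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def top_level_dir(path):
    """Return the single top-level directory of a .tar.gz archive."""
    with tarfile.open(path, "r:gz") as archive:
        roots = {name.split("/")[0] for name in archive.getnames() if name}
    if len(roots) != 1:
        raise ValueError("expected one top-level directory, found: %s" % sorted(roots))
    return roots.pop()


if __name__ == "__main__":
    import sys
    archive_path = sys.argv[1] if len(sys.argv) > 1 else "tip.tar.gz"
    print("sha256 =", sha256_of_file(archive_path))
    print("strip_prefix =", top_level_dir(archive_path))
```

Because the "tip" URL follows the newest Eigen commit, both values are likely to change whenever the archive is downloaded again, so they should be recomputed from the same file that the build will fetch.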

Building and installing TensorFlow on the Nvidia Jetson TX2

singleye
