Building and installing TensorFlow on the Nvidia Jetson TX2
System environment
- JetPack: v3.0
- CUDA: 8.0
- cuDNN: 5.1.10
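Building and installing bazel
bazel is a build tool developed by Google. It plays a role similar to make and Maven, is notable for its speed, and is required to build TensorFlow.
Installing bazel on the TX2 requires a small change to the bazel source so that it recognizes the platform. After downloading the source, edit "bazel/src/main/java/com/google/devtools/build/lib/util/CPU.java" and add "aarch64" to the ARM entry, as follows: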
public enum CPU {
X86_32("x86_32", ImmutableSet.of("i386", "i486", "i586", "i686", "i786", "x86")),
X86_64("x86_64", ImmutableSet.of("amd64", "x86_64", "x64")),
PPC("ppc", ImmutableSet.of("ppc", "ppc64", "ppc64le")),
- ARM("arm", ImmutableSet.of("arm", "armv7l")),
+ ARM("arm", ImmutableSet.of("arm", "armv7l", "aarch64")),
S390X("s390x", ImmutableSet.of("s390x", "s390")),
UNKNOWN("unknown", ImmutableSet.<String>of());
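After making the change, run "compile.sh" in the source directory to build bazel, then copy the resulting binary to a directory on the executable path: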
$ sudo cp output/bazel /usr/local/bin
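To see why the extra alias matters, the lookup can be modelled in a few lines of Python. This is only an illustrative re-expression of the alias table from the patched enum above, not bazel's actual code; the names CPU_ALIASES and canonical_cpu are hypothetical. On the TX2, platform.machine() is expected to report "aarch64", consistent with the linux_aarch64 wheel produced later in this post.

```python
import platform

# Alias table copied from the patched CPU.java excerpt above.
# CPU_ALIASES and canonical_cpu are illustrative names, not part of bazel.
CPU_ALIASES = {
    "x86_32": {"i386", "i486", "i586", "i686", "i786", "x86"},
    "x86_64": {"amd64", "x86_64", "x64"},
    "ppc": {"ppc", "ppc64", "ppc64le"},
    "arm": {"arm", "armv7l", "aarch64"},  # "aarch64" is the alias added by the patch
    "s390x": {"s390x", "s390"},
}


def canonical_cpu(arch, aliases=CPU_ALIASES):
    """Return the canonical CPU name for an architecture string, or "unknown"."""
    for name, names in aliases.items():
        if arch in names:
            return name
    return "unknown"


if __name__ == "__main__":
    arch = platform.machine()
    print("{} -> {}".format(arch, canonical_cpu(arch)))
```

With the unpatched table (no "aarch64" in the ARM entry) the same lookup falls through to "unknown", which is the situation the patch avoids.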
Installing TensorFlow
Downloading the TensorFlow source
At the time of writing, TensorFlow had reached v1.3. Download the release source archive:
$ wget https://github.com/tensorflow/tensorflow/archive/v1.3.0.tar.gz
Building TensorFlow
- Run configure
First, configure the build environment:
nvidia@tegra-ubuntu:~/tensorflow/tensorflow-1.3.0$ ./configure
You have bazel 0.4.5- installed.
Please specify the location of python. [Default is /usr/bin/python]:
Found possible Python library paths:
/usr/local/lib/python2.7/dist-packages
/usr/lib/python2.7/dist-packages
Please input the desired Python library path to use. Default is [/usr/local/lib/python2.7/dist-packages]
Using python library path: /usr/local/lib/python2.7/dist-packages
Do you wish to build TensorFlow with MKL support? [y/N]
No MKL support will be enabled for TensorFlow
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Do you wish to use jemalloc as the malloc implementation? [Y/n]
jemalloc enabled
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N]
No Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N]
No XLA support will be enabled for TensorFlow
Do you wish to build TensorFlow with VERBS support? [y/N]
No VERBS support will be enabled for TensorFlow
Do you wish to build TensorFlow with OpenCL support? [y/N]
No OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] y
CUDA support will be enabled for TensorFlow
Do you want to use clang as CUDA compiler? [y/N]
nvcc will be used as CUDA compiler
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 8.0]: 8.0
Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 6.0]: 5.1.10
Please specify the location where cuDNN 5.1.10 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
./configure: line 669: /usr/local/cuda/extras/demo_suite/deviceQuery: No such file or directory
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: 6.2
Do you wish to build TensorFlow with MPI support? [y/N]
MPI support will not be enabled for TensorFlow
Configuration finished
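The main setting worth explaining here is the "compute capability". The default is "3.5,5.2", but the correct value for your device can be found by running the deviceQuery program from the CUDA samples that ship with JetPack: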
nvidia@tegra-ubuntu:~/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery$ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GP10B"
CUDA Driver Version / Runtime Version 8.5 / 8.0
CUDA Capability Major/Minor version number: 6.2
Total amount of global memory: 7854 MBytes (8235577344 bytes)
( 2) Multiprocessors, (128) CUDA Cores/MP: 256 CUDA Cores
GPU Max Clock rate: 1301 MHz (1.30 GHz)
Memory Clock rate: 13 Mhz
Memory Bus Width: 64-bit
L2 Cache Size: 524288 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: Yes
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.5, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GP10B
Result = PASS
Note the following line in the output above; its value is the "compute capability":
CUDA Capability Major/Minor version number: 6.2
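If you would rather not read the value off by eye, a small script can extract it from captured deviceQuery output. The sketch below assumes the output format shown above; the function name compute_capability and the trimmed sample text are illustrative only.

```python
import re


def compute_capability(device_query_output):
    """Return every "major.minor" capability found in deviceQuery output."""
    pattern = re.compile(
        r"CUDA Capability Major/Minor version number:\s*(\d+\.\d+)"
    )
    return pattern.findall(device_query_output)


# A trimmed copy of the deviceQuery output shown above.
sample = """Device 0: "GP10B"
CUDA Driver Version / Runtime Version 8.5 / 8.0
CUDA Capability Major/Minor version number: 6.2
"""

# Join with commas, the form the configure prompt asks for.
print(",".join(compute_capability(sample)))
```

The function returns a list because deviceQuery prints one such line per device; joined with commas, the result matches the comma-separated form that the configure prompt asks for.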
- Build TensorFlow
Run the following command to build, with CUDA enabled:
nvidia@tegra-ubuntu:~/tensorflow$ bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
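- Generate the pip package
When the build finishes, it produces a packaging script, "./bazel-bin/tensorflow/tools/pip_package/build_pip_package". Run this script to generate the pip package; its argument is the directory where the wheel is written: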
nvidia@tegra-ubuntu:~/tensorflow/tensorflow-1.3.0$ ./bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/tensorflow
Wed Sep 13 14:41:13 UTC 2017 : === Using tmpdir: /tmp/tmp.F109O2nAzd
~/tensorflow/tensorflow-1.3.0/bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles ~/tensorflow/tensorflow-1.3.0
~/tensorflow/tensorflow-1.3.0
/tmp/tmp.F109O2nAzd ~/tensorflow/tensorflow-1.3.0
Wed Sep 13 14:41:20 UTC 2017 : === Building wheel
warning: no files found matching '*.dll' under directory '*'
warning: no files found matching '*.lib' under directory '*'
warning: no files found matching '*.h' under directory 'tensorflow/include/tensorflow'
warning: no files found matching '*' under directory 'tensorflow/include/Eigen'
warning: no files found matching '*' under directory 'tensorflow/include/external'
warning: no files found matching '*.h' under directory 'tensorflow/include/google'
warning: no files found matching '*' under directory 'tensorflow/include/third_party'
warning: no files found matching '*' under directory 'tensorflow/include/unsupported'
~/tensorflow/tensorflow-1.3.0
Wed Sep 13 14:41:52 UTC 2017 : === Output wheel file is in: /home/nvidia/tensorflow
- Install TensorFlow
Run the following command to install it:
$ pip install tensorflow-1.3.0-cp27-cp27mu-linux_aarch64.whl
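The wheel's filename records which interpreter and platform it was built for, which is worth checking before copying it to another machine. The sketch below splits the name into the fields defined by the wheel naming convention (PEP 427), assuming no optional build tag is present; parse_wheel_name is an illustrative helper, not part of pip.

```python
def parse_wheel_name(filename):
    """Split a wheel filename into its PEP 427 fields (no build tag assumed)."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError("not a wheel file: %s" % filename)
    parts = filename[:-len(suffix)].split("-")
    if len(parts) != 5:
        raise ValueError("unexpected number of fields in: %s" % filename)
    keys = ("distribution", "version", "python_tag", "abi_tag", "platform_tag")
    return dict(zip(keys, parts))


info = parse_wheel_name("tensorflow-1.3.0-cp27-cp27mu-linux_aarch64.whl")
for key in ("distribution", "version", "python_tag", "abi_tag", "platform_tag"):
    print("%s: %s" % (key, info[key]))
```

Here cp27 means CPython 2.7, cp27mu is the matching ABI (pymalloc, wide unicode), and linux_aarch64 is the 64-bit ARM Linux platform of the TX2.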
Handling build errors caused by eigen
A few problems came up while building TensorFlow, mainly caused by eigen.
Error 1: Jacobi.h has no member named 'pmul'
...
In file included from external/eigen_archive/Eigen/Jacobi:27:0,
from external/eigen_archive/Eigen/Eigenvalues:16,
from ./third_party/eigen3/Eigen/Eigenvalues:1,
from tensorflow/core/kernels/self_adjoint_eig_v2_op.cc:19:
external/eigen_archive/Eigen/src/Jacobi/Jacobi.h: In instantiation of 'void Eigen::internal::apply_rotation_in_the_plane(Eigen::DenseBase<Derived>&, Eigen::DenseBase<Derived>&, const Eigen::JacobiRotation<OtherScalar>&) [with VectorX = Eigen::Block<Eigen::Map<Eigen::Matrix<std::complex<float>, -1, -1>, 0, Eigen::Stride<0, 0> >, -1, 1, true>; VectorY = Eigen::Block<Eigen::Map<Eigen::Matrix<std::complex<float>, -1, -1>, 0, Eigen::Stride<0, 0> >, -1, 1, true>; OtherScalar = float]':
external/eigen_archive/Eigen/src/Jacobi/Jacobi.h:297:40: required from 'void Eigen::MatrixBase<Derived>::applyOnTheRight(Eigen::Index, Eigen::Index, const Eigen::JacobiRotation<OtherScalar>&) [with OtherScalar = float; Derived = Eigen::Map<Eigen::Matrix<std::complex<float>, -1, -1>, 0, Eigen::Stride<0, 0> >; Eigen::Index = long int]'
external/eigen_archive/Eigen/src/Eigenvalues/SelfAdjointEigenSolver.h:861:7: required from 'void Eigen::internal::tridiagonal_qr_step(RealScalar*, RealScalar*, Index, Index, Scalar*, Index) [with int StorageOrder = 0; RealScalar = float; Scalar = std::complex<float>; Index = long int]'
external/eigen_archive/Eigen/src/Eigenvalues/SelfAdjointEigenSolver.h:520:87: required from 'Eigen::ComputationInfo Eigen::internal::computeFromTridiagonal_impl(DiagType&, SubDiagType&, Eigen::Index, bool, MatrixType&) [with MatrixType = Eigen::Matrix<std::complex<float>, -1, -1>; DiagType = Eigen::Matrix<float, -1, 1>; SubDiagType = Eigen::Matrix<float, -1, 1>; Eigen::Index = long int]'
....
external/eigen_archive/Eigen/src/Jacobi/Jacobi.h:386:35: error: 'struct Eigen::internal::conj_helper<__vector(4) float, Eigen::internal::Packet2cf, false, false>' has no member named 'pmul'
external/eigen_archive/Eigen/src/Jacobi/Jacobi.h:415:22: error: 'struct Eigen::internal::conj_helper<__vector(4) float, Eigen::internal::Packet2cf, false, false>' has no member named 'pmul'
pstore(px, padd(pm.pmul(pc,xi),pcj.pmul(ps,yi)));
^
external/eigen_archive/Eigen/src/Jacobi/Jacobi.h:415:22: error: 'struct Eigen::internal::conj_helper<__vector(4) float, Eigen::internal::Packet2cf, false, false>' has no member named 'pmul'
external/eigen_archive/Eigen/src/Jacobi/Jacobi.h:416:22: error: 'struct Eigen::internal::conj_helper<__vector(4) float, Eigen::internal::Packet2cf, false, false>' has no member named 'pmul'
pstore(py, psub(pcj.pmul(pc,yi),pm.pmul(ps,xi)));
^
external/eigen_archive/Eigen/src/Jacobi/Jacobi.h:416:22: error: 'struct Eigen::internal::conj_helper<__vector(4) float, Eigen::internal::Packet2cf, false, false>' has no member named 'pmul'
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 2366.425s, Critical Path: 2221.96s
Error 2: tensorflow/core/lib/core/threadpool.cc, wrong arguments to NonBlockingThreadPoolTempl()
ERROR: /home/nvidia/tensorflow/tensorflow-1.3.0/tensorflow/core/BUILD:1244:1: C++ compilation of rule '//tensorflow/core:lib_internal' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter ... (remaining 115 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
tensorflow/core/lib/core/threadpool.cc: In constructor 'tensorflow::thread::ThreadPool::Impl::Impl(tensorflow::Env*, const tensorflow::ThreadOptions&, const string&, int, bool)':
tensorflow/core/lib/core/threadpool.cc:91:56: error: no matching function for call to 'Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::NonBlockingThreadPoolTempl(int&, bool&, tensorflow::thread::EigenEnvironment)'
EigenEnvironment(env, thread_options, name)) {}
^
In file included from external/eigen_archive/unsupported/Eigen/CXX11/ThreadPool:58:0,
from external/eigen_archive/unsupported/Eigen/CXX11/Tensor:72,
from ./third_party/eigen3/unsupported/Eigen/CXX11/Tensor:1,
from tensorflow/core/lib/core/threadpool.cc:19:
external/eigen_archive/unsupported/Eigen/CXX11/src/ThreadPool/NonBlockingThreadPool.h:22:3: note: candidate: Eigen::NonBlockingThreadPoolTempl<Environment>::NonBlockingThreadPoolTempl(int, Environment) [with Environment = tensorflow::thread::EigenEnvironment]
NonBlockingThreadPoolTempl(int num_threads, Environment env = Environment())
^
external/eigen_archive/unsupported/Eigen/CXX11/src/ThreadPool/NonBlockingThreadPool.h:22:3: note: candidate expects 2 arguments, 3 provided
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 292.433s, Critical Path: 68.80s
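The solution is to use the right version of eigen: Error 1 can be fixed with eigen v3.3.4, while Error 2 requires the latest eigen.
To apply the fix, edit "tensorflow/workspace.bzl" and point it at the latest eigen: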
native.new_http_archive(
name = "eigen_archive",
#urls = [
# "http://mirror.bazel.build/bitbucket.org/eigen/eigen/get/f3a22f35b044.tar.gz",
# "https://bitbucket.org/eigen/eigen/get/f3a22f35b044.tar.gz",
#],
#sha256 = "ca7beac153d4059c02c8fc59816c82d54ea47fe58365e8aded4082ded0b820c4",
#strip_prefix = "eigen-eigen-f3a22f35b044",
urls = [
"https://bitbucket.org/eigen/eigen/get/tip.tar.gz",
],
sha256 = "6fe7af8244ab5d9c314a26bc8615adc61269896cfd66f1ae2cce3d6ee91a5b88",
strip_prefix = "eigen-eigen-034fba127699",
build_file = str(Label("//third_party:eigen.BUILD")),
)
The "sha256" and "strip_prefix" values must be updated to match the new eigen archive that is actually downloaded.
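One way to obtain both values is to download the archive by hand and inspect it. The sketch below is a minimal helper under two assumptions: the archive is a gzip-compressed tarball, and everything in it sits under a single top-level directory (which is what strip_prefix names). The function name describe_archive is hypothetical.

```python
import hashlib
import tarfile


def describe_archive(path):
    """Return (sha256 hex digest, top-level directory) of a .tar.gz archive."""
    # Hash the file in chunks so large archives need not fit in memory.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)

    # Collect the first path component of every member in the archive.
    with tarfile.open(path, "r:gz") as tar:
        top_levels = {name.split("/")[0] for name in tar.getnames() if name}

    if len(top_levels) != 1:
        raise ValueError(
            "expected one top-level directory, found: %s" % sorted(top_levels)
        )
    return digest.hexdigest(), top_levels.pop()
```

Calling describe_archive on a locally saved copy of the archive returns the digest to use as "sha256" and the directory name to use as "strip_prefix". Both values change whenever the tip of the eigen repository moves, so they need to be recomputed after each fresh download.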