TensorFlow: setup

Current configuration

  • Graphics card: NVIDIA GeForce GTX 1050 Ti
  • Ubuntu 18.04
  • Java: not installed
  • NVIDIA drivers: not installed
  • gcc 7.5

Steps

I assumed that the requirements listed for CUDA 11 also hold for CUDA 10.1.

  1. Check the Ubuntu version
    
    ~$ lsb_release -d
    Description:	Ubuntu 18.04.4 LTS
    
  2. Check the Ubuntu kernel version
    
    ~$ uname -r
    4.15.0-112-generic
    
  3. Check whether the graphics card is suitable for CUDA
    
    ~$ sudo lshw -C display
    product: GP107 GeForce GTX 1050 Ti
    
  4. Check for Java (no output)
    
    ~$ java --version
    
  5. Java is not installed, so I install it
    The two main versions are 8 and 11; I install the more recent one
    
    ~$ sudo apt update
    ~$ sudo apt install openjdk-11-jdk
    
  6. Check Java again (I don't know whether using 11 instead of 8 will be a problem)
    
    ~$ java --version
    openjdk 11.0.7 2020-04-14
    OpenJDK Runtime Environment (build 11.0.7+10-post-Ubuntu-2ubuntu218.04)
    OpenJDK 64-Bit Server VM (build 11.0.7+10-post-Ubuntu-2ubuntu218.04, mixed mode, sharing)
    
  7. Check that gcc is installed (I don't know whether 7.5 instead of the required 7.4 will be a problem)
    
    ~$ gcc --version
    gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
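
The checks in steps 1 to 7 can also be gathered by a small script. This is a minimal sketch, not part of the original setup: the helper name `tool_version` is hypothetical, and it reports only the first line each command prints, or nothing when the command is not on the PATH.

```python
import shutil
import subprocess


def tool_version(command):
    """Return the first output line of `command`, or None if it is unavailable."""
    if shutil.which(command[0]) is None:
        return None  # executable not on PATH (e.g. Java before step 5)
    result = subprocess.run(command, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else None


# The same commands used in the steps above
checks = {
    "ubuntu": ["lsb_release", "-d"],
    "kernel": ["uname", "-r"],
    "java": ["java", "--version"],
    "gcc": ["gcc", "--version"],
}

report = {name: tool_version(cmd) for name, cmd in checks.items()}
for name, version in report.items():
    print("{:8s} {}".format(name, version or "not installed"))
```

Running it before and after the Java install should show the `java` row change from "not installed" to the OpenJDK version line.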
    

Nvidia Drivers

  1. Add the graphics drivers repository
    
    ~$ sudo add-apt-repository ppa:graphics-drivers/ppa
    ~$ sudo apt update
    ~$ sudo apt upgrade
    
  2. List the available drivers
    
    ~$ ubuntu-drivers devices
    == /sys/devices/pci0000:00/0000:00:03.1/0000:1c:00.0 ==
    modalias : pci:v000010DEd00001C82sv00001458sd00003732bc03sc00i00
    vendor   : NVIDIA Corporation
    model    : GP107 [GeForce GTX 1050 Ti]
    driver   : nvidia-driver-410 - third-party free
    driver   : nvidia-driver-440 - distro non-free
    driver   : nvidia-driver-435 - distro non-free
    driver   : nvidia-driver-390 - distro non-free
    driver   : nvidia-driver-415 - third-party free
    driver   : nvidia-driver-450 - third-party free recommended
    driver   : xserver-xorg-video-nouveau - distro free builtin
    
  3. Install the latest version (version 450, 931 MB)
    
    ~$ sudo ubuntu-drivers autoinstall
    
  4. Reboot the PC
    
    ~$ sudo reboot
    
  5. Check the installed NVIDIA drivers (also useful for monitoring GPU resources)
    
    ~$ nvidia-smi
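
In the step 2 listing, the driver that step 3 ended up installing (450) is the one flagged `recommended`. A minimal sketch of reading that flag from the listing follows; the function name `recommended_driver` is hypothetical, and the sample text is an excerpt of the output shown in step 2.

```python
def recommended_driver(listing):
    """Return the package name on the line flagged 'recommended', or None."""
    for line in listing.splitlines():
        fields = line.split()
        # Driver lines look like: driver : <package> - <source> <license> [recommended]
        if len(fields) >= 3 and fields[0] == "driver" and "recommended" in fields:
            return fields[2]
    return None


sample = """\
vendor   : NVIDIA Corporation
model    : GP107 [GeForce GTX 1050 Ti]
driver   : nvidia-driver-440 - distro non-free
driver   : nvidia-driver-450 - third-party free recommended
driver   : xserver-xorg-video-nouveau - distro free builtin
"""

print(recommended_driver(sample))  # nvidia-driver-450
```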
    

CUDA

  1. Install the CUDA dependencies
    I had forgotten them and installed them afterwards; in fact the Summary of the CUDA installation warned me about 'missing recommended libraries'
    
    ~$ sudo apt install freeglut3-dev libx11-dev libxmu-dev libxi-dev libglu1-mesa libglu1-mesa-dev
    ~$ sudo apt install g++ build-essential # I did not install these
    
  2. Install CUDA
    TensorFlow 2.2 supports CUDA 10.1 and nothing newer; the download is about 2.4 GB.
    Download it from the NVIDIA site; this requires registering on the developer portal.
    A message will warn that the NVIDIA drivers are already installed; it is enough to choose Continue, but afterwards you must deselect the NVIDIA driver (e.g. 418.87.00) in the list of components the installer offers.
    
    ~$ cd Downloads
    ~$ wget http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.243_418.87.00_linux.run
    ~$ sudo sh cuda_10.1.243_418.87.00_linux.run
    Existing package manager installation of the driver found. It is strongly
    recommended that you remove this before continuing.
    Abort
    Continue
    

    ..Continue
    ..Accept
    ..unmark Driver
    ..Install

    
    Summary
    Driver:   Not Selected
    Toolkit:  Installed in /usr/local/cuda-10.1/
    Samples:  Installed in /home/user/, but missing recommended libraries
    Please make sure that
    PATH includes /usr/local/cuda-10.1/bin
    LD_LIBRARY_PATH includes /usr/local/cuda-10.1/lib64, or, add /usr/local/cuda-10.1/lib64 to /etc/ld.so.conf and run ldconfig as root
    To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-10.1/bin
    Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.1/doc/pdf for detailed information on setting up CUDA.
    WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 418.00 is required for CUDA 10.1 functionality to work.
    To install the driver using this installer, run the following command, replacing CudaInstaller with the name of this run file:
    sudo CudaInstaller.run --silent --driver
    Logfile is /var/log/cuda-installer.log
    
  3. CUDA path
    I suspect this step is pointless, because variables exported this way are temporary and last only for the current shell session. Setting environment variables on Ubuntu has always annoyed me: there seem to be three different files where they could go, and their names and paths have often changed across Ubuntu versions.
    First I make sure that the directory where CUDA was installed is the expected one.
    
    ~$ ls /usr/local/cuda-10.1/
    ~$ export PATH=/usr/local/cuda-10.1/bin${PATH:+:${PATH}}
    ~$ export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
    ~$ echo $PATH
    
  4. Permanent CUDA path
    I have not tried this and personally I would avoid it. Breaking your bashrc is a real pain; one day I will learn how to handle it... maybe
    
    ~$ echo 'export PATH=/usr/local/cuda-10.1/bin:$PATH' >> ~/.bashrc
    ~$ echo 'export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
    
  5. CUDA test example
    
    ~$ cd ~/NVIDIA_CUDA-10.1_Samples/5_Simulations/nbody
    ~$ make
    ~$ ./nbody
    

  6. CUDA version
    N.B. if this gives an error, it is because the environment variables from step 3 need to be exported again
    
    ~$ nvcc --version
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2019 NVIDIA Corporation
    Built on Sun_Jul_28_19:07:16_PDT_2019
    Cuda compilation tools, release 10.1, V10.1.243
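
The `${PATH:+:${PATH}}` expansion used in step 3 appends a colon plus the old value only when the variable is already set and non-empty, so an empty variable does not leave a stray trailing colon. A minimal Python sketch of the same logic follows; `prepend_path` is a hypothetical helper, not something the CUDA installer provides.

```python
def prepend_path(new_dir, current):
    """Mimic new_dir${VAR:+:${VAR}}: add ':' + current only if current is non-empty."""
    return new_dir + (":" + current if current else "")


print(prepend_path("/usr/local/cuda-10.1/bin", "/usr/bin:/bin"))
# /usr/local/cuda-10.1/bin:/usr/bin:/bin
print(prepend_path("/usr/local/cuda-10.1/lib64", ""))
# /usr/local/cuda-10.1/lib64
```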
    
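
For the permanent variant in step 4, one way to lower the risk of cluttering `~/.bashrc` is to append a line only when it is not already there. This is a sketch under that assumption: `append_once` is a hypothetical helper, and the demo writes to a temporary file instead of the real `~/.bashrc`.

```python
import os
import tempfile


def append_once(path, line):
    """Append `line` to the file at `path` unless an identical line exists.

    Returns True if the file was modified.
    """
    content = ""
    if os.path.exists(path):
        with open(path) as handle:
            content = handle.read()
    if line in content.splitlines():
        return False
    with open(path, "a") as handle:
        # Start on a fresh line if the file does not end with a newline
        if content and not content.endswith("\n"):
            handle.write("\n")
        handle.write(line + "\n")
    return True


# Demo on a throwaway file; point rc_file at ~/.bashrc only once you trust it
rc_file = os.path.join(tempfile.mkdtemp(), "bashrc")
export_line = "export PATH=/usr/local/cuda-10.1/bin${PATH:+:${PATH}}"
print(append_once(rc_file, export_line))  # True: line added
print(append_once(rc_file, export_line))  # False: already present
```

Making a copy of `~/.bashrc` before the first real run keeps the change easy to undo.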

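
The release number reported in step 6 can be pulled out of the `nvcc --version` text when a script needs to compare it with the version TensorFlow expects. A minimal sketch follows; `cuda_release` is a hypothetical name and the sample is the output shown in step 6.

```python
import re


def cuda_release(nvcc_output):
    """Return the 'release X.Y' number from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


sample = """\
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
"""

print(cuda_release(sample))  # 10.1
```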
cuDNN

  1. Install cuDNN
    Download cuDNN from the NVIDIA site; this requires registering on the developer portal
    Download cuDNN v7.6.5 (November 5th, 2019), for CUDA 10.1
    libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb (Runtime Library)
    libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb (Developer Library)
    libcudnn7-doc_7.6.5.32-1+cuda10.1_amd64.deb (Code Samples)
    
    ~$ cd Downloads/
    ~$ sudo dpkg -i libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb
    ~$ sudo dpkg -i libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb
    ~$ sudo dpkg -i libcudnn7-doc_7.6.5.32-1+cuda10.1_amd64.deb
    
  2. Reboot the PC
    
    ~$ sudo reboot
    
  3. Verify Cuda Installation
    
    ~$ cd /usr/local/cuda/samples/1_Utilities/deviceQuery
    ~$ sudo make
    ~$ ./deviceQuery
    
    
    ~$ cd /usr/local/cuda/samples/1_Utilities/bandwidthTest 
    ~$ sudo make
    ~$ ./bandwidthTest
    
    
    ~$ cd /usr/src/cudnn_samples_v7/mnistCUDNN/
    ~$ sudo make clean && sudo make
    ~$ ./mnistCUDNN
    
    
    ~$ cd /usr/src/cudnn_samples_v7/conv_sample/
    ~$ sudo make clean && sudo make
    ~$ ./conv_sample
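
The cuDNN package names in step 1 carry both the cuDNN version and the CUDA version they were built for, and in a download URL the `+` may show up URL-encoded as `%2B`. A minimal sketch that decodes such a name and splits out the two versions follows; `parse_cudnn_deb` is a hypothetical helper, and its pattern only targets names shaped like the three packages listed in step 1.

```python
import re
from urllib.parse import unquote

PATTERN = re.compile(r"(libcudnn\d+(?:-[a-z]+)?)_([\d.]+)-\d+\+cuda([\d.]+)_amd64\.deb$")


def parse_cudnn_deb(name):
    """Return (package, cudnn_version, cuda_version) for a cuDNN .deb file name."""
    decoded = unquote(name)  # turns '%2B' back into '+'
    match = PATTERN.match(decoded)
    if match is None:
        raise ValueError("unexpected file name: " + decoded)
    return match.groups()


print(parse_cudnn_deb("libcudnn7-dev_7.6.5.32-1%2Bcuda10.1_amd64.deb"))
# ('libcudnn7-dev', '7.6.5.32', '10.1')
```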
    

Python

  1. Create a dedicated conda environment for TensorFlow
    
    ~$ conda-env list
    base                  *  /home/user/miniconda3
    py3                      /home/user/miniconda3/envs/py3
    ~$ conda create -n py3_tf --clone py3
    ~$ conda activate py3_tf
    
  2. Install TensorFlow
    
    ~$ pip install --upgrade pip
    ~$ pip install --upgrade tensorflow
    Downloading tensorflow-2.2.0-cp37-cp37m-manylinux2010_x86_64.whl (516.2 MB)
    
  3. Pre-installation check (1/2) (I ran this before doing any of the setup above)
    
    ~$ python -c "import tensorflow as tf; x = [[2.]]; print('Tensorflow Version ', tf.__version__); print('hello TF world, {}'.format(tf.matmul(x, x)))"
    Tensorflow Version  2.2.0
    2020-07-23 00:23:46.566744: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
    2020-07-23 00:23:46.566765: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: UNKNOWN ERROR (303)
    2020-07-23 00:23:46.566786: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (unknown): /proc/driver/nvidia/version does not exist
    2020-07-23 00:23:46.567045: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    2020-07-23 00:23:46.591079: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3199620000 Hz
    2020-07-23 00:23:46.591771: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fe094000b20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
    2020-07-23 00:23:46.591789: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
    hello TF world, [[4.]]
    
  4. Pre-installation check (2/2)
    
    import tensorflow as tf
    # An empty device name means TensorFlow found no usable GPU
    if tf.test.gpu_device_name():
        print('Default GPU Device:{}'.format(tf.test.gpu_device_name()))
    else:
        print("Please install GPU version of TF")
    
    
    Please install GPU version of TF
    
  5. Post-installation check (1/2)
    
    ~$ python -c "import tensorflow as tf; x = [[2.]]; print('Tensorflow Version ', tf.__version__); print('hello TF world, {}'.format(tf.matmul(x, x)))"
    Tensorflow Version  2.2.0
    2020-07-23 23:51:18.168952: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
    2020-07-23 23:51:18.223363: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2020-07-23 23:51:18.223705: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
    pciBusID: 0000:1c:00.0 name: GeForce GTX 1050 Ti computeCapability: 6.1
    coreClock: 1.43GHz coreCount: 6 deviceMemorySize: 3.94GiB deviceMemoryBandwidth: 104.43GiB/s
    2020-07-23 23:51:18.226765: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
    2020-07-23 23:51:18.291938: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
    2020-07-23 23:51:18.323948: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
    2020-07-23 23:51:18.334741: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
    2020-07-23 23:51:18.409372: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
    2020-07-23 23:51:18.418972: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
    2020-07-23 23:51:18.521344: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
    2020-07-23 23:51:18.521648: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    ..removed some prints
    2020-07-23 23:51:18.629002: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2020-07-23 23:51:18.629643: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3349 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:1c:00.0, compute capability: 6.1)
    2020-07-23 23:51:18.637862: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
    hello TF world, [[4.]]
    
  6. Post-installation check (2/2)
    
    import tensorflow as tf
    # A non-empty device name means TensorFlow can use the GPU
    if tf.test.gpu_device_name():
        print('Default GPU Device:{}'.format(tf.test.gpu_device_name()))
    else:
        print("Please install GPU version of TF")
    
    
    Default GPU Device:/device:GPU:0
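
The wheel name printed by pip in step 2 encodes which interpreter and platform the binary targets (`cp37` stands for CPython 3.7), which is worth checking when pip picks an unexpected build. A minimal sketch that splits the name into its dash-separated fields follows; `parse_wheel_name` is a hypothetical helper and it assumes a name without the optional build tag.

```python
def parse_wheel_name(filename):
    """Split '<name>-<version>-<python>-<abi>-<platform>.whl' into a dict."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: " + filename)
    name, version, python_tag, abi_tag, platform_tag = filename[:-4].split("-")
    return {"name": name, "version": version, "python": python_tag,
            "abi": abi_tag, "platform": platform_tag}


info = parse_wheel_name("tensorflow-2.2.0-cp37-cp37m-manylinux2010_x86_64.whl")
print(info["version"], info["python"], info["platform"])
# 2.2.0 cp37 manylinux2010_x86_64
```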
    
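
The difference between the pre- and post-installation logs in steps 3 and 5 comes down to which shared libraries TensorFlow managed to open. A minimal sketch that sorts log lines into opened and missing libraries follows; `library_status` is a hypothetical helper, and the two sample lines are copies of lines from those logs with the timestamps dropped.

```python
import re

OPENED = re.compile(r"Successfully opened dynamic library (\S+)")
MISSING = re.compile(r"Could not load dynamic library '([^']+)'")


def library_status(log_text):
    """Return (opened, missing) lists of library names found in a TensorFlow log."""
    opened, missing = [], []
    for line in log_text.splitlines():
        hit = OPENED.search(line)
        if hit:
            opened.append(hit.group(1))
        hit = MISSING.search(line)
        if hit:
            missing.append(hit.group(1))
    return opened, missing


sample = """\
W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
"""

print(library_status(sample))
# (['libcudnn.so.7'], ['libcuda.so.1'])
```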

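
Besides `tf.test.gpu_device_name()`, TensorFlow 2.2 also exposes `tf.config.list_physical_devices`, which returns the GPUs it can see as a list. A minimal sketch follows, written so that it also runs in an environment where TensorFlow is not importable; `visible_gpus` is a hypothetical wrapper.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see (empty list if none)."""
    try:
        import tensorflow as tf
    except ImportError:
        return []  # TensorFlow is not installed in this environment
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
print(gpus if gpus else "No GPU visible to TensorFlow")
```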
Install-cuda-10-and-cudnn-on-ubuntu-18
How-To-Install-CUDA-10-1-on-Ubuntu-19-04

Open questions

  1. Does having to forcibly limit the CPU memory cause a loss of performance?
  2. Can using OpenJDK 11 instead of 8 cause problems?
  3. Can using the gcc 7.5 compiler instead of 7.4 cause problems?
  4. Why does 'nvidia-smi' report CUDA Version: 11.0 if I installed CUDA 10.1?