TX2 using TensorFlow buglist

Author: 童年雅趣 | Published: 2019-02-21 12:01 | Views: 209
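  1. tensorflow.python.framework.errors_impl.InternalError: GPU sync failed
    Reference: Jetson forum topic
    The error occurs when testing Keras-Yolo3.
    Cause:
    TensorFlow needs a large amount of memory at run time, and not enough was available to it (the TX2's GPU shares system RAM with the CPU).
    Fix:
    Option 1: in the Python script yolo_video.py, add "config.gpu_options.allow_growth = True"
    Option 2: free memory currently held by Ubuntu (as root): echo 3 > /proc/sys/vm/drop_caches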
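
Since the cause is a memory shortage, it can help to check how much memory is actually available before loading the model. Below is a minimal sketch that reads the MemAvailable field from /proc/meminfo. The function names and the 4000 MiB threshold are illustrative assumptions, not values taken from the post or from TensorFlow.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def has_headroom(meminfo_text, required_mib):
    """Return True if MemAvailable is at least required_mib (hypothetical threshold)."""
    kib = parse_meminfo(meminfo_text).get("MemAvailable")
    return kib is not None and kib / 1024.0 >= required_mib


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        text = f.read()
    # 4000 MiB is an arbitrary example threshold, not a measured requirement.
    if not has_headroom(text, 4000):
        print("Low memory: consider dropping caches before starting TensorFlow")
```

If the check fails, either of the two options above can be tried before rerunning the script.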
nvidia@tegra-ubuntu:~/work/d.keras/keras-yolo3$ python3 yolo_video.py --input ../../algorithm/alexey_darknet/data/Autobahn.mp4

Using TensorFlow backend.
2019-02-21 03:57:43.908835: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:864] ARM64 does not support NUMA - returning NUMA node zero
2019-02-21 03:57:43.909000: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties: 
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 4.85GiB
2019-02-21 03:57:43.909055: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2019-02-21 03:57:44.775649: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-21 03:57:44.775753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958]      0 
2019-02-21 03:57:44.775792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   N 
2019-02-21 03:57:44.775971: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4458 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2019-02-21 03:57:45.287109: E tensorflow/stream_executor/cuda/cuda_driver.cc:1108] could not synchronize on CUDA context: CUDA_ERROR_UNKNOWN :: *** Begin stack trace ***
    stream_executor::cuda::CUDADriver::SynchronizeContext(stream_executor::cuda::CudaContext*)
    stream_executor::StreamExecutor::SynchronizeAllActivity()
    tensorflow::GPUUtil::SyncAll(tensorflow::Device*)
*** End stack trace ***

2019-02-21 03:57:45.526617: E tensorflow/stream_executor/cuda/cuda_driver.cc:1108] could not synchronize on CUDA context: CUDA_ERROR_UNKNOWN :: *** Begin stack trace ***
    stream_executor::cuda::CUDADriver::SynchronizeContext(stream_executor::cuda::CudaContext*)
    stream_executor::StreamExecutor::SynchronizeAllActivity()
    tensorflow::GPUUtil::SyncAll(tensorflow::Device*)
*** End stack trace ***

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: GPU sync failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nvidia/work/d.keras/keras-yolo3/yolo.py", line 70, in generate
    self.yolo_model = load_model(model_path, compile=False)
  File "/home/nvidia/.local/lib/python3.5/site-packages/keras/engine/saving.py", line 419, in load_model
    model = _deserialize_model(f, custom_objects, compile)
  File "/home/nvidia/.local/lib/python3.5/site-packages/keras/engine/saving.py", line 225, in _deserialize_model
    model = model_from_config(model_config, custom_objects=custom_objects)
  File "/home/nvidia/.local/lib/python3.5/site-packages/keras/engine/saving.py", line 458, in model_from_config
    return deserialize(config, custom_objects=custom_objects)
  File "/home/nvidia/.local/lib/python3.5/site-packages/keras/layers/__init__.py", line 55, in deserialize
    printable_module_name='layer')
  File "/home/nvidia/.local/lib/python3.5/site-packages/keras/utils/generic_utils.py", line 145, in deserialize_keras_object
    list(custom_objects.items())))
  File "/home/nvidia/.local/lib/python3.5/site-packages/keras/engine/network.py", line 1032, in from_config
    process_node(layer, node_data)
  File "/home/nvidia/.local/lib/python3.5/site-packages/keras/engine/network.py", line 991, in process_node
    layer(unpack_singleton(input_tensors), **kwargs)
  File "/home/nvidia/.local/lib/python3.5/site-packages/keras/engine/base_layer.py", line 457, in __call__
    output = self.call(inputs, **kwargs)
  File "/home/nvidia/.local/lib/python3.5/site-packages/keras/layers/normalization.py", line 185, in call
    epsilon=self.epsilon)
  File "/home/nvidia/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 1858, in normalize_batch_in_training
    if not _has_nchw_support() and list(reduction_axes) == [0, 2, 3]:
  File "/home/nvidia/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 292, in _has_nchw_support
    gpus_available = len(_get_available_gpus()) > 0
  File "/home/nvidia/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 278, in _get_available_gpus
    _LOCAL_DEVICES = get_session().list_devices()
  File "/home/nvidia/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 199, in get_session
    [tf.is_variable_initialized(v) for v in candidate_vars])
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: GPU sync failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: GPU sync failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "yolo_video.py", line 75, in <module>
    detect_video(YOLO(**vars(FLAGS)), FLAGS.input, FLAGS.output)
  File "/home/nvidia/work/d.keras/keras-yolo3/yolo.py", line 45, in __init__
    self.boxes, self.scores, self.classes = self.generate()
  File "/home/nvidia/work/d.keras/keras-yolo3/yolo.py", line 73, in generate
    if is_tiny_version else yolo_body(Input(shape=(None,None,3)), num_anchors//3, num_classes)
  File "/home/nvidia/work/d.keras/keras-yolo3/yolo3/model.py", line 72, in yolo_body
    darknet = Model(inputs, darknet_body(inputs))
  File "/home/nvidia/work/d.keras/keras-yolo3/yolo3/model.py", line 48, in darknet_body
    x = DarknetConv2D_BN_Leaky(32, (3,3))(x)
  File "/home/nvidia/work/d.keras/keras-yolo3/yolo3/utils.py", line 16, in <lambda>
    return reduce(lambda f, g: lambda *a, **kw: g(f(*a, **kw)), funcs)
  File "/home/nvidia/work/d.keras/keras-yolo3/yolo3/utils.py", line 16, in <lambda>
    return reduce(lambda f, g: lambda *a, **kw: g(f(*a, **kw)), funcs)
  File "/home/nvidia/.local/lib/python3.5/site-packages/keras/engine/base_layer.py", line 457, in __call__
    output = self.call(inputs, **kwargs)
  File "/home/nvidia/.local/lib/python3.5/site-packages/keras/layers/normalization.py", line 185, in call
    epsilon=self.epsilon)
  File "/home/nvidia/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 1858, in normalize_batch_in_training
    if not _has_nchw_support() and list(reduction_axes) == [0, 2, 3]:
  File "/home/nvidia/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 292, in _has_nchw_support
    gpus_available = len(_get_available_gpus()) > 0
  File "/home/nvidia/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 278, in _get_available_gpus
    _LOCAL_DEVICES = get_session().list_devices()
  File "/home/nvidia/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 199, in get_session
    [tf.is_variable_initialized(v) for v in candidate_vars])
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: GPU sync failed

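nvidia@tegra-ubuntu:~/work/d.keras/keras-yolo3$ vim yolo_video.py
> import tensorflow as tf
> from keras.backend.tensorflow_backend import set_session
> # Let TensorFlow grow its GPU allocation on demand instead of reserving it all at startup
> config = tf.ConfigProto()
> config.gpu_options.allow_growth = True
> # Hand the configured session to Keras before any model is built
> sess = tf.Session(config=config)
> set_session(sess)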

Environment and result:
NVIDIA Jetson TX2
Python 3.5.2 + Keras 2.2.4 + TensorFlow 1.9.0
Keras-YOLOv3 now runs and analyzes the video Autobahn.mp4, but the frame rate is very low, only 1-2 fps.

  2. "ARM64 does not support NUMA" (does not affect execution)
    The TX2's ARM64 architecture does not support NUMA, so TensorFlow prints this informational message at startup:
    tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:864] ARM64 does not support NUMA - returning NUMA node zero
    (TensorFlow comes in GPU and CPU-only builds.)

  3. "Dst tensor is not initialized."
    Cause: out of memory
    Fix:
    Option 1: free Ubuntu system memory (as root): echo 3 > /proc/sys/vm/drop_caches
    Option 2: in the Python code, add "config.gpu_options.allow_growth = True"
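
The drop_caches step can also be done from Python, which is handy in a launcher script that already runs as root. This is only a sketch: the helper name drop_caches and its parameters are assumptions made for illustration. Writing to /proc/sys/vm/drop_caches fails with a permission error for non-root users, and the kernel only releases clean cache pages, which is why the sketch calls os.sync() first.

```python
import os


def drop_caches(path="/proc/sys/vm/drop_caches", mode="3"):
    """Ask the kernel to drop caches: 1 = page cache, 2 = dentries/inodes, 3 = both.

    Hypothetical helper; must run as root when pointed at the real /proc file.
    Returns the mode string that was written.
    """
    if mode not in ("1", "2", "3"):
        raise ValueError("mode must be '1', '2' or '3'")
    os.sync()  # flush dirty pages so more of the cache can be released
    with open(path, "w") as f:
        f.write(mode + "\n")
    return mode


if __name__ == "__main__":
    try:
        drop_caches()
    except (IOError, OSError) as exc:
        print("Could not drop caches (are you root?): %s" % exc)
```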

2019-02-22 07:52:01.459910: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Sum Total of in-use chunks: 142.79MiB
2019-02-22 07:52:01.459965: I tensorflow/core/common_runtime/bfc_allocator.cc:680] Stats: 
Limit:                   513175552
InUse:                   149731072
MaxInUse:                163384064
NumAllocs:                      64
MaxAllocSize:             67108864

2019-02-22 07:52:01.460006: W tensorflow/core/common_runtime/bfc_allocator.cc:279] ****xx***__****__________________****************__**__***********************______________________
Traceback (most recent call last):
  File "scripts/models_to_frozen_graphs.py", line 64, in <module>
    sess=tf_sess
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1752, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
     [[Node: save/RestoreV2/_55 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_60_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
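
The allocator statistics in the log above are byte counts, which are easier to read once converted to MiB. Here is a small sketch that parses the Stats block copied from that log. The function names are illustrative, and reading Limit as the most GPU memory this allocator was allowed to claim is my interpretation of the field.

```python
import re

# Stats block copied verbatim from the bfc_allocator log above.
LOG = """\
Limit:                   513175552
InUse:                   149731072
MaxInUse:                163384064
NumAllocs:                      64
MaxAllocSize:             67108864
"""


def parse_bfc_stats(text):
    """Return {field: integer value} for 'Name:   12345' lines."""
    pattern = r"^(\w+):[ \t]+(\d+)[ \t]*$"
    return {name: int(value)
            for name, value in re.findall(pattern, text, flags=re.MULTILINE)}


def to_mib(num_bytes):
    """Convert a byte count to MiB."""
    return num_bytes / (1024.0 * 1024.0)


stats = parse_bfc_stats(LOG)
for name in ("Limit", "InUse", "MaxInUse", "MaxAllocSize"):
    print("%-13s %8.2f MiB" % (name, to_mib(stats[name])))
```

InUse comes out at 142.79 MiB, matching the "Sum Total of in-use chunks" line, while Limit is only about 489 MiB. That is consistent with the diagnosis that very little memory was free when the session started.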
