8000 Could not find any TPU devices on Colab using TF 2.0 Alpha · Issue #26513 · tensorflow/tensorflow · GitHub


Closed
leemengtw opened this issue Mar 9, 2019 · 2 comments
Assignees
Labels
comp:dist-strat (Distribution Strategy related issues) · stat:awaiting tensorflower (Status - Awaiting response from tensorflower) · TF 2.0 (Issues relating to TensorFlow 2.0) · type:bug (Bug)

Comments

@leemengtw

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Colab
  • TensorFlow installed from (source or binary): source using pip
  • TensorFlow version (use command below): 2.0.0-alpha0
  • Python version: 3.6

Describe the current behavior

An error occurred when trying to run the Colab notebook from the TF 2.0 Alpha guide "Distributed Training in TensorFlow" with TPUStrategy:

resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.tpu.experimental.initialize_tpu_system(resolver)
tpu_strategy = tf.distribute.experimental.TPUStrategy(resolver)

# output: RuntimeError: Could not find any TPU devices
# (Detailed Error message shown below)

I had switched the Colab runtime to TPU, and even verified that a TPU is indeed available:

import os

def check_tpu_status():
    if 'COLAB_TPU_ADDR' not in os.environ:
        print('ERROR: Not connected to a TPU runtime')
    else:
        tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR']
        print('TPU address is', tpu_address)

check_tpu_status()
# output: TPU address is grpc://10.70.191.234:8470
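For what it's worth, the address construction in the check above can be factored into a helper that is testable off-Colab (a small sketch; the helper name and the injectable `env` parameter are my own, not part of the original notebook):

```python
import os

def tpu_grpc_address(env=None):
    """Return the Colab TPU's grpc:// address, or None when not on a TPU runtime."""
    env = os.environ if env is None else env
    addr = env.get('COLAB_TPU_ADDR')
    return 'grpc://' + addr if addr else None

# With a fake environment mimicking a TPU runtime:
print(tpu_grpc_address({'COLAB_TPU_ADDR': '10.70.191.234:8470'}))
# → grpc://10.70.191.234:8470
```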

Describe the expected behavior

TPU devices should be found on Colab when the runtime is changed to TPU and using:

  • tf.tpu.experimental.initialize_tpu_system(resolver)
  • tpu_strategy = tf.distribute.experimental.TPUStrategy(resolver)

Code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate the problem.

!pip install tensorflow-gpu==2.0.0-alpha0
import tensorflow as tf

resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.tpu.experimental.initialize_tpu_system(resolver)
tpu_strategy = tf.distribute.experimental.TPUStrategy(resolver)

Other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-15-9ec182bf3b8d> in <module>()
      1 resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
----> 2 tf.tpu.experimental.initialize_tpu_system(resolver)
      3 tpu_strategy = tf.distribute.experimental.TPUStrategy(resolver)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/tpu/tpu_strategy_util.py in initialize_tpu_system(cluster_resolver)
     89     # pylint: enable=protected-access
     90 
---> 91     with ops.device(get_first_tpu_host_device(cluster_resolver)):
     92       output = tpu_functional_ops.TPUPartitionedCall(
     93           args=[], device_ordinal=0, Tout=[dtypes.string], f=func_name)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/tpu/tpu_strategy_util.py in get_first_tpu_host_device(cluster_resolver)
     41         [x for x in context.list_devices() if "device:TPU:" in x])
     42     if not tpu_devices:
---> 43       raise RuntimeError("Could not find any TPU devices")
     44     spec = tf_device.DeviceSpec.from_string(tpu_devices[0])
     45     task_id = spec.task

RuntimeError: Could not find any TPU devices
@ymodak ymodak self-assigned this Mar 12, 2019
@ymodak ymodak added comp:tpus tpu, tpuestimator comp:dist-strat Distribution Strategy related issues type:bug Bug and removed comp:tpus tpu, tpuestimator labels Mar 12, 2019
@ymodak ymodak assigned sb2nov and unassigned ymodak Mar 12, 2019
@ymodak ymodak added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Mar 12, 2019
@sb2nov
Contributor
sb2nov commented Mar 13, 2019

@leemengtaiwan

  1. You'll need to connect to the remote TPU host when using eager mode, so something like

tf.config.experimental_connect_to_host(TPU_ADDRESS)

This needs to happen before you initialize the device.

PS: TPU support in 2.0 is still a work in progress, but we're actively working on it right now.
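Putting the suggestion together with the original repro, the full sequence would look roughly like the sketch below (untested; `tf.config.experimental_connect_to_host` is the API named in the comment above, while passing the address to `TPUClusterResolver` and the off-Colab guard are my own assumptions):

```python
import os

tpu_env = os.environ.get('COLAB_TPU_ADDR')  # e.g. '10.70.191.234:8470' on a TPU runtime

if tpu_env:
    # Imported lazily so this sketch is a no-op off-Colab.
    import tensorflow as tf

    tpu_address = 'grpc://' + tpu_env
    # Connect to the remote TPU host *before* initializing the TPU system.
    tf.config.experimental_connect_to_host(tpu_address)

    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu=tpu_address)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    tpu_strategy = tf.distribute.experimental.TPUStrategy(resolver)
else:
    print('Not on a Colab TPU runtime; skipping TPU initialization.')
```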

@huan
Contributor
huan commented Apr 15, 2019

Related to #24412

@lvenugopalan lvenugopalan added the TF 2.0 Issues relating to TensorFlow 2.0 label Apr 29, 2020