[TF 2.0] Issue with TPUStrategy / initialize_tpu_system #27992
Comments
Please note that with tf-nightly-2.0-preview from 04/19, the above exceptions seem to be fixed, but a different one is raised when instantiating the Keras model. It may be related to the worker name problem described above.
Isn't this just the current lack of support for TPUs in TF 2.0 on Colab? See "TPU training with Keras API raises error in Tensorflow 2.0" (#27339).
You are right. Thanks for mentioning issue #27339.
#27339 (comment)
I get a similar exception on Colab with tf=1.15.0-rc3. Downgrading to tf=1.14.0 helped, and my code runs as expected, so the problem is not limited to tf=2.0.x.
System information
Describe the current behavior
An error occurs when instantiating a simple Keras model on a Colab TPU using TPUStrategy.
There seems to be an internal problem with the worker name:
If WORKER_NAME is set to 'worker', an exception is raised during the call to initialize_tpu_system(): "/job:tpu_worker/replica:0/task:1/device:CPU:0 unknown device."
If WORKER_NAME is set to 'tpu_worker', the strategy is properly initialized, but another exception is raised later when creating the Keras model: "Error copying tensor to device: /job:worker/replica:0/task:0/device:TPU:0"
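To make the mismatch in the two messages concrete, here is a small sketch in plain Python (no TensorFlow needed) that parses the device strings quoted above and compares the job name in each with the configured WORKER_NAME. The helper `parse_device` and its regular expression are illustrative only and are not part of any TensorFlow API.

```python
import re

# Pattern for fully qualified device names as they appear in the error messages.
_DEVICE_RE = re.compile(
    r"/job:(?P<job>[^/]+)/replica:(?P<replica>\d+)/task:(?P<task>\d+)"
    r"/device:(?P<type>[A-Za-z_]+):(?P<index>\d+)"
)


def parse_device(name):
    """Split a device string into its job, replica, task, type and index."""
    m = _DEVICE_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a fully qualified device name: {name!r}")
    return {
        "job": m["job"],
        "replica": int(m["replica"]),
        "task": int(m["task"]),
        "type": m["type"],
        "index": int(m["index"]),
    }


# Device strings copied from the two exceptions reported in this issue.
init_error_device = "/job:tpu_worker/replica:0/task:1/device:CPU:0"
copy_error_device = "/job:worker/replica:0/task:0/device:TPU:0"

for configured, reported in [
    ("worker", init_error_device),
    ("tpu_worker", copy_error_device),
]:
    job = parse_device(reported)["job"]
    print(f"WORKER_NAME={configured!r}: error refers to job {job!r}, "
          f"same name: {job == configured}")
```

In both cases the job named in the error is the one that was not configured, which is consistent with part of the runtime expecting 'worker' and another part expecting 'tpu_worker'.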
Following the suggestion in issue #26513, I placed a call to experimental_connect_to_host() before calling initialize_tpu_system(), but it does not help.
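For readers without the original notebook, the call order described here can be sketched roughly as follows. This is not the reporter's code: it assumes the TF 2.0-era API names (`TPUClusterResolver`, `tf.config.experimental_connect_to_host`, `tf.tpu.experimental.initialize_tpu_system`, `tf.distribute.experimental.TPUStrategy`), assumes that WORKER_NAME is passed as the `job_name` argument, and assumes the Colab runtime exposes the TPU address in the `COLAB_TPU_ADDR` environment variable. The helper names `colab_tpu_address` and `build_tpu_strategy` are hypothetical.

```python
import os


def colab_tpu_address(env=os.environ):
    """Return the TPU gRPC address from COLAB_TPU_ADDR, or None if unset."""
    addr = env.get("COLAB_TPU_ADDR")
    return f"grpc://{addr}" if addr else None


def build_tpu_strategy(tpu_address, worker_name="worker"):
    """Connect to the TPU host, initialize the TPU system, return a strategy.

    Assumption: worker_name plays the role of WORKER_NAME in this issue.
    """
    # Imported lazily so the helper above can be used without TensorFlow.
    import tensorflow as tf

    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu=tpu_address)
    # Connect first, as suggested in issue #26513 ...
    tf.config.experimental_connect_to_host(resolver.master(), job_name=worker_name)
    # ... then initialize the TPU system and create the strategy.
    tf.tpu.experimental.initialize_tpu_system(resolver)
    return tf.distribute.experimental.TPUStrategy(resolver)
```

On a Colab TPU runtime this would be used as `strategy = build_tpu_strategy(colab_tpu_address(), worker_name=WORKER_NAME)`, with the Keras model then created inside `strategy.scope()`.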
Describe the expected behavior
Model should be properly instantiated.
Code to reproduce the issue
See above.
Other info / logs
Exception when WORKER_NAME='worker':
Exception when WORKER_NAME='tpu_worker':