TPU support is incomplete · Issue #24412 · tensorflow/tensorflow · GitHub
Closed
martinwicke opened this issue Dec 18, 2018 · 28 comments
Assignees
Labels
comp:tpus (tpu, tpuestimator) · TF 2.1 (for tracking issues in 2.1 release) · type:others (issues not falling in bug, performance, support, build and install or feature)

Comments

@martinwicke
Member

TensorFlow version (use command below): 2.0 preview

TPU support is work in progress, and the 2.0 preview does not yet contain a DistributionStrategy for TPU.

This is a tracking issue; it will be updated as progress is made.

@huan
Contributor
huan commented Apr 15, 2019

Hello @martinwicke, thanks for setting up this thread as the main tracking issue for TPU support with TF 2.0.

Do we have an ETA yet?

@martinwicke
Member Author

@jhseu Any timeline you can share?

@jhseu
Contributor
jhseu commented Apr 15, 2019

It works right now at master, but we don't have a matching Cloud TPU release. We'll release an official Cloud TPU version alongside TF 2.0 final.

@huan
Contributor
huan commented Apr 15, 2019

@jhseu Thanks for letting me know that master already works!

Do we have any code example showing how it works in TF 2.0 with a TPU?

A demo with several lines of core API calls would be enough, thanks!

@jhseu
Contributor
jhseu commented Apr 15, 2019

@huan Yeah, there's an example here:
https://www.tensorflow.org/guide/distribute_strategy

You would use TPUStrategy instead of MirroredStrategy.
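As a sketch of the swap jhseu describes (API names here are from the TF 2.1-era releases, not necessarily the nightly available at the time of this comment; the Colab `COLAB_TPU_ADDR` environment variable is assumed), the guide's MirroredStrategy example might become:

```python
import os
import tensorflow as tf

# Connect to the remote TPU instead of local GPUs, then use TPUStrategy
# where the guide's example uses MirroredStrategy.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(
    tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.experimental.TPUStrategy(resolver)

# The rest is unchanged from the MirroredStrategy version of the guide:
with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
    model.compile(optimizer='sgd', loss='mse')
```

The only TPU-specific parts are the resolver/initialization lines and the strategy constructor; model building and compilation stay inside `strategy.scope()` exactly as with MirroredStrategy.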

@thananchaiktw
thananchaiktw commented Apr 17, 2019

@jhseu Hi, does it work on the Colab TPU? I got this error: "InvalidArgumentError: /job:tpu_worker/replica:0/task:1/device:CPU:0 unknown device."

@bduclaux

@jhseu @ttaee Same problem here.

It seems there is an issue with the job_name parameter in TPUClusterResolver.
Neither 'worker' nor 'tpu_worker' works when using the TPUStrategy scope() method in combination with a call to tf.config.experimental_connect_to_host.

I have submitted a bug report at #27992, but it would be super helpful to get a working notebook using TF 2.0 and TPUStrategy on Colab.

@bduclaux

The following code raises an exception on Colab with TensorFlow version 2.0.0-dev20190421 when instantiating a basic Keras model within the scope of a TPUStrategy.

ValueError: variable object with name 'cd2c89b7-88b7-44c8-ad83-06c2a9158347' already created. Use get_variable() if reuse is desired.

!pip install --upgrade tensorflow==2.0.0-alpha0
!pip install --upgrade tf-nightly-2.0-preview

import tensorflow as tf
import os
import sys

print("Tensorflow version " + tf.__version__)

TPU_WORKER = 'grpc://' + os.environ['COLAB_TPU_ADDR']
tf.config.experimental_connect_to_host(TPU_WORKER)
resolver = tf.distribute.cluster_resolver.TPUClusterResolver() 
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.experimental.TPUStrategy(resolver)
devices=tf.config.experimental_list_devices()
print(*devices,sep="\n")

with strategy.scope():
  model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
  model.compile(loss='mse', optimizer='sgd')

@AntGul
AntGul commented May 13, 2019

It would be good to have one working example with TF 2.0.

  1. This is a great article, but unfortunately the code does not work with TF 2.0:

     tpu_model = tf.contrib.tpu.keras_to_tpu_model(
         model,
         strategy=tf.contrib.tpu.TPUDistributionStrategy(
             tf.contrib.cluster_resolver.TPUClusterResolver(TPU_ADDRESS)))

     It fails with: "Keras support is now deprecated in support of TPU Strategy. Please follow the distribution strategy guide on tensorflow.org to migrate to the 2.0 supported version."

  2. On the other hand, the example on distribution strategy does not seem to work either, as already mentioned above.
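For reference, a hypothetical TF 2.x migration of the deprecated tf.contrib snippet above might look as follows. `TPU_ADDRESS` is carried over from the original snippet and `build_model` is a placeholder for an existing Keras model-building function; API names are as of TF 2.1:

```python
import tensorflow as tf

# TF 1.x (deprecated): tf.contrib.tpu.keras_to_tpu_model(model, strategy=...)
# TF 2.x: connect to the TPU, then build and compile inside a strategy scope.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu=TPU_ADDRESS)
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.experimental.TPUStrategy(resolver)

with strategy.scope():
    model = build_model()  # placeholder: your existing model-building function
    model.compile(optimizer='sgd', loss='mse')
```

The key difference from tf.contrib is that there is no model-conversion step: the model is distributed by virtue of being created inside the scope.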

@jvishnuvardhan jvishnuvardhan added the type:others issues not falling in bug, perfromance, support, build and install or feature label May 31, 2019
@lukemelas
lukemelas commented Jun 8, 2019

Given that TF 2.0 beta is now out, is there an update with regard to this issue (about either the status or the timeline of TPU support)?

@huan
Contributor
huan commented Jun 8, 2019

@lukemelas +1

A roadmap or ETA would be very helpful for the cloud TPU fans!

@chiayewken

With reference to #29550, TPUStrategy in Tensorflow 2.0 Beta has not been working for me.

@huan
Contributor
huan commented Aug 28, 2019

Is there any progress on TPU usability for the TF 2.0 RC?

Would love to hear from the TF team about the TPU support plan, because I still cannot find any news even after searching hard on the internet.

@jhseu
Contributor
jhseu commented Aug 28, 2019

Yeah, the gist is that we intend to announce support for TPUStrategy alongside TensorFlow 2.1. TensorFlow 2.0 will work for limited use cases, but there are many improvements (bug fixes, performance improvements) coming in TensorFlow 2.1, so we don't consider it ready yet.

We have some examples of usage here:
Custom training loop: https://github.com/tensorflow/tpu/blob/master/models/experimental/resnet50_keras/resnet50_ctl_tf2.py
Keras compile/fit:
https://github.com/tensorflow/tpu/blob/master/models/experimental/resnet50_keras/resnet50_tf2.py
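The two linked examples differ mainly in who owns the training loop. A minimal sketch of the custom-training-loop shape (this is illustrative, not the linked ResNet code; `strategy` is a TPUStrategy built as shown earlier in the thread, and `strategy.experimental_run_v2` was later renamed `strategy.run`):

```python
import tensorflow as tf

def make_dataset():
    # Random data stands in for a real input pipeline.
    x = tf.random.normal([1024, 10])
    y = tf.random.normal([1024, 1])
    return tf.data.Dataset.from_tensor_slices((x, y)).batch(
        128, drop_remainder=True)  # TPUs need static batch shapes

with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])
    optimizer = tf.keras.optimizers.SGD(0.01)

@tf.function
def train_step(iterator):
    def step_fn(inputs):
        x, y = inputs
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean(tf.square(model(x) - y))
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return loss
    # Runs step_fn replicated across the TPU cores.
    return strategy.experimental_run_v2(step_fn, args=(next(iterator),))

iterator = iter(strategy.experimental_distribute_dataset(make_dataset()))
for _ in range(10):
    train_step(iterator)
```

The compile/fit example collapses all of this into `model.compile(...)` plus `model.fit(...)` inside the same strategy scope.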

@huan
Contributor
huan commented Aug 29, 2019

@jhseu Thank you very much for the update and the examples. It is great to know that support for TPUStrategy will be officially announced with TF 2.1!

I have just explored the examples you provided; they are great. It would be even better to have a Colab notebook demonstrating this code online, because Colab makes it easy to get started.

Do we have any Colab notebook demoing TF 2.0 with TPU right now?

@giannisdaras
giannisdaras commented Sep 18, 2019

Hey everyone, I can confirm that I was able to train a custom Keras model on TPUs using tf-nightly-gpu-preview-2.0 and tf.distribute.experimental.TPUStrategy.
I faced a lot of issues in the process:

  1. I had to wrap my model with the Sequential API; otherwise I was getting an error stating that a functional model was expected but a subclassed Model was received instead.
  2. The slicing operation does not seem to work on TPUs. My output layer needs to return the first token out of a sequence of tokens, so I was basically doing something like: return x[:, 0, :]. However, this returns the strange error: FetchOutputs node strided_slice/stack:0: not found. Moving this operation to the loss function resolved the issue, but I can't understand why that is the case.
  3. I needed to build the model before compiling it, which was not the case when training on GPUs.

I am wondering whether the aforementioned barriers are due to me not fully understanding the documentation for training on TPUs with TensorFlow 2, or due to actual bugs that will be resolved when TF 2.1 becomes available. In any case, kudos for your excellent work on TF2, and I truly hope that TPU training will be fully supported soon.
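A sketch of the workaround described in point 2 (names here are illustrative, not from any linked code): return the full sequence from the model and perform the `x[:, 0, :]` slice inside a custom loss instead of in the output layer.

```python
import tensorflow as tf

def first_token_loss(y_true, y_pred):
    """MSE against only the first token of the predicted sequence.

    y_pred: [batch, seq_len, hidden] -- the model's full-sequence output.
    y_true: [batch, hidden]          -- target for the first token.
    """
    first = y_pred[:, 0, :]  # slice here, not in the model's output layer
    return tf.reduce_mean(tf.square(y_true - first))

# Inside strategy.scope():
#   model = tf.keras.Sequential([...])   # returns the full sequence
#   model.compile(optimizer='adam', loss=first_token_loss)
```

On GPU/CPU the slice could live in either place; moving it into the loss simply keeps the strided-slice op out of the model's fetched outputs, which is what triggered the TPU error above.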

@EoinKenny

Is there any super simple example of using a TPU in Google Colab? It seems so hard to get going compared to a GPU; I've been trying a ton of code from online sources to get a working pipeline, with no luck.

@jhorowitz

@EoinKenny This is not specifically for 2.0, but it's a TPU-in-Colab tutorial: https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/custom_training.ipynb

@goldiegadde goldiegadde added TF 2.0 Issues relating to TensorFlow 2.0 TF 2.1 for tracking issues in 2.1 release and removed TF 2.0 Issues relating to TensorFlow 2.0 labels Oct 9, 2019
@qo4on
qo4on commented Nov 8, 2019

Almost a year has passed. Is there any update about running TF 2.x on TPU?

@weiweitoo

Same question. I was thinking of running TF 2.0 on the Colab TPU.

@martinwicke
Member Author
martinwicke commented Nov 8, 2019 via email

@qo4on
qo4on commented Nov 8, 2019

We're targeting this for 2.1.

Are you going to update your tutorials https://cloud.google.com/tpu/docs/colabs with 2.1 release?

@exelents
exelents commented Jan 7, 2020

Hello. I would like to run my model on a TPU. As I see here, Colab TPUs don't work with TF2. Can I use TPUs from Google Cloud services instead?

@fhaase2
fhaase2 commented Jan 9, 2020

According to the release notes of v2.1.0:

Experimental support for Keras .compile, .fit, .evaluate, and .predict is available for Cloud TPUs and Cloud TPU Pods, for all types of Keras models (sequential, functional and subclassing models).

So I guess this issue is kind of solved?

@dcavaller

Does it work with TensorFlow in JupyterLab?

@dcavaller

I'm working in conda.

@dcavaller

Thanks and regards to all.

@rxsang
Member
rxsang commented Jan 21, 2020

Yes, TPUs work with TF 2.1 now. The guide can be found here: https://www.tensorflow.org/guide/tpu
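Putting the thread together, a minimal end-to-end compile/fit sketch along the lines of the linked TF 2.1 guide (assumes a Colab runtime where COLAB_TPU_ADDR is set; random data stands in for a real dataset):

```python
import os
import tensorflow as tf

# Connect to and initialize the TPU, then build a strategy around it.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(
    tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.experimental.TPUStrategy(resolver)

# Create and compile the model inside the strategy scope.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer='sgd', loss='mse')

# Standard Keras fit; the strategy handles distribution across TPU cores.
x = tf.random.normal([1024, 10])
y = tf.random.normal([1024, 1])
model.fit(x, y, batch_size=128, epochs=2)
```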

@rxsang rxsang closed this as completed Jan 21, 2020