[TF 2.0] Issue with TPUStrategy / initialize_tpu_system · Issue #27992 · tensorflow/tensorflow · GitHub

[TF 2.0] Issue with TPUStrategy / initialize_tpu_system #27992


Closed
bduclaux opened this issue Apr 19, 2019 · 6 comments
Labels
comp:dist-strat (Distribution Strategy related issues) · comp:tpus (tpu, tpuestimator) · TF 2.0 (Issues relating to TensorFlow 2.0) · type:bug (Bug)

Comments

@bduclaux

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Colab
  • TensorFlow installed from (source or binary): !pip install tensorflow-gpu==2.0.0-alpha0
  • TensorFlow version (use command below): 2.0.0-alpha0
  • Python version: 3.6.7 (Colab)

Describe the current behavior

An error occurs when trying to instantiate a simple Keras model on a Colab TPU using TPUStrategy.

It seems that there is an internal problem regarding the worker name:

  • if WORKER_NAME is set to 'worker', then an exception is raised during the call to initialize_tpu_system(): "/job:tpu_worker/replica:0/task:1/device:CPU:0 unknown device."

  • if WORKER_NAME is set to 'tpu_worker', the strategy is properly initialized, but another exception is raised later when creating the Keras model: "Error copying tensor to device: /job:worker/replica:0/task:0/device:TPU:0"

Following issue #26513, I have placed a call to experimental_connect_to_host() before calling initialize_tpu_system(), but it does not help.

!pip install tensorflow-gpu==2.0.0-alpha0

import os

import tensorflow as tf

# Colab exposes the TPU address through an environment variable.
TPU_WORKER = 'grpc://' + os.environ['COLAB_TPU_ADDR']
WORKER_NAME = 'tpu_worker'  # 'worker' also fails, with a different error (see above)

# Workaround from #26513: connect to the host before initializing the TPU system.
tf.config.experimental_connect_to_host(TPU_WORKER, WORKER_NAME)

resolver = tf.distribute.cluster_resolver.TPUClusterResolver(TPU_WORKER)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.experimental.TPUStrategy(resolver)
devices = tf.config.experimental_list_devices()
print(*devices, sep="\n")

with strategy.scope():
  
  model = tf.keras.Sequential([
      tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
      tf.keras.layers.MaxPooling2D(),
      tf.keras.layers.Flatten(),
      tf.keras.layers.Dense(64, activation='relu'),
      tf.keras.layers.Dense(10, activation='softmax')
  ])

  model.compile(loss='sparse_categorical_crossentropy',
                optimizer=tf.keras.optimizers.Adam(),
                metrics=['accuracy'])

Describe the expected behavior

The model should be properly instantiated.

Code to reproduce the issue

See above.

Other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

Exception when WORKER_NAME='worker':

InvalidArgumentError                      Traceback (most recent call last)

<ipython-input-3-2725ffed0ef5> in <module>()
     10 
     11 resolver = tf.distribute.cluster_resolver.TPUClusterResolver(TPU_WORKER)
---> 12 tf.tpu.experimental.initialize_tpu_system(resolver)
     13 strategy = tf.distribute.experimental.TPUStrategy(resolver)
     14 devices=tf.config.experimental_list_devices()

/usr/local/lib/python3.6/dist-packages/tensorflow/python/tpu/tpu_strategy_util.py in initialize_tpu_system(cluster_resolver)
     91     with ops.device(get_first_tpu_host_device(cluster_resolver)):
     92       output = tpu_functional_ops.TPUPartitionedCall(
---> 93           args=[], device_ordinal=0, Tout=[dtypes.string], f=func_name)
     94     serialized_topology = output[0].numpy()
     95   else:

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_tpu_ops.py in tpu_partitioned_call(args, device_ordinal, Tout, f, name)
   5606       else:
   5607         message = e.message
-> 5608       _six.raise_from(_core._status_to_exception(e.code, message), None)
   5609   # Add nodes to the TensorFlow graph.
   5610   if not isinstance(Tout, (list, tuple)):

/usr/local/lib/python3.6/dist-packages/six.py in raise_from(value, from_value)

InvalidArgumentError: /job:tpu_worker/replica:0/task:1/device:CPU:0 unknown device.

Exception when WORKER_NAME='tpu_worker':

RuntimeError                              Traceback (most recent call last)

<ipython-input-4-4fe0e3235d99> in <module>()
     22       tf.keras.layers.Flatten(),
     23       tf.keras.layers.Dense(64, activation='relu'),
---> 24       tf.keras.layers.Dense(10, activation='softmax')
     25   ])
     26 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/base.py in _method_wrapper(self, *args, **kwargs)
    454     self._setattr_tracking = False  # pylint: disable=protected-access
    455     try:
--> 456       result = method(self, *args, **kwargs)
    457     finally:
    458       self._setattr_tracking = previous_value  # pylint: disable=protected-access

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/sequential.py in __init__(self, layers, name)
    106     if layers:
    107       for layer in layers:
--> 108         self.add(layer)
    109 
    110   @property

/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/base.py in _method_wrapper(self, *args, **kwargs)
    454     self._setattr_tracking = False  # pylint: disable=protected-access
    455     try:
--> 456       result = method(self, *args, **kwargs)
    457     finally:
    458       self._setattr_tracking = previous_value  # pylint: disable=protected-access

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/sequential.py in add(self, layer)
    167           # and create the node connecting the current layer
    168           # to the input layer we just created.
--> 169           layer(x)
    170           set_inputs = True
    171 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py in __call__(self, inputs, *args, **kwargs)
    592           # Build layer if applicable (if the `build` method has been
    593           # overridden).
--> 594           self._maybe_build(inputs)
    595           # Explicitly pass the learning phase placeholder to `call` if
    596           # the `training` argument was left unspecified by the user.

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py in _maybe_build(self, inputs)
   1711     # Only call `build` if the user has manually overridden the build method.
   1712     if not hasattr(self.build, '_is_default'):
-> 1713       self.build(input_shapes)
   1714     # We must set self.built since user defined build functions are not
   1715     # constrained to set self.built.

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/convolutional.py in build(self, input_shape)
    163         constraint=self.kernel_constraint,
    164         trainable=True,
--> 165         dtype=self.dtype)
    166     if self.use_bias:
    167       self.bias = self.add_weight(

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py in add_weight(self, name, shape, dtype, initializer, regularizer, trainable, constraint, partitioner, use_resource, synchronization, aggregation, **kwargs)
    375         collections=collections,
    376         synchronization=synchronization,
--> 377         aggregation=aggregation)
    378     backend.track_variable(variable)
    379 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/base.py in _add_variable_with_custom_getter(self, name, shape, dtype, initializer, getter, overwrite, **kwargs_for_getter)
    620     new_variable = getter(
    621         name=name, shape=shape, dtype=dtype, initializer=initializer,
--> 622         **kwargs_for_getter)
    623 
    624     # If we set an initializer and the variable processed it, tracking will not

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer_utils.py in make_variable(name, shape, dtype, initializer, trainable, caching_device, validate_shape, constraint, use_resource, collections, synchronization, aggregation, partitioner)
    150       collections=collections,
    151       synchronization=synchronization,
--> 152       aggregation=aggregation)
    153   return v
    154 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variables.py in __call__(cls, *args, **kwargs)
    210   def __call__(cls, *args, **kwargs):
    211     if cls is VariableV1:
--> 212       return cls._variable_v1_call(*args, **kwargs)
    213     elif cls is Variable:
    214       return cls._variable_v2_call(*args, **kwargs)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variables.py in _variable_v1_call(cls, initial_value, trainable, collections, validate_shape, caching_device, name, variable_def, dtype, expected_shape, import_scope, constraint, use_resource, synchronization, aggregation)
    173         use_resource=use_resource,
    174         synchronization=synchronization,
--> 175         aggregation=aggregation)
    176 
    177   def _variable_v2_call(cls,

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variables.py in getter(**kwargs)
     56   """To avoid capturing loop variables."""
     57   def getter(**kwargs):
---> 58     return captured_getter(captured_previous, **kwargs)
     59   return getter
     60 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py in creator_with_resource_vars(*args, **kwargs)
    821       kwargs["use_resource"] = True
    822       kwargs["distribute_strategy"] = strategy
--> 823       return self._create_variable(*args, **kwargs)
    824 
    825     def distributed_getter(getter, *args, **kwargs):

/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/tpu_strategy.py in _create_variable(self, next_creator, *args, **kwargs)
    439     return _create_tpu_mirrored_variable(
    440         self._container_strategy(), device_map, logical_device,
--> 441         _real_mirrored_creator, *args, **kwargs)
    442 
    443   def _reduce_to(self, reduce_op, value, destinations):

/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/tpu_strategy.py in _create_tpu_mirrored_variable(strategy, device_map, logical_device, real_mirrored_creator, *args, **kwargs)
    101   with tape.stop_recording():
    102     devices = device_map.logical_to_actual_devices(logical_device)
--> 103     value_list = real_mirrored_creator(devices, *args, **kwargs)
    104     result = values.TPUMirroredVariable(
    105         strategy, device_map, value_list, aggregation,

/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/tpu_strategy.py in _real_mirrored_creator(devices, *args, **kwargs)
    432               kwargs["initial_value"] = initial_value_fn
    433           with context.device_policy(context.DEVICE_PLACEMENT_SILENT):
--> 434             v = next_creator(*args, **kwargs)
    435           assert not isinstance(v, values.TPUMirroredVariable)
    436           value_list.append(v)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variables.py in <lambda>(**kwargs)
    152                         aggregation=VariableAggregation.NONE):
    153     """Call on Variable class. Useful to force the signature."""
--> 154     previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
    155     for _, getter in ops.get_default_graph()._variable_creator_stack:  # pylint: disable=protected-access
    156       previous_getter = _make_getter(getter, previous_getter)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variable_scope.py in default_variable_creator(next_creator, **kwargs)
   2490         caching_device=caching_device, name=name, dtype=dtype,
   2491         constraint=constraint, variable_def=variable_def,
-> 2492         import_scope=import_scope, distribute_strategy=distribute_strategy)
   2493   else:
   2494     return variables.RefVariable(

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variables.py in __call__(cls, *args, **kwargs)
    214       return cls._variable_v2_call(*args, **kwargs)
    215     else:
--> 216       return super(VariableMetaclass, cls).__call__(*args, **kwargs)
    217 
    218 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/resource_variable_ops.py in __init__(self, initial_value, trainable, collections, validate_shape, caching_device, name, dtype, variable_def, import_scope, constraint, distribute_strategy)
    420           name=name,
    421           dtype=dtype,
--> 422           constraint=constraint)
    423 
    424   def __repr__(self):

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/resource_variable_ops.py in _init_from_args(self, initial_value, trainable, collections, caching_device, name, dtype, constraint)
    543           with ops.name_scope("Initializer"), device_context_manager(None):
    544             initial_value = ops.convert_to_tensor(
--> 545                 initial_value() if init_from_fn else initial_value,
    546                 name="initial_value", dtype=dtype)
    547           self._handle = eager_safe_variable_handle(

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer_utils.py in <lambda>()
    132           (type(init_ops.Initializer), type(init_ops_v2.Initializer))):
    133         initializer = initializer()
--> 134       init_val = lambda: initializer(shape, dtype=dtype)
    135       variable_dtype = dtype.base_dtype
    136   if use_resource is None:

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/init_ops_v2.py in __call__(self, shape, dtype)
    432     else:
    433       limit = math.sqrt(3.0 * scale)
--> 434       return self._random_generator.random_uniform(shape, -limit, limit, dtype)
    435 
    436   def get_config(self):

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/init_ops_v2.py in random_uniform(self, shape, minval, maxval, dtype)
    795       op = random_ops.random_uniform
    796     return op(
--> 797         shape=shape, minval=minval, maxval=maxval, dtype=dtype, seed=self.seed)
    798 
    799   def truncated_normal(self, shape, mean, stddev, dtype):

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/random_ops.py in random_uniform(shape, minval, maxval, dtype, seed, name)
    238   with ops.name_scope(name, "random_uniform", [shape, minval, maxval]) as name:
    239     shape = _ShapeTensor(shape)
--> 240     minval = ops.convert_to_tensor(minval, dtype=dtype, name="min")
    241     maxval = ops.convert_to_tensor(maxval, dtype=dtype, name="max")
    242     seed1, seed2 = random_seed.get_seed(seed)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py in convert_to_tensor(value, dtype, name, preferred_dtype, dtype_hint)
   1048   preferred_dtype = deprecation.deprecated_argument_lookup(
   1049       "dtype_hint", dtype_hint, "preferred_dtype", preferred_dtype)
-> 1050   return convert_to_tensor_v2(value, dtype, preferred_dtype, name)
   1051 
   1052 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py in convert_to_tensor_v2(value, dtype, dtype_hint, name)
   1106       name=name,
   1107       preferred_dtype=dtype_hint,
-> 1108       as_ref=False)
   1109 
   1110 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py in internal_convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, ctx, accept_symbolic_tensors)
   1184 
   1185     if ret is None:
-> 1186       ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
   1187 
   1188     if ret is NotImplemented:

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py in _constant_tensor_conversion_function(v, dtype, name, as_ref)
    302                                          as_ref=False):
    303   _ = as_ref
--> 304   return constant(v, dtype=dtype, name=name)
    305 
    306 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py in constant(value, dtype, shape, name)
    243   """
    244   return _constant_impl(value, dtype, shape, name, verify_shape=False,
--> 245                         allow_broadcast=True)
    246 
    247 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py in _constant_impl(value, dtype, shape, name, verify_shape, allow_broadcast)
    251   ctx = context.context()
    252   if ctx.executing_eagerly():
--> 253     t = convert_to_eager_tensor(value, ctx, dtype)
    254     if shape is None:
    255       return t

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py in convert_to_eager_tensor(value, ctx, dtype)
    108       return ops.EagerTensor(
    109           value, handle, device, dtype, tensor)
--> 110     t = ops.EagerTensor(value, handle, device, dtype)
    111     scalar_cache[cache_key] = t
    112     return t

RuntimeError: Error copying tensor to device: /job:worker/replica:0/task:0/device:TPU:0. /job:worker/replica:0/task:0/device:TPU:0 unknown device.
@bduclaux
Author

Please note that with tf-nightly-2.0-preview from 04/19, the above exceptions seem to be fixed, but another one appears when instantiating the Keras model. It may be related to the worker name problem above:

variable object with name 'cd2c89b7-88b7-44c8-ad83-06c2a9158347' already created. Use get_variable() if reuse is desired.
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-1-ed7fbebf6586> in <module>()
     23       tf.keras.layers.Flatten(),
     24       tf.keras.layers.Dense(64, activation='relu'),
---> 25       tf.keras.layers.Dense(10, activation='softmax')
     26   ])
     27 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/base.py in _method_wrapper(self, *args, **kwargs)
    458     self._self_setattr_tracking = False  # pylint: disable=protected-access
    459     try:
--> 460       result = method(self, *args, **kwargs)
    461     finally:
    462       self._self_setattr_tracking = previous_value  # pylint: disable=protected-access

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/sequential.py in __init__(self, layers, name)
    106     if layers:
    107       for layer in layers:
--> 108         self.add(layer)
    109 
    110   @property

/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/base.py in _method_wrapper(self, *args, **kwargs)
    458     self._self_setattr_tracking = False  # pylint: disable=protected-access
    459     try:
--> 460       result = method(self, *args, **kwargs)
    461     finally:
    462       self._self_setattr_tracking = previous_value  # pylint: disable=protected-access

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/sequential.py in add(self, layer)
    167           # and create the node connecting the current layer
    168           # to the input layer we just created.
--> 169           layer(x)
    170           set_inputs = True
    171 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py in __call__(self, inputs, *args, **kwargs)
    591           # Build layer if applicable (if the `build` method has been
    592           # overridden).
--> 593           self._maybe_build(inputs)
    594           # Explicitly pass the learning phase placeholder to `call` if
    595           # the `training` argument was left unspecified by the user.

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py in _maybe_build(self, inputs)
   1822     # Only call `build` if the user has manually overridden the build method.
   1823     if not hasattr(self.build, '_is_default'):
-> 1824       self.build(input_shapes)
   1825     # We must set self.built since user defined build functions are not
   1826     # constrained to set self.built.

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/convolutional.py in build(self, input_shape)
    172           constraint=self.bias_constraint,
    173           trainable=True,
--> 174           dtype=self.dtype)
    175     else:
    176       self.bias = None

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py in add_weight(self, name, shape, dtype, initializer, regularizer, trainable, constraint, partitioner, use_resource, synchronization, aggregation, **kwargs)
    381         collections=collections,
    382         synchronization=synchronization,
--> 383         aggregation=aggregation)
    384     backend.track_variable(variable)
    385 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/base.py in _add_variable_with_custom_getter(self, name, shape, dtype, initializer, getter, overwrite, **kwargs_for_getter)
    664         dtype=dtype,
    665         initializer=initializer,
--> 666         **kwargs_for_getter)
    667 
    668     # If we set an initializer and the variable processed it, tracking will not

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer_utils.py in make_variable(name, shape, dtype, initializer, trainable, caching_device, validate_shape, constraint, use_resource, collections, synchronization, aggregation, partitioner)
    152       collections=collections,
    153       synchronization=synchronization,
--> 154       aggregation=aggregation)
    155   return v
    156 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variables.py in __call__(cls, *args, **kwargs)
    252   def __call__(cls, *args, **kwargs):
    253     if cls is VariableV1:
--> 254       return cls._variable_v1_call(*args, **kwargs)
    255     elif cls is Variable:
    256       return cls._variable_v2_call(*args, **kwargs)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variables.py in _variable_v1_call(cls, initial_value, trainable, collections, validate_shape, caching_device, name, variable_def, dtype, expected_shape, import_scope, constraint, use_resource, synchronization, aggregation)
    215         use_resource=use_resource,
    216         synchronization=synchronization,
--> 217         aggregation=aggregation)
    218 
    219   def _variable_v2_call(cls,

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variables.py in getter(**kwargs)
     56   """To avoid capturing loop variables."""
     57   def getter(**kwargs):
---> 58     return captured_getter(captured_previous, **kwargs)
     59   return getter
     60 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py in creator_with_resource_vars(*args, **kwargs)
   1075       kwargs["use_resource"] = True
   1076       kwargs["distribute_strategy"] = strategy
-> 1077       return self._create_variable(*args, **kwargs)
   1078 
   1079     def distributed_getter(getter, *args, **kwargs):

/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/tpu_strategy.py in _create_variable(self, next_creator, *args, **kwargs)
    486     return _create_tpu_mirrored_variable(
    487         self._container_strategy(), device_map, logical_device,
--> 488         _real_mirrored_creator, *args, **kwargs)
    489 
    490   def _reduce_to(self, reduce_op, value, destinations):

/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/tpu_strategy.py in _create_tpu_mirrored_variable(strategy, device_map, logical_device, real_mirrored_creator, *args, **kwargs)
    100   with tape.stop_recording():
    101     devices = device_map.logical_to_actual_devices(logical_device)
--> 102     value_list = real_mirrored_creator(devices, *args, **kwargs)
    103     result = values.TPUMirroredVariable(
    104         strategy, device_map, value_list, aggregation,

/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/tpu_strategy.py in _real_mirrored_creator(devices, *args, **kwargs)
    479             kwargs["initial_value"] = initial_value_fn
    480           with context.device_policy(context.DEVICE_PLACEMENT_SILENT):
--> 481             v = next_creator(*args, **kwargs)
    482           assert not isinstance(v, values.TPUMirroredVariable)
    483           value_list.append(v)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variables.py in <lambda>(**kwargs)
    194                         aggregation=VariableAggregation.NONE):
    195     """Call on Variable class. Useful to force the signature."""
--> 196     previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
    197     for _, getter in ops.get_default_graph()._variable_creator_stack:  # pylint: disable=protected-access
    198       previous_getter = _make_getter(getter, previous_getter)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variable_scope.py in default_variable_creator(next_creator, **kwargs)
   2491         distribute_strategy=distribute_strategy,
   2492         synchronization=synchronization,
-> 2493         aggregation=aggregation)
   2494   else:
   2495     return variables.RefVariable(

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variables.py in __call__(cls, *args, **kwargs)
    256       return cls._variable_v2_call(*args, **kwargs)
    257     else:
--> 258       return super(VariableMetaclass, cls).__call__(*args, **kwargs)
    259 
    260 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/resource_variable_ops.py in __init__(self, initial_value, trainable, collections, validate_shape, caching_device, name, dtype, variable_def, import_scope, constraint, distribute_strategy, synchronization, aggregation)
    441           constraint=constraint,
    442           synchronization=synchronization,
--> 443           aggregation=aggregation)
    444 
    445   def __repr__(self):

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/resource_variable_ops.py in _init_from_args(self, initial_value, trainable, collections, caching_device, name, dtype, constraint, synchronization, aggregation)
    585               shared_name=shared_name,
    586               name=name,
--> 587               graph_mode=self._in_graph_mode)
    588         self._shape = initial_value.shape
    589         # pylint: disable=protected-access

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/resource_variable_ops.py in eager_safe_variable_handle(initial_value, shared_name, name, graph_mode)
    192       raise ValueError("variable object with name '%s' already created. Use "
    193                        "get_variable() if reuse is desired." %
--> 194                        shared_name)
    195     with context.graph_mode(), ops.Graph().as_default() as graph:
    196       h = gen_resource_variable_ops.var_handle_op(shape=shape, dtype=dtype,

ValueError: variable object with name 'cd2c89b7-88b7-44c8-ad83-06c2a9158347' already created. Use get_variable() if reuse is desired.

@gadagashwini-zz self-assigned this Apr 22, 2019
@gadagashwini-zz added the 2.0.0-alpha0, comp:tpus (tpu, tpuestimator), comp:dist-strat (Distribution Strategy related issues), and type:bug (Bug) labels Apr 22, 2019
@javicivit

Isn't this just the lack of current support for TPUs in TF 2.0 in Colab? See "TPU training with Keras API raises error in Tensorflow 2.0" (#27339).
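Until that support lands, the TF 1.x path still works on Colab TPUs. A minimal sketch, assuming a TF 1.13-style runtime where tf.contrib.tpu.keras_to_tpu_model is still available (it was removed in later releases):

import os
import tensorflow as tf

# Build a plain Keras model first, then convert it to run on the TPU.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10, activation='softmax'),
])
# keras_to_tpu_model expects a tf.train optimizer rather than a Keras one.
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=tf.train.AdamOptimizer(),
              metrics=['accuracy'])

# Convert the model for TPU execution using the TF 1.x contrib API.
tpu_model = tf.contrib.tpu.keras_to_tpu_model(
    model,
    strategy=tf.contrib.tpu.TPUDistributionStrategy(
        tf.contrib.cluster_resolver.TPUClusterResolver(
            tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])))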

@bduclaux
Author

You are right. Thanks for mentioning issue #27339.
Waiting for the 2.0 release!
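For reference, the setup sequence that later stable TF 2.x releases document for Colab TPUs replaces experimental_connect_to_host with experimental_connect_to_cluster. A sketch, assuming a TF 2.1+ runtime:

import os
import tensorflow as tf

# Resolve the Colab TPU, attach the client to the whole cluster,
# and initialize the TPU system before constructing the strategy.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(
    tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.experimental.TPUStrategy(resolver)
print('Replicas:', strategy.num_replicas_in_sync)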

@ymodak
Contributor
ymodak commented May 1, 2019

#27339 (comment)
Thus closing this issue for now. Thanks!

@ymodak closed this as completed May 1, 2019

@anki-xyz
anki-xyz commented Oct 9, 2019

I see a similar exception on Colab with tf=1.15.0-rc3. Downgrading to tf=1.14.0 helped, and my code runs as expected. So this is not limited to tf=2.0.x.
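In a Colab cell the downgrade amounts to the following (restart the runtime afterwards so the older version is picked up):

# Replace the preinstalled TensorFlow, then restart the Colab runtime.
!pip install tensorflow==1.14.0

import tensorflow as tf
print(tf.__version__)  # expect 1.14.0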

@lvenugopalan added the TF 2.0 (Issues relating to TensorFlow 2.0) label Apr 29, 2020