Q&A Session with Ryan Sepassi on Tensor2Tensor

Webinar by Sven Chmielewski on 23 May 2018

Our webinar series Coffee with Clusterone features talks from researchers and industry leaders in deep learning and data science.

This time, Ryan Sepassi from the Google Brain team talked about Tensor2Tensor, a library for deep learning models and datasets created to accelerate machine learning research.


You can download the slides from the webinar here.

Part of Coffee with Clusterone is a question and answer session with our speaker. We have collected all questions that were raised during the webinar. Below are the questions, along with Ryan's answers. Enjoy!

Q: Does Tensor2Tensor offer mechanisms for transfer learning between problems of different shape? How can Tensor2Tensor simplify transfer and multi-problem training?

A: We don’t support transfer learning out of the box, but you always have full control over everything that’s happening in the framework. We built Tensor2Tensor so that you can use the premade models and problems, but you can also override the entire input pipeline for problems or the entire model function for models.

If you have a pre-trained checkpoint that you want to continue training from, you could put the checkpoint in the output directory and the training will continue from that point.

More advanced transfer learning cases are of course possible as well, but you would have to wire them up yourself.
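As a rough sketch of that "seed the output directory with a checkpoint" idea, assuming standard TensorFlow checkpoint files (all paths here are placeholders):

```python
import os
import shutil

# Placeholder paths; adjust to your setup.
pretrained_dir = "/path/to/pretrained_model"
output_dir = "/path/to/train_output"   # the directory you pass as --output_dir

os.makedirs(output_dir, exist_ok=True)
for fname in os.listdir(pretrained_dir):
    # Copy the checkpoint data/index files plus the "checkpoint" state file so
    # the trainer restores from them instead of initializing from scratch.
    if fname.startswith("model.ckpt") or fname == "checkpoint":
        shutil.copy(os.path.join(pretrained_dir, fname), output_dir)
```

After that, launching t2t-trainer with the same --output_dir will pick up the checkpoint and continue training from it.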

Q: Does Tensor2Tensor support unsupervised learning and what models are available?

A: Language modeling is an unsupervised learning task: you provide sequences and the model learns to predict them.

There is also a basic generative adversarial network if you want to do GAN training for images. We also have the ImageTransformer, which can be used for unsupervised learning of images. There, you provide images and they get modeled as a sequence task, similar to PixelCNN.

The Transformer can be used as an unsupervised model for language modeling. In this case, there’s only a decoder - no encoder - and the network is just learning to predict the next token. You can feed it unlabeled text and it will learn to predict that text token by token. The ImageTransformer works the same way, only it models the pixels of images instead of text.
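If you want to check which of these models and problems are registered in your installation, you can query the registry directly. A quick sketch (the substring filters are only illustrative):

```python
from tensor2tensor.utils import registry

# List registered problems and models whose names suggest language modeling
# or image modeling, respectively.
lm_problems = [p for p in registry.list_problems() if "languagemodel" in p]
image_models = [m for m in registry.list_models() if "image" in m]

print(lm_problems)
print(image_models)
```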

Q: How can I create new embeddings in Tensor2Tensor?

A: You could, for example, train a language modeling task and then use the embedding variable as the embeddings for tokens. Or you could do an unsupervised sequence-to-sequence autoencoder and use the encoder output as sentence embeddings.
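As a rough sketch of the first option, you can read the trained embedding table straight out of a checkpoint with standard TensorFlow utilities. The variable name depends on the model and modality, so list the candidates first (the path is a placeholder):

```python
import tensorflow as tf

# Placeholder: the --output_dir of a finished training run.
ckpt = tf.train.latest_checkpoint("/path/to/train_output")
reader = tf.train.load_checkpoint(ckpt)

# Find variables that look like embedding tables, then read one as a numpy array.
candidates = [name for name in reader.get_variable_to_shape_map()
              if "emb" in name.lower()]
print(candidates)
embeddings = reader.get_tensor(candidates[0])  # roughly [vocab_size, hidden_size]
```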

Q: What platform is the demo in the webinar running on? What are alternative options?

A: The webinar demo was shown on Colab, which is Google’s hosted version of Jupyter notebooks for machine learning. Notebooks are very useful for fast prototyping and trying things out, and you can use a local Jupyter notebook too. That covers fast interactive usage; most of the time, though, you’ll probably be using the CLI tools (e.g. t2t-trainer and t2t-decoder).

Q: Any comments about the roadmap for Transformer2D, especially image decoding?

A: It’s an active research area so stay tuned for further publications. You can read the ImageTransformer code and go from there to make your own version.

Q: Can you use pre-trained embeddings like GloVe or word2vec in Tensor2Tensor?

A: It’s not currently wired up to make that easy, but it's certainly possible if you're willing to wire it up yourself by overriding the model’s estimator_model_fn and doing whatever needs to be done (likely through some hooks returned to the estimator).
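As a very rough sketch of that idea (none of this is a built-in Tensor2Tensor feature): a standard TensorFlow SessionRunHook that overwrites the model’s token-embedding variable with pre-trained vectors once the session is created. The variable-name substring and the .npy file of vectors are assumptions you would have to adapt to your model:

```python
import numpy as np
import tensorflow as tf

class LoadPretrainedEmbeddingsHook(tf.train.SessionRunHook):
    """Assigns pre-trained vectors (e.g. GloVe) to an embedding variable."""

    def __init__(self, embedding_var_substring, vectors_npy_path):
        self._substring = embedding_var_substring   # e.g. "emb"; model-dependent
        self._vectors_npy_path = vectors_npy_path   # numpy file, [vocab, hidden]

    def begin(self):
        # Called while the graph is still being built: find the variable and
        # prepare an assign op fed from a placeholder.
        var = [v for v in tf.global_variables() if self._substring in v.name][0]
        self._placeholder = tf.placeholder(var.dtype, var.shape)
        self._assign_op = var.assign(self._placeholder)

    def after_create_session(self, session, coord):
        # Called once after variables have been initialized.
        session.run(self._assign_op,
                    feed_dict={self._placeholder: np.load(self._vectors_npy_path)})
```

The hook would then have to be returned from your overridden estimator_model_fn so that the Estimator actually runs it.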

Q: You showed training and then evaluation in the MNIST demo. Is it possible to interleave the two?

A: Yes, by default t2t-trainer will switch between training and evaluation. This is controlled through the --schedule flag: --schedule=train runs training only, while --schedule=continuous_train_and_eval (the default) switches periodically between training and evaluation.

Q: When training a translation algorithm, is it possible to use a custom vocabulary file? How can that be done?

A: Yes, it is. See the Text2TextProblem base class in text_problems.py, which is a parent class for all translation problems. You can set vocab_type to TOKEN where you provide your own vocabulary file. Just make sure that it has the right name and that it reserves the IDs 0 and 1 for PAD and EOS respectively.
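A minimal sketch of such a problem subclass (the class name, vocab filename, and the generate_samples body are placeholders):

```python
from tensor2tensor.data_generators import text_problems
from tensor2tensor.utils import registry

@registry.register_problem
class MyTranslateProblem(text_problems.Text2TextProblem):
    """Translation problem that uses a user-supplied token vocabulary."""

    @property
    def vocab_type(self):
        # TOKEN tells T2T to use an existing vocab file instead of building
        # a subword vocabulary itself.
        return text_problems.VocabType.TOKEN

    @property
    def vocab_filename(self):
        # Your vocabulary file in data_dir; IDs 0 and 1 must be PAD and EOS.
        return "vocab.my_translate.tokens"

    def generate_samples(self, data_dir, tmp_dir, dataset_split):
        # Yield source/target pairs from your own corpus here.
        yield {"inputs": "hello world", "targets": "hallo welt"}
```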

Q: What other metrics do you support for language generation tasks in Tensor2Tensor apart from BLEU?

A: The simplest metric for language generation or any other symbol generation task would be perplexity, which is supported in Tensor2Tensor (see metrics.py for more info). It measures how probable the model considers a sequence of symbols, based on the probabilities the softmax outputs at each time step. The goal is then to maximize these probabilities, which is the same as minimizing perplexity.
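To make that concrete, perplexity is the exponential of the average negative log-probability the model assigned to the correct tokens. A toy illustration (not Tensor2Tensor code):

```python
import numpy as np

# Hypothetical per-token log-probabilities (natural log) from a model's softmax.
log_probs = np.array([-0.1, -2.3, -0.7, -1.2])

perplexity = np.exp(-log_probs.mean())
print(perplexity)  # lower is better; 1.0 would mean the model was always certain
```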

Q: How can you define your own metrics in Tensor2Tensor?

A: Take a look at metrics.py in Tensor2Tensor. That’s where all the metrics are defined, and you can easily see what the API looks like. You can add a metric right there for your own use, and we’re also always open to new contributions to the project.

If you just want to use the metric in your own model, the T2TModel class has a method called estimator_spec_eval(), which returns all of the metrics that the trainer should run during evaluation. There, you can add whatever you like. To see what the API looks like, just take a look at any of the other metrics.
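As a rough sketch of the shape such a metric function takes, modeled on the existing ones in metrics.py (the exact padding and weighting helpers you need depend on your problem):

```python
import tensorflow as tf
from tensor2tensor.layers import common_layers

def my_token_accuracy(predictions, labels,
                      weights_fn=common_layers.weights_nonzero):
    """Per-token accuracy, ignoring padding; returns (values, weights)."""
    outputs = tf.to_int32(tf.argmax(predictions, axis=-1))
    labels = tf.to_int32(labels)
    weights = weights_fn(labels)
    return tf.to_float(tf.equal(outputs, labels)), weights
```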

Q: Does Tensor2Tensor support QA tasks? More specifically, does t2t-decoder work with QA and language modeling tasks?

A: Yes, we do support QA tasks and recently added the bAbI datasets to Tensor2Tensor. See the QuestionAndContext2TextProblem base class in text_problems.py.

Q: Does Tensor2Tensor contain problems regarding voice recognition and processing?

A: Yes it does. We have the Librispeech problem for speech-to-text (or reverse it to get text-to-speech) and a tutorial to train it with Transformer.

Q: How can I create new models and how can I contribute them to the project?

A: The model definition looks exactly the same whether you want to create a model for yourself or contribute it to the library.

If you implement a model that has been published, we’d be very happy to take it as a pull request to the Tensor2Tensor library. All the models are defined in tensor2tensor/models in the GitHub repository.

All models are derived from the t2t_model.T2TModel class. See the shake_shake model implementation for an example. There, you can find the definition of the model, as well as an example of how hyperparameters are defined in Tensor2Tensor.
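The skeleton looks roughly like this. A simplified sketch based on that pattern; the computation inside body is just a placeholder:

```python
import tensorflow as tf
from tensor2tensor.utils import registry
from tensor2tensor.utils import t2t_model

@registry.register_model
class MySimpleModel(t2t_model.T2TModel):
    """Minimal model: T2T handles embeddings and loss; body maps inputs to outputs."""

    def body(self, features):
        inputs = features["inputs"]
        hidden_size = self.hparams.hidden_size
        # Placeholder computation; a real model would do something useful here.
        return tf.layers.dense(inputs, hidden_size, activation=tf.nn.relu)
```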

If you just want to use a model for yourself, there’s a command-line flag called --t2t_usr_dir. It allows you to define your own models and problems in a local directory. Take a look at the Tensor2Tensor documentation for more information.
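The usr_dir is just a small Python package. A sketch of its entry point, assuming the model sketched above lives in my_usr_dir/my_model.py:

```python
# my_usr_dir/__init__.py
#
# When you run `t2t-trainer --t2t_usr_dir=/path/to/my_usr_dir ...`, T2T imports
# this package; importing your modules here is what triggers the
# @registry.register_model / @registry.register_problem decorators.
from . import my_model  # noqa: F401
```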

We hope you enjoyed this Q&A session with Ryan Sepassi. If you are interested in joining one of our upcoming webinars, please see here for more information and to register.