1) Is this code compatible with Cloud TPUs? What about GPUs?

  • Yes, all of the code in this repository works out-of-the-box with CPU, GPU, and Cloud TPU. However, GPU training is single-GPU only (one way to restrict a run to a single GPU is sketched below).
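For example, a common way to make sure only one GPU is used is to set `CUDA_VISIBLE_DEVICES` before TensorFlow is imported. This is a minimal sketch of the generic CUDA/TensorFlow mechanism, not a feature of this repository, and the device index `0` is an arbitrary choice:

```python
import os

# Expose only one GPU to the process; this must be set before TensorFlow
# is imported. The device index "0" is an arbitrary example.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from tensorflow.python.client import device_lib

# Sanity check: the device list should now contain at most one GPU entry.
print(device_lib.list_local_devices())
```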

2) I am getting out-of-memory errors, what is wrong?

  • See the section on out-of-memory issues for more information.

3) Is there a PyTorch version available?

  • There is no official PyTorch implementation.

  • However, NLP researchers from HuggingFace made a PyTorch version of BERT available, which is compatible with our pre-trained checkpoints and is able to reproduce our results (a short usage sketch follows this answer).
  • We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository.
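As an illustration of how the third-party PyTorch port can be used, the following sketch loads a pre-trained BERT checkpoint through HuggingFace's current `transformers` package. The package name, the `bert-base-uncased` model identifier, and the API shown are details of that external project, not of this repository:

```python
# Illustrative sketch only: this relies on the third-party `transformers`
# package (pip install transformers torch), not on code in this repository.
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# Encode a sentence and run it through the PyTorch BERT encoder.
inputs = tokenizer("Hello, BERT!", return_tensors="pt")
outputs = model(**inputs)

# Final-layer hidden states: (batch_size, sequence_length, hidden_size).
print(outputs.last_hidden_state.shape)
```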

4) Is there a Chainer version available?

  • There is no official Chainer implementation.

  • However, Sosuke Kobayashi made a Chainer version of BERT available, which is compatible with our pre-trained checkpoints and is able to reproduce our results.
  • We were not involved in the creation or maintenance of the Chainer implementation so please direct any questions towards the authors of that repository.

5) Will models larger than BERT-Large be released?

  • So far we have not attempted to train anything larger than BERT-Large. It is possible that we will release larger models if we are able to obtain significant improvements.

6) What license is this library released under?

  • All code and models are released under the Apache 2.0 license. See the LICENSE file for more information.