1) Is this code compatible with Cloud TPUs? What about GPUs?
Yes, all of the code in this repository works out-of-the-box with CPU, GPU, and Cloud TPU. However, GPU training is single-GPU only.
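As a minimal sketch (the $BERT_BASE_DIR, $GLUE_DIR, and $TPU_NAME variables and the GCS bucket name are placeholders), the same fine-tuning command switches between devices via the TPU flags: adding --use_tpu=True and --tpu_name targets a Cloud TPU, while omitting those two flags runs the identical script on a single GPU or CPU.

```shell
# Fine-tuning on Cloud TPU: pass --use_tpu and the TPU name.
# Omit the last two flags (or set --use_tpu=false) to run on a single GPU or CPU.
python run_classifier.py \
  --task_name=MRPC \
  --do_train=true \
  --do_eval=true \
  --data_dir=$GLUE_DIR/MRPC \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --output_dir=gs://some_bucket/mrpc_output/ \
  --use_tpu=True \
  --tpu_name=$TPU_NAME
```

Note that when running on Cloud TPU, the output directory needs to be on Google Cloud Storage.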
2) I am getting out-of-memory errors, what is wrong?
See the section on out-of-memory issues:
- https://github.com/google-research/bert#out-of-memory-issues
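As a rough illustration of the main remedies described there (the flag values below are placeholders; the right settings depend on your GPU memory and task), lowering --max_seq_length and --train_batch_size is typically the first thing to try:

```shell
# Smaller sequence length and batch size reduce memory use; the linked section
# discusses the trade-offs and the memory cost of each model size.
python run_classifier.py \
  --task_name=MRPC \
  --do_train=true \
  --do_eval=true \
  --data_dir=$GLUE_DIR/MRPC \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=8 \
  --output_dir=/tmp/mrpc_output/
```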
3) Is there a PyTorch version available?
There is no official PyTorch implementation.
However, NLP researchers from HuggingFace have made a PyTorch version of BERT available, which is compatible with our pre-trained checkpoints and is able to reproduce our results.
We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository.
4) Is there a Chainer version available?
There is no official Chainer implementation.
However, Sosuke Kobayashi has made a Chainer version of BERT available, which is compatible with our pre-trained checkpoints and is able to reproduce our results.
We were not involved in the creation or maintenance of the Chainer implementation so please direct any questions towards the authors of that repository.
5) Will models larger than BERT-Large be released?
So far we have not attempted to train anything larger than BERT-Large. It is possible that we will release larger models if we are able to obtain significant improvements.
6) What license is this library released under?
All code and models are released under the Apache 2.0 license. See the LICENSE file for more information.