IndoBERT is the Indonesian version of the BERT model. We trained the model on over 220M words, aggregated from three main sources:
We trained the model for 2.4M steps (180 epochs), with a final perplexity over the development set of 3.97 (similar to English BERT-base).
```
pip install transformers==3.5.1
```

```python
from transformers import AutoTokenizer, AutoModel

# Load the IndoBERT tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("indolem/indobert-base-uncased")
model = AutoModel.from_pretrained("indolem/indobert-base-uncased")
```
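As a quick sanity check that the model loads correctly, here is a minimal sketch of a forward pass; the Indonesian sentence is only an illustrative input. Note that with the pinned transformers 3.x release, models return plain tuples, so the hidden states are taken by index.

```python
import torch

# Encode an example Indonesian sentence ("good morning, how are you?")
inputs = tokenizer("selamat pagi, apa kabar?", return_tensors="pt")

# Run a forward pass without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# In transformers 3.x the model returns a tuple; the first element
# holds the contextual embeddings with shape (batch_size, seq_length, 768)
last_hidden_state = outputs[0]
print(last_hidden_state.shape)
```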