Commit db7bf93 · 1 parent: f9d785f
Update README.md
README.md
CHANGED
|
@@ -1,7 +1,6 @@
|
|
| 1 |
-
|
| 2 |
# Ancient Greek BERT
|
| 3 |
|
| 4 |
-
|
| 5 |
|
| 6 |
The first and only available Ancient Greek sub-word BERT model!
|
| 7 |
|
|
@@ -15,13 +14,27 @@ Please refer to our paper titled: "A Pilot Study for BERT Language Modelling and
|
|
| 15 |
|
| 16 |
## How to use
|
| 17 |
|
| 18 |
Can be directly used from the HuggingFace Model Hub with:
|
| 19 |
|
|
|
|
| 20 |
```python
|
| 21 |
from transformers import AutoTokenizer, AutoModel
|
| 22 |
tokeniser = AutoTokenizer.from_pretrained("pranaydeeps/Ancient-Greek-BERT")
|
| 23 |
model = AutoModel.from_pretrained("pranaydeeps/Ancient-Greek-BERT")
|
| 24 |
```
|
| 25 |
## Training data
|
| 26 |
|
| 27 |
The model was initialised from [AUEB NLP Group's Greek BERT](https://huggingface.co/nlpaueb/bert-base-greek-uncased-v1)
|
|
@@ -31,7 +44,18 @@ Gorman's Treebank
|
|
| 31 |
## Training and Eval details
|
| 32 |
|
| 33 |
Standard de-accentuating and lower-casing for Greek as suggested in [AUEB NLP Group's Greek BERT](https://huggingface.co/nlpaueb/bert-base-greek-uncased-v1)
|
| 34 |
|
|
|
|
| 35 |
|
| 36 |
-
|
| 37 |
-
|
|
| 1 |
# Ancient Greek BERT
|
| 2 |
|
| 3 |
+
<img src="https://ichef.bbci.co.uk/images/ic/832xn/p02m4gzb.jpg"/>
|
| 4 |
|
| 5 |
The first and only available Ancient Greek sub-word BERT model!
|
| 6 |
|
|
|
|
| 14 |
|
| 15 |
## How to use
|
| 16 |
|
| 17 |
+
Requirements:
|
| 18 |
+
|
| 19 |
+
```shell
|
| 20 |
+
pip install transformers
|
| 21 |
+
# unicodedata ships with the Python standard library; no pip install is needed
|
| 22 |
+
pip install flair
|
| 23 |
+
```
|
| 24 |
+
|
| 25 |
Can be directly used from the HuggingFace Model Hub with:
|
| 26 |
|
| 27 |
+
|
| 28 |
```python
|
| 29 |
from transformers import AutoTokenizer, AutoModel
|
| 30 |
tokeniser = AutoTokenizer.from_pretrained("pranaydeeps/Ancient-Greek-BERT")
|
| 31 |
model = AutoModel.from_pretrained("pranaydeeps/Ancient-Greek-BERT")
|
| 32 |
```
|
| 33 |
+
|
| 34 |
+
## Fine-tuning for POS/Morphological Analysis
|
| 35 |
+
|
| 36 |
+
Please refer to the GitHub repository for the code and details regarding fine-tuning.
|
| 37 |
+
|
| 38 |
## Training data
|
| 39 |
|
| 40 |
The model was initialised from [AUEB NLP Group's Greek BERT](https://huggingface.co/nlpaueb/bert-base-greek-uncased-v1)
|
|
|
|
| 44 |
## Training and Eval details
|
| 45 |
|
| 46 |
Standard de-accentuating and lower-casing for Greek as suggested in [AUEB NLP Group's Greek BERT](https://huggingface.co/nlpaueb/bert-base-greek-uncased-v1)
|
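The de-accentuating and lower-casing step mentioned above can be sketched with the standard-library `unicodedata` module. This is a minimal sketch, assuming the preprocessing amounts to Unicode NFD decomposition followed by dropping nonspacing marks; the function name `strip_accents_and_lowercase` is illustrative, and the exact routine used during training may differ.

```python
import unicodedata


def strip_accents_and_lowercase(text: str) -> str:
    """Remove combining diacritics from text and lower-case it."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" (nonspacing mark) covers accents, breathings and iota subscript.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return stripped.lower()


if __name__ == "__main__":
    # Opening words of the Iliad, with polytonic diacritics.
    print(strip_accents_and_lowercase("Μῆνιν ἄειδε θεά"))  # prints: μηνιν αειδε θεα
```

Applying the same normalisation to input text before tokenisation keeps it consistent with an uncased, accent-free vocabulary.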
| 47 |
+
The model was trained on 4 NVIDIA Tesla V100 16GB GPUs for 80 epochs with a max-seq-len of 512, and reaches a perplexity of 4.8 on the held-out test set.
|
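For readers unfamiliar with the metric, perplexity is conventionally the exponential of the mean per-token negative log-likelihood. The sketch below assumes that conventional definition, with losses measured in nats; the helper name `perplexity` and the sample losses are hypothetical and are not taken from the paper's evaluation.

```python
import math


def perplexity(token_nlls):
    """Return exp(mean negative log-likelihood) for a list of per-token losses."""
    if not token_nlls:
        raise ValueError("need at least one token loss")
    return math.exp(sum(token_nlls) / len(token_nlls))


if __name__ == "__main__":
    # Hypothetical per-token losses in nats, for illustration only.
    example_losses = [1.2, 1.9, 1.6]
    print(round(perplexity(example_losses), 2))
```

Under this definition, a perplexity of 4.8 corresponds to a mean loss of ln(4.8), roughly 1.57 nats per predicted token.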
| 48 |
+
It also gives state-of-the-art results when fine-tuned for PoS Tagging and Morphological Analysis on all 3 treebanks, averaging >90% accuracy. Please consult our paper or contact [me](mailto:[email protected]) for further questions!
|
| 49 |
|
| 50 |
+
## Cite
|
| 51 |
|
| 52 |
+
If you end up using Ancient-Greek-BERT in your research, please cite the paper:
|
| 53 |
+
|
| 54 |
+
```
|
| 55 |
+
@inproceedings{ancient-greek-bert,
|
| 56 |
+
author = {Singh, Pranaydeep and Rutten, Gorik and Lefever, Els},
|
| 57 |
+
title = {A Pilot Study for BERT Language Modelling and Morphological Analysis for Ancient and Medieval Greek},
|
| 58 |
+
year = {2021},
|
| 59 |
+
booktitle = {The 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2021)}
|
| 60 |
+
}
|
| 61 |
+
```
|