# Grading Model

To evaluate translation quality, we use the BLEU (Bilingual Evaluation Understudy) metric. Because BLEU scores a candidate translation by comparing it against reference translations, we built a corpus of high-quality translated sentences to serve as references. In addition, since different users have different preferences, the software records which result a user copies (by clicking the copy button) and adds that sentence to the corpus, so the grading gradually adapts to the user's preferences and becomes personalized.

Here is an example: with 3-gram matching, the two example sentences have a matching score of 2/4. By analogy, it is easy to write a program that iterates over the n-grams and computes the match, as in the sketch below. Generally speaking, the 1-gram result indicates how many individual words were translated correctly, so it reflects the faithfulness of the translation; the 2-gram and higher results reflect more of the fluency of the translation, and the higher the value, the better the readability of the text.
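
A minimal sketch of this n-gram matching in Python. The two sentences are made-up examples (not the original pair), chosen so that the 3-gram score also comes out to 2/4:

```python
def ngrams(tokens, n):
    """All n-grams (as tuples) of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_match(candidate, reference, n):
    """Fraction of the candidate's n-grams that also occur in the reference."""
    cand = ngrams(candidate.split(), n)
    ref = set(ngrams(reference.split(), n))
    return sum(1 for g in cand if g in ref) / len(cand) if cand else 0.0

# Made-up sentence pair: 2 of the candidate's 4 trigrams appear in the reference.
candidate = "the cat sat on the mat"
reference = "the cat sat on a mat"
print(ngram_match(candidate, reference, 3))  # 0.5, i.e. a 2/4 match
```
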
However, for the 1-gram case this simple calculation has a known problem: if the translated result consists of nothing but a single term that happens to appear in the corpus sentence, every 1-gram matches, so the algorithm reports a 100% match even though the translation is completely wrong.

![Screen_Shot_2021-06-10_at_20.12.11](uploads/12045c54b784ad3b50791bc2ada372fd/Screen_Shot_2021-06-10_at_20.12.11.png)

To correct this, each n-gram's count is clipped: we take the minimum of the number of times the n-gram occurs in the machine translation and the maximum number of times it occurs in the reference translation, and compute the corrected n-gram precision according to the following formula.

![rendered_image](uploads/4f0a699a95d91ad3fce4251100081247/rendered_image.jpg)
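
A sketch of this clipped ("modified") n-gram precision in Python. The degenerate sentence below is the classic illustration of the 1-gram problem described above; clipping brings its score down from 7/7 to 2/7:

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most as
    many times as it appears in any single reference translation."""
    cand_counts = Counter(ngrams(candidate.split(), n))
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, cnt in Counter(ngrams(ref.split(), n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], cnt)
    clipped = sum(min(cnt, max_ref_counts[gram]) for gram, cnt in cand_counts.items())
    return clipped / sum(cand_counts.values())

# Unclipped 1-gram precision would be 7/7 = 1.0; clipping gives 2/7,
# because "the" occurs at most twice in the reference.
print(modified_precision("the the the the the the the",
                         ["the cat is on the mat"], 1))  # ≈ 0.2857
```
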
N-gram matching tends to score higher on shorter candidates, so an engine that translates only part of a sentence, but translates that part accurately, would still receive a high matching score. To avoid this bias, BLEU introduces a length (brevity) penalty factor into the final score.

![rendered_image2](uploads/4254f455758867c0e1f8a44b36cd6912/rendered_image2.jpg)

![rendered_image3](uploads/18110f8e4e971fac165a8db6f86c3da3/rendered_image3.jpg)
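
A sketch of the brevity penalty and the combined score, following the two formulas above; it reuses `modified_precision()` from the previous sketch and assumes the usual uniform 1/N weights:

```python
import math

def brevity_penalty(cand_len, ref_len):
    """1.0 when the candidate is at least as long as the reference,
    exp(1 - r/c) otherwise, so overly short translations are penalized."""
    if cand_len == 0:
        return 0.0
    return 1.0 if cand_len >= ref_len else math.exp(1.0 - ref_len / cand_len)

def bleu(candidate, references, max_n=4):
    """Brevity penalty times the geometric mean of the clipped 1..max_n-gram
    precisions (modified_precision() is defined in the sketch above)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # geometric mean collapses if any precision is zero
    cand_len = len(candidate.split())
    # use the reference whose length is closest to the candidate's
    ref_len = min((len(r.split()) for r in references),
                  key=lambda rl: (abs(rl - cand_len), rl))
    return brevity_penalty(cand_len, ref_len) * \
        math.exp(sum(math.log(p) for p in precisions) / max_n)
```
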
In our program, the scores are automatically converted into 🌟 ratings shown next to each translation engine: one star marks a low score relative to the other translation results, two stars a middle score, and three stars the best translation. This user-friendly design lets users compare the results at a glance and get a better experience.
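
As an illustration only (not the program's actual code), one way to bin the relative BLEU scores into one-to-three-star ratings; the engine names and scores here are hypothetical:

```python
def stars_for_scores(scores):
    """Rank BLEU scores across engines and map the lowest third to one star,
    the middle third to two stars, and the rest to three stars."""
    ranked = sorted(scores, key=scores.get)       # lowest score first
    third = max(1, len(ranked) // 3)
    stars = {}
    for i, engine in enumerate(ranked):
        stars[engine] = "🌟" if i < third else ("🌟🌟" if i < 2 * third else "🌟🌟🌟")
    return stars

print(stars_for_scores({"EngineA": 0.41, "EngineB": 0.55, "EngineC": 0.32}))
# -> {'EngineC': '🌟', 'EngineA': '🌟🌟', 'EngineB': '🌟🌟🌟'}
```
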