In the case of supervised learning, the trainers played both sides: the user and the AI assistant. In the reinforcement learning phase, human trainers first ranked responses that the model had generated in an earlier conversation.[15] These rankings were used to create "reward models" that were used to fine-t