  • Reinforcement learning from human feedback
    In machine learning, reinforcement learning from human feedback (RLHF) is a technique to align an intelligent agent to human preferences. In classical...
    43 KB (4,906 words) - 07:14, 6 May 2024
  • Reinforcement learning
    Reinforcement learning (RL) is an interdisciplinary area of machine learning and optimal control concerned with how an intelligent agent ought to take...
    55 KB (6,582 words) - 12:51, 15 April 2024
  • learning, but there are also techniques to fine-tune a model using weak supervision. Fine-tuning can be combined with a reinforcement learning from human...
    13 KB (1,381 words) - 11:57, 7 May 2024
  • ChatGPT
    applications using a combination of supervised learning and reinforcement learning from human feedback. ChatGPT was released as a freely available research...
    176 KB (15,303 words) - 07:52, 10 May 2024
  • Claude (language model) (category Machine learning)
    model is fine-tuned on these revised responses. For the reinforcement learning from AI feedback (RLAIF) phase, responses are generated and compared according...
    10 KB (1,044 words) - 18:08, 3 May 2024
  • Large language model (category Deep learning)
    trained on textbook-like data generated by another LLM. Reinforcement learning from human feedback (RLHF) through algorithms, such as proximal policy optimization...
    129 KB (11,635 words) - 08:56, 9 May 2024
  • Paul Christiano (researcher)
    paper "Deep Reinforcement Learning from Human Preferences" (2017) and other works developing reinforcement learning from human feedback (RLHF). He is...
    12 KB (1,070 words) - 06:25, 6 May 2024
  • Generative pre-trained transformer
    instructions using a combination of supervised training and reinforcement learning from human feedback (RLHF) on base GPT-3 language models. Advantages this...
    46 KB (4,098 words) - 16:16, 2 May 2024
  • 360-degree feedback; Biofeedback; Climate change feedback, for positive and negative feedbacks associated with climate change; Reinforcement learning from human feedback...
    3 KB (406 words) - 23:14, 26 April 2024
  • AI era
    technologies such as ChatGPT. Reinforcement learning from human feedback: Enabled the alignment of AI models through human feedback, creating AI assistants...
    20 KB (1,849 words) - 18:58, 30 April 2024