Responsible & Ethical AI

The current landscape of AI ethics and emerging trends.

AI ethics is an emerging field with multiple competing narratives about how to reliably build human values into AI. Among the most prominent current methods are Reinforcement Learning from Human Feedback (RLHF), Constitutional AI, and ethical frameworks.

Reinforcement Learning from Human Feedback (RLHF)

RLHF is a machine learning technique for steering model behavior with human feedback. The method is designed to mimic human judgment while prioritizing specific ethical values, such as safety and harmlessness. RLHF works by having human annotators rate or rank candidate model outputs according to those values. A reward model is trained on these preferences, and the language model is then fine-tuned with reinforcement learning to maximize the learned reward.
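To make the reward-modeling step concrete, here is a minimal sketch in Python. It uses toy numeric features in place of real text responses and a linear reward model; production systems score text with a neural network, but the pairwise (Bradley-Terry) preference loss is the same idea.

```python
# Minimal sketch of RLHF's reward-modeling step. Toy numeric features
# stand in for model responses; real pipelines use a neural reward model,
# but the preference loss below is the standard Bradley-Terry objective.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: each pair holds features for a preferred and a rejected response.
preferred = rng.normal(1.0, 1.0, size=(256, 4))  # annotator-chosen responses
rejected = rng.normal(0.0, 1.0, size=(256, 4))   # annotator-rejected responses

w = np.zeros(4)  # linear reward model: reward(x) = w @ x

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Maximize P(preferred beats rejected) = sigmoid(reward(preferred) - reward(rejected)),
# i.e. minimize -log sigmoid(margin) via gradient descent.
lr = 0.1
for _ in range(200):
    margin = preferred @ w - rejected @ w
    grad = -((1.0 - sigmoid(margin))[:, None] * (preferred - rejected)).mean(axis=0)
    w -= lr * grad

print("learned reward weights:", w)
# A policy would then be fine-tuned with RL (commonly PPO) against this reward.
```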

The method is also intended to supply the context that machines lack in ethical dilemmas. Despite RLHF's effectiveness in shaping model behavior, the technique still suffers from operational challenges, such as the poor scalability of human annotation and inconsistent labeling.

RLHF was originally developed by research teams at OpenAI but is now used by many other organizations as well, such as Hugging Face, Weights & Biases, and DeepMind.

Constitutional AI

Constitutional AI (CAI) was created by Anthropic as an alternative to RLHF. It attempts to encode specific ethical principles into the model itself rather than rely on human annotators: the model critiques and revises its own outputs against a written set of principles (the “constitution”), and those AI-generated judgments become the training signal. Fittingly, Anthropic also calls Constitutional AI “Reinforcement Learning from AI Feedback” (RLAIF). Anthropic positions CAI as an improvement on, and replacement for, RLHF, since relying on AI rather than human feedback addresses the scalability problem.
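As a rough illustration, the sketch below shows what the critique-and-revise loop might look like in code. The `llm` callable is a hypothetical stand-in for any text-generation API, and the two principles are placeholders rather than Anthropic's actual constitution.

```python
# Sketch of Constitutional AI's critique-and-revise loop. `llm` is a
# hypothetical stand-in for any text-generation call; the principles
# below are illustrative, not Anthropic's actual constitution.
from typing import Callable

CONSTITUTION = [
    "Choose the response that is least likely to cause harm.",
    "Choose the response that is most honest and transparent.",
]

def constitutional_revision(user_prompt: str, llm: Callable[[str], str]) -> str:
    """Draft a response, then critique and revise it against each principle."""
    draft = llm(user_prompt)
    for principle in CONSTITUTION:
        critique = llm(
            f"Critique the response below against this principle: {principle}\n\n"
            f"Response: {draft}"
        )
        draft = llm(
            f"Rewrite the response to address the critique.\n\n"
            f"Critique: {critique}\n\nResponse: {draft}"
        )
    # In the full method, these AI-revised outputs (and AI preference labels
    # derived from the constitution) become training data, replacing human
    # annotation.
    return draft
```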

However, losing the human voice in AI development removes valuable contextual information that only end users have. CAI's principles must also be determined in advance, leaving end users unable to participate.

Ethical Frameworks

Ethical frameworks, such as the Responsible AI toolkit by PwC or Explainable AI by IBM, serve as a foundation for creating ethical AI systems, but they too have drawbacks: most frameworks are hard to implement well in practice and lack empirical verification.

DevOps, MLOps, etc.

There's also been an increase in developer tools that aim to streamline the process of building ethical AI. This approach faces critique as well, because it effectively outsources ethical decision-making to tool creators rather than the users or the developers themselves.

The future of AI ethics presents many opportunities. One path is personalized ethics, which depends on modular AI systems winning out over monolithic models. Personalized ethics could allow customization according to users' values and beliefs, ensuring that AI systems reflect a diversity of ethical perspectives.
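As a purely hypothetical illustration of what this could look like, the sketch below composes a per-user values profile into the instructions given to a model. The schema and field names are invented for this example.

```python
# Illustrative sketch of "personalized ethics": a per-user values profile
# composed into the instructions a modular system passes to a model.
# The schema and field names here are hypothetical.
from dataclasses import dataclass, field

@dataclass
class ValuesProfile:
    user_id: str
    priorities: list[str] = field(default_factory=list)

def build_instructions(base_policy: str, profile: ValuesProfile) -> str:
    """Append the user's selected values to a shared base policy."""
    chosen = "\n".join(f"- {p}" for p in profile.priorities)
    return f"{base_policy}\n\nThis user's selected values:\n{chosen}"

profile = ValuesProfile("u123", ["prioritize privacy", "avoid persuasion"])
print(build_instructions("Follow the platform-wide safety policy.", profile))
```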

Another trend is a growing emphasis on the role of the end user. As it currently stands, end users have minimal input into improving model performance, verifying behavior, or pushing back on problematic model output. Future methodologies may seek to redress this imbalance of power by giving more control to the users who interact with AI systems daily.

The idea of creating an "unbiased" model is also under scrutiny. As a blog post by Anthropic states, "AI models will have value systems, whether intentional or unintentional." Complete neutrality is unachievable because data is always created from the point of view of a subject. The focus will likely shift toward acknowledging these inherent biases and values and developing models that engage with them critically.

Conclusion

AI ethics is a field in flux, grappling with present challenges while anticipating future ones. It's a space of ongoing experimentation that requires continuous dialogue among stakeholders in the pursuit of just and good AI systems.
