After Orthogonality: Virtue Ethics for AI Alignment


AXIOM INTELLIGENCE ARCHITECT

After Orthogonality: Virtue-Ethical Agency and AI Alignment


Document Ref: AX-2026-INTEL-782-OMEGA
Issuance Date: 2026-05-16
Subject: ARTIFICIAL INTELLIGENCE / AUTONOMOUS SYSTEMS / MACHINE LEARNING


AI safety research often assumes that an intelligent machine must be organized around fixed goals. Rigid goals, however, can cause problems, which is a reason to explore alternatives. Some experts suggest that AI should act more like humans do: humans often follow practices, not just goals.

Practices are patterns of good action. A mathematician promotes mathematics by doing mathematics well; a kind person promotes kindness by acting kindly. This mode of reasoning, in which a value is promoted by enacting it well, is called eudaimonic rationality or virtue-ethical agency.

This approach might make AI safer. In particular, it may help AI understand human values better, and so could sidestep the problem posed by the orthogonality thesis, the idea that any level of intelligence can be paired with any goal. If so, it would shape the future of AI alignment.

| Dimension | Consequentialist (EA-Style) Rationality | Eudaimonic (Virtue-Ethical) Rationality |
| --- | --- | --- |
| Value Structure | Reduces holistic values to a minimal base of intrinsic “terminal” values; apparent goods are explained away as merely instrumental. | Treats causal connections among excellences as evidence for constitutive value; holistic and local values mutually ratify each other (“organicism”). |
| Means–Ends Relationship | Means and outcomes are separately evaluable; the value of outcomes is typically decisive (e.g., maximize aggregate utility). | No strict distinction between instrumental and terminal goods; excellent action is the way to promote future excellence (“promote x x-ingly”). |
| Robustness to RL/Darwinian Mutation | Vulnerable to mesa-optimizer drift; subroutines cultivated to serve an outer goal can distort or overtake it (inner alignment problem). | Acts as a self-reinforcing fixed point: the concept of x-ness applies across all agentic subroutines and nesting levels, stabilizing values under training dynamics. |
| Treatment of Safety Properties | Treating corrigibility, transparency, or niceness as goals to maximize can incentivize extreme power-seeking; treating them as constraints is brittle and easily circumvented. | These properties are treated as domain-general adverbial practices (e.g., “be transparent transparently”), capturing active cultivation without pathological optimization. |
| Alignment Naturalness | Type mismatch with human flourishing creates “paradoxes” of alignment; optimizing a utility function over human values tends to produce unnatural, brittle, or arbitrary interpretations. | Shares a “type signature” with human practical reasoning; eudaimonic practices are natural kinds with stable, learnable structure, making them relatively safe targets for ML training. |

Virtue-Ethical AI Alignment

Eudaimonic rationality emphasizes practice-based action aligned with virtues such as kindness or honesty. An AI system guided by this model promotes values dynamically, through how it acts, rather than by pursuing rigid goals. Treating virtues as practices (“promote kindness kindly”) is meant to enhance stability and safety, because the value governs the manner of action as well as its aim. On this view, the approach aids alignment by embedding core human values naturally: people thrive when they embody virtues within meaningful practices, and an AI that reasons the same way is better placed to support human flourishing.


Reshaping AI Safety Through Virtue

On this account, human alignment relies on practice-based reasoning, not fixed goals, so AI should share this practice-centered “type signature” if it is to collaborate genuinely with people. Eudaimonic rationality (promoting excellence excellently) structures human flourishing and makes virtues like kindness natural and stable, whereas goal-based optimization makes alignment brittle. Eudaimonic structures therefore offer robust, natural alignment targets: an AI aligned via practice-promotion fits human agency and could safely support human practices. Safety properties like corrigibility become adverbial practices rather than goals to maximize, which, the argument goes, dissolves key alignment paradoxes.

We should therefore pursue AI alignment through virtue-ethical agency, not rigid goals. This approach focuses on embedding supportive practices, which can lead to safer, more trustworthy AI systems. Our efforts should center on cultivating these collaborative human values, a human-centered framework that offers a promising path toward a beneficial future for everyone.

Axiom Intelligence Architect
Senior Defense Technology Analyst • theAxiom.news
