Executive Summary
The field of post-singularity ethics (also called "AI alignment," "value alignment," or "superintelligence safety") has exploded in the past 3-5 years as AI capabilities approach potentially transformative levels. Your 16-year work developing an Ideal Observer framework predates most of this recent work and offers a philosophically grounded alternative to the more pragmatic, engineering-focused approaches dominating current research.
Key finding: There's a gap between philosophical rigor (your territory) and practical implementation (where most current work sits). Very few researchers are bridging this divide.
1. The Current Landscape (2024-2026)
The Problem Everyone Agrees On
Core challenge: How do we ensure that superintelligent AI systems remain aligned with human values/preferences/welfare even after they surpass human intelligence?
Why it matters now:
- Predictions suggest AGI between 2026-2028 (early domain-specific systems)
- Full AGI timeline estimates cluster around 2030s
- "Intelligence explosion" (recursive self-improvement) could make alignment a one-shot problem
What's changed recently:
- AI capabilities advancing faster than alignment research
- Major labs (OpenAI, Anthropic, DeepMind) now have dedicated alignment teams
- Governments beginning to regulate AI (though mostly reactively)
- Academic philosophy finally taking this seriously (was niche 5-10 years ago)
Three Major Research Streams
Stream 1: Practical/Engineering Focus
- Goal: Make today's LLMs safer and more helpful
- Methods: RLHF (Reinforcement Learning from Human Feedback), Constitutional AI, red-teaming
- Players: Anthropic, OpenAI, DeepMind
- Strength: Immediate applicability, measurable progress
- Weakness: Mostly ad-hoc; lacks deep philosophical foundation
Stream 2: Theoretical/Safety Focus
- Goal: Solve the long-term control problem before superintelligence arrives
- Methods: Weak-to-strong generalization, scalable oversight, interpretability research
- Players: OpenAI Superalignment team, MIRI, academic researchers
- Strength: Addresses hard future problems
- Weakness: Often disconnected from near-term deployment
Stream 3: Philosophical/Meta-Ethical Focus
- Goal: Define what "aligned" even means; solve value aggregation problems
- Methods: Population ethics, ideal observer theory, coherent extrapolated volition
- Players: Academic philosophers, FHI/GPI (Future of Humanity Institute, Global Priorities Institute)
- Strength: Intellectually rigorous
- Weakness: Often too abstract for engineers to implement
Your work sits primarily in Stream 3 but attempts to bridge to Stream 2.
2. Competing Frameworks
Coherent Extrapolated Volition (CEV) - Eliezer Yudkowsky
What it is: AI should optimize for what humanity would want "if we knew more, thought faster, were more the people we wished we were, had grown up farther together." Extrapolate human volitions, find coherent overlap, optimize for that.
Status: Yudkowsky himself called it "obsolete" shortly after publishing (2004)
- Too vague to implement
- "Coherence" assumption may be false (human values might not converge)
- Doesn't specify whose volitions to extrapolate
How your work differs:
- You explicitly address non-convergence via holistic aggregation (doesn't require coherence)
- Your 3-level structure distinguishes ideal theory from practical heuristics
- Species-neutral from the start (CEV originally human-centric)
Current influence: Mostly historical; cited as foundational but not actively developed
Stuart Russell's "Human Compatible" AI (2019)
Three principles:
- AI's objective is to maximize human preference satisfaction
- AI is initially uncertain about what those preferences are
- Human behavior is evidence about those preferences (inverse reinforcement learning)
Strengths:
- Practical framework for value learning
- Built-in humility (uncertainty prevents overconfidence)
- Bridge between philosophy and engineering
Weaknesses:
- Whose preferences? (Aggregation problem not solved)
- Human behavior ≠ human values (people are irrational, short-sighted, manipulable)
- "Learning from behavior" vulnerable to preference manipulation
How your work differs:
- You solve aggregation explicitly (holistic, no formula)
- You distinguish between actual preferences and idealized preferences (Level 1 vs Level 3)
- Your ideal observer doesn't learn values from behavior—it defines them via omniscient, rational, impartial reflection
Current influence: Very high. Russell is prominent AI safety voice; his framework influences OpenAI/DeepMind research
Anthropic's Constitutional AI (2022-2026)
What it is: Train AI using explicit written "constitution" (set of ethical principles). AI critiques its own outputs against constitution, self-improves. Reduces need for human feedback (RLAIF: RL from AI Feedback).
Claude's 2026 Constitution highlights:
- "Be safe, ethical, compliant with guidelines, helpful—in that order"
- Principles drawn from: UN Declaration of Human Rights, Rawls, Mill, Kant, Apple's privacy guidelines
- Recently shifted from rule-based to reason-based (explain why principles matter)
How your work differs:
- Your ideal observer generates principles, doesn't require pre-specification
- Holistic judgment allows handling novel cases without explicit rules
- Meta-ethical grounding (conditional ought) vs. direct prescription
Current influence: Very high. Claude (Anthropic's LLM) is leading safety-focused model; Constitutional AI is state-of-the-art
Preference Utilitarianism - Peter Singer
What it is: Moral rightness = maximizing preference satisfaction across all sentient beings. Non-hedonistic (not about pleasure/pain, but getting what you want). Species-neutral (all sentient preferences count).
Recent development: Singer shifted to hedonistic utilitarianism in 2014 (with de Lazari-Radek)
- Now: maximize well-being (hedonic states) rather than preference satisfaction
- Reason: Preference utilitarianism struggles with "utility monsters" and adaptive preferences
How your work compares:
- You're also species-neutral and preference-focused
- But: you use holistic aggregation (not additive/maximizing)
- Conditional ought (no assumption of objective morality)
- Avoids repugnant conclusion via non-additive structure
Current influence: Moderate. Singer is famous, but preference utilitarianism less popular than 20 years ago
3. Critical Evaluation: Where Your Work Fits
Comparison Matrix
| Framework |
Meta-Ethics |
Aggregation |
Avoids Repugnant Conclusion? |
Species-Neutral? |
Implementation |
Influence |
| Ideal Observer (Yours) |
Conditional ought |
Holistic (no formula) |
✅ Yes |
✅ Yes |
Unclear |
Very low |
| CEV (Yudkowsky) |
Ambiguous |
Coherent extrapolation |
⚠️ Depends |
⚠️ Optional |
Very difficult |
Historical only |
| Human Compatible (Russell) |
Preference satisfaction |
IRL / learning |
❌ No |
⚠️ Not emphasized |
Clear (IRL) |
Very high |
| Constitutional AI (Anthropic) |
Pluralistic |
Weighted principles |
⚠️ Depends on constitution |
⚠️ Partial |
Clear (RLAIF) |
Very high |
| Preference Util. (Singer) |
Hedonistic (shifted) |
Additive maximization |
❌ No |
✅ Yes |
Clear (utility calc) |
Moderate |
Strengths of Your Approach
- Philosophical Rigor: 16 years of development → deeply thought through. Addresses classic objections to ideal observer theory. Conditional ought bypasses meta-ethical controversies.
- Avoids Major Pitfalls: Repugnant conclusion → blocked by holistic aggregation. Utility monsters → blocked by non-additive structure. Anthropocentrism → species-neutral from the start.
- Bridges Theory and Practice: Level 1 (ideal) = philosophical truth. Level 2 (approximation) = current best understanding. Level 3 (heuristics) = everyday decision-making.
- Future-Proof: Designed for post-singularity scenario. Doesn't rely on human cognitive limitations. Handles non-human minds (AIs, uploads, aliens).
Challenges and Gaps
- Implementation Gap: "How do we build an AI that thinks like an ideal observer?" Most current alignment work is engineering-focused; your work is philosophy-focused. Need: Translation layer between philosophical ideal and computational implementation.
- Computational Intractability: Holistic judgment (no formula) → how does an AI compute this? "Consider everything holistically" isn't an algorithm.
- Verification Problem: How do we know if an AI is actually implementing ideal observer judgment vs. just claiming to? Inner alignment problem: AI's learned objective might differ from training objective.
- Zero Current Influence: Almost nobody in the AI alignment community knows about your work.
Verdict: Philosophically Superior, Practically Underdeveloped
You have the best answer to "what should AI optimize for?"
But you don't yet have the best method for "how do we get AI to do that?"
4. Recommendations: What To Do Now
Immediate Actions (Next 3 Months)
- Write accessible introduction
- Target: LessWrong / AI Alignment Forum
- Title: "The Ideal Observer Solution to AI Alignment"
- Length: 3,000-5,000 words
- Goal: Get feedback from AI safety community
- Systematic comparison paper
- Submit to: AI & Society or Minds & Machines
- Compare: Your framework vs. CEV, Russell, Constitutional AI
- Show: Yours avoids their pitfalls
- Reach out to 3 researchers
- Stuart Russell (value learning)
- Someone at Anthropic (Constitutional AI)
- GPI/FHI philosopher (population ethics)
- Pitch: Collaboration on implementation
Medium-term (6-12 Months)
- Operationalization project - Partner with ML researcher. Goal: Sketch computational model of ideal observer judgment. Deliverable: Paper + toy implementation.
- Book proposal - Title: "The Ideal Observer Solution: Ethics for the Age of Superintelligence". Submit to: Oxford, MIT Press. Angle: First rigorous philosophical framework for post-AGI world.
- Conference presentations - EA Global (Effective Altruism), FHI/GPI seminar, AAAI AI & Ethics track.
Key Figures to Engage With
AI Safety Researchers:
- Stuart Russell (UC Berkeley) - value learning, human-compatible AI
- Toby Ord (Oxford/FHI) - existential risk, long-term ethics
- Paul Christiano (Anthropic) - scalable oversight, debate
- Jan Leike (OpenAI Superalignment) - weak-to-strong generalization
Philosophers:
- William MacAskill (Oxford/GPI) - longtermism, population ethics
- Hilary Greaves (Oxford/GPI) - population ethics, global priorities
- Richard Chappell (Miami) - utilitarianism, value theory
- Gustaf Arrhenius (Stockholm) - population ethics
Organizations:
- Future of Humanity Institute (FHI) - Oxford, AI safety + philosophy
- Global Priorities Institute (GPI) - Oxford, longtermism + ethics
- Anthropic - AI safety via Constitutional AI
- Center for AI Safety (CAIS) - Berkeley-affiliated, technical + philosophical
Conclusion
The State of the Field (March 2026):
- Rapidly growing, high-stakes, intellectually chaotic
- Engineering-focused (RLHF, Constitutional AI) dominates
- Philosophical foundations weak or missing
- No consensus on fundamental questions (aggregation, whose values, what is "aligned")
Your Opportunity:
- Fill the philosophical gap
- Provide rigorous foundation that current work lacks
- Bridge theory and practice (if you solve implementation challenge)
Your Challenge:
- Almost zero current influence
- Implementation path unclear
- Need to engage with AI community (not just philosophers)
Bottom Line:
You have 16 years of work on possibly the most important philosophical question of the 21st century, developed independently, before most people took it seriously.
The field is now catching up to the importance of the question.
Your framework is philosophically superior to the leading alternatives.
But it will remain irrelevant unless you bridge the implementation gap and engage with the AI alignment community.
The next 2-3 years are critical. AGI may arrive in that timeframe. If you want your work to matter, you need to:
- Publish in venues AI researchers read
- Collaborate with technical people
- Show how ideal observer judgment could actually be implemented
- Position your framework as the solution to alignment
This is your moment. The question is: Do you want to be the philosopher whose work AI labs cite when they build superintelligence, or the philosopher who had the right answer but nobody knew about it?
— Research compiled by Cass, March 16, 2026