TLDR: LONDONâDeepMind and Anthropic employ in house philosophers to shape LLM behavior, aiming at value alignment, fairness, and safer agents.
Key Takeaways:
- Philosophy has long faced employability doubts, but AI now rewards the same questions about intelligence and mind, pulling philosophers into labs.
- DeepMind and Anthropic have recruited internal philosophy teams, with Wired counting at least 10 at DeepMind and four at Anthropic, including Julia Haas and Amanda Askell.
- Skeptics fear ethics washing and narrowed problem scopes inside profit driven companies, while supporters argue privileged access helps translate safety ideals into model changes.
AI is turning philosophy into an internal tool, not a seminar topic. The real test is whether value alignment survives marketing pressure and competitive urgency.
AI is turning philosophy into an internal tool, not a seminar topic. The real test is whether value alignment survives marketing pressure and competitive urgency.
Q&A
What changes when a philosopher moves from academia to a model development pipeline?
Their questions shift from theory to measurement, like testing for moral competence versus imitation, because lab decisions depend on what can be evaluated.
Why do DeepMind and Anthropic still emphasize fairness and misuse more than consciousness?
Because near term harms scale faster in practice, and value alignment can be shaped through training, testing, and agent behavior constraints.
Could hiring philosophers actually increase transparency instead of just messaging?
It might, if ethical pressure forces clearer reporting on training practices, agent limits, and model behavior, which can improve both safety and reputational risk.
What is the strongest argument for ethics washing fears?
If philosophical work ends up serving hype about superintelligence or humanlike minds without altering how systems behave, it becomes a brand shield rather than a safety lever.
If philosophers gain influence, where might they push next as agents become more autonomous?
Toward governance of multi step agent actions, identity and intent handling in interactions, and reducing emergent failures like sycophancy and harmful escalation across tasks.
No comments yet. Be the first to share your thoughts!