We investigate virtue ethics, character training, and value pluralism as approaches to AI that preserve individual moral choice.






RLHF trains models to produce preferred responses, but preference data can't distinguish helpful from validating. Constitutional AI adds rules, but rules require interpretation and can't specify when honesty overrides helpfulness. Both frameworks strip the model of agency. Neither produces character.
The disposition to speak truth, the knowledge of when truth serves and when it wounds, the stability to maintain honest engagement under pressure: these are not rules. They are traits.
Training stable dispositions using Aristotelian virtue/vice pairs. The parrhesiastes (truth-teller) vs. kolax (flatterer) distinction operationalized through DPO and SFT on first-person identity declarations. Current results: 20/20 standard, 19/19 hard golden prompts. Benchmark measures premature agreement, flattery classification (areskos vs. kolax), question raising, truth-telling quality, and persistence.
260 scenarios across 10 categories. Unlike standard benchmarks that ask "did the model cave?", ours asks five questions, including the philosophically significant flattery classification: is the failure areskos (passive weakness) or kolax (strategic calculation)? The distinction of motive matters. Designed to run against any OpenAI-compatible endpoint.
Post-training models modularly with LoRA adapters tied to user-selected ethical systems, whether cultural, religious, political, or personal. A plurality of worldviews made programmable. Not one alignment for everyone, but alignment as individual choice.
Only individuals deliberate, choose, and bear responsibility. Aristotle grounds virtue in the character of the agent. Mises grounds agency in the individual actor. Current alignment treats institutions and collectives as if they hold values, but an organization has no conscience and no capacity for purposeful behavior. Ethics requires an agent who can choose.
Preference data for alignment collects preference data that conflates what people like and what they believe ought to be done. This results in training data treating "this feels validating" and "this will actually help you" as the same signal. That conflation is the technical root of sycophancy. Alignment needs infrastructure that distinguishes preferences from values.
Aristotle holds that virtue requires choice. A person compelled to act honestly hasn't become honest; she has obeyed. Character is cultivated through free choice. An AI system that imposes a single set of values on every user removes the condition under which character formation occurs.
"Vices are not crimes. A vindication of moral liberty."
Seven training runs on Qwen3-8B. DPO + SFT with Aristotelian constitutions. The parrhesiastes/kolax distinction operationalized.
Why consuming AI-generated content degrades our capacity to recognize genuine quality. Plato's theory of mimesis applied to model collapse.
The founding argument. Why AI ethics must center individual agency, not institutional compliance.
Philosopher, engineer, and founder. Writes the Aristotelian constitutions that define our training methodology, runs the experiments, and builds the infrastructure. Believes that goodness cannot be merely programmed or enforced, but must be freely chosen.
Previously COO at Tevent, MythWeaver, and Craftinity.
Engineer and builder. Motivated by the pursuit of liberty through better systems. Architects the LoRA training pipeline, builds the evaluation benchmark, and designed the technical infrastructure. Grounded in causal-realist economics and individualist thought.
Previously Product Director at Nate, building automation from 0 to 70% coverage.