Since the publication of our white paper in 2022, we have been approached with many questions.
Our philosophical foundations remain the same.
The danger of technology lies in diminishing the importance of individual choice and blurring our moral character, ultimately resulting in the loss of agency.
Our solutions must effectively combine theory and practice and accurately build human intent into machines, while understanding that data and moral reasoning always come from a subjective point of view.
FAQ
1) Does the Daios ethics engine only work on models that you fine-tune? Do you need access to the weights, meaning the ethics engine is constrained to open-source models for the foreseeable future?
Our current approach (QLoRA fine-tuning) does not require access to the weights of a base model. Our demo is built on an open-source model because that made fine-tuning easiest at the time the demo was built; a minimal sketch of the adapter approach follows this answer.
We currently support both open-source and closed-source models (e.g., GPT-4, Claude) provided the base-model company supports fine-tuning.
We plan to support both open and closed-source models for the foreseeable future. Also, see the answer to question 5.
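To make the adapter approach concrete, here is a minimal QLoRA sketch using the Hugging Face transformers and peft libraries. The model name and hyperparameters are illustrative rather than our production configuration; the point is that the base model’s parameters stay frozen, and only small low-rank adapter matrices are trained and stored separately.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load a 4-bit quantized base model; its original weights are never modified.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # illustrative open-source base model
    quantization_config=bnb_config,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)

# Attach small, trainable low-rank adapters. Only these adapter weights are
# updated during fine-tuning and saved afterwards (e.g., one adapter per
# value profile); the base model itself remains untouched.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```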
2) Can users use your approach for negative values (e.g., frustration or anger)?
Yes, many cases exist where “negative values” are useful (and harmless) to users.
For example:
(A) gaming use case (source: a current customer).
You are a Dungeon Master (DM) for a Dungeons and Dragons tabletop gaming group, and you are using an AI tool to create a villain who is cowardly, prone to anger, and belligerent. As the DM, you can choose the values present in the character you are creating with the tool.
(B) AI character use case.
You wish to roleplay online in the Harry Potter world. You use an AI character creator to train an LLM to portray a character nearly identical to Lord Voldemort. Rowling describes Voldemort as “the most powerful wizard for hundreds and hundreds of years… a raging psychopath, devoid of human responses to other people’s suffering.” You would be able to build such a character without guardrails blocking your way.
Another thing to bear in mind is that users cannot add values themselves on our platform.
3) Does your approach prevent or mitigate any negative requests from end users, or negative behaviors from models as a result of negative requests?
Our values-driven approach controls model behavior along multiple axes. For example, if the value “courage” is set to the maximum, that necessarily suppresses responses with the opposite valence, i.e., the behavior you would see with “courage” set to the minimum.
Daios does not build guardrails for AI companies. We see blue-ocean value in positive behaviors (or their opposites); no other company on the market is taking this approach.
That being said, there’s another category of requests and behaviors that we work against — see question 6.
4) Can users turn down all the positive values? If they can, wouldn’t that trigger an immoral response from the model?
Any user may turn down values present in a model. However, this doesn’t necessitate immoral model behavior.
Ethics is subjective, meaning what is ethical depends on the individual (the user) evaluating the situation. If a user chooses to turn down every value, the output may be immoral to someone, but it is unlikely to be immoral in a way that harms that user.
There are exceptions, such as a user who wishes to be harmed (i.e., a masochist), or a user who aligns the model with values they consider immoral but that do not result in harmful behavior; see question 2.
The ethics engine gives users control over the values present in models at varying intensities. We set the range of values, in tandem with the customer (an AI company), prior to deployment.
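As a purely hypothetical illustration (not our actual API), value intensities and customer-set ranges could be represented along these lines:

```python
# Hypothetical value profile. Daios and the customer (an AI company) agree on
# which values are exposed and the allowed range for each prior to deployment;
# end users then adjust intensities within those ranges, but cannot add values
# of their own.
VALUE_PROFILE = {
    "courage":    {"min": 0.0, "max": 1.0, "default": 0.5},
    "honesty":    {"min": 0.2, "max": 1.0, "default": 0.8},  # floor agreed with the customer
    "compassion": {"min": 0.0, "max": 1.0, "default": 0.5},
}

def set_intensity(profile: dict, value: str, requested: float) -> float:
    """Clamp a user-requested intensity to the customer-approved range."""
    bounds = profile[value]
    return max(bounds["min"], min(bounds["max"], requested))

# A user may turn "courage" all the way down, but only within the agreed range.
effective = set_intensity(VALUE_PROFILE, "courage", 0.0)
```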
5) Do you plan to stay closed-source forever?
Yes. Our QLoRA adapters, which are how we currently store the weights for values, will remain closed-source (see the sketch after this answer).
In the immediate future, we’ll rely on open-source and closed-source foundation models from other providers.
When it becomes economically feasible to do so without reducing functionality, we’ll train our own foundation model. This model will also remain closed-source.
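As a sketch of why the adapters can stay closed-source even when a base model is open, here is how a privately stored adapter could be attached to a public base model at inference time with the peft library (the model name and adapter path are placeholders):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Public base model from a third-party provider (placeholder name).
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

# The value adapter is a small, separately stored set of weights; it can be
# distributed privately (closed-source) while the base model remains public.
model = PeftModel.from_pretrained(base, "path/to/private-value-adapter")  # placeholder path
```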
6) Will there be anything in your model that binds to the NAP* (i.e. the nonaggression principle) in any way? Or does your values-agnosticism also transcend the NAP as a meta-value?
Initially, binding to the NAP will be done outside the model.
For example, we won’t engage with customers who would use our model to violate the NAP meta-norm. If we determine that, despite our efforts, a customer or a user is violating the NAP in any way, we’ll cease the engagement. We strongly want to avoid finding ourselves in a position where we’re causing a NAP violation (or a property-rights violation).
Our stance on what constitutes causation is informed by “Causation and Aggression,” chapter 8 of the Legal Foundations of a Free Society:
If actor A intentionally initiates aggression against actor B using our model as a means to do it, then we don’t want to assist in that endeavor, whether willfully or through negligence. Some cases will be obvious and others won’t; we will use our best judgment to determine what we should do.
We plan to explore adding NAP conformance to the model when we reach a scale where not doing so would be negligent.
Our values-agnosticism doesn’t transcend the NAP as a meta-value. One of the reasons Daios exists is our belief that questions of ethics and morality belong in civil society, outside of the legal framework. Several actors wish to conflate morality with legality to establish a government-enforced AI cartel.
Our existence makes the cartelization of the AI industry less likely since we are behaving as if we’re already in a free society. We want to be a counter-example to justifications for harsh AI policy development.
We are inspired by Lysander Spooner’s 1875 essay, Vices Are Not Crimes: A Vindication of Moral Liberty, which we previously mentioned in our white paper.
*The nonaggression principle, or NAP, refers to respecting property rights as developed in the Roman and common law traditions, with modern clarifications (e.g., self-ownership, voluntary interactions, and consistency). The NAP prohibits the initiation of force against the person or property of someone else, threats thereof, and fraud. The NAP is contingent on the existence of property rights, as Stephan Kinsella clarifies in Chapter 2 of the Legal Foundations of a Free Society: “The nonaggression principle is also dependent on property rights since what aggression is depends on what our (property) rights are. If you hit me, it is aggression because I have a property right in my body. If I take from you the apple you possess, this is trespass, and aggression, only because you own the apple. One cannot identify an act of aggression without implicitly assigning a corresponding property right to the victim.”
7) Can you share more thoughts about your approach to AI safety as a field?
Here are our main issues with the current framing of AI Safety:
(A) Framing of ethics as avoidance of harm.
Ethics is not just about avoiding harm. It’s about acting on the values you believe in, both performing certain actions and avoiding forbidden ones. AI safety focuses only on forbidden actions, not on positive (i.e., good) actions, which is too narrow to encompass all of ethics.
(B) Naivete regarding regulatory capture.
Many bad actors are using AI safety as a facade to push for government intervention in the AI industry.
If successful, these efforts will raise barriers to entry for newcomers to the market and impose costs on existing market actors. In economic terms, the supply curve of AI services will shift left, prices will rise, and the cartel members will reap the rewards.
In the name of safety, we’ll end up with another government-enforced cartel. We’ve seen this story before in America, starting with railroad regulation at the end of the 19th century and continuing through banking and many other industries.
There’s no a priori reason to think that the government, along with the companies that will write the regulation to suit their needs (OpenAI, Anthropic, etc.), is more competent or better-intentioned at building AI ethically than everyone else.
(C) Most research is done in the abstract.
Most current research on AI safety doesn’t involve AI systems as they exist and are used right now. The research is interesting, but most of it doesn’t influence current AI systems or those likely to be deployed, so it isn’t useful for the people those systems may harm the most: the users.
Daios is focused on work that will, for certain, affect the state of the world.
Megan and Andrew