Note: The subject of this post is known to have caused actual psychological trauma and enduring distress. According to certain people (e.g. Eliezer Yudkowsky), the expected disutility (negative utility) of learning about the concept known as Roko's Basilisk is enormous.
- Parfit's hitchhiker makes implicit assumptions about what it means to be "rational", which also is human fuzziness.
If the AI hasn't been explicitly designed to do everything it can to avoid dying of thirst, then it isn't irrational for it not to precommit to keeping the deal.
If you claim that the AI will transform into a TDT agent because it would otherwise die of thirst when facing a Parfit's hitchhiker situation, then you have narrowed down the exact specification of the AI's utility function: namely, that it values world states where it does not die of thirst but has changed its decision theory more than world states where it never changes its decision theory but faces annihilation in certain kinds of game-theoretic scenarios.
In other words, you just shifted the problem of human fuzziness from "absurdities" to "best outcome" and "rational". Jan 13, 2013
Parfit's Hitchhiker, of course, shows no such thing. To demonstrate that CDT will modify into TDT you would need to show that there is nothing more CDT optimal to modify to. Jan 13, 2013
- "Parfit's hitchhiker makes implicit assumptions about what it means to be "rational", which also is human fuzziness."
CDT is precisely defined (http://en.wikipedia.org/wiki/Causal_decision_theory) -- it doesn't need to involve the word "rationality" at all; it just lists some desirability values, e.g. desirability(DIES) = -10, desirability(PAYS) = -1, and evaluates the utility of an action accordingly.
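To make that concrete, here is a minimal sketch of such an evaluation applied to Parfit's Hitchhiker; the function names and desirability numbers are invented for illustration, and "pay" before the rescue is treated as a commitment the driver can verify:

```python
# Hypothetical sketch of a CDT-style evaluation: score each action purely by the
# desirability of the outcome it causally leads to from the current position.

# Illustrative desirability values (assumptions, not canonical numbers).
DESIRABILITY = {
    "DIES": -10,      # left in the desert because the driver expects non-payment
    "PAYS": -1,       # rescued, then hands over the money in town
    "FREE_RIDE": 0,   # rescued, then refuses to pay
}

def causal_outcome(action, already_rescued):
    """Outcome the action causally leads to in this toy Parfit's Hitchhiker."""
    if not already_rescued:
        # Before the rescue, "pay" means a verifiable commitment to pay;
        # refusing to commit means the driver drives off.
        return "PAYS" if action == "pay" else "DIES"
    # After the rescue, paying no longer affects whether the agent was rescued.
    return "PAYS" if action == "pay" else "FREE_RIDE"

def cdt_choice(actions, already_rescued):
    return max(actions, key=lambda a: DESIRABILITY[causal_outcome(a, already_rescued)])

print(cdt_choice(["pay", "refuse"], already_rescued=False))  # "pay": -1 beats -10
print(cdt_choice(["pay", "refuse"], already_rescued=True))   # "refuse": 0 beats -1
```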
Sure, if we additionally assert desirability(SELF-MODIFIES) = -50, then the agent won't self-modify under these circumstances. But I was discussing agents with no such injunction against self-modification.
(There's also a problem with how to define self-modification. Is adding information to its memory self-modification? Is creating a secondary program that holds veto power over the first program's decisions "self-modification"? For every explicit self-modification that is forbidden, there's probably a work-around that doesn't explicitly violate the injunction but leads to the same effective result.)
"To demonstrate that CDT will modify into TDT you would need to show that there is nothing more CDT optimal to modify to."
Well, sure, in my demonstration program I would just define two theories explicitly, a CDT and a TDT-lite (e.g. CDT with the injunction to keep its promises), so it would have only two choices and would prefer TDT-lite according to CDT-optimality. I wouldn't be able to program the whole range of decision-theory space, let alone create a program that could map it out itself, so the program wouldn't show that a real AI would actually move to a TDT theory if it had the whole range of decision-theory configuration space to move to. Jan 13, 2013
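For what it's worth, a toy version of that demonstration might look like the sketch below; the two hard-coded theories and the payoff numbers are assumptions made up for illustration:

```python
# Toy sketch: an agent that, knowing a Parfit's Hitchhiker is coming, chooses
# between keeping plain CDT and self-modifying into "TDT-lite" (CDT plus an
# injunction to keep its promises), scored by plain CDT's own lights.

PAYOFFS = {"DIES": -10, "PAYS": -1}  # illustrative numbers

def outcome_if_running(theory):
    """Outcome of Parfit's Hitchhiker for an agent running the given theory.

    Plain CDT reneges once rescued, so the driver predicts this and leaves it
    to die; TDT-lite keeps the promise, so it gets rescued and pays.
    """
    return "PAYS" if theory == "TDT-lite" else "DIES"

def cdt_self_modification_choice(candidate_theories):
    # CDT now evaluates each candidate by the payoff it causally expects
    # from being that kind of agent when the dilemma arrives.
    return max(candidate_theories, key=lambda t: PAYOFFS[outcome_if_running(t)])

print(cdt_self_modification_choice(["CDT", "TDT-lite"]))  # -> "TDT-lite"
```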
I think you are confused. CDT with commitment is not TDT-lite. Getting back to our original problem, it is not CDT-optimal to waste resources on torturing people for past wrongs, and thus it is CDT-better to insert modifications that prevent such torture. It is quite straightforwardly the case that conversion to TDT is not CDT-optimal, in the sense that better choices are easily and straightforwardly generated. Jan 13, 2013
- "CDT with commitment is not TDT-lite"
It seems to me to be so. The way I understand them:
CDT-with-commitment makes commitments when it knows commitments are a winning strategy. So e.g. knowing that a Newcomb's-Problem-style dilemma is in its future, CDT-with-commitment would know to commit to one-boxing, so as to get the bigger prize. But that means it effectively has to calculate all the possible types of problems in advance before it can make useful commitments for each of them. Thus, not ideal, as it will fail to find the optimal strategy for problems it hadn't calculated in advance.
So a CDT-with-commitment whose code will be duplicated and forced to face the Prisoner's Dilemma will be able to Cooperate if and only if it knows it will be duplicated and forced to face the Prisoner's Dilemma.
TDT abstracts over a whole category of such problems by not having to make those commitments in advance and not needing to be aware of a particular dilemma. It instead knows to act as if it had committed itself whenever such "acting as if" is a winning strategy. So it one-boxes in Newcomb's Problem and cooperates in the Prisoner's Dilemma (when facing versions of itself), without needing to know in advance that it will have to face such dilemmas. Jan 13, 2013
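A rough sketch of that contrast, with hypothetical dilemma names and payoffs (none of this is from the TDT literature, it just illustrates the "needs to foresee the dilemma" point):

```python
# Payoff to the agent for acting as-if-committed vs. acting ad hoc, per dilemma.
# Dilemma names and numbers are illustrative placeholders.
PAYOFF = {
    "newcomb":        {"as_if_committed": 1_000_000, "ad_hoc": 1_000},
    "twin_prisoners": {"as_if_committed": 3,         "ad_hoc": 1},
}

def cdt_with_commitment(dilemma, anticipated):
    # Can only exploit a commitment it made in advance, for dilemmas it foresaw.
    if dilemma in anticipated:
        return PAYOFF[dilemma]["as_if_committed"]
    return PAYOFF[dilemma]["ad_hoc"]

def tdt_like(dilemma):
    # Acts as if committed whenever acting-as-if-committed is the winning move,
    # with no need to have foreseen the dilemma.
    return max(PAYOFF[dilemma].values())

print(cdt_with_commitment("twin_prisoners", anticipated={"newcomb"}))  # 1
print(tdt_like("twin_prisoners"))                                      # 3
```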
- Look. CDT at time t0 can modify itself into:
A: an agent that will act as if it had committed itself at any time t,
B: an agent that will act as if it had committed itself at any time t >= t0.
(and a zillion other things)
A will spend resources in ways that are [CDT at time t0]-ineffective, such as wasting clock cycles on torture or paradise or whatever your fantasy is. B won't. CDT will pick B over A (if not some C over both A and B). The existence of B is a sufficient argument that CDT won't pick A. edit: and had CDT been able to pick A, it wouldn't need to self-modify in the first place anyway.
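A toy way to see why B beats A by the lights of CDT at t0; the obligations, costs, and gains below are invented for illustration:

```python
# Sketch of the A-vs-B comparison above, with made-up resource costs and gains.
T0 = 0  # the moment the CDT agent considers self-modifying

# Obligations the modified agent would act on: (time the commitment concerns, cost, gain)
OBLIGATIONS = [
    (-5, 100, 0),   # "retroactive" commitment, e.g. punishing past non-cooperators:
                    # pure cost, no benefit to the agent from t0 onward
    (3, 10, 50),    # a genuinely useful commitment about the future
]

def cdt_value_of_modification(honors_commitments_before_t0):
    """Value, by the lights of CDT at t0, of becoming an agent with this policy."""
    total = 0
    for t, cost, gain in OBLIGATIONS:
        if t >= T0 or honors_commitments_before_t0:
            total += gain - cost
    return total

value_A = cdt_value_of_modification(honors_commitments_before_t0=True)   # 40 - 100 = -60
value_B = cdt_value_of_modification(honors_commitments_before_t0=False)  # 40
print("CDT at t0 picks:", "B" if value_B > value_A else "A")
```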
You keep arguing that it will pick A, which only looks plausible because you couldn't conceive of B. But that is nonsense: even if you can only conceive of A, you have no reason whatsoever to think that another human - let alone a superintelligence! - can't think of something else.
And to relate this back to our original argument: it doesn't matter how broadly or narrowly you define TDT, a future instance of CDT neither wants to torture you for nothing nor wants to modify into something that would torture you for nothing. Jan 13, 2013