From YouTube: Mark Miller - The Elements of Decision Alignment: Large programs as complex organizations
Description
Talk Link: http://2016.ecoop.org/event/ecoop-2016-papers-plenary-speaker-
This is an interdisciplinary collaboration where I'm focused on the computer science, but with a long-standing interest in economics, and my colleague Bill Tulloh's focus is economics, but with a long-standing interest in computer science. We motivate the whole talk with this question: when one object makes a request of another object, why do we expect the second object's behavior to satisfy the first object's wishes? We're not the only field in which such questions are relevant.
Many complex systems can be characterized as consisting largely of networks of entities making requests of other entities: not just object-oriented programs, but also human organizations and human economies. We stand on a long tradition of computer science borrowing ideas from economics. In Object-Oriented Software Construction, Bertrand Meyer writes like an economist: we are interested in individual agents, not so much for what they are internally as for what they have to offer.
So, in the first part of the talk we're going to discuss the making of requests; then the aligning of the decisions by the entities involved in these networks of request making; then the trade-offs involved in aligning the requests (aligning the behavior, rather); and how all of this bears on the division and composition of knowledge in networks of such entities.
What we want to do is not just borrow ideas from economics as an analogy. Rather, we need to recognize that the world we're already in is one of dynamic, shifting networks in which objects make requests: humans make requests of objects through user interfaces, objects make requests of each other, and objects make requests of human beings through user interfaces. Think of Mechanical Turk-style workflow systems, or the user interfaces exposed to an Uber driver.
We coined the term decision alignment as a generalization of incentive alignment. Decision alignment is when a principal or agent uses various tools to make it more likely for the other's decisions and actions to align with its own. The contrasts are that we drop the implicit assumption that we're dealing with human beings, and we also do not assume that we're dealing with objects; we're trying to be neutral across those worlds.
In economics, it's very important that these relationships are examined both from the principal's perspective and from the agent's perspective, symmetrically, and we agree that that is important. However, for purposes of this talk, we're going to take the principal's perspective and examine what tools the principal can use to shape the behavior of the agent.
They needed to bring in other elements, so they divided the problem into three phases. Before the request is actually made, there's the pre phase, in which the principal is faced with the problem of hidden characteristics: things that it does not know about the agent that it might be making the request of. In the post phase, the principal is faced with the problem of hidden actions: it might not be able to tell what it is that the agent actually did.
In economics, there is the explanation: what is it that the principal actually wants the agent to do. Then there's arranging to reward the agent if it actually does those things. And then, in the post phase, there's various kinds of monitoring to see what the agent is actually doing or has done. The information gathered then feeds back into selection, to guide the selection decisions of this and other principals. (The phases are separated this way for expository purposes.)
Say I want to deliver a package. The first thing I need to do is select a package delivery service. I do this on the basis of reputation (is this one I've heard of before?) and fit: does it offer the services I need? If my dad's birthday is tomorrow, I might need one that offers overnight delivery.
The package delivery service might offer tracking or return receipts, and in any case I can ask my dad if he received it and whether it was damaged. All the information that results from that monitoring I can feed back to guide the future selection decisions of both myself and other principals.
Now, let's apply this framework to software. Before we can, we must distinguish the two worlds: the static world of design and development time, and the dynamic world of run time. In the static world, the principals and agents of interest are developers and code; in the dynamic world, there are objects making requests of other objects at runtime.
Then, in the allow phase, we arrange for the program, when run, to be allowed to take various actions. Generally, what we do is run the program as the user it's running on behalf of, and therefore the actions it's allowed are all the actions that that user is allowed. The square root function in your math library is allowed by your system to delete all of your files.
The object capability work provides an interesting alternative that starts off by shifting the static/dynamic boundary to include allow, so that what actions are allowed is determined on a per-request basis. This has a direct analogy in our package delivery story: in asking them to deliver the package, I give them the package.
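The per-request pattern can be sketched in code (a hypothetical Python illustration, not from the talk; the class and method names are invented): the authority to act on the package arrives only as part of the request itself.

```python
# Hypothetical sketch: authority travels with the request. Instead of a
# courier with ambient access to everything I own, I hand over only the
# one package object; that handoff *is* the permission.

class Package:
    def __init__(self, contents):
        self.contents = contents
        self.delivered_to = None

class Courier:
    def deliver(self, package, address):
        # The courier can act only on the package it was given;
        # it holds no reference to my other belongings.
        package.delivered_to = address
        return "delivered"

my_gift = Package("birthday card")
status = Courier().deliver(my_gift, "Dad's house")
```

Because the courier never receives a reference to anything but the one package, the "allow" decision happens at the moment of the request rather than being configured ahead of time.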
Then explain is the topic of API design. The API defines a little language that the principal uses to express its desires to the agent. There's a lot to be said here, but we'll come back to that later in the talk. And then, while the agent is reacting to the request, instrumentation of various sorts monitors what it is doing, and that information is very useful for testing and for generating bug reports.
So as we examine these worlds of humans, objects, and user interfaces, and we look at them from the perspective of each of these six steps, we find that each of the cells is occupied by interesting activity, and many of them are studied by separate disciplines. To a shocking degree, each cell is discussed and studied in isolation from the other cells. So now, let's look at some of the ways in which these elements combine.
We're going to use the metaphor of an audio mixing board. For each of these steps, the principal has many choices they can make. On an audio mixing board, each slider can be set independently of every other slider, but not all combinations of slider settings sound good; some of them work much better with particular settings of the other sliders.
The requests that are explained can be more informal or can be more formally specified. Reward can range from just providing guidance, like the reward function exposed to a machine learning system, to actually trying to induce good behavior, such as paying the package delivery company. And monitoring can serve the purpose of providing feedback, and also of detecting corruption, stopping it quickly, and being able to repair the damage.
An engineering ecosystem of this sort seems to have this as a representative set of settings: it's much more open entry; that's sort of the point. Anybody can write a new plug-in and offer it to be plugged in, and in order to have a framework that people can contribute plugins to, the interfaces must be more formally specified.
These are known as permissionless systems: nobody needs anybody's permission to participate, and nobody can be evicted for bad behavior. To help with the consequences of completely open entry, they put a lot of pressure on the design of incentive systems, pushing that to the opposite extreme, where they try to create a whole architecture of incentives that by itself pushes all the participants into behavior that continues to sustain the system over time. And they've been successful at that.
However, the programmer might seem to have satisfied Tony Hoare's prescription of having written a voting machine that is so simple that it is obviously correct. In fact, what Ping presented in his thesis was the code for a working voting machine that consisted of 400 very simple lines of code in a simple first-order language, and he wrote an extensive prose rationale justifying every single line of code in there.
He then subjected that code to intense review, intense security review, and the hope was that, by having very, very simple code with extensive rationale, the reviewers would be able to verify by review that it was correct. So, to test this, Ping inserted three bugs into the code, each of which could bias the election and all of which were carefully crafted to escape review.
We know that intense review of simple code is fairly effective at spotting accidental bugs. What we found was that Ping's attempt to insert the bugs such that they evaded review was successful. I was one of the reviewers that failed to spot the bugs, and in fact none of the reviewers spotted all the bugs. One bug escaped notice by all the reviewers and could have gone on to bias an election in this scenario.
The interesting thing is that it seems much harder to write malicious bugs that evade detection when review and testing are brought together: harder than it seems like it would be just from the difficulty created by review or testing considered by themselves. Why is this? Well, let's take a look at a very oversimplified example.
Alternatively, this code is written to evade detection under exhaustive testing, and the reason is that exhaustive blackbox testing will never hit the condition that triggers the trapdoor. However, the things you need to do to evade testing, when subject to careful review, look weird. They look weirder than the bugs that Ping was able to successfully hide. So the hope is that the characteristics that make a bug able to evade detection by testing make it more obvious to review, and vice versa.
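A minimal sketch of the kind of trapdoor being described (hypothetical Python; the function, the constant, and the bias are invented for illustration): the bug fires only on one trigger value, so blackbox tests over ordinary inputs never hit it, while the trigger itself is exactly the sort of weirdness a reviewer can notice.

```python
# Hypothetical trapdoor: biased behavior hides behind a trigger value that
# ordinary blackbox testing will never supply.

MAGIC = 0x5F3759DF  # arbitrary trigger; meaningless constants look weird in review

def count_vote(totals, candidate_id):
    """Tally one ballot for the given candidate index."""
    if candidate_id == MAGIC:   # trapdoor condition, unreachable by normal ballots
        totals[0] += 100        # quietly bias the election toward candidate 0
        return
    totals[candidate_id] += 1

# Ordinary ballots: every test over valid candidate ids passes.
totals = [0, 0, 0]
for ballot in [1, 2, 2, 0]:
    count_vote(totals, ballot)
# totals is now [1, 1, 2]; only the magic input reveals the bias.
```

The two tools complement each other here: testing confirms the ordinary path behaves correctly, while review is what has a chance of flagging the inexplicable constant.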
So, what's going on here? Well, a tool, when employed just by itself and stressed to solve the entire problem, sometimes (not always; it depends on the particulars) hits a cost curve that blows up. We can think of this as the price of perfection, and backing off from the perfect solution we can consider to be a compromise. But by artfully combining compromises on several different dimensions, we can still create a significant degree of aggregate strength while staying away from the prohibitive parts of the cost curve.
Package delivery illustrates the weakness of trying to solve these problems using only incentives. Over here, the box represents the space of all possible agent actions; every point within the box is something the agent might do when I walk up to the counter with the package I want delivered. The green circle is all of the agent actions which would be to my benefit.
When I walk up to the counter with the package to be delivered, of all the things that the clerk might do that are to my benefit, the ones that I have in mind are for him to deliver the package. That's what I'm prepared to pay him for, so if he delivers the package and I pay him, then he wins and I win; those are the points that are of benefit to both. Of course, there's various things he could do that would cause me to suffer.
He might damage or lose or steal the package. Now, bringing in the allowed dimension: if the agent is simply allowed to do all of these things, then it's very hard to solve this just with incentives, because the agent will always derive some benefit from stealing the package, and if he's deriving benefit from actions that hurt me, I'm in trouble.
So, instead, we can provide the agent narrow authority, though not literally least authority, where the only thing that we really need to prohibit is the area of danger, where harm to me coincides with benefit to him. An honest package delivery service would not find this prohibition to be a burden. And then, having taken that area of danger off the table, the remaining problem of inducing the agent to deliver my package is one that can be dealt with by incentives.
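One way to picture narrow authority in code (a hypothetical Python sketch; the account example and all names are mine, not the talk's) is an attenuating wrapper that simply never exposes the dangerous operations:

```python
# Hypothetical sketch of narrow authority: the agent gets a facet of a
# powerful object from which the "area of danger" has been removed.

class Account:
    def __init__(self, balance):
        self.balance = balance
    def deposit(self, amount):
        self.balance += amount
    def withdraw(self, amount):  # the dangerous operation: agent gain, my harm
        self.balance -= amount

class DepositOnlyFacet:
    """Attenuated authority: can pay in, can never take out."""
    def __init__(self, account):
        self._account = account
    def deposit(self, amount):
        self._account.deposit(amount)

acct = Account(100)
agent_view = DepositOnlyFacet(acct)
agent_view.deposit(25)                          # permitted action still works
can_withdraw = hasattr(agent_view, "withdraw")  # False: danger zone is off the table
```

With withdrawal unreachable, whatever remains of the alignment problem (will the agent bother to deposit?) is small enough to handle with incentives.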
In language-based software security, we face a logically identical landscape in the safe plugin problem. A malicious plug-in might try to attack the framework, and we can divide the attacks, as usual, between integrity attacks and availability attacks. Now, we need to recognize that most of the reasons why a plug-in might be interested in attacking the framework are attacks on integrity. As for attacks on availability, there's rarely any reason to engage in those.
A
Unfortunately,
attacks
on
integrity.
We
can
defend
against
by
employing
safe
language
techniques
like
truly
encapsulated
objects,
attacks
alone,
availability
can
consist
of
things
like
the
plug-in
just
goes
into
an
infinite
loop
when
it's
invoked
wedging
the
framework.
Now
it
can
do
that,
but
it
has
very
little
reason
to
and.
But there will always be some weird attacker that decides it's in their interest to deny service, whether for bragging rights, for fun, or whatever. So this picture only makes sense when coupled with at least a little bit of selectivity: a plugin that repeatedly and simply wedges its framework will stop being plugged in.
A
So
when
we
look
at
how
these
elements
complement
each
other,
we
don't
just
want
to
look
at
them
individually
and
we
don't
just
want
to
look
at
them
in
pairs
like
the
blue
struts.
Here,
we
need
to
understand
that
the
overall
context
is
one
of
many
structural
members,
the
the
legs
of
this
piece
of
furniture
as
well,
and
we
need
to
take
a
look
at
the
overall
structural
integrity
and
what
the
the
overall
weaknesses
and
strengths
are.
We might have more formally specified, more explicitly articulated abstractions that sit between the principal and the agent, as in package delivery, and the big advantage of these articulated abstractions is that they enable a multiplicity on each side. They abstract over the multiple ways in which an agent might implement the interface, and they simultaneously abstract over the multiple reasons why a principal might want to employ such an abstraction. It's when it abstracts from both sides that we call it an abstraction boundary.
The arrangements are mostly informal, and being informal and one-to-one, we would say these arrangements are very concrete. On the other hand, we have the package delivery service or, more generally, businesses offering services on the market, in which there's open entry. The dominant network structure is a decentralized network. Because of the open entry, principals and agents have to be much more wary, have to be much more defensive on their own against the possible misbehavior of those they're dealing with, and that is what enables this multiplicity that we have in markets.
The information hiding that Parnas recommended, that we so value in computer science and that we seek by dividing knowledge, is also the information hiding that creates various hazards. It's worth those hazards, but we should recognize them. These hazards are studied in economics, but with the focus being intentional misbehavior.
A
The
structure
of
cross
bracing
and
the
way
in
which
these
things
support
or
do
not
support
each
other
is
worthy
of
study
and
as
we
design
languages
and
systems
and
tools,
these
compositions
of
compromises
are
also
worthy
of
support.
We
should
have
the
larger
picture
in
mind
as
the
system
we're
trying
to
support.
Thank you for the talk. Even though, somehow, deep down I knew that we are operating at different levels and different layers, the way that you explained to us how these layers are related to each other, and how, combined, they can bring a better effect, was really illuminating. The question that I have is: in your two worlds, you are talking of the objects being one system, but there are many systems, and they leak stuff to each other.
Well, yes, partially in a negative manner, which is that we do interact with agents desiring them to keep secrets. Companies, for instance, want their employees to keep lots of secrets, and we use various means to try to gain confidence that the secrets will be kept.
We do a lot of upfront vetting, which you can consider to be both partially selection and partially vetting. We try to do monitoring to see if the secrets are leaked, but there's a negative result: of all the misbehaviors the agent might engage in that are hard for the principal to know about, leaking secrets is one of the hardest to catch; it's one of the easiest kinds of misbehavior to evade detection by monitoring. So leaning more heavily on the up-front inspection, when you're in an automated system, makes sense.
So one of the reasons the package delivery system works is because the person I give money to has their own life: they want to spend the money on movies and so on, and so there's an actual hard currency that trades hands, and they have an incentive to have more of that currency. That's our assumption. In computational systems, you sort of stopped talking about that part of it when you got to the computational system, because what do we do, give them cycles?
We give them memory and a network connection; they have no real incentives or interests; they don't have a life. So how would you change that? How would you make the computational system have more of a life? One possible way is to feed that back into the static world, right? You could imagine GitHub: every time you use a library, the call gets reflected back on GitHub and somebody's score goes up, and now that agent has an incentive. So tell me more about how to give a life to computational agents.
That would be a decentralized signal to guide objects in making trade-offs that serve the system's overall goals: decentralized resource allocation. In that work, memory space, processor time, and network bandwidth were all allocated through running auctions. We came up with forms of auction that can be run very efficiently, and the idea is that an individual agent sees some prices at the moment for these various resources, and only it knows how it can trade those resources off.
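The flavor of that mechanism can be sketched quickly (a hypothetical Python illustration; the talk names no specific auction form, and the second-price rule here is just one common choice): agents bid for a resource, one wins, and the clearing price becomes the signal other agents see.

```python
# Hypothetical sketch: allocate a resource slice by auction and let the
# clearing price act as a scarcity signal for other agents.

def second_price_auction(bids):
    """bids: {agent_name: bid}. The winner pays the second-highest bid,
    a rule that encourages agents to bid their true value."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1] if len(ranked) > 1 else 0
    return winner, price

# Three agents bid for one slice of processor time.
winner, price = second_price_auction({"indexer": 5, "renderer": 9, "logger": 2})
```

The price that emerges from each round is what would propagate outward, shaping the trade-offs every other agent makes on its next bid.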
By trading those resources off, an agent then affects the prices, which propagate to the trade-offs made by other agents. Now, when I did this work, I fully expected that prices to guide decisions would become part of software systems well before software systems grew to the size they are now. And that's part of why, in revisiting this connection with economics, I've been drawn to say: well, okay, if we de-emphasize the incentives and focus on the rest of it, there's still a lot of useful connection with economics.
A
There's
a
lot
of
useful
things
to
borrow,
and
it's
actually
much
more
informative
of
what
we
actually
do.
Yeah
is
it.
Would
it
still
be
useful
to
import
prices
into
computation?
Yes
to
do
so?
Well,
you
actually
do
need
to
connect
it
to
real
world
prices.
We
didn't
experiments
at
sun
labs
in
the
90s,
where
we
got
clearance
through
their
accounting
system
that
their
internal
accounting
dollars
were
the
ones
that
researchers
were
spending
and
earning
by
running
services
within
the
system.
Clearly it's relevant to agent-oriented programming, if I understand it, though that's not an area that I've looked into deeply. But as I understand it, it's much more oriented around giving the agents goals and guiding the agents through the planning of how to satisfy those goals; would that be a fair characterization? In that case, then, like I said, it's not an area I've investigated, but I think it is relevant to this framework.