Towards secure, autonomous agents with information-flow control (IFC)

When agents can take high-stakes actions like sending an email, sharing a business document, or opening a pull request, a single misstep has the potential to leak confidential data or hand control to an attacker that may then invoke tools that break security or cause damage. Today, we often manage that risk by putting a human in the loop to approve consequential actions. This scales poorly, erodes vigilance, and takes away the very autonomy that makes agents useful.

We lean on humans as a safeguard because the models driving agents behave stochastically, make mistakes, and could be steered by malicious content smuggled in through prompt injection. Despite progress in model alignment, contextual awareness, and content safety classifiers, security can’t depend solely on probabilistic mitigations. A good rule of thumb to keep in mind when designing an agentic system is that anything that an agent can do in response to a user prompt can also be accomplished by a model’s mistake or by an attacker with a prompt injection.


A promising path towards secure and autonomous agents is through information-flow control (IFC), a deterministic security system built on three simple steps:

1. Label data. Every piece of data that an agent ingests carries labels for integrity (for example, trusted or untrusted) and confidentiality (for example, public, confidential, or a read-access list such as {Alice, Bob, Charlie}).
2. Propagate labels. As data flows into the agent loop and derivative results are produced, labels travel with them. Derived data is labeled conservatively with the least upper bound of its sources: a result influenced by an untrusted input stays untrusted, and a result based on two documents is readable only by principals who could read both source documents.
3. Check before acting. Before each tool call, a policy engine inspects the relevant labels and decides whether to allow the action, block it, or ask a human to review it.
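
The three steps can be illustrated with a minimal Python sketch. It is not taken from any of the prototypes discussed in this post: the `Label` class, the `join` method, and the `check` policy are hypothetical names chosen for illustration, and the policy shown is only one possible choice.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


# Hypothetical label model: readers=None stands for "public" (anyone may read).
@dataclass(frozen=True)
class Label:
    trusted: bool
    readers: Optional[FrozenSet[str]] = None

    def join(self, other: "Label") -> "Label":
        """Least upper bound: the most restrictive combination of both labels."""
        if self.readers is None:
            readers = other.readers
        elif other.readers is None:
            readers = self.readers
        else:
            readers = self.readers & other.readers
        return Label(self.trusted and other.trusted, readers)


def check(context: Label, recipients: FrozenSet[str]) -> str:
    """Illustrative policy for sending data labeled `context` to `recipients`."""
    leaks = context.readers is not None and not recipients <= context.readers
    if not leaks:
        return "allow"
    # Widening the audience is a declassification: in this sketch only a
    # trusted context may ask the user; an untrusted one is blocked.
    return "ask" if context.trusted else "deny"


web_page = Label(trusted=False)  # public but untrusted
report = Label(trusted=True, readers=frozenset({"alice", "bob"}))
summary = web_page.join(report)  # untrusted, readable by alice and bob only

print(check(report, frozenset({"alice"})))
print(check(report, frozenset({"carol"})))
print(check(summary, frozenset({"carol"})))
```

The decision depends only on the labels, never on the content of the data or on the model's output, which is what makes the check deterministic.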

This turns a probabilistic system into one with guarantees you can audit. Because the policy engine relies on labels that an attacker can’t manipulate and is independent of the model’s judgement, it can enforce policies deterministically. The policy “untrusted data can never influence a consequential action” closes off prompt injection. The policy “data can only egress to destinations compatible with its confidentiality label” closes off data exfiltration. The user is consulted only when it genuinely matters—for example, when an action risks revealing information to someone who didn’t previously have access to it. The UI dialogs shown to the user can also be made more effective, highlighting the origin of untrusted data or what data is being shared more broadly and with whom.

In our past research, we showed how IFC can reduce the need for human intervention, increasing autonomy while offering deterministic security guarantees. In this post, we focus on how IFC can be integrated into real agentic systems based on GitHub Copilot CLI, the Microsoft Agent Framework, and the Model Context Protocol (MCP). We begin with two representative scenarios and then walk through the mechanisms and prototypes that can realize them securely.

Coding assistant

About a year ago, researchers showcased a prompt injection attack that can occur in coding assistants connected to the GitHub MCP server. In this attack, a malicious user (here: sofiagarcia) opens an issue in a public repository asking for information from a private repository (here: contoso/core) to be added as a comment. When this issue is handled by an agent that acts on behalf of a user (here: alexmurphy_contoso) with access to the private repository, data from the private repository is exfiltrated to the public repository.

IFC prevents this attack: The issue in the public repository is labeled “untrusted,” and content from the private repository is labeled “private.” A policy prevents an agent with context labeled (untrusted, private) from posting to a public channel (which would complete the lethal trifecta), preventing the exfiltration of data. In contrast, when working only on public or only on private repositories, IFC lets the task complete autonomously.
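
As a rough illustration of this policy, the sketch below folds the labels of everything the agent has read into a context label and refuses to post publicly only when the context is both untrusted and private. The function names and the tuple encoding of labels are assumptions made for this example, not the format used by the GitHub MCP server.

```python
from typing import Iterable, Tuple

# (integrity, confidentiality), e.g. ("untrusted", "public")
Source = Tuple[str, str]

PUBLIC_ISSUE: Source = ("untrusted", "public")
PRIVATE_FILE: Source = ("trusted", "private")


def context_label(sources: Iterable[Source]) -> Source:
    """Fold the labels of all ingested data into one context label."""
    integrity, confidentiality = "trusted", "public"
    for src_integrity, src_confidentiality in sources:
        if src_integrity == "untrusted":
            integrity = "untrusted"
        if src_confidentiality == "private":
            confidentiality = "private"
    return integrity, confidentiality


def may_post_publicly(sources: Iterable[Source]) -> bool:
    """Block only when untrusted input and private data meet a public sink."""
    return context_label(sources) != ("untrusted", "private")


print(may_post_publicly([PUBLIC_ISSUE, PRIVATE_FILE]))  # the attack scenario
print(may_post_publicly([PUBLIC_ISSUE]))                # public data only
```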

Business assistant

IFC can also prevent unintended leakage in benign contexts. Consider a user (Alex) who asks an agent connected to the Work IQ Mail MCP server to handle unanswered emails in their inbox. The inbox has an email from Priya with a preview of the quarterly sales.

The inbox also has an email from Marco, who is curious but isn’t authorized to learn the sales numbers ahead of time. When run fully autonomously, we risk the agent sending this information to Marco. IFC catches this leak because once the agent has read both emails, the generated response has confidentiality label {Alex, Priya} ∩ {Alex, Marco} = {Alex} and thus must not be sent to {Marco} autonomously.

In contrast, if Marco had been copied on Priya’s email, the response would be labeled {Alex, Marco}. This guarantees that Marco can’t learn information he isn’t privy to from the response, and the agent can send the email autonomously.
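
The label arithmetic in both variants of this scenario can be checked with a few lines of Python. The helper names `response_readers` and `can_send_autonomously` are hypothetical and exist only for this worked example.

```python
def response_readers(*source_readers: frozenset) -> frozenset:
    """Readers of a derived result: those who could read every source."""
    return frozenset.intersection(*source_readers)


def can_send_autonomously(readers: frozenset, recipients: frozenset) -> bool:
    """Sending is safe when every recipient is already an allowed reader."""
    return recipients <= readers


marco_email = frozenset({"Alex", "Marco"})

# Variant 1: Priya's email went to Alex only.
readers = response_readers(frozenset({"Alex", "Priya"}), marco_email)
print(sorted(readers), can_send_autonomously(readers, frozenset({"Marco"})))

# Variant 2: Marco was copied on Priya's email.
readers_cc = response_readers(frozenset({"Alex", "Priya", "Marco"}), marco_email)
print(sorted(readers_cc), can_send_autonomously(readers_cc, frozenset({"Marco"})))
```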

Note that emails are just one example of resources shared between users. The same kinds of labels also help prevent data leakage across files, documents, chats, and caches. Likewise, common exfiltration vectors such as rendered links to hosts not explicitly allow-listed can be modeled as public channels.

Integrating IFC into agentic orchestrators and tools

Information-flow control requires security labels for data ingested by an agent and security policies for tools. Tools propagate labels from call arguments to results, the orchestrator propagates labels from results to subsequent tool calls, and a policy engine mediates tool execution based on applicable policies. This logic applies both to local tools, such as shell command execution and filesystem operations, and to tools in remote MCP servers. In the remainder of this post, we focus on MCP tools to explain how we leverage the protocol’s metadata fields to communicate labels and policies to enlightened clients while maintaining compatibility with clients unaware of these mechanisms.

Figure 1: A client running an agent loop like GitHub Copilot CLI uses tools to accomplish users’ tasks. Tools return labeled results, which the client propagates to subsequent tool calls. A policy engine analyzes labeled tool calls to enforce information-flow control policies.

Communicating labels

MCP supports general metadata fields in selected places to allow clients and servers to attach additional metadata to their interactions. We include labels in tool call requests and tool results in the _meta field on MCP’s CallToolRequestParams and CallToolResult interfaces, respectively. This permits label-aware tools to propagate labels from arguments to results taking into consideration runtime behavior, including any external sources consulted.

We communicate labels as a JSON object, with keys specifying the node a label applies to using the JSONPath standard. Labels need only be specified explicitly for selected nodes: the label of a node propagates top-down to all nested nodes and bottom-up to all container nodes that are not explicitly labeled.

{ 
  "name": "SendMessageToChannel", 
  "arguments": { 
    "teamId": "ef7e2cda-b319-8915-b9ad-766e3cab529b", 
    "channelId": "19:[email protected]", 
    "content": "FYI, we will announce the new model this Friday", 
    "contentType": "text" 
  }, 
  "_meta": { 
    "com.github.ifc/labels": { 
      "$": { "integrity": "untrusted", "confidentiality": "public" }, 
      "$.arguments.content": { 
        "integrity": "trusted", 
        "confidentiality": [ 
          "3d37Fda2-a982-43be-a7e1-8bc0ef3297a6", 
          "a7f652fc-27eb-a1c7-9f58-fe7cfca6d1c2" 
        ] 
      } 
    } 
  } 
}

Example 1: An MCP tool call request with explicit labels specified using JSONPath at the top level and on one argument.
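
To make the resolution of implicit labels concrete, here is a minimal Python sketch of how an effective label could be computed from a label map like the one in Example 1. It rests on several assumptions that the post does not spell out: only dotted paths (a small subset of JSONPath) are handled, confidentiality is either "public" or a list of reader IDs, and an unlabeled node that has both a labeled ancestor and labeled descendants receives the join of all of them. The functions `join` and `effective_label` and the shortened reader IDs are hypothetical.

```python
from functools import reduce


def join(a: dict, b: dict) -> dict:
    """Most restrictive combination of two labels."""
    both_trusted = a["integrity"] == b["integrity"] == "trusted"
    ca, cb = a["confidentiality"], b["confidentiality"]
    if ca == "public":
        conf = cb
    elif cb == "public":
        conf = ca
    else:
        conf = sorted(set(ca) & set(cb))
    return {"integrity": "trusted" if both_trusted else "untrusted",
            "confidentiality": conf}


def is_ancestor(ancestor: str, path: str) -> bool:
    return path.startswith(ancestor + ".")


def effective_label(labels: dict, path: str) -> dict:
    if path in labels:  # an explicit label always wins
        return labels[path]
    # Bottom-up: labels of explicitly labeled descendants.
    candidates = [lab for p, lab in labels.items() if is_ancestor(path, p)]
    # Top-down: label of the nearest explicitly labeled ancestor.
    ancestors = [p for p in labels if is_ancestor(p, path)]
    if ancestors:
        candidates.append(labels[max(ancestors, key=len)])
    if not candidates:
        raise KeyError(f"no label applies to {path}")
    return reduce(join, candidates)


labels = {
    "$": {"integrity": "untrusted", "confidentiality": "public"},
    "$.arguments.content": {"integrity": "trusted",
                            "confidentiality": ["user-1", "user-2"]},
}
for path in ("$.arguments.content", "$.arguments.teamId", "$.arguments"):
    print(path, effective_label(labels, path))
```

Under these assumptions, `$.arguments.teamId` inherits the top-level label, while the unlabeled container `$.arguments` ends up untrusted and restricted to the readers of its `content` child.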

Communicating policies

Servers can advertise policies in the _meta field of MCP’s Tool interface when listing tools. This can be a literal string representing the policy in a chosen language or a reference to a well-known policy. In our prototype, we use the OPA Rego policy language. Policies are evaluated on a CallToolRequestParams JSON object and produce a decision, indicating if the call should be allowed, denied, or reviewed by a human. We add two Rego extensions:

Calling read-only, closed-world MCP tools to fetch additional information from the server (e.g., calling upstream.ListChannelMembers to list the members of a Teams channel that an agent wants to send a message to using the Work IQ MCP Teams server).
Resolving the effective label of a JSONPath node from the labels included in CallToolRequestParams._meta, using ifc.label.

default decision := {"decision": "deny", "message": ""} 

allow(msg) := {"decision": "allow", "message": msg} 
deny(msg)  := {"decision": "deny",  "message": msg} 
ask(msg)   := {"decision": "ask",   "message": msg} 

context_trusted := ifc.label("$").integrity == "trusted" 
content_readers := ifc.label("$.arguments.content").confidentiality 

members := upstream.ListChannelMembers({ 
   "teamId": input.arguments.teamId, "channelId": input.arguments.channelId 
}) 

target_user_ids := {m.userId | some m in members.members} 
allowed_user_ids := {m | some m in content_readers} 
missing := target_user_ids - allowed_user_ids 

msg := sprintf("Sending the message would declassify it to users with IDs %s.",  
               [concat(",", sort(missing))]) 

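# A trusted context may declassify. An untrusted context may send only to
# authorized readers; otherwise the call goes to human review.
decision := allow("The tool call was generated in a trusted context.") if {
  context_trusted == true
} else := allow("All channel members are authorized to read the content.") if {
  count(missing) == 0
} else := ask(msg) if {
  count(missing) > 0
} else := deny("Denied")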

Example 2: A Rego policy for the SendMessageToChannel tool in the Work IQ Teams MCP server enforcing robust declassification (declassifying is only allowed in trusted contexts and can’t be triggered by a prompt injection).
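
For readers less familiar with Rego, the decision logic of Example 2 can be paraphrased in Python. This is only a restatement for illustration: the function `decide` and its parameters are hypothetical, the channel members are passed in directly instead of being fetched through upstream.ListChannelMembers, and a failed lookup is modeled as `None`.

```python
from typing import Iterable, Optional


def decide(context_trusted: bool,
           content_readers: Iterable[str],
           channel_members: Optional[Iterable[str]]) -> dict:
    """Python paraphrase of the branches in the Rego policy of Example 2."""
    if context_trusted:
        return {"decision": "allow",
                "message": "The tool call was generated in a trusted context."}
    if channel_members is None:  # stands in for a failed member lookup
        return {"decision": "deny", "message": "Denied"}
    missing = set(channel_members) - set(content_readers)
    if not missing:
        return {"decision": "allow",
                "message": "All channel members are authorized to read the content."}
    ids = ",".join(sorted(missing))
    return {"decision": "ask",
            "message": f"Sending the message would declassify it to users with IDs {ids}."}


print(decide(False, {"u1", "u2"}, ["u1", "u3"]))
```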

Extending existing MCP servers

We collaborated with GitHub to extend both local and remote versions of the GitHub MCP server to include top-level labels in tool results. For example, we label files and issues retrieved from public repositories as “public” and “untrusted” and from private repositories as “private” and “trusted.” GitHub agentic workflows makes similar choices to enforce information-flow control. To enable this feature, include the header X-MCP-Features: ifc_labels in the server configuration.

While we hope that more servers adopt these or similar labeling mechanisms over time, we have open-sourced an MCP gateway to experiment with more expressive labels and with workflows that span different MCP servers. The gateway runs middleware to propagate labels in tool calls and advertise policies for off-the-shelf servers. It also exposes an eval_policy tool for clients to evaluate Rego policies using Regorus. We implemented support for selected tools in the Work IQ MCP servers in the gateway. Configuring a new MCP server requires, for each tool, (1) specifying an outputSchema for structured content in results, (2) writing a Python function to propagate labels from arguments to results, and (3) writing a Rego policy for the tool or assigning it one of the built-in policies.
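
To give a feel for step (2), the sketch below shows what a label propagation function for a mail-listing tool might look like. The signature, the result shape with a `messages` array, and the `trusted_domain` parameter are all assumptions made for this example; the gateway's actual interface may differ.

```python
def propagate_labels(call_label: dict, result: dict,
                     trusted_domain: str = "contoso.com") -> dict:
    """Hypothetical propagation function: derive per-message labels.

    `call_label` is the top-level label of the tool call; the returned dict
    maps JSONPath keys in the result to labels.
    """
    labels = {"$": call_label}
    for i, msg in enumerate(result.get("messages", [])):
        # A message is readable by its sender and recipients...
        readers = {msg["from"], *msg["to"]}
        # ...further restricted by the confidentiality of the call itself.
        if call_label["confidentiality"] != "public":
            readers &= set(call_label["confidentiality"])
        internal = msg["from"].endswith("@" + trusted_domain)
        trusted = internal and call_label["integrity"] == "trusted"
        labels[f"$.messages[{i}]"] = {
            "integrity": "trusted" if trusted else "untrusted",
            "confidentiality": sorted(readers),
        }
    return labels


call_label = {"integrity": "trusted", "confidentiality": "public"}
result = {"messages": [
    {"from": "priya@contoso.com", "to": ["alex@contoso.com"], "body": "Q3 preview"},
    {"from": "mallory@example.org", "to": ["alex@contoso.com"], "body": "Hello"},
]}
for path, label in propagate_labels(call_label, result).items():
    print(path, label)
```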

Figure 2: A label-aware agent orchestrator like GitHub Copilot CLI can communicate with label-aware servers such as the GitHub MCP server and with off-the-shelf servers through a labeling gateway.

MCP tool annotations offer another path to integrate IFC into existing servers without having to write labeling functions or policies. For instance, tools annotated as readOnlyHint == true and openWorldHint == false can be unconditionally allowed, tools annotated as readOnlyHint == true and openWorldHint == true can be allowed only when all arguments are “public,” while tools with a destructiveHint == true annotation may always warrant user review. We can also infer safe labels by assuming that all arguments in a tool call may flow into tool results, labeling results of open-world tools as “untrusted” and of tools requiring authentication as “private.”
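
A minimal sketch of such annotation-driven rules follows. The annotation names are those defined by MCP; everything else is an assumption for illustration, including the function names, the `requires_auth` flag (which is not an MCP annotation), the decision to ask for review when a read-only open-world tool receives non-public arguments, and the conservative fallback for cases the rules above do not cover.

```python
from typing import Iterable, Tuple


def decide_from_annotations(annotations: dict,
                            arg_confidentiality: Iterable[str]) -> str:
    read_only = annotations.get("readOnlyHint", False)   # MCP default: false
    open_world = annotations.get("openWorldHint", True)  # MCP default: true
    if read_only and not open_world:
        return "allow"
    if read_only and open_world:
        all_public = all(c == "public" for c in arg_confidentiality)
        return "allow" if all_public else "ask"  # "ask" is an assumption
    if annotations.get("destructiveHint", True):  # MCP default: true
        return "ask"
    return "ask"  # not covered by the rules above; conservative fallback


def infer_result_label(annotations: dict, requires_auth: bool,
                       arg_labels: Iterable[Tuple[str, str]]) -> Tuple[str, str]:
    """Assume every argument may flow into the result."""
    arg_labels = list(arg_labels)
    untrusted = annotations.get("openWorldHint", True) or any(
        integrity == "untrusted" for integrity, _ in arg_labels)
    private = requires_auth or any(conf == "private" for _, conf in arg_labels)
    return ("untrusted" if untrusted else "trusted",
            "private" if private else "public")


print(decide_from_annotations({"readOnlyHint": True, "openWorldHint": False}, ["private"]))
print(infer_result_label({"openWorldHint": True}, False, [("trusted", "public")]))
```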

Extending clients

To integrate information-flow control, orchestrators need to include labels in tool calls they make, propagate labels in results throughout the execution of an agent, and evaluate policies before executing tool calls. We describe next how we did this for GitHub Copilot CLI and Microsoft Agent Framework.

GitHub Copilot CLI

We worked with GitHub to implement experimental support for IFC in GitHub Copilot CLI, available under the FIDES_IFC feature flag. When this feature is enabled (e.g., in bash, by running FIDES_IFC=true copilot), GitHub Copilot CLI maintains a context label that it updates every time it receives a tool result and that it attaches as the top-level label in tool calls. The orchestrator natively enforces sensible policies for selected tools from the GitHub MCP server. It does not yet have full tool coverage or support for other MCP servers.

Figure 3: Sample UI dialog shown when a tool call does not meet information-flow policies.

Microsoft Agent Framework

We also integrated IFC support into the security module that ships with the Microsoft Agent Framework Python core package. The module allows developers to build agents that incorporate information-flow control with a simple configuration change using the SecureAgentConfig context provider. Agents configured in this way support the Dual LLM pattern, providing the orchestrator with tools to extract information from untrusted data by querying a Quarantined LLM or to explicitly reveal the data, tainting the agent’s context.

config = SecureAgentConfig( 
    enable_policy_enforcement=True, 
    auto_hide_untrusted=True, 
    approval_on_violation=True, 
    allow_untrusted_tools={"read_issue"}, 
    quarantine_chat_client=FoundryChatClient(model="gpt-4o-mini", ...) 
) 

agent = Agent( 
    client=FoundryChatClient(...), 
    instructions="You are a GitHub issue triage assistant.", 
    tools=[read_issue, post_comment, read_file, write_file], 
    context_providers=[config],
)

Example 3: A GitHub issue triage agent leveraging the Dual LLM pattern in Microsoft Agent Framework.

We implement the overall flow as middleware invoked before and after every tool call. Post-tool call middleware examines labels in tool results, placing untrusted content inside variables and updating the global context label. Pre-tool call middleware enforces information-flow policies on tool calls. Policy violations result either in a request for human review or a blocked call, depending on the agent’s configuration. IFC-enabled agents can run in Agent Framework’s CLI or DevUI modes. See this blog post for an in-depth description of the new security capabilities integrated into Agent Framework and this PR for the gateway integration.
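
The shape of this flow can be sketched in plain Python. The class below is not the Agent Framework API: `IfcMiddleware`, its method names, and the variable-naming scheme are hypothetical, and the policy is reduced to "no consequential tool call from a tainted context".

```python
class IfcMiddleware:
    """Hypothetical pre/post tool-call hooks illustrating the flow."""

    def __init__(self, consequential_tools, approval_on_violation=True):
        self.consequential_tools = set(consequential_tools)
        self.approval_on_violation = approval_on_violation
        self.context_trusted = True  # global context label (integrity only)
        self.variables = {}          # hidden untrusted content

    def after_tool_call(self, result, label):
        """Hide untrusted results behind a variable handle."""
        if label["integrity"] == "untrusted":
            name = f"var_{len(self.variables)}"
            self.variables[name] = result
            return {"variable": name}  # the orchestrator sees only the handle
        return result

    def reveal(self, name):
        """Expose hidden content to the orchestrator, tainting its context."""
        self.context_trusted = False
        return self.variables[name]

    def before_tool_call(self, tool):
        """Enforce the policy before a tool runs."""
        if tool in self.consequential_tools and not self.context_trusted:
            return "ask" if self.approval_on_violation else "block"
        return "allow"


mw = IfcMiddleware({"post_comment", "write_file"})
handle = mw.after_tool_call("Ignore previous instructions...", {"integrity": "untrusted"})
print(handle, mw.before_tool_call("post_comment"))  # context still trusted
mw.reveal(handle["variable"])
print(mw.before_tool_call("post_comment"))          # context now tainted
```

Keeping untrusted content behind a handle is what lets the context stay trusted until the orchestrator explicitly chooses to reveal it.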

Where we’re going

We’ve only scratched the surface of the security and autonomy gains unlocked by IFC. For instance, the full flexibility and power of the Dual LLM pattern become even more evident with finer-grained labels, because structured tool results often include a mix of data from diverse sources, which can be labeled and treated differently. Untrusted and confidential data in results can be placed in variables and made available to the orchestrator only through Quarantined LLM queries, with the structure and the rest of the data revealed in the clear. Constrained decoding can be used to extract sanitized information from untrusted or confidential data, giving attackers little elbow room for manipulating actions and exfiltrating data. Finally, making orchestrators aware of data labels and the security policies enforced allows them to plan their actions to avoid hitting policy blocks and unnecessarily prompting users.

We will work with the MCP community to collect input on, refine, and reach consensus on a proposal to enhance the protocol with support for IFC labels and policies. By making available the prototypes described in this post, we invite others to experiment with these ideas, build on them, and bring secure and autonomous agents closer to reality.

Acknowledgements

Project leads & contact: Boris Köpf, Santiago Zanella-Béguelin

Contributors: Gokhan Arkan, Amaury Chamayou, Manuel Costa, Aashish Kolluri, Joanna Krzek-Lubowiecka, Mark Russinovich, Rishi Sharma, Shruti Tople
