AX: The next evolution in UX

This article was originally published on Agent Experience.

Agents serve as the delegates of end users and are first-class citizens of the digital ecosystem. Their delegated work is an extension of the user’s authority and engagement with a digital service, maintaining a mutual exchange of value. Agents thus become a medium for engaging with services, and the experience an agent has with a service determines how well it can deliver on end users’ needs. As UX (user experience) focuses on the complete experience users have with a product or service, AX (agent experience) now sits under this discipline of UX study, design, and optimization needed for UX in the age of AI.

While this article focuses on UX and digital services, AX extends into other consumer journey disciplines such as CX (customer experience) and DX (developer experience), and into other engagement channels such as voice and phone. For example, AX will also be a requirement for DX on developer platforms.

It takes a lot to shake up well-established practices; this article explores the concepts that are driving this evolution. This research is meant to be educational and thought-provoking, offering insights into how UX practitioners and others can participate in this area of AI and agentic experiences - even if you’re not building AI within your organization.

What do we mean by agents?

An agent is a user-delegated system (usually backed by LLMs) that autonomously interacts with digital services to complete tasks on behalf of the user while operating within service-defined boundaries. Agents dynamically interpret requests, gather necessary information, execute actions, and return results, enabling users to engage with digital services more efficiently.

Some examples of where agents are used today:

  • Developers using agents to generate code, provision resources (databases, SaaS, etc.), analyze code, deploy the code on their infrastructure, etc.
  • Support teams using agents to research internal data, forums data, and product specifications to identify ways to solve novel customer issues before grabbing engineering help.
  • Home buyers using agents to find different listings that match their needs and inquire about parts that might not meet what they are looking for.
  • Parents using agents to set up a perfect evening where one of their available babysitters can watch the kids, a restaurant reservation is made at one of their favorite places at the right time, a Lyft has been scheduled, etc.

There are already countless agents that can be leveraged to take care of the bits of work that humans don’t want to do, can’t do, etc. The end user is delegating this work to a system with varying degrees of “agency” to help them be successful at interacting with a product or service.

Agents are user serving

Agents are often miscategorized alongside traditional “bots,” sometimes even as malicious actors. This oversimplification overlooks a critical distinction: agents serve the end users of a digital service. They are not external exploiters but extensions of real users, helping them interact with services as an alternative medium to the users directly using a browser or mobile app.

The rise of AI-driven agents has enabled users to delegate tasks that they previously performed manually. Whether they are developers building with a platform, citizens filing taxes, or customers making purchases, these users remain the same; they are simply engaging through agents instead of direct interactions.

Bots vs. agents

Bots have existed since the early days of computing and now account for nearly 50% of internet traffic. Digital services must decide which bots to support and which to block.

  • Permitted bots operate within service guidelines and usually provide complementary value to the digital service or end user, such as search engine crawlers or content preview bots that enhance user experience.
  • Malicious bots extract value without benefiting the service or its users—engaging in activities like data scraping, spam, and fraud. These actors operate outside the mutual value exchange.

Agents are a distinct class of bot that is neither inherently good nor malicious. Their defining characteristic is that they are dynamic, user-delegated programs with varying levels of “agency”, capable of completing tasks across multiple services. They do this delegated work while maintaining the mutual exchange of value on behalf of the end user - not circumventing it or exploiting it. Once agents complete their task, they deliver the results to the user who authorized the request.

End users and agents provide a mutual value exchange; malicious bots do not.

Crawlers for foundation model data

An important distinction should be made around bots that have caused a lot of noise, legal trouble, and issues in the digital ecosystem. Companies like Anthropic, OpenAI, etc. that build foundation models have also used web scraping bots to crawl the internet and pull information from it. These are not agents, nor are they delegated by end users; they are traditional bots in every sense. Depending on the company’s practices, some see these bots as valuable to digital services while others see them as bad digital citizens. Supporting AX is not an endorsement of the crawler bots that these companies leverage.

Adoption is shifting behaviors

While there are a lot of opinions on AI, its abilities, and its impacts, there is no questioning the fact that people are using these tools. As of February 2025, OpenAI reports 400 million weekly active users, GitHub Copilot has over a million paid subscribers, and so on. The numbers of users, downloads, etc. for these tools are measured in the millions. That alone is sufficient to say end users are changing their habits, and there is no questioning that in 1 year, 5 years, 10 years, these tools will have become embedded in muscle memory and established as defaults for the generation of customers and organizations coming up.

Poor AX makes unhappy users

Improving AX is the process of determining how easily agents can access, understand, and operate within digital environments to achieve user-defined goals. When AX is poor (agents don’t have access, don’t know how to use the service, etc.), users are unable to achieve their goals through agents. Even though alternatives may exist, they have invested time and energy trying to work through a medium without success. This is friction for users who prefer an agent-based medium of working, are dependent on agents, or otherwise leverage an agent for the task. That friction results in an unhappy user and a worse overall experience with the digital service.

When this happens, unhappy users run the risk of causing many negative impacts to the digital service:

  • Immediate lost customer - they decide not to buy/use and won’t put any more effort into it with a given digital service. Agents, where allowed, will also offer or automatically use alternatives to the service.
  • Increased support burden - if the user expects or requires an agent to work with a digital service, they will create tickets, add forum posts, call the support team, etc.
  • Decreased preference from users and agents - where users or agents have a choice, they will use the path of least resistance to achieve their goals to the standards they require. Not only is this a potential loss of the current opportunity, it can also instill a long-term loss as preferences shift.

Agents and users developing preferences for services that support their medium of choice.

AX support is a decision about users

Digital services always retain the choice of whether to support agents interacting as delegates for end users, and there can be various reasons behind that decision. On the surface, this consideration feels like a technical one, or one related to the agent system itself, but it’s not. Regardless of whether a digital service decides to actively provide a good AX or to completely block agents, the question being asked is not “Should this service support agents?” but rather “Should this service support users that use agents?”

If the answer is yes - “Yes, these services will support users that use agents.”

If the answer is no - “No, these services will not support users that use agents.”

If the question isn’t or hasn’t been asked - “We have no idea if we support users that use agents.”

Every way you view the question of AX support, it’s always about the users.

Conclusion: The AX evolution in UX

We’ve established that agents are end-user-delegated systems that maintain a mutual exchange of value, that adoption of agents is already extremely high - becoming a default medium for many, that poor AX has immediate impacts on end user happiness wherever agents are used, and that the decision to support a good AX is itself a question of which users a digital service is interested in serving. UX is therefore impacted by every major facet of AX support, which drives the need for AX to be a top focus for the concerns, opportunities, and optimizations needed for UX in the age of AI.

What will this evolution bring?

User centricity is at the core of UX and AX. Evaluation can then start from the same place as with any other medium: by asking what the user needs are and how we can support them via their preferred medium. With that, most of the tools, patterns, and techniques for assessing, testing, and studying good UX look the same.

Where things are going to look different and new skillsets will be needed:

  • New tools for automated evaluation of AX quality. How do you assess if you’re improving AX in a way that gets you closer to your UX goals and avoids regressing in other areas?
  • Giving up control and reinforcing influence. The AI systems that back agents are non-deterministic, can use different models, and can carry internal context that conflicts with yours. Embrace the shift from controlling every detail to influencing agents through mechanisms for providing context and other capabilities being invented to drive them.
  • There’s no substitute for using the tools that your end users use. Depending on the digital service you’re evaluating, these tools can vary, and so will their ability to use any given service.
  • Areas that aren’t invented yet such as feedback loops for agents, reliable identifications of models and agent sources, and standards around interaction.
  • Context and documentation for agents to effectively navigate and use services are not the same as what we provide humans. Tools like MCP, LLMs.txt, Arazzo, and others will become standard parts of the stack, as human-oriented documentation and support systems are today.

As designers, developers, builders, product owners, business leaders, and others who care about our end users and how they can achieve the best outcomes from our offerings, we invest heavily in UX (or DX, CX, or other user-persona-focused experiences). It’s time to start asking how our digital services are going to support users who use agents to interact with them.

Get involved in the open conversations on agentexperience.ax if you’re interested in contributing or helping invent the future of AX + UX.

Navigating the Future: AX, the Agent Web, and Its Interface

This post was originally posted on Agent Experience.

As we look at the opportunities that great agent experiences (AX) can offer agents, end users, and services, we need to confront some realities about how the status quo needs to change.

The current form of the web is built to support a user experience with the intent that humans directly consume the experiences. Agents working on behalf of humans break this expectation and have different needs and requirements for a successful AX. In order for websites and agents to coexist within a mutually beneficial web, we have to provide a different view of the web that offers optimal AX and user experiences without the negative impacts websites see today. Without these practices, businesses will struggle to retain relevance in this new medium, and websites will continue to carry the bulk of the burden as agents are built around a less-than-optimal AX. As the builders of this future, we can identify a better path for agent-web interaction.

Purposeful views for better experiences

The web as we know it, experienced through web browsers, was built for direct human consumption. In the future, we will also have a web focused on supporting direct agent consumption. The two will live alongside one another as purpose-built mirrors of each other. It will not be a bifurcated, separate web, but it will certainly be a separate “view” of the web with different interfaces for its consumption.

By “agents” I am referring to the systems that are powering the future of accessing and engaging with websites and services in the age of AI: natural language interactions, agentic workflows working on behalf of humans, generative experiences, etc. Beyond what we see with Claude.ai, ChatGPT, Bolt.new, Cursor, etc. today, there will be more varied, powerful, diverse, niche, and purpose-built tools for consuming this content. Up until now, the primary systems developers built for on the web were browsers, a few search engines, and email clients. Now there will be countless agents of many types that will benefit from access to the web in a way that is easier to consume, and this is a great thing for the open web.

The web for humans is visual, engaging, and focused on cues that are relevant for humans to consume directly. It cares about human accessibility, readability, engagement, etc. Websites attempt to maximize those concerns as doing so maximizes the ability to provide/capture value to/from our audiences in a way that aligns with our goals. This web is the point of contact with humans directly.

The web for agents has no concern around visuals and engaging interaction (unless that is the purpose of the information shared). It uses practical verbosity, agent relevant information structures, and is not concerned with the ability to be consumed by humans directly. This agent web is the point of contact for other agentic systems which will directly or indirectly have other points of contact to service human interactions.

The purpose for this agent web is to allow agent systems to have a purpose-built way to access the latest information (in real time in some cases) and engage with web services in a way that mutually benefits websites and the tools themselves.

An intentional approach to supporting these two different types of personas (humans and agents) will provide a better experience for humans, make agents more capable, lower costs for website owners and vendors, and enable more developers to create agents that work in tandem with the web we know today. This change will also be a catalyst for further advancements that have traditionally been very hard to achieve, such as ushering in a better future of accessibility to web content for humans.

The intentional approach of creating a great experience for humans on the web is the focus of “user experience (UX).” The parallel to that is an intentional approach to creating a great experience for agents, known as “agent experience (AX).” While AX practices involve more than this, this view of the web for agents is going to be critical to providing a good AX while removing the outsized burden agents have had on websites.

Issues with the status quo

By using the human/browser focused web, agents are causing many challenges for themselves and others.

Status quo of agent consumption on the web

  • Websites pay a cost for every resource requested and byte transferred. By using the human web, agents drive up costs for resources they will not benefit from. E.g. if a page has 15 resources that make up the human experience, only 1 or 2 of those resources are usually useful to the agent doing its job, yet the website owner pays for all 15 to be delivered. At scale, this costs site owners a lot of money and wastes resources for all parties involved.
  • Websites that have not created experiences, documentation, etc. that are optimal for agent consumption either will not exist in this medium or are doomed to provide a subpar experience through it. This is where AX comes in, and where optimizing for AX means working to deliver an agent web view of the sites and services offered.
  • Agents have to identify, filter, ignore, or otherwise implement mechanisms to take the information given via the human view of the web and prepare it to be agent-ready. This means tools incur more processing costs, slower performance, a higher likelihood of errors/hallucinations, etc. Not processing this data before feeding it into LLMs can cause much higher LLM costs due to tokens consumed by unnecessary content.
  • Observability and analytics are focused on consumption-based metrics that make assumptions about effective delivery, telling you about throughput, the lifespan of users visiting sites, dropoffs, etc. With agents sitting in the middle, the consumption model is entirely different and our metrics need to change. The continued use of the human-oriented view of the web by agents skews the metrics and analytics of these systems, which impacts businesses in many ways, from strategy to costs.
  • Services and platforms typically create two sets of APIs - a private one used by clients they control (the website or app) and another for allowed usage by external integrators. Often the externalized API is a subset of the private API. This isn’t without reason; these APIs often manage important “flows” that the provider intentionally does not want to support others doing. Regardless of the validity of that concern, it means the tasks agents would most likely need to perform require the APIs that are intentionally undocumented or obscured for external use. Zdenek “Z” Nemec’s talk “APIs for AI: Have we Failed?” speaks very effectively about these challenges.
  • Authentication and authorization remain challenging problems, and the fact that agents can asynchronously require access on behalf of users introduces interesting new flows. Today, these are solved one of two ways: injecting an OAuth flow, or having the agent system store credentials on behalf of the user to authenticate as needed. Both have many trade-offs and concerns for the end user experience. Add in the fact that many concurrent agents might be working on behalf of a user, and these authentication flows become even more challenging.

Why is this important to websites?

Continuing to be relevant will require it. In the early web, it became clear that if you weren’t on the web, you were irrelevant. Then a few search engines became the way people found sites, and if you didn’t show up in search results - or at least the top results - you didn’t exist. With the advent of LLMs, agents, etc., we have a new entry point to the web and its solutions. The websites and businesses that continue to show up, stay relevant, and respond to users through this new medium will be the ones optimizing AX and supporting agent consumption of the web.

Becoming or remaining preferred requires a good AX. We’ve all had the experience of finding a website that just didn’t work right, had unexpected friction, or left us feeling lost. Then we find a tool that “just works,” and we never look back at the tools that didn’t meet us where we are. As companies and web creators ensure their sites deliver exceptional AX, they will start outperforming their competitors through this medium, developing a user bias toward their tools. If they don’t, you can expect their competitors will.

By delivering a purpose-built way for agents to consume optimal content through more appropriate delivery mechanisms, websites can reduce the costs of supporting agents consuming content on their sites.

Why can’t the “human” view support the “agent” view?

It’s not impossible, but it seems impractical at best. Websites are already complex, bloated, and nuanced; shipping the modern web in a way that’s accessible, useful, and engaging to humans is hard enough. Trust me, I wish that were not the reality, and I am thankful for modern frameworks like Astro that push for a better state.

In an ideal world, websites would be simple to parse, semantic, and clear about action intent. The reality is very far from this. Even content-only websites are more often than not plagued with massive amounts of ads, imagery, tracking, delayed interactions, etc., such that the HTML on the page hides important things from agents or puts unimportant things in the way, making LLMs and agents stumble. Sites that are considered “web apps” are at least an order of magnitude worse on most measures by their nature.

There’s also an assumption being made here that the content and structure humans need are the same as what agents will benefit from. I believe the status quo of making do with what we have has convinced us that this is sufficient for agents, but we have evidence (to be published in future reports) that shows this is not the case.

Given the reality of what we have today and the needs of agents, forcing different code paths and exceptions into a shared presentation layer for these two personas is asking for significant complexity and fragile systems. Not impossible, but I’m not seeing a graceful path where these two personas’ content should mesh into one layer alone.

That said, the concept of an agent web is not a specific solution; it’s a claim that we will have purpose-built views of the web, encompassing many mechanisms and a range of support delivered to serve this view. The human view of the web will remain vital and will stay as the fallback for any site that’s not supporting an agent-specific view of the web. The need to prioritize improved views of the web for humans will not go away, and this will absolutely benefit the agent view of the web.

An important note: some agent systems have built features that embed or drive browsers directly, giving agents access to web services by interacting with a website through an automated browser. I understand the attempt, and it’s an obvious initial step - anyone who has built integrations over the years has wished they could go about it this way, and it has been possible for a long time. However, I believe this is the wrong way. These features incur the same issues and costs, with added latency and likelihood of failure. Businesses measuring the impact/usability/accessibility of their site with customers will have all new ways for analytics to be skewed, bounce rates will be wilder than ever, and so on. These closed systems will be proxying non-deterministic flows, and website owners will be left to figure out how to support customers that run into issues within these delegated experiences. Frankly, if this were a good idea, we would have used this mechanism instead of building APIs to integrate with other sites.

Aren’t APIs the “agent web”?

Yes and no. APIs represent the mechanisms for systems to talk to one another. So in the sense that agents are systems speaking to other systems (websites), they will be using APIs along with other layers. The agent web is a conceptual view that will emerge as we develop new and creative ways to provide better AX on the web. With that, we should fully expect traditional APIs to come up in the methodologies for surfacing an agent view of the web.

However, with natural language processing (NLP) and LLMs, we can now satisfy queries from agents with natural language and unstructured content, allowing us to reimagine both what it means to work with APIs and the APIs themselves. It’s unlikely that APIs alone will provide sufficient context for agent systems to respond flexibly to the demands of the new wave of agents consuming this content. So while APIs are used to directly interact with and query systems, the agent view of the web will likely serve as a superset of the APIs a site has today, abstractions over established APIs, and so forth.

The agent web’s interface

So what might this look like? We don’t know yet but I have some ideas. Ultimately, I believe we must lean on established patterns to offer a web compatible approach.

A view of an optimal agent interaction and good AX

Progressive enhancements via content negotiation

When an agent requests content from a website, it can lay out the various formats it can accept as it makes those requests. In doing so, it can enumerate the supported optimal versions while still accepting the standard “human” version that would traditionally be sent to a browser. This is in line with an industry-standard practice called content negotiation.

As a hypothetical example, when requesting example.com/latest-info-on-specific-topic, the agent can send an Accept header that includes text/html, application/llm.txt. The website can then decide to deliver the HTML version of the page (usually sent to and rendered by browsers), or it could send application/llm.txt, a view of the same web page that’s optimized for LLM use. The beauty of this approach is that sites that don’t know anything about providing this application/llm.txt format would simply return the HTML as they always have.
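
To make that concrete, here is a minimal TypeScript sketch of agent-side negotiation, assuming the hypothetical application/llm.txt media type above; the HTML-stripping fallback stands in for whatever reduction an agent pipeline does today.

```typescript
// A minimal sketch of agent-side content negotiation, assuming the hypothetical
// application/llm.txt media type from the example above.
async function fetchForAgent(url: string): Promise<string> {
  const response = await fetch(url, {
    headers: {
      // q-values express preference; servers that know nothing about llm.txt
      // simply return text/html as they always have.
      Accept: "application/llm.txt, text/html;q=0.5",
    },
  });

  const contentType = response.headers.get("Content-Type") ?? "";
  const body = await response.text();

  if (contentType.includes("application/llm.txt")) {
    return body; // Already agent-optimized: hand it straight to the model.
  }
  return stripHtmlForAgent(body); // Human view: reduce to agent-relevant text first.
}

// Stand-in for whatever HTML reduction an agent pipeline performs today.
function stripHtmlForAgent(html: string): string {
  return html.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
}
```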

Optimal content types for agent consumption

In addition to the ability to negotiate how optimal content is delivered, the agent web should support many different approaches to reduce the processing needed by agents and websites. Sending HTML is step one (what we have today); sending LLM-ready text data is better; sending already-chunked LLM text is even better; and providing the content as an already vectorized embedding for the language models an agent uses is better still. With this pattern, we can see a trend of being able to support a range of capabilities.

Standards-based caching

Caching is largely a well-understood and solved problem within the browser-based mode of development (no, it’s not perfect). However, with the new wave of agents consuming this new view of the web, it will do the agents, the website owners, and everyone in between a great service to leverage proper handling and tracking of resources for caching purposes. Without it, we run into the problems of out-of-date information and constantly bombarding websites with redundant requests.

Tried and true methodologies such as ETags and Cache-Control headers are the bare minimum that should be supported in these cases to ensure these systems request only the content needed and can validate they still have the latest information without burdening the website.
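
As a rough illustration, an agent-side fetch layer could honor ETags with conditional requests so it only re-downloads content that has actually changed. This is a minimal sketch, not a full HTTP cache:

```typescript
// A minimal sketch of an agent-side cache that revalidates with ETags instead of
// re-downloading unchanged pages. Cache-Control parsing and persistence are omitted.
type CachedEntry = { etag: string; body: string };
const cache = new Map<string, CachedEntry>();

async function fetchWithRevalidation(url: string): Promise<string> {
  const cached = cache.get(url);
  const response = await fetch(url, {
    headers: cached ? { "If-None-Match": cached.etag } : {},
  });

  // 304 Not Modified: the site confirms our copy is current without resending it.
  if (response.status === 304 && cached) {
    return cached.body;
  }

  const body = await response.text();
  const etag = response.headers.get("ETag");
  if (etag) {
    cache.set(url, { etag, body });
  }
  return body;
}
```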

Natural language-based APIs

We will start to see the strict structure and rigidity of APIs become fronted by proxy systems that can take requests in loosely structured natural language and turn them into proper API calls that fit the existing strict formats. This is unlikely to ever replace traditional APIs, but it will certainly open new doors for direct agent usage where the values may or may not match the exact schemas defined by the APIs.

For example, you could imagine that an agent helping to book a hotel could ask an API what’s needed to make a booking; the API responds with schema information, extra context, and examples. The agent could then gather this information from the user and construct a response. On one hand, this might be sufficient context to generate the original strict, schema-based request. On the other hand, the agent could create a request with the needed information in natural language, and the site’s API would either use that request directly or convert the natural language request into the schema and send it along.
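
A sketch of what such a proxy layer could look like is below; the endpoint, schema fields, and values are hypothetical, and the natural-language extraction step is left as a placeholder where an LLM call would sit.

```typescript
// Hypothetical proxy fronting a strict booking API with a natural-language entry
// point. The route, field names, and values are illustrative, not a real service.
interface BookingRequest {
  hotelId: string;
  checkIn: string;  // ISO date, e.g. "2025-06-01"
  checkOut: string; // ISO date
  guests: number;
}

async function extractBookingFields(naturalLanguage: string): Promise<BookingRequest> {
  // Placeholder: a real proxy would call an LLM with the API's schema, field
  // descriptions, and examples to fill in these values from the free-form text.
  throw new Error("not implemented: LLM-backed extraction goes here");
}

export async function handleNaturalLanguageBooking(requestText: string): Promise<Response> {
  // e.g. "Book two nights at the Grandview for two adults starting June 1st"
  const structured = await extractBookingFields(requestText);

  // Forward the now schema-conformant payload to the existing strict API.
  return fetch("https://api.example-hotel.com/v1/bookings", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(structured),
  });
}
```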

Delegated interactivity models

The web has long been challenged by the on-demand nature of serving content and updates - meaning you have to actually visit the site, and keep doing so, to get the latest information. WebSockets and server-sent events helped with this, but long-standing connectivity is a burden on both client and server. One of the more viable approaches to mitigating this challenge came out of the world of progressive web apps (PWA). That architecture pattern leverages a file called a service worker that registers events and can run in isolation from the website itself, allowing any client to run the JavaScript to perform actions including prefetching, background data syncing, verifying the latest information, etc. Taking a page from this pattern, we can imagine solving the problems of offline syncing of context; event-driven actions that allow the web app to provide additional context, tool definitions, etc.; and a more direct hook into dynamic user flows that sites can ensure are compatible with their APIs.
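
For reference, this is the kind of service worker the PWA pattern relies on: a small event-driven script that runs apart from any page and can prefetch and serve cached resources. The cached paths are illustrative; an agent-oriented equivalent of this pattern is still to be defined.

```typescript
// sw.ts - a plain service worker showing the pattern referenced above: event-driven
// code that runs in isolation from any page and can prefetch and serve cached content.
// The cached paths are illustrative.
/// <reference lib="webworker" />
declare const self: ServiceWorkerGlobalScope;

self.addEventListener("install", (event) => {
  event.waitUntil(
    caches.open("site-context-v1").then((cache) =>
      // Prefetch resources a client is likely to need next.
      cache.addAll(["/llms.txt", "/docs/getting-started"])
    )
  );
});

self.addEventListener("fetch", (event) => {
  // Serve cached context first and fall back to the network, so repeated
  // lookups don't keep burdening the origin.
  event.respondWith(
    caches.match(event.request).then((hit) => hit ?? fetch(event.request))
  );
});
```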


New problems to solve

By having a separate view of the web, we introduce a few new challenges.

An initial key challenge is trust in agent content consistency. That is, how can an agent trust that it is being given content that matches the content of the human-consumed site (minus the parts that aren’t needed)? Some agents might not care, but most will. To solve this, we can establish a means to cheaply verify consistency with the authority in ways that do not incur negative impacts on the site or provider.

Authentication and authorization techniques will require further improvements now that it is more common to allow agents to act on behalf of users, but only within the constraints allowed. These problems aren’t particularly new, but the options to solve them are non-trivial for most websites. We will have to simplify the adoption of better auth practices and avoid the dreadfully easy but dangerous practice of collecting end user passwords. If we’re enabling a world where countless providers will build agents to take these actions on behalf of users, not solving this problem is asking for the collapse of trust in the agent ecosystem.

Observability will be harder than ever: understanding traffic, usage, ways to optimize, and so on. We will have to rethink how agents work with the web, not just as consumers of the content but as feedback loops as well.

Delegation of brand expression and voice. If there’s a medium that sits between the end customer and the brand, that medium needs to have a means for sites, services, and creators to have a voice and expression through delegation. This will likely mean solving generative UI properly - not just what some are touting with tool calls to predefined visual components.

And so on… the list of improvements to make sites and agents work well together is not a short one.

What if this doesn’t happen?

The web and agents will continue to be at odds with one another. Websites will lump agents into the same category as pesky bots, preventing the website from serving its target audience. Other ways of solving AX will likely emerge, but they will benefit the agent providers more than the website builders and owners. Fragmentation of support will continue, to the detriment of the ecosystem of agents and their access to sites.

At some point, a few companies will invest enough into solving the agent consumption problems (not the problems that impact websites - that’s the website’s problem) that they become the de facto, centralized tools because they can do these things and no one else can (and that’s not a good thing).

Where do we take this from here?

I have strong conviction that this agent view of the web will surface, allowing a new medium to expand where agents and the web work together to feel like magic to customers and all parties benefit. With the rise and proliferation of agents across every business category, tool, etc., we have to work to make sure both work well together. This is an ambitious vision that we as an industry can go after.

There’s already amazing work happening in this space, for example Model Context Protocol, Open Context, LLM.txt, and others. These are pushing the boundaries of what could be possible. With AX becoming an established focus like UX and DX, we will continue to see industry experts come together to solve AX in web architecture in ways that reset the expectations for agents and how the web can meet end customers where they are.

Let’s get to work.

Check out the full post on Agent Experience.

Rethinking Server-Timing As A Critical Monitoring Tool

This post was originally posted on Smashing Magazine.

In the world of HTTP headers, there is one header that I believe deserves more air time, and that is the Server-Timing header. To me, it’s a must-use in any project where real user monitoring (RUM) is being instrumented. To my surprise, web performance monitoring conversations rarely surface Server-Timing, or they cover only a very shallow understanding of its application - despite it having been available for many years.

Part of that is due to the perceived limitation that it’s exclusively for tracking time on the server - it can provide so much more value! Let’s rethink how we can leverage this header. In this piece, we will dive deeper into why Server-Timing headers are so uniquely powerful, work through some practical examples that solve challenging monitoring problems with this header, and provoke some creative inspiration by combining this technique with service workers.

Server-Timing is uniquely powerful, because it is the only HTTP Response header that supports setting free-form values for a specific resource and makes them accessible from a JavaScript Browser API separate from the Request/Response references themselves. This allows resource requests, including the HTML document itself, to be enriched with data during its lifecycle, and that information can be inspected for measuring the attributes of that resource!

The only other header that comes close to this capability is the HTTP Set-Cookie / Cookie pair. Unlike Cookie headers, Server-Timing appears only on the response for a specific resource, whereas cookies are sent on requests and responses for all resources once they’re set and until they expire. Having this data bound to a single resource response is preferable, as it prevents ephemeral data about all responses from becoming ambiguous and avoids contributing to a growing collection of cookies sent with the remaining resources during a page load.
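
As a quick illustration, here is how those values can be read from the browser’s performance timeline; the header contents in the comment are purely illustrative.

```typescript
// A small sketch of reading Server-Timing values in the browser. Assume the server
// responded with something like (names and values are illustrative):
//   Server-Timing: db;dur=42;desc="orders query", cache;desc="hit", region;desc="us-east-1"
const [nav] = performance.getEntriesByType(
  "navigation"
) as PerformanceNavigationTiming[];

for (const { name, duration, description } of nav.serverTiming) {
  console.log(name, duration, description); // e.g. "db 42 orders query"
}

// The same field exists on subresource entries, so any asset or fetch response that
// carried the header can be inspected too (cross-origin resources additionally need
// a Timing-Allow-Origin header).
const resources = performance.getEntriesByType("resource") as PerformanceResourceTiming[];
for (const resource of resources) {
  if (resource.serverTiming.length > 0) {
    console.log(resource.name, resource.serverTiming);
  }
}
```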

Check out the full post on Smashing Magazine.

How InVision delivers unified user experiences with federated code

This post was originally posted on InVision’s Engineering Medium.

Faster experiences, faster teams, better features, oh my!

Over the last few quarters, we at InVision have completed a big evolution of our web architecture that addressed many technical limitations, web performance issues, and slow iteration velocity. This change took us from an origin-delivered multi-page + multi-single-page-app (SPA) architecture to an edge-delivered and federated SPA experience. In doing so, …

  • InVision experiences have seen massive performance improvements (30–60% P95 TTFB improvement, 20–40% P95 initial render improvement, and more).
  • Engineers have a shared framework that propagates improvements as we iterate, improving velocity for all teams; for some teams, we’re orders of magnitude faster at propagating changes.
  • New product capabilities have been unblocked to provide next-level product features.

In this article, we want to share how we went about solving our core challenges, what our solutions look like, the benefits we captured, and what comes next.

Where we started

A key bit of context to have here is that over the last few years, we rebuilt our full product platform from top to bottom. Within that stream of work, we were pushing to move fast. However, not all teams move at the same pace, which naturally resulted in decisions that left us with siloed solutions. We shipped web features that very much looked like the shape of our org chart; if a particular team owned a sub-feature, it had its own standalone app.

This put us in a state where we had 20+ teams building features and mounting their SPAs under unique HTML paths at the origin with little consistency, due to a lack of shared web tech standards. Users felt this disjointed experience when navigating across our features, as they would suffer a full page reload to fetch a new SPA, removing the primary benefit of the SPA model. This is what we mean by an “origin-delivered multi-page + multi-single-page-app (SPA)” architecture; it is the worst of both worlds. This pattern is not uncommon within organizations that take on massive overhauls or look to scale quickly — but we can do better.

Given that context, here are the problems we were facing:

  • Slow performance (time to first byte and initial render)
  • Reduced capabilities due to navigations between SPAs requiring full page reloads
  • Slow velocity from having to build every new feature from the ground up
  • Heavy page weight due to a lack of code sharing

Although the challenges outweighed the benefits, it is still worth calling out that a benefit of this approach was high team autonomy to build, test, and deploy in ways that worked best for them.

The reality was that our new web experience was not just suffering from poor web architecture, but also from being a collection of isolated designs which did not fit well together. At InVision, we obsess over providing great experiences to users so this was very important for us to change.

The next few sections describe how we identified these issues, created an architectural plan, and followed it through to turn this reality around.

The new architecture

We took a critical look at every facet of this problem and the path to deliver a better user experience. Although our list was exhaustive, we will not cover every option that we considered here. Instead, we will call out those that were important to our organization.

Team Autonomy: As a fully remote organization, we benefit greatly from team autonomy, but complete autonomy was a mistake. We had to find the right balance, one that allowed teams to focus on providing their unique value.

Iterative: We were already in the midst of a platform and product rebuild and did not want to stop to rewrite what was already available. The new architecture was to enable gradual adoption and support iteration.

Shared platform: Subscribing to the “build once, use many” philosophy, shared problems like HTML delivery, CDN, caching, etc. should be solved once. Any technology that spanned multiple features should be added to the shared foundational layers and not repeatedly added to individual features. This is also how we established best practices by default.

Given our current architecture, the above core considerations, and the mountain of research that we conducted, we decided to build a true Single Page App that composed our experiences using a feature federation strategy. Our tenets were to provide our teams with autonomy within better boundaries, and to deliver a performant, unified, and seamless user navigation experience.

What that looks like (very high-level):

Layers of InVision Web Arch — top layer: data and user assets, second layer: features, third layer: App Shell, fourth layer: UI Gateway and Global Static Pipeline

Each layer maintains a separation of concerns and does not worry about the layer above.

  • Consistent Web Artifacts (GSP). Every feature leverages a shared build process to enforce common rules and to publish the immutable artifacts and assets to our CDN layer called the Global Static Pipeline (GSP).
  • Each team builds, tests, and deploys its features via the GSP.
  • The build step generates a manifest that describes a feature. Most importantly, it identifies the critical path files needed to load a feature, i.e. the main JS code and the initial CSS. This manifest is used throughout the other layers.
  • The deployment step aggregates these manifests per environment and promotes them.
  • Serving HTML (UI Gateway). All of our HTML is delivered via globally distributed Cloudflare Workers that we call the UI Gateway. This gateway is in charge of collating the routes for each feature (aka a “feature configuration”) and mapping them to the manifest that was generated previously in the GSP. Once built, the gateway delivers the HTML to the browser which launches the next step.
  • Rendering Features (App Shell). Using the feature configurations that detail route/navigation ownership and the manifests that detail how to load any of the features, our App Shell JavaScript client library is in charge of managing navigation across features and leveraging the details from the manifest to load the features.
  • Features Experiences (Client Features). Our teams build features that only worry about the experience they need to deliver. They do not worry about how the experience reaches users. They focus on loading their app-specific data and presenting users with their assets and experiences.

Our teams develop their piece of the overall web experience, deploy it when they are ready, and those changes are delivered to users alongside the other features owned/deployed by other teams. The architecture has a geo-distributed server component that aggregates our feature configs and manifests and delivers them to users as HTML. That HTML starts our App Shell client library which mounts the current feature and watches for URL changes as users navigate across different features. As users navigate, the App Shell determines which features should be mounted or unmounted and provides a smooth transition experience between them.

This new architecture allows us to lean into developer autonomy with better boundaries, supporting team focus and innovation on their distinct feature experiences — not the common layers that are already solved problems. We pushed the generation of experiences to the edge, leveraging Cloudflare Workers to provide fast delivery of the initial HTML. Then, with App Shell managing feature lifecycles, users do not experience full page reloads after their first landing. On the same note, by maintaining the same page memory across features, we can share resources across feature boundaries, which allows us to provide a more connected experience.

Diving Deeper

Here is how all of the technical components come together.

InVision build architecture diagram

Feature Setup

This is the “step 0” part of registering a federated feature within the architecture. The UI Gateway and App Shell code repositories maintain a list of features and some lightweight configurations about those features. This configuration describes critical information like the name of the feature, the CDN namespace where we can find its manifest (described below), and what routes the feature owns. This information allows the UI Gateway and App Shell to know what to do or where to go to get the information it needs to serve a request.
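
As a rough illustration, a feature configuration could look something like the following; the field names are illustrative rather than the exact schema.

```typescript
// Illustrative shape of a feature configuration; the field names are assumptions,
// not the exact schema described in the post.
interface FeatureConfig {
  name: string;         // unique feature identifier
  cdnNamespace: string; // where the GSP publishes this feature's manifest
  routes: string[];     // URL patterns this feature owns
}

const featureConfigs: FeatureConfig[] = [
  { name: "boards", cdnNamespace: "gsp/boards", routes: ["/boards", "/boards/*"] },
  { name: "prototypes", cdnNamespace: "gsp/prototypes", routes: ["/prototypes/*"] },
];
```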

Feature Build

Our teams leverage a shared build step that enforces rules and processes on the build artifacts, the most important of which are enforcing file immutability, standardizing CDN usage, and producing a feature-specific manifest file. This build → CDN pipeline is internally called the Global Static Pipeline (GSP).

By enforcing immutability, we can make safe assumptions downstream such as making all static assets cacheable for very long periods of time and provide safe invalidation at the CDN or Service Worker layers. This shared build step also provides a foundation for us to improve our build tooling across many features in a single location. For example, we rolled out page weight budgets, branch-based synthetic perf testing, and the generation of branch preview links all in one place.

The manifest is a very simple JSON blob that annotates key runtime information — most important is the criticalPathFiles field, which is the basis for our feature federation. It represents the HTML, JS, and CSS files that need to be loaded, in the order they are needed, to ensure dependencies are met. Because of this, the App Shell or any other program can take a feature manifest and have the context for how to load the feature correctly.
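
For a sense of shape, a manifest could look roughly like this; only criticalPathFiles is described above, and the remaining fields and all values are illustrative.

```typescript
// An illustrative manifest; only criticalPathFiles comes from the description above,
// the other fields and all values are assumptions.
interface FeatureManifest {
  feature: string;
  version: string;             // immutable build identifier, e.g. a commit SHA
  criticalPathFiles: string[]; // JS/CSS (and HTML) needed to boot the feature, in order
}

const boardsManifest: FeatureManifest = {
  feature: "boards",
  version: "4f2a9c1",
  criticalPathFiles: [
    "https://cdn.example.com/gsp/boards/4f2a9c1/runtime.js",
    "https://cdn.example.com/gsp/boards/4f2a9c1/main.js",
    "https://cdn.example.com/gsp/boards/4f2a9c1/main.css",
  ],
};
```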

Feature Deployment

Separate from this specific architecture, our infrastructure team provides generic deployment tools and a release pipeline. This pipeline promotes resources to different tiers like testing, preview, multi-tenant, etc. We integrated a new interface behind this deployment pipeline that validates a feature’s readiness to deploy, then promotes the manifest generated in their build phase to the appropriate tier. All features have validated manifests that are promoted to the tier for which they are deploying.

When our infrastructure initiates a deployment, it completes the promotion of the manifest to the next tier and then triggers the UI Gateway to synchronize all of the feature manifests for all of the tiers. That synchronization simply takes all of the configured features, looks up their manifests for each tier based on a standardized naming convention and CDN namespace, and stores them in fast-access locations. This is the federation piece — features are built and deployed separately, then we synchronize the server and client components to have the latest information for that tier.

Feature Delivery

We centralized the ownership of the HTML to geographically distributed Cloudflare Workers that we call the UI Gateway. The two most important jobs of the UI Gateway are to aggregate the manifests for each feature and to provide the HTML to users.

For manifest aggregation, the UI Gateway has several layers of redundancy and optimization to make sure all features are up to date for each tier. As mentioned in the Feature Deployment section, feature manifest synchronization is triggered asynchronously after every release. But if the UI Gateway finds itself missing manifests, or they are invalid for some reason, it takes the feature configurations and looks up the manifests directly from the CDN namespace + tier for each feature. Leveraging the asynchronously synchronized manifest collection is extremely fast (sub-millisecond), but these redundancies are in place to make sure we can always load manifests on demand in anomalous situations.

The base HTML includes our App Shell client library, the feature configurations (the routes they own, names, etc.), and the manifest for each feature. With these, the App Shell can completely manage the lifecycle of all of our features as the user navigates across the product. With this system on Cloudflare Workers, we saw a 30–60% improvement in time to first byte metrics compared to when individual features provided their own HTML delivery at the origin.

Because deployments are immutable and delivery is managed by the manifest promotion, our deployment times are extremely fast.
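
As a heavily simplified sketch of the idea (the helper functions, tier name, and HTML shape are placeholders, and the real gateway’s synchronization and redundancy logic is omitted), a gateway worker could look something like this:

```typescript
// A heavily simplified sketch of a UI Gateway worker. Manifest synchronization,
// redundancy, and caching are omitted; helper functions and names are illustrative.
interface FeatureConfig { name: string; routes: string[]; cdnNamespace: string; }
interface FeatureManifest { criticalPathFiles: string[]; }

async function loadFeatureConfigs(): Promise<FeatureConfig[]> {
  // Placeholder: in practice these live with the gateway's own configuration.
  return [{ name: "boards", routes: ["/boards/*"], cdnNamespace: "gsp/boards" }];
}

async function loadManifests(tier: string, configs: FeatureConfig[]): Promise<Record<string, FeatureManifest>> {
  // Placeholder: normally read from the synchronized fast-access store,
  // falling back to a direct CDN lookup per feature when missing.
  return Object.fromEntries(configs.map((c) => [c.name, { criticalPathFiles: [] }]));
}

export default {
  async fetch(_request: Request): Promise<Response> {
    const featureConfigs = await loadFeatureConfigs();
    const manifests = await loadManifests("multi-tenant", featureConfigs);

    // Embed configs + manifests so the App Shell can manage feature lifecycles client-side.
    const html = `<!doctype html>
<html>
  <head>
    <script>window.__FEATURES__ = ${JSON.stringify({ featureConfigs, manifests })};</script>
    <script src="https://cdn.example.com/app-shell/app-shell.js" defer></script>
  </head>
  <body><div id="root"></div></body>
</html>`;

    return new Response(html, { headers: { "Content-Type": "text/html; charset=utf-8" } });
  },
};
```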

Feature Orchestration

App Shell is a client library orchestrator in charge of common client needs and the management of feature-to-feature navigation and interactions. Once the HTML is delivered, App Shell bootstraps itself, determines the current feature to mount, puts the feature’s critical path files on the page, and asks the feature to run its initialization process. As users navigate across different pages, the App Shell detects navigations to routes owned by other features, instructs the current feature to run its unmount process, and then mounts the next one (loading the next feature’s critical files if they are not already there). This process repeats with each navigation away from a page.

The App Shell, in essence, ties features together to achieve the Single Page App user experience. Because it manages feature life cycles, it can also provide shared resources such as libraries, common sub-features, etc. which reduces page weight and memory consumption for common tools.
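
To give a feel for the orchestration loop described above, here is a rough sketch; the real App Shell handles manifests, shared resources, and error states, and every name below is illustrative.

```typescript
// A rough sketch of the App Shell's navigation loop. Manifest loading, shared
// resources, and error handling are omitted; all names are illustrative.
interface Feature {
  name: string;
  routes: RegExp[];            // routes the feature owns
  criticalPathFiles: string[]; // from its manifest, in load order
}

interface MountedFeature {
  name: string;
  unmount: () => Promise<void>;
}

const features: Feature[] = []; // built from the configs + manifests embedded in the HTML
let current: MountedFeature | null = null;

async function onNavigate(pathname: string): Promise<void> {
  const next = features.find((f) => f.routes.some((r) => r.test(pathname)));
  if (!next || current?.name === next.name) return; // unknown route or same owner

  await current?.unmount();                        // let the old feature clean up
  injectCriticalPathFiles(next.criticalPathFiles); // add its JS/CSS if not already present
  current = await mountFeature(next);              // ask the feature to initialize itself
}

function injectCriticalPathFiles(files: string[]): void {
  for (const url of files) {
    if (document.querySelector(`script[src="${url}"], link[href="${url}"]`)) continue;
    if (url.endsWith(".css")) {
      const link = document.createElement("link");
      link.rel = "stylesheet";
      link.href = url;
      document.head.appendChild(link);
    } else {
      const script = document.createElement("script");
      script.src = url;
      script.async = false; // preserve execution order across critical files
      document.head.appendChild(script);
    }
  }
}

async function mountFeature(feature: Feature): Promise<MountedFeature> {
  // Placeholder: call the feature's registered init function and keep its unmount hook.
  return { name: feature.name, unmount: async () => {} };
}

// Re-evaluate route ownership whenever the URL changes within the single page.
window.addEventListener("popstate", () => onNavigate(location.pathname));
```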

What about…

Memory Consumption

As we continue to load more and more features on a page, a valid concern pops up around memory consumption. In theory, this is a very real issue but, in reality, our users generally don’t use our product in that way. They generally have a “job to be done”; they use our product to achieve that job and leave. The only segment of the population that visits every feature of our product within one page load is the people building our products. To ensure this does not bite us, we introduce a full page load after a set number of feature-to-feature navigations. This number is low enough to avoid memory aggregation concerns but large enough that 99% of users have a consistently seamless experience.

Updating Features

If we no longer have full page reloads, how do we ensure users are using the latest features? Similar to the answer in the section titled Memory Consumption, we add logic to trigger a full page load after so many navigations (which triggers a fetch of the latest manifest). This has been successful enough that we haven’t had to add auto-update detection. But auto-update is another approach we have on the roadmap.

Webpack Federated Modules

Webpack Federated Modules were in their very early days when we built our new architecture (this post reflects on our work after complete propagation of the architecture plus iterations). Since then, the technology has come a long way toward maturity, and it serves as validation to see that its approach is very similar to our federation approach. The basic principles are the same, but with our architecture we have the ability to integrate more deeply up and down the stack to improve the performance of federating features in ways we cannot achieve with Webpack Federated Modules. That said, we will continue supporting that initiative and keep an eye on opportunities where it makes sense to leverage it. The work they are doing to provide module federation on the server is especially interesting.

Authenticating HTML Delivery

When delivering HTML from the edge, when do you authenticate the user? Typically, the pattern web servers follow is to receive an HTML request, check for a user session, and redirect to the login page if the user is unauthenticated. This verification step usually takes place at an origin server and is not free from a performance perspective. This check at the origin is common even for public sites. Furthermore, web apps that dynamically load data via API requests validate the user session again after this initial HTML auth check.

When reviewing that full flow, it is apparent that there are redundant auth checks and that this flow is optimized for an unauthenticated user redirection experience. That accommodation for a less common unauthenticated user scenario incurs a cost for every user. Therefore, we want to optimize for the authenticated user case. To do that, our UI Gateway delivers the HTML optimistically and does not check for authentication, letting a downstream data fetch request made by the features handle detection and redirection if a user is unauthenticated. Our features do this as part of the baseline security measures and do not rely on a previous step to have done this for them. Given our HTML is never sensitive (it contains no user/company data), there is no reason to go all the way to our origin for authentication to download common HTML.
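
A small sketch of that pattern, with an illustrative endpoint and login path, might look like this in a feature’s data-fetching layer:

```typescript
// A small sketch of the "optimistic HTML, authenticate at the data layer" pattern.
// The API endpoint and login path are illustrative.
async function fetchFeatureData<T>(url: string): Promise<T> {
  const response = await fetch(url, { credentials: "include" });

  if (response.status === 401) {
    // The HTML was served without an auth check; the first data request enforces it.
    location.assign(`/login?returnTo=${encodeURIComponent(location.pathname)}`);
    throw new Error("unauthenticated");
  }

  return response.json() as Promise<T>;
}
```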

Feature Development

When leveraging Cloudflare Workers as the delivery mechanism, how do you handle development when a Cloudflare Worker is not available in a “local” environment? The web architecture offers two options to make front-end development fast and easy for developers:

The primary option is to do feature development on special tiers where all of the infrastructure is hosted similarly to production, with Cloudflare Workers available. In this tier, we have a “developer mode” where the UI Gateway provides all of the manifests as it normally would, but it also instructs the App Shell to load the feature’s manifest from a locally hosted location rather than the CDN. This local manifest points to critical path files on the developer’s machine, which can be the output of local bundling and live-reloading steps as feature development happens. Developers only need to run the features they are actively working on locally; everything else is what would be available in production.

The secondary option is to develop in a local environment where all of the infrastructure is recreated but Cloudflare Workers are not in front of every request. In this situation, we want to have the UI Gateway function the same as the former option (providing all manifests and requesting App Shell to fetch the local development manifests). To accomplish this, we provide a local proxy service that handles all incoming requests and calls a local UI Gateway for HTML requests. This UI Gateway logic is a locally hosted Cloudflare Dev server that emulates the Cloudflare Worker runtime.

Deploying tightly coupled feature versions

Generally speaking, it’s good practice to avoid having two features be tightly coupled to the point that they have to go out at the same exact time. But when that has to happen, how do we solve that when our features are federated?

One option is to try to deploy them at the “same time” which almost never works out at scale and some users will run into issues. We have a more deterministic option.

We solve this by letting features “pin” the SHA version of the manifest that they want deployed across all tiers. This is set in the configuration that the UI Gateway owns, so two features can specify that they need to be deployed at these exact versions. Once all tiers have received that version, the pinned versions can be removed and future deploys go back to not having tight couplings.

Where are we taking this next?

We have a lot of great opportunities to leverage this architecture to provide additional capabilities across all of our features

  • The architecture packages up our features in a way that directly fits into the Progressive Web App approach (PWA). We plan to continue executing on that path to provide a PWA this year.
  • Now that we have control over HTML generation at the edge and a lot of observability into our architecture patterns, we can start to make intelligent prefetch decisions to optimize performance for the next user action.
  • We want to bring better development and testing for teams that are not developing in isolation and want to have a shared location for working and testing with multiple teams at the same time.

Wrapping up

This architecture has enabled us to reduce significant technical debt. It provides us with autonomy within healthy bounded contexts. It has sped up individual teams’ feature iteration velocity by an order of magnitude. It has opened the door to new product capabilities. By solving shared problems in shared platform layers, we are iterating on a foundational system that scales improvements to benefit all teams. This new architecture has also massively improved the user experience, both from a performance perspective and from a holistic experience perspective.

We are very fortunate to have some amazing people building this platform. The engineering team is an amazing group of people doing great work and I’m excited to see what we achieve next!