Who will build new search engines for new personal AI agents? (Interconnected)

15.23, Wednesday 20 Mar 2024

Short version:

We’re going to end up running personal AI agents to do tasks for us.
The actual issue here is how exactly to make them personalised. Ultimately this is where new search engines come in, as I’ll explain.

For instance you’ll say, hey go book me a place for drinks tonight in Peckham, it needs to have food and not be insanely busy.

And the agent will go away and browse the web, checking that spots are open/not insanely busy/in accordance with my previous preferences, and then come back to me with a shortlist. Option 1, I’ll say, and the agent will book it.

I’m leaving “you’ll say” deliberately vague. It might be via your phone, or your AirPods, or your weird new comms badge, or a novel hardware handheld. You interact somehow.

I’ve been hacking on agents this week and omg I have a lot of opinions haha.

Decent agents have been technically feasible for approx 1 year

Technically, the story above isn’t too hard. Let me summarise how to build something like this…

The definition of an “agent” is an autonomous AI that has access to “tools”. A tool is something like a web browser, or a calculator, or integration with a booking system, anything with an API (a machine interface).

Then you know the way that ChatGPT has a turn-taking interaction, human then AI, human then AI, etc? Agents are different. You give the AI a goal, then you tell it to choose for itself which tool to use to get it closer to its goal…

…and then you run it again, in a loop, automatically, until the AI says that it’s done.

So with our toy example above, the loops might look something like:

The AI “thinks”: to get closer to my goal, I need a list of bars and restaurants in Peckham. The web search tool will do for that – so now it has a list of places, and we loop again
The AI “thinks”: to get closer to my goal, I need to know if each of these places is open and if they have any events tonight. The web search tool will do for that again – so now it has a shorter list of places, and we loop again
The AI “thinks”: to get closer to my goal, I need to present these to Matt and ask which one to book. I can use the “ask the user” tool for that – so now we interact.
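Here’s a minimal sketch of that loop in Python, just to show the shape of it. Everything in it is made up for illustration: the three tools return canned data, the bar names are placeholders, and choose_tool is a hard-coded stand-in for the call to the language model (in a real agent that’s where the model decides).

```python
from typing import Callable

# Hypothetical tools. Each takes the agent's state and returns the updated state.
def web_search_places(state: dict) -> dict:
    # Stand-in for a real web search; the results are invented.
    state["places"] = ["Bar A", "Bar B", "Bar C"]
    return state

def web_search_availability(state: dict) -> dict:
    # Pretend that Bar B turns out to be closed or too busy tonight.
    state["shortlist"] = [p for p in state["places"] if p != "Bar B"]
    return state

def ask_user(state: dict) -> dict:
    # A real agent would pause here and wait for the human to answer.
    state["question"] = f"Which one should I book? {state['shortlist']}"
    state["done"] = True
    return state

TOOLS: dict[str, Callable[[dict], dict]] = {
    "web_search_places": web_search_places,
    "web_search_availability": web_search_availability,
    "ask_user": ask_user,
}

def choose_tool(state: dict) -> str:
    """Stand-in for the LLM: pick the next tool given the state so far."""
    if "places" not in state:
        return "web_search_places"
    if "shortlist" not in state:
        return "web_search_availability"
    return "ask_user"

def run_agent(goal: str, max_steps: int = 10) -> dict:
    state = {"goal": goal, "done": False, "trace": []}
    for _ in range(max_steps):  # cap the loop so it cannot run forever
        if state["done"]:
            break
        tool_name = choose_tool(state)
        state["trace"].append(tool_name)
        state = TOOLS[tool_name](state)
    return state

result = run_agent("book a place for drinks tonight in Peckham")
```

After the run, result["trace"] lists the tools in the order they were used, which matches the three loops above.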

It’s wildly effective.

And totally works! With today’s technology! It’s really simple to build.

You can embellish the basic looping pattern. Agents can retain context between sessions, i.e. the user Matt prefers some types of bars and not others. That’s part of what makes an agent personal.
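One simple way to picture retained context, as a sketch only: keep a list of remembered facts in a file and paste them into the prompt at the start of each session. The AgentMemory class, the file name and the example fact are all hypothetical, and real products may well do something smarter.

```python
import json
import tempfile
from pathlib import Path

class AgentMemory:
    """Hypothetical per-user memory, persisted as a JSON file between sessions."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Load whatever an earlier session saved, if anything.
        self.facts: list[str] = json.loads(path.read_text()) if path.exists() else []

    def remember(self, fact: str) -> None:
        if fact not in self.facts:
            self.facts.append(fact)
            self.path.write_text(json.dumps(self.facts))

    def as_prompt_context(self) -> str:
        # Prepended to the goal so the model sees past preferences.
        return "\n".join(f"- {fact}" for fact in self.facts)

# Demo: the first "session" writes a preference, the second one reads it back.
with tempfile.TemporaryDirectory() as tmp:
    memory_file = Path(tmp) / "memory.json"
    AgentMemory(memory_file).remember("Matt prefers quiet bars that serve food")
    restored = AgentMemory(memory_file).as_prompt_context()
```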

BTW: I know that AI large language models are merely “next-token predictors” based on terrific quantities of matrix math and therefore they don’t THINK. But seeing as I’m content to use the word “memory” for my computer, which itself was controversial terminology back in the day, I will use similar shorthand here.

Where are all the agents today?

I first wrote about AI agents exactly a year ago: The surprising ease and effectiveness of AI in a loop (Mar 2023).

A month later I demo’d a smart home simulator with a problem-solving agent named Lares and won a couple awards. Here’s the Lares vid and here’s a bunch of detail: Unpacking Lares (Apr 2023).

There was a TON of excitement about agents at the time. And then… nothing.

What happened? I mean, people went off and raised money and they’re now busy building, that’s one reason. But what makes agent-based products less low-hanging fruit than, say, gen-AI for marketing copy, or call centre bots? (These are based on prompting and RAG - retrieval-augmented generation - two other fundamental techniques for using LLMs.)

WELL.

Imo (from building Lares back then, and re-building it this week) there are two challenges with agents:

They’re expensive. Like, fulfilling a task can easily consume 100x more resources than a response in ChatGPT. The enabler a year ago was on 1 Mar 2023 when OpenAI shipped the model gpt-3.5-turbo with a 10x price drop. But honestly agents are barely reliable with that model… you need GPT-4-equivalent or above, and that’s gonna cost you.
They’re divergent. If you give an agent even a slightly open-ended task, it can end up doing bizarre things. For instance, I have an agent that’ll tell me the weather, and to perform that task it first asks me for my location. Once I pretended I didn’t know, so it went off and started googling for how I could install an app to give me my lat-long, etc.

…and these challenges combine to make any agent-based products really hard to design.

For example: if I want to book a place for drinks in Peckham tonight, and it turns out that everywhere is busy, as a human I would just choose not to book, or chat with my friends about what to do.

But an AI agent, lacking common sense but being highly autonomous and motivated, might email my friends to move to another evening, find I had a clash, email that clash to cancel it, and so on, loop after loop after loop.

This is entirely plausible! LET ME GIVE YOU A REAL LIFE EXAMPLE:

Agents have started shipping, a year after the original flurry.

Devin is an AI software engineer by a startup called Cognition. This is a smart product move: integrate the AI into the customer’s business by giving it a well-understood job role, and put it in a domain where its knowledge base and activities are highly scoped. Like it can talk to people and suggest code changes, but it’s not going to start messing with the corporate calendar.

Although! Here’s Ethan Mollick trying Devin for himself (X/Twitter):

I asked the Devin AI agent to go on reddit and start a thread where it will take website building requests

It did that, solving numerous problems along the way. It apparently decided to charge for its work.

Divergent!

Mollick’s takeaway: Devin has GPT-4 style limitations on what it can accomplish. I assume that the brains will be upgraded when GPT-5 class models come out, and there will be many other agents on the market soon. A thing to watch for.

And that’s roughly my takeaway too:

agents are a fundamental technical pattern to use with LLMs, as I said, like prompting (gen-AI) and RAG
GPT-3.5-equivalent models aren’t good enough, GPT-4 models are just barely, and GPT-5 probably will be… but GPT-5-level models haven’t been released yet, and will likely be too expensive for most domains. To begin with.

The market has clear line of sight to technical and economic feasibility now, so expect a ton of agents over the coming months.

Hey but what about personal agents? They’re coming too… but have unique challenges

Here’s my tl;dr with personal agents:

agents are now technologically feasible, as are agents that rely on specific data, such as personal data
dealing with personal data is a sensitive product issue – do you really want an AI agent looking through your bank account, given the unpredictable divergence issue?
dealing with personal tools is a way more sensitive product issue – do you really want an AI spending from your bank account?

But, you know, trust is a solvable issue with sane design:

OpenAI is already experimenting with “personal memory” for ChatGPT (e.g. when you say “make a picture for my kid” then it remembers how old your kid is from a previous chat), and that’s the beginnings of the tech.
Then instead of running the AI in the cloud, run it on your phone where it has access to your data without risk of exfiltration…
…and make it so I, the user, get to manually approve all moments of tool use. e.g. I approve the “booking a bar” action, and get to see what data is being transferred.
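The manual approval step could be as small as a gate in front of every tool call. This is a sketch under my own assumptions: ToolCall, run_with_approval and book_bar are hypothetical names, and the approve callback stands in for whatever confirmation dialog the phone would show.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolCall:
    tool: str
    payload: dict  # exactly the data that would be transferred

def run_with_approval(
    call: ToolCall,
    tools: dict[str, Callable[[dict], str]],
    approve: Callable[[ToolCall], bool],
) -> dict:
    # The user sees the tool name and the payload before anything runs.
    if not approve(call):
        return {"status": "declined", "tool": call.tool}
    return {"status": "ok", "result": tools[call.tool](call.payload)}

def book_bar(payload: dict) -> str:
    # Stand-in for a real booking integration.
    return f"Booked {payload['venue']} for {payload['party_size']}"

tools = {"book_bar": book_bar}
call = ToolCall("book_bar", {"venue": "Bar A", "party_size": 4})

# In a real app, approve would show a dialog; here it is hard-coded.
approved = run_with_approval(call, tools, approve=lambda c: True)
declined = run_with_approval(call, tools, approve=lambda c: False)
```

The point of the design is that the tool function is never reached unless the user says yes, so a divergent agent can propose odd actions but cannot carry them out on its own.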

Boom, done.

And, indeed, we’re beginning to see the first personal AI agents. Rabbit r1 is a bright orange hand-held device (previously discussed in my post about the AI hardware landscape) and there we have an agent, right there, which could go out and book a bar for me and my friends tonight.

No, the Rabbit r1 agent doesn’t run privately on-device, but the high level of interest in the device shows cultural anticipation for a future, more highly trusted agent.

But but but. There’s a problem I haven’t discussed.

If my personal AI agent is going to use tools, then which tools?

Here’s what I mean by tools:

When my AI agent is searching for a restaurant, does it use Google Maps or Yelp or Resy or…
When my AI agent is ordering pizza, does it go with Dominos or the cheapest place nearby or the fancy spot on the corner or buy me a pizza oven or…

“Restaurant search” is a tool which will be used by future AI agents in answering a user intent.

But restaurant search tools are not made equal. (The difference is vibe, aka brand for you marketing types.)

How will the agent decide?

One answer to this is: the user doesn’t get to choose how a request is fulfilled.

Steve Messer has a great post about Monetising the Rabbit R1. In short, Rabbit-the-device is a one-off purchase with ongoing opex for Rabbit-the-company. And therefore they need to make up that gap. We can guess how, says Messer:

Transaction fees
Subscription model
Tip your rabbit
Adverts on the free tier
Special offers from other brands
Taking a percentage of revenue

And this feels like one all-too-plausible future. When I book a restaurant, it’s based on the kickback that Rabbit will get (or whoever). Ugh.

But what makes an AI agent personal is 50% in its memory about me, and 50% in how it dispatches its requests.

So a world we might want to shoot for is: the user gets to choose how every request is fulfilled – and now we’re into an interesting challenge! How will we build that?

Like: how exactly will my preferences be recorded? How will they be matched up to one of the many, many available restaurant search providers, say? How can this not be terribly cumbersome?

For an answer, I think we look at BRAND and SEARCH ENGINES.

My AI agent will use 100 signals to choose which tool to use, and this is a Google-shaped problem

Let me make the problem one notch more complex, which is to add this: how do we get to personal AI agents, given that over 4 billion people have smartphones?

Any answer regarding the future of AI agents must also answer the “there from here” question. I refuse to believe in a near-term future where AI agents somehow displace my iPhone, or require me to have another device in my pocket.

Wonderfully, this additional constraint provides a way through the conundrum:

The AI agent chooses to search for a restaurant using The Infatuation rather than Yelp because I have that app installed.

My personal preferences are expressed, not as a questionnaire given to me piecemeal by the agent, but simply by looking at my existing home screen.

Here’s the AI agent future I envisage:

The AI agent runs on my smartphone where it has access to my data
It asks for my approval before using “tools” such as: search, book, purchase, message, etc
Agent tools are implemented as app extensions. i.e. app developers include an extension a bit like a Share Sheet, or widgets, only it’s an on-device agent-accessible API for running tools
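To make the app extension idea concrete, here’s a rough sketch. No such API exists on phones today as far as I know, so AppExtension, OnDeviceToolRegistry and the capability names are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AppExtension:
    """Hypothetical manifest an app ships to expose agent-callable tools."""
    app_name: str
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)

class OnDeviceToolRegistry:
    """Hypothetical registry of extensions from the apps installed on this phone."""

    def __init__(self) -> None:
        self._installed: list[AppExtension] = []

    def install(self, extension: AppExtension) -> None:
        self._installed.append(extension)

    def providers_for(self, capability: str) -> list[str]:
        return [e.app_name for e in self._installed if capability in e.tools]

    def call(self, capability: str, query: str) -> str:
        # The first installed app that offers the capability handles the request.
        for extension in self._installed:
            if capability in extension.tools:
                return extension.tools[capability](query)
        raise LookupError(f"no installed app offers {capability!r}")

registry = OnDeviceToolRegistry()
registry.install(AppExtension(
    app_name="The Infatuation",
    tools={"restaurant_search": lambda q: f"The Infatuation results for {q!r}"},
))

search_result = registry.call("restaurant_search", "drinks in Peckham")
lights_providers = registry.providers_for("control_lights")  # nothing installed for this
```

The restaurant search goes to the app I already have, with no questionnaire. The empty lights_providers list is the case where nothing on the phone can help.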

This collapses the whole “how do I choose what tool to use,” “how are new tools developed,” “how do I trust my AI tools” and “how do I discover new tools” into well-established patterns: I choose a vibe and build trust based on brand; I discover new tools just like I discover new apps today, via apps and search.

Hang on, search?

Why not? If my query is something like: “turn on the lights in this Airbnb” and the AI agent on my phone needs to find an app to control the lights, obviously I won’t have that app already, and so of course it’s going to search for it.

So now we need an AI tool search engine for use by AI agents.

And, to be a great search engine, the tools will be ranked by location, what tools have been used previously at this location, which of several tools are preferred by my friends, and so on.
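As a toy illustration of ranking by signals, here’s a weighted score over three of them. The signal names, the weights and the candidate apps are all invented; a real engine would presumably use many more signals and learn the weights rather than hard-code them.

```python
from dataclasses import dataclass

@dataclass
class ToolCandidate:
    name: str
    installed: bool             # already on my phone
    uses_at_this_location: int  # times this tool was used here before
    friends_using: int          # how many of my friends prefer it

# Illustrative weights only.
WEIGHTS = {"installed": 5.0, "uses_at_this_location": 1.0, "friends_using": 0.5}

def score(tool: ToolCandidate) -> float:
    return (
        WEIGHTS["installed"] * tool.installed
        + WEIGHTS["uses_at_this_location"] * tool.uses_at_this_location
        + WEIGHTS["friends_using"] * tool.friends_using
    )

def rank(candidates: list[ToolCandidate]) -> list[ToolCandidate]:
    return sorted(candidates, key=score, reverse=True)

candidates = [
    ToolCandidate("LightsApp A", installed=False, uses_at_this_location=3, friends_using=0),
    ToolCandidate("LightsApp B", installed=False, uses_at_this_location=0, friends_using=2),
    ToolCandidate("LightsApp C", installed=True, uses_at_this_location=0, friends_using=0),
]
ranked_names = [tool.name for tool in rank(candidates)]
```

With these made-up numbers the installed app scores 5.0, the one used three times at this location scores 3.0, and the one two friends prefer scores 1.0.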

This is exactly what the Google search engine has done for documents for like forever.

We already have search engines for nouns aimed at humans, now we need search engines for verbs aimed at AIs.

There’s a patent for that

Oh haha I forgot to say this is not a new idea.

I want to bring in a project I did with the Android group at Google back in 2016. I can’t talk about most of my consultancy work but I can talk about this specific part of this one project because it resulted in a patent with my name on it:

Shared experiences (WO2018164781A1, filed January 2018).

(I won’t say anything that isn’t explicitly in this patent.)

Download the PDF and look at the very first image. (It’s also the hero image on the project write-up over on my old consultancy site).

You’ll see a Venn diagram with three overlapping circles:

Query: “locate restaurant for dinner”
Handler index: a database of handlers and the goals they can answer (i.e. tools)
Context: location, time, and restaurant-finding app usage by the user and their friends.

In the overlap at the centre: Matching assistants.
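Read literally, the diagram is a set intersection, and you can sketch it as one. To be clear, this is my own toy reading and not the method described in the patent: the index contents, the substring matching and the context set are all assumptions.

```python
# Hypothetical handler index: goal -> handlers (tools) that can answer it.
HANDLER_INDEX: dict[str, set[str]] = {
    "locate restaurant": {"Resy", "Yelp", "The Infatuation"},
    "order pizza": {"Dominos"},
}

def matching_assistants(query: str, context_apps: set[str]) -> set[str]:
    # Query meets handler index: collect handlers whose goal appears in the query.
    able: set[str] = set()
    for goal, handlers in HANDLER_INDEX.items():
        if goal in query:
            able |= handlers
    # Then intersect with context: apps used by me and my friends, here and now.
    return able & context_apps

# Made-up context for illustration.
context = {"Resy", "Dominos", "Waze"}
matches = matching_assistants("locate restaurant for dinner", context)
```

Here only Resy sits in all three circles: it can answer the query, it’s in the index, and it shows up in the context.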

i.e. in 2024 language, this is how an interactive AI agent finds its tools.

There is a lot of detail about the possible signals in the patent text. The fact that an app is “installed” is the strongest signal, but it is not the only one.

Also, a twist! Assistants should be pro-active.

If I’m chatting with a friend about going out for a drink (using WhatsApp, say) an AI should be able to jump in and say it can help.

(You’ll find an illustration of that concept on my project write-up page, also taken from the patent PDF: it’s a conversational user interface showing a “Contact Request” dialog from an AI assistant.)

An installed app/agent may simply join the conversation. An agent with less confidence might metaphorically “knock at the door.”
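A sketch of how that decision might look, assuming the agent has some confidence score between 0 and 1 that it can help. The function name, the outcome labels and both thresholds are hypothetical.

```python
def proactive_action(
    installed: bool,
    confidence: float,
    join_at: float = 0.8,   # illustrative threshold
    knock_at: float = 0.5,  # illustrative threshold
) -> str:
    """Decide how forward an assistant should be in a conversation."""
    if installed or confidence >= join_at:
        return "join_conversation"
    if confidence >= knock_at:
        return "knock_at_door"  # e.g. show a contact request first
    return "stay_silent"

installed_case = proactive_action(installed=True, confidence=0.1)
unsure_case = proactive_action(installed=False, confidence=0.6)
quiet_case = proactive_action(installed=False, confidence=0.2)
```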

So this answers the other challenge with AI agents which is how users discover what they’re useful for.

In the app world, designers deal with feature discovery by building in affordances – visual representations of available functionality. But, as I said in my explorations of human-AI interaction (PartyKit blog): [ChatGPT] has no affordance that points the way at exactly how it can help.

AI agents need to be able to jump in, that’s what I’m saying. Agents, or tools, need to be able to put their hand up and say, hey, pick me!

And this is especially important in the first few years where agent-facing tools aren’t already installed (or approved) on my phone. Discovery will be key.

Ok let me summarise here

I’m speculating ahead several steps here:

We’ll have (semi-)autonomous AI agents – we have the first ones already and they look pretty promising. There will be many more once GPT-5-equiv models are out
We’ll have personal AI agents – again, the first is about to hit the market, and there are trust/product issues but these are solvable
We’ll need personalisation in how agents answer requests – highly plausible, in the same way that you might prefer Waze over Apple Maps for routing, or Resy over Yelp for finding a place to eat
We’ll achieve this in the exact same way that we meet the challenge with phones – user preferences are basically “vibe” and this is expressed as “brand” for an “app” in the phone world. The skeuomorph of that app squircle isn’t going away, that’s my take, the “there from here” challenge is insurmountable otherwise. Plausible?
Although apps might not go anywhere, we’ll need new kinds of search engines for the agents to query – maybe this will be called an index, or a store, but we’ll need some combination of query and signals, for sure
There will be a global index of available tools, and my AI agent will pro-actively offer me new ones drawn from this index, in addition to using ones that I’ve already “installed”.

Despite the long chain of speculation, I kinda feel like this is probably how it’ll play out?

I don’t have a conclusion here other than to draw out the future landscape as I see it.

Someone ought to start on that index of tools for AI agents, with novel query and ranking technologies. That’s a key enabler.

Other than that, oh my goodness there’s a lot to build and a lot to figure out.

If you enjoyed this post, please consider sharing it by email or on social media. Here’s the link. Thanks, —Matt.
