My fantasy AI app is a voice mode travel buddy called Roadtrip

17.22, Friday 28 Feb 2025

File this one under: prototypes I would make if I had the time.

So the product concept starts with something I said the other week, in my write-up about MCP (Anthropic’s new plug-in protocol for chatbots). A big chunk of my AI use is chatting about things like

  • the Hanoverian succession and the emergence of modern parliamentary democracy
  • how long it takes for a running gel to kick in, glycogen capacity and where it is in the body, and so on
  • dumb one-liners about sentient home appliances: My toaster has developed self-awareness, which is concerning because its only purpose is to burn things.
  • oh also dumb one-liners about the head of the Church of England: more like the Archbishop of Can’terbury amirite
  • why The Doors are so legendary given Jim Morrison’s poetry is… middling
  • mushroom mating types and comparisons to apple tree compatibility types
  • etc.

Yes, this relies on what we call “world knowledge” - facts about the world lossily compressed into the model via training data - and yes it is hallucination-prone. But it’s good enough? Certainly comparable to the rubbish that comes back in Google search results nowadays.

And besides, when I look back on my chats with AI, a lot of it is not just asking for information. Learning is a matter of turning over the same facts in your hands again and again until they click into (or update) your existing mental models. The AI has infinite patience as I go through that process with all my dumb questions.


My Roadtrip app idea is simply this:

I want a CarPlay app with AI and voice mode so I can talk like this while I’m driving.

As if there’s somebody sitting shotgun.

Build that for me please?

Bonus points: make it an always-on voice app for my AirPods. Uh Claude, let’s compare and contrast Archimedes and von Neumann as polymath genius weapons scientists.


SIDE NOTE: I feel like the AI research community is really sitting on the “world knowledge” inherent in large language models. Hallucinations are treated as a bug. But hallucinations are a kind of creativity, and this is literally the one new thing in gen-AI. Everything else is merely an acceleration of what we already do with computers. We’ve never before had all the world’s text in one place, with a system able to make far-flung concordances. That was my very first feeling using GPT-3 in 2020.

Like, what do we even do with that?

Well, what I do with that is understand the difference between links golf and American golf in terms of the difference between cricket surfaces in England versus the sub-continent, because large language models are also really good at metaphor translation. But I’m sure it could be used for, dunno, human progress or some such.


Product nuance #1: Character matters! Being engaging matters!

Back to Roadtrip.

So I much prefer Claude by Anthropic to talk with. By which I mean type, as it doesn’t have a conversational voice mode.

Why? Well Claude likes detail and it doesn’t work to agree with me or get flustered when it doesn’t, or constantly say “oh what a great insight.”

But mainly because it has an engagement hack: When Anthropic launched “Claude 3.5 Sonnet (new)” (omg the naming is so bad) they also made it finish each interaction on a question… “Would you like to go further into…”

It just keeps things ticking along.

ChatGPT doesn’t do that.

And yes I’ve used ChatGPT Advanced Voice Mode. I love love love having conversational voice chats with AI.

But the conversations sort of trail off.

Also there’s something about ChatGPT’s character which is like nails on a blackboard for me? It has this kind of supercilious sycophancy that is so agreeable that I find it hard to get it to introduce anything new or anything beyond surface generalisations.

The current ChatGPT (4o) has a character that reminds me a tiny bit of the vicious swipe that Orson Welles took at Woody Allen, behind his back, over lunch in 1983:

O.W.: He has the Chaplin disease. That particular combination of arrogance and timidity sets my teeth on edge.

H.J.: He’s not arrogant; he’s shy.

O.W.: He is arrogant. Like all people with timid personalities, his arrogance is unlimited. Anybody who speaks quietly and shrivels up in company is unbelievably arrogant. He acts shy, but he’s not. He’s scared. He hates himself, and he loves himself, a very tense situation.

Ok it’s not that bad but I couldn’t resist including the quote.

So yeah, the voice mode of ChatGPT but the character of Claude pls.

Although… that wouldn’t be sufficient for my app concept.


Product nuance #2: It needs to be multiple chat bots with voice, not just one

Eventually it’s kinda tiring to have a long conversation with Claude. Ending each statement on a question, although I enjoy it to begin with, makes me feel like Claude has uh “delved” a little too deep into How to Win Friends and Influence People or the like. I feel like I’m being manipulated.

A more natural way to handle engagement is a multi-participant conversation.

I assume you’ve tried NotebookLM by Google. If not: you drop in PDFs and you can type to chat with them (it avoids hallucinations by using the PDFs as ground truth).

The magic feature is “Audio Overview” which creates a 12 minute super interesting podcast about… whatever your content is.

It’s AMAZING.

It is frustratingly difficult to find examples of NotebookLM podcasts online, so here’s a NotebookLM podcast I made about generation ships (12 mins, mp3).

  • I loaded two academic papers: one about multi-generational starship mission architectures. The second about the Wait Equation (as previously discussed).
  • I pressed a button.
  • NotebookLM creates a podcast with two co-hosts, an expert and an interrogator. There are pauses, chuckles, a bit of tension and surprise, and a ton of metaphors.

The metaphors are actually great. I had to search a couple in the original papers to make sure they weren’t already there.

These are papers I know pretty well, and the format and the ways of explaining rotate the ideas into my head in a whole new way.

Now NotebookLM is not perfect. There are still audio and conversational glitches.

But this is what I want in my car!

I want to be talking to Roadtrip as it sits next to me, and have a virtual passenger in the back seat chipping in to keep the conversation going.

It turns out that the folks at NotebookLM are working on something like this: there’s a new Interactive Mode, in which you can phone into the podcast (The Verge).

You can try it. It’s ooooookay. It’s not conversational like ChatGPT Advanced Voice Mode. But you can steer the podcast and that’s pretty neat.


Meta observation: all I’m really talking about is product design

From a tech perspective, nothing I’m asking for is new. I’ve pointed at the building blocks already in this post.

All I’m really asking for is an app that combines these blocks for a specific use case, marketed as such, with super dialled-in fit-for-purpose interaction design.

Qualities like “low friction first time user experience” and “engagement” and the right balance of adding detail vs adding abstractions during the conversation (plus ensuring the chatbots don’t accidentally radicalise the users)… we barely have words to describe what good looks like, let alone design principles to follow or metrics you can track on a dashboard.

These are product goals, not engineering goals.

There must be ten thousand possible apps like Roadtrip. This is the slow part of the technology S-curve, the deployment phase.


Anyway, I would like that app.

Something to chat with when I’m driving or walking on my commute.

The character and engagement of Claude, the conversational voice mode of ChatGPT, and the podcast co-hosts from Google’s NotebookLM.

thx!


Oh a milestone: as of this week I pay more in monthly AI compute cost than I do for TV streaming services.

If you enjoyed this post, please consider sharing it by email or on social media. Here’s the link. Thanks, —Matt.