blog.mrjonhudson


My Rant About LLM Hardware

The claims that LLM-based hardware companies (Rabbit, Humane, Brilliant Labs, Tab to name a few) are making right now are pretty wild - with 2 or 3 voice commands, you can have a completely autonomous virtual assistant control and manage your life to essentially Make Life Easier. It’s now March 2024, and we’re a few weeks away from shipping for some of these devices, and whilst I’m excited about the value proposition, I do think we’re being sold a dream with them.

I’ll take a step back for a moment and mention why I am excited about these devices:

The value proposition is mega strong. Who doesn’t want a device that can make their lives easier?

Every major shift in operating systems has been because of a new intuitive way of interacting with technology. Think Keyboard + Mouse unlocked the age of information (web 1), touch screens and portable devices unlocked the age of social media (web 2). It’s why I think the proposed web 3 didn’t take off - there was no new magical way to interact with it, therefore no magic was found. Are LLMs actually the web 3?

This is a UX shift, not the hardware/software shift that most people are treating it as. We first and foremost need to understand how people are interacting with the technology, then design the software and hardware around these findings.

I’m not necessarily interested in solving it, but I’m intrigued by what the industry deems AI-to-AI interactions. How does my AI agent interact with your AI agent? How is data shared and stored? Is there a central location of information?

This therefore makes it a really exciting time to be in this space. Change is on the cliff edge of happening, and you could argue that the underlying magic is mature enough now too. However, I see problems in the space that need to be addressed before a clear winner is found.

The privacy and embarrassment of voice interaction. Voice is the obvious way to interact with LLMs when you come from a tech perspective, however it’s embarrassing in public places to talk to a device and receive an answer. Most LLMs now use a text interface, but my biggest gripe with them is the sheer amount of cognitive load you have to take on before realising the ✨magic✨ and thus making it a part of your everyday life.

The form factor hasn’t been cracked yet. The form factor needs to be an extension, almost like a limb or another bit of your brain, and not feel like another device that you need to carry. In other words, is this a solution looking for a problem, or is the underlying utility actually phenomenal enough for the user? We need to find a solution that rides the very thin line between magic and gimmick - the difference is in the usefulness of the utility. Put another way, is the first user using it ‘because it’s cool’ or because it actually gets done what they wanted to get done, better?

Social embarrassment is also a huge factor in whether someone will carry the device - does it improve or weaken one’s social status? Improving it is something Apple and Rolex have done magnificently well.

I also don’t believe that we actually want an experience that is super efficient and implicit.

I think about my girlfriend when she’s organising date nights. Half of the fun with her is exploring different options and reading reviews and menus to find the right place. If you remove this couple of hours, do you remove the joy of the experience?
On the other hand, when meeting a friend for coffee, I don’t really care about where I meet them, as long as the place is good (cheap, nice atmosphere, not too busy, easy to get to for everyone). This would be a magical experience for me if it were available.
The second there is some kind of option assessment, a GUI is optimal.

What a graphic interface does really well is capture User Intention, which is incredibly vague in natural language. Even though Uber suggests a pickup point, the GUI gives the user time to correct their intention, collect more information and then respond to it in real time. For me, the magic of LLMs in UX could be that for the first time, the computer learns how you interact, whereas at the moment we learn how to interact with a computer. We obviously need to consider that the learning curve the computer needs could be too steep, to the point that the magic is not caught soon enough to turn the new user into a lifelong user, but it is possible.

The beauty of the state of smartphones in 2024 is the app store - it’s the millions of use cases we have for them. This is something we either need to port (I think Rabbit’s LAM could be a key player for this) or develop a community dedicated to replicating these on a new device.

Human beings are visual learners. Therefore, is it incredibly detrimental to remove a screen entirely?

You also can’t watch videos, play games, or doom scroll on Twitter/TikTok anymore, so what replaces them?

The 4 things I want

Interact with my friends
Doom scroll
Watch porn
Take a photo

Thank you for reading! If you want to see future content, you can follow me on Twitter or connect with me on LinkedIn.
