evan's thoughts


Observations (sleep-deprived ramblings) on the first wave of LLM-based products

For the past few months I have spent a lot of time using the first wave of LLM products from OpenAI, Microsoft (disclaimer: my employer, whose views I do not represent here), and Google. My motivation for this is twofold. First, these are cool as hell. Ever since Alexa / the Google Assistant, the dream of the Star Trek computer has seemed within reach, and it’s obvious that LLMs are the next step on the path towards being able to query machines using natural human language. Second, I have ADHD, so the more digestible information is and the closer in time it is to my desire to learn it, the easier it is for me to absorb. The rest I don’t have to justify as much, but I just wanted to jot down some observations I have had and see what the feedback from everyone is.

Additionally: I am writing this on an 11 inch iPad Pro, on airplane wifi, with 3.5 hours of sleep, a grande size Americano from the Starbucks in Newark Terminal B (the shittiest airport terminal in existence now that they renovated Terminal A), and a dream. So, might repeat myself. Might forget how to write a sentence. Who knows.

The closest UX to the OS wins

In the past 8 months I have used (in no particular order) OpenAI Playground, ChatGPT 3.5 Turbo, Google Bard, Bing Chat on the Web, Bing Chat on iPhone, Generative AI on Google Search, Raycast AI, Short Circuit, Petey for iPhone / Apple Watch, ChatGPT plugin for Raycast, Discord Clyde, Notion AI, ChatGPT for iPhone, etc. The one that was the stickiest and that I see myself using on a regular basis is… Raycast AI. Might seem weird. After all, every app besides the ones with Google in the name is actually OpenAI under the hood. Some are even free, and Raycast is making me pay 10 bucks a month, so what’s the deal here? Well, I noticed a few trends in my habits. First, I found myself rarely using these products if they involved more effort than a Google search. Having to open a browser, then a website, then type a query was too much effort to get an unreliable answer, especially when a normal search often surfaced what I was looking for. With Raycast, I don’t even have to deal with that first step! I just hit cmd + tab and type, then tab, and the query is on my screen + in a chat window. That convenience is worth the tradeoff of the model being restricted to GPT 3.5 Turbo with no search functionality. 99% of the time that restriction doesn’t matter, and the times it does, I prefer to search anyways. This made me realize that the race for the dominant model + system will depend on who can get theirs bundled with the iPhone first.

Windows Copilot was revealed at Microsoft Build this week (disclaimer part 2: so was the product I work on, Microsoft Fabric, check it out, it’s very cool) and it’s obvious that this sort of OS-level integration is the next frontier for these models now that they’ve been integrated into search engines. Android will likely get some form of integration with a LaMDA / PaLM based model sometime in the next year or two, same with Chrome / ChromeOS, so the question is what Apple is going to do to keep up. Siri has lagged behind its competition for years, and I think that a future in which Android has the equivalent of Windows Copilot and Siri can’t even answer basic questions on my HomePod is going to be a genuine risk for them. If I had to place a bet, the obvious / safe one is that Apple is going to wait until they can feasibly do this on device, either by training their own LLM (unlikely) or by partnering with / utilizing an open source project as the foundation for it.

Nothing is free (except some things are)

The other thing I noticed is that every single one of the above products not from Google, OpenAI, or Microsoft costs money. This is because OpenAI’s API costs money. Every query you send to Raycast AI, they need to pay for, and since they can’t predict how many queries you’ll send, they need to charge a subscription up front. Some products like Short Circuit and Petey, both of which were ChatGPT wrappers I used before the official iPhone app was released last week, allow you to just give them your OpenAI API key so you can pay for what you actually use. This type of payment was the only way I could use several of these products at the same time, and it made me realize that the cost of API requests is a problem for centralized LLM platforms. A Google employee’s fascinating memo titled “We Have No Moat” reframed the entire way I thought about this new competitive space. My takeaway from reading the memo was that open source, on-device models are going to catch up pretty soon, and if we’re truly going to plug this tech into many different products, I expect future implementations to rely on local / FOSS models a lot more as a way to get around subscription bloat. You can’t compete with free, unless you’re Adobe, in which case, yeah, GIMP sort of sucks and I’m willing to sell my soul to never use it. The point is, people aren’t willing to pay 5 bucks a month for you to meet them where they are for every product on the planet. It’s either gotta be free or included in the price of the product.

They sort of suck!

I don’t think there’s any getting around how much relying on these things as your primary source of information feels like shit. Not because of any ethical concerns, but because the way in which the information is presented is so sterile and generic that I start to go a little crazy after reading too much of it. It’s the mental equivalent of only drinking Soylent forever. After enough time, your body starts to revolt and ask for real food. I also started to feel like I was going mental after a while, because while they don’t hallucinate often, they do it enough that you really can’t trust them for anything serious. I use them for grabbing bash commands I forgot, which was a task I used to use Stack Overflow / Google for. The nice thing about the latter is you could usually use intuition to tell when something you saw was wrong, based on the age of the information, the number of upvotes, etc. Here it’s a total crapshoot and it makes you feel like a paranoid nutcase when something breaks. If you are asking an LLM for information on a subject you are unfamiliar with, you are flying blind. Products with search like ChatGPT Plus don’t even fix this problem, as they can hallucinate just as easily.

My crystal ball predictions

Alright, so it’s time to make my predictions. My predictions / takes will be a simple bulleted list so I do not have to remember how the English language works anymore, because the caffeine is wearing off and I’ll be honest, it’s starting to slip from me.

Predictions:

Okay that’s enough thinking for the day.