Examining the Tradeoffs Between Privacy and Capabilities of Frontier LLMs
There's a real tension between privacy and capability that feels increasingly pressing as AI gets more and more capable. In the words of my friend Charles, a truly exceptional Rust engineer: "There is a fundamental tension between privacy tech and its capabilities - the more private/secret you want, the more overhead it has."
This fundamental tension has worn down almost everyone except the most adamant privacy maximalists, a small holdout of folks focused on self-hosting, using end-to-end encrypted services like Signal, and running their own servers. When you talk to the average person who uses LLMs on a day-to-day basis, most of them use whatever tech provides the most utility. For many, privacy feels like an antiquated consideration.
For a long time we've been posed a sort of Faustian bargain that goes something like, "in exchange for these advanced capabilities, I want access to x," where x has become increasingly invasive. Maybe it's your browsing history, read access to your emails, or a few facts about yourself in the form of a login and "security questions." Maybe it's access to your location. Maybe it's the ability to peer in and see your "network." For a long time we've operated within this paradigm, one where we essentially pay for abilities online with increasingly personal information.
There are costs associated with using frontier models, and right now millions of people are paying them. We pay multiple times to access these models: with our money, with our time and attention, and with our context and data.
It's interesting how quickly a conversation progresses from the general to the personal when using GPT or Claude. A conversation that starts about my startup often trends towards theological dispositions, thoughts on consciousness, and personally held opinions. Some would argue that this is just my temperament, but as conversational AI tools begin to cross the uncanny valley, I think many people will find themselves willing to be open and vulnerable with LLMs in ways they are not even with their closest friends, partners, or therapists.
I would argue that the biggest benefits we can reap from AI and LLMs come not just from having the most powerful models but also from providing them with the most relevant, personal context. To do that, I face a choice. Either I accept that my data sits on someone else's "data lands," so to speak, giving up self-determination, autonomy, and any control over how that information may be used, or I take another path and opt for local-first models that are significantly less powerful but that I can run on my own machine or server.
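For concreteness, here is a minimal sketch of what the local-first path can look like today, assuming an Ollama server running on your own machine with a model already pulled (the model name is just an example; nothing in the prompt leaves your hardware):

```python
import json
import urllib.request

# Minimal sketch: query a locally hosted model via Ollama's HTTP API,
# assuming a server listening on localhost:11434. The prompt, and any
# personal context embedded in it, never leaves your machine.
def ask_local(prompt: str, model: str = "llama3") -> str:
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

print(ask_local("Summarize my week's journal entries: ..."))
```

The tradeoff is exactly the one described above: a model you can run this way is far weaker than the frontier, but the privacy boundary is your own hardware.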
Right now we pay with time, attention, money, and context, but in a future that includes powerful AI, we could also be paying with our agency and autonomy. I think many people are already aware of the dangers of this level of personalization without control. When we provide increasingly personal details about ourselves, when we are vulnerable, we grant access to deeper levels of intimacy and attention. I would argue that the more openly and directly we communicate with LLMs, the more concerned we should be with just whose data lands we are on. We don't necessarily know the intentions of these corporate entities, and we don't really know what they may do with the data we share with them; after all, their terms of service can change at any time, for any reason.
I believe we will reach a point of parity, where models are sufficiently powerful that running the single most capable one is no longer the most important thing for the average user; instead, having something extremely personalized is what unlocks new and powerful capabilities. I want to be able to connect a real-time stream of my personal data, from health wearables and personal conversations, blogs and meetings, articles I find interesting and documents I write myself, into the context of AI, but I want to do it on my terms.
I'm fearful of the day we see an announcement for something like "sign-in with ChatGPT," adopted by millions of people who haven't considered this argument. Imagine your presence online linked across multiple venues: your LLM usage, your email, different chat interfaces, your social applications. That's a lot of powerful data and insight. Now imagine that data being used to target you and extract the most attention from you, like a Twitter feed that's impossible to look away from, or content so specific to your interests that nothing else feels compelling to look at. If "sign-in with ChatGPT" becomes the new OAuth standard, suddenly your AI assistant knows your Netflix viewing, your Uber routes, your Amazon purchases, and your Slack conversations, all feeding into a single corporate optimization engine designed to shape what you buy, which restaurants you go to, and what shows you watch. That is a pretty bleak outlook.
Now imagine the same thing, all of the capabilities and benefits of centralized user data, but used to empower yourself. Imagine you could do this in a sovereign way: instead of feeding all of that data into a corporate entity, it flows into a data store centered on the user, one that lets you surface new insights and understandings about yourself at a meta-analytical level. Many people are currently working to build this future, and as AI tools become more powerful, I am excited to see what sovereign and local-first solutions become available. It would be great to have a unified stream of data, for example, that I can use however I want with LLMs without it ever leaving my control as an end user. Both outcomes are possible, but the big question remains whether we will continue down the path of least resistance.
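As a purely hypothetical sketch of that user-centric shape, imagine every personal stream landing in one local database that only you control, with context assembled on your own terms before it ever touches a model (all table, source, and function names here are illustrative, not any existing product's API):

```python
import sqlite3

# Hypothetical sketch of a user-owned data store: every stream
# (wearables, notes, meetings) lands in one local database you control.
db = sqlite3.connect("my_data.db")
db.execute(
    "CREATE TABLE IF NOT EXISTS events (ts TEXT, source TEXT, content TEXT)"
)

def ingest(ts: str, source: str, content: str) -> None:
    """Append one record from any personal data stream."""
    db.execute("INSERT INTO events VALUES (?, ?, ?)", (ts, source, content))
    db.commit()

def build_context(sources: list[str], limit: int = 50) -> str:
    """Assemble recent records into prompt context on your terms:
    you choose which sources are exposed, and nothing leaves disk
    unless you hand the result to a model you trust."""
    marks = ",".join("?" for _ in sources)
    rows = db.execute(
        f"SELECT ts, source, content FROM events "
        f"WHERE source IN ({marks}) ORDER BY ts DESC LIMIT ?",
        (*sources, limit),
    ).fetchall()
    return "\n".join(f"[{ts}] {src}: {text}" for ts, src, text in rows)

ingest("2025-01-02T08:00", "wearable", "resting heart rate 52 bpm")
ingest("2025-01-02T09:30", "notes", "drafted blog post on AI and privacy")
print(build_context(["wearable", "notes"]))
```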
A couple of articles I really enjoy explore these concept spaces more thoroughly: d/acc one year later and AI 2027. As we move towards a future with more capabilities, I think we should weigh their costs and also focus on developing our sense of personal agency and autonomy, so that we can flourish, grow, and move in the direction of a world empowered by AI tooling.
There is a cost to spending time developing new primitives and technical architectures for the future: privacy-preserving technologies like Fully Homomorphic Encryption, Multi-Party Computation, and Zero-Knowledge Proofs currently carry a lot of overhead to operate. But I believe that if we can find a balance, we can preserve our self-determination and autonomy over our digital footprints, and move towards a future that maximizes our sense of agency.
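To make that overhead concrete, here is a toy sketch of additively homomorphic encryption in the style of Paillier, with deliberately tiny, insecure parameters chosen for readability. A real deployment uses keys of 2048 bits or more, and doing this big-number arithmetic for every single operation is exactly the overhead being paid for the ability to compute on data the server cannot read:

```python
import math
import random

# Toy Paillier-style additively homomorphic encryption, for illustration
# only. These parameters are hopelessly insecure; real keys are 2048+ bits,
# which is where the computational overhead discussed above comes from.
p, q = 61, 53                 # tiny primes; never use in practice
n = p * q                     # public modulus
n_sq = n * n
g = n + 1                     # standard choice of generator
lam = math.lcm(p - 1, q - 1)  # private key (Carmichael function of n)
mu = pow((pow(g, lam, n_sq) - 1) // n, -1, n)  # decryption helper

def encrypt(m: int) -> int:
    """Encrypt m under the public key (n, g) with fresh randomness r."""
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(c: int) -> int:
    """Recover m using the private key lam."""
    return ((pow(c, lam, n_sq) - 1) // n) * mu % n

# The homomorphic property: multiplying ciphertexts adds plaintexts,
# so a server can sum values it is never able to read.
a, b = encrypt(42), encrypt(100)
assert decrypt((a * b) % n_sq) == 142
```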