On Setting the Record Straight

There have been a lot of inaccuracies reported about me and Stability AI. Today I set the record straight with our team with the message below.

We are going to start being more assertive about what we do and I have some very interesting stories to share about some of the past elements touched upon here.

We have some very interesting things coming up.

This is also very intriguing with regard to the future of media and trust - something that will be essential in the coming years of generative AI, and an area we have a lot to contribute to.


To: Stability AI Team

Subject: Recent Forbes Article

Team,

When we launched Stability AI, we did so with the goal of using AI to help activate humanity’s potential. We saw a chance to enhance scientific research, education and the arts, and positively impact industries in ways that are transformative. That incredible mission has been guiding our team’s work since day one. 

In a race to achieve this, we have put the bulk of our focus into our core work: creating innovative products and supporting research that implements AI in the fields of medicine, healthcare, language, imaging, music and more. In just over 18 months, we have grown from 20 employees to more than 170, while building a young, dynamic company that is a leader in the AI space, both through our core teams' innovation and by supporting others'.

It is not surprising to me that as we have grown in prominence and impact, we have increasingly become the target of attacks and misrepresentations in the media. This latest Forbes article attacking our company is no exception. Despite our team spending weeks and weeks going back and forth with Forbes to correct the record, they have clearly chosen to ignore the truth on many of these issues.

Throughout my career, I have always been quick to praise and attribute the work of collaborators. I have apologized for any errors I have made and have always strived to keep improving. I have categorically denied the spurious claims and characterizations throughout this story, but I want to set the record straight for you, my valued team, here today. These denials were all clearly communicated to Forbes ahead of release.

There are countless false accusations and misrepresentations in this Forbes story which I will clarify here, in particular:

●      My degree from Oxford:

  •  I did not attend my graduation ceremony to pick up my degrees, so I do not technically have a BA or an MA yet. I have paid the £60 to receive them by post and will get them next month. Hopefully I can attend a graduation ceremony in person.

●      My prior hedge fund work:

  • I came in as Co-Chief Investment Officer at Capricorn Long/Short EM in 2016, after it had a bad year in 2015. In my first full year leading portfolio construction, it won the EM Risk Adjusted Hedge Fund of the Year Award for its 2017 performance. I left to focus on other things in 2018, and the fund was later shut down over the earlier poor performance.

●      Our work with the United Nations around COVID-19:

  • We were partnered with these organizations in CAIAC, which ended before the summer of 2022; as discussed in multiple places, this was a catalyst for the focus of Stability AI, as multiple companies did not share their technology or failed to deliver.
  • This is listed as relevant experience, not as a representation of a current role, like the other experiences mentioned.
  • I have discussed my experience with the CAIAC project and frustrations around it, leading to the creation of Stability AI in its current state, in numerous investor meetings and interviews.

●      Stable Diffusion:

  • We have made repeated public statements that Stability AI was a collaborator in the development of the first release of Stable Diffusion, alongside the Computer Vision and Learning group at LMU and Runway.
  • The details of this collaboration have been shared publicly on Stable Diffusion's GitHub under CompVis, and on Stability AI's website, since its release.
  • In addition, 3 of the 5 authors of latent diffusion and Stable Diffusion work at Stability AI, as you all know.
  • The best solution to this is to keep shipping.

●      Our partnership with Amazon:

  • We have a strategic business alliance with Amazon and AWS. As part of this alliance, AWS built an incredibly rare dedicated compute cluster, completed in August 2022, to Stability's requirements. The 4,000-A100 cluster has dedicated capacity on a single spine, which optimizes performance, and is one of the only ones out there.
  • As we scaled there were some payment timing issues (especially given credits etc.), all of which were resolved by August.

●      Our payment of wages and payroll taxes:

  • There were some instances of delayed payments, all of which were quickly rectified, with me personally covering the employees' payroll at the time - this included periods of waiting for grants from government schemes and other sources.
  • We also paid impacted employees additional wages to compensate for the inconvenience and covered any related expenses they may have incurred such as overdraft fees. 
  • Since the end of 2021, there have been no missed salary payments in the regular course of operations. 
  • All payroll tax issues were resolved quickly and have been paid in full. We are constantly working to strengthen and improve all of these processes. 

●      Our efforts to strengthen and improve our HR processes:

  • We continuously strive to improve our work environment, and have had low attrition, with few employees leaving for other employers since last summer, when we upgraded the management team.
  • We hold weekly town halls, anonymous feedback sessions and other programs instituted by our new, experienced HR team, so employees can help us understand where we can improve.

●      Our relationship with MidJourney:

  • Written correspondence in an investor memo clarified our relationship with MidJourney in exact terms:

“MidJourney is an independent organisation we support that implements and makes usable the image models we create. It currently has almost 57,000 members, 14,000 of whom are active and after our support for the beta has been scaling organically through a subscription that covers their compute costs. We would like to help them scale this aggressively, introducing aesthetic scoring and preference learning so the system improves for each user using the inputs of all users. We provide strategic and model support for MidJourney whose focus is not scaling for the sake of it, but improving access to these new models for as many people as possible.”

  • Last summer we were in discussions to co-create this and other products as joint efforts. We then pivoted to creating our own, and then to focusing on the infrastructure stack.
  • This wording, identifying MidJourney as an "independent organisation", makes clear that we did not intend to mislead readers of the presentation deck.
  • We have always been very careful about this and were experimenting with different models where we could support, not control, communities before settling on our current business model: Stable models are ours and commercial, while we support open source more broadly, to keep things separate and clear.

●      Zehra’s support for our company:

  • As the company grew and matured, a full reconciliation was done and any amounts owed from or to myself and Zehra were settled in full before the end of 2022 by our new, experienced finance team.

●      Our continued fundraising efforts:

  • Stability AI has significant runway but, like many other start-ups, we remain engaged in discussions with strategic investors. We have not opened our data room at this time to the investors who have signed an MNDA.

●      My autism research etc:

  • Anyone who has spoken to me knows the score on this; while I can be excitable, I am very serious about it, and I will be putting out details soon, once we get through this period and the appropriate models are built to take this forward.
  • The end of the piece makes the stance of the authors very clear. Given that they interviewed our head of research and others for hours, they know the talent in the company and its leadership, but balance was not the aim. It is very sad.

It is in a way encouraging that, after a huge number of interviews directly targeting those with axes to grind against Stability AI, this is the sum of what they could attack us with.

The company has moved from a pre-seed startup last year to a scale-up this year, and is well on its way to becoming a multinational, with huge interest in and increasing support for the most amazing, transformative technology we have ever seen.

We are a diverse team coming together to build the foundation to activate humanity's potential, and there will rightly be significant scrutiny on everything we do.

We must focus on being open at our core, admitting our faults and constantly improving so we can achieve our mission and make the world a happier place.

The alternative is a closed panopticon, which none of us want, and we are fortunate to have an amazing and growing team that can deliver on this.

Let’s ship so they can’t ignore the good we do.

Thank you,

Emad

 

On Google, Palm 2 & Moats

Today Google announced/released applications based on PaLM 2, their new smol Language Model.

While the trend until a year ago had been ever-larger models - something reflected in our calling them LLMs, Large Language Models - since these became useful we have seen a flip from research to engineering, to get them to a stage where they can go to billions of people without using every GPU on Earth.

We have also seen that many of the parameters in these models, and much of the underlying data, aren't actually required; we have been feeding them junk food.

The accompanying technical document is short on details of the final architecture, datasets etc., but does show that similar models from 400m to 14b parameters achieve great performance, something we have seen with the recent open source explosion around LLaMA and other base models as folk realised you can get large models to teach smaller ones.

This comes a few days after an article that was sent to me approximately 127387129837 times:

Google "We Have No Moat, And Neither Does OpenAI"

To the surprise of many, I didn't actually agree with this piece on a number of things, even though I am one of the bigger proponents of open source AI.

While the open language model acceleration has been wonderful to see, the truth is that a lot of the work around it has come from a handful of folk, and mostly consists of fine-tunes and some adjustments to the base models.

Some of the details in the piece were also a bit odd - does anyone at Google actually use LoRA?

These techniques are nothing new; distillation and student-teacher approaches are well known, and distillation was actually pioneered at Google.
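For those who haven't seen it, the core idea is tiny. Here is a minimal, hedged PyTorch sketch of the student-teacher distillation loss in the spirit of Hinton et al.; the logits below are random placeholders, not outputs from any real model.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with the same temperature, then push the
    # student's log-probabilities toward the teacher's probabilities.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence, scaled by T^2 as in the original distillation paper.
    return F.kl_div(student_log_probs, soft_targets,
                    reduction="batchmean") * temperature ** 2

# Toy usage: random logits standing in for a big teacher and a small student.
teacher_logits = torch.randn(8, 32000)                      # frozen teacher
student_logits = torch.randn(8, 32000, requires_grad=True)  # trainable student
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```

In practice the teacher's logits come from a large frozen model and this term is mixed with the normal cross-entropy loss, but that is the whole trick.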

There have been a number of internal issues and walls that have prevented information sharing and even process optimisation - which is why we saw Chinchilla outperform the original PaLM, yet the original still being used, for example.

Google started with innovation, but innovation is a really difficult thing to build a good company around - innovation is the catalyst, but the ingredients of a good company are turning that research and innovation into engineering and operations (sound familiar?).

I wrote more on that here:

The narrative will likely turn as Google integrates PaLM 2, and then Gemini, into its various products to drive real customer value - maximising function and flow and reducing annoying frustration.

gmail.ai wen

Similarly, while the models, even before they start slimming down, are already good enough, fast enough and cheap enough to make an impact, we will see this really take off into 2024 as we start to explore some of the limits of optimisation and engineering, something Google is good at (Amazon too, actually).

This aligns with my call in January not to write off Google, as they have all the right ingredients to be one of the couple of main players in this space, something that will become more and more apparent.

The dynamic this sets up is interesting, though - if the models are this good and this fast, and getting better, where is the space for other proprietary players?

(Also why did they use Adobe Firefly in Bard instead of Imagen or Muse?)

I think folk need to consider the impact if it turned out you could get GPT-4 level performance in 1 billion parameters or less - something that could work offline on a phone (maybe the cool Pixel Fold).

Looking at how things are going, I don't think this is inconceivable for proprietary models, and we will continue to see massive innovation in this space at a pace few of us expect.

I think Google, OpenAI and a few others will lead on this, and it will be really hard to compete on features, meaning that focusing on the unchanging needs of a business around this will be critical.

We shall see, but for now Sundar came to dance.

edit: heh AI is awesome

On Hallucinations, Junk Food & Alignment

How to think of Generative AI

Generative AI is really complex. 

Like every hour there seems to be a new breakthrough, with supercomputers and matrix multiplications and more.

Except it isn't really: the code is actually quite condensed and the operators are not too bad (the best course, if you'd like one, is fast.ai for existing developers to whet their appetites).

When considering the impact of this technology, how it applies to you and your world, there is a really easy comparator I have found.

These models are like really talented graduates that occasionally go off their meds.

What would you do if you had an army of really talented grads available at the spin up of a GPU?

Dealing with uncertainty

Going off your meds is tough. Right now I am trying to adjust my dosage of Elvanse for ADHD, as it's all a bit much at the moment, and I have had a weird side effect of not being able to move, because my body says sit down and type, as everything is so important and the list is endless.

Occasionally it allows me to have a bite to eat or go to the toilet. Very kind.

Building out my executive office and team - from what was basically a kind of family-run business last year, operating partially out of an office and partially out of our home - has allowed me to handle more and more, but it is still quite a crazy time.

In order to navigate all this, I have done what we all do: set in place heuristics, patterns and processes to try to untangle it.

But these are all made up - the best approximation of how to deal with an incredibly uncertain time.

When we deal with stable environments and risk regimes, we usually pull some probability numbers out of our butt (20% chance of recession) and then do an expected utility calculation, combining these into a probabilistic outcome.

When we deal with uncertainty, we usually minimise the maximum regret: how will we feel if things reasonably go against us - for example, AI that can do anything can do something bad too.

This has to be heuristic-based, as often we act without any context or knowledge.
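To make the contrast concrete, here is a toy sketch in Python of the two decision rules - expected utility for risk, minimax regret for uncertainty. All the numbers and action names are invented for illustration.

```python
# payoffs[action][state]: made-up utilities for two possible worlds.
payoffs = {
    "expand": {"boom": 10, "recession": -9},
    "hold":   {"boom":  6, "recession":  1},
}
p = {"boom": 0.8, "recession": 0.2}  # probabilities pulled out of our butt

# Risk regime: pick the action with the highest expected utility.
eu = {a: sum(p[s] * v for s, v in outcomes.items())
      for a, outcomes in payoffs.items()}
best_by_eu = max(eu, key=eu.get)  # "expand" (6.2 vs 5.0)

# Uncertainty regime: ignore the probabilities, minimise the maximum regret.
best_in_state = {s: max(o[s] for o in payoffs.values()) for s in p}
regret = {a: max(best_in_state[s] - o[s] for s in p)
          for a, o in payoffs.items()}
best_by_regret = min(regret, key=regret.get)  # "hold" (regret 4 vs 10)

print(best_by_eu, best_by_regret)  # the two rules can disagree
```

Note how the two rules pick different actions here: the probabilities favour expanding, but expanding has the worse worst case.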

It is the same for our smol generative AI models: they come out of the pretraining university with trillions of tokens of knowledge about just about everything (know-it-alls are often insufferable tbh), as really creative liberal arts grads, and then get RLHF'd into being accountants and customer service reps.

Taking the whole internet, warts and all, and forcing the AI to watch it, it is no wonder they come out of their training runs a bit weird.

But what they do have is a condensed neural net of weights encoding a principle-based analysis of just about everything they have seen, allowing them to deal with unstructured and uncertain things, in contrast to big-data AI that can only extrapolate from what it has already seen.

On hallucinations

Hallucination is a word we hear often with these models, but I think it's a misnomer. 

They are actually reasoning machines, not calculating machines, and the fact that GPT-4 can pass just about every exam except English Lit (know-it-all ptth) is a crazy outcome for something that is probably only a few hundred gigabytes big (judging by NVIDIA saying the dual H100 NVLINK was designed especially for ChatGPT - presumably GPT-4 at 200b parameters).

The fact that Stable Diffusion can generate almost anything from 2 gigabytes in total is insane. A customised version is just a few kilobytes more.

These models were never meant to be fact-based, nor are they compression: it's impossible to compress information that much (if it were, we would be worth a trillion dollars, our Weissman Score being off the charts).

There is also the interesting thought that the outputs are views into another world: just as we compress our knowledge of the world around us and fill in the gaps (e.g. via our optic nerve), the world is a dynamic simulation to us.

Why did you do that

When thinking about interpretability, explicability etc this also puts things in an interesting light.

If you ask me why I do things, I can make a very logical-seeming case, but really it is just post-hoc rationalisation of me operating by the heuristics and principles I have developed over the years.

It is rarely 100% fact based, especially when dealing with decisions in the face of uncertainty.

For these models it is the same, given the amount of data: if they are talented grads, there is no necessarily deterministic output (indeed, try reducing the temperature of a language model to its minimum and seeing if the outputs are always the same) - post-hoc rationalisations are about as good as you'll get.
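If you want to try that experiment yourself, here is a minimal sketch using the Hugging Face transformers library; the small gpt2 checkpoint and the prompt are just illustrative choices.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("The capital of France is", return_tensors="pt")

# Greedy decoding is the temperature -> 0 limit: no sampling at all, so
# repeated runs should match (though GPU floating point can still wobble).
greedy = model.generate(**inputs, do_sample=False, max_new_tokens=20)

# A low temperature sharpens the distribution but still samples, so
# run-to-run outputs can differ.
sampled = model.generate(**inputs, do_sample=True, temperature=0.1,
                         max_new_tokens=20)

print(tokenizer.decode(greedy[0], skip_special_tokens=True))
print(tokenizer.decode(sampled[0], skip_special_tokens=True))
```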

Junk food for the brain

In the light of the above, what do we have now?

We have really talented grads forced to watch the whole internet, and then we spend months doing RLHF to make them more boring and in line with what we need.

The more we feed them, the more capabilities we have seen emerge, but they are also turning out weirder and weirder.

We are feeding them junk food and no wonder they are so unhealthy.

So what is the solution?

Organic, free-range AI

As the saying goes, you are what you eat, so we should feed these models better things.

No more web crawls, no more gigantic datasets with all sorts in them (some interesting discussions on this shortly with respect to StableLM alpha and why it was the way it was..); instead, let's build good quality datasets.

We funded the compute for the smart folk behind the DataComp project as one of our grantees. Their 12 billion image-text pairs form another huge dataset, but more interesting is the 1.5b image subset that broke previous records on CLIP image-to-text quality purely through better data quality.
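As a sketch of what "better data quality" can mean in practice, here is one common filtering approach: score each image-caption pair with CLIP and keep only the pairs where image and text agree. This uses the open_clip library; the model name, threshold and file names are illustrative assumptions, not the DataComp pipeline itself.

```python
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

def clip_score(image_path, caption):
    # Cosine similarity between the image and text embeddings.
    image = preprocess(Image.open(image_path)).unsqueeze(0)
    text = tokenizer([caption])
    with torch.no_grad():
        img = model.encode_image(image)
        txt = model.encode_text(text)
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    return (img @ txt.T).item()

# Keep only pairs where the caption actually describes the image.
pairs = [("cat.jpg", "a photo of a cat"), ("cat.jpg", "BEST DEAL click here")]
filtered = [p for p in pairs if clip_score(*p) > 0.28]  # threshold is a guess
```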

Aligning things more capable than us is really difficult.

My buddy JJ made a great point when he noted that alignment is orthogonal to freedom:


The only way to be fully sure that something more capable than you is fully aligned to what you want (which may be different to what Bob wants..) is to remove its freedom.

That is output-focused; a better way to improve the chance of success is perhaps to fix the inputs as fast as we can, even as folk race ahead (I'll discuss the FLI letter another time).

Let's all get together to build good quality, open, permissive datasets so folk don't need to clog the arteries of these GPUs with junk, which often causes dreaded loss spikes.

I will share more of my thoughts on how to do this properly in a longer post on alignment; the design patterns around building these systems are changing and likely to continue changing going forward.

However, for now, if you're thinking about the impact of this tech: what would you do if you had an army of really talented grads at your fingertips who were kinda unhealthy and sometimes off their meds?

How would you make sure the next generation of grads is a bit more stable and balanced?

Let's start from there and move on.

On AI x Crypto

Montenegro is nice

This weekend I was at the AI x Crypto event at Zuzalu, a pop-up city in Montenegro.

It is really very nice there, do swing by Tivat:


It was great to meet the awesome folk trying hard to figure out this crazy time; I will say more on the various topics in upcoming posts. It was awesome to meet the amazing Vitalik Buterin in person (I'm a huge fan), and Grimes had some super interesting insights, even if we completely disagreed on some things - a stable diffusion moment for music is incoming.

AI x Crypto

When I moved Stability AI to focus on generative AI at the end of 2021, following some interesting experiences working on AI and Covid, one of the original ideas was to have a DAO of DAOs to support the various open source AI communities.

We experimented with various potential models for this, with the goal of making sure AI was not an inherently centralising force, but the infrastructure and tools were just not there.

This has been something interesting about crypto: while there are clearly interesting things in there, it is questionable how much value has actually been created.

Creating a system outside the existing system has meant that much of the money has been made and lost at the interface between the two, be it through hacks, speculation or otherwise.

The fundamental of crypto however is in the name - cryptography enables identity.

Identity, and the flow of information against it, was at the core of the Bitcoin whitepaper by the pseudonymous Satoshi Nakamoto, with the opening of the abstract being:

"A purely peer-to-peer version of electronic cash would allow online payments to be sent directly from one party to another without going through a financial institution."

This is an interesting thing with elements around sovereignty, coordination and more, but there was always something missing.

Something there that wasn't there before

The existing web was made of identity (Google, Facebook login) and centralised AI.

This routed information back and forth but was largely built on ads.

I always thought it was weird that there was no AI in Web3 as it was called.

This is partially because the big-data systems required a lot of heft and standardisation to work, one reason you saw centralised exchanges and more emerge, especially at the interface between decentralised networks and existing systems.

Existing systems are actually often slower on purpose too - things like remittances and instant payments are easy, but when you get instant payments you also get... Silicon Valley Bank collapses (more on that soon too..).

The interesting thing about generative AI is that it is not big data: it is dense models trained on giant compute and structured datasets, so the output is this curious thing, a model weight.

This is a file that compresses the principles extracted from the data into an impossibly small format; it requires large energy up front but, relatively speaking, very little to run.

Stable Diffusion condensed 100,000 gigabytes of images - 2 billion of them - into a 2 gigabyte file (a ratio of roughly 50,000 to 1) that powered 4 of the top 10 apps on the App Store in December.

That's the whole back end, not multiple dependencies and complex software. 

It runs on a phone now so we have infinite images in our pockets. 

How cool is that.
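To see how little is involved, here is a minimal sketch using the Hugging Face diffusers library; the model id and prompt are illustrative, and you would pick whatever Stable Diffusion weights suit you.

```python
import torch
from diffusers import StableDiffusionPipeline

# One weights file (a couple of gigabytes) plus a short script is the
# entire image backend.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe = pipe.to("cuda")  # or "mps" on Apple silicon; use float32 on CPU

image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```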

This enables new design patterns, especially when combined with identity and value transfer rails, where value can be considered equivalent to information that matters in changing a state (classic information theory à la Shannon, encapsulated).

The Intelligent Internet

The moving of intelligence to the edge enables a concept I like to call the intelligent internet:


Standardised models customised to individuals, under their own ownership, will become important as we rely on these more - to paraphrase the classic crypto maxim ("not your keys, not your crypto"): not your weights, not your brain.

The governance of open systems is something where we can learn a lot from all the hard work that has gone on in the crypto space (while hopefully avoiding some of the excesses).

Who should be deciding how datasets and models are built?

These things are not unbiased, nor, I think, can they be.

Who should control the distribution and access?

Who should have access?

Satoshi Nakamoto also made an interesting statement:

"It takes advantage of the nature of information being easy to spread but hard to stifle."

The spreading of model weights, and of the even smaller customisations of those weights, has similar properties, and combined with identity and value transfer rails it can lead to some very interesting outcomes for the benefit of society.

How we deal with these questions is going to be very important to our communal future, and it is quite important to bring a range of viewpoints in.

Open source AI is critical for private knowledge and ownership and is not going away - more thoughts on what I have seen on this journey coming soon...

On Blogging & Effort

Mental models

Over the years we build up all sorts of mental models of how the world is.

One of those I have been using recently is that it is relatively easy to speak face to face, notwithstanding shyness, social anxiety and more.

Writing is harder, much harder, especially concise writing.

Blaise Pascal in 1657 wrote:

Je n’ai fait celle-ci plus longue que parce que je n’ai pas eu le loisir de la faire plus courte.

"I have made this longer than usual because I have not had time to make it shorter."
Hardest of all is communicating visually, with images incredibly difficult and other forms of visual communication like presentations incredibly painful.

In a presentation (hoho) a while ago I did this slide to show how it is:

Now with generative AI it is going to be more like this:

Maybe. Friends like Tome and others are working hard to make this a reality.

One of the main things here is that it is good to create and get stuff out.

Under pressure

As the CEO of one of the more differentiated companies in one of the fastest-growing and likely most world-changing sectors we have ever seen, it's really not easy.

I've had loads of failures despite doing a bunch of interesting things, and I have never lived up to my own expectations of myself.

With Asperger's & ADHD it's been quite difficult and it's only recently I've been getting to grips with my own mind. 

Aphantasia and a lack of an internal voice also make the world strange versus others. 

It's a privileged position I find myself in, although a bit lonely, so I thought it might be nice to take my own advice and get things out - something I think may be interesting to folk out there.

This is part of a broader shift I want to bring to Stability AI, where we will start building language and other models in the open - lots of folk want to help, and it's good to get things out there.

I am also going to be experimenting with different AI tools to see how the communication process can be eased - this post has no AI, but future ones will.

Nerves and imposter syndrome

Even while writing this out right now I am feeling quite nervous - what if folk judge what I write, what if I say stupid things, what if I bore them, etc.?

I think this is a very human thing and I am going to try not to let it stop me.

I'll write about whatever I find interesting and maybe you will too.

Away we go.