AIExplained posts

o1 can 'self-correct'. That's kinda significant.

Drawing on 3 new articles, an interview in the last 48 hours, and 4 papers, I'll argue that we should not let o1's 'ability to self-correct' go sailing past in the night.

Link for Offline Viewing and Download: 2024-10-03 12:42:49 +0000 UTC

Pod 8: Do we have a straight shot to AGI? 'Don't teach, incentivize' - Let's Think Sip by Sip

Are 'Benchmarks All You Need'? And do we have any conceptual breakthroughs to go before text-based AGI? I bring in the latest OpenAI quotes and reflect deeply on what it all means.

Link for Download: 2024-09-30 14:58:11 +0000 UTC

Is o1 No Longer an LLM? LeCun + New 'LRM' paper explained (+ exclusive interview clips)

A new paper from the last few days has dropped, and it's a good one. LLMs can now be said to plan, and I have all the analysis as well as exclusive clips from my interview with the lead author. And I don't believe anyone else has reported that this breakthrough performance from o1 now exceeds av...

2024-09-24 14:05:00 +0000 UTC

'Humanity's Last Exam' - I Doubt It

Less than 24 hours ago we got the claim that a multi-million dollar 'final test' for AI was being put together. But I ask questions about what it will achieve, drawing on evidence from 3 papers, Simple Bench, and my own analysis. Hopefully, this video will show you why 'o1 = AGI' claims leave a l...

2024-09-17 18:12:45 +0000 UTC

The Struggle to Define 'AGI' - Controversial Terms in AI, Explained

Surely now that companies like OpenAI, whose only goal is to create an AGI, are worth $100B+, we have a settled definition of 'AGI' itself? No? Or even a set of rival definitions, each of which is well-defined? Well, strap yourself in, we are gonna find out where the term started and show you th...

2024-09-09 16:00:08 +0000 UTC

10,000x Scaling Deep Dive, and a 5-year LLM Roadmap

A new 20,000-word report on AI scaling, and yes, I read it all to bring you the highlights. What are the biggest unanswered questions for whether we will scale models 10,000x, and is there a deeper question that underlies them all? Plus new clips from the Anthropic CEO, a Simple Bench update, Eric Schmidt and...

2024-09-01 21:04:42 +0000 UTC

Simple Bench Exclusive Tour: I couldn’t find a good reasoning benchmark, so I made one.

Full results from the first Simple Bench run (including latest model updates), the new website, more insight into the questions and what the gaping hole in basic reasoning means, plus my plans going forward.

Link for Off-line Viewing and Download: 2024-08-19 14:44:44 +0000 UTC

'The Bitter Lesson' - Controversial Terms in AI, Explained - New Series

The term 'bitter lesson' is thrown about a lot, but what does it actually mean? Does it leave humans irrelevant or is it about something deeper? Drawing on lessons from MuZero and the annotated original essay by Rich Sutton.

Link for Offline-viewing: 2024-08-12 09:21:24 +0000 UTC

Pod 7: The Story Behind SIMPLE Bench, More Results, and Next Steps - Let's Think Sip by Sip

Mistral Large flops hard, but what exactly is this benchmark, what are some more of its questions, why is it different, and what is next? All, or at least some, of these questions will be answered.

2024-07-30 18:50:25 +0000 UTC

'Emergent Behaviors' - Controversial Terms in AI, Explained - New Series

One of my favorite videos in the series! A new bonus series explaining some of the most controversial terms in artificial intelligence, this time covering the term 'emergent behaviors'. Deciding if you think models do - or do not - display emergent behaviors could shape your perspective on A...

2024-07-15 15:31:45 +0000 UTC

Can ChatGPT Do Task X? It’s Surprisingly Hard to Answer

‘Can any model do [insert task]?’ is a much harder question than it seems. I’m going to give you five vivid categories, with unambiguous examples, drawing on 6 new papers, of the kind of detail that is so often lost in 2024 debates on AI.

Link for Off-line Watching and Download: <...

2024-07-04 18:17:18 +0000 UTC

Pod 6: No One Agrees @ OpenAI if GPT-4o is 'a smart highschooler' + My Take on Murati, Altman and Sutskever - Let's Think Sip by Sip

There is a clear dividing line emerging at the top of OpenAI, and in AGI labs more broadly. This pod reflects on the 'reasoning' and 'scale' axes, including fascinating new comments from OpenAI researcher Noam Brown about his CTO, Murati, who described GPT-4 as 'a smart highschooler'. Plus my take ...

2024-06-23 17:35:01 +0000 UTC

'Open Source' - Controversial Terms in AI, Explained - New Series

A new bonus series (2/8 episodes) explaining some of the most controversial terms in artificial intelligence, this time covering the term 'open source'. In some quarters, it's the most controversial term of them all. Here, we mainly focus on the difference between open source and open weights - a...

2024-06-19 12:52:18 +0000 UTC

Fired OpenAI researcher - 'OpenAI Planned to Sell AGI to China' and 'It's Coming by 2027' - Full Analysis of 165 page Doc

Recently fired OpenAI researcher Leopold Aschenbrenner has produced an essay that will either confirm him to be absolutely crazy, a target of an OpenAI lawsuit, or bizarrely prophetic. I went through all 165 pages, plus his recent 4.5 hr interview (and other less recent material) to bring you jus...

2024-06-07 19:40:24 +0000 UTC

'Stochastic Parrot' - Controversial Terms in AI, Explained - New Series

A new bonus series (8 episodes) explaining some of the most controversial terms in artificial intelligence, starting with an OG term for LLMs: 'stochastic parrots'. Find out where the term came from, why it stuck, and enter the debate over whether it is justified.

2024-06-03 13:02:00 +0000 UTC

New Benchmark Madness, But Hope on the Horizon

This video won’t just show you the problems with a range of the most popular benchmarks (though it will do that, from MMLU-Pro to GPQA, GSM8K, LMSYS and more). It will show you a usable path forward, so that we might finally get benchmarks we can trust, that really get to the underlying capacit...

2024-05-20 14:25:21 +0000 UTC

Prompt Injections in the AI Agent Era - Donato Capitella

Exclusive: The second, eye-opening instalment of the AI Insiders tutorial series on Prompt Injections: Donato Capitella on what the threat is, how it is changing, and what you can do about it, at any level.

Downloadable File for Off-line Viewing: 2024-05-17 12:56:20 +0000 UTC

Pod 5: GPT 4o Reflections, Cryptic OpenAI Tweet, When to Declare AGI, and New Guests - Let's Think Sip by Sip

Let's take a moment to reflect on the import of GPT 4o and the cascading social ramifications of development after development. Then, I investigate an interesting OpenAI tweet, talk about forthcoming guests and go deep on the decision of when to declare AGI (assuming we can define it). I end w...

2024-05-15 20:47:35 +0000 UTC

Reflections on Sam Altman’s Recent Expectation-Setting on GPT-5

I believe the model that will end up being popularly known as GPT-5 has finished training. That comes not just from the analysis in my January video but also Sam Altman’s response to a question I...

2024-04-28 13:38:44 +0000 UTC

Many-Shot Magic: 2 New Papers + 1 Failed Bet Show What Can Be Done with LLMs

Two recent papers (DeepMind + Anthropic tag-team) and a failed $10k bet have reminded people not to underestimate what models can learn from the data you give them in the prompt. Let me show you how this can be harnessed to get better results, even if you don’t have great demonstrations at hand...

2024-04-23 14:56:12 +0000 UTC

SmartGPT Website Demo and Community Project

I have always wanted to have a web demo of SmartGPT, to show anyone how powerful basic prompting scaffolds can be. But I wanted it to be even more interesting than what I showed last year, so the iteration I'm sharing today incorporates one clear improvement to the system that got an unofficial 8...

2024-04-12 15:09:48 +0000 UTC

Perplexity CEO on the Future of Search, and Why He's Not Scared of OpenAI or Google

Highlights from the interview with Aravind Srinivas, co-founder and CEO of Perplexity. Plus the news today not only of the first hints of instant search from OpenAI but of Google's epochal shift to a search generative experience. I’ll put all this, and your questions, directly to the man who is h...

2024-04-04 20:12:42 +0000 UTC

AI Jobs Warning: 36 Hours Later, Author Interviewed, Paper Analysed in Full, and Why I am Still Somewhat Optimistic

Yesterday’s dramatic Bloomberg headlines showcased an ‘AI Jobs Apocalypse’, warnings of ‘millions of jobs lost in next 3-4 years’, triggered by a new 44-page paper from London. I interview the IPPR lead author Carsten Jung and get to the bottom of it all, giving my critiques of the pape...

2024-03-28 23:25:52 +0000 UTC

Pod 4: Unpredictability: AI, Content Creation, Timelines and Vernor Vinge - Let's Think Sip by Sip

The only theme for this episode is unpredictability, from the swirling new rumours of GPT-5 release dates from Business Insider, to the challenges of promoting interviews that don't happen, behind-the-scenes chats, how we can't rely on AGI Lab leader reassurances and key extracts from the portent...

2024-03-24 19:09:25 +0000 UTC

A Note on Not Being Shocking, and Making Connections

I don’t often do personal updates; I just sprinkle them in, on the off chance anyone wants a bit more behind-the-scenes. Two things come to mind to mention today: the repercussions of not being shocking, and meeting AI Insiders.

First, is it me or has AI coverage devolved a fa...

2024-03-15 13:42:37 +0000 UTC

The AGI Lawsuit

What are the most interesting details from the Musk-Altman Lawsuit? Can Gemini 1.5 help me sort through the morass of relevant tweets? I want to give you the history of the battle over the definition of those three key words, artificial general intelligence, and what it means for us all.

2024-03-03 21:58:16 +0000 UTC

AI Professional Tips and Networking

This month has seen the launch of a Discord channel that I am very excited about. We have hundreds of incredible people on here at the bleeding edge of implementing and understanding AI, and so naturally, we need a place to exchange best practices, in a friendly and professional environment (whic...

2024-02-25 18:55:29 +0000 UTC

$7 Trillion, a Bioweapon and a Nuke In Space - Under-the-Radar AI Safety Papers

Everything you missed in the world of AI threats because of Sora and Gemini. From Compute Overhang @Sama, to a laudable Bioweapon study from OpenAI, and from state-actors using GPT-4 to the future of warfare.

Goody-2: https://w...

2024-02-22 15:38:51 +0000 UTC

Deepfakes - The Peril and Potential

Take a 14-minute tour with me of the cutting edge of deepfakes, from speech-to-speech to politics, YouTube and business. We'll discuss the upsides, including with a senior figure at ElevenLabs (and you'll get to hear my voice with different content and personality), business potential, as well...

2024-02-15 17:20:08 +0000 UTC

Perplexity CEO - Any questions?

As always, first dibs on questions for my interview guests goes to you guys. And I am lucky enough to be able to have Aravind Srinivas, Perplexity founder and CEO, formerly of OpenAI, as a guest later this month. No guarantees for any question, but the most upvoted ones will get a ve...

2024-02-08 11:56:25 +0000 UTC
