Prof. Jakob Foerster - ImageNet Moment for Reinforcement Learning?
Prof. Jakob Foerster, a leading AI researcher at Oxford University and Meta, and Chris Lu, a researcher at OpenAI -- they explain how AI is moving beyond just mimicking human behaviour to creating truly intelligent agents that can learn and solve problems on their own. Foerster champions open-source AI for responsible, decentralised development. He addresses AI scaling, goal misalignment (Goodhart's Law), and the need for holistic alignment, offering a quick look at the future of AI and how to guide it.SPONSOR MESSAGES:***CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!https://centml.ai/pricing/Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Goto https://tufalabs.ai/***TRANSCRIPT/REFS:https://www.dropbox.com/scl/fi/yqjszhntfr00bhjh6t565/JAKOB.pdf?rlkey=scvny4bnwj8th42fjv8zsfu2y&dl=0 Prof. Jakob Foersterhttps://x.com/j_foersthttps://www.jakobfoerster.com/University of Oxford Profile: https://eng.ox.ac.uk/people/jakob-foerster/Chris Lu:https://chrislu.page/TOC1. GPU Acceleration and Training Infrastructure [00:00:00] 1.1 ARC Challenge Criticism and FLAIR Lab Overview [00:01:25] 1.2 GPU Acceleration and Hardware Lottery in RL [00:05:50] 1.3 Data Wall Challenges and Simulation-Based Solutions [00:08:40] 1.4 JAX Implementation and Technical Acceleration2. Learning Frameworks and Policy Optimization [00:14:18] 2.1 Evolution of RL Algorithms and Mirror Learning Framework [00:15:25] 2.2 Meta-Learning and Policy Optimization Algorithms [00:21:47] 2.3 Language Models and Benchmark Challenges [00:28:15] 2.4 Creativity and Meta-Learning in AI Systems3. Multi-Agent Systems and Decentralization [00:31:24] 3.1 Multi-Agent Systems and Emergent Intelligence [00:38:35] 3.2 Swarm Intelligence vs Monolithic AGI Systems [00:42:44] 3.3 Democratic Control and Decentralization of AI Development [00:46:14] 3.4 Open Source AI and Alignment Challenges [00:49:31] 3.5 Collaborative Models for AI DevelopmentREFS[[00:00:05] ARC Benchmark, Chollethttps://github.com/fchollet/ARC-AGI[00:03:05] DRL Doesn't Work, Irpanhttps://www.alexirpan.com/2018/02/14/rl-hard.html[00:05:55] AI Training Data, Data Provenance Initiativehttps://www.nytimes.com/2024/07/19/technology/ai-data-restrictions.html[00:06:10] JaxMARL, Foerster et al.https://arxiv.org/html/2311.10090v5[00:08:50] M-FOS, Lu et al.https://arxiv.org/abs/2205.01447[00:09:45] JAX Library, Google Researchhttps://github.com/jax-ml/jax[00:12:10] Kinetix, Mike and Michaelhttps://arxiv.org/abs/2410.23208[00:12:45] Genie 2, DeepMindhttps://deepmind.google/discover/blog/genie-2-a-large-scale-foundation-world-model/[00:14:42] Mirror Learning, Grudzien, Kuba et al.https://arxiv.org/abs/2208.01682[00:16:30] Discovered Policy Optimisation, Lu et al.https://arxiv.org/abs/2210.05639[00:24:10] Goodhart's Law, Goodharthttps://en.wikipedia.org/wiki/Goodhart%27s_law[00:25:15] LLM ARChitect, Franzen et al.https://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf[00:28:55] AlphaGo, Silver et al.https://arxiv.org/pdf/1712.01815.pdf[00:30:10] Meta-learning, Lu, Towers, Foersterhttps://direct.mit.edu/isal/proceedings-pdf/isal2023/35/67/2354943/isal_a_00674.pdf[00:31:30] Emergence of Pragmatics, Yuan et al.https://arxiv.org/abs/2001.07752[00:34:30] AI Safety, Amodei et al.https://arxiv.org/abs/1606.06565[00:35:45] Intentional Stance, Dennetthttps://plato.stanford.edu/entries/ethics-ai/[00:39:25] Multi-Agent RL, Zhou et al.https://arxiv.org/pdf/2305.10091[00:41:00] Open Source Generative AI, Foerster et al.https://arxiv.org/abs/2405.08597<trunc, see PDF/YT>