A Day at ai-PULSE 2025
On Thursday, December 4th, I had the opportunity to attend ai-PULSE 2025, a day of keynotes and conferences dedicated to artificial intelligence, organized by Scaleway at Station F in Paris. In this post, I summarize the presentations I was able to attend during the day. Everything that follows comes from my notes alone; I add no outside information, only my own interpretation.

Opening Keynotes
Yann LeCun & Pim de Witte - Replay
Yann LeCun recently left his position as Chief AI Scientist at Meta to build his own AI company, about which we don’t yet know much. His opening keynote focused on the limits of text models: according to him, the future (and anything we might call AGI) will not come from LLMs:
“Scaling LLMs until AGI is BS […] Not all our knowledge is text; in fact, most of our knowledge is not text”
The goal would then be to build a more global model, one that reasons not through text but through thought: a model capable of forming abstract representations and of interacting with the real environment, which is far more complex than the textual dimension.
The question of data is also developed: textual data is very easy to collect thanks to the internet, while visual data is harder to find, yet it carries far more information (and complexity) than text. The idea of generating video content through simulation or video games is also mentioned, as are smart glasses like the Meta Ray-Ban, which could allow collecting a significant amount of visual data, potentially annotatable.
Finally, it is very likely that this question of a global model, capable of abstract representations and interactions, will be at the heart of Yann LeCun’s next venture. He notably mentions that everyone today is obsessed with LLMs, particularly in Silicon Valley, and that there is therefore plenty to do in Paris itself, echoing the wealth of talent found in Europe, whose potential is often underestimated.
Rémi Cadene (UMA) - Replay
Rémi Cadene worked at Tesla on autonomous driving, then joined HuggingFace, where he built the LeRobot team, before co-founding UMA, alongside Rob Knight, Pierre Sermanet, and Simon Alibert. The UMA project, for Universal Mechanical Assistant (and also hUMAnity), aims to offer intelligent robots capable of interacting with the physical world, improving everyone’s life overall while also driving economic growth.
According to him, the hardest problem in robotics today is dexterity: “touch” feels natural to us as humans, and we can easily grab many objects in different ways without even thinking about it, but it is not that simple from a robotics perspective, and it is the subject of extensive research.
Finally, a note on Europe: the continent is full of talent, institutions, and other capabilities, but it also represents a promising market, thanks in particular to its very strong industrial base and to an aging population that naturally pushes toward task automation.
Neil Zeghidour (Gradium) - Replay
Neil Zeghidour joined the French laboratory Kyutai after stints at Google and Meta, and he has now co-founded Gradium, a company that aims to provide voice interaction solutions. The technology doesn’t come from nowhere: it relies heavily on several years of research within Kyutai.
The laboratory should thus be seen as an open-science actor providing foundation models, but also one capable of major advances in the field, such as the first speech-to-speech model. Gradium, for its part, is the concrete application of these techniques, making them operational in an industrial environment to meet significant market traction.
Gradium’s technology is therefore very advanced, offering excellent voice quality and very good interaction, as evidenced by the superb on-stage demonstration with the Reachy robot (HuggingFace). The technical objective now is to push the current limits, particularly around emotional understanding, context, and operation in noisy environments with multiple speakers.
Conferences
Inference Everywhere: optimizing performance - Replay
Steeve Morin, ZML
Steeve Morin, creator of ZML, presents the ZML solution. He draws a clear distinction between training and inference: training is a research task, where we always seek more data and where Python reigns supreme, whereas inference is a production task, where performance and cost are the most important factors and where Python becomes a bottleneck.
ZML then positions itself as an ecosystem for optimizing the inference side through different techniques. The base component, ZML, is built with Zig, MLIR, and OpenXLA, and supports several chips (NVIDIA, AMD, Google TPU, AWS Trainium). On top of this component sit LLMD and ATTND.
The first, LLMD, is an inference engine with remarkable characteristics (unverified on my part): a cold start in 10 seconds, a TTFT (Time To First Token) 3.6x lower, and roughly 5 to 30% more output throughput (I don’t remember whether a baseline was mentioned, so take these figures with a grain of salt and verify them).
The second, ATTND, focuses on the attention function, the most computationally expensive part of LLMs since its cost is quadratic in sequence length. ATTND doesn’t brute-force this computation but instead computes it as a graph, claiming more than notable performance gains: 2x more compute capacity and 10x less network usage.
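To make the quadratic cost concrete, here is a minimal NumPy sketch of standard scaled dot-product attention; this is my own illustration of the textbook formulation, not ZML or ATTND code. The score matrix has shape (n, n), so compute and memory grow with the square of the sequence length.

```python
import numpy as np

def naive_attention(Q, K, V):
    """Standard scaled dot-product attention.

    Q, K, V: arrays of shape (n, d) for a sequence of n tokens.
    The scores matrix is (n, n): doubling the sequence length
    quadruples the work, which is the quadratic cost mentioned above.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                       # (n, n) -- the quadratic part
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over the keys
    return weights @ V                                  # (n, d)

n, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(naive_attention(Q, K, V).shape)  # (1024, 64)
```

However ATTND actually structures the computation as a graph, this dense (n, n) matrix is the baseline it is competing against.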
From lab to product with European voice model - Replay
Enrico Bertino, indigo.ai & Alexandre Défossez, Kyutai & Constance Morales, Scaleway
Following Neil Zeghidour’s keynote, this conference goes into more technical detail on the voice models developed by Kyutai. To begin, it is worth distinguishing the two communication channels, text and audio: text is a very compact, efficient way of transmitting information, while audio is far more disordered but carries more precision, since it encodes rhythm, hesitation, tone…
In this context, audio can be a richer communication channel, and working on these voice models makes it possible to add emotion, something particularly lacking in AI. But developing this technology brings several challenges: biases and subjectivity are just as problematic as in LLMs, and the oral channel cannot be used in every situation (imagine if everyone in an open space spoke to their AI…).
Given this, there are two technical architectures for building these voice models. The first is a “cascading” model chaining a Speech-To-Text model, then an LLM, and finally a Text-To-Speech model (sketched below). Although this architecture makes it easy to add features like function calls or RAG, it suffers from latency problems, and a conversation with multiple participants potentially speaking at the same time quickly becomes unmanageable.
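As a rough illustration of where the latency comes from, here is a hedged Python sketch of the cascade; all the function names are hypothetical stubs of my own, not Kyutai’s or indigo.ai’s API. Each stage blocks on the previous one, so the user hears nothing until all three have run.

```python
# Hypothetical cascading voice pipeline: STT -> LLM -> TTS.
# Stub implementations stand in for real models; the point is the
# structure: end-to-end latency is roughly the sum of the three stages.

def speech_to_text(audio: bytes) -> str:
    return "what's the weather like?"      # stub for an STT model

def llm_reply(transcript: str) -> str:
    return f"You asked: {transcript}"      # stub for an LLM call

def text_to_speech(text: str) -> bytes:
    return text.encode("utf-8")            # stub for a TTS model

def cascade(audio_in: bytes) -> bytes:
    transcript = speech_to_text(audio_in)  # latency of stage 1
    answer = llm_reply(transcript)         # + latency of stage 2
    return text_to_speech(answer)          # + latency of stage 3

print(cascade(b"\x00\x01"))
```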
This is why there is a second architecture, which performs Speech-To-Speech natively, offering very good performance with a delay below 200 ms, a rhythm that adapts to a real conversation, and the potential to be deployed directly on users’ devices (phone, computer, tablet). These models are still hard to adapt to every use case and don’t have the intelligence of LLMs, but they have enormous potential and will probably become the norm for voice models.
Future challenges are also mentioned: switching languages within a conversation is a real issue, but the main bottleneck lies in compliance and security, major and unavoidable challenges before these models can be generalized. Finally, the next step for the ecosystem would be an actor capable of catalyzing these topics over the long term, a European equivalent of what GAFAM are for the United States, which notably enables a longer-term vision.
From Foundation models to Real-World Actions - Replay
Jean-Baptiste Kempf, Scaleway & Firas Abi Farraj, Enchanted Tools & Grégoire Linard, Enchanted Tools
This exchange between Jean-Baptiste Kempf and the two CTOs of Enchanted Tools, a company that builds robots capable of conversing verbally with humans, with applications in several sectors, aimed to demystify aspects of robotics and AI.
The impact of AI on robotics shows up in several ways: Deep Learning for perception, Reinforcement Learning for robustness, and finally the arrival of LLMs and VLMs, which are still being explored. One thing is specific to AI in robotics: no complete model covering everything from A to Z exists yet, but it is now becoming possible.
The particularity of physical AI is that it must run locally, on the robot itself, which limits compute capacity despite the significant amount of information and processing a robot needs in order to interact with its environment. This is why most methods today rely on a hybrid design combining an LLM for reasoning tasks with lightweight models (vision, sensors, speech-to-text, text-to-speech); indeed, robots today all embed CNNs or classic Machine Learning for basic tasks like object detection.
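As a hedged illustration of such a hybrid design (entirely my own sketch; every name here is a hypothetical stub, not Enchanted Tools’ code): a lightweight detector runs on every frame on the robot, and the heavier LLM is consulted only when high-level reasoning is needed.

```python
# Hypothetical hybrid robot control loop: a lightweight on-device
# detector runs continuously, and a heavier LLM is consulted only
# for high-level planning. All names are illustrative stubs.

from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    confidence: float

def detect_objects(frame: bytes) -> list[Detection]:
    # Stub for a small CNN running on the robot itself.
    return [Detection("cup", 0.93)]

def plan_with_llm(goal: str, seen: list[Detection]) -> str:
    # Stub for an LLM call, used only for the reasoning step.
    labels = ", ".join(d.label for d in seen)
    return f"move toward the {labels} to achieve: {goal}"

def control_step(frame: bytes, goal: str) -> str:
    seen = detect_objects(frame)      # cheap, runs on every frame
    if not seen:
        return "explore"              # no LLM call needed
    return plan_with_llm(goal, seen)  # expensive, invoked on demand

print(control_step(b"frame", "fetch the cup"))
```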
Having a complete solution that works end to end is not simple: hallucination comes easily, so the solution must be broken down into layers (hence the hybrid design), and safety must be ensured through various barriers and safeguards, such as classification models that limit the robot’s actions and the types of actions it is capable of performing, all to protect the robot’s environment.
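The “barrier” idea can likewise be sketched as a simple allowlist check between the planner and the actuators; this is again a hypothetical illustration of my own, with a trivial stand-in for what would really be a trained classification model.

```python
# Hypothetical safety layer: a classifier assigns each proposed
# action a category, and only allowlisted categories are executed.

ALLOWED_CATEGORIES = {"navigate", "speak", "grasp_light_object"}

def classify_action(action: str) -> str:
    # Stub for a classification model; a real one would be trained
    # on labeled action descriptions rather than keyword rules.
    if "push" in action or "throw" in action:
        return "apply_force"
    if "say" in action:
        return "speak"
    return "navigate"

def safe_execute(action: str) -> bool:
    category = classify_action(action)
    if category not in ALLOWED_CATEGORIES:
        print(f"blocked: {action!r} ({category})")
        return False
    print(f"executing: {action!r} ({category})")
    return True

safe_execute("say hello to the visitor")  # allowed
safe_execute("throw the cup")             # blocked
```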
Finally, an essential aspect concerns social interactions, a subject worth serious work since it is an obvious part of the direct application of these robots. Finding the balance between computing performance and social connection is an interesting problem, and one now widely explored; there are even psychologists working on it.
Indeed, interaction varies from person to person (responding with energy to children, more slowly to the elderly), and this is addressed both through design, by building robots with a cute or amusing appearance, and through software, by developing systems that imitate human behaviors or certain emotions. This is a necessary step, and taking it seriously is what will enable the concrete deployment of robots in our society.
Agentic Stack for Regulated Industries: Architecture Essentials - Replay
Han Heloir, Mistral AI
This conference covers deploying AI solutions in regulated domains (insurance, banking, healthcare…), starting from a simple observation: most projects are demonstrators, often wonderful, but hiding many flaws: no knowledge of data provenance or of how the data is processed, no traceability, and therefore a total lack of compliance.
This is why most companies only deliver prototypes and put nothing into production. There is a great lack of visibility, observability, and telemetry: AI workflows stop working in production environments, and there is no traceability of the “assets” of an AI solution: models, prompts, datasets…
“Are you building AI to impress or are you building AI to last?”
There is therefore a real need for visibility into the performance of these AI workflows, with durable execution, clear management of the assets in use, and simple observability through explorers, judges, or dashboards. This in turn allows those same assets to be reused with confidence thanks to unified catalogs, versioning, integration layers, APIs, and SDKs.
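To make “assets with versioning” concrete, here is a minimal sketch of what a unified catalog entry could record; this is my own illustration of the idea, not Mistral’s product. Every model, prompt, and dataset gets a name and a version, so a production output can be traced back to exactly what produced it.

```python
# Minimal sketch of a versioned AI-asset catalog: each run records
# which model, prompt, and dataset versions produced an output, so
# the result stays traceable and the assets stay reusable.

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class Asset:
    kind: str      # "model", "prompt", or "dataset"
    name: str
    version: str

@dataclass
class RunRecord:
    assets: list[Asset]
    output_ref: str    # illustrative pointer to the produced artifact
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

catalog = [
    Asset("model", "claims-assistant", "1.4.0"),
    Asset("prompt", "summarize-claim", "2025-12-01"),
    Asset("dataset", "claims-eval", "v7"),
]

run = RunRecord(assets=catalog, output_ref="runs/abc123/output.json")
for a in run.assets:
    print(f"{a.kind}: {a.name}@{a.version}")
```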
