Of course, on paper, classifying patents by SDG looks like a standard NLP task. In practice, patent data is dense, technical, uneven, and full of signals that are easy to overread. Not every patent belongs in the SDG framework, and relevant information can be scattered across the document. So, if you want useful results at scale, you need more than a model that can produce a plausible label.
This is what Cherif introduced at Dataquitaine: instead of using one large language model to process entire documents, we built a pipeline of smaller BERT-based models, each focused on specific segments and combined with explicit business rules.
The goal was simple: put the computation where the signal actually is, instead of asking one general-purpose model to interpret everything at once.
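To make the idea concrete, here is a minimal, purely illustrative sketch of that routing logic. The segment classifiers below are stand-ins for fine-tuned BERT models, and the combination rule is a hypothetical example, not the production rule set.

```python
# Hypothetical sketch: one small classifier per document segment, combined by
# an explicit business rule. The keyword-based functions are stand-ins for
# fine-tuned BERT models; labels and rules are illustrative only.

def classify_title(text: str) -> str:
    # stand-in for a small BERT model specialized on patent titles
    return "SDG7" if "solar" in text.lower() else "NON_SDG"

def classify_abstract(text: str) -> str:
    # stand-in for a small BERT model specialized on abstracts
    return "SDG7" if "renewable" in text.lower() else "NON_SDG"

def combine(labels: list[str]) -> str:
    # example business rule: keep an SDG label only if at least two
    # segment-level models agree on the same SDG; otherwise abstain
    sdg = [label for label in labels if label != "NON_SDG"]
    if len(sdg) >= 2 and len(set(sdg)) == 1:
        return sdg[0]
    return "NON_SDG"

patent = {
    "title": "Solar panel with improved efficiency",
    "abstract": "A renewable energy device for residential use.",
}
labels = [classify_title(patent["title"]), classify_abstract(patent["abstract"])]
print(combine(labels))  # SDG7
```

Because each segment-level decision and each rule is explicit, the final label can be traced back to the evidence that produced it.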

A core design choice
First, it is much faster than generic solutions: our approach delivers inference roughly 50 times faster than a standard LLM-based setup. Second, it is easier to trust. Each decision is traceable and understandable. That matters in IP, where people need to know why a classification was made, not just receive an answer that sounds confident.
We also introduced a dedicated Non-SDG class from the start. That may sound like a small detail, but it solves a real problem for practitioners: general-purpose models tend to force a classification even when the content is outside the scope of the task. That creates noise, weak labels, and a false sense of precision. Giving the system a clear way to say “this does not belong here” makes the overall result much more reliable.
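One simple way to picture the abstention behavior is a confidence gate in front of the final label. The sketch below is an assumed illustration, not the actual mechanism: the label set and threshold are hypothetical.

```python
# Hypothetical sketch of abstention: the classifier has an explicit NON_SDG
# class, and low-confidence SDG predictions are demoted to it rather than
# forced into an SDG. Labels and threshold are illustrative values.

def decide(probs: dict[str, float], threshold: float = 0.6) -> str:
    best = max(probs, key=probs.get)
    # explicit way to say "this does not belong here": an uncertain
    # SDG prediction becomes NON_SDG instead of a weak forced label
    if best != "NON_SDG" and probs[best] < threshold:
        return "NON_SDG"
    return best

print(decide({"SDG6": 0.10, "SDG7": 0.75, "SDG13": 0.05, "NON_SDG": 0.10}))  # SDG7
print(decide({"SDG6": 0.30, "SDG7": 0.35, "SDG13": 0.20, "NON_SDG": 0.15}))  # NON_SDG
```

The second call shows the point: the most probable class is an SDG, but the confidence is too low to be worth reporting, so the system abstains.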
The result is a system that is faster than standard LLM approaches, more interpretable, and more accurate than the usual benchmarks for this type of task.
More broadly, this work reflects a view we hold quite strongly at lipstip: bigger models are not automatically better models. In a field like IP, performance is not just about output. It is about control, traceability, and whether the tool can hold up under real operating conditions. And that is also why we care about frugal AI.
Frugal AI throughout the platform
Frugal AI is not just a buzzword for lipstip. Taking that path is also an engineering decision: when the task is narrow, specific, and high-stakes, the best answer is often a system designed around the problem. Giant models built to cover everything lack that expertise and consume far more energy than our solution.
For all of these reasons, Mohamed Cherif Sidhoum's presentation at Dataquitaine, held at KEDGE Business School, was a unique opportunity to share that approach with experts from Nouvelle-Aquitaine and surrounding areas. It also reflects the link between academic research and applied product work that matters a lot to us. Mohamed Cherif is conducting this research as part of his PhD at LMAP, the mathematics laboratory of the Université de Pau et des Pays de l’Adour.