Classifying patents by UN SDGs: lipstip at Dataquitaine

Last week, Mohamed Cherif Sidhoum, data scientist at lipstip, presented our work on patent classification by UN Sustainable Development Goals (SDG) at the 9th edition of Dataquitaine. This is a harder problem than it sounds.

Glossary

  • UN SDGs: The United Nations Sustainable Development Goals, a framework of global development objectives.
  • Patent: A legal right protecting a technical invention.
  • NLP: Natural Language Processing, the field focused on how computers analyze and understand text.
  • BERT: A family of language models designed for text understanding tasks.
  • LLM: A large language model trained to process and generate language at scale.
  • Frugal AI: An approach to AI that prioritizes efficiency, precision, and lower resource consumption.

On paper, classifying patents by SDG looks like a standard NLP task. In practice, patent data is dense, technical, uneven, and full of signals that are easy to overread. Not every patent belongs in the SDG framework, and relevant information can be spread unevenly across the document. And if you want useful results at scale, you need more than a model that can produce a plausible label.

And here is what Cherif introduced at Dataquitaine: instead of using one large language model to process entire documents, we built a pipeline of smaller BERT-based models, each focused on specific segments of the document and combined with explicit business rules.

The goal was simple: put the computation where the signal actually is, instead of asking one general-purpose model to interpret everything at once.
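To make the idea concrete, here is a minimal sketch of a segment-level pipeline. It is not lipstip's actual implementation: the segment names (abstract, claims), the label set, the `min_score` threshold rule, and the keyword-based `keyword_model` stand-in (used here in place of real fine-tuned BERT classifiers so the example runs without any dependency) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

NON_SDG = "Non-SDG"  # hypothetical name for the out-of-scope label

Scores = Dict[str, float]
SegmentModel = Callable[[str], Scores]


def keyword_model(keywords: Dict[str, List[str]]) -> SegmentModel:
    """Stand-in for a small fine-tuned classifier: scores each label by keyword hits."""
    def predict(text: str) -> Scores:
        lowered = text.lower()
        return {
            label: float(sum(word in lowered for word in words))
            for label, words in keywords.items()
        }
    return predict


@dataclass
class Decision:
    label: str
    trace: List[str]  # human-readable record of how the label was reached


def classify(patent: Dict[str, str], models: Dict[str, SegmentModel],
             min_score: float = 1.0) -> Decision:
    """Run one specialized model per segment, then apply an explicit rule."""
    trace: List[str] = []
    totals: Scores = {}
    for segment, model in models.items():
        text = patent.get(segment, "")
        if not text:
            trace.append(f"{segment}: missing, skipped")
            continue
        scores = model(text)
        for label, score in scores.items():
            totals[label] = totals.get(label, 0.0) + score
        trace.append(f"{segment}: {scores}")
    best = max(totals, key=totals.get, default=NON_SDG)
    # Hypothetical business rule: without enough evidence, do not force an SDG.
    if totals.get(best, 0.0) < min_score:
        trace.append(f"rule: no label reached {min_score}, assigning {NON_SDG}")
        return Decision(NON_SDG, trace)
    trace.append(f"rule: {best} has the highest combined score")
    return Decision(best, trace)


# Illustrative keyword lists, not a real SDG taxonomy.
vocabulary = {"SDG 6": ["water", "filtration"], "SDG 7": ["solar", "photovoltaic"]}
models = {"abstract": keyword_model(vocabulary), "claims": keyword_model(vocabulary)}

solar = {
    "abstract": "A photovoltaic module with improved solar cell efficiency.",
    "claims": "1. A solar panel comprising a frame.",
}
chair = {
    "abstract": "A hinge mechanism for a folding chair.",
    "claims": "1. A hinge comprising a pin.",
}

solar_decision = classify(solar, models)
chair_decision = classify(chair, models)
```

Each `Decision` carries a trace of per-segment scores and the rule that fired, which is the kind of record that keeps a classification explainable after the fact.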

The one-hour presentation was a chance to share our findings with the regional data science community.

A core design choice

This design has two main advantages. First, it is much faster than generic solutions: our approach delivers inference that is 50 times faster than a standard LLM-based setup. Second, it is easier to trust. Each decision is traceable and understandable. That matters in IP, where people need to know why a classification was made, not just receive an answer that sounds confident.

We also introduced a dedicated Non-SDG class from the start. That may sound like a small detail, but it solves a real problem for practitioners: general-purpose models tend to force a classification even when the content is outside the scope of the task. That creates noise, weak labels, and a false sense of precision. Giving the system a clear way to say “this does not belong here” makes the overall result much more reliable.
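The effect of a dedicated class can be illustrated with a small softmax example. The logits below are invented for illustration and do not come from our models. The sketch assumes a classifier whose output layer has one logit per label, and that a model trained with a Non-SDG class would learn to give it a high logit on out-of-scope text.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(labels: List[str], logits: List[float]) -> Tuple[str, float]:
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical logits for an out-of-scope patent (say, a folding-chair hinge).
sdg_only_labels = ["SDG 6", "SDG 7", "SDG 13"]
sdg_only_logits = [0.2, 0.1, 0.3]

# Same patent, but the label set now includes a dedicated Non-SDG class.
with_non_sdg_labels = sdg_only_labels + ["Non-SDG"]
with_non_sdg_logits = sdg_only_logits + [2.5]

forced = predict(sdg_only_labels, sdg_only_logits)          # must pick some SDG
scoped = predict(with_non_sdg_labels, with_non_sdg_logits)  # can decline
```

Without the extra class, the probabilities have to sum to one across SDGs only, so the model returns an SDG label even when every score is weak. With it, the out-of-scope patent has somewhere to go.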

The result is a system that is faster than standard LLM approaches, more interpretable, and more accurate than the usual benchmarks for this type of task.

More broadly, this work reflects a view we hold quite strongly at lipstip: bigger models are not automatically better models. In a field like IP, performance is not just about output. It is about control, traceability, and whether the tool can hold up under real operating conditions. And that is also why we care about frugal AI.

Frugal AI throughout the platform

Frugal AI is not just a buzzword for lipstip. We took that path because it is also an engineering decision: when the task is narrow, specific, and high-stakes, the best answer will often be a system designed around the problem. Giant models that are built to cover everything will lack expertise and will consume much more energy than our solution.

For all of these reasons, Mohamed Cherif Sidhoum's presentation at Dataquitaine, held at KEDGE Business School, was a unique opportunity to share that approach with experts from Nouvelle-Aquitaine and surrounding areas. It also reflects the link between academic research and applied product work that matters a lot to us. Mohamed Cherif is conducting this research as part of his PhD at LMAP, the mathematics laboratory of the Université de Pau et des Pays de l’Adour.

At Dataquitaine, lipstip presented a patent classification method aligned with the UN Sustainable Development Goals using several specialized BERT-based models instead of one large model. This approach is far faster, more interpretable, and more reliable for a technical field like intellectual property. Adding a dedicated Non-SDG class prevents forced classifications. The project also reflects lipstip’s broader commitment to frugal AI: targeted, efficient, and designed for real-world operating conditions.

Join the Thread

Join lipstip’s newsletter and stay ahead of the European IP transformation. News, features, exclusive content and opportunities: subscribe and do not miss a crumb.