Context
SABIA was a strategic R&D initiative for the Superior Labor Court of Brazil. The work was tied to the evolution of the Bem-Te-Vi platform, with the goal of improving jurisprudence analysis and process clustering while testing how continuous-learning patterns could fit a legal institution.
This was not a simple model-training exercise. The project asked whether research-grade NLP techniques could become a credible base for legal-domain tooling that needed to stay technically robust and legally meaningful over time.
My role
- Worked as AI Research Team Lead inside the initiative.
- Led an internal AI research sub-team, covering task delegation, methodology design, and technical quality.
- Helped connect experimentation, legal-domain constraints, and technical documentation so the research work stayed usable for the institution.
Problem
The court needed better ways to explore large volumes of legal text and cluster related processes without reducing the problem to simplistic labeling. At the same time, the platform needed a path toward continuous improvement rather than a static model that would age quickly once user needs changed.
That created a two-part challenge:
- build unsupervised NLP pipelines that could reveal useful structure in judicial text
- explore how user feedback could eventually become part of a longer-life learning loop
Architecture
The work combined a research stack with delivery-oriented data handling:
- ETL pipelines with Pandas and NumPy for judicial data preparation
- preprocessing, feature extraction, and analysis workflows for legal text
- unsupervised NLP experiments using Transformers, scikit-learn, and spaCy
- clustering and exploratory analysis for jurisprudence and process-group discovery
- visualization support for technical interpretation and stakeholder discussion
- an experimental LLML (lifelong machine learning) loop designed around future feedback-driven updates
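To make the clustering layer of this stack concrete, here is a minimal sketch of an unsupervised pass over legal-style text using scikit-learn. The documents, the choice of TF-IDF features, and the cluster count are all illustrative assumptions, not the project's actual data or configuration.

```python
# Minimal sketch: TF-IDF vectorization followed by KMeans clustering.
# Documents and parameters are hypothetical, not the project's real data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

documents = [
    "appeal dismissed for lack of standing",
    "appeal dismissed due to procedural defect",
    "overtime pay claim granted to the worker",
    "worker awarded unpaid overtime compensation",
]

# TF-IDF turns each document into a sparse, weighted term vector.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(documents)

# KMeans groups the vectors into k clusters; k=2 is an assumed choice here.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

for doc, label in zip(documents, labels):
    print(label, doc)
```

In the real pipeline, spaCy handled linguistic preprocessing and Transformer models provided richer representations; the sketch above only shows the shape of the unsupervised step.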
The core architectural decision was to treat the research layer as something that should be inspectable, documented, and reusable, not as isolated experiments with no path forward.
Challenges
- Legal text contains patterns that are subtle, ambiguous, and highly dependent on domain interpretation.
- Unsupervised outputs are only useful when teams can explain what they mean and where they fail.
- Continuous-learning ideas in public-sector AI need clear experimental boundaries before they can influence operational systems.
Solution
I organized the work around a research pipeline that could move from raw judicial data to interpretable clustering outputs with enough structure for technical review and legal discussion. That included ETL, feature extraction, exploratory modeling, and documentation disciplined enough to support scientific reporting.
In parallel, I co-designed an LLML concept that could take future user feedback seriously without pretending the system was already ready for automatic production adaptation. That kept the work ambitious, but still grounded.
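The gating idea behind that LLML concept can be sketched as follows: feedback accumulates continuously, but a model update is only proposed once enough of it has passed human review. Every name and threshold here is hypothetical; this is a shape, not the project's implementation.

```python
# Hypothetical sketch of a gated lifelong-learning feedback loop:
# user feedback is stored, but retraining is only proposed after an
# explicit review step, never triggered automatically in production.
from dataclasses import dataclass, field


@dataclass
class FeedbackStore:
    records: list = field(default_factory=list)

    def add(self, doc_id: str, suggested_cluster: int, reviewed: bool = False):
        # Each record links a document to a user-suggested cluster.
        self.records.append(
            {"doc_id": doc_id, "cluster": suggested_cluster, "reviewed": reviewed}
        )

    def ready_for_update(self, min_reviewed: int = 3) -> bool:
        # Gate: an update is proposed only once enough feedback has
        # passed human review (the threshold is an assumed parameter).
        reviewed = [r for r in self.records if r["reviewed"]]
        return len(reviewed) >= min_reviewed


store = FeedbackStore()
store.add("proc-001", 2, reviewed=True)
store.add("proc-002", 0)  # unreviewed feedback does not count toward the gate
store.add("proc-003", 1, reviewed=True)
print(store.ready_for_update())  # still below the review threshold
```

Keeping the update decision behind an explicit gate is what allowed the loop to stay experimental without pretending the system was production-ready.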
Impact
- Delivered unsupervised NLP pipelines that enabled clustering and exploration of legal case data.
- Initiated and validated a prototype LLML workflow for future adaptation through user feedback.
- Helped ensure the research outputs stayed technically sound and legally relevant through close collaboration with domain experts.