Context
SILVA (Sistema Inteligente de Levantamento, Vinculação e Análise — roughly, Intelligent System for Retrieval, Linking, and Analysis) is TRF1’s institutional AI platform for judicial document intelligence. It covers three core workflows: triage of incoming attachments, organization of case backlogs, and batch generation of decision drafts (minutas). The system runs inside the PJe platform and is available to all second-instance chambers and the Vice-Presidency of the court.
The AI models powering SILVA were originally developed by the University of Brasília and later rebuilt from scratch by TRF1’s Innovation Lab. My work began when the system moved beyond research-phase infrastructure and needed production-grade ML engineering: stable pipelines, repeatable retraining, and APIs that legal teams could rely on daily.
My role
- Senior Machine Learning Engineer at TTY2000 inside the TRF1 modernization program.
- Led the refactoring of the core ML services that back SILVA’s classification and clustering features.
- Built orchestration and retraining pipelines to move model updates from ad-hoc manual work to controlled, versioned flows.
- Delivered the analyst-facing Django APIs consumed by more than 500 internal users across court chambers.
Problem
SILVA had proven its value as a research prototype. The challenge was making it an operationally reliable system at the scale of a major federal court. That meant addressing several gaps at once:
- Legacy ML code that was slow, hard to version, and dependent on undocumented manual steps
- No repeatable retraining flow: model updates required significant manual coordination each time
- API layer not robust enough for 500+ concurrent institutional users
- No observability over model behavior in production, making drift invisible
The court also expected the system to grow in capability — new Resource Object categories, finer classification taxonomies, and future integration with LLMs — which made the underlying architecture a long-term concern, not just a maintenance task.
Architecture
The modernized system is built around three layers that were previously disconnected:
ML pipeline layer
- XGBoost classifiers with TF-IDF feature extraction for document classification and case clustering
- DVC for dataset versioning and experiment reproducibility
- MLflow for experiment tracking, model registry, and deployment packaging
- Airflow DAGs for scheduled retraining, data validation, and pipeline orchestration
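At the heart of this layer is the TF-IDF-plus-XGBoost classification stack. As a rough illustration of the feature-extraction step only, here is a minimal TF-IDF computation in plain Python — the production pipeline would use a library vectorizer (e.g. scikit-learn's), and the documents and weighting variant here are illustrative:

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute classic TF-IDF weights for a small corpus of tokenized documents."""
    n = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors

docs = [
    "agravo de instrumento".split(),
    "agravo interno".split(),
    "embargos de declaracao".split(),
]
vecs = tfidf(docs)
# "agravo" appears in 2 of 3 docs, so it gets a lower weight than the
# rarer, more discriminative "instrumento" in the same document.
```

The resulting sparse term-weight vectors are what a gradient-boosted classifier such as XGBoost consumes; the rarity weighting is what lets common procedural boilerplate fade relative to category-discriminating terms.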
API and serving layer
- Django REST APIs serving classification results to court-facing PJe integrations
- PostgreSQL for structured case metadata and classification outputs
- Docker for containerized, environment-consistent deployment
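To give a sense of the serving contract, here is a sketch of the kind of classification payload such a Django endpoint might return to PJe integrations. The field names and schema are hypothetical, not SILVA's actual API:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ClassificationResult:
    # Hypothetical fields; the real SILVA/PJe schema may differ.
    case_id: str
    category: str          # predicted Resource Object category label
    confidence: float      # model score for the predicted category
    model_version: str     # ties the prediction back to a registered model

def to_response(result: ClassificationResult) -> str:
    """Serialize a result as the JSON body a REST view would return."""
    return json.dumps(asdict(result), ensure_ascii=False)

payload = to_response(
    ClassificationResult("0001234-56.2024.4.01.0000", "agravo-interno", 0.93, "v12")
)
```

Carrying `model_version` in every response is the piece that matters operationally: it lets any classification shown to a legal officer be traced back to the exact registered model that produced it.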
Operational layer
- Versioned model artifacts with traceable lineage from training data to production model
- Pipeline observability so that classification drift and retraining triggers are visible to the team
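One concrete form this drift visibility can take is a periodic check over predicted-category frequencies. Below is a stdlib-only sketch using the population stability index; SILVA's actual drift metric and alert thresholds are not documented here, so treat both as illustrative:

```python
import math

def psi(baseline: dict, current: dict, eps: float = 1e-6) -> float:
    """Population stability index between two category-count distributions.

    ~0 means the distributions match; values above ~0.2 are a common
    rule-of-thumb signal that the input mix has shifted.
    """
    cats = set(baseline) | set(current)
    b_total = sum(baseline.values()) or 1
    c_total = sum(current.values()) or 1
    score = 0.0
    for cat in cats:
        b = baseline.get(cat, 0) / b_total + eps  # eps guards log(0)
        c = current.get(cat, 0) / c_total + eps
        score += (c - b) * math.log(c / b)
    return score

baseline = {"agravo": 500, "apelacao": 300, "embargos": 200}
stable   = {"agravo": 250, "apelacao": 150, "embargos": 100}
shifted  = {"agravo": 100, "apelacao": 150, "embargos": 750}

# Identical proportions score near zero; a large shift crosses the threshold.
assert psi(baseline, stable) < 0.01
assert psi(baseline, shifted) > 0.2
```

A check like this can run as a scheduled task against each day's classification outputs, turning "drift is invisible" into a number the team can alert on and use to trigger retraining.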
Challenges
- Legal workflows demand high precision. A misclassified Resource Object sends a legal officer down the wrong analytical path, so improving classification accuracy and taxonomy coverage had direct operational consequences.
- The system had to keep running while being modernized. Refactoring core ML services without disrupting active users across multiple court chambers required careful staged rollout.
- Retraining cadence and data governance are institutional problems as much as technical ones. Getting alignment on when and how to retrain required working across engineering, legal, and operational stakeholders.
Solution
I treated the modernization as a delivery problem, not just a technical one. The priority was building the infrastructure that makes model updates safe and routine:
- Replaced ad-hoc training scripts with versioned, reproducible pipelines backed by DVC and Airflow.
- Introduced MLflow to make every experiment traceable and every model artifact auditable.
- Refactored the serving layer to reduce latency and improve reliability under the real load of 500+ daily users.
- Worked with the domain team to expand the Resource Object taxonomy, contributing the engineering that made finer-grained classification viable in production.
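The auditability goal above can be made concrete as a lineage record that fingerprints the exact training inputs behind each model artifact. MLflow and DVC provide this for real; the record shape below is invented purely to illustrate the idea, using only the standard library:

```python
import hashlib
import json

def lineage_record(dataset_version: str, params: dict, metrics: dict) -> dict:
    """Build an auditable record linking a trained model to its inputs.

    Any change to the dataset version or hyperparameters changes the
    fingerprint, so two runs can be compared at a glance.
    """
    payload = json.dumps(
        {"dataset": dataset_version, "params": params}, sort_keys=True
    )
    return {
        "fingerprint": hashlib.sha256(payload.encode()).hexdigest()[:12],
        "dataset": dataset_version,
        "params": params,
        "metrics": metrics,
    }

run_a = lineage_record("dvc:v7", {"max_depth": 6, "eta": 0.3}, {"f1": 0.91})
run_b = lineage_record("dvc:v7", {"max_depth": 6, "eta": 0.3}, {"f1": 0.91})
run_c = lineage_record("dvc:v8", {"max_depth": 6, "eta": 0.3}, {"f1": 0.93})

# Same inputs yield the same fingerprint; a new dataset version changes it.
assert run_a["fingerprint"] == run_b["fingerprint"]
assert run_a["fingerprint"] != run_c["fingerprint"]
```

This is the property that makes model updates routine rather than risky: when a production model misbehaves, its fingerprint points straight back to the dataset version and parameters that produced it.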
Impact
- Processing time reduced by 25% through refactoring of legacy ML services.
- Django APIs deployed and actively used by 500+ internal users across TRF1 second-instance chambers.
- Model update cycles shortened from weeks to days by replacing manual steps with Airflow-orchestrated retraining flows.
- Third Section Resource Object coverage expanded from 19 to 28 categories following production deployment, enabling more precise case routing and jurisprudence clustering.
- System now structured to support future LLM integration as a natural extension of the existing pipeline architecture.