TTY2000 | TRF1

SILVA — Judicial AI Platform at TRF1

SILVA is TRF1's institutional AI platform for judicial document triage, case-file organization, and batch minute generation. I led the ML modernization effort: refactoring legacy pipelines, building orchestration and retraining flows, and delivering the APIs that brought the system into daily use across all second-instance chambers.

Senior Machine Learning Engineer · Jan 2024 - Present

Stack

  • Python
  • Django
  • PostgreSQL
  • Airflow
  • DVC
  • MLflow
  • Docker
  • XGBoost
  • scikit-learn
  • TF-IDF

Primary impact

Modernized the core ML services behind SILVA, cutting processing time by 25% and enabling 500+ internal users through stable analyst-facing APIs inside TRF1's production PJe environment.

Outcomes

  • Processing time reduced by 25% through legacy ML system refactoring
  • Analyst-facing Django APIs delivered to 500+ internal users across TRF1 chambers
  • Model update cycles shortened from weeks to days via Airflow and MLflow pipelines
  • Third Section Resource Object taxonomy expanded from 19 to 28 categories after production launch

Context

SILVA (Sistema Inteligente de Levantamento, Vinculação e Análise) is TRF1’s institutional AI platform for judicial document intelligence. It covers three core workflows: triage of incoming attachments, organization of case backlogs, and batch generation of decision drafts (minutas). The system runs inside the PJe platform and is available to all second-instance chambers and the Vice-Presidency of the court.

The AI models powering SILVA were originally developed by the University of Brasília and later rebuilt from scratch by TRF1’s Innovation Lab. My work began when the system moved beyond research-phase infrastructure and needed production-grade ML engineering: stable pipelines, repeatable retraining, and APIs that legal teams could rely on daily.

My role

  • Senior Machine Learning Engineer at TTY2000 inside the TRF1 modernization program.
  • Led the refactoring of the core ML services that back SILVA’s classification and clustering features.
  • Built orchestration and retraining pipelines to move model updates from ad-hoc manual work to controlled, versioned flows.
  • Delivered the analyst-facing Django APIs consumed by more than 500 internal users across court chambers.

Problem

SILVA had proven its value as a research prototype. The challenge was making it an operationally reliable system at the scale of a major federal court. That meant addressing several gaps at once:

  • Legacy ML code that was slow, hard to version, and dependent on undocumented manual steps
  • No repeatable retraining flow: model updates required significant manual coordination each time
  • API layer not robust enough to serve the court's 500+ institutional users reliably
  • No observability over model behavior in production, making drift invisible

The court also expected the system to grow in capability — new Resource Object categories, finer classification taxonomies, and future integration with LLMs — which made the underlying architecture a long-term concern, not just a maintenance task.

Architecture

The modernized system is built around three layers that were previously disconnected:

ML pipeline layer

  • XGBoost classifiers with TF-IDF feature extraction for document classification and case clustering
  • DVC for dataset versioning and experiment reproducibility
  • MLflow for experiment tracking, model registry, and deployment packaging
  • Airflow DAGs for scheduled retraining, data validation, and pipeline orchestration
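The classification path above can be sketched in a few lines with scikit-learn. The sketch substitutes scikit-learn's GradientBoostingClassifier for the production XGBoost model to keep the example dependency-light, and the documents and labels are illustrative stand-ins, not TRF1 data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline

# Toy corpus standing in for judicial attachments; real training data
# comes from the DVC-versioned datasets described above.
docs = [
    "recurso de apelacao contra sentenca",
    "agravo de instrumento decisao interlocutoria",
    "recurso de apelacao materia previdenciaria",
    "agravo de instrumento efeito suspensivo",
]
labels = ["apelacao", "agravo", "apelacao", "agravo"]

# TF-IDF features feeding a gradient-boosted classifier, mirroring the
# TF-IDF + XGBoost pairing used in production.
clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("gbm", GradientBoostingClassifier(n_estimators=50)),
])
clf.fit(docs, labels)

print(clf.predict(["apelacao contra sentenca"])[0])
```

Wrapping the vectorizer and classifier in a single Pipeline keeps feature extraction and model state versioned together, which is what makes the MLflow artifact a self-contained deployable unit.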

API and serving layer

  • Django REST APIs serving classification results to court-facing PJe integrations
  • PostgreSQL for structured case metadata and classification outputs
  • Docker for containerized, environment-consistent deployment
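The serving contract can be illustrated without standing up Django. The sketch below shows the shape of a classification response as a plain dataclass serialized to JSON; the field names (`document_id`, `predicted_class`, `model_version`) are assumptions for illustration, not SILVA's actual API schema, and the real layer is Django REST Framework backed by PostgreSQL:

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical response record for a classification endpoint; field
# names are illustrative, not SILVA's actual schema.
@dataclass
class ClassificationResult:
    document_id: str
    predicted_class: str
    confidence: float
    model_version: str  # traceable back to the model registry entry

def to_response(result: ClassificationResult) -> str:
    """Serialize a result the way a REST view would render it."""
    return json.dumps(asdict(result), ensure_ascii=False)

resp = to_response(ClassificationResult(
    document_id="doc-123",
    predicted_class="agravo_de_instrumento",
    confidence=0.94,
    model_version="clf-2024.06",
))
print(resp)
```

Carrying the model version in every response is the piece that lets analysts and engineers trace any individual classification back to a specific registered model.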

Operational layer

  • Versioned model artifacts with traceable lineage from training data to production model
  • Pipeline observability so that classification drift and retraining triggers are visible to the team
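One way to make drift visible is a class-distribution check over a monitoring window, sketched below with the standard library only. The threshold value is an assumption for illustration; in production this kind of logic would live in an Airflow task gating the retraining DAG rather than a standalone function:

```python
from collections import Counter

def class_distribution(labels):
    """Fraction of predictions per class over a monitoring window."""
    total = len(labels)
    return {c: n / total for c, n in Counter(labels).items()}

def drift_exceeds(baseline, current, threshold=0.15):
    """Flag retraining when any class share shifts past the threshold.

    The threshold is illustrative; a real trigger would be tuned
    against historical prediction logs.
    """
    classes = set(baseline) | set(current)
    return any(
        abs(baseline.get(c, 0.0) - current.get(c, 0.0)) > threshold
        for c in classes
    )

baseline = class_distribution(["apelacao"] * 60 + ["agravo"] * 40)
current = class_distribution(["apelacao"] * 30 + ["agravo"] * 70)
print(drift_exceeds(baseline, current))  # class mix shifted sharply
```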

Challenges

  • Legal workflows demand high precision. A misclassified resource object sends a legal officer down the wrong analytical path, so improving classification accuracy and taxonomy coverage had direct operational consequences.
  • The system had to keep running while being modernized. Refactoring core ML services without disrupting active users across multiple court chambers required careful staged rollout.
  • Retraining cadence and data governance are institutional problems as much as technical ones. Getting alignment on when and how to retrain required working across engineering, legal, and operational stakeholders.

Solution

I treated the modernization as a delivery problem, not just a technical one. The priority was building the infrastructure that makes model updates safe and routine:

  • Replaced ad-hoc training scripts with versioned, reproducible pipelines backed by DVC and Airflow.
  • Introduced MLflow to make every experiment traceable and every model artifact auditable.
  • Refactored the serving layer to reduce latency and improve reliability under the real load of 500+ daily users.
  • Worked with the domain team to expand the Resource Object taxonomy, contributing the engineering that made finer-grained classification viable in production.
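The traceability goal in the bullets above, every artifact auditable with lineage from training data to production model, can be sketched as a manifest that pins dataset and model hashes to a run. In SILVA this role is played by DVC (data hashes) and the MLflow registry (run IDs); the dict below is a simplified stand-in with assumed field names:

```python
import hashlib
import json

def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def build_lineage_manifest(dataset: bytes, model_blob: bytes,
                           run_id: str) -> dict:
    """Pin a trained model to the exact data it was trained on.

    Simplified stand-in for what DVC and a model registry provide;
    field names are illustrative.
    """
    return {
        "run_id": run_id,
        "dataset_sha256": sha256_of(dataset),
        "model_sha256": sha256_of(model_blob),
    }

manifest = build_lineage_manifest(b"training-rows", b"model-bytes",
                                  run_id="run-001")
print(json.dumps(manifest, indent=2))
```

A record like this is what turns "which data produced the model answering this request?" from an archaeology exercise into a lookup.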

Impact

  • Processing time reduced by 25% through refactoring of legacy ML services.
  • Django APIs deployed and actively used by 500+ internal users across TRF1 second-instance chambers.
  • Model update cycles shortened from weeks to days by replacing manual steps with Airflow-orchestrated retraining flows.
  • Third Section Resource Object coverage expanded from 19 to 28 categories following production deployment, enabling more precise case routing and jurisprudence clustering.
  • System now structured to support future LLM integration as a natural extension of the existing pipeline architecture.

Next step

Need the broader background behind this work?

The about page connects these case studies to the rest of my delivery history across courts, agencies, and AI platform work.