How Nubank refactors millions of lines of code to improve engineering efficiency with Duplo

8x engineering time efficiency gain
20x cost savings

Overview

One of Nubank’s most critical, company-wide projects for 2023-2024 was a migration of their core ETL, an 8-year-old monolith spanning millions of lines of code, to sub-modules. To handle such a large refactor, their only option was a multi-year effort that distributed repetitive refactoring work across more than one thousand of their engineers. With Duplo, however, this changed: engineers were able to delegate their migrations to Duplo and achieve up to a 12x efficiency improvement in terms of engineering hours saved, and over 20x cost savings. The Data, Collections, and Risk business units, among others, verified and completed their migrations in weeks instead of months or years.

The Problem

Nubank was born into the tradition of centralized ETL architectures common in financial services. For years, the monolith architecture had worked well for Nubank: it enabled the developer autonomy and flexibility that carried them through their hypergrowth phases. After 8 years, however, Nubank’s sheer volume of customer growth, as well as geographic and product expansion beyond their original credit card business, led to an entangled, behemoth ETL with countless cross-dependencies and no clear path to continuing to scale.

For Nubankers, business-critical data transformations started taking increasingly long to run, with dependency chains up to 70 deep and insufficient formal agreements on who was responsible for maintaining what. As the company continued to grow, it became clear that the ETL would be a primary bottleneck to scale.

Nubank concluded that there was an urgent need to split up their monolithic ETL repository, which had grown to over 6 million lines of code, into smaller, more flexible sub-modules.

Nubank’s code migration was filled with the monotonous, repetitive work that engineers dread. Moving each data class implementation from one architecture to another while tracing imports correctly, performing multiple delicate refactoring steps, and accounting for any number of edge cases was highly tedious, even to do just once or twice. At Nubank’s scale, however, the total migration scope involved more than 1,000 engineers moving ~100,000 data class implementations over an expected timeline of 18 months.

In a world where engineering resources are scarce, such large-scale migrations and modernizations become massively expensive, time-consuming projects that distract from any engineering team’s core mission: building better products for customers. Unfortunately, this is the reality for many of the world’s largest organizations.

The Decision: an army of Duplos to tackle subtasks in parallel

At project outset in 2023, Nubank had no choice but to rely on their engineers to perform code changes manually. Migrating one data class was a highly discretionary task, with multiple variations, edge cases, and ad hoc decision-making — far too complex to be scriptable, but high-volume enough to be a significant manual effort.

Within weeks of Duplo’s launch, Nubank identified a clear opportunity to accelerate their refactor at a fraction of the engineering hours. Migration or large refactoring tasks are often fantastic projects for Duplo: after investing a small, fixed cost to teach Duplo how to approach sub-tasks, Duplo can go and complete the migration autonomously. A human is kept in the loop just to manage the project and approve Duplo’s changes.

The Solution: Custom ETL Migration Duplo

A task of this magnitude, with the vast number of variations that it had, was a ripe opportunity for fine-tuning. The Nubank team helped to collect examples of previous migrations their engineers had done manually, some of which were fed to Duplo for fine-tuning. The rest were used to create a benchmark evaluation set. Against this evaluation set, we observed a doubling of Duplo’s task completion scores after fine-tuning, as well as a 4x improvement in task speed. Roughly 40 minutes per sub-task dropped to 10, which made the whole migration start to look much cheaper and less time-consuming, allowing the company to devote more energy to new business and new value creation instead.
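
The split described above, with some past migrations used for fine-tuning and the rest held out as a benchmark, can be sketched roughly as follows. This is only an illustration: the function name `split_examples`, the 20% hold-out fraction, and the fixed seed are assumptions, not details from Nubank's setup.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_examples(
    examples: Sequence[T],
    eval_fraction: float = 0.2,  # assumed value; the source gives no ratio
    seed: int = 0,
) -> Tuple[List[T], List[T]]:
    """Return (fine_tuning_set, evaluation_set) with no overlap."""
    if not 0.0 < eval_fraction < 1.0:
        raise ValueError("eval_fraction must be strictly between 0 and 1")
    if len(examples) < 2:
        raise ValueError("need at least two examples to split")
    shuffled = list(examples)
    # A seeded shuffle keeps the benchmark stable across runs.
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


if __name__ == "__main__":
    # Hypothetical identifiers standing in for past manual migrations.
    past_migrations = [f"migration-{i}" for i in range(10)]
    tuning, held_out = split_examples(past_migrations)
    print(len(tuning), "for fine-tuning;", len(held_out), "held out for evaluation")
```

Keeping the evaluation set disjoint from the fine-tuning examples is what makes a before-and-after comparison of task completion scores meaningful.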

Duplo contributed to its own speed improvements by building itself classical tools and scripts that it would later use on the most common, mechanical components of the migration. For instance, detecting the country extension of a data class (either ‘br’, ‘co’, or ‘mx’) based on its file path was a few-step process for each sub-task. Duplo’s script turned this into a single executable step, a saving that added up across tens of thousands of sub-tasks.
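
A helper of the kind described might look like the sketch below. It is not Duplo's actual script: the function name `detect_country` and the assumption that the country code appears as its own directory in the path are hypothetical. Only the three codes come from the text.

```python
from pathlib import PurePosixPath
from typing import Optional

# Country extensions named in the text.
COUNTRY_CODES = {"br", "co", "mx"}


def detect_country(file_path: str) -> Optional[str]:
    """Return the first path component that is a known country code, else None.

    Assumes (hypothetically) that the code is a standalone directory name.
    """
    for part in PurePosixPath(file_path).parts:
        if part.lower() in COUNTRY_CODES:
            return part.lower()
    return None


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(detect_country("etl/src/br/credit_card/statement.py"))
    print(detect_country("etl/src/shared/utils.py"))
```

Returning `None` rather than guessing leaves ambiguous paths for a human or a later step to resolve.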

There is also a compounding advantage to Duplo’s learning. In the first weeks, it was common to see outstanding errors to fix, or small things Duplo wasn’t sure how to solve. But as Duplo saw more examples and gained familiarity with the task, it started to avoid rabbit holes more often and find faster solutions to previously seen errors and edge cases. Much as with a human engineer, we observed clear speed and reliability improvements with every day Duplo worked on the migration.

Results: Delivering an 8-12x faster migration, lifting a burden from every engineer, and slashing migration costs by 20x.

“Duplo provided an easy way to reduce the number of engineering hours for the migration, in a way that was more stable and less prone to human error. Rather than engineers having to work across several files and complete an entire migration task 100%, they could just review Duplo’s changes, make minor adjustments, then merge their PR”

Jose Carlos Castro, Senior Product Manager

8-12x efficiency gains: This is calculated by comparing the typical engineering hours required to complete a data class migration task against the total engineering hours spent prompting and reviewing Duplo’s work on the same task.
Over 20x cost savings on the scope of the migration delegated to Duplo: This is calculated by comparing the cost of running Duplo versus the hourly cost of an engineer completing that task. The savings are driven heavily by the speed of task execution and the cost effectiveness of Duplo relative to human engineering time. The figure does not even account for the value captured by completing the entire project months ahead of schedule!
Fewer dreaded migration tasks for Nubank engineers
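
The two headline ratios above are simple quotients, which can be sketched as follows. The function names and the input figures are hypothetical placeholders chosen only to show the arithmetic. They are not Nubank's measured hours or costs.

```python
def efficiency_gain(manual_hours: float, review_hours: float) -> float:
    """Manual engineering hours for a task divided by the hours spent
    prompting and reviewing the agent's work on the same task."""
    if manual_hours <= 0 or review_hours <= 0:
        raise ValueError("hours must be positive")
    return manual_hours / review_hours


def cost_savings(engineer_cost: float, agent_cost: float) -> float:
    """Cost of an engineer completing a task divided by the cost of
    running the agent on that task."""
    if engineer_cost <= 0 or agent_cost <= 0:
        raise ValueError("costs must be positive")
    return engineer_cost / agent_cost


if __name__ == "__main__":
    # Placeholder inputs, NOT figures from the case study.
    print(f"{efficiency_gain(manual_hours=4.0, review_hours=0.5):.1f}x efficiency")
    print(f"{cost_savings(engineer_cost=200.0, agent_cost=10.0):.1f}x cost savings")
```

Note that the efficiency ratio counts only human time on both sides, so the agent's own runtime affects the cost figure but not the efficiency figure.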

Duplo, the AI DevOps engineer

Crush your backlog with your personal AI DevOps team.

  1. Ticket: Integrate Slack, Web, and VS Code
  2. Plan: Quickly review Duplo's proposal
  3. Execute: Duplo makes the changes by itself
  4. Test: Review the results natively
  • Ticket
    Integrate Slack, Linear, and Jira
  • Plan
    Quickly review Duplo's proposal
  • Execute
    Duplo tests changes by itself
  • Test
    Review changes natively

Duplo is operating cloud infrastructure for hundreds of organizations.

Use cases

From deploying new applications and troubleshooting existing infrastructure to achieving compliance, Duplo can clear your DevOps backlog and help you reduce operational costs.

Developer Self-service

  • Deploy, manage and scale apps
  • Enforce best practices
  • CI/CD pipelines and ephemeral environments

Cloud Operations

  • Site reliability
  • Observability and FinOps
  • IaC and CI/CD automation

Migrations and Modernizations

  • Migrate workloads from on-prem to cloud
  • Move from VMs to Kubernetes
  • Workload discovery and cost analysis

Code Migration + Refactors

  • Language migrations
  • Version upgrades
  • Codebase restructuring

Data Engineering + Analysis

  • Data warehouse migrations
  • ETL development
  • Data cleaning and preprocessing

Bugs + Backlog Work

  • Ticket resolution
  • CI/CD
  • First-draft PR creation for backlog tasks

Application Deployment

  • Kubernetes, Serverless
  • Data Science Workloads
  • Cloud services like RDS, S3, Azure Blob Storage, etc.
  • CI/CD Pipelines
  • Ephemeral Environments

Comprehensive Observability

  • Logging
  • Alerting
  • Metrics
  • SLA/SLO Monitoring
  • APM and Traces

And many others

  • Generate IaC for existing infrastructure
  • Performance optimization
  • Generate Topology Diagrams
  • Access controls
  • Auditing and Reports

Build your own DevOps AI Engineer

Duplo provides a framework where you can plug in your own agents that implement your unique workflows.


Duplo learns your existing infrastructure and operates it like a senior hire.


Agentic Helpdesk

Ticketing system with AI agents that solve user requests in real time.

Collaborate

Use Duplo in the browser, IM, or VS Code

Troubleshoot from Slack, edit in VS Code, or use the browser interface at any time.


Able to work with hundreds of tools

Duplo connects to your favorite MCP servers, from Asana to Zapier

Build together with Confluence
Build together with Airtable
Build together with Segment
Build together with Asana
Build together with Notion
Build together with Stripe
Build together with AWS
Build together with GitHub
Build together with Datadog
Build together with Linear
Build together with Databricks
Build together with Slack
Build together with Google Drive
Build together with Sentry
Build together with PostgreSQL
Build together with Azure
Build together with Snowflake
Build together with MongoDB

GitHub

Duplo can independently create PRs, respond to PR comments, review PRs, etc.

Linear

Assign Duplo tickets directly in Linear, or add the Duplo tag.

Slack

Assign Duplo tasks by tagging @Duplo in Slack. Duplo keeps you updated on progress in Slack replies.


Industry leaders choose to

Build with Duplo

Hear from our customers