
Pipeline Technology Stack Analysis

Date: 2026-02-18

Question: Will the current TypeScript/Node.js stack be sufficient for the processing pipeline long-term, or will we need to migrate to more performant technologies?


1. Current Stack

| Component | Technology | Role |
| --- | --- | --- |
| API server | Express.js (TypeScript) | HTTP ingestion endpoint |
| Worker runtime | In-memory FIFO queue (TypeScript) | Job scheduling and execution |
| Pipeline logic | Pure TypeScript functions | Graph transform, diff, evaluation, evidence |
| Database | MongoDB 7 | All persistence |
| UI | React + Vite | Dashboard and exploration |

All pipeline stages run in a single Node.js process. No external message queue. No separate worker processes.
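As a minimal sketch of what an in-memory FIFO worker of this kind can look like (the class name `FifoQueue` and its `enqueue` method are illustrative assumptions, not the names used in the repo), each job is chained onto the previous one so that only one sync runs at a time:

```typescript
// Hypothetical sketch: an in-process FIFO queue that runs async jobs strictly
// one after another. Names are illustrative, not taken from the codebase.
type Job<T> = () => Promise<T>;

class FifoQueue {
  // Tail of the promise chain; every new job waits for the previous one.
  private tail: Promise<unknown> = Promise.resolve();

  enqueue<T>(job: Job<T>): Promise<T> {
    const result = this.tail.then(job);
    // Swallow failures on the chain itself so one failed job does not
    // block the jobs queued behind it. The caller still sees the rejection.
    this.tail = result.catch(() => undefined);
    return result;
  }
}
```

Because the queue lives in process memory, queued jobs are lost on restart and cannot be picked up by a second process, which is the limitation the later scale scenarios run into.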


2. Pipeline Stage Profiling

What each stage actually does (compute characteristics)

| Stage | Primary Operation | CPU Pattern | Memory Pattern | I/O Pattern |
| --- | --- | --- | --- | --- |
| Transform | Parse JSON, map node types, compute SHA256 entity IDs | Light CPU (hashing) | Proportional to graph size (all nodes in memory) | Single read (HTTP body) |
| Diff | Compare new entities vs existing, generate events | Light CPU (object comparison) | 2× entity count (old + new in memory) | One DB read (fetch all tenant entities) |
| Upsert | MongoDB bulkWrite | Minimal CPU | Batch buffer | Many DB writes |
| Path materialization | BFS graph traversal per entity | Moderate CPU (graph traversal, depth-limited) | Adjacency lists in memory | DB reads for entity relationships |
| Authority paths | Hash computation, upsert, soft-remove | Light CPU | Proportional to path count | Many DB writes |
| Evaluation | Rule functions × entities × paths | Moderate CPU (rule logic, evidence lookups) | Entity + path + evidence in memory | Many DB reads |
| Evidence packs | Fetch data, build 9 sections, SHA256 hash, markdown render | Moderate CPU | One pack at a time | Many DB reads, one DB write |
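To make the Transform and Diff rows concrete, here is a minimal sketch of a deterministic SHA256 entity ID and an in-memory diff. The `Entity` shape, the hash inputs (tenant, source system, source ID), and the event names are assumptions for illustration; the real pipeline may differ.

```typescript
import { createHash } from "node:crypto";

// Hypothetical entity shape; field names are assumptions.
interface Entity {
  id: string;
  type: string;
  attributes: Record<string, string>;
}

type DiffEvent = { kind: "created" | "updated" | "removed"; id: string };

// Deterministic ID: the same source identifiers always hash to the same value.
function entityId(tenantId: string, sourceSystem: string, sourceId: string): string {
  return createHash("sha256")
    .update([tenantId, sourceSystem, sourceId].join("\u0000"))
    .digest("hex");
}

function sameEntity(a: Entity, b: Entity): boolean {
  const keysA = Object.keys(a.attributes);
  const keysB = Object.keys(b.attributes);
  return (
    a.type === b.type &&
    keysA.length === keysB.length &&
    keysA.every((k) => a.attributes[k] === b.attributes[k])
  );
}

// Holds both the existing and the incoming set in memory (the 2× noted above).
function diffEntities(existing: Entity[], incoming: Entity[]): DiffEvent[] {
  const before = new Map(existing.map((e): [string, Entity] => [e.id, e]));
  const seen = new Set<string>();
  const events: DiffEvent[] = [];
  for (const next of incoming) {
    seen.add(next.id);
    const prev = before.get(next.id);
    if (!prev) events.push({ kind: "created", id: next.id });
    else if (!sameEntity(prev, next)) events.push({ kind: "updated", id: next.id });
  }
  for (const prev of existing) {
    if (!seen.has(prev.id)) events.push({ kind: "removed", id: prev.id });
  }
  return events;
}
```

Both functions are linear in the number of entities, which is consistent with the "light CPU" characterization: the expensive part is fetching `existing` from the database, not comparing it.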

Key observation

The pipeline is I/O-bound, not CPU-bound. Most time is spent waiting for MongoDB reads and writes. The actual computation (hashing, diffing, rule evaluation) is trivial compared to database round-trips.


3. Scale Scenarios

Scenario A: Single tenant, single connector (today)

| Metric | Value |
| --- | --- |
| Entities per sync | ~50-200 |
| Authority paths | ~100-500 |
| Findings | ~10-50 |
| Evidence packs per sync | ~5-20 |
| End-to-end time | ~2-5 seconds |
| Sync frequency | Manual / hourly |

Verdict: TypeScript is more than sufficient. Pipeline completes in seconds. No optimization needed.

Scenario B: 10 tenants, 2 connectors each (early growth)

| Metric | Value |
| --- | --- |
| Entities per sync | ~200-1,000 |
| Authority paths | ~500-5,000 |
| Findings | ~50-500 |
| Evidence packs per sync | ~20-100 |
| End-to-end time | ~10-30 seconds |
| Concurrent syncs | 2-3 (FIFO queue, sequential) |

Verdict: TypeScript is sufficient. The FIFO queue serializes work, so syncs don't compete for memory. The bottleneck is MongoDB write throughput, not Node.js CPU. Total pipeline throughput: ~100-200 syncs/hour.

Scenario C: 100 tenants, 3 connectors each (growth)

| Metric | Value |
| --- | --- |
| Entities per sync | ~1,000-5,000 |
| Authority paths | ~5,000-50,000 |
| Findings | ~500-5,000 |
| Evidence packs per sync | ~100-1,000 |
| End-to-end time | ~1-5 minutes |
| Concurrent syncs needed | 10-20 |

Verdict: TypeScript can handle it, but architecture changes matter more than language:

  1. In-memory FIFO queue becomes the bottleneck — syncs queue up behind each other. Need an external queue (BullMQ/Redis, SQS) and multiple worker processes.
  2. MongoDB connection pooling — need connection pool per worker, cluster-aware driver.
  3. Path materialization — BFS traversal of 5,000 entities generates significant intermediate state. Still fits in Node.js memory (objects are small), but GC pressure increases.
  4. Evidence packs — 1,000 packs per sync is the first stage that benefits from parallelization. Spawn N worker processes.

The fix is architectural (external queue + worker scaling), not linguistic (rewrite in Go/Rust).
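A rough capacity estimate shows why a single serialized worker stops being enough at this scale. The sketch below assumes every connector syncs once per hour (the hourly cadence is borrowed from Scenario A and is an assumption here) and that workers are fully utilized; `workersNeeded` is a hypothetical helper, not an existing function.

```typescript
// Back-of-the-envelope sizing: how many workers are needed to finish
// all syncs inside one scheduling window? All inputs are assumptions.
function workersNeeded(
  syncsPerWindow: number,
  minutesPerSync: number,
  windowMinutes: number
): number {
  return Math.ceil((syncsPerWindow * minutesPerSync) / windowMinutes);
}

const syncsPerHour = 100 * 3; // 100 tenants × 3 connectors

console.log(workersNeeded(syncsPerHour, 1, 60)); // 5  (1-minute syncs)
console.log(workersNeeded(syncsPerHour, 5, 60)); // 25 (5-minute syncs)
```

Under these assumptions the requirement lands between 5 and 25 concurrent workers, which brackets the 10-20 concurrent syncs estimated in the table. A single FIFO worker would need 300-1,500 minutes to clear one hour's worth of syncs.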

Scenario D: 1,000+ tenants, 5+ connectors each (enterprise scale)

| Metric | Value |
| --- | --- |
| Entities per sync | ~5,000-50,000 |
| Authority paths | ~50,000-500,000 |
| Findings | ~5,000-50,000 |
| Evidence packs per sync | ~1,000-10,000 |
| End-to-end time | ~5-30 minutes |
| Concurrent syncs needed | 50-100 |

Verdict: This is where language choice starts to matter — but only for specific stages.


4. Where TypeScript Hits Limits (and Where It Doesn't)

Never a problem in TypeScript

| Operation | Why |
| --- | --- |
| HTTP API server | Express handles thousands of req/s. API is thin (validation + enqueue). |
| Rule evaluation | Pure functions, small inputs. 12 rules × 5,000 entities = 60,000 function calls in <1 second. |
| JSON serialization | V8's JSON.parse/stringify are native and heavily optimized; they are rarely the bottleneck. |
| SHA256 hashing | Node.js crypto module uses native OpenSSL bindings. Comparable speed to Go/Rust. |
| MongoDB operations | The MongoDB driver is async I/O. Language doesn't matter: you're waiting for the network. |
| Evidence pack assembly | Fetch data, build objects. I/O-bound, not CPU-bound. |
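The rule-evaluation row can be illustrated with a small sketch. The `Rule`, `Entity`, and `Finding` shapes and the placeholder rules below are hypothetical; the point is only that evaluation is a nested loop over pure functions, so the call count is rules × entities.

```typescript
// Hypothetical shapes; the real evaluator's types are not shown in this document.
interface Entity { id: string; type: string; privileged: boolean; }
interface Finding { ruleId: string; entityId: string; }
interface Rule { id: string; matches: (entity: Entity) => boolean; }

// Applies every rule to every entity and counts the invocations.
function evaluate(rules: Rule[], entities: Entity[]): { findings: Finding[]; calls: number } {
  const findings: Finding[] = [];
  let calls = 0;
  for (const entity of entities) {
    for (const rule of rules) {
      calls++;
      if (rule.matches(entity)) findings.push({ ruleId: rule.id, entityId: entity.id });
    }
  }
  return { findings, calls };
}

// 12 placeholder rules; only the first one ever matches.
const rules: Rule[] = Array.from({ length: 12 }, (_, i) => ({
  id: `rule-${i + 1}`,
  matches: (e: Entity) => i === 0 && e.privileged,
}));

// 5,000 synthetic entities, every 100th one privileged.
const entities: Entity[] = Array.from({ length: 5000 }, (_, i) => ({
  id: `e${i}`,
  type: "user",
  privileged: i % 100 === 0,
}));

const { findings, calls } = evaluate(rules, entities);
console.log(calls, findings.length); // 60000 50
```

Because each rule is a pure function of its input, this loop is also straightforward to unit-test and to profile in isolation.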

Potential problems at enterprise scale

| Operation | Issue | Threshold | Mitigation |
| --- | --- | --- | --- |
| BFS graph traversal (path materialization) | 50,000 entities × depth-4 BFS = significant intermediate memory | >50K entities per sync | Stream entities from DB instead of loading all into memory. Use cursor-based traversal. |
| Diff computation | Requires all existing entities in memory for comparison | >100K entities per tenant | Incremental diff: only fetch entities matching incoming source_system + source_id. Already possible. |
| GC pauses | V8 garbage collector can cause 50-200ms pauses on heaps >1.5GB | >1.5GB heap | Reduce object churn. Use typed arrays for hash computation. Split into worker threads. |
| Single-threaded event loop | One CPU-heavy sync blocks all other syncs | >10 concurrent CPU-heavy syncs | Use worker_threads for CPU-bound stages, or separate worker processes. |
| Markdown rendering | Large evidence packs with extensive evidence generate significant string concatenation | >10K evidence packs per sync | Stream to file instead of building in memory. Unlikely to be a real issue. |
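To show where the intermediate memory in path materialization comes from, here is a minimal depth-limited BFS sketch. It assumes the graph is an in-memory adjacency list keyed by entity ID and that a "path" is a list of entity IDs; `materializePaths` is a hypothetical name, not the pipeline's actual function.

```typescript
// Adjacency list: entity ID -> IDs of directly reachable entities (assumed shape).
type AdjacencyList = Map<string, string[]>;

// Enumerates all cycle-free paths from `start` with at most `maxDepth` edges,
// level by level. `frontier` is the intermediate state that grows with fan-out.
function materializePaths(graph: AdjacencyList, start: string, maxDepth: number): string[][] {
  const paths: string[][] = [];
  let frontier: string[][] = [[start]];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[][] = [];
    for (const path of frontier) {
      const last = path[path.length - 1];
      for (const neighbor of graph.get(last) ?? []) {
        if (path.includes(neighbor)) continue; // skip cycles
        const extended = [...path, neighbor];
        paths.push(extended);
        next.push(extended);
      }
    }
    frontier = next;
  }
  return paths;
}
```

Each level copies every surviving path, so memory grows with fan-out raised to the depth limit. That is why the depth cap keeps this stage manageable, and why streaming or offloading the traversal only becomes interesting at the entity counts listed in the table.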

5. Comparison with Alternative Technologies

Go

| Aspect | Advantage over TypeScript | Disadvantage |
| --- | --- | --- |
| Concurrency | Goroutines are lighter than Node.js async tasks | Marginal: both are async I/O. Go wins only for CPU-parallel work. |
| Memory | Negligible GC pauses for heaps <10GB (concurrent GC; pauses are typically sub-millisecond rather than zero) | Requires rewriting all domain types, rules, evidence logic |
| CPU throughput | ~2-5x faster for pure computation | Pipeline is I/O-bound: 2x CPU doesn't help when you're waiting for MongoDB |
| Type system | Comparable (Go generics + structs vs TS interfaces) | Loss of shared types with React UI |
| Ecosystem | Strong for infrastructure/pipeline tooling | Weaker for web UI, testing frameworks |

When Go would matter: If graph traversal or rule evaluation becomes CPU-bound at >50K entities AND worker_threads isn't sufficient.

Rust

| Aspect | Advantage over TypeScript | Disadvantage |
| --- | --- | --- |
| Memory | Zero-cost abstractions, no GC | Massive rewrite cost. 10x development time for equivalent features. |
| CPU throughput | ~5-20x faster for pure computation | Pipeline is I/O-bound. Rust waiting for MongoDB is same speed as TS waiting for MongoDB. |
| Safety | Memory safety guarantees | TypeScript already provides type safety. Memory bugs are not our failure mode. |

When Rust would matter: Never, for this pipeline. The compute-to-I/O ratio doesn't justify it. Rust shines for real-time systems, embedded, or CPU-bound data processing (video encoding, ML inference). Our pipeline is "read from DB, apply rules, write to DB."

Python

| Aspect | Advantage over TypeScript | Disadvantage |
| --- | --- | --- |
| Data processing | pandas, numpy for batch analytics | We don't do statistical analysis; rules are deterministic |
| ML/AI | If we ever need ML-based classification | Founder requirement: no ML, no heuristics, deterministic only |
| Ecosystem | Strong for data engineering | Slower runtime, GIL limits CPU parallelism across threads |

When Python would matter: If the platform adds ML-based anomaly detection. Current design constraint explicitly forbids this.


6. The Real Bottleneck Progression

Based on pipeline profiling, bottlenecks will hit in this order as scale increases:

Scale           Bottleneck                  Fix
──────────────  ──────────────────────────  ─────────────────────────────────────
~200 entities   None                        Current stack works
~1K entities    FIFO queue serialization    External queue (BullMQ/Redis)
~5K entities    Single worker process       Multiple worker processes
~10K entities   MongoDB write throughput    Batch size tuning, write concern
~50K entities   In-memory entity diff       Incremental diff (cursor-based)
~100K entities  GC pauses during traversal  worker_threads for BFS stage
~500K entities  Single MongoDB instance     MongoDB replica set + read preference
~1M entities    All of the above combined   Consider Go for traversal stage only

Key insight: You hit 5 architectural bottlenecks before you hit a language bottleneck. Rewriting in Go or Rust before solving the architectural issues would not improve performance.
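As one example of an architectural fix from the progression above, "batch size tuning" for MongoDB writes can be sketched as building upsert operations and splitting them into fixed-size batches. The `EntityDoc` fields and the helper names are assumptions; the operation shape follows the `updateOne` form accepted by the MongoDB driver's `bulkWrite`.

```typescript
// Hypothetical document shape; field names are assumptions.
interface EntityDoc { _id: string; tenant_id: string; type: string; }

interface UpsertOp {
  updateOne: {
    filter: { _id: string };
    update: { $set: Omit<EntityDoc, "_id"> };
    upsert: true;
  };
}

// One upsert per entity, keyed on the deterministic _id.
function toUpsertOps(docs: EntityDoc[]): UpsertOp[] {
  return docs.map(({ _id, ...fields }): UpsertOp => ({
    updateOne: { filter: { _id }, update: { $set: fields }, upsert: true },
  }));
}

// Splits a list into batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) throw new RangeError("size must be a positive integer");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) batches.push(items.slice(i, i + size));
  return batches;
}

// Each batch would then be sent with something like:
//   await collection.bulkWrite(batch, { ordered: false });
// where `collection` is a MongoDB driver Collection (not constructed here).
```

The batch size itself is the tuning knob: larger batches mean fewer round-trips but more memory held per request, so the right value is best found by measurement rather than fixed up front.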


7. Recommendation

Short term (W1.1, 0-6 months): Stay with TypeScript

  • The pipeline is I/O-bound. Node.js async I/O is as fast as any language for this workload.
  • Shared types between API, pipeline, and UI reduce bugs and development time.
  • The team knows TypeScript. Context-switching to a new language slows delivery.
  • All 12 evaluator rules are pure functions — easy to test, easy to profile.

Medium term (6-18 months): Architectural changes, same language

When reaching 10+ tenants with 1,000+ entities each:

  1. Replace in-memory queue with BullMQ/Redis — enables multiple worker processes, job persistence, retry with backoff, priority queues.
  2. Run N worker processes — one per CPU core. Each dequeues and processes independently. Node.js cluster module or PM2.
  3. Incremental diff — don't load all tenant entities. Query only matching source_system + source_id pairs.
  4. MongoDB read replicas — route read-heavy operations (evaluation, evidence pack reads) to secondaries.

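These changes keep TypeScript and are expected to multiply throughput by roughly 10-50x; the actual gain should be confirmed by measurement once the external queue and worker pool are in place.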
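The incremental diff idea can be sketched as a pure function that builds a MongoDB query filter from the incoming batch, so only matching entities are fetched instead of the whole tenant. The `tenant_id` field and the helper name `incrementalDiffFilter` are assumptions; `source_system` and `source_id` are the fields named above.

```typescript
interface SourceKey { source_system: string; source_id: string; }

interface IncrementalFilter {
  tenant_id: string;
  $or: Array<{ source_system: string; source_id: { $in: string[] } }>;
}

// Groups incoming IDs by source system and emits one $in clause per system.
// Returns null for an empty batch, because MongoDB rejects an empty $or array.
function incrementalDiffFilter(tenantId: string, incoming: SourceKey[]): IncrementalFilter | null {
  const idsBySystem = new Map<string, Set<string>>();
  for (const { source_system, source_id } of incoming) {
    const ids = idsBySystem.get(source_system) ?? new Set<string>();
    ids.add(source_id);
    idsBySystem.set(source_system, ids);
  }
  if (idsBySystem.size === 0) return null;
  return {
    tenant_id: tenantId,
    $or: Array.from(idsBySystem, ([source_system, ids]) => ({
      source_system,
      source_id: { $in: Array.from(ids) },
    })),
  };
}
```

The resulting object could be passed to a `find` call on the entities collection. One consequence of this design is that entities absent from the incoming batch are never fetched, so detecting removals would need a separate mechanism; how the real pipeline handles that is not covered here.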

Long term (18+ months): Selective rewrite if data shows need

If profiling at >50K entities per sync shows that BFS graph traversal or rule evaluation is genuinely CPU-bound (>50% of sync time in CPU, not I/O):

  1. Extract traversal stage to a Go microservice — accepts entity graph, returns authority paths. Called from the TypeScript pipeline via HTTP or gRPC.
  2. Keep everything else in TypeScript — API, evaluation rules, evidence packs, UI all stay.
  3. Never rewrite the whole platform — the ROI is negative. A full rewrite costs 6-12 months and introduces regressions with zero feature progress.
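The ">50% of sync time in CPU" criterion can be checked with Node's built-in counters. The sketch below is one possible way to measure it, not an existing profiler in the codebase; `timeStage` and `isCpuBound` are hypothetical helpers. Note that `process.cpuUsage()` is process-wide, so the numbers are only meaningful when one sync runs at a time in the process.

```typescript
import { performance } from "node:perf_hooks";

interface StageTiming { stage: string; wallMs: number; cpuMs: number; }

// Runs one async stage and records wall-clock time and process CPU time.
async function timeStage<T>(
  stage: string,
  run: () => Promise<T>
): Promise<{ result: T; timing: StageTiming }> {
  const wallStart = performance.now();
  const cpuStart = process.cpuUsage();
  const result = await run();
  const cpu = process.cpuUsage(cpuStart); // delta since cpuStart, in microseconds
  return {
    result,
    timing: {
      stage,
      wallMs: performance.now() - wallStart,
      cpuMs: (cpu.user + cpu.system) / 1000,
    },
  };
}

// True when CPU time exceeds `threshold` of total wall time across stages.
function isCpuBound(timings: StageTiming[], threshold = 0.5): boolean {
  const wall = timings.reduce((sum, t) => sum + t.wallMs, 0);
  const cpu = timings.reduce((sum, t) => sum + t.cpuMs, 0);
  return wall > 0 && cpu / wall > threshold;
}
```

Collecting these timings per stage on production-sized syncs gives the evidence needed before extracting anything to another language: a stage whose CPU share stays well below the threshold is waiting on I/O and would not benefit.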

What NOT to do

| Anti-pattern | Why |
| --- | --- |
| Rewrite everything in Go/Rust now | Pipeline is I/O-bound. New language doesn't help. Costs 6+ months with zero feature progress. |
| Add Kafka/RabbitMQ for job queue | Massive infrastructure complexity. BullMQ + Redis gives the same benefits with 10% of the ops burden. |
| Premature worker_threads | Adds complexity (shared memory, message passing). Only justified when profiling shows GC or CPU as the bottleneck. |
| Add a separate "pipeline service" | Splits the codebase for no performance gain. Monolith is faster to develop and deploy until there's a specific scaling reason to split. |

8. Decision Summary

| Question | Answer |
| --- | --- |
| Is TypeScript sufficient for W1.1? | Yes. No contest. |
| Is TypeScript sufficient for 100 tenants? | Yes, with architectural changes (external queue, worker processes). |
| Is TypeScript sufficient for 1,000+ tenants? | Probably yes, with incremental diff + read replicas. Profile first. |
| When would we consider Go? | When profiling shows >50% CPU time in graph traversal at >50K entities. Extract that one stage. |
| When would we consider Rust? | Never for this pipeline. Wrong tool for I/O-bound batch processing. |
| When would we consider Python? | Only if we add ML-based detection, which the current design explicitly forbids. |

Bottom line: Fix the architecture (queue, workers, incremental diff) before considering a language change. The first 5 performance walls are all architectural, not linguistic.


Next Action

Status: adopted and shipped. TypeScript/Node.js is confirmed as the primary platform stack, with Python for connectors. Reflected in 11-platform-mental-model.md and the current repo structure. No further action required.