Pipeline Technology Stack Analysis
Date: 2026-02-18
Question: Will the current TypeScript/Node.js stack be sufficient for the processing pipeline long-term, or will we need to migrate to more performant technologies?
1. Current Stack
| Component | Technology | Role |
|---|---|---|
| API server | Express.js (TypeScript) | HTTP ingestion endpoint |
| Worker runtime | In-memory FIFO queue (TypeScript) | Job scheduling and execution |
| Pipeline logic | Pure TypeScript functions | Graph transform, diff, evaluation, evidence |
| Database | MongoDB 7 | All persistence |
| UI | React + Vite | Dashboard and exploration |
All pipeline stages run in a single Node.js process. No external message queue. No separate worker processes.
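For orientation, the single-process model can be sketched as a minimal in-memory FIFO queue. This is an illustrative sketch only, not the platform's actual worker runtime; `InMemoryFifoQueue` and its methods are hypothetical names.

```typescript
// Hypothetical sketch of a single-process, in-memory FIFO job queue.
// Jobs run one at a time, in arrival order; nothing survives a restart.
type Job<T> = () => Promise<T>;

class InMemoryFifoQueue {
  private pending: Array<() => Promise<void>> = [];
  private running = false;

  // Add a job to the tail; the returned promise settles with the job's result.
  enqueue<T>(job: Job<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      // Promise.resolve().then(job) also captures synchronous throws.
      this.pending.push(() => Promise.resolve().then(job).then(resolve, reject));
      void this.drain();
    });
  }

  // Process jobs sequentially until the queue is empty.
  private async drain(): Promise<void> {
    if (this.running) return;
    this.running = true;
    try {
      while (this.pending.length > 0) {
        const next = this.pending.shift()!;
        await next();
      }
    } finally {
      this.running = false;
    }
  }
}
```

Because only one job runs at a time, a slow sync delays every sync queued behind it, which is the serialization effect the scale scenarios below refer to.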
2. Pipeline Stage Profiling
What each stage actually does (compute characteristics)
| Stage | Primary Operation | CPU Pattern | Memory Pattern | I/O Pattern |
|---|---|---|---|---|
| Transform | Parse JSON, map node types, compute SHA256 entity IDs | Light CPU (hashing) | Proportional to graph size — all nodes in memory | Single read (HTTP body) |
| Diff | Compare new entities vs existing, generate events | Light CPU (object comparison) | 2× entity count (old + new in memory) | One DB read (fetch all tenant entities) |
| Upsert | MongoDB bulkWrite | Minimal CPU | Batch buffer | Many DB writes |
| Path materialization | BFS graph traversal per entity | Moderate CPU (graph traversal, depth-limited) | Adjacency lists in memory | DB reads for entity relationships |
| Authority paths | Hash computation, upsert, soft-remove | Light CPU | Proportional to path count | Many DB writes |
| Evaluation | Rule functions × entities × paths | Moderate CPU (rule logic, evidence lookups) | Entity + path + evidence in memory | Many DB reads |
| Evidence packs | Fetch data, build 9 sections, SHA256 hash, markdown render | Moderate CPU | One pack at a time | Many DB reads, one DB write |
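As an illustration of the hashing step in the Transform stage, the sketch below derives a SHA-256 entity ID with Node's built-in `crypto` module. The choice of input fields (`tenantId`, `sourceSystem`, `sourceId`) is an assumption made for this example; the document does not specify the real ID recipe.

```typescript
import { createHash } from "node:crypto";

// Assumption: an entity is identified by tenant, source system, and source ID.
// The real ID recipe is not specified in this document.
interface EntityKey {
  tenantId: string;
  sourceSystem: string;
  sourceId: string;
}

// Derive a stable SHA-256 hex ID. JSON-encoding the fields as an array avoids
// ambiguity between inputs such as ("a", "bc") and ("ab", "c").
function entityId(key: EntityKey): string {
  const canonical = JSON.stringify([key.tenantId, key.sourceSystem, key.sourceId]);
  return createHash("sha256").update(canonical, "utf8").digest("hex");
}
```

The hash itself runs in native code, so the per-entity cost in JavaScript is mostly the string construction.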
Key observation
The pipeline is I/O-bound, not CPU-bound. Most time is spent waiting for MongoDB reads and writes. The actual computation (hashing, diffing, rule evaluation) is trivial compared to database round-trips.
3. Scale Scenarios
Scenario A: Single tenant, single connector (today)
| Metric | Value |
|---|---|
| Entities per sync | ~50-200 |
| Authority paths | ~100-500 |
| Findings | ~10-50 |
| Evidence packs per sync | ~5-20 |
| End-to-end time | ~2-5 seconds |
| Sync frequency | Manual / hourly |
Verdict: TypeScript is more than sufficient. Pipeline completes in seconds. No optimization needed.
Scenario B: 10 tenants, 2 connectors each (early growth)
| Metric | Value |
|---|---|
| Entities per sync | ~200-1,000 |
| Authority paths | ~500-5,000 |
| Findings | ~50-500 |
| Evidence packs per sync | ~20-100 |
| End-to-end time | ~10-30 seconds |
| Concurrent syncs | 2-3 (FIFO queue, sequential) |
Verdict: TypeScript is sufficient. The FIFO queue serializes work, so syncs don't compete for memory. The bottleneck is MongoDB write throughput, not Node.js CPU. Total pipeline throughput: ~100-200 syncs/hour.
Scenario C: 100 tenants, 3 connectors each (growth)
| Metric | Value |
|---|---|
| Entities per sync | ~1,000-5,000 |
| Authority paths | ~5,000-50,000 |
| Findings | ~500-5,000 |
| Evidence packs per sync | ~100-1,000 |
| End-to-end time | ~1-5 minutes |
| Concurrent syncs needed | 10-20 |
Verdict: TypeScript can handle it, but architecture changes matter more than language:
- In-memory FIFO queue becomes the bottleneck — syncs queue up behind each other. Need an external queue (BullMQ/Redis, SQS) and multiple worker processes.
- MongoDB connection pooling — need connection pool per worker, cluster-aware driver.
- Path materialization — BFS traversal of 5,000 entities generates significant intermediate state. Still fits in Node.js memory (objects are small), but GC pressure increases.
- Evidence packs — 1,000 packs per sync is the first stage that benefits from parallelization. Spawn N worker processes.
The fix is architectural (external queue + worker scaling), not linguistic (rewrite in Go/Rust).
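The scheduling pattern behind "external queue plus N workers" can be sketched in-process. This is a stand-in for illustration only: a real deployment would use an external queue (for example BullMQ/Redis) and separate worker processes, and `runWithWorkers` is a hypothetical helper.

```typescript
// In-process stand-in for "N workers pulling from a shared queue".
// This only illustrates the scheduling pattern, not a production setup.
async function runWithWorkers<T, R>(
  jobs: readonly T[],
  workerCount: number,
  handler: (job: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array<R>(jobs.length);
  let nextIndex = 0;

  // Each worker repeatedly claims the next unclaimed job until none remain.
  async function worker(): Promise<void> {
    while (nextIndex < jobs.length) {
      const index = nextIndex++;
      results[index] = await handler(jobs[index]);
    }
  }

  const count = Math.max(1, Math.min(workerCount, jobs.length));
  await Promise.all(Array.from({ length: count }, () => worker()));
  return results;
}
```

For I/O-bound handlers, up to `workerCount` jobs wait on the database at the same time instead of queueing behind one another, which is where the throughput gain comes from.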
Scenario D: 1,000+ tenants, 5+ connectors each (enterprise scale)
| Metric | Value |
|---|---|
| Entities per sync | ~5,000-50,000 |
| Authority paths | ~50,000-500,000 |
| Findings | ~5,000-50,000 |
| Evidence packs per sync | ~1,000-10,000 |
| End-to-end time | ~5-30 minutes |
| Concurrent syncs needed | 50-100 |
Verdict: This is where language choice starts to matter — but only for specific stages.
4. Where TypeScript Hits Limits (and Where It Doesn't)
Never a problem in TypeScript
| Operation | Why |
|---|---|
| HTTP API server | Express handles thousands of req/s. API is thin (validation + enqueue). |
| Rule evaluation | Pure functions, small inputs. 12 rules × 5,000 entities = 60,000 function calls in <1 second. |
| JSON serialization | JSON.parse/stringify are native V8 built-ins and heavily optimized. |
| SHA256 hashing | Node.js crypto module uses OpenSSL C bindings. Same speed as Go/Rust. |
| MongoDB operations | mongo driver is async I/O. Language doesn't matter — you're waiting for the network. |
| Evidence pack assembly | Fetch data, build objects. I/O-bound, not CPU-bound. |
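To make the rule-evaluation row concrete, here is a minimal sketch of rules as pure predicates applied to every entity. The `Entity`, `Rule`, and `Finding` shapes and the two example rules are invented for illustration; the platform's real types are not shown in this document.

```typescript
// Hypothetical shapes; the platform's real entity and rule types are not shown here.
interface Entity {
  id: string;
  kind: string;
  isPrivileged: boolean;
  hasOwner: boolean;
}

interface Finding {
  ruleId: string;
  entityId: string;
}

// A rule is a pure predicate: same entity in, same verdict out, no I/O.
interface Rule {
  id: string;
  matches: (entity: Entity) => boolean;
}

// Evaluate every rule against every entity: rules.length * entities.length calls.
function evaluateRules(rules: readonly Rule[], entities: readonly Entity[]): Finding[] {
  const findings: Finding[] = [];
  for (const entity of entities) {
    for (const rule of rules) {
      if (rule.matches(entity)) {
        findings.push({ ruleId: rule.id, entityId: entity.id });
      }
    }
  }
  return findings;
}

// Two illustrative rules (invented for this example).
const exampleRules: Rule[] = [
  { id: "privileged-without-owner", matches: (e) => e.isPrivileged && !e.hasOwner },
  {
    id: "service-account-privileged",
    matches: (e) => e.kind === "service_account" && e.isPrivileged,
  },
];
```

With 12 rules and 5,000 entities the inner body runs 60,000 times, each a handful of property reads and comparisons.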
Potential problems at enterprise scale
| Operation | Issue | Threshold | Mitigation |
|---|---|---|---|
| BFS graph traversal (path materialization) | 50,000 entities × depth-4 BFS = significant intermediate memory | >50K entities per sync | Stream entities from DB instead of loading all into memory. Use cursor-based traversal. |
| Diff computation | Requires all existing entities in memory for comparison | >100K entities per tenant | Incremental diff: only fetch entities matching incoming source_system + source_id. Already possible. |
| GC pauses | V8 garbage collector can cause 50-200ms pauses on heaps >1.5GB | >1.5GB heap | Reduce object churn. Use typed arrays for hash computation. Split into worker threads. |
| Single-threaded event loop | One CPU-heavy sync blocks all other syncs | >10 concurrent CPU-heavy syncs | Use worker_threads for CPU-bound stages, or separate worker processes. |
| Markdown rendering | Large evidence packs with extensive evidence generate significant string concatenation | >10K evidence packs per sync | Stream to file instead of building in memory. Unlikely to be a real issue. |
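The traversal in the first row can be pictured with a minimal depth-limited BFS over in-memory adjacency lists. This is a sketch under assumptions, not the actual path-materialization code; `reachableWithin` and the `AdjacencyList` type are hypothetical.

```typescript
// Hypothetical sketch of depth-limited BFS over in-memory adjacency lists.
// Returns every node reachable from `start` within `maxDepth` hops, mapped to
// its hop count.
type AdjacencyList = Map<string, string[]>;

function reachableWithin(
  graph: AdjacencyList,
  start: string,
  maxDepth: number,
): Map<string, number> {
  const depthOf = new Map<string, number>([[start, 0]]);
  let frontier: string[] = [start];

  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const neighbor of graph.get(node) ?? []) {
        if (!depthOf.has(neighbor)) {
          depthOf.set(neighbor, depth); // first visit is the shortest hop count
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return depthOf;
}
```

The intermediate state is the `depthOf` map plus one frontier array per level, so memory per traversal is bounded by the number of nodes within the depth limit. Running this once per entity is what makes total memory and GC churn grow with graph size.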
5. Comparison with Alternative Technologies
Go
| Aspect | Advantage over TypeScript | Disadvantage |
|---|---|---|
| Concurrency | Goroutines are lighter than Node.js async tasks | Marginal — both are async I/O. Go wins only for CPU-parallel work. |
| Memory | Concurrent GC keeps pauses short (typically sub-millisecond), even on multi-GB heaps | Requires rewriting all domain types, rules, evidence logic |
| CPU throughput | ~2-5x faster for pure computation | Pipeline is I/O-bound — 2x CPU doesn't help when you're waiting for MongoDB |
| Type system | Comparable (Go generics + structs vs TS interfaces) | Loss of shared types with React UI |
| Ecosystem | Strong for infrastructure/pipeline tooling | Weaker for web UI, testing frameworks |
When Go would matter: If graph traversal or rule evaluation becomes CPU-bound at >50K entities AND worker_threads isn't sufficient.
Rust
| Aspect | Advantage over TypeScript | Disadvantage |
|---|---|---|
| Memory | Zero-cost abstractions, no GC | Massive rewrite cost. 10x development time for equivalent features. |
| CPU throughput | ~5-20x faster for pure computation | Pipeline is I/O-bound. Rust waiting for MongoDB is same speed as TS waiting for MongoDB. |
| Safety | Memory safety guarantees | TypeScript already provides type safety. Memory bugs are not our failure mode. |
When Rust would matter: Never, for this pipeline. The compute-to-I/O ratio doesn't justify it. Rust shines for real-time systems, embedded, or CPU-bound data processing (video encoding, ML inference). Our pipeline is "read from DB, apply rules, write to DB."
Python
| Aspect | Advantage over TypeScript | Disadvantage |
|---|---|---|
| Data processing | pandas, numpy for batch analytics | We don't do statistical analysis — rules are deterministic |
| ML/AI | If we ever need ML-based classification | Founder requirement: no ML, no heuristics, deterministic only |
| Ecosystem | Strong for data engineering | Slower runtime, GIL limits concurrency |
When Python would matter: If the platform adds ML-based anomaly detection. Current design constraint explicitly forbids this.
6. The Real Bottleneck Progression
Based on pipeline profiling, bottlenecks will hit in this order as scale increases:
| Scale | Bottleneck | Fix |
|---|---|---|
| ~200 entities | None | Current stack works |
| ~1K entities | FIFO queue serialization | External queue (BullMQ/Redis) |
| ~5K entities | Single worker process | Multiple worker processes |
| ~10K entities | MongoDB write throughput | Batch size tuning, write concern |
| ~50K entities | In-memory entity diff | Incremental diff (cursor-based) |
| ~100K entities | GC pauses during traversal | worker_threads for BFS stage |
| ~500K entities | Single MongoDB instance | MongoDB replica set + read preference |
| ~1M entities | All of the above combined | Consider Go for traversal stage only |
Key insight: You hit 5 architectural bottlenecks before you hit a language bottleneck. Rewriting in Go or Rust before solving the architectural issues would not improve performance.
7. Recommendation
Short term (W1.1, 0-6 months): Stay with TypeScript
- The pipeline is I/O-bound. Node.js async I/O is competitive with any language for this workload, because most of the time is spent waiting on MongoDB.
- Shared types between API, pipeline, and UI reduce bugs and development time.
- The team knows TypeScript. Context-switching to a new language slows delivery.
- All 12 evaluator rules are pure functions — easy to test, easy to profile.
Medium term (6-18 months): Architectural changes, same language
When reaching 10+ tenants with 1,000+ entities each:
- Replace in-memory queue with BullMQ/Redis: enables multiple worker processes, job persistence, retry with backoff, priority queues.
- Run N worker processes: one per CPU core. Each dequeues and processes independently. Node.js cluster module or PM2.
- Incremental diff: don't load all tenant entities. Query only matching source_system + source_id pairs.
- MongoDB read replicas: route read-heavy operations (evaluation, evidence pack reads) to secondaries.
These changes keep TypeScript and are estimated to multiply throughput 10-50x.
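The incremental diff item can be sketched as follows. It is a minimal illustration, assuming the stored entities have already been fetched with a query restricted to the incoming source_system + source_id pairs; the `contentHash` field and the `DiffEvent` shape are hypothetical.

```typescript
interface SourceEntity {
  source_system: string;
  source_id: string;
  contentHash: string; // hypothetical: hash over the fields used for change detection
}

type DiffEvent = { kind: "created" | "updated"; key: string };

// Composite key; JSON-encoding avoids collisions across field boundaries.
function keyOf(e: Pick<SourceEntity, "source_system" | "source_id">): string {
  return JSON.stringify([e.source_system, e.source_id]);
}

// `existingMatches` is assumed to hold only stored entities whose
// (source_system, source_id) appears in `incoming`, i.e. the result of a
// targeted query rather than a full tenant scan.
function incrementalDiff(
  incoming: readonly SourceEntity[],
  existingMatches: readonly SourceEntity[],
): DiffEvent[] {
  const stored = new Map<string, SourceEntity>();
  for (const e of existingMatches) stored.set(keyOf(e), e);

  const events: DiffEvent[] = [];
  for (const e of incoming) {
    const key = keyOf(e);
    const previous = stored.get(key);
    if (previous === undefined) {
      events.push({ kind: "created", key });
    } else if (previous.contentHash !== e.contentHash) {
      events.push({ kind: "updated", key });
    }
    // Unchanged entities produce no event.
  }
  return events;
}
```

Memory use scales with the size of the incoming batch rather than the tenant's full entity set. Detecting removals is not covered by this sketch, since it requires knowing which stored entities were absent from the sync.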
Long term (18+ months): Selective rewrite if data shows need
If profiling at >50K entities per sync shows that BFS graph traversal or rule evaluation is genuinely CPU-bound (>50% of sync time in CPU, not I/O):
- Extract traversal stage to a Go microservice — accepts entity graph, returns authority paths. Called from the TypeScript pipeline via HTTP or gRPC.
- Keep everything else in TypeScript — API, evaluation rules, evidence packs, UI all stay.
- Never rewrite the whole platform — the ROI is negative. A full rewrite costs 6-12 months and introduces regressions with zero feature progress.
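One way to check the ">50% of sync time in CPU" criterion is to compare process CPU time against wall-clock time for a stage. The sketch below uses Node's `process.cpuUsage()` and `performance.now()`; the helper names are hypothetical, and it assumes nothing else is running in the same process during the measurement.

```typescript
// Ratio of CPU time to wall-clock time; both arguments in the same unit.
function cpuFraction(cpuMicros: number, wallMicros: number): number {
  return wallMicros > 0 ? cpuMicros / wallMicros : 0;
}

// The default 50% threshold mirrors the criterion stated above.
function isCpuBound(fraction: number, threshold = 0.5): boolean {
  return fraction > threshold;
}

// Measure one stage. Assumption: nothing else is running in this process,
// since process.cpuUsage() counts CPU time for the whole process.
async function measureStage(stage: () => Promise<void>): Promise<number> {
  const cpuStart = process.cpuUsage();
  const wallStartMs = performance.now();
  await stage();
  const cpu = process.cpuUsage(cpuStart); // delta since cpuStart, in microseconds
  const wallMicros = (performance.now() - wallStartMs) * 1000;
  return cpuFraction(cpu.user + cpu.system, wallMicros);
}
```

A stage that mostly awaits database calls should report a fraction well below the threshold; a fraction persistently above it at >50K entities is the signal that would justify extracting the stage.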
What NOT to do
| Anti-pattern | Why |
|---|---|
| Rewrite everything in Go/Rust now | Pipeline is I/O-bound. New language doesn't help. Costs 6+ months with zero feature progress. |
| Add Kafka/RabbitMQ for job queue | Massive infrastructure complexity. BullMQ + Redis gives the same benefits with 10% of the ops burden. |
| Premature worker_threads | Adds complexity (shared memory, message passing). Only justified when profiling shows GC or CPU as the bottleneck. |
| Add a separate "pipeline service" | Splits the codebase for no performance gain. Monolith is faster to develop and deploy until there's a specific scaling reason to split. |
8. Decision Summary
| Question | Answer |
|---|---|
| Is TypeScript sufficient for W1.1? | Yes. No contest. |
| Is TypeScript sufficient for 100 tenants? | Yes, with architectural changes (external queue, worker processes). |
| Is TypeScript sufficient for 1,000+ tenants? | Probably yes, with incremental diff + read replicas. Profile first. |
| When would we consider Go? | When profiling shows >50% CPU time in graph traversal at >50K entities. Extract that one stage. |
| When would we consider Rust? | Never for this pipeline. Wrong tool for I/O-bound batch processing. |
| When would we consider Python? | Only if we add ML-based detection, which the current design explicitly forbids. |
Bottom line: Fix the architecture (queue, workers, incremental diff) before considering a language change. The first 5 performance walls are all architectural, not linguistic.
Next Action
Status: adopted — shipped
TypeScript/Node.js confirmed as primary platform stack; Python for connectors. Reflected in 11-platform-mental-model.md and current repo structure. No further action required.