Pipeline Technology Stack Analysis

Date: 2026-02-18

Question: Will the current TypeScript/Node.js stack be sufficient for the processing pipeline long-term, or will we need to migrate to more performant technologies?

1. Current Stack

Component	Technology	Role
API server	Express.js (TypeScript)	HTTP ingestion endpoint
Worker runtime	In-memory FIFO queue (TypeScript)	Job scheduling and execution
Pipeline logic	Pure TypeScript functions	Graph transform, diff, evaluation, evidence
Database	MongoDB 7	All persistence
UI	React + Vite	Dashboard and exploration

All pipeline stages run in a single Node.js process. No external message queue. No separate worker processes.
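
To make the single-process model concrete, here is a minimal sketch of an in-memory FIFO queue that runs jobs strictly one at a time. The `FifoQueue` class and `Job` type are hypothetical names for illustration; the actual worker runtime is not shown in this document and may be structured differently.

```typescript
// Hypothetical in-process FIFO job queue (illustrative, not the real worker runtime).
// Jobs run strictly one at a time, in the order they were enqueued.
type Job<T> = () => Promise<T>;

class FifoQueue {
  // Tail of the promise chain; each new job waits for the previous one to settle.
  private tail: Promise<unknown> = Promise.resolve();

  enqueue<T>(job: Job<T>): Promise<T> {
    // Start this job only after everything queued before it has finished.
    const result = this.tail.then(() => job());
    // Swallow errors on the chain so one failed job does not block later jobs.
    // The caller still sees the rejection through `result`.
    this.tail = result.catch(() => undefined);
    return result;
  }
}
```

Because the queue lives in process memory, pending jobs are lost if the process restarts, which is one reason the later sections point to an external queue.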

2. Pipeline Stage Profiling

What each stage actually does (compute characteristics)

Stage	Primary Operation	CPU Pattern	Memory Pattern	I/O Pattern
Transform	Parse JSON, map node types, compute SHA256 entity IDs	Light CPU (hashing)	Proportional to graph size — all nodes in memory	Single read (HTTP body)
Diff	Compare new entities vs existing, generate events	Light CPU (object comparison)	2× entity count (old + new in memory)	One DB read (fetch all tenant entities)
Upsert	MongoDB bulkWrite	Minimal CPU	Batch buffer	Many DB writes
Path materialization	BFS graph traversal per entity	Moderate CPU (graph traversal, depth-limited)	Adjacency lists in memory	DB reads for entity relationships
Authority paths	Hash computation, upsert, soft-remove	Light CPU	Proportional to path count	Many DB writes
Evaluation	Rule functions × entities × paths	Moderate CPU (rule logic, evidence lookups)	Entity + path + evidence in memory	Many DB reads
Evidence packs	Fetch data, build 9 sections, SHA256 hash, markdown render	Moderate CPU	One pack at a time	Many DB reads, one DB write
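
As an illustration of the "compute SHA256 entity IDs" step in the Transform stage, the sketch below derives a deterministic ID with Node's built-in `crypto` module. The `EntityKey` fields and the `entityId` function are assumptions; this document does not specify which fields feed the hash.

```typescript
import { createHash } from "node:crypto";

// Hypothetical key fields; the real ID inputs are not specified in this document.
interface EntityKey {
  tenantId: string;
  sourceSystem: string;
  sourceId: string;
}

// Derive a deterministic entity ID by hashing the key fields with SHA-256.
function entityId(key: EntityKey): string {
  // JSON-encode an array so field boundaries are unambiguous:
  // ["a", "bc", ...] and ["ab", "c", ...] produce different inputs.
  const canonical = JSON.stringify([key.tenantId, key.sourceSystem, key.sourceId]);
  return createHash("sha256").update(canonical, "utf8").digest("hex");
}
```

The same input always yields the same 64-character hex string, so re-ingesting an unchanged entity maps to the same ID.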

Key observation

The pipeline is I/O-bound, not CPU-bound. Most time is spent waiting for MongoDB reads and writes. The actual computation (hashing, diffing, rule evaluation) is trivial compared to database round-trips.

3. Scale Scenarios

Scenario A: Single tenant, single connector (today)

Metric	Value
Entities per sync	~50-200
Authority paths	~100-500
Findings	~10-50
Evidence packs per sync	~5-20
End-to-end time	~2-5 seconds
Sync frequency	Manual / hourly

Verdict: TypeScript is more than sufficient. Pipeline completes in seconds. No optimization needed.

Scenario B: 10 tenants, 2 connectors each (early growth)

Metric	Value
Entities per sync	~200-1,000
Authority paths	~500-5,000
Findings	~50-500
Evidence packs per sync	~20-100
End-to-end time	~10-30 seconds
Overlapping sync requests	2-3 (FIFO queue runs them sequentially)

Verdict: TypeScript is sufficient. The FIFO queue serializes work, so syncs don't compete for memory. The bottleneck is MongoDB write throughput, not Node.js CPU. Total pipeline throughput: ~100-200 syncs/hour.

Scenario C: 100 tenants, 3 connectors each (growth)

Metric	Value
Entities per sync	~1,000-5,000
Authority paths	~5,000-50,000
Findings	~500-5,000
Evidence packs per sync	~100-1,000
End-to-end time	~1-5 minutes
Concurrent syncs needed	10-20

Verdict: TypeScript can handle it, but architecture changes matter more than language:

In-memory FIFO queue becomes the bottleneck — syncs queue up behind each other. Need an external queue (BullMQ/Redis, SQS) and multiple worker processes.
MongoDB connection pooling — need connection pool per worker, cluster-aware driver.
Path materialization — BFS traversal of 5,000 entities generates significant intermediate state. Still fits in Node.js memory (objects are small), but GC pressure increases.
Evidence packs — 1,000 packs per sync is the first stage that benefits from parallelization. Spawn N worker processes.

The fix is architectural (external queue + worker scaling), not linguistic (rewrite in Go/Rust).
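
Because evidence pack generation is I/O-bound, some of the parallelism can be sketched inside a single process before any worker processes exist. The helper below is an in-process stand-in, not the external queue or multi-process setup described above; `mapWithConcurrency` is a hypothetical name.

```typescript
// Run `task` over `items` with at most `limit` tasks in flight at once.
// Results are returned in input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each lane repeatedly claims the next unclaimed item until none remain.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i], i);
    }
  }

  // Never start more lanes than items, and always at least one.
  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

This only overlaps waiting on I/O. It does not add CPU capacity and does not give job persistence or retries, which is what the external queue and separate worker processes are for.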

Scenario D: 1,000+ tenants, 5+ connectors each (enterprise scale)

Metric	Value
Entities per sync	~5,000-50,000
Authority paths	~50,000-500,000
Findings	~5,000-50,000
Evidence packs per sync	~1,000-10,000
End-to-end time	~5-30 minutes
Concurrent syncs needed	50-100

Verdict: This is where language choice starts to matter — but only for specific stages.

4. Where TypeScript Hits Limits (and Where It Doesn't)

Never a problem in TypeScript

Operation	Why
HTTP API server	Express handles thousands of req/s. API is thin (validation + enqueue).
Rule evaluation	Pure functions, small inputs. 12 rules × 5,000 entities = 60,000 function calls in <1 second.
JSON serialization	V8's JSON.parse/stringify are implemented natively and heavily optimized.
SHA256 hashing	Node.js crypto module uses OpenSSL C bindings, so hashing throughput is comparable to Go/Rust.
MongoDB operations	The MongoDB driver uses async I/O. Language doesn't matter when you're waiting for the network.
Evidence pack assembly	Fetch data, build objects. I/O-bound, not CPU-bound.
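
The rule-evaluation row can be illustrated with a small sketch of rules as pure functions applied to every entity. The `Entity`, `Finding`, and `Rule` shapes are hypothetical; the real domain types are not shown in this document.

```typescript
// Hypothetical shapes; the real entity and finding types are not shown here.
interface Entity {
  id: string;
  type: string;
  privileged: boolean;
}

interface Finding {
  ruleId: string;
  entityId: string;
}

// A rule is a pure predicate: the same entity always gives the same answer.
interface Rule {
  id: string;
  check: (entity: Entity) => boolean;
}

// Apply every rule to every entity: rules.length × entities.length calls.
function evaluate(rules: readonly Rule[], entities: readonly Entity[]): Finding[] {
  const findings: Finding[] = [];
  for (const entity of entities) {
    for (const rule of rules) {
      if (rule.check(entity)) {
        findings.push({ ruleId: rule.id, entityId: entity.id });
      }
    }
  }
  return findings;
}
```

With 12 rules and 5,000 entities this loop makes 60,000 `check` calls, matching the arithmetic in the table. Since each rule has no side effects, a rule can be unit-tested and profiled in isolation.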

Potential problems at enterprise scale

Operation	Issue	Threshold	Mitigation
BFS graph traversal (path materialization)	50,000 entities × depth-4 BFS = significant intermediate memory	>50K entities per sync	Stream entities from DB instead of loading all into memory. Use cursor-based traversal.
Diff computation	Requires all existing entities in memory for comparison	>100K entities per tenant	Incremental diff: only fetch entities matching incoming source_system + source_id. Already possible.
GC pauses	V8 garbage collector can cause 50-200ms pauses on heaps >1.5GB	>1.5GB heap	Reduce object churn. Use typed arrays for hash computation. Split into worker threads.
Single-threaded event loop	One CPU-heavy sync blocks all other syncs	>10 concurrent CPU-heavy syncs	Use `worker_threads` for CPU-bound stages, or separate worker processes.
Markdown rendering	Large evidence packs with extensive evidence generate significant string concatenation	>10K evidence packs per sync	Stream to file instead of building in memory. Unlikely to be a real issue.
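
To show where the intermediate memory in path materialization comes from, here is a minimal sketch of a depth-limited breadth-first path enumeration. It assumes the stage enumerates cycle-free paths from a start entity over an in-memory adjacency list; `materializePaths` and the `Adjacency` type are hypothetical, and the real traversal may differ.

```typescript
// Adjacency list: entity ID -> IDs of directly related entities.
type Adjacency = Map<string, string[]>;

// Enumerate cycle-free paths from `start`, expanding breadth-first,
// up to `maxDepth` edges per path.
function materializePaths(graph: Adjacency, start: string, maxDepth: number): string[][] {
  const paths: string[][] = [];
  // The frontier holds every path of the current length. This is the
  // intermediate state that grows with entity count and depth.
  let frontier: string[][] = [[start]];

  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const nextFrontier: string[][] = [];
    for (const path of frontier) {
      const last = path[path.length - 1];
      for (const neighbor of graph.get(last) ?? []) {
        if (path.includes(neighbor)) continue; // skip cycles
        const extended = [...path, neighbor];
        paths.push(extended);
        nextFrontier.push(extended);
      }
    }
    frontier = nextFrontier;
  }
  return paths;
}
```

Each level copies every surviving path, so memory scales with the number of paths rather than the number of entities. That is why the depth limit and streaming mitigations matter more than raw CPU speed here.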

5. Comparison with Alternative Technologies

Go

Aspect	Advantage over TypeScript	Disadvantage
Concurrency	Goroutines are lighter than Node.js async tasks	Marginal — both are async I/O. Go wins only for CPU-parallel work.
Memory	Concurrent GC keeps pauses short, even on multi-GB heaps	Requires rewriting all domain types, rules, evidence logic
CPU throughput	~2-5x faster for pure computation	Pipeline is I/O-bound — 2x CPU doesn't help when you're waiting for MongoDB
Type system	Comparable (Go generics + structs vs TS interfaces)	Loss of shared types with React UI
Ecosystem	Strong for infrastructure/pipeline tooling	Weaker for web UI, testing frameworks

When Go would matter: If graph traversal or rule evaluation becomes CPU-bound at >50K entities AND worker_threads isn't sufficient.

Rust

Aspect	Advantage over TypeScript	Disadvantage
Memory	Zero-cost abstractions, no GC	Massive rewrite cost. 10x development time for equivalent features.
CPU throughput	~5-20x faster for pure computation	Pipeline is I/O-bound. Rust waiting for MongoDB is same speed as TS waiting for MongoDB.
Safety	Memory safety guarantees	Node.js is already memory-safe (garbage-collected runtime), and TypeScript adds type safety. Memory bugs are not our failure mode.

When Rust would matter: Never, for this pipeline. The compute-to-I/O ratio doesn't justify it. Rust shines for real-time systems, embedded, or CPU-bound data processing (video encoding, ML inference). Our pipeline is "read from DB, apply rules, write to DB."

Python

Aspect	Advantage over TypeScript	Disadvantage
Data processing	pandas, numpy for batch analytics	We don't do statistical analysis — rules are deterministic
ML/AI	If we ever need ML-based classification	Founder requirement: no ML, no heuristics, deterministic only
Ecosystem	Strong for data engineering	Slower runtime; the GIL limits CPU parallelism across threads

When Python would matter: If the platform adds ML-based anomaly detection. Current design constraint explicitly forbids this.

6. The Real Bottleneck Progression

Based on pipeline profiling, bottlenecks will hit in this order as scale increases:

Scale            Bottleneck                  Fix
──────           ──────────                  ───
~200 entities    None                        Current stack works
~1K entities     FIFO queue serialization    External queue (BullMQ/Redis)
~5K entities     Single worker process       Multiple worker processes
~10K entities    MongoDB write throughput    Batch size tuning, write concern
~50K entities    In-memory entity diff       Incremental diff (fetch only matching source keys)
~100K entities   GC pauses during traversal  worker_threads for BFS stage
~500K entities   Single MongoDB instance     MongoDB replica set + read preference
~1M entities     All of the above combined   Consider Go for traversal stage only

Key insight: You hit 5 architectural bottlenecks before you hit a language bottleneck. Rewriting in Go or Rust before solving the architectural issues would not improve performance.

7. Recommendation

Short term (W1.1, 0-6 months): Stay with TypeScript

The pipeline is I/O-bound. Node.js async I/O is as fast as any language for this workload.
Shared types between API, pipeline, and UI reduce bugs and development time.
The team knows TypeScript. Context-switching to a new language slows delivery.
All 12 evaluator rules are pure functions — easy to test, easy to profile.

Medium term (6-18 months): Architectural changes, same language

When reaching 10+ tenants with 1,000+ entities each:

Replace in-memory queue with BullMQ/Redis — enables multiple worker processes, job persistence, retry with backoff, priority queues.
Run N worker processes — one per CPU core. Each dequeues and processes independently. Node.js cluster module or PM2.
Incremental diff — don't load all tenant entities. Query only matching source_system + source_id pairs.
MongoDB read replicas — route read-heavy operations (evaluation, evidence pack reads) to secondaries.

These changes keep TypeScript and multiply throughput 10-50x.
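
The incremental diff above can be sketched as two pure functions: one builds a MongoDB-style filter for only the incoming source_system + source_id pairs, the other compares the fetched records against the incoming ones. The `tenant_id` and `hash` fields and all function names are assumptions for illustration.

```typescript
// Hypothetical record shape; `hash` stands in for whatever change marker is stored.
interface EntityRecord {
  source_system: string;
  source_id: string;
  hash: string;
}

type DiffEvent = { kind: "created" | "updated"; key: string };

const keyOf = (e: EntityRecord): string => JSON.stringify([e.source_system, e.source_id]);

// Build a filter matching only the incoming (source_system, source_id) pairs.
// Returns null for an empty batch, because MongoDB rejects an empty $or array.
function buildIncrementalFilter(tenantId: string, incoming: readonly EntityRecord[]) {
  if (incoming.length === 0) return null;
  return {
    tenant_id: tenantId, // hypothetical field name
    $or: incoming.map((e) => ({ source_system: e.source_system, source_id: e.source_id })),
  };
}

// Compare incoming records against the (already filtered) existing records.
function diffIncoming(
  incoming: readonly EntityRecord[],
  existing: readonly EntityRecord[],
): DiffEvent[] {
  const byKey = new Map(existing.map((e) => [keyOf(e), e] as const));
  const events: DiffEvent[] = [];
  for (const e of incoming) {
    const prev = byKey.get(keyOf(e));
    if (!prev) events.push({ kind: "created", key: keyOf(e) });
    else if (prev.hash !== e.hash) events.push({ kind: "updated", key: keyOf(e) });
  }
  return events;
}
```

Note that this sketch only detects created and updated entities. Entities missing from the incoming batch are never fetched by this filter, so removals would need separate handling.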

Long term (18+ months): Selective rewrite if data shows need

If profiling at >50K entities per sync shows that BFS graph traversal or rule evaluation is genuinely CPU-bound (>50% of sync time in CPU, not I/O):

Extract traversal stage to a Go microservice — accepts entity graph, returns authority paths. Called from the TypeScript pipeline via HTTP or gRPC.
Keep everything else in TypeScript — API, evaluation rules, evidence packs, UI all stay.
Never rewrite the whole platform — the ROI is negative. A full rewrite costs 6-12 months and introduces regressions with zero feature progress.
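
The ">50% of sync time in CPU" test can be approximated with Node's built-in `process.cpuUsage()` and `performance.now()`. The sketch below is one possible way to measure it; `profileStage` and `cpuShare` are hypothetical helpers, not existing pipeline code.

```typescript
import { performance } from "node:perf_hooks";

interface StageProfile {
  wallMs: number;   // elapsed wall-clock time
  cpuMs: number;    // user + system CPU time consumed by the process
  cpuShare: number; // cpuMs / wallMs, clamped to [0, 1]
}

// Fraction of wall-clock time spent on CPU, clamped to [0, 1].
function cpuShare(cpuMs: number, wallMs: number): number {
  if (wallMs <= 0) return 0;
  return Math.min(1, cpuMs / wallMs);
}

// Run one stage and report how much of its duration was CPU time.
async function profileStage<T>(
  stage: () => Promise<T>,
): Promise<{ result: T; profile: StageProfile }> {
  const wallStart = performance.now();
  const cpuStart = process.cpuUsage();
  const result = await stage();
  // Passing the earlier reading returns the delta, in microseconds.
  const cpu = process.cpuUsage(cpuStart);
  const wallMs = performance.now() - wallStart;
  const cpuMs = (cpu.user + cpu.system) / 1000;
  return { result, profile: { wallMs, cpuMs, cpuShare: cpuShare(cpuMs, wallMs) } };
}
```

`process.cpuUsage()` is process-wide, so the number is only meaningful when one sync runs at a time, and runtime threads (for example GC) can push CPU time above wall time, hence the clamp. A stage whose `cpuShare` stays above 0.5 at realistic scale would meet the threshold described above.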

What NOT to do

Anti-pattern	Why
Rewrite everything in Go/Rust now	Pipeline is I/O-bound. New language doesn't help. Costs 6+ months with zero feature progress.
Add Kafka/RabbitMQ for job queue	Massive infrastructure complexity. BullMQ + Redis gives the same benefits with 10% of the ops burden.
Premature worker_threads	Adds complexity (shared memory, message passing). Only justified when profiling shows GC or CPU as the bottleneck.
Add a separate "pipeline service"	Splits the codebase for no performance gain. Monolith is faster to develop and deploy until there's a specific scaling reason to split.

8. Decision Summary

Question	Answer
Is TypeScript sufficient for W1.1?	Yes. No contest.
Is TypeScript sufficient for 100 tenants?	Yes, with architectural changes (external queue, worker processes).
Is TypeScript sufficient for 1,000+ tenants?	Probably yes, with incremental diff + read replicas. Profile first.
When would we consider Go?	When profiling shows >50% CPU time in graph traversal at >50K entities. Extract that one stage.
When would we consider Rust?	Never for this pipeline. Wrong tool for I/O-bound batch processing.
When would we consider Python?	Only if we add ML-based detection, which the current design explicitly forbids.

Bottom line: Fix the architecture (queue, workers, incremental diff) before considering a language change. The first 5 performance walls are all architectural, not linguistic.

Next Action

Status: adopted and shipped. TypeScript/Node.js is confirmed as the primary platform stack, with Python for connectors. Reflected in 11-platform-mental-model.md and the current repo structure. No further action required.
