TCL/TK in production in 2018

Let me describe the stack I walked into at Talentera in June 2018: a multi-tenant B2B recruitment SaaS serving government ministries and enterprise clients across the MENA region, running a TCL core on AOLserver 4.x, with PHP 7 on the side for the newer features, Node.js for some API layers, and a Camunda BPMN engine for workflow orchestration. This was not a prototype. This was production. This was processing real hiring decisions for real governments. And yes, TCL 8.4 and 8.6, in the year of our lord 2018.

I know what you’re thinking. Let me stop you.

TCL/TK is not a joke

The first reaction most engineers have when they see TCL in a production stack is “oh, they haven’t gotten around to migrating yet.” That framing is wrong and it leads to bad decisions.

TCL (Tool Command Language) was created in 1988 and designed for one thing: scripting and gluing heterogeneous systems together. AOLserver, which was Talentera’s web runtime, uses TCL as its native extension language. AOLserver itself is genuinely impressive infrastructure: event-driven, multi-threaded (with a thread pool, not one thread per request), with efficient database connection pooling built in. NaviServer, a fork of AOLserver, is still under active development because the model actually works. When Google’s web infrastructure team was still doing first principles work in the early 2000s, AOLserver architecture got cited approvingly. This isn’t legacy by decay; it’s legacy by longevity, which is a different thing.

TCL 8.4 and 8.6 aren’t version numbers you’d pick in 2018 if you were starting fresh, but they’re version numbers that mean your runtime is stable. No breaking changes. No surprise deprecations. Every line of TCL code written before you arrived still runs the way it was written. For a multi-tenant SaaS where uptime directly affects governments running hiring rounds for thousands of candidates, boring stability is a competitive advantage.

The language itself has a peculiar beauty. Everything is a string. Commands are just lists. The execution model is remarkably composable. For the kind of request processing Talentera was doing (forming queries, assembling responses, manipulating structured data in tight server-side loops) it’s honestly fine. Not great, but fine. And “fine, stable, and already written” beats “better, but requires a rewrite” approximately every time when you have enterprise SLAs to maintain.

What “legacy stack” actually costs

Here’s the honest accounting: the real cost of a legacy stack isn’t the runtime. It’s the operational surface.

AOLserver + TCL doesn’t have a rich modern observability story. Adding distributed tracing when your core is a TCL namespace running inside a thread pool on a 2003-era server model requires you to solve problems that the open-source ecosystem has already solved for Go, Java, and Node.js, but not for this. That’s engineering time that doesn’t go toward features. When I joined, the team was spending a disproportionate share of its attention on “how do we get visibility into what TCL is doing in production” compared to “what should TCL be doing next.”

That’s the actual problem to solve. Not “rewrite TCL” — that’s a multi-year project that might kill the company — but “reduce the cognitive overhead of operating TCL so the team can focus on building product.”

The answer we landed on: don’t migrate the TCL core. Instead, isolate it.

DDD + microservices as a wrapper, not a replacement

The architectural approach I led was domain-driven design layered around the TCL core, not instead of it. Camunda BPMN became the workflow orchestrator for the processes that were growing too complex to manage in pure TCL — multi-stage approval chains, conditional branching for different government client configurations, document handling pipelines. Camunda speaks Java, which meant Java 8 services that owned the workflow definition layer, with TCL acting as one of the worker types executing specific task implementations.
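To make the "TCL as one worker type" idea concrete, here is a minimal in-memory sketch of the pattern, loosely modelled on the fetch-and-lock style of external task handling. It is not the Talentera code and it does not call Camunda: the `Orchestrator` class, the topic names, and both handlers are hypothetical stand-ins, and the "legacy" handler simply pretends to be a call into the TCL core.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Task:
    task_id: int
    topic: str                         # which kind of worker should pick this up
    variables: Dict[str, str]
    locked_by: Optional[str] = None    # worker that currently owns the task
    done: bool = False


@dataclass
class Orchestrator:
    """Hypothetical in-memory stand-in for the workflow engine's task queue."""
    tasks: List[Task] = field(default_factory=list)

    def enqueue(self, topic: str, variables: Dict[str, str]) -> Task:
        task = Task(len(self.tasks) + 1, topic, dict(variables))
        self.tasks.append(task)
        return task

    def fetch_and_lock(self, topic: str, worker_id: str) -> Optional[Task]:
        # Hand out the first unlocked, unfinished task for this topic.
        for task in self.tasks:
            if task.topic == topic and task.locked_by is None and not task.done:
                task.locked_by = worker_id
                return task
        return None

    def complete(self, task: Task, result: Dict[str, str]) -> None:
        task.variables.update(result)
        task.done = True


def legacy_screening_worker(variables: Dict[str, str]) -> Dict[str, str]:
    # Assumption: in a real system this would be a call into the TCL core.
    return {"screening": "passed" if variables.get("cv") else "rejected"}


def approval_worker(variables: Dict[str, str]) -> Dict[str, str]:
    # Assumption: a newer JVM-side service would own this step.
    return {"approval": "pending-review"}


# Topic -> (worker id, handler). The engine never needs to know the language.
WORKERS: Dict[str, Tuple[str, Callable[[Dict[str, str]], Dict[str, str]]]] = {
    "legacy-screening": ("tcl-worker", legacy_screening_worker),
    "approval": ("jvm-worker", approval_worker),
}


def run_once(engine: Orchestrator) -> int:
    """Drain every topic once and return how many tasks were completed."""
    completed = 0
    for topic, (worker_id, handler) in WORKERS.items():
        while True:
            task = engine.fetch_and_lock(topic, worker_id)
            if task is None:
                break
            engine.complete(task, handler(task.variables))
            completed += 1
    return completed
```

The point of the sketch is the shape of the contract: the orchestrator only knows topics and variables, so swapping the implementation behind a topic does not change the process definition.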

Keycloak went in for identity and access management — OAuth2 and OpenID Connect across the multi-tenant landscape. This was the right call in 2018. Before Keycloak, tenant identity was handled in a way that was… artisanal. Custom session tokens, per-tenant auth logic scattered across request handlers, limited federation support. Keycloak centralised that, gave us standards-compliant auth, and let the newer services (Kotlin microservices, Node.js API layers) participate in the same identity model without needing to understand the TCL session internals.
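As an illustration of what "participating in the same identity model" can look like from a newer service, here is a small sketch of a tenant and role check over token claims. It assumes a realm-per-tenant layout (my assumption for the example, not something stated above), assumes the token signature has already been verified by a proper JWT library, and uses made-up realm and role names. The `iss` and `realm_access.roles` claims are the ones Keycloak puts in its access tokens.

```python
from typing import Any, Dict, Optional


def tenant_from_issuer(issuer: str) -> Optional[str]:
    """Extract the realm name from an issuer URL of the form .../realms/<realm>."""
    _, found, realm = issuer.partition("/realms/")
    realm = realm.strip("/")
    if not found or not realm or "/" in realm:
        return None
    return realm


def is_authorized(claims: Dict[str, Any], tenant: str,
                  required_role: str, now: int) -> bool:
    """Check expiry, tenant, and role on ALREADY VERIFIED token claims.

    Signature verification is deliberately out of scope for this sketch.
    """
    if claims.get("exp", 0) <= now:
        return False  # token expired
    if tenant_from_issuer(claims.get("iss", "")) != tenant:
        return False  # token was issued for a different tenant
    roles = claims.get("realm_access", {}).get("roles", [])
    return required_role in roles
```

A service written this way never needs to read a legacy session token; it only needs the claims and the tenant it is serving.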

RabbitMQ took the async messaging. Redis took the caching. The new mobile features — Ionic and eventually Flutter — talked to Node.js API gateways that translated between the mobile clients’ expectations and the TCL-based backend reality. None of this required touching the TCL core. The TCL core kept running. The new services grew up around it like a city growing around a river.
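The gateway's translation job can be sketched in a few lines. This is a hypothetical example, not the real gateway (which was Node.js): the field names in `FIELD_MAP` are invented, and it assumes, purely for illustration, that the legacy endpoint takes form-encoded parameters and answers with form-encoded key/value pairs.

```python
import json
from urllib.parse import parse_qsl, urlencode

# Hypothetical mapping from mobile-facing JSON fields to legacy parameter names.
FIELD_MAP = {"jobId": "job_id", "pageSize": "limit"}


def to_legacy_query(mobile_request_json: str) -> str:
    """Turn a mobile JSON request into a form-encoded legacy query string."""
    payload = json.loads(mobile_request_json)
    # Unknown fields are dropped instead of being forwarded to the legacy core.
    legacy = {FIELD_MAP[k]: str(v) for k, v in payload.items() if k in FIELD_MAP}
    return urlencode(sorted(legacy.items()))


def to_mobile_response(legacy_body: str) -> str:
    """Turn a form-encoded legacy response into JSON with mobile field names."""
    reverse = {legacy: mobile for mobile, legacy in FIELD_MAP.items()}
    pairs = parse_qsl(legacy_body)
    return json.dumps({reverse.get(k, k): v for k, v in pairs}, sort_keys=True)
```

Keeping the mapping in one table at the edge is what lets the backend stay untouched: the mobile clients and the legacy core each keep their own vocabulary.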

The week I shipped Kotlin and TCL in the same sprint

There’s a specific kind of cognitive whiplash that comes from context-switching between TCL 8.4 and Kotlin in the same work week. In TCL, you’re thinking about string manipulation and namespace management and making sure your proc argument lists are right. In Kotlin, you’re thinking about coroutines and data classes and null safety. The paradigms are so different they’re almost orthogonal.

What I found, which surprised me, was that this is actually fine. The strict separation of concerns that the architecture enforced — TCL owns the request handling layer, Kotlin owns the new domain services, Camunda owns the process definitions — meant that context switching was clean. When I was in TCL I was thinking about TCL problems. When I was in Kotlin I was thinking about Kotlin problems. The contexts didn’t bleed because the boundaries were real.

This is one of the underrated arguments for explicitly bounded architecture: it makes multi-technology shops operationally tractable. If you’re building a monolith and trying to introduce a new language, you’re going to have boundary confusion everywhere — where does the PHP end and the Node.js begin? When you’ve done DDD properly, the boundary is a deployment boundary, not just a namespace. That clarity matters when your team is switching between a 30-year-old scripting language and a modern JVM language in the same sprint.

What migration actually looks like

By the end of my time at Talentera I had a clear view of what the migration path was, even though we hadn’t completed it. The answer isn’t “rewrite TCL.” The answer is:

1. Identify the domains where the TCL implementation is actually causing problems. Not “this is old,” but “this is actively limiting us”: places where the lack of a richer type system or the limited observability story is costing real engineering time.
2. Define a new service boundary for that domain. Own the data for that domain in the new service.
3. Have Camunda mediate the transition: the workflow engine can route tasks to the new service as the new service comes online, while the TCL implementation handles the same tasks in parallel until you’re confident.
4. When you’ve drained the TCL implementation of traffic for that domain, retire it.
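The parallel-run step above is the one people tend to hand-wave, so here is a minimal sketch of it. Everything in it is hypothetical: `ParallelRunRouter`, the handlers, and the cut-over rule (a minimum number of comparisons with zero mismatches) are illustrative choices, not the mechanism we actually ran.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Handler = Callable[[Dict[str, str]], Dict[str, str]]


@dataclass
class ParallelRunRouter:
    """Send each task to both implementations and compare the answers."""
    legacy: Handler                      # stands in for the TCL implementation
    candidate: Handler                   # stands in for the new service
    serve_from_candidate: bool = False   # flip once you are confident
    compared: int = 0
    mismatches: List[Dict[str, str]] = field(default_factory=list)

    def handle(self, task: Dict[str, str]) -> Dict[str, str]:
        legacy_result = self.legacy(task)
        candidate_result = self.candidate(task)
        self.compared += 1
        if legacy_result != candidate_result:
            # Keep the input so the disagreement can be investigated later.
            self.mismatches.append(task)
        return candidate_result if self.serve_from_candidate else legacy_result

    def ready_to_cut_over(self, min_compared: int) -> bool:
        # Illustrative rule: enough traffic seen, and no disagreements at all.
        return self.compared >= min_compared and not self.mismatches


def legacy_rank(task: Dict[str, str]) -> Dict[str, str]:
    # Hypothetical legacy behaviour for one domain.
    return {"stage": "interview" if int(task["score"]) >= 70 else "rejected"}


def new_rank(task: Dict[str, str]) -> Dict[str, str]:
    # Hypothetical reimplementation that must match the legacy edge cases.
    return {"stage": "interview" if int(task["score"]) >= 70 else "rejected"}
```

While `serve_from_candidate` is false, callers only ever see the legacy answer, so a buggy new service costs you a log entry and nothing else. In practice the threshold and the tolerance for mismatches are judgment calls per domain.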

Per domain, that’s a migration timeline measured in quarters, not years. And it doesn’t require you to rewrite everything at once. The critical insight is that “migration” is not “replacement”; it’s “systematic reallocation of ownership.” TCL doesn’t need to go away. It just needs to stop owning domains it’s not good at owning.

The thing nobody tells you about inheriting legacy stacks

When you join a company running a stack like this, there’s a temptation to make your mark by proposing a full rewrite. Resisting that temptation is one of the most important architectural decisions you’ll make. A full rewrite of a production multi-tenant SaaS is a bet-the-company project. It almost never finishes on schedule. The new thing almost never replicates all the edge cases the old thing handled. And during the rewrite, you’re not building new product — you’re building a replacement for existing product. Your competitors are not standing still.

The more valuable skill is learning to read a stack charitably. TCL on AOLserver isn’t technical debt just because it’s old. It’s technical debt in specific places where its operational model creates real friction. Understand where those places are. Fix those places specifically. Leave the rest alone.

I shipped Kotlin microservices in that environment. I shipped BPMN workflow orchestration. I shipped OAuth2-federated identity across tenants. The TCL core kept running the whole time.

That’s the job.