A Cross-Instance Distributed Lock: Solving Nonce Collisions in a Telegram Trading Bot

When building horizontally scalable systems, you often need multiple instances of a service to coordinate access to a shared resource, so that no matter how many instances are running, only one can use the resource at a time. In my project, Pulsonic Trading Bot, this shared resource was per-wallet transaction signing and broadcasting, which had to happen without nonce collisions.
In this article, I will share how I solved this problem by building a horizontally scalable transaction signing and broadcasting system around a distributed lock implemented with Redis.
The Problem: Nonce Collisions in a Telegram Trading Bot
What is a nonce in EVM blockchains?
On EVM blockchains like Ethereum, a transaction consists of several fields, one of which is the nonce. The nonce is a unique number that represents the count of transactions sent from a particular address. It serves two main purposes:
- Ordering: The nonce ensures that transactions from the same address are processed in the intended order. A transaction with nonce 9, for example, will not be mined until all transactions with nonces 0-8 have been mined.
- Uniqueness: A nonce can be used only once per address, which protects against replay attacks (each signed transaction can be executed only once).
Why do nonce collisions happen?
Because multiple instances of the signing service run at the same time, two instances can pick up queued transactions for the same wallet simultaneously. Both query the blockchain for that wallet's nonce and both get the same value, so both sign their transactions with that nonce and broadcast them. The network accepts the first transaction it sees with that nonce and rejects the second one as a duplicate.
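The interleaving described above can be reproduced with a toy model. This is a minimal sketch, not real chain behavior: `FakeChain` is a hypothetical stand-in that accepts only the next expected nonce, and the two "signers" are just two reads followed by two broadcasts.

```typescript
// Toy model of a single EVM account (hypothetical, for illustration only):
// the chain accepts a transaction only if it carries the next expected nonce.
class FakeChain {
  private nextNonce = 0;

  // Stand-in for the RPC call that returns the account's transaction count.
  getNonce(): number {
    return this.nextNonce;
  }

  // Returns true if the transaction is accepted, false if the nonce does not match.
  broadcast(tx: { nonce: number; payload: string }): boolean {
    if (tx.nonce !== this.nextNonce) return false;
    this.nextNonce++;
    return true;
  }
}

const chain = new FakeChain();

// Both signer instances read the nonce before either one broadcasts.
const nonceSeenByA = chain.getNonce();
const nonceSeenByB = chain.getNonce();

const acceptedA = chain.broadcast({ nonce: nonceSeenByA, payload: "swap #1" });
const acceptedB = chain.broadcast({ nonce: nonceSeenByB, payload: "swap #2" });

// Both signers saw nonce 0; only the first broadcast is accepted.
console.log({ nonceSeenByA, nonceSeenByB, acceptedA, acceptedB });
```

Nothing is wrong with either signer in isolation. The collision comes purely from the gap between reading the nonce and using it.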
Initial Solution Attempt: BullMQ
Since my system already uses BullMQ for much of its background processing, I first looked for a solution within BullMQ. Unfortunately, BullMQ only offers concurrency control at the queue level, not at the resource level.
This means I could set the concurrency of the signing queue to 1, but then every transaction from every user's wallet would be processed one at a time. That clearly does not scale, so it was an easy pass.
The Solution: A Cross-Instance Distributed Lock with Redis
To solve this problem, I implemented a distributed lock using Redis. The lock can be acquired for a wallet address by a signing instance, and while the lock is held by an instance, any other instance that tries to sign a transaction for the same wallet will have to wait or reschedule the transaction and move to the next one in the queue.
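As a rough sketch of that flow, the signing path might look like the following. The `WalletLock` and `Chain` interfaces, the `signAndBroadcast` name, and the 2 second wait and 10 second TTL are all assumptions made for illustration; they are not taken from the bot's code.

```typescript
// Hypothetical shapes, named for illustration only.
interface LockHandle {
  release(): Promise<boolean>;
}
interface WalletLock {
  acquireLock(resource: string, maxWaitTime: number, ttl: number): Promise<LockHandle | null>;
}
interface Chain {
  getNonce(address: string): Promise<number>;
  broadcast(address: string, nonce: number, payload: string): Promise<void>;
}

type SignResult = "broadcast" | "rescheduled";

async function signAndBroadcast(
  lock: WalletLock,
  chain: Chain,
  address: string,
  payload: string
): Promise<SignResult> {
  // Wait up to 2s for the wallet lock and hold it for at most 10s (arbitrary values).
  const handle = await lock.acquireLock(address, 2_000, 10_000);
  if (!handle) return "rescheduled"; // another instance is signing for this wallet

  try {
    // The nonce is read and consumed while no other instance can do the same.
    const nonce = await chain.getNonce(address);
    await chain.broadcast(address, nonce, payload);
    return "broadcast";
  } finally {
    await handle.release();
  }
}
```

The lock is keyed by wallet address, so transactions for different wallets still run in parallel. Only jobs that target the same wallet are serialized.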
Why Redis
Some would argue that we could use a file lock or a database lock, but I chose Redis for several reasons:
- Our bot is set up on two servers in two different locations, which rules out file locks.
- We use MongoDB. Atomic operations are possible there with transactions, but lock traffic would compete with unrelated workloads on the same database. Besides, who wants to turn their main database into a lock manager?
- We already use Redis for caching and streaming.
- Redis commands are atomic and keys have built-in TTL support, so SET key value NX PX ttl acquires the lock and sets its expiry in a single atomic command.
- Lua scripts let us implement safe release and extend operations.
- Redis is battle-tested and widely used for distributed locks (for example, Redlock).
Why not Redlock?
I could have used Redlock.
I didn’t — partly because the problem didn’t require it, and partly because building things is how I understand them.
This lock was created to solve a very specific problem: serializing nonce usage across signer instances. The blockchain already enforces the final safety guarantees, so a simpler Redis-based lock with TTL and ownership checks is sufficient in practice.
Beyond that, implementing the lock myself made the failure modes explicit and easy to reason about. I knew exactly what could go wrong, why it was acceptable, and how the system would behave under stress or partial failure.
Sometimes the right choice isn’t the most complex or “official” solution — it’s the one you fully understand and can confidently operate.
Lock Requirements
The distributed lock implementation should have the following features:
- Acquire Lock: a caller should be able to acquire a lock for a resource, with a specific TTL (to prevent deadlocks) and an optional max wait time (to prevent waiting forever).
- Release Lock: an instance should be able to release the lock it holds for a wallet address once it has finished signing and broadcasting the transaction.
- Extend Lock: an instance should be able to extend the TTL of a lock if it needs more time to complete signing and broadcasting. This prevents the lock from expiring while the instance is still working on the transaction, which could let another instance acquire the lock and cause a nonce collision.
- Safe: the lock can only be released by the instance that acquired it, or by reaching its TTL.
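To make these requirements concrete before bringing Redis in, here is a small in-memory model of the same contract. It is only a sketch for reasoning about the semantics: `InMemoryLock` is a hypothetical single-process class, the clock is injected so expiry can be exercised without waiting, and the optional max wait time is left out.

```typescript
import { randomUUID } from "node:crypto";

// In-memory stand-in for the lock contract (illustrative only, no Redis involved).
class InMemoryLock {
  private entries = new Map<string, { owner: string; expiresAt: number }>();
  private now: () => number;

  constructor(now: () => number) {
    this.now = now;
  }

  // Acquire: succeeds only if the resource is free or its previous TTL has elapsed.
  acquire(resource: string, ttlMs: number): string | null {
    const entry = this.entries.get(resource);
    if (entry && entry.expiresAt > this.now()) return null;
    const owner = randomUUID();
    this.entries.set(resource, { owner, expiresAt: this.now() + ttlMs });
    return owner;
  }

  // Extend: only the current, unexpired owner may move the expiry forward.
  extend(resource: string, owner: string, ttlMs: number): boolean {
    const entry = this.entries.get(resource);
    if (!entry || entry.owner !== owner || entry.expiresAt <= this.now()) return false;
    entry.expiresAt = this.now() + ttlMs;
    return true;
  }

  // Release: does nothing unless the caller still owns the lock.
  release(resource: string, owner: string): boolean {
    const entry = this.entries.get(resource);
    if (!entry || entry.owner !== owner || entry.expiresAt <= this.now()) return false;
    return this.entries.delete(resource);
  }
}

let nowMs = 0;
const lock = new InMemoryLock(() => nowMs);

const tokenA = lock.acquire("wallet-0", 100); // A holds the lock
const blocked = lock.acquire("wallet-0", 100); // null: the lock is still held
const extended = tokenA !== null && lock.extend("wallet-0", tokenA, 100); // true
nowMs = 150; // A's TTL elapses
const tokenB = lock.acquire("wallet-0", 100); // B acquires after expiry
const staleRelease = tokenA !== null && lock.release("wallet-0", tokenA); // false: A no longer owns it
```

The Redis version built in the rest of the article follows the same rules, with the key's value playing the role of `owner` and the key's TTL playing the role of `expiresAt`.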
Implementation
Whenever I have to implement something new like this, I usually isolate the implementation first. So for now I am going to forget about the nonce collision issue and focus solely on the lock: I will start a new project and implement the lock there.
# --- npm ---
# create a new directory
mkdir distributed-lock
# enter the directory
cd distributed-lock
# initialize a new npm project
npm init -y
# install typescript
npm install -D typescript @types/node
# initialize typescript configuration
npx tsc --init
# install redis client
npm install redis

# --- pnpm ---
# create a new directory
mkdir distributed-lock
# enter the directory
cd distributed-lock
# initialize a new project
pnpm init
# install typescript
pnpm install -D typescript @types/node
# initialize typescript configuration
pnpm exec tsc --init
# install redis client
pnpm install redis

# --- yarn ---
# create a new directory
mkdir distributed-lock
# enter the directory
cd distributed-lock
# initialize a new project
yarn init -y
# install typescript
yarn add -D typescript @types/node
# initialize typescript configuration
npx tsc --init
# install redis client
yarn add redis

# --- bun ---
# create a new directory
mkdir distributed-lock
# enter the directory
cd distributed-lock
# initialize a new project
bun init -y
# install redis client
bun add redis
Open the folder in your favorite IDE, and let’s adjust the tsconfig.json and package.json.
// tsconfig.json
// ....
"compilerOptions": {
"module": "nodenext",
"moduleResolution": "NodeNext",
"outDir": "dist",
"sourceMap": false,
"declarationMap": false,
"types": ["node"],
},
//...
// package.json
// ....
"type": "module",
"scripts": {
// ...
"start": "tsc && node ./dist/index.js"
},
//...
Let’s create index.ts and create a race condition to start the implementation of the lock.
The idea is to simulate multiple instances reading and updating the nonce. In this example, a file holds the nonce for the wallet. We loop n times, each time reading the current nonce from the file, incrementing it in memory, and saving it back to the file. If we run this script concurrently, we will see problems with the final nonce value.
import fs from 'fs';
const walletName = '0';
const nonceFileName = `nonce-${walletName}.txt`;
const incrementCount = 10;
function resetNonce() {
fs.writeFileSync(nonceFileName, '0');
}
resetNonce();
/*
simulate incrementing nonces for each wallet `n` times
each time we read the nonce (simulate RPC request getNonce())
and then increment in memory, and then save to file
*/
for (let i = 0; i < incrementCount; i++) {
let nonce = parseInt(fs.readFileSync(nonceFileName, 'utf-8'), 10);
nonce++;
fs.writeFileSync(nonceFileName, nonce.toString());
}
If we run this script once and then check the nonce-0.txt file, we will see the final nonce value is 10.
npx tsc && node ./dist/index.js && cat nonce-0.txt
The output will be 10.
Let’s install the concurrently package to run multiple instances of the script at the same time.
npm install -D concurrently
pnpm install -D concurrently
yarn add -D concurrently
bun add -D concurrently
And for convenience, we can add a start:many script to run two instances of the script at the same time.
"scripts": {
// ...
"start:many": "tsc && concurrently \"node ./dist/index.js\" \"node ./dist/index.js\""
},
Running the start:many script, we expect the final nonce value to be 20, but we will get a different value, possibly even NaN.
npm run start:many && cat nonce-0.txt
Yep, we got NaN. With both processes truncating and rewriting the file at the same time, one of them read the file while it was empty or partially written, so parseInt could not parse the value and returned NaN. This is the race condition we are trying to solve.
Let’s create a docker-compose.yml file to start a Redis instance.
services:
  redis:
    image: redis:latest
    ports:
      - '6379:6379'
Start the Redis instance using Docker Compose.
docker compose up -d
Let’s create a Redis client to be used later by the lock implementation. Create a new file redisClient.ts.
import { createClient, type RedisClientType } from 'redis';
const client: RedisClientType = createClient({
url: `redis://localhost:6379`, // change with your host and port if needed
socket: {
connectTimeout: 5000
}
});
/*
If using an older version of Node that doesn't await top-level
Use something like
const redisClientPromise = client.connect();
Then whenever you need the client, do
const redisClient = await redisClientPromise;
*/
export const redisClient: RedisClientType = await client.connect();
Our DistributedLock class will have at least three methods: acquireLock, releaseLock, and touch. Create a new file DistributedLock.ts.
import type { RedisClientType } from 'redis';
import { redisClient } from './redisClient.js';
class DistributedLock {
// to avoid any key collisions
private static NAMESPACE: string = `cache-DistributedLock`;
private redisClient: RedisClientType;
constructor(redisClient: RedisClientType) {
this.redisClient = redisClient;
}
async acquireLock(resource: string, maxWaitTime: number, ttl: number): Promise<boolean> {
//
return true;
}
async releaseLock(resource: string): Promise<void> {
//
}
async touch(resource: string, ttl: number): Promise<boolean> {
//
return true;
}
}
export const lockManager = new DistributedLock(redisClient);
To acquire a lock, we need to write a key into Redis, and only write it if that key does not already exist. Fortunately, Redis has first-class support for this through SET with the NX (set only if the key does Not eXist) option, so our acquireLock method will look like this:
//...
async acquireLock(resource: string, maxWaitTime: number, ttl: number): Promise<boolean> {
const startTime = performance.now();
const lockKey = `${DistributedLock.NAMESPACE}-lock-${resource}`;
while (true) {
const result = await this.redisClient.set(lockKey, "TRUE", {
condition: "NX",
expiration: {
type: "PX", // ttl in milliseconds, if you want to use seconds, use EX and ttl in seconds
value: ttl
}
});
if (result === 'OK') {
return true;
}
if (performance.now() - startTime >= maxWaitTime) { // respect max wait
return false;
}
await new Promise((resolve) => setTimeout(resolve, 100));
}
}
//...
Pretty straightforward. Now we have an initial lock acquisition, let’s create a release mechanism.
To release, we need to delete the key from Redis; that’s also straightforward, so our releaseLock method will look like this:
async releaseLock(resource: string): Promise<void> {
const lockKey = `${DistributedLock.NAMESPACE}-lock-${resource}`;
await this.redisClient.del(lockKey);
}
Now let’s try to use this lock in our index.ts to solve the race condition issues.
import fs from 'fs';
import { lockManager } from './DistributedLock.js';
import { redisClient } from './redisClient.js';
const walletName = '0';
const nonceFileName = `nonce-${walletName}.txt`;
const incrementCount = 10;
function resetNonce() {
fs.writeFileSync(nonceFileName, '0');
}
resetNonce();
/*
simulate incrementing nonces for each wallet `n` times
each time we read the nonce (simulate RPC request getNonce())
and then increment in memory, and then save to file
*/
for (let i = 0; i < incrementCount; i++) {
const locked = await lockManager.acquireLock(walletName, 5000, 5000);
if (!locked) {
console.log(`Failed to acquire lock for wallet ${walletName} on attempt ${i}`);
continue;
}
try {
let nonce = parseInt(fs.readFileSync(nonceFileName, 'utf-8'), 10);
nonce++;
fs.writeFileSync(nonceFileName, nonce.toString());
} finally {
await lockManager.releaseLock(walletName);
}
}
redisClient.quit();
Now running the start:many script, we should see the final nonce value is 20 as expected, and no more NaN values.
npm run start:many && cat nonce-0.txt
Even when we change the incrementCount to a higher value like 1_000_000, we get exactly 2_000_000.
So we did it, right? Well, not quite. This works, but the implementation falls short on the fourth requirement: only the instance that acquired the lock can release it. Currently, anyone needing resource x can call releaseLock(x) and release the lock even though it never belonged to them. This should not be allowed.
To fix this, we generate a random token when acquiring the lock and store it as the key's value, and we change the release method to delete the key only if its value matches that token. This way, only the holder of the lock can release it.
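The most likely way to hit this in practice is a holder that outlives its TTL. The sketch below replays that scenario against a hypothetical in-memory `FakeRedis` with a manual clock; it only models the two commands used so far and is not the real client.

```typescript
// Minimal in-memory stand-in for SET NX PX and DEL (illustrative only).
class FakeRedis {
  private store = new Map<string, { value: string; expiresAt: number }>();
  now = 0; // manual clock in milliseconds

  setNxPx(key: string, value: string, ttlMs: number): boolean {
    const entry = this.store.get(key);
    if (entry && entry.expiresAt > this.now) return false; // key exists and is not expired
    this.store.set(key, { value, expiresAt: this.now + ttlMs });
    return true;
  }

  del(key: string): void {
    this.store.delete(key);
  }
}

const redis = new FakeRedis();
const key = "lock-wallet-0";

const aAcquired = redis.setNxPx(key, "TRUE", 100); // instance A takes the lock
redis.now = 150; // A stalls past its TTL
const bAcquired = redis.setNxPx(key, "TRUE", 100); // instance B legitimately takes over
redis.del(key); // A finishes and releases a lock that is now B's
const cAcquired = redis.setNxPx(key, "TRUE", 100); // instance C gets in while B is still working

console.log({ aAcquired, bAcquired, cAcquired }); // all three are true
```

B and C now both believe they hold the lock for the same wallet, which is exactly the situation the lock exists to prevent.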
Let's adjust the acquireLock method to generate a random token and store it as the value of the lock key:
import crypto from 'crypto';
//...
async acquireLock(resource: string, maxWaitTime: number, ttl: number): Promise<string | null> {
const startTime = performance.now();
const lockKey = `${DistributedLock.NAMESPACE}-lock-${resource}`;
const value = crypto.randomBytes(16).toString('hex');
while (true) {
const result = await this.redisClient.set(lockKey, value, {
condition: "NX",
expiration: {
type: "PX", // ttl in milliseconds, if you want to use seconds, use EX and ttl in seconds
value: ttl
}
});
if (result === 'OK') {
return value;
}
if (performance.now() - startTime >= maxWaitTime) { // respect max wait
return null;
}
await new Promise((resolve) => setTimeout(resolve, 100));
}
}
//...
We use the crypto module to generate a random string and use it as the lock value. We also changed the return type of the function; now the function returns the lock value, which the caller could use later to release the lock.
Now to the tricky part, we need to change the lock release method; we can no longer just delete the key from Redis; the value must match the lock value to be released.
We can’t do something like this either.
async releaseLock(resource: string, value: string): Promise<void> {
const lockKey = `${DistributedLock.NAMESPACE}-lock-${resource}`;
const currentValue = await this.redisClient.get(lockKey);
if (currentValue === value) {
await this.redisClient.del(lockKey);
}
}
Because this is not atomic, there is no guarantee that the value won’t change between the time we read the value and the time we delete the key.
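To see why the gap matters, here is the same kind of replay for the check-then-delete version. `FakeRedis` is again a hypothetical in-memory stand-in with a manual clock, and the token strings are made up for the example.

```typescript
// Minimal in-memory stand-in for SET NX PX, GET and DEL (illustrative only).
class FakeRedis {
  private store = new Map<string, { value: string; expiresAt: number }>();
  now = 0; // manual clock in milliseconds

  setNxPx(key: string, value: string, ttlMs: number): boolean {
    const entry = this.store.get(key);
    if (entry && entry.expiresAt > this.now) return false;
    this.store.set(key, { value, expiresAt: this.now + ttlMs });
    return true;
  }

  get(key: string): string | null {
    const entry = this.store.get(key);
    return entry && entry.expiresAt > this.now ? entry.value : null;
  }

  del(key: string): void {
    this.store.delete(key);
  }
}

const redis = new FakeRedis();
const key = "lock-wallet-0";

redis.setNxPx(key, "token-A", 100); // A holds the lock
redis.now = 99;
const seenByA = redis.get(key); // "token-A": A's ownership check passes
redis.now = 101; // the TTL elapses between A's GET and A's DEL
const bAcquired = redis.setNxPx(key, "token-B", 100); // B takes the lock
if (seenByA === "token-A") redis.del(key); // A acts on a stale check and deletes B's lock
const ownerAfter = redis.get(key); // null: B's lock is gone

console.log({ seenByA, bAcquired, ownerAfter });
```

The comparison was correct when it ran, but it was no longer true by the time the delete executed.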
To make the check and the delete atomic, we need a Lua script. Redis can run Lua scripts, and each script executes atomically. Below is the script we are going to use.
if redis.call("get", KEYS[1]) == ARGV[1] then
return redis.call("del", KEYS[1])
else
return 0
end
This script checks if the value of the provided key matches the provided value and only deletes the key if the value matches.
Now our releaseLock method will look like this:
//...
async releaseLock(resource: string, value: string): Promise<boolean> {
const lockKey = `${DistributedLock.NAMESPACE}-lock-${resource}`;
const luaScript = `
if redis.call("get", KEYS[1]) == ARGV[1] then
return redis.call("del", KEYS[1])
else
return 0
end
`;
const result = await this.redisClient.eval(luaScript, {
keys: [lockKey],
arguments: [value]
});
if (result === 1) {
return true;
}
return false;
}
//...
Now only the instance that acquired the lock knows the value that can release it. Let’s adjust the index.ts to use the new acquire and release methods.
import fs from 'fs';
import { lockManager } from './DistributedLock.js';
import { redisClient } from './redisClient.js';
const walletName = '0';
const nonceFileName = `nonce-${walletName}.txt`;
const incrementCount = 10;
function resetNonce() {
fs.writeFileSync(nonceFileName, '0');
}
resetNonce();
/*
simulate incrementing nonces for each wallet `n` times
each time we read the nonce (simulate RPC request getNonce())
and then increment in memory, and then save to file
*/
for (let i = 0; i < incrementCount; i++) {
const lockValue = await lockManager.acquireLock(walletName, 5000, 5000);
if (!lockValue) {
console.log(`Failed to acquire lock for wallet ${walletName} on attempt ${i}`);
continue;
}
try {
let nonce = parseInt(fs.readFileSync(nonceFileName, 'utf-8'), 10);
nonce++;
fs.writeFileSync(nonceFileName, nonce.toString());
} finally {
await lockManager.releaseLock(walletName, lockValue);
}
}
redisClient.quit();
Let’s run the start:many script again to make sure everything is working as expected.
npm run start:many && cat nonce-0.txt
Yep, all good, we’re still getting 20 as the final value.
Now let's improve the developer experience a bit. Instead of the caller calling releaseLock and passing the parameters, we can follow a very common pattern in JavaScript and make acquireLock return an object that has a release method. This way the caller can just call lock.release() without worrying about the parameters, and we can add a touch method to the same object to allow extending the lock TTL when needed.
import type { RedisClientType } from "redis";
import { redisClient } from "./redisClient.js";
import crypto from 'crypto';
type AcquireLockReturnType = {
release: () => Promise<boolean>;
touch: () => Promise<boolean>;
};
//...
async acquireLock(resource: string, maxWaitTime: number, ttl: number): Promise<AcquireLockReturnType | null> {
const startTime = performance.now();
const lockKey = `${DistributedLock.NAMESPACE}-lock-${resource}`;
const value = crypto.randomBytes(16).toString('hex');
while (true) {
const result = await this.redisClient.set(lockKey, value, {
condition: "NX",
expiration: {
type: "PX", // ttl in milliseconds, if you want to use seconds, use EX and ttl in seconds
value: ttl
}
});
if (result === 'OK') {
return {
release: () => this.releaseLock(resource, value),
touch: () => this.touch(resource, ttl) // we change this later
};
}
if (performance.now() - startTime >= maxWaitTime) { // respect max wait
return null;
}
await new Promise((resolve) => setTimeout(resolve, 100));
}
}
//...
Let’s also update the index.ts to use the new return type of the acquireLock method.
//...
for (let i = 0; i < incrementCount; i++) {
const lock = await lockManager.acquireLock(walletName, 5000, 5000);
if (!lock) {
console.log(`Failed to acquire lock for wallet ${walletName} on attempt ${i}`);
continue;
}
try {
let nonce = parseInt(fs.readFileSync(nonceFileName, 'utf-8'), 10);
nonce++;
fs.writeFileSync(nonceFileName, nonce.toString());
} finally {
await lock.release();
}
}
//...
Now the code is much cleaner and easier to understand. You receive a lock object, you can either release it by calling lock.release() or extend it by calling lock.touch().
Let's implement the touch method to allow extending the lock TTL. Processes executing long tasks can then extend the lock as they go, which is better than locking for a long time and risking that the lock lingers if the task dies before releasing it.
touch is similar to release: we need a Lua script that checks that the value matches before extending the key's TTL. Below is the script we will use:
if redis.call("GET", KEYS[1]) == ARGV[1] then
return redis.call("PEXPIRE", KEYS[1], ARGV[2])
else
return 0
end
The script first reads the value and compares it to the provided value. If they match, PEXPIRE resets the key's TTL to the provided number of milliseconds, counted from now; otherwise nothing happens and the script returns 0.
The touch function will look like this:
//...
async touch(resource: string, lockValue: string, ttl: number): Promise<boolean> {
const lockKey = `${DistributedLock.NAMESPACE}-lock-${resource}`;
const luaScript = `
if redis.call("GET", KEYS[1]) == ARGV[1] then
return redis.call("PEXPIRE", KEYS[1], ARGV[2])
else
return 0
end
`;
const result = await this.redisClient.eval(luaScript, {
keys: [lockKey],
arguments: [lockValue, ttl.toString()]
});
return result === 1;
}
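Adjust the acquireLock method to pass the lock value to the touch method, and update the touch signature in AcquireLockReturnType so that it accepts the new TTL: touch: (ttl: number) => Promise<boolean>.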
//..
async acquireLock(resource: string, maxWaitTime: number, ttl: number): Promise<AcquireLockReturnType | null> {
//...
if (result === 'OK') {
return {
release: () => this.releaseLock(resource, value),
touch: (newTtl: number) => this.touch(resource, value, newTtl)
};
}
//...
We have a usable distributed lock now. Let’s adjust index.ts to test the touch method. We can achieve this by simulating a long-running task. I purposely tightened the ttl to 100ms.
//...
for (let i = 0; i < incrementCount; i++) {
const lock = await lockManager.acquireLock(walletName, 10000, 100);
if (!lock) {
console.log(`Failed to acquire lock for wallet ${walletName} on attempt ${i}`);
continue;
}
try {
await new Promise((resolve) => setTimeout(resolve, 90)); // simulate some delay in processing
// we touch the lock
await lock.touch(100); // extend lock by 0.1 seconds
await new Promise((resolve) => setTimeout(resolve, 90)); // simulate some more delay in processing
// if the touch is not working, the lock will expire after 0.1 seconds, and our access to the file here is in violation
let nonce = parseInt(fs.readFileSync(nonceFileName, 'utf-8'), 10);
nonce++;
fs.writeFileSync(nonceFileName, nonce.toString());
} finally {
await lock.release();
}
}
//...
You can now run the start:many script again to make sure everything is working as expected.
npm run start:many && cat nonce-0.txt
Yep, all good, we’re still getting 20 as the final value, and we have successfully implemented a cross-instance distributed lock using Redis.
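In the test above, touch is called by hand at a known point. For tasks whose duration is not known in advance, one option is to call it on a timer. The helper below is a sketch of that idea only: `withHeartbeat` and its parameters are hypothetical names, and it assumes a handle with the release and touch methods built in this article.

```typescript
// Shape of the handle returned by acquireLock in this article.
type LockHandle = {
  release: () => Promise<boolean>;
  touch: (ttl: number) => Promise<boolean>;
};

// Hypothetical helper: runs `task` while re-arming the lock TTL every `intervalMs`.
// The task receives `stillHeld`, which it can check before a critical step.
async function withHeartbeat<T>(
  lock: LockHandle,
  ttlMs: number,
  intervalMs: number,
  task: (stillHeld: () => boolean) => Promise<T>
): Promise<T> {
  let held = true;
  const timer = setInterval(() => {
    lock.touch(ttlMs).then(
      (ok) => {
        if (!ok) held = false; // the key expired or now belongs to someone else
      },
      () => {
        held = false; // treat a failed touch as a lost lock
      }
    );
  }, intervalMs);

  try {
    return await task(() => held);
  } finally {
    clearInterval(timer);
    await lock.release();
  }
}
```

A caller would keep `intervalMs` well below `ttlMs` (for example a third of it) so that a single slow round trip does not let the lock expire, and would check `stillHeld()` right before broadcasting.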
Final Code
// DistributedLock.ts
import type { RedisClientType } from 'redis';
import { redisClient } from './redisClient.js';
import crypto from 'crypto';

type AcquireLockReturnType = {
  release: () => Promise<boolean>;
  touch: (ttl: number) => Promise<boolean>;
};

class DistributedLock {
  // to avoid any key collisions
  private static NAMESPACE: string = `cache-DistributedLock`;
  private redisClient: RedisClientType;

  constructor(redisClient: RedisClientType) {
    this.redisClient = redisClient;
  }

  async acquireLock(
    resource: string,
    maxWaitTime: number,
    ttl: number
  ): Promise<AcquireLockReturnType | null> {
    const startTime = performance.now();
    const lockKey = `${DistributedLock.NAMESPACE}-lock-${resource}`;
    const value = crypto.randomBytes(16).toString('hex');
    while (true) {
      const result = await this.redisClient.set(lockKey, value, {
        condition: 'NX',
        expiration: {
          type: 'PX', // ttl in milliseconds, if you want to use seconds, use EX and ttl in seconds
          value: ttl
        }
      });
      if (result === 'OK') {
        return {
          release: () => this.releaseLock(resource, value),
          touch: (newTtl: number) => this.touch(resource, value, newTtl)
        };
      }
      if (performance.now() - startTime >= maxWaitTime) {
        // respect max wait
        return null;
      }
      await new Promise((resolve) => setTimeout(resolve, 100));
    }
  }

  async releaseLock(resource: string, value: string): Promise<boolean> {
    const lockKey = `${DistributedLock.NAMESPACE}-lock-${resource}`;
    const luaScript = `
      if redis.call("get", KEYS[1]) == ARGV[1] then
        return redis.call("del", KEYS[1])
      else
        return 0
      end
    `;
    const result = await this.redisClient.eval(luaScript, {
      keys: [lockKey],
      arguments: [value]
    });
    if (result === 1) {
      return true;
    }
    return false;
  }

  async touch(resource: string, lockValue: string, ttl: number): Promise<boolean> {
    const lockKey = `${DistributedLock.NAMESPACE}-lock-${resource}`;
    const luaScript = `
      if redis.call("GET", KEYS[1]) == ARGV[1] then
        return redis.call("PEXPIRE", KEYS[1], ARGV[2])
      else
        return 0
      end
    `;
    const result = await this.redisClient.eval(luaScript, {
      keys: [lockKey],
      arguments: [lockValue, ttl.toString()]
    });
    return result === 1;
  }
}

export const lockManager = new DistributedLock(redisClient);
// index.ts
import fs from 'fs';
import { lockManager } from './DistributedLock.js';
import { redisClient } from './redisClient.js';

const walletName = '0';
const nonceFileName = `nonce-${walletName}.txt`;
const incrementCount = 10;

function resetNonce() {
  fs.writeFileSync(nonceFileName, '0');
}

resetNonce();

/*
  simulate incrementing nonces for each wallet `n` times
  each time we read the nonce (simulate RPC request getNonce())
  and then increment in memory, and then save to file
*/
for (let i = 0; i < incrementCount; i++) {
  const lock = await lockManager.acquireLock(walletName, 10000, 100);
  if (!lock) {
    console.log(`Failed to acquire lock for wallet ${walletName} on attempt ${i}`);
    continue;
  }
  try {
    await new Promise((resolve) => setTimeout(resolve, 90)); // simulate some delay in processing
    // we touch the lock
    await lock.touch(100); // extend lock by 0.1 seconds
    await new Promise((resolve) => setTimeout(resolve, 90)); // simulate some more delay in processing
    // if the touch is not working, the lock will expire after 0.1 seconds, and our access to the file here is in violation
    let nonce = parseInt(fs.readFileSync(nonceFileName, 'utf-8'), 10);
    nonce++;
    fs.writeFileSync(nonceFileName, nonce.toString());
  } finally {
    await lock.release();
  }
}

redisClient.quit();
// redisClient.ts
import { createClient, type RedisClientType } from 'redis';

const client: RedisClientType = createClient({
  url: `redis://localhost:6379`, // change with your host and port if needed
  socket: {
    connectTimeout: 5000
  }
});

/*
  If using an older version of Node that doesn't await top-level
  Use something like
  const redisClientPromise = client.connect();
  Then whenever you need the client, do
  const redisClient = await redisClientPromise;
*/
export const redisClient: RedisClientType = await client.connect();
Improvements Before Production
Before using this in production, there are some improvements that should be added:
- Jitter: add some random jitter to the retry delay in acquireLock to prevent the thundering herd problem.
- Incremental backoff: instead of retrying every 100ms, back off incrementally to reduce load on busy resources.
- Lock queue: instead of just retrying to acquire the lock, add the request to a queue and process the queue when the lock is released.
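The first two items can be combined in one small function. The sketch below uses exponential backoff with full jitter, which is one common form of incremental backoff and not necessarily the exact scheme the bot uses. The `retryDelay` name, the 50ms base and the 1 second cap are assumptions chosen for the example.

```typescript
// Hypothetical helper: delay in milliseconds before retry number `attempt` (0-based).
// The upper bound doubles with each attempt up to `maxMs`, and the actual delay
// is a uniform random pick below that bound ("full jitter").
function retryDelay(
  attempt: number,
  baseMs = 50,
  maxMs = 1_000,
  random: () => number = Math.random // injectable so the function can be tested
): number {
  const bound = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * bound);
}

// With a fixed "random" value of 0.5 the growth and the cap are easy to see.
const half = () => 0.5;
console.log(retryDelay(0, 50, 1_000, half)); // 25  (bound 50)
console.log(retryDelay(3, 50, 1_000, half)); // 200 (bound 400)
console.log(retryDelay(10, 50, 1_000, half)); // 500 (bound capped at 1000)
```

Inside acquireLock, the fixed `setTimeout(resolve, 100)` would become `setTimeout(resolve, retryDelay(attempt++))`, with `attempt` starting at 0 before the loop.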
Conclusion
In this article, we have seen how easy it is to implement a cross-instance distributed lock using Redis. A similar lock is used in my project, Pulsonic Trading Bot, to solve the nonce collision issue. It allowed me to run concurrent signing jobs across multiple instances without nonce collisions, and it is one of the reasons the bot has processed more than 2 million trades with no incidents related to nonce collisions.
Repository
The full code for this article can be found on GitHub.
