Part of the IBKR Saga
The supervision tree was solid. The architecture handled crashes and reconnections gracefully. The next step was actually talking to a broker. Interactive Brokers (IBKR) was the obvious first choice — deep liquidity, low commissions, a real API. I started reading their authentication docs. Then I stopped and read them again.
This is the story of hitting a wall in Elixir's crypto support, and the port I built to get around it.
The problem: IBKR's authentication is not standard OAuth
Most broker APIs authenticate with some flavor of OAuth 2.0. You get a token, you refresh it periodically, you move on with your life. IBKR uses OAuth 1.0a — the older, more complex version — and layers a custom token derivation on top.
IBKR's "live session token" flow works roughly like this:
1. You have an access token and access token secret (standard OAuth 1.0a)
2. You have a consumer key and a Diffie-Hellman prime provided by IBKR
3. You generate a DH challenge using the prime
4. You sign the request using your private signature key
5. IBKR responds with their DH response
6. You compute a shared secret from the DH exchange
7. You use that shared secret to compute a live session token by applying a specific HMAC construction that uses your private encryption key (separate from the signature key)
8. That live session token is what actually authenticates your API calls
Steps 1-6 are annoying but doable. Step 7 is where things break.
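To make the exchange in the middle of that list concrete, here is a toy Diffie-Hellman round in Python. It is a sketch under loud assumptions: the prime is tiny, the generator value 2 is my choice for illustration, and the "server" half simply stands in for IBKR. None of the names come from IBKR's API.

```python
import secrets

# Toy parameters for illustration only. IBKR supplies a much larger prime,
# and the generator value 2 is an assumption of this sketch.
P = 0xFFFFFFFB  # 2**32 - 5, a small prime that is far too weak for real use
G = 2


def dh_keypair(p: int, g: int) -> tuple[int, int]:
    """Return (private exponent, public value g^private mod p)."""
    private = secrets.randbelow(p - 3) + 2  # uniform in [2, p - 2]
    return private, pow(g, private, p)


def dh_shared_secret(their_public: int, my_private: int, p: int) -> int:
    """Compute (their_public ^ my_private) mod p."""
    return pow(their_public, my_private, p)


# "Client" is you; "server" stands in for IBKR.
a, challenge = dh_keypair(P, G)
b, response = dh_keypair(P, G)

client_secret = dh_shared_secret(response, a, P)
server_secret = dh_shared_secret(challenge, b, P)
```

Both sides end up with the same integer without ever sending it over the wire. Everything after that point is about turning this integer into bytes in exactly the way the other side expects.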
Why Elixir couldn't handle it
The DH key exchange in IBKR's scheme uses specific parameters — a custom prime and generator — that need to be handled as raw big-integer arithmetic. Erlang's :crypto module wraps OpenSSL and supports DH operations. The problem wasn't DH itself.
The problem was the specific combination of operations IBKR requires to derive the live session token from the DH shared secret. Their scheme requires computing an HMAC where the key material is derived from a PKCS#8 PEM-encoded private encryption key, combined with the DH shared secret, using a specific byte-level construction that doesn't map cleanly to any single function in Erlang's crypto module.
To be precise, I needed to:
- Parse a PKCS#8 PEM private key and extract the raw key bytes
- Perform a specific prepend-and-hash operation to combine the DH secret with those bytes
- Use the result as an HMAC key to compute the live session token
Python's cryptography library handles this cleanly. Erlang's :public_key module can parse PEM files, but extracting the raw key material in the exact format IBKR's scheme requires meant either reimplementing chunks of the PKCS#8 parsing at the binary level in Elixir or finding another way.
I spent about two days trying. I could parse the PEM. I could do DH. I could do HMAC. But the specific byte manipulation between those steps — the part that turns a DH shared secret and a private encryption key into IBKR's expected input format — required either a NIF or a language that already had the right primitives.
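To give a feel for the kind of byte-level detail involved, here is a small Python sketch. It assumes the shared secret is encoded as signed big-endian bytes (a leading zero byte when the top bit is set, the way Java's BigInteger does it) and that the token is an HMAC-SHA1 over a "prepend" value, base64-encoded. Treat both as assumptions for illustration, not as IBKR's specification; the function names are hypothetical.

```python
import base64
import hashlib
import hmac


def to_signed_big_endian(n: int) -> bytes:
    """Encode a non-negative int as big-endian bytes with room for a sign bit.

    If the top bit of the first byte would be set, an extra zero byte is
    prepended. A naive unsigned encoding differs in exactly those cases,
    and the resulting HMAC key no longer matches.
    """
    length = n.bit_length() // 8 + 1
    return n.to_bytes(length, "big")


def derive_token(shared_secret: int, prepend: bytes) -> str:
    """Key an HMAC with the encoded shared secret and sign the prepend bytes."""
    key = to_signed_big_endian(shared_secret)
    digest = hmac.new(key, prepend, hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")
```

One off-by-one-byte difference in the key encoding produces a completely different token, and the only feedback is a rejected request.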
The decision: port, NIF, or just use Python?
NIF: Fast, no serialization overhead, but a crash takes down the entire VM. For a trading system managing real capital, that's a non-starter for something as peripheral as authentication. NIFs are for hot paths, not for a token refresh that runs once every few hours.
Port with a C program: Safer — the C code runs in a separate OS process, so a crash doesn't take down the BEAM. But writing correct C for crypto operations is its own kind of risk, and I'd still need to link OpenSSL. More work, same result as Python, with more surface area for memory bugs.
Port with a Python script: Python's cryptography library already does exactly what I need. Small script, starts once on session init, subsequent calls are IPC round-trips.
I chose Python. Decision time: about ten minutes.
How Elixir ports work
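The BEAM talks to the Python process via a length-prefixed binary protocol. Each runs in its own crash domain: if the Python side segfaults, the BEAM keeps running and sees the port close. If the BEAM dies, the pipes to the child close; nothing kills the child automatically, so it has to notice EOF on its input and exit, which is what the Python read loop later in this post does.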
The port open call:
port =
  Port.open(
    {:spawn, "python priv/python/ibkr_auth/main.py"},
    [:binary, :nouse_stdio, {:packet, 4}]
  )
Three options worth understanding:
- :binary means data arrives as binaries, not as lists of bytes.
- :nouse_stdio disconnects the child process's stdin and stdout from the port. Instead, the BEAM uses file descriptors 3 and 4 for communication. This matters because Python uses stdout itself for print statements, logging, and library internals. You don't want that mixed in with your protocol frames.
- {:packet, 4} tells the BEAM to automatically prepend a 4-byte big-endian unsigned integer (the message length) to everything you send via Port.command/2, and to strip that header from everything you receive. You never see the framing on the Elixir side; you just send and receive binaries.
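For a concrete picture of what {:packet, 4} puts on the wire, here is the same framing written out by hand in Python. The frame and unframe helpers are hypothetical names for this sketch only.

```python
import json
import struct


def frame(payload: dict) -> bytes:
    """Prefix the JSON body with its length as a 4-byte big-endian unsigned int."""
    body = json.dumps(payload).encode("utf-8")
    return struct.pack("!I", len(body)) + body


def unframe(data: bytes) -> dict:
    """Read the 4-byte length header, then decode exactly that many bytes."""
    (size,) = struct.unpack("!I", data[:4])
    return json.loads(data[4 : 4 + size])


wire = frame({"command": "init"})
# The JSON body is 19 bytes long, so the header is b"\x00\x00\x00\x13".
```

The header carries only the body length, so the receiver always knows how many bytes to read before it tries to parse anything.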
Communication is message-based. Sending is a direct call; receiving happens in handle_info:
# Send: the {:packet, 4} framing is added automatically
Port.command(port, JSON.encode_to_iodata!(payload))

# Receive: arrives as a message to the port owner
def handle_info({port, {:data, data}}, state) do
  result = JSON.decode!(data)
  # handle result...
end
No sockets, no HTTP, no message queues. On the Elixir side the BEAM handles the framing in both directions; on the Python side it is implemented by hand.
Wrapping the port in a GenServer
A bare port is fine for one-shot operations, but for a long-running system you want supervision. The GenServer owns the port, handles the command dispatch, and bridges the async port response back to the synchronous caller:
defmodule MyApp.IBKR.PortBridge do
  use GenServer, restart: :transient

  def start_link(%IBKRCreds{} = creds) do
    GenServer.start_link(__MODULE__, creds)
  end

  def init(%IBKRCreds{} = creds) do
    port =
      Port.open(
        {:spawn, "python priv/python/ibkr_auth/main.py"},
        [:binary, :nouse_stdio, {:packet, 4}]
      )

    {:ok, %{creds: creds, port: port, caller: nil}}
  end

  def handle_call(:get_session_headers, from, %{port: port} = state) do
    Port.command(port, JSON.encode_to_iodata!(MyApp.Commands.GetSessionHeaders.new()))
    {:noreply, %{state | caller: from}}
  end

  def handle_info({_port, {:data, data}}, %{caller: caller} = state) do
    data
    |> JSON.decode!()
    |> case do
      %{"status" => "ok", "payload" => payload} -> GenServer.reply(caller, {:ok, payload})
      %{"status" => "ok"} -> GenServer.reply(caller, :ok)
      %{"status" => "error", "message" => msg} -> GenServer.reply(caller, {:error, msg})
    end

    {:noreply, %{state | caller: nil}}
  end
end
IBKR's OAuth 1.0a scheme requires each request to be individually signed — the method, full URL, and live session token all feed into the signature. That meant every call to the IBKR API went through the port:
def handle_call({:sign_request, method, url, session_token}, from, %{port: port} = state) do
  command = MyApp.Commands.GetRequestHeaders.new(method, url, session_token)
  Port.command(port, JSON.encode_to_iodata!(command))
  {:noreply, %{state | caller: from}}
end
And the call site in the account manager:
{:ok, signed_url, headers} = PortBridge.sign_request(creds, "POST", url, session_token)
Finch.build("POST", signed_url, headers, payload)
|> Finch.request(MyApp.Finch)
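The key pattern: handle_call returns {:noreply, ...} and stores the caller in state. The port response arrives later as a handle_info message, at which point the GenServer calls GenServer.reply/2 to complete the original call. To the caller, it looks synchronous: they called a function and got a result back. One caveat: the state holds a single caller, so this assumes only one request is in flight at a time. A second call that arrives before the port answers would overwrite the first caller.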
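The deferred-reply bookkeeping can be modeled outside the BEAM. Below is a rough Python analogue that keeps a queue of waiting callers instead of a single slot, assuming the port answers strictly in request order. PendingReplies is a hypothetical name and not part of the original code; it is a sketch of the idea, not what the GenServer above does.

```python
from collections import deque
from concurrent.futures import Future


class PendingReplies:
    """Match port responses to waiting callers in first-in, first-out order."""

    def __init__(self) -> None:
        self._waiting: deque[Future] = deque()

    def register(self) -> Future:
        """Called when a request is sent; the caller waits on the returned future."""
        future: Future = Future()
        self._waiting.append(future)
        return future

    def resolve(self, response: dict) -> None:
        """Called when a response frame arrives; completes the oldest waiting call."""
        future = self._waiting.popleft()
        if response.get("status") == "ok":
            future.set_result(response.get("payload"))
        else:
            future.set_exception(RuntimeError(response.get("message", "unknown error")))
```

With a queue, overlapping calls each get their own reply. The trade-off is that the ordering assumption becomes load-bearing: one dropped or duplicated response shifts every reply after it onto the wrong caller.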
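The GenServer owns the port. When the GenServer terminates, the port closes, and with it the pipes to the external process. The BEAM does not send the child a signal; the Python loop sees EOF on its input and exits on its own. The owning process's lifecycle controls the port's lifecycle.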
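restart: :transient restarts the GenServer on abnormal exits and leaves it down after a clean shutdown. If the GenServer crashes, the supervisor restarts it and init/1 opens a fresh port. If the session ends cleanly, it stays down. An unexpected Python exit only triggers a restart if the GenServer treats the closed port as a failure, for example by opening the port with :exit_status and stopping when that message arrives; the handle_info clause shown above matches only :data.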
The Python side
The Python process wasn't a one-shot script. It ran a persistent command loop — started once when the GenServer initialized, and kept alive for the lifetime of the IBKR session. Elixir sent commands to it; it sent results back.
Because the port used :nouse_stdio, the script couldn't use stdin and stdout directly. It opened file descriptors 3 and 4 instead:
import json
import os
from struct import pack, unpack


def setup_io():
    return os.fdopen(3, "rb"), os.fdopen(4, "wb")


def read_message(input_f):
    header = input_f.read(4)
    if len(header) != 4:
        return None  # EOF: port closed
    (total_msg_size,) = unpack("!I", header)
    payload = input_f.read(total_msg_size)
    return json.loads(payload)


def write_result(output_f, msg):
    result = json.dumps(msg).encode("utf-8")
    output_f.write(pack("!I", len(result)))
    output_f.write(result)
    output_f.flush()
This is the manual half of what {:packet, 4} handles automatically on the Elixir side. Read 4 bytes, interpret as a big-endian unsigned int, read that many bytes, parse as JSON. Write length header, write payload, flush. The same framing on both ends.
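The framing can be exercised without the BEAM at all. The sketch below pushes one frame through an ordinary OS pipe, which stands in for the port's file descriptors, and then closes the write end to simulate the port going away. read_exact, read_frame, and write_frame are hypothetical helpers for this example; read_exact adds a guard for streams that end partway through a message.

```python
import json
import os
from struct import pack, unpack


def read_exact(f, n):
    """Read exactly n bytes, or return None if the stream ends first."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = f.read(remaining)
        if not chunk:
            return None  # EOF before the full message arrived
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_frame(f):
    """Read one length-prefixed JSON message, or None on EOF."""
    header = read_exact(f, 4)
    if header is None:
        return None
    (size,) = unpack("!I", header)
    body = read_exact(f, size)
    if body is None:
        return None
    return json.loads(body)


def write_frame(f, msg):
    """Write one length-prefixed JSON message and flush it."""
    body = json.dumps(msg).encode("utf-8")
    f.write(pack("!I", len(body)) + body)
    f.flush()


# Round trip through an OS pipe standing in for the port.
r_fd, w_fd = os.pipe()
reader, writer = os.fdopen(r_fd, "rb"), os.fdopen(w_fd, "wb")
write_frame(writer, {"command": "get_session_headers"})
first = read_frame(reader)
writer.close()  # simulate the port closing
after_close = read_frame(reader)
reader.close()
```

The second read returning None is the signal the command loop uses to exit when the Elixir side goes away.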
The main loop dispatched commands by name:
from ibind_api import IbindApi


def run():
    input_f, output_f = setup_io()
    api = None
    while True:
        msg = read_message(input_f)
        if msg is None:
            break
        match msg["command"]:
            case "init":
                api = IbindApi(
                    access_token=msg["access_token"],
                    access_token_secret=msg["access_token_secret"],
                    consumer_key=msg["consumer_key"],
                    dh_prime=msg["dh_prime"],
                    encryption_key_fp=msg["encryption_key_fp"],
                    signature_key_fp=msg["signature_key_fp"],
                )
                output = {"status": "ok"}
            case "get_session_headers":
                output = api.get_session_headers()
            case "calculate_session_token":
                output = api.calculate_session_token(
                    msg["diffie_hellman_response"],
                    msg["live_session_token_signature"],
                )
            case "get_request_headers":
                output = api.get_request_headers(
                    msg["request_method"],
                    msg["request_url"],
                    msg["live_session_token"],
                )
            case _:
                output = {"status": "error", "message": "unknown command"}
        write_result(output_f, output)
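One thing the loop above leaves open is what happens when a handler raises: the exception ends the Python process and the Elixir caller is left waiting for a reply. A possible variation, sketched here with hypothetical handlers rather than the real IbindApi methods, turns failures into error replies so the Elixir side can decide what to do:

```python
def dispatch(handlers, msg):
    """Run the handler for msg["command"], turning failures into error replies."""
    command = msg.get("command")
    handler = handlers.get(command)
    if handler is None:
        return {"status": "error", "message": f"unknown command: {command}"}
    try:
        return handler(msg)
    except Exception as exc:
        # Keep the loop alive and report the failure instead of crashing.
        return {"status": "error", "message": f"{type(exc).__name__}: {exc}"}


# Hypothetical handlers standing in for the IbindApi methods.
handlers = {
    "ping": lambda msg: {"status": "ok", "payload": "pong"},
    "boom": lambda msg: {"status": "ok", "payload": msg["missing_field"]},
}

ok_reply = dispatch(handlers, {"command": "ping"})
error_reply = dispatch(handlers, {"command": "boom"})
unknown_reply = dispatch(handlers, {"command": "nope"})
```

Whether to catch or crash is a judgment call. Catching keeps the process warm and gives the caller a reason; crashing leans on the supervisor and guarantees a clean slate.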
The IbindApi class was a thin wrapper around the ibind Python library, which implements IBKR's OAuth 1.0a scheme — including the DH key exchange, the PKCS#8 key parsing, and the specific HMAC construction. That's what I actually needed Python for: not to write crypto code, but to use a library that already had it.
from ibind.oauth import oauth1a


class IbindApi(oauth1a.OAuth1aConfig):
    def get_session_headers(self):
        self.prepend, extra_headers, self.dh_random = oauth1a.prepare_oauth(self)
        headers = oauth1a.generate_oauth_headers(
            oauth_config=self,
            request_method="POST",
            request_url=f"{self.base_url}{self.live_session_token_endpoint}",
            extra_headers=extra_headers,
            signature_method="RSA-SHA256",
            prepend=self.prepend,
        )
        return {"status": "ok", "payload": headers, "command": "get_session_headers"}

    def calculate_session_token(self, dh_response, lst_signature):
        token = oauth1a.calculate_live_session_token(
            dh_prime=self.dh_prime,
            dh_random_value=self.dh_random,
            dh_response=dh_response,
            prepend=self.prepend,
        )
        valid = oauth1a.validate_live_session_token(
            live_session_token=token,
            live_session_token_signature=lst_signature,
            consumer_key=self.consumer_key,
        )
        if valid:
            return {"status": "ok", "payload": token, "command": "calculate_session_token"}
        else:
            return {"status": "error", "message": "Failed to validate token"}
The whole thing was under 120 lines across two files. The crypto operations that took two days to fail at in Elixir took an afternoon in Python — not because Python is better at crypto, but because ibind already implemented exactly this scheme and Erlang's stdlib doesn't have an equivalent.
Fitting the port into the supervision tree
The PortBridge GenServer started as a child of the same flat DynamicSupervisor described in the previous post. It sat alongside the other IBKR session processes:
MyApp.Servers (DynamicSupervisor, one_for_one)
├── RealtimeTrader
├── IBKRAccountManager
├── IBKRDataSource
├── PortBridge (owns Python port)
└── ... other session processes
PortBridge should logically be grouped with the IBKR session it serves, but in the flat structure it was just another sibling. If PortBridge crashed, the IBKRAccountManager wouldn't restart with it — it would just fail on its next token refresh call and eventually restart itself. Since token refresh was infrequent, the staggered restart never mattered in practice. But it reinforced the argument for per-session supervisors.
What the credential model looked like
The IBKR integration required storing more credential data than any other broker. For comparison, TradeStation needs a client ID, client secret, and a refresh token. IBKR needed:
# From the Ash resource (now removed)
create table(:interactive_brokers_oauth1_creds) do
  add :access_token, :text, null: false
  add :access_token_secret, :text, null: false
  add :consumer_key, :text, null: false
  add :dh_prime, :text, null: false
  add :private_encryption_key_path, :text, null: false
  add :private_signature_key_path, :text, null: false

  add :live_session_token_endpoint, :text,
    null: false,
    default: "/oauth/live_session_token"

  add :base_url, :text,
    null: false,
    default: "https://api.ibkr.com/v1/api"

  add :oauth_rest_url, :text,
    null: false,
    default: "https://api.ibkr.com/v1/api"
end
Six credential fields versus TradeStation's three, two separate PEM key files on disk, a Diffie-Hellman prime per-account, and a dedicated endpoint — just for the token exchange.
This is actual migration code from the repo — the table has been dropped, but the migration and its resource snapshot are still in version control. The credential model alone tells you how much heavier IBKR's authentication is compared to a standard OAuth 2.0 flow.
Lessons from the port
The BEAM community reaches for NIFs because they're faster. But for anything off the hot path, a port gives you crash isolation for free. The Python process can segfault and your BEAM keeps running. A NIF segfault takes down the entire node — and in a trading system, that means every active session goes down with it.
People worry about overhead. Here's what actually costs: JSON encode/decode and the IPC round-trip per request, not process startup. The Python process starts once; that hundred-millisecond cost is paid a single time on session init. What you pay per request is the round-trip: encode on the Elixir side, Python reads, computes, writes back, Elixir decodes. That is real latency, and it is synchronous from the caller's point of view: the calling process waits until Python responds.
For the request volume of a single-account trading system, this was acceptable. For anything with high-frequency API usage or multiple concurrent sessions hitting the same port, it would become a bottleneck. Go in with eyes open about which category your use case falls into.
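If you want a rough number for your own machine, the pipe and JSON part of that cost is easy to measure in isolation. The sketch below spawns a child Python process that echoes length-prefixed frames and times each round trip. It is only an approximation: it uses stdin and stdout rather than file descriptors 3 and 4, the child does no crypto and does not even parse the JSON, and the BEAM side is not involved. The payload and the URL in it are made up.

```python
import json
import subprocess
import sys
import time
from struct import pack, unpack

# Child process: read a 4-byte length header, echo the frame back unchanged.
CHILD = r'''
import sys
from struct import unpack
inp, out = sys.stdin.buffer, sys.stdout.buffer
while True:
    header = inp.read(4)
    if len(header) != 4:
        break
    (size,) = unpack("!I", header)
    out.write(header + inp.read(size))
    out.flush()
'''


def measure_round_trips(n=200):
    """Return a list of per-request round-trip times in seconds."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    message = {"command": "get_request_headers", "request_url": "https://example.invalid/x"}
    samples = []
    try:
        for _ in range(n):
            start = time.perf_counter()
            body = json.dumps(message).encode("utf-8")
            proc.stdin.write(pack("!I", len(body)) + body)
            proc.stdin.flush()
            (size,) = unpack("!I", proc.stdout.read(4))
            json.loads(proc.stdout.read(size))
            samples.append(time.perf_counter() - start)
    finally:
        proc.stdin.close()  # the child sees EOF and exits
        proc.stdout.close()
        proc.wait(timeout=5)
    return samples
```

Calling `measure_round_trips()` and looking at the median gives a floor for the per-request cost; the real port adds the signing work on top of it.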
Language bridges should be narrow. I kept the Python process to one domain: IBKR's OAuth 1.0a crypto. No session logic, no retry, no error recovery — all of that lived in Elixir. The port was a crypto coprocessor, not a second application.
I spent two days trying to make Erlang's :crypto do something it wasn't built for. The Python solution took an afternoon. Pragmatism isn't admitting defeat — it's recognizing that the interesting problem is building a trading system, not reimplementing a crypto primitive.
The port held
With the Python port in place, IBKR could authenticate, establish a live session, and start placing orders. The token refresh ran on a timer, the port restarted cleanly if anything went wrong, and the supervision tree treated it like any other process.
Orders were flowing. Positions were being managed. The reconciliation loop confirmed the broker and system agreed on position state. Everything looked right.
The IBKR Saga
- → Building a Python Port in Elixir to Crack IBKR's Encryption
- · When the Broker Lies: Debugging Stop Orders That Silently Fail
- · Why I Fired My Broker (and What I Replaced Them With)