Designing Data Flow in an Elixir Trading System

An automated trading system makes hundreds of decisions a day on data it didn't generate. Ticks arrive from a broker. Orders go to a broker. Fills come back. Positions drift. The architectural question underneath all of it: when something breaks, who do I believe?

Five flows, five trust questions

Inside the trading system there are five distinct movements of data, each with its own provenance, its own latency budget, and its own failure mode. They look like one pipeline in a diagram. They aren't.

              ┌───────────────────────────────────┐
              │           Broker / Exchange       │
              └─────┬───────────────────────▲─────┘
                    │                       │
              ticks │                       │ orders / fills
                    ▼                       │
              ┌──────────────┐         ┌────┴─────┐
              │ Market Data  │         │  Account │
              │   Source     │         │  Manager │
              └──────┬───────┘         └────▲─────┘
                     │                      │
                     ▼                      │
              ┌──────────────────────────────────┐
              │         RealtimeTrader           │
              │     (one per active session)     │
              └──┬───────────┬──────────────▲────┘
                 │           │              │
        persist  │ aggregate │     reconcile│
                 ▼           ▼              │
          ┌──────────┐ ┌──────────────┐ ┌───┴────────────┐
          │ Database │ │ Portfolio    │ │ Broker (truth) │
          └──────────┘ │ Server       │ └────────────────┘
                       └──────────────┘

Inbound — ticks from the market data source into the engine
Outbound — strategy decisions become orders
Persistence — the audit trail in Postgres
Reconciliation — the periodic disagreement check against the broker
Aggregation — portfolio-level rollup across sessions

Each one is built to answer a single question.

Inbound: is this candle complete?

Ticks arrive from the data source as a stream of {symbol, price, volume, timestamp} events. They flow through a candle assembler — partial candles get broadcast to the UI continuously, but the strategy only runs when a bar closes.

The trust question is whether the candle the strategy sees represents a finalized window of market activity, or an in-progress one that may still change. Two consequences:

Subscribe to the feed before fetching warmup history. Otherwise the candle that forms between the historical fetch and the live subscription disappears into a gap nobody notices.
Strategies only run on the :new branch of the candle manager. When the data source replaces a candle it previously sent (same timestamp, updated values), the candle manager returns {:replaced, ...} — and the strategy does not run again. Running on replacements means the strategy can reverse a decision it just made.

# In the RealtimeTrader's handle_info clause for an incoming candle.
# The third argument (5) is the size of the most-recent window in which
# replacement is allowed.
case CandleManager.add_or_replace(candles, new_candle, 5) do
  {:new, updated_candles} ->
    state = %{state | candles: updated_candles}
    {:noreply, maybe_run_strategy(state, updated_candles)}

  {:replaced, updated_candles, _index} ->
    # Update state, but do not re-run the strategy
    {:noreply, %{state | candles: updated_candles}}
end
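
To make the two return shapes concrete, here is a minimal sketch of what an add_or_replace function could look like. CandleManagerSketch is a hypothetical stand-in, not the real CandleManager. It assumes candles are maps with a :timestamp key, kept oldest-first, and it assumes a candle whose timestamp matches nothing inside the window is simply appended.

```elixir
defmodule CandleManagerSketch do
  @moduledoc "Hypothetical stand-in for CandleManager; not the author's implementation."

  # Candles are assumed to be maps with a :timestamp key, stored oldest-first.
  def add_or_replace(candles, new_candle, window) do
    # Index of the first candle inside the replaceable window.
    start = max(length(candles) - window, 0)

    # Look for a candle with the same timestamp, but only inside the window.
    match_index =
      candles
      |> Enum.with_index()
      |> Enum.drop(start)
      |> Enum.find_value(fn {candle, index} ->
        if candle.timestamp == new_candle.timestamp, do: index
      end)

    case match_index do
      nil -> {:new, candles ++ [new_candle]}
      index -> {:replaced, List.replace_at(candles, index, new_candle), index}
    end
  end
end
```

A real implementation might reject an out-of-window duplicate instead of appending it; the sketch only illustrates why the caller can branch on :new versus :replaced.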

Everything downstream assumes the data it operates on is finalized.
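
The subscribe-before-warmup ordering described earlier in this section can be sketched as follows. Everything here is illustrative: subscribe_fun and fetch_history_fun are hypothetical callbacks, live candles are assumed to arrive as {:candle, candle} messages, and letting the live candle win on a timestamp collision is an assumption of the sketch.

```elixir
defmodule WarmupSketch do
  # subscribe_fun and fetch_history_fun are hypothetical callbacks standing in
  # for the real feed subscription and the historical fetch.
  def start(subscribe_fun, fetch_history_fun) do
    # 1. Subscribe first, so live candles start queuing in this process's mailbox.
    :ok = subscribe_fun.()
    # 2. Then fetch history; anything that closes meanwhile is already buffered.
    history = fetch_history_fun.()
    # 3. Drain what was buffered during the fetch and merge by timestamp.
    merge(history, drain_buffered([]))
  end

  # Collect every {:candle, candle} message already in the mailbox, in arrival order.
  defp drain_buffered(acc) do
    receive do
      {:candle, candle} -> drain_buffered([candle | acc])
    after
      0 -> Enum.reverse(acc)
    end
  end

  # Later entries overwrite earlier ones in Map.new/2, so live candles win
  # when history and the live feed both contain the same timestamp.
  def merge(history, live) do
    (history ++ live)
    |> Map.new(fn candle -> {candle.timestamp, candle} end)
    |> Map.values()
    |> Enum.sort_by(& &1.timestamp)
  end
end
```

Reversing the first two steps is what opens the gap: a candle that closes after the fetch returns but before the subscription is active would appear in neither list.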

Outbound: did this order make it?

When a strategy returns a decision tuple — :nothing, :submit_order, :modify_orders, or :cancel_orders — the system hands it to a decision handler. The trust question is whether an order the system thinks it has placed actually exists at the broker. There are at least three places where this can go wrong: a network error during submission, a broker rejection silently swallowed, or a process crash between deciding and acknowledging.

So the order goes to Postgres before it goes to the broker. If the system crashes between "decided" and "acknowledged," the order still exists in the database. The audit trail is intact.

def handle_decision({:submit_order, order, new_strat_state}, state, _opts) do
  state = update_strategy_state(state, new_strat_state)

  # 1. Persist FIRST — audit trail before broker submission
  order = persist_order(order)

  # 2. Then submit through the broker-agnostic dispatch chain
  state.account_manager.module.submit_order(state.session.id, order)

  state
end

The state.account_manager.module dispatch is the seam the broker-agnostic adapter layer provides — the decision handler doesn't know whether it's talking to TradeStation or Tradovate, and doesn't need to.
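
One way the persisted row could also capture the outcome of the broker call is sketched below. This is not the author's handler: persist_fun, submit_fun, and mark_fun are hypothetical callbacks, and the {:ok, broker_order_id} and {:error, reason} return shapes are assumptions about what a broker adapter might return.

```elixir
defmodule SubmitSketch do
  require Logger

  # persist_fun writes the order, submit_fun calls the broker, and mark_fun
  # records the outcome. All three are hypothetical and injected by the caller.
  def persist_then_submit(order, persist_fun, submit_fun, mark_fun) do
    # Intent is written before any network call is attempted.
    order = persist_fun.(Map.put(order, :status, :pending))

    case submit_fun.(order) do
      {:ok, broker_order_id} ->
        mark_fun.(order, %{status: :submitted, broker_order_id: broker_order_id})

      {:error, reason} ->
        # A rejection or network error is recorded rather than swallowed.
        Logger.warning("order #{order.id} not accepted: #{inspect(reason)}")
        mark_fun.(order, %{status: :failed, error: reason})
    end
  end
end
```

A row left in :pending after a restart then marks exactly the case the section worries about: an order that was decided but never acknowledged.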

Persistence: if we crash, what can we reconstruct?

The persistence model is the most consequential design decision in the system, because it determines what survives a process restart. The rule is simple and uncomfortable:

Persist only what you cannot recompute from external sources.

What that rules in, and what it rules out:

Orders are persisted — they're not recoverable from anywhere else.
Trades are persisted — calculated and written asynchronously by an Oban job, used for accounting.
Position records are persisted — the system's idea of "where we stand," reconciled periodically against the broker.
Account balances are persisted — but only on change, not on every poll.
Candles are not persisted. They can always be re-fetched.
Strategy state is not persisted. It must be reconstructible from candle history.
Portfolio heat is not persisted. It's recomputed from running traders.

This is the same triad the order-book post arrived at from inside the GenServer: the broker is the source of truth, the database is the audit trail, and GenServer state is the working copy.
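
The requirement that strategy state be reconstructible from candle history amounts to a fold over re-fetched candles. The sketch below uses a toy moving-average strategy to show the idea; the state shape and the step/2 function are illustrative and are not the author's strategy API.

```elixir
defmodule ReplaySketch do
  # A toy strategy that tracks a simple moving average of the last `period` closes.
  def init(period), do: %{period: period, closes: [], sma: nil}

  # Advance the state by one closed candle. Most recent close is kept first.
  def step(candle, state) do
    closes = Enum.take([candle.close | state.closes], state.period)
    # The average is only defined once a full window of closes is available.
    sma = if length(closes) == state.period, do: Enum.sum(closes) / state.period
    %{state | closes: closes, sma: sma}
  end

  # After a restart, state is rebuilt by replaying candles in chronological order.
  def rebuild(candles, period) do
    Enum.reduce(candles, init(period), &step/2)
  end
end
```

Because the live path and the recovery path can share the same step function, a restart that replays the same candles arrives at the same state, which is what makes persisting it unnecessary.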

Reconciliation: is the broker still telling us the same thing?

One sentence underwrites the whole architecture: the broker is the source of truth. Whatever the system thinks the position is, whatever the database says, whatever the GenServer holds in memory — none of it is authoritative. The broker's account state is.

Two periodic reconciliations enforce this:

Position — every 30 seconds, the system asks the broker for the current position and compares it to what it has in memory and in the database.
Orders — every 5 minutes, the system asks the broker for the current open orders and reconciles against its own list.

If they disagree, the broker wins. Always.

The alternative is worse. A system that trusts its own state can drift indefinitely. Reconciliation costs a little over two API calls per minute (two position checks, plus one order check every five minutes) and saves the system from any number of silent failure modes: partial fills not reported, orders cancelled out of band, positions adjusted by a margin desk.

# Scheduled at startup, rescheduled after each handler runs
Process.send_after(self(), :reconcile_orders, 300_000)    # 5 minutes
Process.send_after(self(), :reconcile_position, 30_000)   # 30 seconds

Reconciliation is also why GenServer state can be cheap. If the in-memory position is only trusted for thirty seconds at a time, losing it costs at most thirty seconds of staleness; for open orders, the bound is the five-minute interval.
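
A minimal sketch of the position handler, showing the "broker wins" rule and the reschedule-after-run pattern. ReconcileSketch is hypothetical: fetch_position stands in for a call through the broker adapter, the position is reduced to a single integer, and starting flat at 0 is an assumption.

```elixir
defmodule ReconcileSketch do
  use GenServer
  require Logger

  @position_interval 30_000

  # fetch_position is a hypothetical zero-arity function that asks the broker
  # for the current position and returns {:ok, position} or {:error, reason}.
  def start_link(fetch_position), do: GenServer.start_link(__MODULE__, fetch_position)

  def position(pid), do: GenServer.call(pid, :position)

  @impl true
  def init(fetch_position) do
    schedule()
    {:ok, %{position: 0, fetch_position: fetch_position}}
  end

  @impl true
  def handle_call(:position, _from, state), do: {:reply, state.position, state}

  @impl true
  def handle_info(:reconcile_position, state) do
    state = reconcile(state, state.fetch_position.())
    # Reschedule only after the handler has run, so runs never overlap.
    schedule()
    {:noreply, state}
  end

  # Agreement: nothing to do.
  def reconcile(%{position: same} = state, {:ok, same}), do: state

  # Disagreement: the broker wins and local state is overwritten.
  def reconcile(state, {:ok, broker_position}) do
    Logger.warning("position drift: local=#{state.position} broker=#{broker_position}")
    %{state | position: broker_position}
  end

  # Broker unreachable: keep local state and try again on the next tick.
  def reconcile(state, {:error, _reason}), do: state

  defp schedule, do: Process.send_after(self(), :reconcile_position, @position_interval)
end
```

Keeping reconcile/2 a pure function makes the rule easy to test without a running process or a live broker.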

Aggregation: what's the total risk right now?

Individual trading sessions don't see each other. They each manage their own orders, position, and strategy state. But risk is a portfolio-level concern — if four sessions are all carrying max-sized positions in correlated instruments, the portfolio is exposed in a way no single session can see.

Aggregation is the fifth flow: each trader reports its risk contribution to a separate PortfolioServer process, which sums across sessions and pushes a portfolio view back down to each trader.

RealtimeTrader A ──┐
RealtimeTrader B ──┼──→  PortfolioServer  ──→  heat / warn / block
RealtimeTrader C ──┘                            (back to each trader)

The trust question here is the inverse of the others: not "do I believe this data," but "do I have all of it." The PortfolioServer is the only place in the system that knows the answer to "what is total exposure right now," and the answer is meaningless if even one session's contribution is missing.

Two design choices follow:

The PortfolioServer rebuilds from the traders, not the database. On restart, it loads portfolio configuration from Postgres but holds no heat state. As each surviving RealtimeTrader re-registers, it pushes its current risk contribution back in. The traders are the source of truth for their own state.
Exits are never blocked. Risk management can prevent new entries when heat is high, but a trader trying to close a position is never refused. Safety rails should protect, not trap.
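
Both choices can be sketched in a few lines. PortfolioSketch below is a hypothetical, simplified stand-in for the PortfolioServer: it starts with no heat state, accepts whatever each trader reports, and gates only entries. The thresholds (in basis points) are invented for illustration, and traders pull a verdict here, whereas the post describes the server pushing its view back down.

```elixir
defmodule PortfolioSketch do
  use GenServer

  # Illustrative thresholds in basis points; the real values are not stated.
  @warn_heat 400
  @block_heat 600

  # Starts empty: heat is rebuilt from trader reports, never loaded from disk.
  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, %{})

  # Each trader reports its own risk contribution; the trader owns that number.
  def report(pid, session_id, heat), do: GenServer.call(pid, {:report, session_id, heat})

  # Exits are never blocked, so they do not even consult the server.
  def check(_pid, :exit), do: :ok
  def check(pid, :entry), do: GenServer.call(pid, :check_entry)

  @impl true
  def init(heat_by_session), do: {:ok, heat_by_session}

  @impl true
  def handle_call({:report, session_id, heat}, _from, heat_by_session) do
    # A repeated report from the same session replaces its previous value.
    {:reply, :ok, Map.put(heat_by_session, session_id, heat)}
  end

  def handle_call(:check_entry, _from, heat_by_session) do
    total = heat_by_session |> Map.values() |> Enum.sum()

    verdict =
      cond do
        total >= @block_heat -> :block
        total >= @warn_heat -> :warn
        true -> :ok
      end

    {:reply, verdict, heat_by_session}
  end
end
```

The sketch does not address the completeness problem raised above: a real server would also need to know which sessions it expects to hear from before treating the sum as total exposure.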

The principles underneath

Five flows, five trust questions, and a small set of principles that show up everywhere:

The broker is the source of truth. Reconcile periodically. Don't trust the system's own state for long.
Write to the database before the network call, not after. If the call fails, the intent survives.
Persist only what cannot be recomputed. Everything else is in-memory and expendable.
Heavy work goes to Oban. The GenServer's callback chain stays responsive.
The owner of a piece of data is the only one allowed to declare it canonical. Traders own their own risk; the PortfolioServer owns the aggregate; the broker owns the truth.
Exits are never blocked. Closing a position must always be permitted.

None of these are original. All of them are uncomfortable to apply consistently. The easiest design — one source of state, one process, one database table — is also the first to break when reality disagrees with it.

What this post deliberately doesn't cover

The diagram above traces the macroscopic flow. It says nothing about how a raw WebSocket tick actually becomes a candle that triggers a decision that becomes an order. That's a different post — the interior of the engine, the actual handle_info chain that runs every time a tick arrives.

This one's about the shape. The next one is about the path.

More in this series

GenServer State Management in Elixir: A Production Order Book

Designing a Broker-Agnostic API Layer in Elixir

The Broker Evaluation Gauntlet: Four Futures APIs Compared