Research Preview — Hardware Results Coming Soon

From Software
to Silicon.

We processed 85 million market events across 4 database architectures. Now we're putting the order book on an FPGA — 4,500 instruments, always live, queryable in under a microsecond.

4,500
Instruments Live
<1µs
Query Target
200,000×
vs Reconstruct
10
Book Depth Levels
The Problem

The Reconstruction Problem

Every time you need an order book snapshot, software must replay all updates from storage.

|                      | Software Reconstruct               | FPGA Query (target)             |
| What happens         | Replay 48,860 updates from storage | Read hardware registers via bus |
| Latency              | ~200ms per instrument              | <1µs                            |
| Query 50 stocks      | 50 × 200ms = 10 seconds            | 50 × 1µs = 50µs                 |
| Book always current? | No (stale when built)              | Yes (updated on every tick)     |
# WITHOUT FPGA: reconstruct from scratch every time
book = reconstruct_lob(storage_path, instrument_id=65721)
# Takes: ~200ms — and the book is already stale

# WITH FPGA: book always live in registers, instant read
book = fpga_client.get_book(65721)
# Takes: <1µs — reflects the latest tick
The Architecture

End-to-end — from raw feed to live order books.

Raw Feed

Binary protocol
SET ITCH format

Binary Encoder

Custom fixed-width
protocol format

High-Speed Bus

PCIe Gen3 ×16
12 GB/s bandwidth

FPGA Card

UltraScale+
Accelerator
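The Binary Encoder stage exists so the FPGA parser can decode one message per clock cycle: every message has the same width, so no variable-length parsing is needed. A minimal sketch of such an encoder, where the field names, widths, and 14-byte layout are illustrative assumptions rather than the actual protocol:

```python
import struct

# Hypothetical fixed-width layout (illustrative assumption, not the real
# protocol): type(1B) instrument(4B) price_ticks(4B) size(4B) side(1B) = 14 bytes.
MSG = struct.Struct("<BIIIB")

def encode(msg_type: int, instrument: int, price_ticks: int, size: int, side: int) -> bytes:
    """Pack one market event into the fixed-width wire format."""
    return MSG.pack(msg_type, instrument, price_ticks, size, side)

# 'N' (new order) for instrument 65721 at 100.25 (ticks of 0.01), 500 shares, bid side
wire = encode(0x4E, 65721, 10025, 500, 0)
```

Because every message is exactly `MSG.size` bytes, the hardware parser can slice fields by bit position instead of scanning for delimiters.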

FPGA Internals

Parser

Binary to structured fields. 1 clock cycle per message.

Book Engine

4,500 books in registers. 10 levels × bid + ask. N/C/D shifting in parallel.

Query Interface

Host reads any book via control bus. Full 10-level snapshot in <1µs.

Update Stream

Streams best bid/ask/spread to host for analytics.
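On the host side, reading the Query Interface amounts to decoding one fixed-width block of registers into a 10-level snapshot. A rough Python sketch, assuming a hypothetical layout of 32-bit (price, size) pairs per side; the real register map is not published here:

```python
import struct

LEVELS = 10  # 10 price levels per side, as in the book engine

def decode_book(raw: bytes):
    """Decode one book's fixed-width register block into (bids, asks).

    Assumed layout (illustrative, not the real register map): 10 bid
    (price, size) pairs, then 10 ask pairs, each an unsigned 32-bit int.
    """
    fields = struct.unpack(f"<{LEVELS * 4}I", raw)
    bids = list(zip(fields[0:2 * LEVELS:2], fields[1:2 * LEVELS:2]))
    asks = list(zip(fields[2 * LEVELS::2], fields[2 * LEVELS + 1::2]))
    return bids, asks

# On real hardware `raw` would come from a memory-mapped PCIe window;
# here we fabricate one book's worth of register bytes for illustration.
raw = struct.pack(f"<{LEVELS * 4}I", *range(LEVELS * 4))
bids, asks = decode_book(raw)
```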

"The FPGA does one thing: keep all order books correct, always. Analytics, surveillance, and visualization run on the host — where software excels."

FPGA Accelerator High-Level Synthesis Python Docker PCIe
The Hardware

Purpose-Built for Market Data

FPGA: UltraScale+ architecture
Memory: 8 GB HBM2, 460 GB/s bandwidth
Bus: PCIe Gen3 ×16, 12 GB/s throughput
Registers: ~35 MB on-chip, 1.1 MB used (~3%)

"The entire exchange order book state fits in roughly 3% of available register memory."

The Design

Every Level is a Register

Why ARRAY_PARTITION changes everything

Normal software stores price levels in an array. When a new level is inserted near the top of the book, the CPU must shift every deeper level down one slot, one iteration at a time: up to 10 sequential iterations for a 10-level book.

On an FPGA, the HLS pragma ARRAY_PARTITION complete tells the compiler to map each array element to a physical flip-flop register. The "loop" becomes parallel wiring — all 10 levels shift simultaneously in a single clock cycle.

The result: what takes software 10 iterations takes hardware 1 cycle. Not 10x faster — fundamentally different.

// All 10 price levels become physical registers
#pragma HLS ARRAY_PARTITION variable=bid_prices complete dim=1

// This "loop" executes in ONE clock cycle as parallel hardware
for (int i = MAX_LEVELS - 1; i > idx; i--) {
    bid_prices[i] = bid_prices[i - 1];  // 10 muxes, not 10 iterations
}

Software — Sequential

10 iterations, one after another. Each shift depends on the previous. Total: 10 clock cycles minimum.

FPGA — Parallel

All 10 levels shift simultaneously in 1 cycle. Physical wiring replaces sequential logic. Total: 1 clock cycle.
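The sequential case can be made concrete with a short sketch. This is plain illustrative Python, not the benchmarked implementation; it shows the same shift the HLS loop performs, executed one step at a time:

```python
MAX_LEVELS = 10

def insert_level(prices, sizes, idx, price, size):
    """Software-style insert: shift deeper levels down one slot, then write.

    A CPU executes these shifts one at a time, up to 10 of them;
    the deepest level falls off the book.
    """
    for i in range(MAX_LEVELS - 1, idx, -1):
        prices[i] = prices[i - 1]
        sizes[i] = sizes[i - 1]
    prices[idx] = price
    sizes[idx] = size

prices = [100, 99, 98, 97, 96, 95, 94, 93, 92, 91]
sizes = [10] * 10
insert_level(prices, sizes, 3, 97.5, 7)
# prices[3] is now 97.5; levels 97..92 moved down; 91 dropped off
```

On the FPGA, with each array slot a physical register, all ten assignments happen in the same clock edge.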

The Baseline

Software results — real measurements from 85M records.

Before building hardware, we established a rigorous software baseline. These are actual measured results from our High-Frequency Data Analytics study.

A — Columnar In-Memory B — Columnar On-Disk C — Embedded Analytical D — Traditional RDBMS
| Query                      | A (Measured) | B (Measured) | C (Measured) | D (Measured) |
| OHLCV 1-min                | 7ms          | 65ms         | 33ms         | 122ms        |
| VWAP calculation           | 7ms          | 67ms         | 22ms         | 76ms         |
| Spread analysis (55M rows) | 13ms         | 172ms        | 52ms         | 106ms        |
| LOB batch reconstruction   | 845ms        | 485ms        | 267ms        | 881ms        |

LOB Single-Update Performance

Python numpy per update: 4.1µs (Measured)
A (native): ~2µs (Measured)
FPGA target: <200ns (Coming Soon)
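For context on the numpy figure, a single LOB update looks roughly like the following sketch. The (10, 2) array shape and function name are assumptions for illustration; the study's actual data structures may differ:

```python
import numpy as np

# One book side as a (10, 2) array of (price, size) rows: the kind of
# structure a per-update benchmark would time.
bids = np.zeros((10, 2))

def apply_new_level(book: np.ndarray, idx: int, price: float, size: float) -> None:
    """Apply one 'new level' message: shift deeper rows down, write the new row."""
    book[idx + 1:] = book[idx:-1].copy()  # copy() avoids overlapping-view issues
    book[idx] = (price, size)

apply_new_level(bids, 0, 100.25, 500)
```

Even with the shift vectorised, every update pays Python and interpreter overhead per message, which is why a native implementation halves the latency and a fixed hardware pipeline targets another order of magnitude below that.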

"Software results are real measurements from our High-Frequency Data Analytics study. FPGA targets are design specifications. Hardware results will be published upon completion."

The Targets

FPGA Performance Targets

Design specifications for the hardware implementation. These targets will be validated during hardware integration testing.

| Metric             | Software Best (Measured) | FPGA Target         | Status      |
| Single LOB update  | 2µs (A)                  | <200ns              | Coming Soon |
| Book query latency | 200ms (reconstruct)      | <1µs                | Coming Soon |
| Simultaneous books | 1 (per query)            | 4,500 (all live)    | Coming Soon |
| Feed processing    | 244K updates/s           | ~300M updates/s     | Coming Soon |
| Jitter (p99/p50)   | ~10×                     | ~1× (deterministic) | Coming Soon |
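The jitter metric compares tail behaviour: the 99th-percentile latency divided by the median. A quick illustration of how it is computed, using synthetic long-tailed samples rather than our measurements:

```python
import numpy as np

# Synthetic long-tailed latency samples stand in for real measurements.
rng = np.random.default_rng(0)
latencies_us = rng.lognormal(mean=0.0, sigma=0.8, size=100_000)

p50, p99 = np.percentile(latencies_us, [50, 99])
jitter = p99 / p50  # queue-driven software tails inflate this ratio
```

A fixed hardware pipeline takes the same number of cycles on every message, which is what drives the ratio toward ~1×.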
Book Query Latency — Software Reconstruct vs FPGA Target (log scale)
The Roadmap

Development Roadmap

1. Data Foundation (DONE)
85M records parsed, 4 databases benchmarked, LOB validated against independent reference.

2. Binary Format (DONE)
Custom fixed-width protocol designed. Encoder built. Test vectors generated and validated.

3. HLS Kernels Written (DONE)
Parser, order book engine, query interface: all written in C++ with HLS pragmas.

4. C-Simulation (IN PROGRESS)
Validate FPGA logic matches software golden reference. Bit-accurate functional verification.

5. HLS Synthesis (PENDING)
Resource utilization analysis, timing closure at 300MHz target frequency.

6. Hardware Integration (PENDING)
DMA transfers + block design integration for the FPGA accelerator card.

7. Hardware Validation (PENDING)
Full trading day replay. All 4,500 instruments. Latency measurement and correctness verification.
Get Involved

Interested in FPGA-Accelerated
Data Processing?

We're building this now. Want early access to the results, or have a similar challenge?

Start a Conversation

Read the Software Study →

🌏 We welcome international engagements — serving clients across Southeast Asia and beyond.

contact@infozense.com  |  +66-82-242-4008  |  Bangkok, Thailand