We processed 85 million market events across 4 database architectures. Now we're putting the order book on an FPGA — 4,500 instruments, always live, queryable in under a microsecond.
Every time you need an order book snapshot, software must replay all updates from storage.
| | Software Reconstruct | FPGA Query (target) |
|---|---|---|
| What happens | Replay 48,860 updates from storage | Read hardware registers via bus |
| Latency | ~200ms per instrument | <1µs |
| Query 50 stocks | 50 × 200ms = 10 seconds | 50 × 1µs = 50µs |
| Book always current? | No — stale when built | Yes — updated on every tick |
Data path:

- Feed input: binary protocol, SET ITCH format, encoded into a custom fixed-width protocol format
- Host link: PCIe Gen3 ×16, 12 GB/s bandwidth
- Device: UltraScale+ accelerator

Pipeline stages:

- Parser: binary to structured fields. 1 clock cycle per message.
- Order book engine: 4,500 books in registers. 10 levels × bid + ask. N/C/D shifting in parallel.
- Query interface: host reads any book via control bus. Full 10-level snapshot in <1µs.
- Analytics stream: streams best bid/ask/spread to host for analytics.
"The FPGA does one thing: keep all order books correct, always. Analytics, surveillance, and visualization run on the host — where software excels."
"The entire exchange order book state fits in 0.003% of available register memory."
Normal software stores price levels in an array. When a new level inserts at position 3, the CPU must shift every deeper level down one slot, one element per iteration; in the worst case, an insert at the top of a 10-level book, that is 10 sequential iterations, one after another.
On an FPGA, the HLS pragma ARRAY_PARTITION complete tells the compiler to map each array element to a physical flip-flop register. The "loop" becomes parallel wiring — all 10 levels shift simultaneously in a single clock cycle.
The result: what takes software 10 iterations takes hardware 1 cycle. Not 10x faster — fundamentally different.
10 iterations, one after another. Each shift depends on the previous. Total: 10 clock cycles minimum.
All 10 levels shift simultaneously in 1 cycle. Physical wiring replaces sequential logic. Total: 1 clock cycle.
Before building hardware, we established a rigorous software baseline. These are actual measured results from our High-Frequency Data Analytics study.
| Query | DB A (measured) | DB B (measured) | DB C (measured) | DB D (measured) |
|---|---|---|---|---|
| OHLCV 1-min | 7ms | 65ms | 33ms | 122ms |
| VWAP calculation | 7ms | 67ms | 22ms | 76ms |
| Spread analysis (55M rows) | 13ms | 172ms | 52ms | 106ms |
| LOB batch reconstruction | 845ms | 485ms | 267ms | 881ms |
"Software results are real measurements from our High-Frequency Data Analytics study. FPGA targets are design specifications. Hardware results will be published upon completion."
Design specifications for the hardware implementation. These targets will be validated during hardware integration testing.
| Metric | Software (best measured) | FPGA Target | Status |
|---|---|---|---|
| Single LOB update | 2µs (A) | <200ns | Coming Soon |
| Book query latency | 200ms (reconstruct) | <1µs | Coming Soon |
| Simultaneous books | 1 (per query) | 4,500 (all live) | Coming Soon |
| Feed processing | 244K updates/s | ~300M updates/s | Coming Soon |
| Jitter (p99/p50) | ~10× | ~1× (deterministic) | Coming Soon |
85M records parsed, 4 databases benchmarked, LOB validated against independent reference.
Custom fixed-width protocol designed. Encoder built. Test vectors generated and validated.
Parser, order book engine, query interface — all written in C++ with HLS pragmas.
Validate FPGA logic matches software golden reference. Bit-accurate functional verification.
Resource utilization analysis, timing closure at 300MHz target frequency.
DMA transfers + block design integration for the FPGA accelerator card.
Full trading day replay. All 4,500 instruments. Latency measurement and correctness verification.