85 million market events. 4 database architectures. 1 question — what does it take to turn high-volume data into actionable analytics?
End-to-end data engineering — from raw protocol parsing to production analytics. Not theory. Not slides. Working code, real data, measured results.
SET ITCH tick data from the Stock Exchange of Thailand. Every order, every trade, every cancellation — captured with nanosecond precision.
"This study uses real production market data from the Stock Exchange of Thailand — not synthetic or simulated data. All benchmark results are from actual measurements on actual code."
Every component built, tested, and measured. From raw exchange feed to interactive dashboard — fully containerized, deployable with one command.
Binary protocol
SET ITCH format
16 message types
Nanosecond decode
Parquet format
Compressed output
A / B / C / D
Same schema
Benchmarked queries
Surveillance scoring
Interactive UI
Real-time alerts
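The nanosecond-decode stage can be sketched with Python's `struct` module. The field layout below is purely illustrative — it is not the actual SET ITCH message specification, and the symbol, widths, and scaling are assumptions for the sketch:

```python
import struct

# Hypothetical fixed-width "Add Order" layout -- illustrative only,
# NOT the real SET ITCH specification:
#   1s  message type  (e.g. b"A")
#   Q   timestamp     (nanoseconds since midnight, big-endian)
#   Q   order id
#   1s  side          (b"B" buy / b"S" sell)
#   I   quantity
#   8s  symbol        (space-padded ASCII)
#   I   price         (scaled integer, 4 implied decimals)
ADD_ORDER = struct.Struct(">cQQcI8sI")

def decode_add_order(buf: bytes) -> dict:
    """Decode one fixed-width binary message into a plain dict."""
    mtype, ts_ns, order_id, side, qty, symbol, price = ADD_ORDER.unpack(buf)
    return {
        "type": mtype.decode(),
        "ts_ns": ts_ns,
        "order_id": order_id,
        "side": side.decode(),
        "qty": qty,
        "symbol": symbol.decode().rstrip(),
        "price": price / 10_000,   # undo the 4-decimal integer scaling
    }

# Round-trip a sample message
raw = ADD_ORDER.pack(b"A", 34_200_000_000_123, 42, b"B", 500, b"PTT     ", 312_500)
msg = decode_add_order(raw)
print(msg["symbol"], msg["qty"], msg["price"])   # PTT 500 31.25
```

A real decoder dispatches on the message-type byte to one of the 16 layouts; the pattern per message is the same fixed-offset unpack shown here.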
Six queries. Four databases. Same data, same machine, same schema. p50 latency over 5 runs.
| Query | A | B | C | D |
|---|---|---|---|---|
| OHLCV 1-min | 7ms | 65ms | 33ms | 122ms |
| Range scan | 21ms | 104ms | 47ms | 196ms |
| Market-wide agg | 47ms | 70ms | 23ms | 145ms |
| Top-N by volume | 38ms | 69ms | 23ms | 137ms |
| Surveillance scoring | 65ms | 82ms | 34ms | 365ms |
| VWAP calculation | 7ms | 67ms | 22ms | 76ms |
"A dominates single-instrument time-series queries — 7ms OHLCV means your traders see candles before competitors refresh their screen. C wins market-wide aggregations. Both under 100ms. D is 3-10x slower."
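The VWAP query benchmarked above reduces to one weighted average per instrument. A minimal sketch of the calculation (not the benchmark's actual SQL) with hypothetical trade data:

```python
def vwap(trades):
    """Volume-weighted average price: sum(price * qty) / sum(qty)."""
    notional = sum(price * qty for price, qty in trades)
    volume = sum(qty for _, qty in trades)
    return notional / volume

# Hypothetical trades: (price, quantity)
trades = [(31.25, 500), (31.50, 200), (31.00, 300)]
print(vwap(trades))   # 31.225
```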
Order book data is harder — more rows, more joins, more state. Here is where architectures reveal their limits.
| Query | A | B | C | D |
|---|---|---|---|---|
| Spread analysis | 13ms | 172ms | 52ms | 106ms |
| Best bid/ask range | 30ms | 141ms | 57ms | 210ms |
| Market-wide L1 | 149ms | 62ms | 35ms | 6,257ms |
| Top-N activity | 181ms | 80ms | 45ms | 690ms |
| Bid/Ask imbalance | 1,449ms | 516ms | 170ms | 11,283ms |
| Depth analysis | 333ms | 186ms | 115ms | 9,982ms |
"Pattern holds at scale. A wins targeted queries — 13ms spread analysis means real-time order book monitoring. C wins broad scans. D collapses on 55M rows — 6 to 11 seconds per query, unusable for production surveillance."
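The bid/ask imbalance query, the slowest row in the table, computes a simple per-snapshot ratio over many millions of rows. A common L1 definition is sketched below; the benchmark's exact formula is not given, so treat this as one standard variant:

```python
def l1_imbalance(bid_qty: int, ask_qty: int) -> float:
    """L1 order book imbalance in [-1, 1]: +1 all bids, -1 all asks."""
    total = bid_qty + ask_qty
    return (bid_qty - ask_qty) / total if total else 0.0

print(l1_imbalance(900, 100))   # 0.8 -> strong buy-side pressure
```

The arithmetic is trivial; the cost in the table comes from scanning and grouping 55M snapshot rows, which is why columnar engines pull ahead here.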
Limit Order Book (LOB) reconstruction is not a query — it is a state machine. Every New, Change, and Delete message alters the book. New inserts and shifts levels down. Change updates price or quantity. Delete removes and shifts levels up. Getting it wrong means your analytics are wrong.
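The three transitions above can be sketched as a tiny replay loop. This is a minimal sketch of one book side, with simplified, illustrative message fields (`kind`, `level`, `price`, `qty`) rather than real ITCH semantics:

```python
# One book side as a list of (price, qty) levels, best price first.
# Minimal sketch of the New/Change/Delete state machine -- fields are
# illustrative, not the SET ITCH message format.

def apply(levels, msg):
    kind, level, price, qty = msg   # level is 1-based book depth
    i = level - 1
    if kind == "NEW":               # insert, shifting deeper levels down
        levels.insert(i, (price, qty))
    elif kind == "CHANGE":          # update price/qty in place
        levels[i] = (price, qty)
    elif kind == "DELETE":          # remove, shifting deeper levels up
        levels.pop(i)
    return levels

book = []
for m in [("NEW", 1, 31.25, 500),
          ("NEW", 2, 31.00, 300),
          ("NEW", 1, 31.50, 200),    # new best bid pushes others down
          ("DELETE", 2, None, None)]:
    apply(book, m)

print(book)   # [(31.5, 200), (31.0, 300)]
```

Note that the third message shifts every existing level down one slot. A query that only keeps each level's last observed value never performs that shift, which is exactly why the SQL shortcut below is fast but wrong.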
The tempting approach — `SELECT last(qty) GROUP BY level` — runs fast but returns wrong results. It keeps each level's most recent snapshot while skipping the inserts, updates, and deletes that shift levels in between, so the reconstructed book drifts from the true one. Fast-and-wrong is worse than slow-and-right.
| Method | Time | Correct? |
|---|---|---|
| C fetch + processing | 267ms | Yes |
| B fetch + processing | 485ms | Yes |
| A native processing | 845ms | Yes |
| D fetch + processing | 881ms | Yes |
| SQL shortcut (any DB) | ~165ms | WRONG |
"The database matters less than correctness. Fast-and-wrong is worse than slow-and-right. Results validated against independent reference implementation — 100% match on all comparable timestamps."
Speed is not free. Every architecture makes trade-offs between storage, ingestion rate, and query latency.
| Database | Size | Notes |
|---|---|---|
| B | 0.17 GB | Best compression (36x vs A) |
| C | 1.24 GB | Columnar, embedded |
| D | 1.33 GB | Row-based storage |
| A | 6.03 GB | In-memory, fastest queries |
| Database | Ingest time | Rate |
|---|---|---|
| B | 43s | 243K rows/s |
| A | 145s | 72K rows/s |
| D | 243s | 43K rows/s |
"B compresses 36x better than A and ingests 6x faster. A trades storage for speed — all data in RAM means sub-10ms queries."
Independent benchmark for educational purposes. 85 million records, 22 trading days, p50 latency over 5 runs, all databases containerized on the same machine. Results reflect our test environment and may vary.
This is what the pipeline produces — real-time surveillance scoring across all listed securities. One anomalous stock, six signals, one verdict.
This is a flagged anomaly detected by the scoring engine — not investment advice. Alert data is from real market activity.
+28.6% in a single session — far exceeds normal volatility
20.4x rolling average — concentrated buying pressure
97.7% buy-side aggressor — far above 50% neutral
Average size drops — classic retail participation pattern
Tight spread during run-up — artificial liquidity holding price
Sharp reversal after peak — dump confirmed as buying collapses
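A composite score over the six signals above can be sketched as a rule set. The thresholds, weights, and field names below are hypothetical — the production engine's parameters are not disclosed:

```python
# Toy anomaly score over six session-level signals. All thresholds and
# field names are hypothetical, not the production scoring engine's.
SIGNALS = {
    "price_spike":    lambda s: s["price_change_pct"] > 10.0,
    "volume_spike":   lambda s: s["volume_vs_avg"] > 5.0,
    "buy_aggressor":  lambda s: s["buy_aggressor_pct"] > 80.0,
    "shrinking_size": lambda s: s["avg_trade_size_ratio"] < 0.5,
    "tight_spread":   lambda s: s["spread_bps"] < 10.0,
    "reversal":       lambda s: s["post_peak_drawdown_pct"] > 15.0,
}

def score(session: dict) -> int:
    """Count how many anomaly signals fire for one instrument-session."""
    return sum(check(session) for check in SIGNALS.values())

# Session resembling the flagged stock above (illustrative values)
flagged = {
    "price_change_pct": 28.6, "volume_vs_avg": 20.4,
    "buy_aggressor_pct": 97.7, "avg_trade_size_ratio": 0.4,
    "spread_bps": 6.0, "post_peak_drawdown_pct": 22.0,
}
print(score(flagged))   # 6 -> all six signals fire
```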
Capital markets is just the demo. The same platform analyzes any high-frequency event data — wherever milliseconds matter.
The pattern is always the same: massive volume, nanosecond precision, real-time decisions.
Platform-agnostic means we benchmark objectively, recommend honestly, and deliver the right solution — not the most expensive one.
This study is the output of steps 1–4. The page you're reading is step 5.
"What data do you have?"
Audit your data landscape, volumes, latency requirements, and existing infrastructure.
"What architecture fits?"
Benchmark candidate databases on your actual data. No assumptions — measured results.
"From pipeline to dashboard"
Build the end-to-end pipeline — ingestion, storage, analytics, visualization. Fully containerized.
"One command to production"
Deploy on-premise, cloud, or hybrid. Docker-based, reproducible, and documented.
"It works — now make it faster"
Monitor performance, tune queries, scale infrastructure. Continuous improvement.
This study uses licensed market data obtained through commercial agreement. Infozense is not affiliated with the Stock Exchange of Thailand. No market data is distributed through this website. This content is for educational and analytical purposes only and does not constitute investment advice.