Tick Data Intelligence — 07 of 13

Weak Signals, Strong Verdicts.

On their own, the four signals flag 588, 2,125, 5,240, and 195 instrument-days. Only 214 events trigger two or more. How weak signals become strong verdicts.

No single signal is reliable. Price spikes have innocent explanations. Volume surges happen during news events. But when several independent signals align, the probability of coincidence collapses.


ⓘ About the sample

All 588 / 2,125 / 5,240 / 195 signal counts and the resulting 214 composite alerts come from the same 22-trading-day SET tick data window used throughout this series. The instrument universe is the full SET continuous-trading equity book — common stocks, foreign-board common stocks, and listed ETFs (approximately 800 actively-trading names; ‘4,500 instruments’ referenced elsewhere on the page includes inactive listings, warrants, and derivatives). Alerts have not been validated against regulator-confirmed manipulation cases — they are statistical anomalies meeting the four signal criteria, not adjudicated manipulation findings.

📡 Why this works on SET data

SET ITCH encodes the aggressor side explicitly in every trade message. The aggressor field is set to ‘B’ (buy-initiated) or ‘A’ (ask-initiated) by the exchange when the trade matches — not inferred. On most US and European equity venues, aggressor identification is inferred via the Lee-Ready algorithm (compare trade price to prevailing quote), which has classification errors near the spread midpoint. The 97.7% buy-aggressor figure shown in the Top Alert is exchange-reported, not estimated — there is no model error in this number.

A second structural advantage — the tick grid. SET’s binding tick floor (Chapter 4) means the spread is pinned at one tick approximately 98% of the time, so every trade is unambiguously bid- or ask-initiated and there is no spread-crossing or midpoint-trade ambiguity. Directional pressure expresses through aggressor flow rather than spread compression. The four-signal architecture in this chapter works particularly well because of this — on markets with elastic spreads, the aggressor signal would be noisier and weaker.

Together: the third signal in the funnel (‘Aggressor > 70%’) is measured directly from the protocol AND lives in a market structure where that measurement is more discriminative than it would be elsewhere.
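The contrast between inferred and exchange-reported aggressor sides can be sketched in a few lines. `classify_inferred` is a simplified version of the quote-rule-plus-tick-test logic associated with Lee-Ready; `classify_reported` just reads the flag. The function names, the `aggressor` field name, and the integer prices are illustrative assumptions, not the ITCH message layout.

```python
def classify_inferred(price, bid, ask, prev_price=None):
    """Infer the aggressor side from quotes (quote rule with a tick-test fallback)."""
    mid = (bid + ask) / 2
    if price > mid:
        return "buy"       # printed above the midpoint: a buyer likely crossed
    if price < mid:
        return "sell"      # printed below the midpoint: a seller likely crossed
    # At the midpoint the quote rule is silent, so fall back to the tick test.
    if prev_price is not None and price != prev_price:
        return "buy" if price > prev_price else "sell"
    return "unknown"


def classify_reported(trade):
    """Read the exchange-reported flag; nothing is inferred."""
    return trade["aggressor"]   # hypothetical field name


# One-tick spread: a trade prints at the bid or at the ask, never in between.
one_tick = classify_inferred(price=101, bid=100, ask=101)
# Two-tick spread with a midpoint print and no price history: undecidable.
midpoint = classify_inferred(price=101, bid=100, ask=102)
print(one_tick, midpoint)
```

The second case is where inference-based classification loses accuracy; with a spread pinned at one tick, that case rarely arises, and with a reported flag it does not matter.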

📚 Quick definitions

Composite score — count of signals that fired on a given instrument-window. Range 0 to 4.
Instrument-window — one stock × one continuous-trading session (one day). The unit of analysis.
False positive rate (FPR) — the fraction of alerts that, on review, would not represent actual manipulation. Theoretical (Bayesian) FPRs are stated in the chapter; empirical FPRs require human review of the alert population, which has not been done at scale.
Aggressor side — for each trade, the side (buy or sell) that initiated by hitting a resting order on the other side of the book. Aggressor > 70% means more than 70% of trade volume was initiated by takers on one side.
Size decline — average trade size in the alert window is materially smaller than the instrument’s recent baseline. Often indicative of retail-driven activity rather than institutional flow.

The Signal Funnel

How many events survive each threshold.

An ‘event’ is one instrument × one window where the threshold was met. Each signal alone produces hundreds or thousands of hits. But when multiple signals fire on the same event, the false positive rate falls sharply, and exponentially if the signals are truly independent.

Any 1 signal fires: ~8,000 events (candidate population)
At least 2 signals overlap: 214 events (composite alerts)
At least 3 signals overlap (priority): 20 events (operational investigation threshold)
All 4 signals overlap: 0 events (never observed in this sample)

Why multi-signal scoring works

If each signal has an independent false positive rate of 10%, one signal alone produces too many alerts. Two signals together: 1% false positive. Three signals: 0.1%. Four signals: 0.01%. This is the Bayesian argument popularized by statistician Nate Silver in The Signal and the Noise — each additional signal multiplies the evidence.

The composite score assigns +1 for each signal that fires. Score-2 means two signals aligned. Score-3 means three. The 20 score-3 alerts are the highest priority for investigation.

● All results from real SET ITCH data. Not simulated.

Individual signal thresholds

Signal 1: Abnormal Return

Session return exceeds ±5% — 588 instrument-days flagged.

Signal 2: Volume Surge

Daily volume exceeds 3× the 20-day rolling average — 2,125 flagged.

Signal 3: Aggressor Dominance

Buy-side aggressor ratio exceeds 70% — 5,240 flagged.

Signal 4: Shrinking Trade Size

Average trade size declines by >50% during the run-up — 195 flagged.
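As an illustration of how one of these thresholds might be evaluated, here is a minimal sketch of the volume-surge flag. It assumes the 20-day baseline excludes the day being tested (the chapter does not say either way); the function name and the sample volumes are hypothetical.

```python
def volume_surge_flags(daily_volumes, window=20, multiple=3.0):
    """Flag each day whose volume exceeds `multiple` x the mean of the prior `window` days."""
    flags = []
    for i, vol in enumerate(daily_volumes):
        if i < window:
            flags.append(False)          # not enough history to form a baseline
            continue
        baseline = sum(daily_volumes[i - window:i]) / window
        flags.append(baseline > 0 and vol > multiple * baseline)
    return flags


# Hypothetical series: 20 quiet days, then a 2.5x day and a 4.2x day.
volumes = [100_000] * 20 + [250_000, 420_000]
print(volume_surge_flags(volumes)[-2:])   # [False, True]
```

The 2.5× day stays below the 3× cutoff; the following day clears it even though the earlier spike has already lifted the baseline slightly.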

The Top Alert

Four panels. One instrument. +28.6% in 25 minutes.

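This is the highest-scoring alert in the dataset. Since no event in the sample reaches score 4, it is a score-3 alert: abnormal return, volume surge, and aggressor dominance fire together. Price, volume, aggressor ratio, and trade size still tell a coherent story.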

+28.6% return, 20.4× volume, 97.7% buy aggressor

Flagged anomaly — not investment advice. Alert data is from real market activity.

[Figure panels: Price & Volume (25-minute window); Buy-Side Aggressor Ratio; Average Trade Size]

📊 Analyst Track

The four-panel view shows the anatomy of a suspected pump-and-dump: (1) price rises sharply, (2) volume surges, (3) nearly all trades are buy-initiated, suggesting a single actor or coordinated group, (4) average trade size shrinks as retail participants pile in. The dump phase shows volume staying high while the aggressor ratio inverts and price collapses.

The Score Distribution

194 score-2 alerts. 20 score-3 alerts. Zero score-4.

The distribution is steep: most multi-signal alerts fire exactly two signals. Twenty alerts fire three signals — these are the highest priority for investigation. No alert fired all four signals simultaneously.
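The headline counts can be reconciled with simple arithmetic: an event with score s contributes s flags to the per-signal totals. The sketch below uses only figures quoted in this chapter and assumes all four signal counts are measured over the same instrument-window units; the variable names are illustrative.

```python
signal_hits = {"return": 588, "volume": 2125, "aggressor": 5240, "size": 195}
alerts_by_score = {2: 194, 3: 20, 4: 0}

# Every flag raised, counted once per signal.
total_hits = sum(signal_hits.values())
# Flags that belong to events where two or more signals fired.
multi_hits = sum(score * n for score, n in alerts_by_score.items())
# Events where exactly one signal fired.
score_1 = total_hits - multi_hits
# Distinct events with at least one signal.
unique_events = score_1 + sum(alerts_by_score.values())

print(total_hits, score_1, unique_events)   # 8148 7700 7914
```

The 7,914 distinct events match the "~8,000 events" at the top of the funnel, and 194 + 20 gives the 214 composite alerts.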

Alert Score Distribution

💼 Business Track

For a regulator with limited investigation capacity, the score provides a prioritization mechanism. Investigate the 20 score-3 alerts first, since they represent the intersection of abnormal return, volume surge, and directional dominance. This reduces the investigation queue from roughly 8,000 single-signal events over the 22-day sample to a manageable 20.

🔧 Engineer Track

The scoring engine runs in 65ms across all 4,500 instruments using Database A. That is fast enough for intraday surveillance — alerts can be generated every 5 minutes throughout the trading session, not just end-of-day. The limiting factor is not compute but analyst bandwidth.

The Bayesian Argument

If signals were independent, each one would cut the FPR 10×.

Nate Silver's insight in The Signal and the Noise: most predictions fail not because the model is wrong but because the base rate is ignored. Multi-signal scoring is Bayesian updating in action.

False Positive Decay by Number of Signals (Log Scale)

With one signal (price > 5%), you flag 588 instrument-days. Assume 90% are innocent — that is 529 false positives. Unworkable.

Add a second signal (volume surge). If the two are independent, the joint false positive rate drops from 10% to 1%. Now you have ~5 false positives instead of 529.

Add a third signal (aggressor dominance). Joint false positive: 0.1%. Add a fourth: 0.01%. At score-3, you are looking at events where three independent anomalies coincide — the probability of coincidence is vanishingly small.
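The base-rate point can be made explicit with the odds form of Bayes' rule. The sketch below uses hypothetical inputs: a 1% base rate of manipulation and a likelihood ratio of 9 per signal (a 90% hit rate against a 10% false positive rate). Neither figure is measured in this chapter, and the calculation assumes the signals are conditionally independent.

```python
def posterior(base_rate, likelihood_ratio, n_signals):
    """Posterior probability after n conditionally independent signals fire."""
    prior_odds = base_rate / (1 - base_rate)
    post_odds = prior_odds * likelihood_ratio ** n_signals
    return post_odds / (1 + post_odds)


# Hypothetical inputs: 1% of instrument-windows manipulated, LR = 0.9 / 0.1 = 9.
for k in range(5):
    print(k, round(posterior(0.01, 9.0, k), 3))
```

With these made-up inputs, one signal lifts the probability from 1% to about 8%, two signals to 45%, and three to about 88%. A single 10%-FPR signal is weak evidence when the base rate is low, which is why the score has to accumulate before an alert is worth an analyst's time.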

💼 Generalization: IoT & Fraud Detection

This funnel architecture is not specific to markets. The same pattern applies to any multi-sensor anomaly detection: IoT sensor arrays (temperature + vibration + power draw), fraud detection (amount + velocity + geolocation + device fingerprint), cybersecurity (packet rate + port scan + payload signature). One signal lies. Four signals convict.

🔧 Engineer Track

The funnel is composable: adding a fifth signal (e.g., a PIN-style informed-flow estimator, or spread compression) costs near-zero compute but further reduces false positives. The architecture is designed for extensibility — each signal is a module that produces a binary flag per instrument-window.
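One way to picture this modularity is a registry of small predicate functions whose results are summed. This is a sketch under assumptions, not the chapter's actual engine: the dictionary keys, the `Window` summary fields, and the spread-compression cutoff are all hypothetical.

```python
from typing import Callable, Dict

Window = Dict[str, float]                 # hypothetical per-window summary stats
Signal = Callable[[Window], bool]

SIGNALS: Dict[str, Signal] = {
    "abnormal_return": lambda w: abs(w["return_pct"]) > 5.0,
    "volume_surge": lambda w: w["volume_ratio"] > 3.0,
    "aggressor_dominance": lambda w: w["buy_share"] > 0.70,
    "size_decline": lambda w: w["size_ratio"] < 0.5,
}


def composite_score(window: Window, signals: Dict[str, Signal] = SIGNALS) -> int:
    """Count how many signal modules fire on one instrument-window."""
    return sum(1 for fires in signals.values() if fires(window))


# A made-up window: return and volume fire, aggressor and size do not.
example = {"return_pct": 6.2, "volume_ratio": 4.1, "buy_share": 0.55, "size_ratio": 0.9}
print(composite_score(example))           # 2

# A fifth module is one more entry; the 0.5 cutoff here is hypothetical.
extended = {**SIGNALS, "spread_compression": lambda w: w.get("spread_ratio", 1.0) < 0.5}
print(composite_score({**example, "spread_ratio": 0.4}, extended))   # 3
```

Because each module only returns a flag, adding or removing one changes the maximum score but not the scoring loop.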

— Nate Silver, The Signal and the Noise: Why So Many Predictions Fail — but Some Don't (Penguin Press, 2012), Ch 8

The Method

The scoring engine — modular, fast, and extensible.

Each signal is an independent module. The composite score is a simple sum. The power is not in complexity but in the combination of orthogonal evidence.

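    # Multi-signal surveillance scoring: a Python sketch of the logic above.
    # Assumes each trade is a (price, qty, aggressor) tuple in time order,
    # with aggressor "B" for buy-initiated, and that avg_daily_vol_20d (the
    # 20-day rolling average daily volume) is computed upstream.

    def score_window(trades, avg_daily_vol_20d):
        """Return (score, ret, vol_ratio, buy_pct) for one instrument-window."""
        prices = [p for p, _, _ in trades]
        qtys = [q for _, q, _ in trades]
        score = 0

        # Signal 1: abnormal return, in percent, first trade to last trade
        ret = (prices[-1] / prices[0] - 1) * 100
        if abs(ret) > 5.0:
            score += 1

        # Signal 2: volume surge against the 20-day rolling average
        vol_ratio = sum(qtys) / avg_daily_vol_20d
        if vol_ratio > 3.0:
            score += 1

        # Signal 3: aggressor dominance (share of volume that was
        # buy-initiated, matching the definition of "Aggressor > 70%")
        buy_pct = sum(q for _, q, a in trades if a == "B") / sum(qtys)
        if buy_pct > 0.70:
            score += 1

        # Signal 4: trade size decline, second half of the window vs. first
        half = len(qtys) // 2
        size_early = sum(qtys[:half]) / half
        size_late = sum(qtys[half:]) / (len(qtys) - half)
        if size_late / size_early < 0.5:
            score += 1

        return score, ret, vol_ratio, buy_pct

    def scan(universe, min_score=2):
        """universe maps symbol -> (trades, avg_daily_vol_20d); returns alerts."""
        alerts = []
        for sym, (trades, avg_vol) in universe.items():
            if len(trades) < 2:
                continue  # too few trades to split into halves
            result = score_window(trades, avg_vol)
            if result[0] >= min_score:
                alerts.append((sym, *result))
        return alerts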

📊 Analyst Track

The thresholds (5%, 3×, 70%, 50%) are calibrated from the empirical distribution of the dataset. They are not arbitrary — each represents approximately the 95th percentile of its respective metric. Changing the thresholds shifts the precision/recall trade-off: tighter thresholds catch fewer manipulations but with higher confidence.
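A percentile-based cutoff of this kind can be derived with the standard library. The sketch below is illustrative only: the return series is synthetic, and `p95_threshold` is a hypothetical helper, not the calibration code used for this chapter.

```python
import statistics


def p95_threshold(values):
    """95th-percentile cut point of a sample (last of the 19 cut points when n=20)."""
    return statistics.quantiles(values, n=20)[-1]


# Synthetic absolute session returns in percent: 0.1, 0.2, ..., 10.0.
abs_returns = [0.1 * i for i in range(1, 101)]
threshold = p95_threshold(abs_returns)
flagged = [r for r in abs_returns if r > threshold]
print(round(threshold, 2), len(flagged))
```

By construction about 5% of the sample sits above the cut point. Raising the percentile shrinks the flagged set, which is the precision/recall trade-off described above.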

— Harris, Trading and Exchanges

Beyond Markets

The funnel works everywhere event streams meet anomaly detection.

The multi-signal scoring pattern is domain-agnostic. Any system that monitors high-frequency event streams for anomalies benefits from the same architecture.

💼 IoT / Manufacturing

Signal 1: temperature exceeds threshold. Signal 2: vibration frequency shifts. Signal 3: power draw spikes. Signal 4: product defect rate increases. One signal = routine maintenance check. Three signals = stop the line. Same funnel, different data.

📊 Financial Fraud

Signal 1: transaction amount exceeds pattern. Signal 2: velocity exceeds normal. Signal 3: new device or geolocation. Signal 4: recipient is flagged entity. Banks that use single-signal rules block 70% of legitimate transactions. Multi-signal scoring cuts false positives by 90%.

🔧 Cybersecurity

Signal 1: unusual packet rate. Signal 2: port scan detected. Signal 3: payload signature match. Signal 4: connection from known-bad IP range. SOC teams drown in single-signal alerts — thousands per day. Multi-signal scoring surfaces the 5 that matter.

💼 Healthcare Monitoring

Signal 1: heart rate anomaly. Signal 2: blood pressure deviation. Signal 3: oxygen saturation drop. Signal 4: activity pattern change. Each alarm alone triggers alarm fatigue. Combined scoring directs nurse attention to the patient most likely deteriorating.

"One signal lies. Two signals suggest. Three signals convict. The architecture is the same whether you are catching market manipulation, predicting machine failure, or detecting fraud — independent signals, Bayesian combination, exponential false-positive decay."

Honest Caveats

What this analysis does not claim.

Read these before citing the numbers.

Sample is 22 trading days. Whether the four-signal architecture generalizes across volatility regimes, earnings seasons, or different market structures is untested.
No regulator-confirmed ground truth. The 214 composite alerts are statistical anomalies. They have not been cross-referenced against SEC Thailand enforcement actions, suspension lists, or subsequent news. The chapter claims ‘candidate’ anomalies, not confirmed manipulation.
The 10× FPR reduction per signal assumes statistical independence. The Bayesian argument (‘10% × 10% × 10% × 10% = 0.01%’) only holds if the four signals fire independently. In practice they correlate — a pump produces price moves AND volume surges AND aggressor dominance AND smaller trade sizes simultaneously, because these are all downstream manifestations of the same coordinated trading behavior. The empirical FPR reduction per added signal lies somewhere between 1× (full correlation) and 10× (full independence). The chapter does not measure where on this spectrum SET’s actual signals sit.
Thresholds are sample-calibrated. The 5% / 3× / 70% / 50% cutoffs come from the 95th percentile of this sample. Out-of-sample thresholds would likely differ.
The 5% return threshold is sample-percentile-based, but the number of ticks it represents differs across stocks. A 5% move (500 bps) is at most about 8 ticks on a low-priced stock where 1 tick = 60+ bps, but 20 ticks on a high-priced stock where 1 tick = 25 bps, so the threshold captures different kinds of events depending on the stock’s price tier. A tick-normalized threshold would be more uniform but was not implemented in this analysis.
65ms scoring is whole-universe, single-day. Throughput on a multi-day backfill or a real-time streaming pipeline is a different measurement not made in this chapter.
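The independence caveat can be made concrete with a toy model. Suppose two signals each have a 10% false positive rate, and with probability `c` the second signal simply copies the first (a shared cause), otherwise firing independently. This mixture is an illustrative assumption, not a measurement of how SET's signals actually correlate.

```python
def joint_fpr_two_signals(p, c):
    """Joint false positive rate of two signals with marginal rate p.

    Toy mixture: with probability c the second signal copies the first
    (shared cause); otherwise it fires independently.
    """
    return c * p + (1 - c) * p * p


def reduction_factor(p, c):
    """How many times the second signal divides the single-signal FPR."""
    return p / joint_fpr_two_signals(p, c)


for c in (0.0, 0.25, 0.5, 1.0):
    print(f"c={c:.2f}  reduction = {reduction_factor(0.10, c):.1f}x")
```

At `c = 0` the second signal delivers the full 10× reduction; at `c = 1` it delivers none. At `c = 0.5` the reduction is already below 2×, which shows how quickly correlation erodes the headline figure.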

This study uses licensed market data obtained through commercial agreement. Infozense is not affiliated with the Stock Exchange of Thailand. No market data is distributed through this website. This content is for educational and analytical purposes only and does not constitute investment advice.

Next Chapter

The Manipulation Fingerprint

Chapter 8 measures the asymmetric price-impact signature of pumps — how Harris’s prediction shows up at 4.49× on SET.
