You have data in files, databases, object storage. You need fast analytics. Moving data is expensive, slow, and sometimes not allowed. StarRocks queries it where it sits.
Every organization faces the same analytics dilemma. Data lives in multiple places. Business needs real-time answers. The traditional options all have painful tradeoffs.
ETL pipelines, staging tables, nightly syncs. Expensive infrastructure. Stale by morning. Triple the storage cost. And every copy is a governance liability.
Federated queries across raw files and databases. Technically possible. Practically painful. Minutes per query. Users give up and go back to spreadsheets.
You already have infrastructure: on-prem servers, local cloud providers, hyperscalers, or a mix. You need an analytics engine that runs on what you have, not one that forces you onto a specific platform.
"StarRocks eliminates the tradeoff. It queries data where it sits -- files, lakes, databases -- at sub-second speed. No copying. No compromise. One engine for all three scenarios."
StarRocks is an open-source analytics engine designed from scratch for sub-second queries on large datasets. It reads external data via catalogs, caches intelligently, and speaks the MySQL protocol -- so every tool you already use just works.
Connect to Hive, Iceberg, Delta Lake, Hudi, and JDBC sources. Query data in S3, HDFS, or any object store without copying a single byte.
Pre-compute expensive aggregations. Auto-refresh on schedule. Queries transparently rewrite to use cached results. Freshness you control.
For data that benefits from full indexing, load it into StarRocks native columnar storage. Choose the aggregate, duplicate key, or primary key table model.
Connect with any MySQL client, BI tool, or driver. No proprietary connectors. No SDK. Your existing tools work on day one.
Column-oriented, SIMD-optimized execution. The pipeline engine processes data in batches, not row by row. Built for modern CPUs and large scans.
Apache 2.0 license. No vendor lock-in. Active community. Deploy it yourself, inspect the code, contribute back. Your data, your engine.
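As a rough illustration of the catalog approach, the sketch below assembles the two SQL statements involved: registering a Hive metastore as an external catalog, then querying one of its tables in place. Python is used only to keep the pieces easy to test; the statements themselves would be sent through any MySQL client. The catalog name `lake`, the metastore URI, the `sales.orders` table, and its `region` and `amount` columns are hypothetical, and real deployments usually need additional properties (storage credentials, for example) that are left out here.

```python
def create_hive_catalog_sql(catalog: str, metastore_uri: str) -> str:
    """Build a CREATE EXTERNAL CATALOG statement for a Hive metastore.

    Only the minimal properties are set; storage credentials are omitted.
    """
    return (
        f"CREATE EXTERNAL CATALOG {catalog}\n"
        "PROPERTIES (\n"
        '    "type" = "hive",\n'
        f'    "hive.metastore.uris" = "{metastore_uri}"\n'
        ");"
    )


def federated_query_sql(catalog: str, database: str, table: str) -> str:
    """Build an aggregate query that reads an external table in place.

    The table is addressed with three-part naming: catalog.database.table.
    The region and amount columns are assumed for illustration.
    """
    return (
        "SELECT region, SUM(amount) AS revenue\n"
        f"FROM {catalog}.{database}.{table}\n"
        "GROUP BY region;"
    )


if __name__ == "__main__":
    # Hypothetical metastore address; replace with your own.
    print(create_hive_catalog_sql("lake", "thrift://metastore.example.internal:9083"))
    print()
    print(federated_query_sql("lake", "sales", "orders"))
```

Because StarRocks speaks the MySQL protocol, both statements can be pasted into a standard MySQL client pointed at a StarRocks frontend (port 9030 unless it has been reconfigured). No data is loaded by either statement; the catalog only records where the data lives.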
[Diagram: S3, HDFS, Hive, JDBC, Files → Catalog + Cache + Local Tables → BI Tools, APIs, Notebooks, Dashboards]
StarRocks does not force a single access pattern. You choose the right mode for each dataset, and you can use all three in the same query.
Zero-copy queries against external data. StarRocks reads Parquet, ORC, or CSV files directly from object storage or HDFS via external catalogs. The data never moves. Ideal for large, regulated, or infrequently queried datasets.
Materialized views on top of external data. StarRocks pre-computes aggregations and caches the results locally. Queries are rewritten automatically to use a cached view whenever it is fresh enough. The best of both: live source data with cached speed.
Full-speed native columnar storage. Load data into StarRocks for maximum performance -- primary key, aggregate, or duplicate models. Sorted columns, zone maps, bloom filters. Sub-second on billions of rows.
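To make the three modes concrete, here is a minimal sketch that builds two statements: an asynchronously refreshed materialized view over an external table, and one query that touches a federated table, a local table, and that view together. Every object name is hypothetical (`lake.sales.orders` as the external fact table, `crm.customers` as a native dimension table, `crm.top_customers_mv` as the view), as are the columns and the one-hour refresh interval. Python is used only to assemble and check the SQL text.

```python
def materialized_view_sql(mv_name: str, source_table: str, refresh_hours: int) -> str:
    """Build an async materialized view that pre-aggregates an external table.

    Assumes the source has customer_id and amount columns (hypothetical).
    """
    if refresh_hours < 1:
        raise ValueError("refresh_hours must be at least 1")
    return (
        f"CREATE MATERIALIZED VIEW {mv_name}\n"
        "DISTRIBUTED BY HASH(customer_id)\n"
        f"REFRESH ASYNC EVERY (INTERVAL {refresh_hours} HOUR)\n"
        "AS\n"
        "SELECT customer_id, SUM(amount) AS revenue\n"
        f"FROM {source_table}\n"
        "GROUP BY customer_id;"
    )


def mixed_mode_query_sql(fact_table: str, dim_table: str, mv_name: str,
                         min_revenue: int) -> str:
    """Build one query that combines all three access modes."""
    return (
        "SELECT c.segment, SUM(o.amount) AS revenue\n"
        # Federated mode: the fact table is read in place from the external catalog.
        f"FROM {fact_table} AS o\n"
        # Local mode: the dimension table lives in native StarRocks storage.
        f"JOIN {dim_table} AS c ON o.customer_id = c.customer_id\n"
        # Cached mode: the filter reads the pre-computed materialized view.
        "WHERE o.customer_id IN (\n"
        f"    SELECT customer_id FROM {mv_name} WHERE revenue > {min_revenue}\n"
        ")\n"
        "GROUP BY c.segment;"
    )


if __name__ == "__main__":
    print(materialized_view_sql("crm.top_customers_mv", "lake.sales.orders", 1))
    print()
    print(mixed_mode_query_sql("lake.sales.orders", "crm.customers",
                               "crm.top_customers_mv", 1000))
```

The two-part names (`crm.customers`, `crm.top_customers_mv`) assume the session is using the internal catalog, while the three-part name reaches into the external one. Which tables belong in which mode is a design decision for each dataset; this sketch shows only that a single statement can span all three.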
"Use all three modes in a single SQL statement. Join a local dimension table with a federated fact table, filtered by a materialized view. One engine. One query. The right mode for each table."
StarRocks runs wherever your data lives. No cloud lock-in. No special hardware. The same binary, the same SQL, the same performance -- whether you deploy on a laptop or a 100-node cluster.
Single-command startup for development and testing. Docker Compose for multi-node local clusters. Perfect for proof-of-concept and CI/CD pipelines.
Helm charts and operator for production deployments. Auto-scaling, rolling upgrades, persistent volumes. Cloud-native from the start.
Traditional deployment on physical servers or virtual machines. Full control over hardware, networking, and storage. Air-gapped environments supported.
CelerData offers a fully managed StarRocks service on major clouds. Same engine, zero ops. Optional for teams that prefer managed infrastructure.
"Start on Docker. Test on Kubernetes. Deploy to bare metal in a regulated data center. Migrate to cloud next year. The engine doesn't care. Your SQL doesn't change."
Anyone can download StarRocks. The difference is knowing how to deploy it, tune it, connect it to your data, and build the platform around it. That's what IZ engineers deliver.
We map your data landscape -- every source, format, location, and access pattern. External catalogs configured. Connections tested. Schema registered. Data stays where it is.
We design the query layer -- which tables are federated, which are cached, which are local. Materialized views tuned. Partitioning strategy set. Performance benchmarked against your actual queries.
We deploy the engine, connect it to your BI tools, build the dashboards, set up monitoring, and train your team. Production-ready. Documented. Handed over with confidence.
"Collect. Think. Act. The engine handles the queries. We engineer the knowledge -- the catalog design, the caching strategy, the deployment architecture, and the platform that makes it all work."