Skip to Content
Analytics StoreQuerying Data

Querying Data

Once the Analytics Store has exported data, you can query it using the DuckDB CLI or any tool that supports Parquet files.

Prerequisites: Installing DuckDB CLI

Follow the official installation guide: DuckDB Installation 

If you’re using DuckLake storage mode, also install the DuckLake extension:

duckdb -c "INSTALL ducklake;"

DuckLake Storage Mode

Querying DuckLake with DuckDB Catalog

# Method 1: Interactive shell cd /path/to/yaci-store duckdb # Then in DuckDB shell: D INSTALL ducklake; D LOAD ducklake; D ATTACH 'ducklake:duckdb:./data/analytics/ducklake.catalog.db' AS analytics (DATA_PATH './data/analytics'); D SELECT COUNT(*) FROM analytics.block; D SELECT * FROM analytics.transaction WHERE date = '2024-01-15' LIMIT 10; # Method 2: Script mode with heredoc cd /path/to/yaci-store duckdb << 'EOF' INSTALL ducklake; LOAD ducklake; ATTACH 'ducklake:duckdb:./data/analytics/ducklake.catalog.db' AS analytics (DATA_PATH './data/analytics'); SELECT COUNT(*) FROM analytics.block; SELECT * FROM analytics.transaction WHERE date = '2024-01-15' LIMIT 10; SELECT epoch, sum(amount) as total_stake FROM analytics.epoch_stake WHERE epoch >= 450 group by epoch ORDER BY epoch; EOF

Note: Always prefix table names with the catalog name (e.g., analytics.block). Querying without the prefix will fail:

  • SELECT * FROM block; — Table not found
  • SELECT * FROM analytics.block; — Correct

Using read_parquet() with DuckLake

You can also query DuckLake-exported data directly via read_parquet(), just like in Parquet mode. Since DuckLake writes the same Parquet files under the export path, you can bypass the catalog and query files directly:

duckdb -c "SELECT * FROM read_parquet('./data/analytics/main/block/**/*.parquet', hive_partitioning=true) LIMIT 10;"

This can be useful for ad-hoc queries or when you don’t need to attach the DuckLake catalog.

Querying DuckLake with PostgreSQL Catalog

# Important: Start from the same directory as yaci-store if using relative DATA_PATH cd /path/to/yaci-store duckdb -c " INSTALL ducklake; LOAD ducklake; ATTACH 'ducklake:postgres:dbname=yaci_store host=localhost port=5432 user=postgres password=pass options=-csearch_path=public' AS ducklake_catalog (DATA_PATH './data/analytics'); -- Query DuckLake tables directly SELECT COUNT(*) FROM ducklake_catalog.block; SELECT * FROM ducklake_catalog.transaction WHERE date = '2024-01-15' LIMIT 10; SELECT epoch, total_stake FROM ducklake_catalog.epoch_stake WHERE epoch >= 450 ORDER BY epoch; "

Tip: If you used a relative export-path, you must run the DuckDB CLI from the same working directory where yaci-store was started. See Known Caveats for details.

Parquet Storage Mode

Querying Parquet Files

When using parquet storage mode, query files directly:

# Query a single partition duckdb -c "SELECT * FROM read_parquet('./data/analytics/block/date=2024-01-15/*.parquet') LIMIT 10;" # Query all partitions with Hive partitioning duckdb -c "SELECT * FROM read_parquet('./data/analytics/block/**/*.parquet', hive_partitioning=true) LIMIT 10;"
Last updated on