> ## Documentation Index
> Fetch the complete documentation index at: https://docs.flipsidecrypto.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Query Optimization

> Best practices for writing efficient queries against Flipside data shares

Flipside's blockchain data is optimized for performance through strategic clustering and indexing.
Understanding how tables are organized will help you write faster, more cost-effective queries.

## How Flipside clusters data

All Flipside tables are **clustered on `block_timestamp`** (or equivalent time field).

**What this means:**

* Queries filtering by time are extremely fast
* Data is physically organized by timestamp on disk
* Snowflake can skip irrelevant micro-partitions

**Always filter by time** for optimal performance.

## Understanding timestamp fields

Flipside tables include multiple timestamp fields:

### `block_timestamp`

The timestamp when the block was produced on the blockchain.

**Use for:**

* Time-series analysis
* Historical queries
* Filtering by when events occurred on-chain

**Clustered:** ✅ Yes (primary clustering key)

```sql theme={null}
-- ✅ GOOD: Clustered query
SELECT *
FROM ethereum.core.fact_transactions
WHERE block_timestamp >= '2024-01-01'
  AND block_timestamp < '2024-02-01';
```

### `_inserted_timestamp`

The timestamp when the row was first inserted into Flipside's database.

**Use for:**

* Identifying newly added data
* Incremental data loads
* Data pipeline orchestration

**Clustered:** ❌ No

```sql theme={null}
-- Use for incremental loads
SELECT *
FROM ethereum.core.fact_transactions
WHERE _inserted_timestamp >= DATEADD(hour, -1, CURRENT_TIMESTAMP());
```

### `_modified_timestamp`

The timestamp when the row was last modified (for updates or corrections).

**Use for:**

* Detecting data corrections
* Change data capture (CDC)
* Audit trails

**Clustered:** ❌ No

<Warning>
  **Performance tip:** Always include a `block_timestamp` filter, even when filtering on
  `_inserted_timestamp` or `_modified_timestamp`. This leverages the clustering.
</Warning>

## Query optimization best practices

### 1. Always filter by time

<CodeGroup>
  ```sql Good - Uses clustering theme={null}
  SELECT
      block_number,
      tx_hash,
      from_address,
      to_address
  FROM ethereum.core.fact_transactions
  WHERE block_timestamp >= CURRENT_DATE - 7
    AND to_address = '0x...'
  LIMIT 1000;
  ```

  ```sql Bad - No time filter theme={null}
  SELECT
      block_number,
      tx_hash,
      from_address,
      to_address
  FROM ethereum.core.fact_transactions
  WHERE to_address = '0x...'
  LIMIT 1000;
  ```
</CodeGroup>

**Why it matters:**

* Without time filter: Scans entire table (billions of rows)
* With time filter: Scans only relevant partitions (millions of rows)
* Result: 10-100x faster queries, lower costs

### 2. Use specific time ranges

Narrow time ranges = faster queries.

```sql theme={null}
-- ✅ Best: Specific range
WHERE block_timestamp >= '2024-01-01'
  AND block_timestamp < '2024-01-08'

-- ⚠️ OK: Last N days
WHERE block_timestamp >= CURRENT_DATE - 30

-- ❌ Avoid: Open-ended queries
WHERE block_timestamp >= '2020-01-01'
```

### 3. Limit result sets

Use `LIMIT` to constrain output:

```sql theme={null}
-- For exploration
SELECT *
FROM ethereum.core.fact_transactions
WHERE block_timestamp >= CURRENT_DATE - 1
LIMIT 1000;

-- For production queries
SELECT *
FROM ethereum.core.fact_transactions
WHERE block_timestamp >= CURRENT_DATE - 30
  AND success = TRUE
ORDER BY block_timestamp DESC
LIMIT 10000;
```

### 4. Filter early, aggregate late

Push filters down before joins or aggregations:

<CodeGroup>
  ```sql Good - Filter first theme={null}
  WITH recent_txs AS (
      SELECT *
      FROM ethereum.core.fact_transactions
      WHERE block_timestamp >= CURRENT_DATE - 7
        AND success = TRUE
  )
  SELECT
      DATE_TRUNC('day', block_timestamp) AS date,
      COUNT(*) AS tx_count
  FROM recent_txs
  GROUP BY 1;
  ```

  ```sql Bad - Filter after aggregation theme={null}
  SELECT
      date,
      tx_count
  FROM (
      SELECT
          DATE_TRUNC('day', block_timestamp) AS date,
          COUNT(*) AS tx_count
      FROM ethereum.core.fact_transactions
      GROUP BY 1
  )
  WHERE date >= CURRENT_DATE - 7;
  ```
</CodeGroup>

### 5. Use appropriate join strategies

When joining multiple tables, ensure both have time filters:

```sql theme={null}
SELECT
    t.tx_hash,
    t.from_address,
    s.amount_usd
FROM ethereum.core.fact_transactions t
JOIN ethereum.defi.ez_dex_swaps s
    ON t.tx_hash = s.tx_hash
WHERE t.block_timestamp >= CURRENT_DATE - 7  -- Filter on both tables
  AND s.block_timestamp >= CURRENT_DATE - 7
LIMIT 10000;
```

### 6. Leverage materialized CTEs for complex queries

For queries with multiple CTEs, consider using `MATERIALIZED`:

```sql theme={null}
WITH base_data AS MATERIALIZED (
    SELECT
        block_timestamp,
        from_address,
        to_address,
        eth_value
    FROM ethereum.core.fact_transactions
    WHERE block_timestamp >= CURRENT_DATE - 30
      AND eth_value > 0
)
SELECT
    from_address,
    COUNT(*) AS tx_count,
    SUM(eth_value) AS total_eth
FROM base_data
GROUP BY 1
HAVING total_eth > 100
ORDER BY total_eth DESC;
```

## Common query patterns

### Time-series aggregations

```sql theme={null}
-- Daily transaction counts
SELECT
    DATE_TRUNC('day', block_timestamp) AS date,
    COUNT(*) AS tx_count,
    COUNT(DISTINCT from_address) AS unique_senders
FROM ethereum.core.fact_transactions
WHERE block_timestamp >= CURRENT_DATE - 30
GROUP BY 1
ORDER BY 1;
```

### Address-specific queries

```sql theme={null}
-- All activity for an address
SELECT
    block_timestamp,
    tx_hash,
    from_address,
    to_address,
    eth_value
FROM ethereum.core.fact_transactions
WHERE block_timestamp >= CURRENT_DATE - 90
  AND (from_address = LOWER('0x...')
   OR to_address = LOWER('0x...'))
ORDER BY block_timestamp DESC;
```

<Tip>
  **Case sensitivity:** Ethereum addresses are case-insensitive. Use `LOWER()` for comparisons to
  ensure matches.
</Tip>

### Token transfer analysis

```sql theme={null}
-- USDC transfers over \$10K
SELECT
    block_timestamp,
    from_address,
    to_address,
    amount,
    amount_usd
FROM ethereum.core.ez_token_transfers
WHERE block_timestamp >= CURRENT_DATE - 7
  AND contract_address = LOWER('0xA0b86991c6218b36c1d19D4a2e9Eb0cE3606eB48') -- USDC
  AND amount_usd > 10000
ORDER BY amount_usd DESC
LIMIT 1000;
```

### DEX swap analysis (Premium)

```sql theme={null}
-- Top Uniswap swaps by USD value
SELECT
    block_timestamp,
    tx_hash,
    trader,
    token_in,
    token_out,
    amount_in_usd,
    amount_out_usd
FROM ethereum.defi.ez_dex_swaps
WHERE block_timestamp >= CURRENT_DATE - 1
  AND platform = 'uniswap-v3'
  AND amount_in_usd > 50000
ORDER BY amount_in_usd DESC
LIMIT 100;
```

## Performance monitoring

### Check query performance

Use Snowflake's query profile to identify bottlenecks:

```sql theme={null}
-- Check your query history
SELECT
    query_id,
    query_text,
    execution_time,
    bytes_scanned,
    rows_produced
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE query_text ILIKE '%ethereum%'
ORDER BY start_time DESC
LIMIT 10;
```

### Estimate query cost

```sql theme={null}
-- See bytes scanned (proxy for cost)
SELECT
    query_id,
    query_text,
    bytes_scanned / POWER(1024, 3) AS gb_scanned,
    execution_time / 1000 AS seconds
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE start_time >= DATEADD(hour, -1, CURRENT_TIMESTAMP())
ORDER BY bytes_scanned DESC;
```

## Avoiding common pitfalls

<AccordionGroup>
  <Accordion title="SELECT * from large tables" icon="asterisk">
    **Problem:** Selecting all columns scans unnecessary data

    **Solution:** Only select columns you need

    ```sql theme={null}
    -- ❌ Bad
    SELECT * FROM ethereum.core.fact_transactions
    WHERE block_timestamp >= CURRENT_DATE - 7;

    -- ✅ Good
    SELECT
        block_timestamp,
        tx_hash,
        from_address,
        to_address
    FROM ethereum.core.fact_transactions
    WHERE block_timestamp >= CURRENT_DATE - 7;
    ```
  </Accordion>

  <Accordion title="Missing time filters on joins" icon="calendar-xmark">
    **Problem:** Joining without time constraints scans entire tables

    **Solution:** Add time filters to both sides of the join

    ```sql theme={null}
    -- ✅ Good
    SELECT t.*, s.amount_usd
    FROM ethereum.core.fact_transactions t
    JOIN ethereum.defi.ez_dex_swaps s
        ON t.tx_hash = s.tx_hash
    WHERE t.block_timestamp >= CURRENT_DATE - 7
      AND s.block_timestamp >= CURRENT_DATE - 7;
    ```
  </Accordion>

  <Accordion title="Open-ended time ranges" icon="infinity">
    **Problem:** Querying years of data when you only need recent data

    **Solution:** Use explicit, narrow time ranges

    ```sql theme={null}
    -- ❌ Bad: Scans years of data
    WHERE block_timestamp >= '2020-01-01'

    -- ✅ Good: Scans only what you need
    WHERE block_timestamp >= CURRENT_DATE - 30
    ```
  </Accordion>

  <Accordion title="No LIMIT on exploratory queries" icon="infinity">
    **Problem:** Returning millions of rows accidentally

    **Solution:** Always use LIMIT when exploring

    ```sql theme={null}
    -- ✅ Good for exploration
    SELECT *
    FROM ethereum.core.fact_transactions
    WHERE block_timestamp >= CURRENT_DATE - 1
    LIMIT 1000;
    ```
  </Accordion>
</AccordionGroup>

## Advanced optimization techniques

### Using search optimization service

For frequent point lookups on specific addresses:

```sql theme={null}
-- Enable search optimization (requires privileges)
ALTER TABLE ethereum.core.fact_transactions
ADD SEARCH OPTIMIZATION ON (from_address, to_address);
```

Contact Flipside if you need specific tables optimized for your use case.

### Incremental processing patterns

For data pipelines, use incremental loads:

```sql theme={null}
-- Track last processed timestamp
CREATE OR REPLACE TABLE my_checkpoint (
    table_name VARCHAR,
    last_processed_timestamp TIMESTAMP_NTZ
);

-- Incremental query
SELECT *
FROM ethereum.core.fact_transactions
WHERE block_timestamp > (
    SELECT last_processed_timestamp
    FROM my_checkpoint
    WHERE table_name = 'fact_transactions'
)
AND block_timestamp <= CURRENT_TIMESTAMP() - INTERVAL '10 minutes' -- Buffer for late-arriving data
ORDER BY block_timestamp;
```

## Next steps

<CardGroup cols={2}>
  <Card title="Use cases" icon="lightbulb" href="/snowflake-data-shares/use-cases">
    See example queries for common patterns
  </Card>

  <Card title="Data freshness" icon="clock" href="/snowflake-data-shares/data-freshness">
    Understand update schedules and latency
  </Card>

  <Card title="Schema reference" icon="book" href="/blockchain-data/schema-overview">
    Explore table structures and organization
  </Card>

  <Card title="Troubleshooting" icon="wrench" href="/snowflake-data-shares/troubleshooting">
    Fix common query issues
  </Card>
</CardGroup>
