Python library guide

This guide covers configuration, backend setup, error handling, and result objects for the featuremesh Python package. For installation and basic usage, see the Python library page.

Configuration

Use set_default() to choose local vs managed registry, SQLite path (local), managed/serving API endpoints, and defaults for the Jupyter magic:

from featuremesh import Registry, set_default

# Transpilation + persistence: LOCAL (default) or MANAGED
set_default("registry", Registry.MANAGED)

# Local mode: SQLite file for persisted features (default ./featuremesh.db)
set_default("local.db_path", "./featuremesh.db")

# Managed batch API (BatchClient when registry=MANAGED)
set_default("managed.host", "https://api.featuremesh.com")
set_default("managed.path", "/v1/featureql")
set_default("managed.timeout", 30)
set_default("managed.verify_ssl", True)

# Serving API (ServingClient)
set_default("serving.host", "http://host.docker.internal:10090")
set_default("serving.path", "/v1/featureql")
set_default("serving.timeout", 30)
set_default("serving.verify_ssl", True)

# Magic defaults (used when a flag is omitted on %%featureql)
set_default("debug_mode", False)
set_default("show_sql", False)

# Get current settings
from featuremesh import get_default, get_all_defaults

debug_mode = get_default("debug_mode")
all_settings = get_all_defaults()

Valid keys are exactly those accepted by set_default (see get_all_defaults()). Older registry.* keys were renamed to managed.*.

Local mode and SQL backends

With registry=LOCAL (the default), BatchClient uses the bundled engine and SQLite persistence. Execution is limited to Backend.DUCKDB in this release. To run transpiled SQL on Trino, BigQuery, or DataFusion, set set_default("registry", Registry.MANAGED), pass a project access_token, and provide a sql_executor that matches your chosen Backend.

Backend setup

Each backend needs a sql_executor function that takes a SQL string and returns a Pandas DataFrame. Here are ready-to-use examples for each supported backend.
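The executor contract is simply a callable that takes SQL text and returns a Pandas DataFrame. As an illustration of that shape only (SQLite is not one of the supported backends), a minimal executor looks like this:

```python
import sqlite3
import pandas as pd

def query_sqlite(sql: str) -> pd.DataFrame:
    """Illustrates the sql_executor contract: SQL string in, DataFrame out."""
    conn = sqlite3.connect(":memory:")
    try:
        return pd.read_sql_query(sql, conn)
    finally:
        conn.close()

df = query_sqlite("SELECT 1 AS a, 'x' AS b")
```

Any callable with this signature works; the examples below follow the same pattern for each real backend.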

DuckDB

from featuremesh import BatchClient, Backend, Registry, set_default
import duckdb

set_default("registry", Registry.MANAGED)  # backend setup examples use managed mode

# Option 1: Using a persistent connection
_duckdb_conn = None

def get_duckdb_conn(storage_path: str = ":memory:"):
    """Get or create a DuckDB connection."""
    global _duckdb_conn
    if _duckdb_conn is None:
        _duckdb_conn = duckdb.connect(storage_path)
    return _duckdb_conn

def query_duckdb(sql: str, storage_path: str = ":memory:"):
    """Execute SQL query and return results as DataFrame."""
    conn = get_duckdb_conn(storage_path)
    result = conn.sql(sql)
    return result.df()

client = BatchClient(
    access_token=__YOUR_ACCESS_TOKEN__,
    backend=Backend.DUCKDB,
    sql_executor=query_duckdb
)

# Option 2: Simple in-memory executor
def simple_duckdb_executor(sql: str):
    return duckdb.sql(sql).df()

client = BatchClient(
    access_token=__YOUR_ACCESS_TOKEN__,
    backend=Backend.DUCKDB,
    sql_executor=simple_duckdb_executor
)

Trino

from featuremesh import BatchClient, Backend
import pandas as pd
import trino.dbapi

def query_trino(sql: str):
    """Execute SQL query on Trino and return results as DataFrame."""
    # Configure your Trino connection details
    conn = trino.dbapi.connect(
        host="localhost",  # or host.docker.internal for docker
        port=8080,
        user="admin",
        catalog="memory",
        schema="default"
    )
    cur = conn.cursor()
    cur.execute(sql)

    # Fetch results
    cols = cur.description
    rows = cur.fetchall()

    if len(rows) > 0:
        df = pd.DataFrame(rows, columns=[col[0] for col in cols])
        return df
    else:
        return pd.DataFrame()

client = BatchClient(
    access_token=__YOUR_ACCESS_TOKEN__,
    backend=Backend.TRINO,
    sql_executor=query_trino
)

# For production with OAuth2 authentication:
import trino.auth

def query_trino_oauth(sql: str):
    """Execute SQL query on Trino with OAuth2 authentication."""
    conn = trino.dbapi.connect(
        host="trino.your-domain.com",
        port=443,
        user="your-username",
        catalog="your-catalog",
        schema="default",
        http_scheme="https",
        auth=trino.auth.OAuth2Authentication()
    )
    cur = conn.cursor()
    cur.execute(sql)
    cols = cur.description
    rows = cur.fetchall()

    if len(rows) > 0:
        return pd.DataFrame(rows, columns=[col[0] for col in cols])
    return pd.DataFrame()

BigQuery

from featuremesh import BatchClient, Backend
from google.cloud import bigquery

def query_bigquery(sql: str):
    """Execute SQL query on BigQuery and return results as DataFrame."""
    client = bigquery.Client(project=__YOUR_PROJECT_ID__)
    return client.query(sql).to_dataframe()

client = BatchClient(
    access_token=__YOUR_ACCESS_TOKEN__,
    backend=Backend.BIGQUERY,
    sql_executor=query_bigquery
)

Error handling

All operations return result objects with structured error information. Check result.success before accessing the DataFrame:

result = client.query("""
    WITH
        FEATURE1 := INPUT(BIGINT)
    SELECT
        FEATURE1 := BIND_VALUES(ARRAY[1, 2, 3]),
        FEATURE2 := FEATURE1 * 2
""")

if result.success:
    print("Query succeeded!")
    print(result.dataframe)
else:
    print("Query failed!")
    for error in result.errors:
        print(f"Error [{error.code}]: {error.message}")
        if error.context:
            print(f"Context: {error.context}")

For richer display in notebooks, use the result helpers:

# Prints errors/warnings/SQL/SLT/debug as requested; returns the dataframe if show_dataframe=True
result.display(show_sql=True, show_slt=False, show_debug=False, show_dataframe=True)

# Markdown helpers (Help / Describe / Validate)
client.help("zip").display()
client.describe("fm.demo").display()
client.validate("SELECT F := 1;").display()

Translation only

You can also translate FeatureQL to SQL without executing it — useful for debugging or integrating with other tools:

# Available with BatchClient
featureql_query = """
    WITH
        FEATURE1 := INPUT(BIGINT)
    SELECT
        FEATURE1 := BIND_VALUES(ARRAY[1, 2, 3]),
        FEATURE2 := FEATURE1 * 2
"""
translate_result = client.translate(featureql_query)

print(translate_result.sql)      # Generated SQL
print(translate_result.success)  # True if translation succeeded

Debug mode

Pass debug_mode=True to see the intermediate translation steps — useful for understanding how FeatureQL resolves dependencies and generates SQL:

result = client.query("""
    WITH
        FEATURE1 := INPUT(BIGINT)
    SELECT
        FEATURE1 := BIND_VALUES(ARRAY[1, 2, 3]),
        FEATURE2 := FEATURE1 * 2
""", debug_mode=True)

if result.debug_logs:
    print(result.debug_logs)

Result objects

QueryResult

Returned by BatchClient.query() and ServingClient.query(). Important fields:

  • success — True when there are no errors and a dataframe was produced
  • dataframe, sql, slt — slt is populated for batch execution (website-style SLT); ServingClient sets slt to None
  • translate_seconds, execute_seconds — wall time for translation and SQL execution (batch)
  • column_types — list of (name, type) pairs from the API when present
  • errors / warnings — Errors and Warnings collections (iterate like lists)
  • debug_logs — DebugLogs mapping when debug_mode=True
  • display(...), to_dict() — notebook-friendly printing and JSON-serializable export

In Jupyter, %%featureql --hook NAME stores to_dict() in NAME: a plain dict whose "dataframe" value is a list of row records (list[dict]), not a DataFrame. For a real QueryResult / DataFrame, call client.query(...) or use the magic’s return value when --hide-dataframe is omitted (_ / Out[n]). See the Python library page.
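If you want to keep working with the hooked payload as a DataFrame, you can rebuild one from the row records yourself. A minimal sketch, assuming a hook dict shaped as described above (the field values here are illustrative):

```python
import pandas as pd

# Illustrative shape of a %%featureql --hook payload:
# "dataframe" holds row records (list[dict]), not a DataFrame
hooked = {
    "success": True,
    "dataframe": [
        {"FEATURE1": 1, "FEATURE2": 2},
        {"FEATURE1": 2, "FEATURE2": 4},
        {"FEATURE1": 3, "FEATURE2": 6},
    ],
}

# Rebuild a real DataFrame from the row records
df = pd.DataFrame(hooked["dataframe"])
```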

TranslateResult

Returned by BatchClient.translate() only (no translate on ServingClient):

  • Same error/warning/debug patterns as above
  • full_response — raw registry/engine payload when available
  • client_type — e.g. BatchClient(local) vs BatchClient(managed)

HelpResult, DescribeResult, ValidateResult

Returned by BatchClient.help(), BatchClient.describe(), and BatchClient.validate(). Each has text (markdown), optional structured row data, display(), and to_dict().

JSON encoding

import json
from featuremesh import FeatureMeshJSONEncoder

json.dumps(result, cls=FeatureMeshJSONEncoder)

Test suite

BatchClient.sltest() runs bundled SLT-style checks against documentation CODE_SAMPLE snippets. By default it uses SHOW DOCS (INCLUDE (CONTENT)) with optional where and limit. Pass source= with a custom FeatureQL query that returns NAME and CONTENT columns when you do not want that built-in fetch.

Use labels=[...] with snippet headers skipif and onlyif (sqllogictest-style):

  • Active labels are the union of the client’s SQL backend name (always present, lowercased) and every string in labels. Extra labels only add tags; they never replace or hide the backend name. On a DuckDB client, for example, onlyif duckdb still matches even if you also pass labels=["trino"], because both duckdb and trino are active. To exercise backend-specific snippets, use a BatchClient with the matching backend (or rely on skipif / onlyif against the real backend name).
  • skipif X: skip the snippet if X is in the active set—for example skipif trino skips when the batch client is Trino, or skipif exclude-slt-bugs when you pass labels=["exclude-slt-bugs"].
  • onlyif: if a snippet declares one or more onlyif lines, it runs when at least one of those tokens is in the active set (OR across lines). For example, onlyif identified-as-bug is skipped by default and runs only when you pass that string in labels, while onlyif trino runs when the client backend is Trino. skipif is evaluated only after the onlyif gate passes.

Other keyword arguments include halt_on_fail, quiet_engine_output, and force_no_schema. The return value is a summary dict (passed, failed, skipped, blocked, per-status lists, timing fields). See the method docstring for the full API.

Version

import featuremesh
print(featuremesh.__version__)

Last update at: 2026/04/27 15:40:31