Python library guide
This guide covers configuration, backend setup, error handling, and result objects for the featuremesh Python package. For installation and basic usage, see the Python library page.
Configuration
Use set_default() to choose local vs managed registry, SQLite path (local), managed/serving API endpoints, and defaults for the Jupyter magic:
from featuremesh import Registry, set_default
# Transpilation + persistence: LOCAL (default) or MANAGED
set_default("registry", Registry.MANAGED)
# Local mode: SQLite file for persisted features (default ./featuremesh.db)
set_default("local.db_path", "./featuremesh.db")
# Managed batch API (BatchClient when registry=MANAGED)
set_default("managed.host", "https://api.featuremesh.com")
set_default("managed.path", "/v1/featureql")
set_default("managed.timeout", 30)
set_default("managed.verify_ssl", True)
# Serving API (ServingClient)
set_default("serving.host", "http://host.docker.internal:10090")
set_default("serving.path", "/v1/featureql")
set_default("serving.timeout", 30)
set_default("serving.verify_ssl", True)
# Magic defaults (used when a flag is omitted on %%featureql)
set_default("debug_mode", False)
set_default("show_sql", False)
# Get current settings
from featuremesh import get_default, get_all_defaults
debug_mode = get_default("debug_mode")
all_settings = get_all_defaults()
Valid keys are exactly those accepted by set_default (see get_all_defaults()). Older registry.* keys were renamed to managed.*.
Local mode and SQL backends
With registry=LOCAL (the default), BatchClient uses the bundled engine and SQLite persistence. Execution is limited to Backend.DUCKDB in this release. To run transpiled SQL on Trino, BigQuery, or DataFusion, call set_default("registry", Registry.MANAGED), pass a project access_token, and provide a sql_executor that matches your chosen Backend.
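Read together with the configuration keys above, a minimal local-mode setup might look like the sketch below. This is an illustration only: the text implies that local mode needs neither an access_token nor a sql_executor, but the exact BatchClient arguments accepted in local mode are an assumption here.

```python
from featuremesh import BatchClient, Backend, Registry, set_default

# LOCAL is the default, so these two calls are optional; shown for clarity.
set_default("registry", Registry.LOCAL)
set_default("local.db_path", "./featuremesh.db")

# In local mode the bundled engine executes the SQL, so no sql_executor
# or access_token is passed here (assumption based on the text above).
client = BatchClient(backend=Backend.DUCKDB)
```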
Backend setup
Each backend needs a sql_executor function that takes a SQL string and returns a Pandas DataFrame. Here are ready-to-use examples for each supported backend.
DuckDB
from featuremesh import BatchClient, Backend, Registry, set_default
import duckdb
set_default("registry", Registry.MANAGED) # backend setup examples use managed mode
# Option 1: Using a persistent connection
_duckdb_conn = None
def get_duckdb_conn(storage_path: str = ":memory:"):
    """Get or create a DuckDB connection."""
    global _duckdb_conn
    if _duckdb_conn is None:
        _duckdb_conn = duckdb.connect(storage_path)
    return _duckdb_conn

def query_duckdb(sql: str, storage_path: str = ":memory:"):
    """Execute SQL query and return results as DataFrame."""
    conn = get_duckdb_conn(storage_path)
    result = conn.sql(sql)
    return result.df()
client = BatchClient(
    access_token=__YOUR_ACCESS_TOKEN__,
    backend=Backend.DUCKDB,
    sql_executor=query_duckdb
)
# Option 2: Simple in-memory executor
def simple_duckdb_executor(sql: str):
    return duckdb.sql(sql).df()
client = BatchClient(
    access_token=__YOUR_ACCESS_TOKEN__,
    backend=Backend.DUCKDB,
    sql_executor=simple_duckdb_executor
)
Trino
from featuremesh import BatchClient, Backend
import pandas as pd
import trino.dbapi
def query_trino(sql: str):
    """Execute SQL query on Trino and return results as DataFrame."""
    # Configure your Trino connection details
    conn = trino.dbapi.connect(
        host="localhost",  # or host.docker.internal for docker
        port=8080,
        user="admin",
        catalog="memory",
        schema="default"
    )
    cur = conn.cursor()
    cur.execute(sql)
    # Fetch results
    cols = cur.description
    rows = cur.fetchall()
    if len(rows) > 0:
        df = pd.DataFrame(rows, columns=[col[0] for col in cols])
        return df
    else:
        return pd.DataFrame()
client = BatchClient(
    access_token=__YOUR_ACCESS_TOKEN__,
    backend=Backend.TRINO,
    sql_executor=query_trino
)
# For production with OAuth2 authentication:
import trino.auth
def query_trino_oauth(sql: str):
    """Execute SQL query on Trino with OAuth2 authentication."""
    conn = trino.dbapi.connect(
        host="trino.your-domain.com",
        port=443,
        user="your-username",
        catalog="your-catalog",
        schema="default",
        http_scheme="https",
        auth=trino.auth.OAuth2Authentication()
    )
    cur = conn.cursor()
    cur.execute(sql)
    cols = cur.description
    rows = cur.fetchall()
    if len(rows) > 0:
        return pd.DataFrame(rows, columns=[col[0] for col in cols])
    return pd.DataFrame()
BigQuery
from featuremesh import BatchClient, Backend
from google.cloud import bigquery
def query_bigquery(sql: str):
    """Execute SQL query on BigQuery and return results as DataFrame."""
    client = bigquery.Client(project=__YOUR_PROJECT_ID__)
    return client.query(sql).to_dataframe()

client = BatchClient(
    access_token=__YOUR_ACCESS_TOKEN__,
    backend=Backend.BIGQUERY,
    sql_executor=query_bigquery
)
Error handling
All operations return result objects with structured error information. Check result.success before accessing the DataFrame:
result = client.query("""
WITH
FEATURE1 := INPUT(BIGINT)
SELECT
FEATURE1 := BIND_VALUES(ARRAY[1, 2, 3]),
FEATURE2 := FEATURE1 * 2
""")
if result.success:
    print("Query succeeded!")
    print(result.dataframe)
else:
    print("Query failed!")
    for error in result.errors:
        print(f"Error [{error.code}]: {error.message}")
        if error.context:
            print(f"Context: {error.context}")
For richer display in notebooks, use the result helpers:
# Prints errors/warnings/SQL/SLT/debug as requested; returns the dataframe if show_dataframe=True
result.display(show_sql=True, show_slt=False, show_debug=False, show_dataframe=True)
# Markdown helpers (Help / Describe / Validate)
client.help("zip").display()
client.describe("fm.demo").display()
client.validate("SELECT F := 1;").display()
Translation only
You can also translate FeatureQL to SQL without executing it — useful for debugging or integrating with other tools:
# Available with BatchClient
featureql_query = """
WITH
FEATURE1 := INPUT(BIGINT)
SELECT
FEATURE1 := BIND_VALUES(ARRAY[1, 2, 3]),
FEATURE2 := FEATURE1 * 2
"""
translate_result = client.translate(featureql_query)
print(translate_result.sql) # Generated SQL
print(translate_result.success)  # True if translation succeeded
Debug mode
Pass debug_mode=True to see the intermediate translation steps — useful for understanding how FeatureQL resolves dependencies and generates SQL:
result = client.query("""
WITH
FEATURE1 := INPUT(BIGINT)
SELECT
FEATURE1 := BIND_VALUES(ARRAY[1, 2, 3]),
FEATURE2 := FEATURE1 * 2
""", debug_mode=True)
if result.debug_logs:
    print(result.debug_logs)
Result objects
QueryResult
Returned by BatchClient.query() and ServingClient.query(). Important fields:
- success — True when there are no errors and a dataframe was produced
- dataframe, sql, slt — slt is populated for batch execution (website-style SLT); ServingClient sets slt to None
- translate_seconds, execute_seconds — wall time for translation and SQL execution (batch)
- column_types — list of (name, type) pairs from the API when present
- errors / warnings — Errors and Warnings collections (iterate like lists)
- debug_logs — DebugLogs mapping when debug_mode=True
- display(...), to_dict() — notebook-friendly printing and JSON-serializable export
In Jupyter, %%featureql --hook NAME stores to_dict() in NAME: a plain dict whose "dataframe" value is row records (list[dict]), not a DataFrame. For a real QueryResult / DataFrame, call client.query(...) or use the magic's return value when --hide-dataframe is omitted (_ / Out[n]). See the Python library page.
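Because the hook holds plain row records, rebuilding a DataFrame from it is a one-liner. The sketch below fakes the hook contents by hand (the dict shape follows the description above; the hook name my_result and the field values are illustrative only):

```python
import pandas as pd

# Shape of what %%featureql --hook my_result would store, per the text above:
# a to_dict() export whose "dataframe" value is row records, not a DataFrame.
my_result = {
    "success": True,
    "dataframe": [
        {"FEATURE1": 1, "FEATURE2": 2},
        {"FEATURE1": 2, "FEATURE2": 4},
        {"FEATURE1": 3, "FEATURE2": 6},
    ],
}

# Rebuild a real DataFrame from the row records.
df = pd.DataFrame(my_result["dataframe"])
print(df.shape)  # (3, 2)
```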
TranslateResult
Returned by BatchClient.translate() only (no translate on ServingClient):
- Same error/warning/debug patterns as above
- full_response — raw registry/engine payload when available
- client_type — e.g. BatchClient(local) vs BatchClient(managed)
HelpResult, DescribeResult, ValidateResult
Returned by BatchClient.help(), BatchClient.describe(), and BatchClient.validate(). Each has text (markdown), optional structured row data, display(), and to_dict().
JSON encoding
import json
from featuremesh import FeatureMeshJSONEncoder
json.dumps(result, cls=FeatureMeshJSONEncoder)
Test suite
BatchClient.sltest() runs bundled SLT-style checks against documentation CODE_SAMPLE snippets. By default it uses SHOW DOCS (INCLUDE (CONTENT)) with optional where and limit. Pass source= with a custom FeatureQL query that returns NAME and CONTENT columns when you do not want that built-in fetch.
Use labels=[...] with snippet headers skipif and onlyif (sqllogictest-style):
- Active labels are the union of the client's SQL backend name (always present, lowercased) and every string in labels. Extra labels add tags; they do not replace or hide the backend. So on a DuckDB client, onlyif duckdb still matches even if you also pass labels=["trino"] — both duckdb and trino are active. To exercise backend-specific snippets, use the matching BatchClient backend (or rely on skipif/onlyif against the real backend name).
- skipif X: skip the snippet if X is in the active set — for example skipif trino skips when the batch client is Trino, or skipif exclude-slt-bugs when you pass labels=["exclude-slt-bugs"].
- onlyif: if a snippet declares one or more onlyif lines, it runs when at least one of those tokens is in the active set (OR across lines). Example: onlyif identified-as-bug is skipped by default and runs only when you pass that string in labels. Example: onlyif trino runs when the client backend is Trino. skipif is evaluated only after the onlyif gate passes.
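The gating rules above can be condensed into a few lines of plain Python. This is an illustrative model of the behavior described in this section, not library code; the helper name snippet_runs is hypothetical:

```python
def snippet_runs(backend: str, labels: list[str],
                 onlyif: list[str], skipif: list[str]) -> bool:
    """Model of sltest() label gating: the onlyif gate first (OR), then skipif."""
    # Active labels = backend name (lowercased) plus every extra label.
    active = {backend.lower()} | set(labels)
    # onlyif gate: if any onlyif lines are declared, at least one must match.
    if onlyif and not any(tok in active for tok in onlyif):
        return False
    # skipif: evaluated only after the onlyif gate passes.
    return not any(tok in active for tok in skipif)

# On a DuckDB client with labels=["trino"], both tags are active:
print(snippet_runs("duckdb", ["trino"], onlyif=["duckdb"], skipif=[]))          # True
print(snippet_runs("duckdb", [], onlyif=["identified-as-bug"], skipif=[]))      # False
print(snippet_runs("trino", [], onlyif=[], skipif=["trino"]))                   # False
```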
Other keyword arguments include halt_on_fail, quiet_engine_output, and force_no_schema. The return value is a summary dict (passed, failed, skipped, blocked, per-status lists, timing fields). See the method docstring for the full API.
Version
import featuremesh
print(featuremesh.__version__)
See also
- Python library — Installation and quick start
- Getting started — Overview of all entry points
- FeatureQL for the Impatient — Quick tour of the language for SQL users
- Demos Docker container — Full environment with notebooks and sample data