PySemantic¶
A lightweight, graph-based Semantic Layer for Python and SQL.
Define metrics once. Generate SQL everywhere.
See It in Action¶
Write a Python query โ get production-ready SQL:
from pysemantic.client import SemanticLayer
sl = SemanticLayer(model_path="./models")
sql = sl.query(
measures=["total_order_price", "total_number_of_distinct_orders"],
dimensions=["customer_state", "customer_city"],
filters=[{"field": "customer_state", "operator": "IN", "value": "('SP', 'RJ')"}],
order_by=["total_order_price DESC"],
limit=10,
)
from pysemantic.client import SemanticLayer
from pysemantic.modeling import Model, Dimension, Measure, Entity, EntityType
order_items = Model(
name="order_items", table="order_items", primary_key="order_item_id",
measures=[
Measure(name="total_order_price", agg="sum", column="price"),
Measure(name="total_number_of_distinct_orders", agg="distinct_count", column="order_id"),
],
entities=[
Entity(name="order_item", entity_type=EntityType.PRIMARY, column="order_item_id"),
Entity(name="order", entity_type=EntityType.FOREIGN, column="order_id"),
],
)
orders = Model(
name="orders", table="orders", primary_key="order_id",
entities=[
Entity(name="order", entity_type=EntityType.PRIMARY, column="order_id"),
Entity(name="customer", entity_type=EntityType.FOREIGN, column="customer_id"),
],
)
customers = Model(
name="customers", table="customers", primary_key="customer_id",
dimensions=[
Dimension(name="customer_city", column="customer_city", dtype="string"),
Dimension(name="customer_state", column="customer_state", dtype="string"),
],
entities=[
Entity(name="customer", entity_type=EntityType.PRIMARY, column="customer_id"),
],
)
sl = SemanticLayer(models=[order_items, orders, customers])
sql = sl.query(
measures=["total_order_price", "total_number_of_distinct_orders"],
dimensions=["customer_state", "customer_city"],
filters=[{"field": "customer_state", "operator": "IN", "value": "('SP', 'RJ')"}],
order_by=["total_order_price DESC"],
limit=10,
)
SELECT
customers.customer_city AS customer_city,
customers.customer_state AS customer_state,
SUM(order_items.price) AS total_order_price,
COUNT(DISTINCT order_items.order_id) AS total_number_of_distinct_orders
FROM order_items
LEFT JOIN orders
ON order_items.order_id = orders.order_id
LEFT JOIN customers
ON orders.customer_id = customers.customer_id
WHERE
customers.customer_state IN ('SP', 'RJ')
GROUP BY
1, 2
ORDER BY
total_order_price DESC
LIMIT 10

Why PySemantic?¶
Single Source of Truth
Define a metric once โ use it across dashboards, notebooks, and APIs. No more scattered business logic.
Automatic Join Resolution
Declare entity relationships in your models. PySemantic finds the shortest join path automatically.
SQL Injection Safe
Structured filters with operator whitelisting and value escaping. No raw string interpolation.
Intelligent Filter Pushdown
Filters are automatically routed to the right CTE or outer query โ conformed, fact-specific, or measure. No manual routing.
Multi-Dialect Support
MySQL (default), Postgres, BigQuery, Snowflake, DuckDB, and 20+ more via SQLGlot. One model, any database.
Zero Infrastructure
Pure Python library. No server, no daemon, no Docker. Install and query.
Interactive Studio
Explore models, visualize the entity graph, and test queries in the browser โ powered by Streamlit.
Architecture¶
PySemantic processes queries through a clean, four-stage pipeline:
User Query (measures, dimensions, filters)
โ
โผ
โโโโโโโโโโโโ โโโโโโโโโโโโ โโโโโโโโโโโโ
โ AST โโโโโโถโ Planner โโโโโโถโGenerator โโโโโถ SQL string
โ (Parser) โ โ(Resolver)โ โ(Compiler)โ
โโโโโโโโโโโโ โโโโโโโโโโโโ โโโโโโโโโโโโ
โ
โโโโโโโดโโโโโโ
โ Registry โ
โ + Entity โ
โ Graph โ
โโโโโโโโโโโโโ
| Layer | Responsibility |
|---|---|
| AST | Parses raw input into a structured, validated syntax tree |
| Registry | Loads model files, validates them, builds the entity graph |
| Planner | Resolves measures/dimensions to models, calculates join paths; detects single-fact vs multi-fact and enforces conformed dimensions |
| Generator | Translates the logical plan into dialect-specific SQL (flat query or CTE-based for multi-fact) |
Ready to define your first semantic model?