Getting Started¶
Get PySemantic running in under 5 minutes.
Installation¶
Requirements
Python 3.11 or higher is required.
Step 1 — Define Your Models¶
Create a directory (e.g. models/) with one Python file per table. Each file declares a Model with its dimensions, measures, and entities.
models/order_items.py¶
from pysemantic.modeling import Model, Dimension, Measure, Entity, EntityType
model = Model(
name="order_items",
table="order_items",
primary_key="order_item_id",
dimensions=[
Dimension(name="product_category", column="product_category", dtype="string"),
],
measures=[
Measure(name="total_order_price", agg="sum", column="price"),
Measure(
name="total_number_of_distinct_orders",
agg="distinct_count",
column="order_id",
),
],
entities=[
Entity(name="order_item", entity_type=EntityType.PRIMARY, column="order_item_id"),
Entity(name="order", entity_type=EntityType.FOREIGN, column="order_id"),
Entity(name="product", entity_type=EntityType.FOREIGN, column="product_id"),
Entity(name="seller", entity_type=EntityType.FOREIGN, column="seller_id"),
],
)
models/orders.py¶
from pysemantic.modeling import Model, Dimension, Measure, Entity, EntityType
model = Model(
name="orders",
table="orders",
primary_key="order_id",
dimensions=[
Dimension(name="order_status", column="order_status", dtype="string"),
],
measures=[
Measure(name="total_orders", agg="count", column="order_id"),
Measure(name="unique_customers", agg="distinct_count", column="customer_id"),
],
entities=[
Entity(name="order", entity_type=EntityType.PRIMARY, column="order_id"),
Entity(name="customer", entity_type=EntityType.FOREIGN, column="customer_id"),
],
)
models/customers.py¶
from pysemantic.modeling import Model, Dimension, Measure, Entity, EntityType
model = Model(
name="customers",
table="customers",
primary_key="customer_id",
dimensions=[
Dimension(name="customer_city", column="customer_city", dtype="string"),
Dimension(name="customer_state", column="customer_state", dtype="string"),
],
measures=[
Measure(name="total_customers", agg="count", column="customer_id"),
],
entities=[
Entity(name="customer", entity_type=EntityType.PRIMARY, column="customer_id"),
],
)
How joins work
PySemantic resolves joins automatically through shared entity names.
orders declares a FOREIGN entity customer → links to customers which declares a PRIMARY entity customer.
No manual join configuration needed.
Step 2 — Query¶
from pysemantic.client import SemanticLayer
sl = SemanticLayer(model_path="./models")
sql = sl.query(
measures=["total_orders"],
dimensions=["customer_city"],
filters=[{"field": "customer_state", "operator": "=", "value": "SP"}],
order_by=["total_orders DESC"],
limit=10,
)
print(sql)
Generated SQL
Joins, table references, WHERE vs HAVING — all resolved automatically from your model definitions.
Core Concepts¶
Model¶
A Model maps to a database table and defines its semantic metadata:
| Component | Purpose | Example |
|---|---|---|
| Dimensions | Columns to group or filter by | customer_city, order_status |
| Measures | Aggregated metrics | SUM(price), COUNT(DISTINCT id) |
| Entities | Relationship keys (PRIMARY / FOREIGN) | order_id, customer_id |
Entities & Join Resolution¶
Entities define how models connect. A PRIMARY entity declares ownership of a concept; a FOREIGN entity references it:
order_items ──(FK: order)──▶ orders ──(FK: customer)──▶ customers
│
├──(FK: seller)──▶ sellers
└──(FK: product)──▶ products
When you query a measure from order_items with a dimension from customers, PySemantic traverses the entity graph and generates the full LEFT JOIN chain automatically.
Supported Aggregations¶
agg value |
SQL output |
|---|---|
sum |
SUM(column) |
count |
COUNT(column) |
avg |
AVG(column) |
min |
MIN(column) |
max |
MAX(column) |
distinct_count |
COUNT(DISTINCT column) |
Filters¶
Filters can be passed as dictionaries:
{"field": "customer_state", "operator": "IN", "value": "('SP', 'RJ')"}
{"field": "total_order_price", "operator": ">", "value": "1000"}
{"field": "order_status", "operator": "IS", "value": None}
Automatic WHERE vs HAVING
Dimension filters go to WHERE. Measure filters go to HAVING. PySemantic handles this automatically.
Supported operators: =, !=, >, <, >=, <=, IN, NOT IN, LIKE, ILIKE, IS, IS NOT
API Reference¶
SemanticLayer¶
| Method | Description |
|---|---|
query(measures, dimensions, filters, order_by, limit) |
Generate a SQL query string |
reload(model_path=None) |
Hot-reload models from disk (useful in notebooks) |
generate_graph(output_file) |
Export an interactive entity graph as HTML |
query() Parameters¶
| Parameter | Type | Required | Description |
|---|---|---|---|
measures |
list[str] |
Yes | Metric names to aggregate |
dimensions |
list[str] |
No | Dimension names to group by |
filters |
list[dict] |
No | Filter conditions as {field, operator, value} dicts |
order_by |
list[str] |
No | Sort columns (append DESC for descending) |
limit |
int |
No | Maximum rows to return |
Next: explore the full CLI and interactive Studio → CLI & Studio