Power BI / Looker / Sigma
When the standard reports aren't enough, you want your accounting data flowing into your BI stack — Power BI, Looker, Sigma, Tableau, Metabase, or whatever your team uses. The pattern is straightforward: a daily ETL job pulls from Solid's REST API and pushes to your BI tool's data source.
This recipe walks through the canonical version: a daily Python job that pulls account balances and journal entries, lands them in a warehouse (Postgres, BigQuery, Snowflake, etc.), and your BI tool reads from there.
Why not BI-tool-direct-to-Solid?
Most BI tools support REST API connectors, but going direct has real downsides:
- Every dashboard refresh hits Solid's API (fast for small files, slow for large ones)
- No historical snapshot — yesterday's "AR aging" can't be reproduced
- Solid API credentials end up in the BI tool's connector configuration, usually less secure and less auditable than a dedicated service account
- Schema drift between Solid versions can break dashboards mid-refresh
A daily ETL into a warehouse solves all four. The marginal cost is ~50 lines of Python and a cron schedule.
The data model
What flows from Solid into the warehouse, daily:
| Table | Source endpoint | Refresh strategy |
|---|---|---|
| accounts | GET /accounts | Full overwrite — small list (typically under 100) |
| contacts | GET /contacts | Full overwrite — also small |
| items | GET /items | Full overwrite |
| journal_entries | GET /entries?from=YYYY-MM-DD | Incremental since last run |
| account_balances_daily | GET /balances?as_of=YYYY-MM-DD | Snapshot per day; never overwritten |
| aging_ar_daily | GET /reports/ar-aging?as_of=YYYY-MM-DD | Snapshot per day |
| aging_ap_daily | GET /reports/ap-aging?as_of=YYYY-MM-DD | Snapshot per day |
The daily-snapshot tables are what enable questions like "show me how AR aged over the last 6 months" — every day the job stores a row, and your BI tool charts the trend.
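For example, once a few months of snapshots exist, the AR trend is a single GROUP BY over the snapshot table. A sketch against the aging_ar_daily schema defined in Step 2 below:

SELECT snapshot_date,
       bucket,
       SUM(balance_cents) / 100.0 AS balance
FROM solid.aging_ar_daily
WHERE snapshot_date >= CURRENT_DATE - INTERVAL '6 months'
GROUP BY snapshot_date, bucket
ORDER BY snapshot_date, bucket;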
Step 1 — Set up the warehouse
Pick a warehouse:
| Warehouse | Best for |
|---|---|
| Postgres | Self-hosted; cheapest at low volume |
| Snowflake | Industry-standard for analytics; best for medium-to-large data |
| BigQuery | Pay-per-query (on-demand) pricing; good for sporadic dashboard use |
| DuckDB (local file) | Smallest possible; Python script writes to a .duckdb file your BI tool reads |
For a small business with under 100K transactions/year, DuckDB is enough. The whole "warehouse" is one .duckdb file in S3 (or anywhere). Your BI tool queries it like any other warehouse.
For a larger business or multi-tenant accounting firm, Snowflake or BigQuery is the right answer.
This recipe uses Postgres for concreteness; the SQL is portable.
Step 2 — Define the schemas
CREATE SCHEMA solid;

CREATE TABLE solid.accounts (
    id TEXT PRIMARY KEY,
    account_number TEXT,
    name TEXT NOT NULL,
    account_type TEXT NOT NULL,
    category TEXT NOT NULL,        -- asset, liability, equity, revenue, expense
    normal_balance TEXT NOT NULL,  -- debit or credit
    parent_id TEXT,
    currency_code TEXT NOT NULL,
    is_active BOOLEAN NOT NULL,
    loaded_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE solid.contacts (
    id TEXT PRIMARY KEY,
    contact_type TEXT NOT NULL,    -- customer, vendor, employee
    display_name TEXT NOT NULL,
    email TEXT,
    is_1099 BOOLEAN NOT NULL DEFAULT FALSE,
    currency_code TEXT NOT NULL,
    is_active BOOLEAN NOT NULL,
    loaded_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE solid.journal_entries (
    id TEXT PRIMARY KEY,
    entry_number TEXT NOT NULL,
    transaction_type TEXT NOT NULL,
    date DATE NOT NULL,
    status TEXT NOT NULL,
    contact_id TEXT,
    memo TEXT,
    source TEXT NOT NULL,
    total_cents BIGINT NOT NULL,
    loaded_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE solid.journal_entry_lines (
    id TEXT PRIMARY KEY,
    journal_entry_id TEXT NOT NULL REFERENCES solid.journal_entries(id),
    line_number INT NOT NULL,
    account_id TEXT NOT NULL REFERENCES solid.accounts(id),
    description TEXT,
    debit_cents BIGINT NOT NULL DEFAULT 0,
    credit_cents BIGINT NOT NULL DEFAULT 0,
    contact_id TEXT,
    class_id TEXT,
    location_id TEXT,
    project_id TEXT
);

-- Daily snapshot tables — never overwritten
CREATE TABLE solid.account_balances_daily (
    snapshot_date DATE NOT NULL,
    account_id TEXT NOT NULL,
    balance_cents BIGINT NOT NULL,
    PRIMARY KEY (snapshot_date, account_id)
);

CREATE TABLE solid.aging_ar_daily (
    snapshot_date DATE NOT NULL,
    customer_id TEXT NOT NULL,
    bucket TEXT NOT NULL,          -- current / 30 / 60 / 90 / over90
    balance_cents BIGINT NOT NULL,
    PRIMARY KEY (snapshot_date, customer_id, bucket)
);

Indexes on journal_entries.date, journal_entry_lines.account_id, and journal_entry_lines.contact_id will speed up the most common BI queries. Add them as you observe slow dashboards.
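If you'd rather create them up front, a minimal set (index names are illustrative):

CREATE INDEX idx_journal_entries_date ON solid.journal_entries (date);
CREATE INDEX idx_journal_entry_lines_account ON solid.journal_entry_lines (account_id);
CREATE INDEX idx_journal_entry_lines_contact ON solid.journal_entry_lines (contact_id);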
Step 3 — The ETL job
import os
from datetime import date, timedelta

import psycopg2
import requests
from psycopg2.extras import execute_batch

SOLID_API = os.environ["SOLID_API"]        # e.g. https://your-solid.example.com/api/v1
SOLID_API_KEY = os.environ["SOLID_API_KEY"]
PG_DSN = os.environ["PG_DSN"]


def solid_get(path, params=None):
    r = requests.get(
        f"{SOLID_API}{path}",
        headers={"X-API-Key": SOLID_API_KEY},
        params=params,
        timeout=60,
    )
    r.raise_for_status()
    return r.json()["data"]


def upsert_accounts(conn):
    accounts = solid_get("/accounts")
    with conn.cursor() as cur:
        # Upsert by id: journal_entry_lines references accounts, so TRUNCATE would fail
        execute_batch(cur, """
            INSERT INTO solid.accounts (id, account_number, name, account_type,
                category, normal_balance, parent_id, currency_code, is_active)
            VALUES (%(id)s, %(account_number)s, %(name)s, %(account_type)s,
                %(category)s, %(normal_balance)s, %(parent_id)s, %(currency_code)s,
                %(is_active)s)
            ON CONFLICT (id) DO UPDATE SET
                account_number = EXCLUDED.account_number,
                name = EXCLUDED.name,
                account_type = EXCLUDED.account_type,
                category = EXCLUDED.category,
                normal_balance = EXCLUDED.normal_balance,
                parent_id = EXCLUDED.parent_id,
                currency_code = EXCLUDED.currency_code,
                is_active = EXCLUDED.is_active
        """, accounts)
    conn.commit()


def upsert_contacts(conn):
    contacts = solid_get("/contacts")
    # ... similar pattern


def incremental_journal_entries(conn):
    with conn.cursor() as cur:
        cur.execute("SELECT MAX(date) FROM solid.journal_entries")
        last = cur.fetchone()[0] or date(2020, 1, 1)

    # Pull entries from a few days before last, in case of late-arriving entries
    fetch_from = last - timedelta(days=7)
    entries = solid_get("/entries", params={"from": fetch_from.isoformat()})

    # Upsert by id (entries are immutable but we re-pull to catch reversals)
    with conn.cursor() as cur:
        for e in entries:
            cur.execute("""
                INSERT INTO solid.journal_entries (id, entry_number, transaction_type,
                    date, status, contact_id, memo, source, total_cents)
                VALUES (%(id)s, %(entry_number)s, %(transaction_type)s, %(date)s,
                    %(status)s, %(contact_id)s, %(memo)s, %(source)s, %(total_cents)s)
                ON CONFLICT (id) DO UPDATE SET
                    status = EXCLUDED.status,
                    memo = EXCLUDED.memo
            """, e)
        # Lines too — pull and upsert
    conn.commit()


def snapshot_balances(conn):
    today = date.today()
    balances = solid_get("/balances", params={"as_of": today.isoformat()})
    with conn.cursor() as cur:
        execute_batch(cur, """
            INSERT INTO solid.account_balances_daily (snapshot_date, account_id, balance_cents)
            VALUES (%s, %s, %s)
            ON CONFLICT (snapshot_date, account_id) DO UPDATE
                SET balance_cents = EXCLUDED.balance_cents
        """, [(today, b["account_id"], b["balance_cents"]) for b in balances])
    conn.commit()


def main():
    conn = psycopg2.connect(PG_DSN)
    upsert_accounts(conn)
    upsert_contacts(conn)
    incremental_journal_entries(conn)
    snapshot_balances(conn)
    conn.close()
    print("ETL complete")


if __name__ == "__main__":
    main()

Step 4 — Schedule it
Cron, Airflow, GitHub Actions, AWS EventBridge — pick whatever your stack uses. Daily at 03:00 local time is conventional (after the previous business day, before anyone's at their desk).
0 3 * * * /usr/bin/python3 /opt/solid-etl/run.py >> /var/log/solid-etl.log 2>&1

Set up alerting on the cron job's exit code — silent failures are the worst kind of integration bug.
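One low-effort pattern is to wrap the script's main() so any failure also posts to an alerting webhook before cron records the non-zero exit, then call the wrapper from the __main__ block. ALERT_WEBHOOK_URL here is a hypothetical internal endpoint (Slack, PagerDuty, whatever your team watches):

import os
import requests

def run_with_alerting():
    try:
        main()
    except Exception as exc:
        # Post the failure somewhere a human will see it; the URL is whatever your team uses
        requests.post(
            os.environ["ALERT_WEBHOOK_URL"],
            json={"text": f"solid-etl failed: {exc!r}"},
            timeout=10,
        )
        raise  # re-raise so cron still sees a non-zero exit code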
Step 5 — Connect your BI tool
Each BI tool has its own way to add a Postgres connection. The shape:
| Tool | Connection type |
|---|---|
| Power BI | Get Data → Database → PostgreSQL |
| Looker | Database connection → PostgreSQL → service-account credentials |
| Sigma | Connections → New → PostgreSQL |
| Metabase | Add database → PostgreSQL |
| Tableau | Data → New Data Source → PostgreSQL |
Use a read-only service account for the BI tool — separate from the ETL writer account. Standard security hygiene.
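In Postgres terms that means two roles: the writer the ETL script connects as, and a SELECT-only role for the BI connection. A minimal sketch (role name and password are placeholders):

CREATE ROLE bi_reader LOGIN PASSWORD 'change-me';
GRANT USAGE ON SCHEMA solid TO bi_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA solid TO bi_reader;
-- Pick up tables the ETL job creates later
ALTER DEFAULT PRIVILEGES IN SCHEMA solid GRANT SELECT ON TABLES TO bi_reader;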
Common dashboards
What you'd build first:
- Revenue trend — journal_entry_lines filtered to revenue accounts, grouped by month (SQL sketch below)
- AR aging trend — aging_ar_daily grouped by snapshot_date and bucket; line chart over time
- Top customers — journal_entry_lines filtered to revenue, joined to contacts, summed and sorted
- Cash flow forecast — bills due in the next 30/60/90 days vs. AR collectible in the next 30/60/90 days
- Cost of customer acquisition — marketing spend (filtered) divided by new customers (contacts filtered to created-this-period)
- Margin by class/location/project — joins through journal_entry_lines.class_id etc. for dimensional reporting beyond what Solid's UI does
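A sketch of the revenue trend, assuming the Step 2 schema, treating net credits to revenue-category accounts as revenue, and assuming posted entries carry status = 'posted':

SELECT date_trunc('month', je.date) AS month,
       SUM(l.credit_cents - l.debit_cents) / 100.0 AS revenue
FROM solid.journal_entry_lines l
JOIN solid.journal_entries je ON je.id = l.journal_entry_id
JOIN solid.accounts a ON a.id = l.account_id
WHERE a.category = 'revenue'
  AND je.status = 'posted'   -- assumed status value
GROUP BY 1
ORDER BY 1;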
Performance considerations
For files with multi-million-line GLs, the daily ETL can take 5–30 minutes. Things to optimize:
- Incremental pulls — the from=YYYY-MM-DD query parameter limits journal-entry pulls to recent activity
- Pagination — Solid's API paginates at 500 entries; pull in pages and bulk-insert per page (sketch after this list)
- Index your warehouse tables — particularly journal_entry_lines.account_id and journal_entries.date
- Skip snapshots on closed periods — once a period is closed, its daily snapshots stop changing; you can stop snapshotting it
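A paginated variant of solid_get that can sit next to it in the Step 3 script. The page, per_page, and has_more names are assumptions about the pagination envelope; adjust them to whatever the API actually returns:

def solid_get_all(path, params=None):
    """Pull every page of a paginated endpoint and return one combined list."""
    params = dict(params or {})
    results, page = [], 1
    while True:
        r = requests.get(
            f"{SOLID_API}{path}",
            headers={"X-API-Key": SOLID_API_KEY},
            params={**params, "page": page, "per_page": 500},
            timeout=60,
        )
        r.raise_for_status()
        body = r.json()
        results.extend(body["data"])
        if not body.get("has_more"):  # assumed pagination flag
            break
        page += 1
    return results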
Variations
DuckDB instead of Postgres
For solo operators and small teams, replace psycopg2 with the duckdb Python library and write to a local .duckdb file. The same SQL works, and many BI tools can read DuckDB files directly or through DuckDB's ODBC/JDBC drivers.
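A sketch of the swapped-in write path, mirroring snapshot_balances from Step 3 (same table shape as Step 2; balances is the list the API call returns):

import duckdb
from datetime import date

def snapshot_balances_duckdb(balances, db_path="solid.duckdb"):
    today = date.today()
    con = duckdb.connect(db_path)  # the one file is the whole warehouse
    con.execute("CREATE SCHEMA IF NOT EXISTS solid")
    con.execute("""
        CREATE TABLE IF NOT EXISTS solid.account_balances_daily (
            snapshot_date DATE NOT NULL,
            account_id TEXT NOT NULL,
            balance_cents BIGINT NOT NULL,
            PRIMARY KEY (snapshot_date, account_id)
        )
    """)
    # executemany stands in for psycopg2's execute_batch; OR REPLACE makes reruns idempotent
    con.executemany(
        "INSERT OR REPLACE INTO solid.account_balances_daily VALUES (?, ?, ?)",
        [(today, b["account_id"], b["balance_cents"]) for b in balances],
    )
    con.close()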
Push to Snowflake / BigQuery
Same shape, different connection. Use snowflake-connector-python or google-cloud-bigquery. Both handle batched inserts efficiently.
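A sketch of the balances snapshot against Snowflake, assuming a SOLID schema already exists in the target database and that credentials come from environment variables (all names here are illustrative):

import os
from datetime import date
import snowflake.connector

def snapshot_balances_snowflake(balances):
    con = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        warehouse=os.environ["SNOWFLAKE_WAREHOUSE"],
        database=os.environ["SNOWFLAKE_DATABASE"],
        schema="SOLID",
    )
    try:
        cur = con.cursor()
        # Delete-then-insert keeps same-day reruns idempotent (Snowflake has no ON CONFLICT)
        cur.execute(
            "DELETE FROM account_balances_daily WHERE snapshot_date = %s",
            (date.today(),),
        )
        cur.executemany(
            "INSERT INTO account_balances_daily (snapshot_date, account_id, balance_cents) "
            "VALUES (%s, %s, %s)",
            [(date.today(), b["account_id"], b["balance_cents"]) for b in balances],
        )
        con.commit()
    finally:
        con.close()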
Real-time via webhooks
Replace cron with a webhook receiver: Solid POSTs to your service whenever a journal entry posts (when outbound webhooks ship). Your service updates the warehouse row, and the BI tool sees the data within seconds. This is more complex, and only worth it for genuinely time-sensitive dashboards.
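A minimal receiver sketch using Flask. The endpoint path and the payload shape (an envelope with the journal entry under data) are assumptions, since outbound webhooks haven't shipped yet:

import os
import psycopg2
from flask import Flask, request

PG_DSN = os.environ["PG_DSN"]
app = Flask(__name__)

@app.route("/webhooks/solid", methods=["POST"])
def journal_entry_posted():
    e = request.get_json()["data"]  # assumed payload: the posted journal entry
    conn = psycopg2.connect(PG_DSN)
    try:
        with conn, conn.cursor() as cur:  # commits on success, rolls back on error
            cur.execute("""
                INSERT INTO solid.journal_entries (id, entry_number, transaction_type,
                    date, status, contact_id, memo, source, total_cents)
                VALUES (%(id)s, %(entry_number)s, %(transaction_type)s, %(date)s,
                    %(status)s, %(contact_id)s, %(memo)s, %(source)s, %(total_cents)s)
                ON CONFLICT (id) DO UPDATE SET
                    status = EXCLUDED.status,
                    memo = EXCLUDED.memo
            """, e)
    finally:
        conn.close()
    return "", 204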
Cross-references
- REST API → Endpoint groups — full list of available endpoints
- Reports module — what's available built-in before you'd build a custom dashboard
- Stripe → Solid Accounting — inbound integration as a different shape