📘 Feature DSL – Authoring Pipelines in Natural Language

Runink's .dsl files allow data, governance, and domain teams to write declarative pipeline logic in plain English: no YAML, no code, just structured steps tied to contracts.

Inspired by Gherkin and feature-driven development, the DSL is intentionally designed to:

  • Align with real-world data contracts
  • Support lineage, compliance, and multi-tenant governance
  • Be editable by non-engineers: analysts, stewards, and reviewers

✨ Full Example

Feature: Trade Events – Validation & Compliance

Scenario: Validate and Tag Incoming Financial Trade Events

  Metadata:
    purpose: "Check and tag incoming trade events for compliance and data quality"
    module_layer: "Silver"
    herd: "Finance"
    slo: "99.9%"
    classification: "pii"
    contract: "cdm_trade/fdc3events.contract"
    contract_version: "1.0.0"
    compliance: ["SOX", "GDPR", "PCI-DSS"]
    lineage_tracking: true

  Given: "Arrival of Events"
    - source_type: stream
    - name: "Trade Events Kafka Stream"
    - format: CDM
    - tags: ["live", "trades", "finance"]

  When "Data is received":
    - Decode each trade event using the CDM schema
    - Check for required fields: trade_id, symbol, price, timestamp
    - Mask sensitive values like SSNs, emails, and bank accounts
    - Tag events with classification and region
    - Compare schema to the latest approved contract version

  Then:
    - Send valid records to: Snowflake table "Validated Trades Table"
    - Send invalid records to: Snowflake table "DLQ for Invalid Trades"
    - Annotate all records with compliance and lineage metadata

  Assertions:
    - At least 1,000 records must be processed
    - Schema drift must not be detected
    - All sensitive fields must pass redaction or tokenization checks

  GoldenTest:
    input: "cdm_trade/fdc3events.input"
    output: "cdm_trade/data/fdc3events.validated.golden"
    validation: strict

  Notifications:
    - On schema failure → alert "alerts/finance_data_validation"
    - On masking failure → alert "alerts/finance_security_breach"
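
The GoldenTest block above is what makes a scenario regression-testable: the output a run produces is compared against a stored golden file. As a rough mental model only, strict validation can be read as a byte-for-byte comparison. The Go sketch below is a hypothetical illustration of that idea, not Runink's actual implementation; the produced-output path is a made-up placeholder, while the golden path comes from the example.

package main

import (
	"bytes"
	"fmt"
	"os"
)

// compareStrict treats "validation: strict" as a byte-for-byte match between
// the output the pipeline produced and the stored golden file.
// Hypothetical sketch only; not Runink internals.
func compareStrict(producedPath, goldenPath string) error {
	produced, err := os.ReadFile(producedPath)
	if err != nil {
		return fmt.Errorf("read produced output: %w", err)
	}
	golden, err := os.ReadFile(goldenPath)
	if err != nil {
		return fmt.Errorf("read golden file: %w", err)
	}
	if !bytes.Equal(produced, golden) {
		return fmt.Errorf("output differs from golden file %s", goldenPath)
	}
	return nil
}

func main() {
	// The golden path is from the GoldenTest block above; the produced path
	// is a placeholder for wherever the run writes its validated output.
	err := compareStrict("out/fdc3events.validated", "cdm_trade/data/fdc3events.validated.golden")
	if err != nil {
		fmt.Println("golden test failed:", err)
		os.Exit(1)
	}
	fmt.Println("golden test passed")
}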

🧠 DSL Concepts

Block          Description
Feature        High-level business intent (group of scenarios)
Scenario       Specific pipeline run, often tied to a contract version
Metadata       Tags used for governance, lineage, compliance, and SLOs
Given          Declares the data source and input assumptions
When           Describes logic, transformations, and validations to apply
Then           Declares output actions: writing to sinks, tagging, alerts
Assertions     Validate record counts, masking, schema drift, etc.
GoldenTest     Points to expected inputs/outputs for regression safety
Notifications  Alerts emitted when failures occur during pipeline runs
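
Read top to bottom, these blocks compose into a scenario. The skeleton below is a sketch of that shape, following the syntax of the full example above; everything in angle brackets is a placeholder, not part of the DSL.

Feature: <business capability>

Scenario: <what this run does>

  Metadata:
    herd: "<team or domain>"
    contract: "<path/to/schema.contract>"
    contract_version: "<semver>"

  Given: "<source description>"
    - source_type: <stream or batch>
    - format: <schema format>

  When "<processing step group>":
    - <plain-English transformation or validation>

  Then:
    - Send valid records to: <sink>
    - Send invalid records to: <dead-letter sink>

  Assertions:
    - <plain-English expectation>

  GoldenTest:
    input: "<golden input path>"
    output: "<golden output path>"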

πŸ” Metadata-Driven Pipelines

Each .dsl is contract-aware and herd-scoped by default.

  • Contracts are declared via contract: and contract_version:
  • SLOs, classification, and compliance tags are baked into Metadata
  • Data lineage and observability are auto-inferred from DSL structure
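
To illustrate why this metadata matters downstream, the sketch below shows how a Metadata block could map onto a typed structure that governance or lineage tooling consumes. The Go struct and field names are hypothetical, derived only from the keys shown in the example; they are not Runink's actual API.

package main

import "fmt"

// ScenarioMetadata is a hypothetical illustration of how the Metadata block
// of a .dsl scenario could map onto a typed structure. Field names mirror
// the keys in the example above; this is not Runink's actual API.
type ScenarioMetadata struct {
	Purpose         string
	ModuleLayer     string
	Herd            string
	SLO             string
	Classification  string
	Contract        string
	ContractVersion string
	Compliance      []string
	LineageTracking bool
}

func main() {
	// Values copied from the Metadata block in the full example.
	meta := ScenarioMetadata{
		Purpose:         "Check and tag incoming trade events for compliance and data quality",
		ModuleLayer:     "Silver",
		Herd:            "Finance",
		SLO:             "99.9%",
		Classification:  "pii",
		Contract:        "cdm_trade/fdc3events.contract",
		ContractVersion: "1.0.0",
		Compliance:      []string{"SOX", "GDPR", "PCI-DSS"},
		LineageTracking: true,
	}
	fmt.Printf("herd %q runs under contract %s@%s (classification: %s)\n",
		meta.Herd, meta.Contract, meta.ContractVersion, meta.Classification)
}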

✅ DSL Advantages

  • Self-documenting: Reads like an audit policy + data spec
  • Secure-by-default: Every scenario runs inside a herd
  • Governance-friendly: Tracks lineage, policy, SLOs, classification
  • Reproducible: GoldenTest ensures outputs match expectations
  • Declarative: No code, no imperative orchestration logic

📎 Tips for Authors

Use this                      Instead of
- Mask sensitive values...    @step("FieldMasker")
"Validate and Tag..."         "run pipeline X"
Plain-English assertions      Inline test logic
contract: "x.contract"        Hardcoded field lists

🎯 Final Thought

With Runink DSL, your data pipeline is the spec: no translation layers, no wasted doc effort. Write what the pipeline should do, tag it with intent, and Runink turns it into secure, auditable, executable logic.

Let your domain experts lead the way, and let infra follow automatically.