Tuesday, March 18, 2025

Spec-Driven Programming by Example


I've been playing with an idea that builds upon the concept of "programming by example," but with a twist: spec-driven programming by example. Imagine a system where you define the desired data transformation through a simple, human-readable specification, rather than writing code directly.



The Core Idea:

Instead of coding, you write a specification in a markdown file. This specification describes:

  1. The Function Call: A single function call representing the data transformation you want to perform.
  2. Input Tables: A set of input tables, each with a clear description of its columns and data types.
  3. Output Tables: The desired output tables, also with descriptions.
  4. Example Cases: Multiple examples of input and corresponding output data. The more examples, the better, but even a few well-chosen cases (for instance, ones that cover edge cases) can pin down much of the intended logic.
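To make the four parts concrete, here is one possible in-memory model of a spec in Python. The class names (`Column`, `ExampleCase`, `TransformSpec`) and the `check_examples` helper are hypothetical and not taken from an existing prototype. Tables are assumed to be lists of row dictionaries.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory model of a spec; a table is a list of row dicts.
Table = list[dict[str, object]]


@dataclass
class Column:
    name: str
    type: str              # "int", "string", "date", "float", ...
    description: str = ""


@dataclass
class ExampleCase:
    inputs: dict[str, Table]    # input table name -> rows
    outputs: dict[str, Table]   # output table name -> expected rows


@dataclass
class TransformSpec:
    function_call: str                        # e.g. "combine_customer_orders(customers, orders)"
    input_tables: dict[str, list[Column]]     # table name -> declared columns
    output_tables: dict[str, list[Column]]
    examples: list[ExampleCase] = field(default_factory=list)

    def check_examples(self) -> list[str]:
        """List places where an example table doesn't match its declared columns."""
        problems = []
        for i, ex in enumerate(self.examples, start=1):
            for kind, declared, tables in (("input", self.input_tables, ex.inputs),
                                           ("output", self.output_tables, ex.outputs)):
                for name, rows in tables.items():
                    if name not in declared:
                        problems.append(f"example {i}: undeclared {kind} table {name!r}")
                        continue
                    expected = {c.name for c in declared[name]}
                    for row in rows:
                        if set(row) != expected:
                            problems.append(
                                f"example {i}: {kind} table {name!r} has columns {sorted(row)}")
        return problems
```

A check like this could run before any inference, so that a typo in an example table is reported as a spec error rather than surfacing later as a failed match.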

Example Markdown Spec (Conceptual):

```markdown
# Data Transformation: Combine Customer and Order Data
## Function Call:
combine_customer_orders(customers, orders)
## Input Tables:
### customers
| Column | Type | Description |
|---|---|---|
| customer_id | int | Unique identifier for each customer. |
| name | string | Customer's full name. |
| city | string | Customer's city of residence. |

### orders
| Column | Type | Description |
|---|---|---|
| order_id | int | Unique identifier for each order. |
| customer_id | int | ID of the customer who placed the order. |
| order_date | date | Date the order was placed. |
| total_amount | float | Total amount of the order. |

## Output Tables:

### customer_orders
| Column | Type | Description |
|---|---|---|
| customer_name | string | Customer's full name. |
| city | string | Customer's city of residence. |
| order_date | date | Date the order was placed. |
| total_amount | float | Total amount of the order. |

## Example Cases:

### Example 1:
**Input: customers**
| customer_id | name | city |
|---|---|---|
| 1 | Alice Smith | New York |
| 2 | Bob Johnson | London |

**Input: orders**
| order_id | customer_id | order_date | total_amount |
|---|---|---|---|
| 101 | 1 | 2023-10-26 | 100.00 |
| 102 | 2 | 2023-10-27 | 50.00 |

**Output: customer_orders**
| customer_name | city | order_date | total_amount |
|---|---|---|---|
| Alice Smith | New York | 2023-10-26 | 100.00 |
| Bob Johnson | London | 2023-10-27 | 50.00 |

### Example 2:
... (More example cases) ...
```

The Magic:

The system would then analyze this specification and attempt to infer the logic required to transform the input tables into the output tables. Each additional example rules out interpretations that disagree with it, so the inference should become more reliable as examples are added. When the logic is clear, even a few examples might be sufficient.
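Here is a minimal sketch of what "infer the logic" could mean, assuming a deliberately tiny program space: join the two inputs on a shared column (inner or left), then map each output column to the joined column whose values match it in every example. This brute-force enumerative search and the names `join`, `infer_program`, and `apply_program` are my own illustration, not a description of a real engine. It also assumes non-empty examples whose output rows appear in the same order the join produces them.

```python
from itertools import product

# Example 1 from the spec above, as lists of row dicts.
CUSTOMERS = [
    {"customer_id": 1, "name": "Alice Smith", "city": "New York"},
    {"customer_id": 2, "name": "Bob Johnson", "city": "London"},
]
ORDERS = [
    {"order_id": 101, "customer_id": 1, "order_date": "2023-10-26", "total_amount": 100.00},
    {"order_id": 102, "customer_id": 2, "order_date": "2023-10-27", "total_amount": 50.00},
]
EXPECTED = [
    {"customer_name": "Alice Smith", "city": "New York", "order_date": "2023-10-26", "total_amount": 100.00},
    {"customer_name": "Bob Johnson", "city": "London", "order_date": "2023-10-27", "total_amount": 50.00},
]


def join(left, right, key, how):
    """Nested-loop join of two row lists on `key`; `how` is "inner" or "left"."""
    out = []
    for lrow in left:
        matches = [rrow for rrow in right if rrow[key] == lrow[key]]
        if matches:
            out.extend({**lrow, **rrow} for rrow in matches)
        elif how == "left":
            # Keep the unmatched left row, padding right-hand columns with None.
            pad = {c: None for c in (right[0] if right else {}) if c not in lrow}
            out.append({**lrow, **pad})
    return out


def infer_program(examples):
    """Search (shared key) x (inner, left), then pick a source column per output column.

    `examples` is a list of ((left, right), expected). Returns (key, how, mapping)
    for the first candidate consistent with every example, or None.
    """
    (left0, right0), expected0 = examples[0]
    shared_keys = sorted(set(left0[0]) & set(right0[0]))
    for key, how in product(shared_keys, ("inner", "left")):
        joined = [join(lt, rt, key, how) for (lt, rt), _ in examples]
        if any(len(j) != len(exp) for j, (_, exp) in zip(joined, examples)):
            continue  # wrong number of rows, so this join can't be right
        mapping = {}
        for col in expected0[0]:
            # First joined column whose values equal this output column, row by row, in every example.
            source = next((c for c in joined[0][0]
                           if all([row.get(c) for row in j] == [row[col] for row in exp]
                                  for j, (_, exp) in zip(joined, examples))), None)
            if source is None:
                break
            mapping[col] = source
        else:
            return key, how, mapping
    return None


def apply_program(program, left, right):
    """Run an inferred (key, how, mapping) program on new input tables."""
    key, how, mapping = program
    return [{out: row.get(src) for out, src in mapping.items()}
            for row in join(left, right, key, how)]
```

On Example 1, `infer_program([((CUSTOMERS, ORDERS), EXPECTED)])` settles on an inner join on `customer_id` with `customer_name` mapped to `name`, and `apply_program` can then run that program on new tables. A real engine would need a much larger space of operations (filters, aggregations, computed columns) and some way to rank the candidates that survive.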

A marketplace of coders could bid on projects, or give feedback when a spec seems to be missing some logic.

Once enough test cases pass, a web service API spins up and can accept real data. We email you the URL, and you configure your systems to send it data or forward the URL to your engineering team.
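The "enough test cases pass" gate and the web service could be sketched with only the Python standard library. In this sketch, `combine_customer_orders` is a hand-written stand-in for whatever implementation the system or a marketplace coder produces, "enough" is assumed to mean "all", and `passes_all`, `make_handler`, and `serve_if_ready` are hypothetical names. Rows are compared ignoring order, since the spec does not say whether output order matters.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def combine_customer_orders(customers, orders):
    """Hypothetical candidate implementation: inner join on customer_id."""
    by_id = {c["customer_id"]: c for c in customers}
    return [
        {"customer_name": by_id[o["customer_id"]]["name"],
         "city": by_id[o["customer_id"]]["city"],
         "order_date": o["order_date"],
         "total_amount": o["total_amount"]}
        for o in orders if o["customer_id"] in by_id
    ]


def passes_all(fn, examples):
    """True if fn reproduces every example's expected rows (row order ignored)."""
    def canon(rows):
        return sorted(json.dumps(r, sort_keys=True) for r in rows)
    return all(canon(fn(**inputs)) == canon(expected) for inputs, expected in examples)


def make_handler(fn):
    """Build a request handler that applies fn to a JSON body of named input tables."""
    class TransformHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            tables = json.loads(self.rfile.read(length))  # e.g. {"customers": [...], "orders": [...]}
            body = json.dumps(fn(**tables)).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
    return TransformHandler


def serve_if_ready(fn, examples, port=8000):
    """Refuse to start unless every example passes, then serve until interrupted."""
    if not passes_all(fn, examples):
        raise RuntimeError("some example cases still fail; not deploying")
    HTTPServer(("", port), make_handler(fn)).serve_forever()


# Example 1 from the spec, in the (inputs, expected) shape that passes_all expects.
EXAMPLES = [(
    {"customers": [{"customer_id": 1, "name": "Alice Smith", "city": "New York"},
                   {"customer_id": 2, "name": "Bob Johnson", "city": "London"}],
     "orders": [{"order_id": 101, "customer_id": 1, "order_date": "2023-10-26", "total_amount": 100.00},
                {"order_id": 102, "customer_id": 2, "order_date": "2023-10-27", "total_amount": 50.00}]},
    [{"customer_name": "Alice Smith", "city": "New York", "order_date": "2023-10-26", "total_amount": 100.00},
     {"customer_name": "Bob Johnson", "city": "London", "order_date": "2023-10-27", "total_amount": 50.00}],
)]
```

Calling `serve_if_ready(combine_customer_orders, EXAMPLES)` would block and answer POST requests on port 8000 with the transformed rows as JSON. A production version would also need authentication, validation of incoming data against the declared column types, and error responses for malformed input.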



Why This Matters:

  • Accessibility: This approach lowers the barrier to entry for data transformation. Analysts and domain experts who may not be proficient in programming can define transformations by describing tables and giving examples instead of writing code.
  • Clarity: The specification doubles as clear, concise documentation of the transformation logic.
  • Flexibility: The system can potentially handle a wide range of transformations, from simple filtering and aggregation to more complex joins and data cleaning.
  • Iterative Development: Adding more examples allows for incremental refinement of the transformation logic.


Challenges and Next Steps:

  • Building a robust inference engine that can handle diverse data transformations.
  • Designing a user-friendly interface for creating and managing specifications.
  • Handling ambiguity and edge cases in the example data.
  • Creating a simple demo.
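Ambiguity already shows up in the spec above: Example 1 cannot distinguish an inner join from a left join, because every customer has an order. The sketch below checks two hand-written candidates against the examples. `EXAMPLE_2`, with a customer who has no orders, is invented for illustration and assumes the intended behavior is to drop such customers; the candidate functions are likewise hypothetical.

```python
# Example 1 from the spec: ((customers, orders), expected customer_orders).
EXAMPLE_1 = (
    ([{"customer_id": 1, "name": "Alice Smith", "city": "New York"},
      {"customer_id": 2, "name": "Bob Johnson", "city": "London"}],
     [{"order_id": 101, "customer_id": 1, "order_date": "2023-10-26", "total_amount": 100.00},
      {"order_id": 102, "customer_id": 2, "order_date": "2023-10-27", "total_amount": 50.00}]),
    [{"customer_name": "Alice Smith", "city": "New York", "order_date": "2023-10-26", "total_amount": 100.00},
     {"customer_name": "Bob Johnson", "city": "London", "order_date": "2023-10-27", "total_amount": 50.00}],
)

# Hypothetical Example 2: Carol has no orders and (by assumption) should not appear.
EXAMPLE_2 = (
    ([{"customer_id": 1, "name": "Alice Smith", "city": "New York"},
      {"customer_id": 3, "name": "Carol White", "city": "Paris"}],
     [{"order_id": 103, "customer_id": 1, "order_date": "2023-10-28", "total_amount": 75.00}]),
    [{"customer_name": "Alice Smith", "city": "New York", "order_date": "2023-10-28", "total_amount": 75.00}],
)


def _row(customer, order):
    """Build one customer_orders row; order=None yields empty order fields."""
    return {"customer_name": customer["name"], "city": customer["city"],
            "order_date": order["order_date"] if order else None,
            "total_amount": order["total_amount"] if order else None}


def inner_join(customers, orders):
    return [_row(c, o) for c in customers for o in orders
            if o["customer_id"] == c["customer_id"]]


def left_join(customers, orders):
    rows = []
    for c in customers:
        matched = [o for o in orders if o["customer_id"] == c["customer_id"]]
        if matched:
            rows.extend(_row(c, o) for o in matched)
        else:
            rows.append(_row(c, None))  # keep customers with no orders
    return rows


CANDIDATES = {"inner": inner_join, "left": left_join}


def consistent(examples):
    """Names of candidates whose output equals the expected rows in every example."""
    return [name for name, fn in CANDIDATES.items()
            if all(fn(customers, orders) == expected
                   for (customers, orders), expected in examples)]


print(consistent([EXAMPLE_1]))             # ['inner', 'left']
print(consistent([EXAMPLE_1, EXAMPLE_2]))  # ['inner']
```

This suggests one way a system could respond to ambiguity: when several candidates survive, show the user an input on which they disagree and ask for the expected output, turning the ambiguity into the next example case.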

I've built a few prototypes, but I'm still striving for that "dead simple" demo that truly showcases the power of this concept. My current prototypes use Python to parse the markdown, then use some basic data manipulation libraries to try to match the inputs to the outputs.
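Here is a rough sketch of the markdown-parsing step using only the standard library. It pulls the `**Input: name**` and `**Output: name**` tables out of each `### Example N:` section of a spec shaped like the one above. The function names are hypothetical, and all cell values are left as strings; converting them using the declared column types would be a separate step.

```python
import re


def parse_pipe_table(lines):
    """Parse a pipe table (header, |---| separator, rows) into a list of dicts."""
    def cells(line):
        return [c.strip() for c in line.strip().strip("|").split("|")]
    header = cells(lines[0])
    return [dict(zip(header, cells(line))) for line in lines[2:]]  # skip the separator row


def extract_tables(markdown):
    """Map (example number, "input"/"output", table name) -> rows for each labelled table."""
    tables, block = {}, []
    current, label = None, None
    for line in markdown.splitlines() + [""]:  # trailing "" flushes the last table
        stripped = line.strip()
        if stripped.startswith("|"):
            block.append(stripped)
            continue
        if label and block:
            tables[(current, *label)] = parse_pipe_table(block)
            label = None
        block = []  # unlabelled tables (e.g. column schemas) are dropped here
        ex = re.fullmatch(r"### Example (\d+):?", stripped)
        lab = re.fullmatch(r"\*\*(Input|Output): (\w+)\*\*", stripped)
        if ex:
            current = int(ex.group(1))
        elif lab:
            label = (lab.group(1).lower(), lab.group(2))
    return tables


# A trimmed copy of Example 1 from the spec above.
SAMPLE = """\
### Example 1:
**Input: customers**
| customer_id | name | city |
|---|---|---|
| 1 | Alice Smith | New York |
| 2 | Bob Johnson | London |

**Output: customer_orders**
| customer_name | city | order_date | total_amount |
|---|---|---|---|
| Alice Smith | New York | 2023-10-26 | 100.00 |
| Bob Johnson | London | 2023-10-27 | 50.00 |
"""
```

`extract_tables(SAMPLE)` yields two entries, keyed `(1, "input", "customers")` and `(1, "output", "customer_orders")`, each holding a list of row dictionaries ready to hand to a matcher.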

I believe that spec-driven programming by example has the potential to revolutionize how we work with data. By shifting the focus from writing code to defining specifications, we can empower a wider audience to unlock the insights hidden within their data.

What do you think? I'd love to hear your thoughts and suggestions.