Introduction
This guide provides a complete reference for PUT annotation syntax. It covers all annotation formats, multi-language support, multiline annotations, and best practices.
New to putior? Start with the Quick Start guide to create your first diagram in 2 minutes.
The name putior stands for PUT + Input + Output + R, reflecting the package’s core purpose: tracking data inputs and outputs through your analysis pipeline using special annotations.
Annotation Basics
PUT annotations are special comments that describe workflow nodes. Start simple:
Minimal annotation (just a label):
# put label:"Load Data"
That’s all you need! putior will:

- Auto-generate a unique ID
- Default node_type to "process"
- Default output to the filename
Add more detail as needed:
# put label:"Load Data", node_type:"input", output:"data.csv"
Full R script example:
# data_processing.R
# put label:"Load Customer Data", node_type:"input", output:"raw_data.csv"
# Your actual code
data <- read.csv("customer_data.csv")
write.csv(data, "raw_data.csv")
# put label:"Clean and Validate", input:"raw_data.csv", output:"clean_data.csv"
# Data cleaning code
cleaned_data <- data %>%
filter(!is.na(customer_id)) %>%
mutate(purchase_date = as.Date(purchase_date))
write.csv(cleaned_data, "clean_data.csv")
Python script example:
# analysis.py
# put id:"analyze_sales", label:"Sales Analysis", node_type:"process", input:"clean_data.csv", output:"sales_report.json"
import pandas as pd
import json
# Load cleaned data
data = pd.read_csv("clean_data.csv")
# Perform analysis
sales_summary = {
"total_sales": data["amount"].sum(),
"avg_order": data["amount"].mean(),
"customer_count": data["customer_id"].nunique()
}
# Save results
with open("sales_report.json", "w") as f:
json.dump(sales_summary, f)
Resulting diagram from both files:
flowchart TD
load_data(["Load Customer Data"])
clean_data["Clean and Validate"]
analyze_sales["Sales Analysis"]
%% Connections
load_data --> clean_data
clean_data --> analyze_sales
%% Styling
classDef inputStyle fill:#dbeafe,stroke:#2563eb,stroke-width:2px,color:#1e40af
class load_data inputStyle
classDef processStyle fill:#ede9fe,stroke:#7c3aed,stroke-width:2px,color:#5b21b6
class clean_data processStyle
class analyze_sales processStyle
Extracting Annotations
Use the put() function to scan your files and extract
workflow information:
# Scan all R and Python files in a directory
workflow <- put("./src/")
# View the extracted workflow
print(workflow)

The output is a data frame where each row represents a workflow node:
| Column | Description |
|---|---|
| file_name | Which script contains this node |
| file_type | Programming language (r, py, sql, etc.) |
| id | Unique identifier for the node |
| label | Human-readable description |
| node_type | Type of operation (input, process, output) |
| input | Files consumed by this step |
| output | Files produced by this step |
Custom properties you define are also included as additional columns.
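As a concrete illustration, here is a hand-built data frame shaped like the one put() returns (the values below are made up for demonstration, not real output); note how the custom stage property becomes an ordinary column you can query:

```r
# Illustrative sketch only: a data frame shaped like put()'s output.
# Column names follow the table above; "stage" is a custom property.
workflow <- data.frame(
  file_name = c("data_processing.R", "analysis.py"),
  file_type = c("r", "py"),
  id        = c("clean_data", "analyze_sales"),
  label     = c("Clean and Validate", "Sales Analysis"),
  node_type = c("process", "process"),
  input     = c("raw_data.csv", "clean_data.csv"),
  output    = c("clean_data.csv", "sales_report.json"),
  stage     = c("1", "2"),
  stringsAsFactors = FALSE
)

# Custom columns can be filtered like any other column
subset(workflow, stage == "2")$label  # "Sales Analysis"
```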
Complete Syntax Reference
Basic Format
The general syntax for PUT annotations is:
# put property1:"value1", property2:"value2", property3:"value3"
Flexible Syntax Options
PUT annotations support several formats to fit different coding styles:
# put id:"my_node", label:"My Process" # Standard format (matches logo)
#put id:"my_node", label:"My Process" # Also valid (no space)
# put| id:"my_node", label:"My Process" # Pipe separator
# put id:'my_node', label:'Single quotes' # Single quotes
# put id:"my_node", label:'Mixed quotes' # Mixed quote styles
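To make the tolerance concrete, here is a toy matcher (this is not putior’s actual parser, only a sketch) that accepts the prefix variants shown above:

```r
# Toy matcher, NOT putior's real parser: accepts "# put", "#put",
# and "# put|" prefixes followed by at least one property:value pair.
looks_like_put <- function(line) {
  grepl('^\\s*#\\s*put\\|?\\s+\\w+\\s*:', line)
}

looks_like_put('# put id:"my_node", label:"My Process"')  # TRUE
looks_like_put('#put id:"my_node"')                       # TRUE
looks_like_put('# put| id:"my_node"')                     # TRUE
looks_like_put('# a normal comment')                      # FALSE
```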
Multiline Annotations
For complex annotations with many properties, use backslash
(\) continuation:
R/Python style:
# put id:"complex_etl", \
# label:"Complex ETL Process", \
# node_type:"process", \
# input:"raw_data.csv, config.yaml", \
# output:"processed.parquet", \
# author:"Data Team", \
#      version:"2.0"

SQL style:
--put id:"load_customers", \
-- label:"Load Customer Data", \
-- node_type:"input", \
-- output:"customers_table"
SELECT * FROM raw_customers;

JavaScript/TypeScript style:
//put id:"api_handler", \
// label:"Process API Request", \
// input:"request.json", \
//     output:"response.json"

Rules for multiline annotations:
- End each line (except the last) with a backslash (\)
- Start continuation lines with the same comment prefix
- Continuation lines can have leading whitespace for readability
- Properties can span multiple lines
- The backslash must be the last character on the line (no trailing spaces)
Example with many properties:
# put id:"train_model", \
# label:"Train Random Forest Model", \
# node_type:"process", \
# input:"features.csv, labels.csv", \
# output:"model.rds, metrics.json", \
# group:"machine_learning", \
# stage:"3", \
# estimated_time:"45min", \
#      memory_intensive:"true"

When Multiline Annotations Don’t Work:
- Trailing spaces: Ensure the backslash is the last character (no spaces after it)
- Missing prefix: Each continuation line needs the comment prefix (#, --, //)
- Fallback: If multiline fails, use a single long line; readability is secondary to functionality
- Debug: Use set_putior_log_level("DEBUG") to see exactly how lines are being parsed
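The joining rule those bullets describe can be sketched in a few lines of plain R (this is an illustration, not putior’s implementation): strip the comment prefix, then glue any line whose body ends in a backslash onto the next line.

```r
# Toy sketch of backslash continuation, NOT putior's implementation.
join_continuations <- function(lines, prefix = "#") {
  out <- character(0)
  buf <- ""
  for (line in lines) {
    # Drop the comment prefix and surrounding whitespace
    body <- trimws(sub(paste0("^\\s*", prefix, "\\s*"), "", line))
    if (endsWith(body, "\\")) {
      # Backslash is the last character: accumulate and continue
      buf <- paste0(buf, sub("\\\\$", "", body))
    } else {
      out <- c(out, paste0(buf, body))
      buf <- ""
    }
  }
  out
}

join_continuations(c('# put id:"complex_etl", \\',
                     '#      label:"Complex ETL Process"'))
# 'put id:"complex_etl", label:"Complex ETL Process"'
```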
Multi-Language Support
putior automatically uses the correct comment prefix based on file extension:
| Comment Style | Languages | Extensions |
|---|---|---|
| # put | R, Python, Shell, Julia, Ruby, YAML | .R, .py, .sh, .jl, .rb, .yaml |
| -- put | SQL, Lua, Haskell | .sql, .lua, .hs |
| // put | JavaScript, TypeScript, C, Java, Go, Rust | .js, .ts, .c, .java, .go, .rs |
| % put | MATLAB, LaTeX | .m, .tex |
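The extension-to-prefix mapping in the table can be sketched in plain R (illustrative only; this is not how putior implements it internally):

```r
# Illustrative lookup based on the table above; not putior's internals.
comment_prefix <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    r = , py = , sh = , jl = , rb = , yaml = "#",
    sql = , lua = , hs = "--",
    js = , ts = , c = , java = , go = , rs = "//",
    m = , tex = "%",
    NA_character_  # unknown extensions
  )
}

comment_prefix("analysis.py")  # "#"
comment_prefix("query.sql")    # "--"
comment_prefix("process.js")   # "//"
```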
SQL Example:
-- query.sql
--put id:"load_data", label:"Load Customer Data", output:"customers"
SELECT * FROM customers WHERE active = 1;
JavaScript Example:
// process.js
//put id:"transform", label:"Transform JSON", input:"data.json", output:"output.json"
const transformed = data.map(item => process(item));
MATLAB Example:
% analysis.m
%put id:"compute", label:"Statistical Analysis", input:"data.mat", output:"results.mat"
results = compute_statistics(data);
Block Comments
For languages with block comment support (JavaScript, TypeScript, C,
C++, Java, Go, Rust, and other //-prefix languages), PUT
annotations can also appear inside /* ... */ and
/** ... */ block comments. Use a * line
prefix:
JSDoc-style (recommended for JS/TS):
/**
* put id:"load", label:"Load Data", node_type:"input"
*/
function loadData() { return fetch('/api/data'); }
C-style block comment:
/*
* put id:"init", label:"Initialize System"
*/
void init() {}
Single-line block comment:
/* put id:"quick", label:"Quick Operation" */
const x = transform(data);
Multiple annotations can appear in one block:
/**
* put id:"step_a", label:"Step A"
* put id:"step_b", label:"Step B"
*/
Both single-line (//) and block (/* */)
annotations can coexist in the same file. Languages without block
comment syntax (R, Python, SQL, etc.) continue to use their single-line
prefix only.
Core Properties
While putior accepts any properties you define, these are commonly used:
| Property | Purpose | Example Values |
|---|---|---|
| id | Unique identifier | "load_data", "process_sales" |
| label | Human description | "Load Customer Data" |
| node_type | Operation type | "input", "process", "output" |
| input | Input files | "raw_data.csv", "data/*.json" |
| output | Output files | "processed_data.csv" |
Standard Node Types
For consistency across projects, use these standard node types:
| Type | Mermaid Shape | Use For |
|---|---|---|
| input | Stadium ([...]) | Data sources, file loading, API inputs |
| process | Rectangle [...] | Data transformation, analysis, computation (default) |
| output | Subroutine [[...]] | Report generation, data export, visualization |
| decision | Diamond {...} | Conditional logic, branching workflows |
| start | Stadium ([...]) | Workflow entry point (gets boundary styling) |
| end | Stadium ([...]) | Workflow exit point (gets boundary styling) |
artifact nodes (cylinder shape) are automatically created by put_diagram(show_artifacts = TRUE) for data files referenced in input/output fields. You don’t set node_type:"artifact" manually.
Visual representation of node types:
flowchart TD
load(["Load Data (input)"])
transform["Transform (process)"]
export[["Export (output)"]]
check{"Validate? (decision)"}
%% Connections
load --> transform
transform --> export
transform --> check
%% Styling
classDef inputStyle fill:#dbeafe,stroke:#2563eb,stroke-width:2px,color:#1e40af
class load inputStyle
classDef processStyle fill:#ede9fe,stroke:#7c3aed,stroke-width:2px,color:#5b21b6
class transform processStyle
classDef outputStyle fill:#dcfce7,stroke:#16a34a,stroke-width:2px,color:#15803d
class export outputStyle
classDef decisionStyle fill:#fef3c7,stroke:#d97706,stroke-width:2px,color:#92400e
class check decisionStyle
Custom Properties
Add any properties you need for visualization or metadata:
# put id:"train_model", label:"Train ML Model", node_type:"process", color:"green", group:"machine_learning", duration:"45min", priority:"high"
These custom properties can be used by visualization tools or workflow management systems.
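As a sketch of what "used by visualization tools" might look like (the data frame below is hypothetical, not real put() output), custom columns can drive grouping or prioritization downstream:

```r
# Hypothetical downstream use of custom properties (made-up values):
nodes <- data.frame(
  id       = c("feature_engineering", "train_model", "evaluate"),
  group    = c("preprocessing", "modeling", "modeling"),
  priority = c("low", "high", "high"),
  stringsAsFactors = FALSE
)

# e.g. list node ids per group, as a grouped-diagram tool might
split(nodes$id, nodes$group)
```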
Advanced Usage
Processing Individual Files
You can process single files instead of entire directories:
# Process a single file
workflow <- put("./scripts/analysis.R")

Recursive Directory Scanning
Include subdirectories in your scan:
# Search subdirectories recursively
workflow <- put("./project/", recursive = TRUE)

Including Line Numbers
For debugging annotation issues, include line numbers:
# Include line numbers for debugging
workflow <- put("./src/", include_line_numbers = TRUE)

Automatic ID Generation
If you omit the id field, putior will automatically
generate a unique UUID:
# Annotations without explicit IDs get auto-generated UUIDs
# put label:"Load Data", node_type:"input", output:"data.csv"
# put label:"Process Data", node_type:"process", input:"data.csv", output:"clean.csv"
# Extract workflow - IDs will be auto-generated
workflow <- put("./")
print(workflow$id) # Will show UUIDs like "a1b2c3d4-e5f6-7890-abcd-ef1234567890"

Note: If you provide an empty id (e.g., id:""), you’ll get a validation warning.
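A toy sketch of the distinction described above (illustrative only, not putior’s validation code): an omitted id is fine and gets auto-generated, while an explicitly empty one is flagged.

```r
# Toy validation sketch mirroring the behavior described above;
# NOT putior's actual implementation.
validate_id <- function(id) {
  if (is.null(id)) {
    return(TRUE)  # omitted id: putior auto-generates a UUID
  }
  if (!nzchar(id)) {
    warning('empty id (id:"") triggers a validation warning')
    return(FALSE)
  }
  TRUE
}

validate_id("load_data")  # TRUE
validate_id(NULL)         # TRUE (id will be auto-generated)
```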
Automatic Output Defaulting
If you omit the output field, putior automatically uses
the file name as the output:
# In process_data.R:
# put label:"Process Step", node_type:"process", input:"raw.csv"
# No output specified - will default to "process_data.R"
# In analyze_data.R:
# put label:"Analyze", node_type:"process", input:"process_data.R", output:"results.csv"
# This creates a connection from process_data.R to analyze_data.R

This feature ensures that scripts can be connected in workflows even when explicit output files aren’t specified.
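The defaulting rule can be expressed as a tiny function (a toy sketch of the behavior described, not putior’s code): when an annotation has no output, fall back to the script’s file name.

```r
# Toy sketch of the output-defaulting rule; NOT putior's implementation.
resolve_output <- function(props, file_name) {
  if (is.null(props$output) || !nzchar(props$output)) {
    props$output <- file_name  # default to the script's own name
  }
  props
}

resolve_output(list(label = "Process Step", input = "raw.csv"),
               "process_data.R")$output  # "process_data.R"
```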
Tracking Source Relationships
When you have scripts that source other scripts, use this annotation pattern:
# In main.R (sources other scripts):
# put label:"Main Analysis", input:"load_data.R,process_data.R", output:"report.pdf"
source("load_data.R") # Reading load_data.R into main.R
source("process_data.R") # Reading process_data.R into main.R
# In load_data.R (sourced by main.R):
# put label:"Data Loader", node_type:"input"
# output defaults to "load_data.R"
# In process_data.R (sourced by main.R, depends on load_data.R):
# put label:"Data Processor", input:"load_data.R"
# output defaults to "process_data.R"

This correctly shows the flow: sourced scripts are inputs to the main script.
Variable References with .internal Extension
putior supports tracking in-memory variables and objects using the
.internal extension. This is useful for documenting
computational steps within scripts while maintaining clear data flow
between scripts.
Key Concepts
.internal variables:

- Represent in-memory objects during script execution
- Can only be outputs, never inputs between scripts
- Help document what variables are created within each script
- Example: my_data.internal represents a variable named my_data

Persistent files:

- Enable actual data flow between scripts
- Can be both inputs and outputs
- Required for connected workflows
- Example: my_data.RData, results.csv
Correct Usage Pattern
# Script 1: Create variable and save it
# put id:"create_data", output:"dataset.internal, dataset.RData"
dataset <- data.frame(x = 1:100, y = rnorm(100))
save(dataset, file = "dataset.RData")
# Script 2: Load data and create new variables
# put id:"analyze_data", input:"dataset.RData", output:"analysis.internal, summary.txt"
load("dataset.RData") # Load the persistent file (NOT dataset.internal)
analysis <- summary(dataset) # Create new in-memory variable
writeLines(capture.output(analysis), "summary.txt")

What NOT to Do
# INCORRECT: Using .internal as input between scripts
# put input:"dataset.internal" # This is wrong!
# CORRECT: Use persistent files as inputs
# put input:"dataset.RData" # This is correct!

Complete Example
Try the comprehensive variable reference example:
source(system.file("examples", "variable-reference-example.R", package = "putior"))

This creates a connected 4-script workflow demonstrating proper
.internal usage and file-based data flow.
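The "outputs only, never inputs" rule for .internal names can be checked mechanically. Here is a toy linter (illustrative only, not part of putior) that flags .internal entries appearing in a comma-separated input field:

```r
# Toy check, illustrative only: flag .internal names used as inputs,
# which the rules above forbid.
internal_inputs <- function(input) {
  files <- trimws(strsplit(input, ",")[[1]])
  files[grepl("\\.internal$", files)]
}

internal_inputs("dataset.RData, helper.internal")  # "helper.internal" (bad!)
internal_inputs("dataset.RData")                   # character(0): nothing to flag
```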
Real-World Example
Let’s walk through a complete data science workflow:
1. Data Collection (Python)
# 01_collect_data.py
# put id:"fetch_api_data", label:"Fetch Data from API", node_type:"input", output:"raw_api_data.json"
import requests
import json
response = requests.get("https://api.example.com/sales")
data = response.json()
with open("raw_api_data.json", "w") as f:
json.dump(data, f)
2. Data Processing (R)
# 02_process_data.R
# put id:"clean_api_data", label:"Clean and Structure Data", node_type:"process", input:"raw_api_data.json", output:"processed_sales.csv"
library(jsonlite)
library(dplyr)
# Load raw data
raw_data <- fromJSON("raw_api_data.json")
# Process and clean
processed <- raw_data %>%
filter(!is.na(sale_amount)) %>%
mutate(
sale_date = as.Date(sale_date),
sale_amount = as.numeric(sale_amount)
) %>%
arrange(sale_date)
# Save processed data
write.csv(processed, "processed_sales.csv", row.names = FALSE)
3. Analysis and Reporting (R)
# 03_analyze_report.R
# put id:"sales_analysis", label:"Perform Sales Analysis", node_type:"process", input:"processed_sales.csv", output:"analysis_results.rds"
# put id:"generate_report", label:"Generate HTML Report", node_type:"output", input:"analysis_results.rds", output:"sales_report.html"
library(dplyr)
# Load processed data
sales_data <- read.csv("processed_sales.csv")
# Perform analysis
analysis_results <- list(
total_sales = sum(sales_data$sale_amount),
monthly_trends = sales_data %>%
group_by(month = format(sale_date, "%Y-%m")) %>%
summarise(monthly_total = sum(sale_amount)),
top_products = sales_data %>%
group_by(product) %>%
summarise(product_sales = sum(sale_amount)) %>%
arrange(desc(product_sales)) %>%
head(10)
)
# Save analysis
saveRDS(analysis_results, "analysis_results.rds")
# Generate report
rmarkdown::render("report_template.Rmd",
output_file = "sales_report.html")
Best Practices
1. Use Descriptive Names
Choose clear, descriptive names that explain what each step does:
# Good
# put id:"load_customer_transactions", label:"Load Customer Transaction Data"
# put id:"calculate_monthly_revenue", label:"Calculate Monthly Revenue Totals"
# Less descriptive
# put id:"step1", label:"Load data"
# put id:"process", label:"Do calculations"
2. Document Data Dependencies
Always specify inputs and outputs for data processing steps:
# put id:"merge_datasets", label:"Merge Customer and Transaction Data", input:"customers.csv,transactions.csv", output:"merged_data.csv"
3. Use Consistent Node Types
Stick to a standard set of node types across your team:
# put id:"load_raw_data", label:"Load Raw Sales Data", node_type:"input"
# put id:"clean_data", label:"Clean and Validate", node_type:"process"
# put id:"export_results", label:"Export Final Results", node_type:"output"
4. Add Helpful Metadata
Include metadata that helps with workflow understanding:
# put id:"train_model", label:"Train Random Forest Model", node_type:"process", estimated_time:"30min", requires:"tidymodels", memory_intensive:"true"
5. Group Related Operations
Use grouping properties to organize complex workflows:
# put id:"feature_engineering", label:"Engineer Features", group:"preprocessing", stage:"1"
# put id:"model_training", label:"Train Model", group:"modeling", stage:"2"
# put id:"model_evaluation", label:"Evaluate Model", group:"modeling", stage:"3"
Troubleshooting
Having issues with annotations? See the Troubleshooting Guide for:
- Most Common Issues - Start here for quick solutions
- Annotation Syntax Errors - Quote mismatches, invalid properties
- File Pattern Matching - Files not being scanned
- Debugging with Logging - Enable detailed output
Quick diagnostic:
# Test if your annotation is valid
is_valid_put_annotation('# put id:"test", label:"Test Node"') # Should be TRUE

See Also
| Guide | Description |
|---|---|
| Quick Start | Create your first diagram in 2 minutes |
| Quick Reference | Cheat sheet for daily use |
| Features Tour | Auto-detection, logging, interactive diagrams |
| API Reference | Complete function documentation |
| Showcase | Real-world examples (ETL, ML, bioinformatics) |
| Troubleshooting | Common issues and solutions |
Built-in examples:
# Complete workflow example
source(system.file("examples", "reprex.R", package = "putior"))
# Variable reference example
source(system.file("examples", "variable-reference-example.R", package = "putior"))
# Interactive diagrams example
source(system.file("examples", "interactive-diagrams-example.R", package = "putior"))

Function help:
- ?put - Extract annotations from files
- ?put_diagram - Generate Mermaid diagrams
- ?put_auto - Auto-detect workflow from code