Scan R and Python Files for PUT Annotations

Scans source files in a directory, or a single file, for PUT annotations that define workflow nodes, inputs, outputs, and metadata. The default pattern matches R, Python, SQL, shell, and Julia files. Annotations can be written in single-line or multiline format.

Usage

put(
  path,
  pattern = "\\.(R|r|py|sql|sh|jl)$",
  recursive = FALSE,
  include_line_numbers = FALSE,
  validate = TRUE
)

Arguments

path: Character string specifying the path to the folder containing files, or path to a single file
pattern: Character string giving the regular expression that file names must match. Default: "\\.(R|r|py|sql|sh|jl)$" (R, Python, SQL, shell, and Julia files)
recursive: Logical. Should subdirectories be searched recursively? Default: FALSE
include_line_numbers: Logical. Should line numbers be included in output? Default: FALSE
validate: Logical. Should annotations be validated for common issues? Default: TRUE
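
The following base R sketch shows which file names the default pattern selects. It assumes that pattern is applied to file names as an ordinary regular expression, in the way list.files() would apply it; the file names are made up for illustration.

```r
# Default value of the `pattern` argument, as shown in Usage
default_pattern <- "\\.(R|r|py|sql|sh|jl)$"

# Hypothetical file names, for illustration only
candidates <- c("analysis.R", "helpers.r", "model.py", "query.sql",
                "run.sh", "sim.jl", "notes.Rmd", "README.md")

# Keep only the names whose extension matches the pattern
matched <- candidates[grepl(default_pattern, candidates)]
print(matched)
```

Because the pattern is anchored with $, notes.Rmd is not selected even though it contains ".R". Passing a narrower expression such as pattern = "\\.py$" should restrict a scan to Python files.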

Value

A data frame containing file names and all properties found in annotations. It always includes the columns file_name and file_type, plus one column for each property found in the PUT annotations (typically id, label, node_type, input, and output). If include_line_numbers is TRUE, it also includes line_number. If an annotation does not specify output, output defaults to the file name.
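
Because the result is a regular data frame, it can be filtered and joined with base R. The sketch below uses a hand-built stand-in containing only a subset of the documented columns. The rows and file names are invented, and representing a missing input as NA is an assumption rather than documented behavior.

```r
# Hand-built stand-in for a put() result (illustrative rows, not real output)
workflow <- data.frame(
  file_name = c("load.R", "clean.R"),
  id        = c("load_data", "process"),
  node_type = c("input", "process"),
  input     = c(NA, "data.csv"),        # assumption: missing input shown as NA
  output    = c("data.csv", "clean.csv"),
  stringsAsFactors = FALSE
)

# Select the processing steps
process_nodes <- workflow[workflow$node_type == "process", ]

# For each processing step, find the node whose output it consumes
upstream <- workflow$id[match(process_nodes$input, workflow$output)]
print(upstream)
```

Matching input against output in this way is one simple approach to recovering the edges of the workflow from the scanned annotations.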

PUT Annotation Syntax

PUT annotations can be written in single-line or multiline format:

Single-line format: All parameters on one line

#put id:"node1", label:"Process Data", input:"data.csv", output:"result.csv"
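
To illustrate how such a line maps to named properties, here is a minimal base R sketch that extracts the key:"value" pairs. The helper parse_put_line() is hypothetical and is not putior's own parser, which may handle more cases.

```r
# Hypothetical helper for illustration; not part of putior
parse_put_line <- function(line) {
  # Find every key:"value" pair in the annotation
  pairs <- regmatches(line, gregexpr('[A-Za-z_]+:"[^"]*"', line))[[1]]
  keys <- sub(':".*$', "", pairs)        # text before the colon
  values <- sub('^[^:]+:"', "", pairs)   # drop the key and opening quote
  values <- sub('"$', "", values)        # drop the closing quote
  stats::setNames(values, keys)
}

props <- parse_put_line(
  '#put id:"node1", label:"Process Data", input:"data.csv", output:"result.csv"'
)
print(props)
```

Each name in the resulting vector corresponds to a column that would be expected in the returned data frame.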

Multiline format: Use backslash (\) for line continuation

#put id:"node1", label:"Process Data", \
#    input:"data.csv", \
#    output:"result.csv"
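
The sketch below shows one way continuation lines could be folded back into a single annotation. The helper join_continuations() is hypothetical and written only to illustrate the idea; it is not putior's internal implementation, and it assumes each continued line keeps its own trailing comma.

```r
# Hypothetical helper for illustration; not part of putior
join_continuations <- function(lines) {
  out <- character(0)
  buffer <- NULL
  for (line in lines) {
    # On continuation lines, drop the leading comment marker
    piece <- if (is.null(buffer)) line else sub("^\\s*#\\s*", "", line)
    continues <- grepl("\\\\\\s*$", piece)       # line ends with a backslash?
    piece <- sub("\\s*\\\\\\s*$", "", piece)     # remove the trailing backslash
    buffer <- if (is.null(buffer)) piece else paste(buffer, piece)
    if (!continues) {
      out <- c(out, buffer)
      buffer <- NULL
    }
  }
  if (!is.null(buffer)) out <- c(out, buffer)    # dangling continuation
  out
}

annotation <- c(
  '#put id:"node1", label:"Process Data", \\',
  '#    input:"data.csv", \\',
  '#    output:"result.csv"'
)
joined <- join_continuations(annotation)
print(joined)
```

For this input the joined text is the same as the single-line form of the annotation.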

Benefits of multiline format:

Compliance with code style guidelines (styler, lintr)
Improved readability for complex workflows
Easier maintenance of long file lists
Better code organization and documentation

Syntax rules:

End lines with backslash (\) to continue
Each continuation line must start with the # comment marker
Properties are automatically joined with proper comma separation
Works with all PUT formats: #put, # put, #put|, #put:
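
As a rough illustration of the four accepted prefixes, the following base R sketch tests lines against a single regular expression. The expression and the helper is_put_line() are assumptions made for this example and may not match the detection rule putior uses.

```r
# Hypothetical check for the documented prefixes: #put, # put, #put|, #put:
is_put_line <- function(lines) {
  grepl("^\\s*#\\s*put[|:]?\\s", lines)
}

lines <- c(
  '#put id:"a"',     # no space after #
  '# put id:"a"',    # space after #
  '#put| id:"a"',    # pipe separator
  '#put: id:"a"',    # colon separator
  '# output only',   # ordinary comment
  '#putative x',     # ordinary comment that merely starts with "put"
  'x <- 1'           # code
)
flags <- is_put_line(lines)
print(flags)
```

Requiring whitespace after the optional separator keeps ordinary comments such as "#putative x" from being mistaken for annotations.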

Examples

if (FALSE) { # \dontrun{
# Scan a directory for workflow annotations
workflow <- put("./src/")

# Scan recursively including subdirectories
workflow <- put("./project/", recursive = TRUE)

# Scan a single file
workflow <- put("./script.R")

# Include line numbers for debugging
workflow <- put("./src/", include_line_numbers = TRUE)

# Single-line PUT annotations (basic syntax):
# #put id:"load_data", label:"Load Dataset", node_type:"input", output:"data.csv"
# #put id:"process", label:"Clean Data", node_type:"process", input:"data.csv", output:"clean.csv"
#
# Multiline PUT annotations (for better code style compliance):
# Use backslash (\) at end of line to continue on next line
# #put id:"complex_process", label:"Complex Data Processing", \
# #    input:"file1.csv,file2.csv,file3.csv,file4.csv", \
# #    output:"results.csv"
#
# Multiline example with many files:
# #put id:"data_merger", \
# #    label:"Merge Multiple Data Sources", \
# #    node_type:"process", \
# #    input:"sales.csv,customers.csv,products.csv,inventory.csv", \
# #    output:"merged_dataset.csv"
#
# All PUT formats support multiline syntax:
# # put id:"style1", label:"Standard" \     # Space after #
# #put| id:"style2", label:"Pipe" \        # Pipe separator
# #put: id:"style3", label:"Colon" \       # Colon separator
} # }