Version: V11

Data Extractor Node

The Data Extractor Node extracts specific fields from structured data objects and converts them into flat text or a structured format for downstream processing. It supports dot notation for accessing nested fields and provides formatting control through separators, metadata labels, and record identifiers. Extracting only selected fields reduces data volume and focuses downstream processing on relevant information.

How It Works

When the node executes, it reads objects from the input variable and extracts the specified fields from each object using dot notation paths. For nested data like arrays within objects, the node automatically handles iteration and flattening based on configuration.

Two output modes are available: flat text format that concatenates all extracted values into a single string suitable for LLM prompts, or structured format that preserves the original object hierarchy for programmatic processing. The node applies filtering and formatting rules during extraction, removing empty values, truncating long fields, and adding separators or metadata labels as configured.

Record identifiers like [Document #1] help track which extracted content came from which source object, making it easier to reference specific documents in LLM responses.

Configuration Parameters

Input Field

Input Field (Text, Required): Workflow variable containing structured data.

The node expects objects or arrays of objects with extractable fields. Primitive types like strings or numbers are not supported.

Output Field

Output Field (Text, Required): Workflow variable where extracted data is stored.

The output is either flat text (when Preserve Structure is disabled) or structured data (when Preserve Structure is enabled).

Common naming patterns: extracted_text, extracted_data, formatted_content.

Fields to Extract

Fields to Extract (Array, Required): Field names to extract using dot notation for nested fields.

Each field path specifies the exact location of data within the object structure. Example: Title extracts a top-level field, while Content.ContentFiles.url navigates through nested objects. The node supports complex nested structures and handles arrays at any level of the path.
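To make the path semantics concrete, here is a minimal Python sketch of dot-notation lookup that iterates over arrays found along the path. It is an illustration only, not the node's implementation: the names resolve_path and _step are hypothetical, and flattening one level of nesting per step is an assumption about how arrays are handled.

```python
from typing import Any


def _step(value: Any, key: str) -> Any:
    """Apply one path segment to a value (hypothetical helper)."""
    if isinstance(value, list):
        # Apply the key to every element; merge nested lists one level.
        results = []
        for item in value:
            found = _step(item, key)
            if isinstance(found, list):
                results.extend(found)
            else:
                results.append(found)
        return results
    if isinstance(value, dict):
        return value.get(key)  # a missing key yields None
    return None  # primitives have no fields to extract


def resolve_path(obj: Any, path: str) -> Any:
    """Follow a dot-notation path such as 'Content.ContentFiles.url'."""
    current = obj
    for key in path.split("."):
        current = _step(current, key)
    return current


doc = {
    "Title": "Report",
    "Content": {"ContentFiles": [{"url": "a.pdf"}, {"url": "b.pdf"}]},
}
print(resolve_path(doc, "Title"))
print(resolve_path(doc, "Content.ContentFiles.url"))
```

In this sketch a path that does not exist resolves to None, which mirrors the null result described under Limitations.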

Preserve Structure

Preserve Structure (Toggle, Default: false): Maintain original object structure or flatten to text.

Mode	Output format	Use when
Disabled	Flat text with configurable separators and metadata	Preparing data for LLM prompts, creating readable summaries
Enabled	Structured objects/arrays preserving hierarchy	Passing data to nodes for programmatic processing, maintaining relationships

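When Preserve Structure is disabled, additional formatting options become available: Include Metadata, Field Separator, Object Separator, Max Field Length, and Include Record IDs.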

Keep Parent Structure

Keep Parent Structure (Toggle, Optional): Keep full nested path or use only last key.

Only applicable when Preserve Structure is enabled. When enabled, preserves complete hierarchy like {"content": {"contentDetails": {...}}}. When disabled, uses only the last key like {"contentFiles": [...]}.
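The difference between the two settings can be sketched in Python as follows. This is a simplified illustration under stated assumptions: extract_structured is a hypothetical function, the sample record is invented, and arrays in the middle of a path are not handled here.

```python
from typing import Any


def extract_structured(obj: dict, path: str, keep_parent: bool) -> dict:
    """Return the value at a dot path, wrapped per the Keep Parent Structure choice."""
    keys = path.split(".")
    value: Any = obj
    for key in keys:
        value = value.get(key) if isinstance(value, dict) else None
    if not keep_parent:
        # Only the last key of the path is kept.
        return {keys[-1]: value}
    # Rebuild the full hierarchy from the innermost key outward.
    result: Any = value
    for key in reversed(keys):
        result = {key: result}
    return result


record = {"content": {"contentDetails": {"contentFiles": ["a.pdf"]}}}
path = "content.contentDetails.contentFiles"
print(extract_structured(record, path, keep_parent=True))
print(extract_structured(record, path, keep_parent=False))
```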

Include Metadata

Include Metadata (Toggle, Optional): Include field names as labels in extracted text.

Only applicable when Preserve Structure is disabled. When enabled, output includes labels like Name: John, Age: 30. When disabled, output contains only values like John, 30. Labels help LLMs understand field context but increase token usage.
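As a rough Python illustration of the labeled and unlabeled outputs, the sketch below uses a hypothetical format_record function and a fixed comma separator. It only reproduces the example strings given above and is not the node's actual formatter.

```python
def format_record(record: dict, fields: list[str], include_metadata: bool) -> str:
    """Render selected top-level fields, optionally prefixed with their names."""
    parts = []
    for field in fields:
        value = record.get(field)
        # With metadata, each value carries its field name as a label.
        parts.append(f"{field}: {value}" if include_metadata else str(value))
    return ", ".join(parts)


person = {"Name": "John", "Age": 30}
print(format_record(person, ["Name", "Age"], include_metadata=True))
print(format_record(person, ["Name", "Age"], include_metadata=False))
```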

Field Separator

Field Separator (Dropdown, Default: Comma): Separator between fields within a single object.

Only applicable when Preserve Structure is disabled.

Separator	Output example	Use when
Space	`John 30 Engineer`	Creating compact output
Comma	`John, 30, Engineer`	Standard CSV-style formatting
New Line	Each field on separate line	Vertical layout for readability
Pipe	`John | 30 | Engineer`	Clear visual separation
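The separator options in the table can be modeled as a simple lookup. In this Python sketch the literal strings for each option are assumptions inferred from the output examples above (for instance, Comma as a comma followed by a space), and FIELD_SEPARATORS and join_fields are hypothetical names.

```python
# Assumed literal strings for each dropdown option.
FIELD_SEPARATORS = {
    "Space": " ",
    "Comma": ", ",
    "New Line": "\n",
    "Pipe": " | ",
}


def join_fields(values: list, separator: str = "Comma") -> str:
    """Join the extracted values of one object with the chosen separator."""
    return FIELD_SEPARATORS[separator].join(str(v) for v in values)


values = ["John", 30, "Engineer"]
for option in FIELD_SEPARATORS:
    print(repr(join_fields(values, option)))
```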

Object Separator

Object Separator (Dropdown, Default: New Line): Separator between multiple objects.

Only applicable when Preserve Structure is disabled.

Separator	Output example	Use when
New Line	Each object on separate line	Standard line-by-line output
Double New Line	Blank line between objects	Better readability with spacing
Triple New Line	Two blank lines between objects	Document-style separation
Space	Inline format	Compact output
Comma	CSV-style	List format
Pipe	Clear visual boundaries	Strong separation between objects
Section (---)	Strong visual separation	Document sections
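A comparable Python sketch for joining already-formatted objects is shown below. The exact strings the node emits for each option are not documented here, so the values in OBJECT_SEPARATORS (including newlines around the --- section marker) are assumptions, and join_objects is a hypothetical name.

```python
# Assumed literal strings for each dropdown option.
OBJECT_SEPARATORS = {
    "New Line": "\n",
    "Double New Line": "\n\n",
    "Triple New Line": "\n\n\n",
    "Space": " ",
    "Comma": ", ",
    "Pipe": " | ",
    "Section (---)": "\n---\n",
}


def join_objects(rendered: list[str], separator: str = "New Line") -> str:
    """Join per-object text blocks with the chosen object separator."""
    return OBJECT_SEPARATORS[separator].join(rendered)


rows = ["John, 30, Engineer", "Jane, 28, Designer"]
print(join_objects(rows, "Double New Line"))
print(join_objects(rows, "Section (---)"))
```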

Max Field Length

Max Field Length (Number, Optional): Maximum characters per field.

Only applicable when Preserve Structure is disabled. Fields exceeding this length are truncated with ... appended. Leave empty for no limit. Prevents individual fields from consuming excessive tokens in LLM prompts.
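The truncation rule can be sketched in Python as below. truncate_field is a hypothetical helper, and whether the appended ... counts toward the limit is not stated above; this sketch assumes it does not.

```python
from typing import Optional


def truncate_field(value: str, max_length: Optional[int] = None) -> str:
    """Cut a field to max_length characters and append '...' when it is longer."""
    if max_length is None or len(value) <= max_length:
        return value  # no limit set, or the value already fits
    # The cut ignores word boundaries, so words may be split.
    return value[:max_length] + "..."


print(truncate_field("Senior software engineer", 10))
print(truncate_field("short", 10))
```

Because the cut is purely character-based, a limit of 10 splits the word "software" in the first call.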

Include Record IDs

Include Record IDs (Toggle, Optional): Include identifiers like [Document #1] with each object.

Only applicable when Preserve Structure is disabled. Record IDs help track and reference specific documents in LLM responses.

Record ID Prefix

Record ID Prefix (Text, Optional): Prefix text for record identifiers.

Only applicable when Include Record IDs is enabled. Results in identifiers like [Document #1], [Profile #1], or [Resume #1]. Leave empty to use just the number like [#1].

Record ID Field

Record ID Field (Text, Optional): Field name to use as identifier instead of sequential numbers.

Only applicable when Include Record IDs is enabled. If specified and the field exists, the node uses that value (e.g., CaseId, DocumentId). Falls back to sequential numbering if the field doesn't exist.
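The interaction between Record ID Prefix, Record ID Field, and the sequential fallback might look like the following Python sketch. record_label is a hypothetical function, and the label format used when a field value replaces the sequence number (for example [Case #C-42]) is an assumption, since only the numbered form is shown above.

```python
from typing import Optional


def record_label(record: dict, index: int, prefix: str = "",
                 id_field: Optional[str] = None) -> str:
    """Build a bracketed identifier for one record."""
    identifier = record.get(id_field) if id_field else None
    if identifier is None:
        # Field not configured or not present: use sequential numbering.
        identifier = index
    label = f"#{identifier}"
    return f"[{prefix} {label}]" if prefix else f"[{label}]"


print(record_label({}, 1, prefix="Document"))
print(record_label({}, 1))
print(record_label({"CaseId": "C-42"}, 3, prefix="Case", id_field="CaseId"))
print(record_label({"Title": "No id"}, 3, prefix="Case", id_field="CaseId"))
```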

Filter Empty Values

Filter Empty Values (Toggle, Default: false): Exclude null or empty fields from output.

Skips fields with null values, empty strings, or empty collections. Creates cleaner output and prevents LLMs from processing irrelevant null values.
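A minimal Python sketch of this filter follows. is_empty and filter_empty are hypothetical names; the sketch treats only None, empty strings, and empty collections as empty, so values such as 0 or False are kept, which is an assumption consistent with the list above.

```python
from typing import Any


def is_empty(value: Any) -> bool:
    """True for None, empty strings, and empty collections."""
    if value is None:
        return True
    if isinstance(value, (str, list, tuple, dict, set)):
        return len(value) == 0
    return False  # numbers and booleans are never treated as empty


def filter_empty(fields: dict) -> dict:
    """Drop fields whose values are empty."""
    return {key: value for key, value in fields.items() if not is_empty(value)}


extracted = {"Name": "John", "Email": None, "Tags": [], "Bio": "", "Age": 0}
print(filter_empty(extracted))
```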

Common Parameters

This node supports common parameters shared across workflow nodes, including Stream Output Response, Streaming Messages, Logging Mode, and Wait For All Edges. For detailed information, see Common Parameters.

Best Practices

Extract only needed fields rather than entire objects to reduce token usage and improve LLM performance
Test field paths with a small dataset to ensure correct extraction level in nested arrays
Enable Include Metadata when sending data to LLMs that need field context; disable it when the values are self-explanatory to save tokens
Enable Filter Empty Values to create cleaner output and prevent LLMs from processing null values
Apply Max Field Length limits for fields like descriptions that may vary significantly in length
Enable Include Record IDs with meaningful prefixes so LLMs can cite specific sources in responses

Limitations

Input type restriction: Input must be objects or dictionaries with extractable fields. Primitive types like strings, numbers, or booleans are not supported.
Field path validation: Field paths that don't exist in an object return null values rather than raising an error. Verify that paths match your data structure.
Array flattening behavior: When extracting a single field from nested arrays with Preserve Structure disabled, all values are completely flattened into a single array.
Max field length truncation: Length limits truncate fields mid-content without word boundary awareness, potentially cutting words.
Record ID fallback: When Record ID Field is specified but doesn't exist in objects, the node falls back to sequential numbering.
