Version: V11

OpenAI Embedding Node

The OpenAI Embedding node converts text content into numerical vector representations using OpenAI's cloud-based embedding models. It supports both standard OpenAI and Azure OpenAI deployments, with configurable embedding dimensions for storage and performance optimization. Automatic retry logic and batch processing provide reliability and efficiency for large-scale operations.

How It Works

When the node executes, it receives text input from a workflow variable, sends the text to OpenAI's API with authentication credentials, and returns embedding vectors as arrays of floating-point numbers. Each text input produces one embedding vector, with dimensionality determined by the model (e.g., text-embedding-3-small produces 1536-dimensional vectors by default). The node constructs API requests with the specified model, sends batched requests to OpenAI's servers, and stores the resulting vectors in the output variable.

The node supports both standard OpenAI and Azure OpenAI deployments, allowing organizations to choose between OpenAI's public cloud service or Azure-hosted instances for compliance and data residency requirements. Authentication is handled through API keys for standard OpenAI or Azure-specific credentials for Azure OpenAI. The node implements automatic retry logic for failed requests and batch processing to group multiple texts into single API calls.

Output embeddings maintain correlation with input items through unique identifiers, with each embedding traced back to its source text via UUID. The node supports dimension reduction for text-embedding-3 models, allowing fewer dimensions than the model's default to reduce storage requirements while maintaining quality. Failed embedding generation for individual items does not stop processing of other items.

Configuration Parameters

Input Field

Input Field (Text, Required): Workflow variable containing text to embed.

The node expects a list of embedding request objects where each object contains a type field (set to "text"), an optional id field (string for tracking), and a text field (string content to embed). Single objects are automatically converted to single-item lists.

Example input structure:

[
  {"type": "text", "id": "doc1", "text": "First document content"},
  {"type": "text", "id": "doc2", "text": "Second document content"}
]
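The single-object-to-list conversion and ID handling described above can be sketched in Python. The function name and shape are illustrative, not the node's actual implementation:

```python
import uuid

def normalize_input(value):
    """Wrap a single request object into a one-item list and validate the
    required "type" and "text" fields. Illustrative sketch only."""
    items = value if isinstance(value, list) else [value]
    normalized = []
    for item in items:
        if item.get("type") != "text":
            raise ValueError("each embedding request must have type 'text'")
        normalized.append({
            "type": "text",
            # a missing id gets a generated UUID, matching the output contract
            "id": item.get("id") or str(uuid.uuid4()),
            "text": item["text"],
        })
    return normalized

single = {"type": "text", "text": "First document content"}
print(normalize_input(single))
```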

Output Field

Output Field (Text, Required): Workflow variable where embedding results are stored.

The output is a list of EmbeddingResponse objects where each object contains a uuid field (string identifier matching input ID or generated UUID) and an embeddings field (array of floating-point numbers). The list maintains the same order as the input. Empty embeddings are returned for failed generation attempts.

Example output structure:

[
  {"uuid": "doc1", "embeddings": [0.123, -0.456, 0.789, ...]},
  {"uuid": "doc2", "embeddings": [0.234, -0.567, 0.890, ...]}
]
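Because failed items are returned with empty embeddings rather than halting the run, downstream steps typically index the results by uuid and check for empty arrays. A minimal sketch (the helper name is hypothetical):

```python
def index_by_uuid(responses):
    """Build a lookup from uuid to embedding vector so each result can be
    traced back to its source text. Hypothetical helper, not part of the node."""
    return {r["uuid"]: r["embeddings"] for r in responses}

responses = [
    {"uuid": "doc1", "embeddings": [0.123, -0.456, 0.789]},
    {"uuid": "doc2", "embeddings": []},  # empty array marks a failed item
]
vectors = index_by_uuid(responses)
failed = [u for u, v in vectors.items() if not v]
print(failed)  # → ['doc2']
```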

Common naming patterns: text_embeddings, document_vectors, openai_embeddings, query_vectors.

Model

Model (Text, Required): OpenAI model identifier for embedding generation.

Options include text-embedding-3-small (1536 dimensions, cost-effective), text-embedding-3-large (3072 dimensions, highest quality), and text-embedding-ada-002 (1536 dimensions, legacy). The text-embedding-3 models support dimension reduction through the Dimensions parameter. Variable interpolation using ${variable_name} syntax is supported.
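The default dimensionalities above matter when sizing vector storage. A small lookup, using the documented defaults and assuming float32 storage (4 bytes per component):

```python
# Default output dimensionality per model, as documented above.
DEFAULT_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,  # legacy; no dimension reduction
}

def storage_bytes(model, n_vectors, dims=None, bytes_per_float=4):
    """Rough storage estimate for n_vectors float32 embeddings."""
    d = dims or DEFAULT_DIMENSIONS[model]
    return n_vectors * d * bytes_per_float

# One full-size text-embedding-3-large vector is 3072 * 4 = 12288 bytes.
print(storage_bytes("text-embedding-3-large", 1))
```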

OpenAI API key

OpenAI API Key (Text, Required): API key for authentication with OpenAI.

Obtain keys from https://platform.openai.com/api-keys. Variable interpolation with ${variable_name} syntax enables secure credential management. API keys typically start with the sk- prefix.
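The platform's own interpolation engine resolves ${variable_name} references; as it happens, Python's string.Template uses the same ${...} syntax, so it can illustrate the behavior. The variable name and key value below are placeholders:

```python
from string import Template

# Illustration only: substitute a workflow variable into a field value,
# so the secret never appears in the node configuration itself.
workflow_vars = {"openai_key": "sk-placeholder"}
field_value = "${openai_key}"
resolved = Template(field_value).substitute(workflow_vars)
print(resolved.startswith("sk-"))  # → True
```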

Dimensions

Dimensions (Number, Optional): Number of dimensions for output embeddings with text-embedding-3 models.

Allows dimension reduction from the model's default (1536 for text-embedding-3-small, 3072 for text-embedding-3-large) to reduce storage requirements while maintaining quality. For example, reducing text-embedding-3-large from 3072 to 1024 dimensions significantly reduces storage costs. Minimum value is 1. Only supported in text-embedding-3 and later models.
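Conceptually, the reduction for text-embedding-3 models amounts to keeping the leading components and renormalizing to unit length; passing the Dimensions parameter is preferable because the reduction then happens server-side, before transfer. A sketch of the equivalent client-side operation, for intuition:

```python
import math

def shorten_embedding(vec, dims):
    """Truncate an embedding to its first `dims` components and
    L2-normalize the result. Illustrative sketch, assuming the
    truncate-then-renormalize scheme used by text-embedding-3 models."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]

v = shorten_embedding([0.6, 0.8, 0.1, 0.2], 2)
print(v)  # a unit-length 2-dimensional vector
```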

OpenAI API Base URL

OpenAI API Base URL (Text, Optional): Custom API endpoint for OpenAI services.

Leave empty for the default OpenAI endpoint (https://api.openai.com/v1). For Azure OpenAI deployments, use the format https://{resource-name}.openai.azure.com. Variable interpolation is supported.
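The two API types route requests differently: Azure OpenAI addresses a named deployment and requires an api-version query parameter. A sketch of how the endpoint URLs differ (the node builds these internally; the resource and deployment names below are hypothetical):

```python
def embeddings_url(api_type="openai", base_url=None,
                   deployment=None, api_version=None):
    """Illustrative sketch of endpoint construction for the two API types."""
    if api_type == "azure":
        # Azure routes to a named deployment and requires api-version.
        return (f"{base_url}/openai/deployments/{deployment}"
                f"/embeddings?api-version={api_version}")
    return f"{base_url or 'https://api.openai.com/v1'}/embeddings"

print(embeddings_url())
# hypothetical resource "myresource" and deployment "embed-prod":
print(embeddings_url("azure", "https://myresource.openai.azure.com",
                     "embed-prod", "2023-05-15"))
```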

OpenAI API Type

OpenAI API Type (Text, Optional): API type specification.

Values: openai for standard OpenAI or azure for Azure OpenAI. Determines how authentication and endpoint URLs are constructed. Leave empty for standard OpenAI deployments.

OpenAI API Version

OpenAI API Version (Text, Optional): API version for Azure OpenAI deployments.

Example: 2023-05-15. Required when using Azure OpenAI. Leave empty for standard OpenAI deployments.

Deployment Name

Deployment Name (Text, Optional): Deployment name for Azure OpenAI.

Required when using Azure OpenAI. This is the name assigned to the embedding model deployment in the Azure portal. Variable interpolation is supported.

OpenAI Organization ID

OpenAI Organization ID (Text, Optional): Organization identifier for usage tracking.

Helps track API usage across organizations when an account belongs to multiple OpenAI organizations. Organization IDs typically start with the org- prefix.

Embedding Context Length

Embedding Context Length (Number, Optional): Maximum tokens the model can process.

Default is 8191 for most OpenAI embedding models. Texts exceeding this length are truncated or rejected depending on the Check Embedding Context Length setting.

Chunk Size

Chunk Size (Number, Optional): Number of texts per API request.

Higher values improve throughput but increase memory usage. Lower values reduce memory usage and allow finer-grained error handling. Minimum value is 1.
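Batching by chunk size is a simple grouping operation; a sketch of how texts are split into per-request batches (illustrative, not the node's code):

```python
def chunked(texts, chunk_size):
    """Group texts into batches of at most chunk_size per API request."""
    for i in range(0, len(texts), chunk_size):
        yield texts[i:i + chunk_size]

batches = list(chunked(["a", "b", "c", "d", "e"], 2))
print(batches)  # → [['a', 'b'], ['c', 'd'], ['e']]
```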

Max Retries

Max Retries (Number, Optional): Maximum retry attempts for failed API requests.

The node automatically retries with exponential backoff before giving up. Higher values improve reliability for transient issues. Minimum value is 0 (no retries).
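Exponential backoff doubles the delay between attempts. A minimal sketch of the retry pattern, with a simulated transient failure (the helper is hypothetical and uses short delays for illustration):

```python
import time

def with_retries(call, max_retries=3, base_delay=0.01):
    """Retry with exponential backoff: the delay before retry n is
    base_delay * 2**n. Hypothetical helper mirroring the documented behavior."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception:
            if attempt == max_retries:
                raise
            time.sleep(base_delay * (2 ** attempt))

attempts = {"n": 0}
def flaky():
    # Simulated transient error that succeeds on the third attempt.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient error")
    return "ok"

print(with_retries(flaky))  # → ok
```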

Request Timeout

Request Timeout (Number, Optional): Maximum seconds to wait for API response.

Prevents workflows from hanging on slow or unresponsive API calls. Applies to individual API requests, not entire batch operations. Minimum value is 1 second.

Show Progress Bar

Show Progress Bar (Toggle, Optional): Display progress indicator during batch operations.

Shows progress information including texts processed and estimated time remaining in execution logs.

Skip Empty Strings

Skip Empty Strings (Toggle, Optional): Automatically skip empty text strings.

When enabled, empty strings are filtered out before processing, avoiding API errors and unnecessary API calls. Skipped items still receive an empty embeddings array in the output, so index alignment with the input is preserved.
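Keeping index alignment while skipping empty strings means partitioning the input rather than dropping items. A sketch of that bookkeeping (illustrative, not the node's implementation):

```python
def partition_empty(items):
    """Split requests into those sent to the API and those short-circuited
    with an empty embedding, keyed by original position so the output list
    stays aligned with the input."""
    to_send, placeholders = [], {}
    for idx, item in enumerate(items):
        if item["text"].strip():
            to_send.append((idx, item))
        else:
            placeholders[idx] = {"uuid": item["id"], "embeddings": []}
    return to_send, placeholders

items = [
    {"id": "a", "text": "hello"},
    {"id": "b", "text": ""},
]
to_send, placeholders = partition_empty(items)
print(len(to_send), placeholders)  # → 1 {1: {'uuid': 'b', 'embeddings': []}}
```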

Check Embedding Context Length

Check Embedding Context Length (Toggle, Optional): Validate text length before sending to API.

When enabled, checks if text exceeds the Embedding Context Length parameter and prevents sending oversized texts. Helps catch errors early with clear error messages.
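Real implementations count tokens with the model's tokenizer (e.g. tiktoken); for intuition, a pre-flight check can be sketched with the common rule of thumb that one token is roughly four characters of English text. The function and threshold handling are illustrative:

```python
def check_context_length(text, max_tokens=8191):
    """Pre-flight length check using an approximate token count
    (~4 characters per token). A production check would use the
    model's actual tokenizer instead of this heuristic."""
    approx_tokens = len(text) / 4
    if approx_tokens > max_tokens:
        raise ValueError(
            f"text of ~{int(approx_tokens)} tokens exceeds the "
            f"{max_tokens}-token context length")
    return True

print(check_context_length("short text"))  # → True
```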

Common Parameters

This node supports common parameters shared across workflow nodes, including Stream Output Response, Streaming Messages, and Logging Mode. For detailed information, see Common Parameters.

Best Practices

  • Store API keys in workflow variables rather than hardcoding, enabling secure credential management
  • Choose text-embedding-3-small for cost-effective embeddings, text-embedding-3-large for highest quality
  • Use the Dimensions parameter with text-embedding-3 models to reduce storage costs; settings of 512-1024 dimensions often preserve most of the quality
  • Configure Chunk Size based on text length: larger chunks (100-500) for short texts, smaller chunks (10-50) for long texts
  • Enable Max Retries (3-5) for production workflows to handle transient network issues
  • Enable Check Embedding Context Length to catch oversized texts early
  • Monitor OpenAI API usage and costs through the OpenAI dashboard

Limitations

  • API rate limits: Subject to OpenAI's API rate limits based on account tier. High-volume embedding generation may be throttled.
  • API costs: Each embedding request consumes API quota and incurs costs based on token count. Monitor usage to avoid unexpected charges.
  • Text-only support: The node only supports text embeddings. Image embedding requests fail even though the node accepts multimodal input format.
  • Network dependency: Requires internet connectivity to reach OpenAI's API servers. Network issues or API outages cause workflow failures.
  • Token limits: Texts exceeding the model's context length (typically 8191 tokens) must be truncated or split before embedding.
  • Dimension reduction limitations: The Dimensions parameter only works with text-embedding-3 models. Legacy models do not support dimension reduction.