HuggingFace Embedding Node
The HuggingFace Embedding node converts text content into numerical vector representations using Sentence Transformers models that run entirely on your own infrastructure. No API keys are required, data never leaves your servers, and there are no per-request costs. Multi-GPU processing supports high-throughput batch operations for large-scale embedding generation.
How It Works
When the node executes, it receives text input from a workflow variable, validates the text content, loads the specified model from the local cache (downloading it from HuggingFace Hub if it is not cached), encodes the text through the model, and stores the resulting embedding vectors, as arrays of floating-point numbers, in the output variable. Each text input produces one embedding vector, with dimensionality determined by the selected model (e.g., all-mpnet-base-v2 produces 768-dimensional vectors, all-MiniLM-L6-v2 produces 384-dimensional vectors).
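The node's behavior corresponds closely to the Sentence Transformers library that underlies it. A minimal standalone sketch of the load-and-encode flow (illustrative only, not the node's actual implementation):

from sentence_transformers import SentenceTransformer

# Load the model from the local cache, downloading it on first use.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

texts = ["First document content", "Second document content"]
vectors = model.encode(texts)  # one row of floats per input text

print(vectors.shape)  # (2, 384): all-MiniLM-L6-v2 produces 384-dimensional vectors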
Models are downloaded once from HuggingFace Hub and cached locally for subsequent use, with configurable cache directories to manage storage locations. The first execution with a new model triggers a download which may take several minutes depending on model size and network speed, but subsequent executions use the cached model for immediate processing. The node supports multi-GPU processing for high-throughput scenarios, distributing the workload across available GPUs to accelerate processing.
Each output embedding is correlated with its input item through a unique identifier (the input's id when provided, otherwise a generated UUID), so every vector can be traced back to its source text. A failed embedding generation for an individual item does not stop processing of the other items. Model selection determines both embedding quality and dimensionality; larger models generally produce higher-quality embeddings at the cost of increased memory usage and slower processing.
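A sketch of the correlation and failure-handling behavior described above (the function name and structure are hypothetical; the node's internals are not shown in this documentation):

import uuid

def embed_requests(model, requests):
    responses = []
    for req in requests:
        item_id = req.get("id") or str(uuid.uuid4())  # keep the caller's id, else generate a UUID
        try:
            vector = model.encode(req["text"]).tolist()
        except Exception:
            vector = []  # a failed item yields empty embeddings; processing continues
        responses.append({"uuid": item_id, "embeddings": vector})
    return responses  # same order as the input list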
Configuration Parameters
Input Field
Input Field (Text, Required): Workflow variable containing text to embed.
The node expects a list of embedding request objects where each object contains a type field (set to "text"), an optional id field (string for tracking), and a text field (string content to embed). Single objects are automatically converted to single-item lists.
Example input structure:
[
{"type": "text", "id": "doc1", "text": "First document content"},
{"type": "text", "id": "doc2", "text": "Second document content"}
]
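A sketch of the single-object normalization mentioned above (an illustrative helper, not the node's actual code):

def normalize_input(value):
    # Wrap a lone request object into a single-item list.
    return value if isinstance(value, list) else [value]

normalize_input({"type": "text", "id": "doc1", "text": "First document content"})
# -> [{"type": "text", "id": "doc1", "text": "First document content"}]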
Output Field
Output Field (Text, Required): Workflow variable where embedding results are stored.
The output is a list of EmbeddingResponse objects where each object contains a uuid field (string identifier matching input ID or generated UUID) and an embeddings field (array of floating-point numbers). The list maintains the same order as the input. Empty embeddings are returned for failed generation attempts.
Example output structure:
[
{"uuid": "doc1", "embeddings": [0.123, -0.456, 0.789, ...]},
{"uuid": "doc2", "embeddings": [0.234, -0.567, 0.890, ...]}
]
Common naming patterns: text_embeddings, document_vectors, embedding_results, local_embeddings.
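A typical downstream step is comparing output vectors, for example with cosine similarity in a search system. A minimal sketch using the truncated example vectors above (real vectors have the model's full dimensionality):

import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

results = [
    {"uuid": "doc1", "embeddings": [0.123, -0.456, 0.789]},
    {"uuid": "doc2", "embeddings": [0.234, -0.567, 0.890]},
]
print(cosine_similarity(results[0]["embeddings"], results[1]["embeddings"]))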
Model
Model (Text, Required): HuggingFace model identifier or local path for embedding generation.
Popular models include sentence-transformers/all-mpnet-base-v2 (768 dimensions, high quality), sentence-transformers/all-MiniLM-L6-v2 (384 dimensions, fast and efficient), and BAAI/bge-large-en-v1.5 (1024 dimensions, state-of-the-art quality). Models are automatically downloaded from HuggingFace Hub on first use and cached locally. Variable interpolation using ${variable_name} syntax is supported.
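To confirm a model's dimensionality before wiring it into a workflow, you can query it directly with the Sentence Transformers library (this triggers the download on first run):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
print(model.get_sentence_embedding_dimension())  # 768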
Cache Folder
Cache Folder (Text, Optional): Directory path where downloaded models are cached.
Leave empty to use the default cache location determined by the SENTENCE_TRANSFORMERS_HOME environment variable or ~/.cache/torch/sentence_transformers. This parameter controls where models are stored, useful for managing disk space or containerized environments with specific volume mounts. The directory must be writable with sufficient space (typically 100MB to several GB per model).
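Outside the node, the same two mechanisms are available in the Sentence Transformers library; the cache path below is illustrative:

from sentence_transformers import SentenceTransformer

# Option 1: export SENTENCE_TRANSFORMERS_HOME=/data/model-cache in the
# environment before the process starts.
# Option 2: pass cache_folder explicitly, the equivalent of this node's
# Cache Folder parameter.
model = SentenceTransformer(
    "sentence-transformers/all-MiniLM-L6-v2",
    cache_folder="/data/model-cache",
)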
Multi-Process
Multi-Process (Toggle, Optional): Enable multi-GPU processing for faster encoding.
When enabled, the node automatically detects available GPUs and distributes the workload across them, reducing processing time for high-volume embedding tasks. This requires multiple GPUs; enabling it on single-GPU or CPU-only systems has no effect. Because multi-GPU coordination adds overhead, it is only beneficial for large batches (1000+ texts).
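This toggle most likely maps onto the Sentence Transformers multi-process pool API (an assumption; the node's internals are not documented here). A sketch of that API:

from sentence_transformers import SentenceTransformer

if __name__ == "__main__":  # required for multiprocessing on spawn-based platforms
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    texts = [f"document {i}" for i in range(5000)]  # large enough to justify the overhead

    pool = model.start_multi_process_pool()  # one worker per visible GPU by default
    try:
        embeddings = model.encode_multi_process(texts, pool)
    finally:
        model.stop_multi_process_pool(pool)

    print(embeddings.shape)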
Show Progress
Show Progress (Toggle, Optional): Display progress bar during batch operations.
When enabled, shows progress information including texts processed and estimated time remaining in execution logs. Helps identify whether the node is actively processing or stalled. Disable for production workflows where log verbosity should be minimized.
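In the underlying library this corresponds to the show_progress_bar flag on encode (assuming the node passes it through):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
texts = [f"document {i}" for i in range(2000)]
embeddings = model.encode(texts, show_progress_bar=True)  # prints a per-batch progress bar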
Common Parameters
This node supports common parameters shared across workflow nodes, including Stream Output Response, Streaming Messages, and Logging Mode. For detailed information, see Common Parameters.
Best Practices
- Choose models based on quality and performance requirements: all-mpnet-base-v2 for excellent quality, all-MiniLM-L6-v2 for faster processing, bge-large-en-v1.5 for state-of-the-art accuracy
- Pre-download models to the cache folder before production use to avoid download delays during first execution (a warm-up sketch follows this list)
- Use the same model for both document and query embeddings in search systems to ensure vector compatibility
- Enable Multi-Process only when processing large batches (1000+ texts) on multi-GPU systems
- Configure Cache Folder to use fast storage (SSD) when possible, as model loading impacts first-execution performance
- Monitor disk space in the cache directory; remove unused models periodically to free space
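A sketch of a one-off warm-up script for the pre-download recommendation above (the cache path is illustrative; point it at the node's configured Cache Folder):

from sentence_transformers import SentenceTransformer

for name in (
    "sentence-transformers/all-mpnet-base-v2",
    "sentence-transformers/all-MiniLM-L6-v2",
):
    # Instantiating the model forces the download into the cache directory,
    # so first production runs start from a warm cache.
    SentenceTransformer(name, cache_folder="/data/model-cache")
    print(f"cached {name}")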
Limitations
- Local execution only: Models run on the server hosting the workflow engine. Ensure sufficient CPU/GPU resources and memory, as large models can require several GB of RAM.
- First-run download delay: The first execution with a new model triggers a download from HuggingFace Hub which may take several minutes. Subsequent executions use the cached model.
- Text-only support: The node only supports text embeddings. Image embedding requests fail even though the node accepts the multimodal input format.
- Model compatibility: Only Sentence Transformers models are supported. Standard HuggingFace transformer models without the Sentence Transformers wrapper do not work.
- GPU memory requirements: Large models and large batches can exceed GPU memory limits, causing out-of-memory errors. Monitor GPU memory and reduce batch sizes if needed (see the sketch after this list).
- No automatic model updates: Cached models are not automatically updated when new versions are released. Manually clear the cache to download updated models.
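For the out-of-memory point above, the relevant knob in the underlying library is the batch_size argument to encode; a sketch (16 is an illustrative starting point, not a recommendation):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5", device="cuda")
texts = [f"document {i}" for i in range(10000)]

# A smaller batch_size trades throughput for a lower peak GPU memory footprint.
embeddings = model.encode(texts, batch_size=16)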