Version: V12

Understanding Ingestion of Content from Azure Blob Storage in VIDIZMO

Azure Blob Storage is a scalable cloud object storage service from Microsoft Azure, designed for storing and retrieving unstructured data such as media files, documents, and backups. VIDIZMO supports Azure Blob Storage as a content ingestion source, so you can automatically import files from your Azure containers into the VIDIZMO portal for centralized management, processing, and playback.

The Role of Azure Blob Storage

Azure Blob Storage is a cost-effective repository for a wide range of digital content, including video recordings, audio files, PDFs, and images. Organizations across industries use it to store content generated from business processes such as recorded meetings, training sessions, surveillance footage, legal evidence, and customer interactions.

VIDIZMO ingests content directly from Azure Blob Storage containers, so you can centralize your media management while keeping the scalability, security, and global availability that Azure provides.

Why Use Content Ingestion

Organizations generate large volumes of multimedia content that ends up scattered across cloud storage locations. VIDIZMO's Azure Blob Storage ingestion brings that content into a single managed portal and provides the following benefits:

  • Centralized content repository. Import multimedia content from multiple Azure containers into a single VIDIZMO portal, instead of managing content across separate storage locations.
  • Search and playback. Ingested content becomes searchable and playable through VIDIZMO's search and player features, so users can find, view, and interact with their content.
  • Content processing. VIDIZMO automatically processes ingested content through its pipeline, including transcoding, indexing, transcription, and AI-powered insights.
  • Storage-provider-agnostic target. Content ingested from Azure Blob Storage can be stored in any VIDIZMO-configured storage provider, whether that is Azure Storage, AWS S3, or VIDIZMO local storage. The ingestion source and storage target are independent.
  • Automation. Manual content transfer is time-consuming and error-prone. Content ingestion automates the process, saving time and reducing human error.

How Content Ingestion Works

To start ingesting content, an administrator configures the Azure Blob Storage ingestion app in the VIDIZMO portal with the storage account credentials (Account Name and Account Key). This establishes a secure connection between VIDIZMO and the Azure Blob Storage account.

Once configured and turned on, VIDIZMO runs a background synchronization workflow at a configurable interval. During each sync cycle, the system:

  1. Scans the specified containers or paths for blobs.
  2. Filters out excluded folders and empty files.
  3. Skips files that have already been ingested to prevent duplicates.
  4. Groups related files according to the configured file grouping strategy.
  5. Maps each file type to the appropriate content section using mashup part rules.
  6. Downloads the files from Azure Blob Storage and uploads them to the tenant's configured storage provider.
  7. Starts the content processing pipeline (encoding, indexing, and thumbnail generation).
  8. Applies the configured post-ingestion action on the source blobs (keep, delete, or move).

This workflow repeats at the configured time interval, so new content added to the Azure Blob Storage containers is continuously detected and ingested without manual intervention.
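The selection part of a sync cycle (steps 1 to 3) can be sketched as follows. This is an illustrative in-memory model, not VIDIZMO's implementation: the `Blob` type, the `plan_cycle` function, and the `already_ingested` set are hypothetical names, and the grouping, transfer, processing, and post-ingestion steps are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Blob:
    """Hypothetical stand-in for a blob listed from a container."""
    path: str
    size: int  # size in bytes


def plan_cycle(blobs, excluded_prefixes, already_ingested):
    """Return the blobs that one sync cycle would ingest (steps 1 to 3 only)."""
    selected = []
    for blob in blobs:  # step 1: scan the listed blobs
        if any(blob.path.startswith(prefix) for prefix in excluded_prefixes):
            continue  # step 2: skip excluded folders
        if blob.size == 0:
            continue  # step 2: skip empty files
        if blob.path in already_ingested:
            continue  # step 3: skip files ingested in an earlier cycle
        selected.append(blob)
    return selected


blobs = [
    Blob("meetings/kickoff.mp4", 1024),
    Blob("meetings/empty.mp4", 0),
    Blob("tmp/scratch.mp4", 2048),
    Blob("meetings/review.mp4", 4096),
]
already_ingested = {"meetings/kickoff.mp4"}

to_ingest = plan_cycle(blobs, ["tmp/"], already_ingested)
print([blob.path for blob in to_ingest])  # ['meetings/review.mp4']
```

Only `meetings/review.mp4` survives: the other three blobs are empty, excluded, or already ingested.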

You can customize how content is organized after ingestion, such as maintaining the original folder hierarchy or importing items into a flat structure. You can also set up rules for how files are grouped together and how metadata files are mapped. The following sections explain these concepts.

Ingested Content Setup

Content Organization Preference

When setting up content ingestion, you have two options for organizing the ingested content:

  • Flat. Each item from the Azure Blob Storage container is imported into the VIDIZMO portal without any folders. Everything is placed at the portal's root level.
  • Hierarchical. The original folder structure from the Azure Blob Storage container is maintained. Content is ingested with the same folder hierarchy recreated in the portal.
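The difference between the two options can be illustrated with a small path mapping. This is a sketch only: `portal_location` and the mode strings `"flat"` and `"hierarchical"` are hypothetical names chosen for the example, not VIDIZMO settings.

```python
from pathlib import PurePosixPath


def portal_location(blob_path: str, mode: str) -> str:
    """Map a blob path to the location it would get in the portal."""
    path = PurePosixPath(blob_path)
    if mode == "flat":
        return path.name  # folders are dropped; the item lands at the root
    if mode == "hierarchical":
        return str(path)  # the source folder structure is recreated
    raise ValueError(f"unknown mode: {mode}")


print(portal_location("training/2024/onboarding.mp4", "flat"))
# onboarding.mp4
print(portal_location("training/2024/onboarding.mp4", "hierarchical"))
# training/2024/onboarding.mp4
```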

Include and Exclude Folders

You can specify which containers or folder paths within the Azure Blob Storage account to include or exclude from ingestion. The portal follows these specifications, importing content from the designated include paths and ignoring content from the exclude paths.

This setting is optional. If you don't specify any paths, all containers in the storage account are included in the ingestion process.
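A minimal sketch of how include and exclude paths might be evaluated is shown below. The function name `is_included` is hypothetical, and two behaviors are assumptions that the text above does not state: an exclude path wins over an include path, and paths are matched on whole folder names.

```python
def is_included(blob_path, include_paths, exclude_paths):
    """Decide whether a blob path falls inside the ingestion scope."""

    def under(prefix):
        # Match whole path segments, so "videos" does not match "videos-old".
        prefix = prefix.rstrip("/")
        return blob_path == prefix or blob_path.startswith(prefix + "/")

    # Assumption: exclusion takes precedence over inclusion.
    if any(under(path) for path in exclude_paths):
        return False
    # No include paths means every container is in scope.
    return not include_paths or any(under(path) for path in include_paths)


print(is_included("videos/2024/a.mp4", ["videos"], ["videos/tmp"]))  # True
print(is_included("videos/tmp/a.mp4", ["videos"], ["videos/tmp"]))   # False
print(is_included("docs/a.pdf", [], []))                             # True
```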

Source Content Post Ingestion

After ingestion, you can choose what happens to the source content:

  • Keep content unaltered post ingestion. The original blobs stay unchanged in their source Azure Blob Storage container.
  • Delete. The content is removed from the Azure Blob Storage container after successful ingestion into VIDIZMO.
  • Move content to container post ingestion. The content is moved to a specified destination container. You can configure a destination storage account, container name, and folder path. If the specified container or folder doesn't exist, Azure creates it automatically. This option supports moving content to an archive container, even in a different Azure storage account.
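The three actions can be modeled with plain dictionaries standing in for the source and archive containers. This is an in-memory sketch, not a call into Azure or VIDIZMO: `apply_post_action`, the action strings, and the choice to keep the blob's original path under the archive folder are all assumptions made for illustration.

```python
def apply_post_action(action, source, archive, blob_path, archive_folder=""):
    """Apply a post-ingestion action to one source blob (in-memory model)."""
    if action not in ("keep", "delete", "move"):
        raise ValueError(f"unknown action: {action}")
    if action == "keep":
        return  # the source blob is left untouched
    data = source.pop(blob_path)  # "delete" and "move" both remove the source blob
    if action == "move":
        folder = archive_folder.strip("/")
        target = f"{folder}/{blob_path}" if folder else blob_path
        archive[target] = data  # writing the key models "created automatically"


source = {"a.mp4": b"x", "b.mp4": b"y", "c.mp4": b"z"}
archive = {}

apply_post_action("keep", source, archive, "a.mp4")
apply_post_action("delete", source, archive, "b.mp4")
apply_post_action("move", source, archive, "c.mp4", archive_folder="ingested/2024")

print(sorted(source))   # ['a.mp4']
print(sorted(archive))  # ['ingested/2024/c.mp4']
```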

Publishing Status

You can choose what happens to content after it's ingested into the portal:

  • Publish. Content is automatically published and available in the portal.
  • Drafted. Content is saved in the draft tab, so you can review it before publishing.

Viewing Access

You can customize viewing access for ingested content:

  1. Portal Security/Publish Settings. Viewing access follows the settings configured in the control panel. To learn more, refer to Understanding Portal's Security Policy.
  2. Anonymous Users. Anonymous users can view ingested content. This option isn't available in the DEMS product package but is available in the Enterprise Tube product package.
  3. Portal Users. All portal users can view the ingested content.
  4. Account and Portal Users. All portal and account users can view ingested content.

Time Interval

The time interval defines the number of seconds the system rests between ingestion cycles. After completing one cycle, the system waits for this interval before starting the next. If you have a large amount of content to ingest, use a longer interval. The minimum recommended value is five seconds.
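Because the wait starts only after a cycle finishes, the interval is a rest period between cycles rather than a fixed schedule. The sketch below illustrates that ordering; `run_cycles` and its parameters are hypothetical, and the `sleep` argument is injectable so the example runs instantly.

```python
import time


def run_cycles(run_cycle, interval_seconds, max_cycles, sleep=time.sleep):
    """Run `run_cycle` repeatedly, resting `interval_seconds` after each one."""
    for _ in range(max_cycles):
        run_cycle()
        sleep(interval_seconds)  # the rest begins only once the cycle is done


log = []
run_cycles(
    run_cycle=lambda: log.append("cycle"),
    interval_seconds=5,
    max_cycles=3,
    sleep=lambda seconds: log.append(f"rest {seconds}s"),  # record instead of waiting
)
print(log)
# ['cycle', 'rest 5s', 'cycle', 'rest 5s', 'cycle', 'rest 5s']
```

With this ordering, a cycle that takes a long time delays the next one; cycles never overlap.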

Content File Grouping

You can configure file grouping to organize related files together during ingestion. VIDIZMO provides four file group type options.

  • None

No grouping is applied. The application ingests every defined file type as original content. This is the default setting. To ingest only a specific file type as original content, specify that type's regex pattern in the mapping rules.

  • Substring

Files are grouped based on a common character sequence in the file name. Three fields control substring grouping:

  1. Start Position. The numeric index where substring extraction begins in the file name.
  2. Number of Characters to Include. How many characters to extract from the file name after the start position.
  3. Minimum Group File Count. The minimum number of files required to form a group. A group is created only when the specified minimum count of files sharing the same substring is met. For example, if the minimum count is set to two, the system only groups files when at least two files have identical substrings.

For example, consider two files: Audio_Song.wav and Audio_Song.json.

  • Start Position: 0
  • Number of Characters to Include: 6
  • Minimum Group File Count: 2

The system extracts the first six characters (Audio_) from each file name. Since both files share this substring and there are at least two files, they're grouped together.
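The substring example can be reproduced with a short sketch. The function `group_by_substring` is a hypothetical name, and the start position is assumed to be zero-based, which matches the example above (position 0 yields the first six characters). How files that fail the minimum count are handled is not modeled; they are simply left out of the result.

```python
from collections import defaultdict


def group_by_substring(file_names, start, length, min_count):
    """Group file names by the substring taken at `start` with `length` characters."""
    groups = defaultdict(list)
    for name in file_names:
        groups[name[start:start + length]].append(name)
    # Keep only groups that reach the minimum file count.
    return {key: names for key, names in groups.items() if len(names) >= min_count}


files = ["Audio_Song.wav", "Audio_Song.json", "Video_Clip.mp4"]
groups = group_by_substring(files, start=0, length=6, min_count=2)
print(groups)
# {'Audio_': ['Audio_Song.wav', 'Audio_Song.json']}
```

`Video_Clip.mp4` yields the substring `Video_`, which only one file shares, so no group is formed for it.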

  • Regular Expression

You can group files using a regular expression (regex) pattern. For example, the following pattern groups files whose names start with the prefix Audio_ and have a .wav or .json extension:

Regex: (?<GroupName>^Audio_).*\.(wav|json)

  • (?<GroupName>...) Defines a named capturing group called GroupName.
  • ^Audio_ Specifies that the file name starts with Audio_.
  • .* Matches any characters (except a newline) zero or more times.
  • \.(wav|json) Matches a literal dot followed by wav or json, that is, a .wav or .json extension. The pattern has no trailing $, so append one if the extension must be the very end of the file name.

You can create and test your custom file grouping regex at regex101.com.

The minimum file group count concept still applies when using regular expressions.
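The pattern can also be tried out locally. The sketch below uses Python, whose `re` module spells named groups as `(?P<GroupName>...)` instead of `(?<GroupName>...)`; the rest of the pattern is unchanged. The function `group_by_regex` is hypothetical, and using the text captured by GroupName as the group key is an assumption about how grouping works.

```python
import re
from collections import defaultdict

# Same pattern as above, with Python's (?P<name>...) named-group syntax.
PATTERN = re.compile(r"(?P<GroupName>^Audio_).*\.(wav|json)")


def group_by_regex(file_names, pattern, min_count):
    """Group file names by the text captured in the GroupName group."""
    groups = defaultdict(list)
    for name in file_names:
        match = pattern.search(name)
        if match:  # files that don't match the pattern are not grouped
            groups[match.group("GroupName")].append(name)
    return {key: names for key, names in groups.items() if len(names) >= min_count}


files = ["Audio_Song.wav", "Audio_Song.json", "Audio_Notes.txt", "Video_Clip.wav"]
groups = group_by_regex(files, PATTERN, min_count=2)
print(groups)
# {'Audio_': ['Audio_Song.wav', 'Audio_Song.json']}
```

`Audio_Notes.txt` fails the extension part of the pattern and `Video_Clip.wav` fails the prefix, so neither joins the group.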

  • Last Folder

Files are grouped based on the last folder in their file path. For example, given these file paths:

  • folder1/folder2/Audio_Song.wav
  • folder1/folder2/Audio_Music.mp3
  • folder1/folder3/Audio_Podcast.wav
  • folder1/folder3/Audio_Interview.mp3

Files in folder2 and folder3 are grouped separately since they're the last folders in their respective paths. This strategy organizes files by their immediate parent folder.

The minimum file group count concept still applies when using the last folder option.
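The last-folder strategy can be sketched with the same four paths. `group_by_last_folder` is a hypothetical name, and keying groups on the folder name alone (rather than the full parent path) is an assumption; under it, two folders with the same name in different locations would share a group.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_last_folder(paths, min_count):
    """Group files by the name of their immediate parent folder."""
    groups = defaultdict(list)
    for raw_path in paths:
        path = PurePosixPath(raw_path)
        groups[path.parent.name].append(path.name)
    return {key: names for key, names in groups.items() if len(names) >= min_count}


paths = [
    "folder1/folder2/Audio_Song.wav",
    "folder1/folder2/Audio_Music.mp3",
    "folder1/folder3/Audio_Podcast.wav",
    "folder1/folder3/Audio_Interview.mp3",
]
groups = group_by_last_folder(paths, min_count=2)
print(groups)
# {'folder2': ['Audio_Song.wav', 'Audio_Music.mp3'], 'folder3': ['Audio_Podcast.wav', 'Audio_Interview.mp3']}
```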

Content File Type Mapping

File type mapping defines rules for how files are assigned to content sections after ingestion.

Media File Sections

Each content item in VIDIZMO has sections for different types of associated files. You can define rules to determine which file type goes into which section. Multiple rules can be specified, each with its own criteria and associated content section. If a file meets any of the provided criteria, it's placed into the corresponding section.

You can store files in various formats, including .vtt and .json. No fixed association exists between a file format and a specific section. You choose which section each format goes into. However, use caution when defining rules to avoid placing files in sections not intended for those formats, as it may cause issues.

The available media file sections are:

  • Audio PCM. Digitally encoded audio data using PCM.
  • Closed Caption. Closed captions associated with video content.
  • Content. Primary content files.
  • Supporting Files. Files that support the main content, such as metadata, additional documentation, or related files.
  • Thumbnails. Thumbnail images associated with the content.
  • Original Content. The original content file.

Note: At least one rule for the Original Content media file section is mandatory. If no rule matches a file, the file is placed in the Supporting Files section by default.

Regex for File Type

Define a regex pattern for each media file section rule to identify which file types belong to that section. Use .* to match all files, or a specific pattern for particular files. For example, .*\.wav matches WAV files.

Example: the regex pattern .*\.mp4 identifies all .mp4 files and stores them in the designated media file section.
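A rule set of this kind can be sketched as an ordered list of pattern and section pairs. The rules in `RULES` and the function `section_for` are hypothetical examples, and three behaviors are assumptions: the first matching rule wins, a pattern must match the whole file name, and matching is case-sensitive. Files that match no rule fall back to the Supporting Files section.

```python
import re

# Hypothetical rule set: (regex for file type, media file section).
RULES = [
    (r".*\.mp4", "Original Content"),
    (r".*\.vtt", "Closed Caption"),
    (r".*\.(jpg|png)", "Thumbnails"),
]


def section_for(file_name, rules, default="Supporting Files"):
    """Return the section of the first rule whose pattern matches the file name."""
    for pattern, section in rules:
        if re.fullmatch(pattern, file_name):
            return section
    return default  # no rule matched


print(section_for("intro.mp4", RULES))   # Original Content
print(section_for("intro.vtt", RULES))   # Closed Caption
print(section_for("notes.json", RULES))  # Supporting Files
```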

Use Cases

Organizations that already use Azure as their primary cloud platform benefit from a native integration that keeps content within the Azure ecosystem during ingestion. This reduces the need for cross-cloud data transfers when both the ingestion source and the VIDIZMO storage provider are configured with Azure, lowering latency and data transfer costs.

Legal firms, law enforcement agencies, and courts can use Azure Blob Storage ingestion to create an organized repository for managing legal documents, evidence, and case files. This simplifies access to critical content while maintaining security and regulatory compliance standards.

See Also