Ottomi Multimodal AI Data Platform (Ottomi-Nexus 3.0) — Capability Overview

Ottomi Nexus is an end-to-end data processing platform built around the DataOps methodology. It is delivered as an all-in-one software package and deployed in containers. Designed for government and enterprise scenarios, it supports data governance, data development, data asset operations, multimodal AI data management, and trusted data space construction.

The platform provides integrated capabilities across the full data lifecycle, including data ingestion, standardization and governance, data quality management, data development and modeling, task scheduling, data asset services, and AI-powered applications.

The functional modules are organized into the following major areas:


1. Management Center

Submodule | Core Capabilities
Account Management | User/member management, organizational unit management, role management
Permission System | RBAC + ABAC, with 6-level access granularity: System → Project → Data Source → Table → Row → Column
Row-Level Permission | Fine-grained row-level data access control
Log Management | Full-operation audit trail and tamper-resistant logs
AI Assistant Configuration | Large language model integration configuration and API key management
System Configuration | Notification channel configuration, supporting in-platform messages, email, WeCom / Enterprise WeChat, etc.
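
The row- and column-level end of this permission model could be sketched as follows. This is a minimal illustration, not the platform's implementation: the `TableGrant` class, the `read_table` function, and the sample roles and attributes are all hypothetical. A role-based grant (RBAC) selects the readable columns, and an attribute-based predicate (ABAC) decides which rows are visible.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TableGrant:
    """Hypothetical grant: which role may read which columns and rows of a table."""
    role: str
    table: str
    columns: frozenset
    row_predicate: Callable[[dict, dict], bool]  # (user attributes, row) -> allowed?


def read_table(user: dict, table: str, rows: list, grants: list) -> list:
    """Return only the rows and columns that the user's roles are granted."""
    matching = [g for g in grants if g.table == table and g.role in user["roles"]]
    visible = []
    for row in rows:
        allowed_cols = set()
        for grant in matching:
            if grant.row_predicate(user, row):  # row-level check
                allowed_cols |= grant.columns   # column-level check
        if allowed_cols:
            visible.append({c: v for c, v in row.items() if c in allowed_cols})
    return visible


grants = [
    TableGrant(
        role="analyst",
        table="orders",
        columns=frozenset({"order_id", "region", "amount"}),
        row_predicate=lambda user, row: row["region"] == user["region"],
    ),
]
rows = [
    {"order_id": 1, "region": "east", "amount": 120, "customer_phone": "13800000000"},
    {"order_id": 2, "region": "west", "amount": 75, "customer_phone": "13900000000"},
]
alice = {"roles": {"analyst"}, "region": "east"}
visible_rows = read_table(alice, "orders", rows, grants)
print(visible_rows)  # only the "east" row, without customer_phone
```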

2. Business Planning

Submodule | Core Capabilities
Data Layering Design | ODS raw data layer → DWD standardized detail layer → ADS application metric layer
Business Domains and Subject Areas | Business domain creation and subject area definition to support enterprise-level data architecture planning
Project Workspace Management | Project creation, compute source management, member management
Dual-Sandbox Architecture | Strong isolation between development sandbox and production sandbox, or an integrated single-sandbox mode
Three Deployment Modes | Standard dual-sandbox deployment for large enterprise groups, flexible hybrid deployment for mid-sized organizations, and lightweight all-in-one deployment for smaller scenarios
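
To make the ODS → DWD → ADS layering concrete, here is a small sketch under assumed data. The order records, field names, and the `to_dwd` / `to_ads` functions are hypothetical; in the platform these steps would be built as data development tasks rather than hand-written code.

```python
# ODS: raw records as they arrive from the source system (strings, inconsistent casing).
ods_orders = [
    {"order_id": "1", "amount": " 100.5 ", "region": "East"},
    {"order_id": "2", "amount": "abc", "region": "west"},   # unparseable amount
    {"order_id": "3", "amount": "20", "region": "EAST"},
]


def to_dwd(ods_rows):
    """DWD: standardized detail rows with typed fields and normalized values."""
    dwd_rows = []
    for row in ods_rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real pipeline would route this row to a quality issue list
        dwd_rows.append({
            "order_id": int(row["order_id"]),
            "amount": amount,
            "region": row["region"].strip().lower(),
        })
    return dwd_rows


def to_ads(dwd_rows):
    """ADS: application-facing metric, here total order amount per region."""
    totals = {}
    for row in dwd_rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


dwd_orders = to_dwd(ods_orders)
ads_amount_by_region = to_ads(dwd_orders)
print(ads_amount_by_region)
```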

3. Data Ingestion Engine

The Data Ingestion Engine is designed for enterprise scenarios involving multi-source, heterogeneous data collection. It supports multiple ingestion methods, including databases, CDC, APIs, and files. It enables unified management and consolidation of structured, semi-structured, and multimodal data resources.

Submodule | Core Capabilities
Source Database Management | Register heterogeneous data sources. Supports 40+ data sources including MySQL, PostgreSQL, Oracle, DB2, SQL Server, Dameng, Kingbase, OceanBase, TiDB, ArgoDB, Greenplum, ClickHouse, Doris, StarRocks, GBase, Hive, etc.
Database Type Management | Extensible through JDBC drivers, supporting custom enterprise data source integration
Table Extraction | Supports table-level data extraction, including full synchronization, incremental synchronization, and differential update synchronization
CDC Synchronization | Supports millisecond-level change data capture for MySQL CDC, Oracle CDC, PostgreSQL CDC, SQL Server CDC, MongoDB CDC, etc.
API-Based Collection | Automatically generates API collection tasks through UI configuration and retrieves data from source APIs. Supports HTTP methods such as GET, POST, PATCH, etc.
API Parameter Configuration | Supports URL parameters, body parameters, and request headers. Supports parameter transformation, including no transformation and Java script-based transformation
Body Parameter Formats | Supports multiple body formats including form-data, application/json, text/plain, etc.
Parameter Sources | API collection parameters can come from custom configuration or database table configuration, enabling dynamic parameterized collection
File Collection | Supports importing data files in various formats including CSV, TXT, XLSX, XLS, etc.; planned support for JSON, XML, and ORC file uploads
Sample Rules & Sample Engine | 5 sample generation strategies: binding sample rules, expression-based calculation, external table value-domain generation, base type generation, and generation based on original table data. Provides a three-layer rule system: basic rules, business rules, and special rules. Supports privacy-preserving computation transformation, allowing sample data to participate in computation
Resource Explorer | Data source browsing, table schema viewing, DDL copying, and data querying
Metadata Management | Automatic cataloging and asset publishing/unpublishing
AI-Powered Auto Cataloging | AI-assisted automatic cataloging and aggregation of source-side data assets

3.1 Data Collection Capability Details

3.1.1 Table Extraction

Table extraction is designed for synchronizing data from traditional business system databases. It supports synchronizing table data from source databases to designated target systems within the platform. Typical use cases include historical data initialization, periodic data collection, and business system data consolidation.

Supported methods include:

  • Full Synchronization: Extracts all data from the source table at once. Suitable for initial loading and historical data migration.
  • Incremental Synchronization: Synchronizes only newly added or changed data based on incremental identifiers such as time fields, primary keys, or version numbers.
  • Differential Update Synchronization: Compares source and target data, identifies differences, and performs insert or update operations accordingly.
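
The three methods above can be sketched in a few lines. This is an illustrative model only, using in-memory dictionaries in place of real source and target tables; the function names and the `updated_at` watermark column are assumptions, not the platform's API.

```python
def full_sync(source_rows):
    """Full synchronization: copy every source row, keyed by primary key."""
    return {row["id"]: dict(row) for row in source_rows}


def incremental_sync(source_rows, target, watermark):
    """Incremental synchronization: copy rows changed after the watermark.

    Returns the new watermark to store for the next run.
    """
    new_watermark = watermark
    for row in source_rows:
        if row["updated_at"] > watermark:
            target[row["id"]] = dict(row)
            new_watermark = max(new_watermark, row["updated_at"])
    return new_watermark


def differential_sync(source_rows, target):
    """Differential update: compare by primary key, insert missing, update changed."""
    inserted = updated = 0
    for row in source_rows:
        current = target.get(row["id"])
        if current is None:
            target[row["id"]] = dict(row)
            inserted += 1
        elif current != row:
            target[row["id"]] = dict(row)
            updated += 1
    return inserted, updated


source = [
    {"id": 1, "name": "a", "updated_at": "2024-01-01"},
    {"id": 2, "name": "b", "updated_at": "2024-01-02"},
]
target = full_sync(source)

# Later, row 2 changes and row 3 is added in the source system.
source[1] = {"id": 2, "name": "b2", "updated_at": "2024-01-03"}
source.append({"id": 3, "name": "c", "updated_at": "2024-01-04"})
watermark = incremental_sync(source, target, "2024-01-02")

# Row 1 is edited without touching updated_at: only a comparison detects it.
source[0] = {"id": 1, "name": "a1", "updated_at": "2024-01-01"}
diff_result = differential_sync(source, target)
print(watermark, diff_result)
```

The last step shows the trade-off: incremental synchronization is cheap but depends on a reliable incremental identifier, while differential update catches any mismatch at the cost of comparing both sides.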

3.1.2 CDC Synchronization

CDC synchronization captures change logs from source databases to enable low-latency data change synchronization. It is suitable for real-time data warehouses, real-time metrics, asynchronous decoupling of business systems, and real-time ingestion into data lakes.

Supported capabilities include:

  • Millisecond-level data synchronization latency;
  • Capture of insert, update, and delete events;
  • Support for MySQL CDC, Oracle CDC, PostgreSQL CDC, SQL Server CDC, and MongoDB CDC;
  • Integration with real-time computing, real-time data quality validation, and real-time data services.
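
As a rough sketch of how captured insert, update, and delete events keep a target in step with its source, consider the following. The event layout (`op`, `key`, `after`) is a hypothetical simplification and does not reflect the format of any specific CDC connector.

```python
def apply_change(target: dict, event: dict) -> None:
    """Apply one change event to an in-memory replica keyed by primary key."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        # Treating both as an upsert keeps replayed events harmless.
        target[key] = event["after"]
    elif op == "delete":
        target.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")


events = [
    {"op": "insert", "key": 1, "after": {"name": "a"}},
    {"op": "insert", "key": 2, "after": {"name": "b"}},
    {"op": "update", "key": 2, "after": {"name": "b2"}},
    {"op": "delete", "key": 1},
]

replica = {}
for event in events:
    apply_change(replica, event)
print(replica)
```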

3.1.3 API-Based Collection

API-based collection automatically generates API data collection tasks through visual configuration and retrieves data from source system APIs, so users can collect data from third-party APIs without writing complex code.

Supported capabilities include:

  • Support for HTTP methods such as GET, POST, PATCH, etc.;
  • Support for URL parameters, body parameters, and request header configuration;
  • Support for parameter transformation, including no transformation and Java script-based transformation;
  • Support for body parameter formats such as form-data, application/json, and text/plain;
  • Parameters can come from custom configuration or database table configuration;
  • Applicable to data collection from SaaS systems, business systems, government service interfaces, third-party open platforms, and similar scenarios.
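
A parameterized collection task of this kind could look roughly like the sketch below. The endpoint `https://api.example.com/orders`, the `build_request` helper, and the parameter rows (standing in for rows read from a database configuration table) are all hypothetical. The requests are only constructed here, not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_request(base_url, method, url_params=None, headers=None, body=None):
    """Assemble one HTTP request from URL parameters, headers, and a JSON body."""
    url = base_url + ("?" + urlencode(url_params) if url_params else "")
    headers = dict(headers or {})
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers.setdefault("Content-Type", "application/json")
    return Request(url, data=data, headers=headers, method=method)


# Each row yields one request, which is what makes the collection "dynamic".
param_rows = [
    {"region": "east", "page": 1},
    {"region": "west", "page": 1},
]
requests_to_send = [
    build_request(
        "https://api.example.com/orders",
        "GET",
        url_params=row,
        headers={"Authorization": "Bearer <token>"},
    )
    for row in param_rows
]
for req in requests_to_send:
    print(req.get_method(), req.full_url)
```

Sending a prepared request would be a call to `urllib.request.urlopen(req)`; it is left out so the sketch runs without network access.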

3.1.4 File Collection

File collection is designed for batch file import and external file resource management. It supports importing local or remote files into the platform and converting them into processable data resources.

Supported formats include:

  • Currently supported: CSV, TXT, XLSX, XLS, etc.;
  • Planned support: JSON, XML, ORC, and other file upload resources;
  • Can be integrated with data standards, quality checks, data development, asset cataloging, and other modules to standardize and govern file-based data.

4. Development Center · Data Development

Core capabilities: visual drag-and-drop ETL canvas + AI assistant for “modeling through conversation”.

4.1 Development Component Library: 9 Categories, 95+ Components

Category | Quantity | Representative Components
Real-Time Input | 7 | Kafka, MySQL CDC, Oracle CDC, SQL Server CDC, MongoDB CDC, PostgreSQL CDC, EventStore
Real-Time Output | 3 | Single-table output, StarRocks output, Kafka output
Offline Input | 14 | Single table, API, MongoDB, StarRocks, Excel, CSV, XML, Text, S3, JSON, logical table, FTP, SFTP, RabbitMQ
Offline Output | 9 | Text, Excel, CSV, XML, JSON, ORC, S3, FTP, SFTP
Data Transformation (common to real-time and offline) | 19 | Outlier detection, unique ID generation, column-to-row conversion, NULL replacement, data filtering, value replacement, string trim / case conversion / split / concatenation / slicing, field filtering, field name mapping, advanced Java transformation, JsonPath extraction, function calculation, data encryption/decryption, data masking
Offline Scripts | 11 | Script management, SQL, Shell, Python, Flink, MR, FlinkSQL, HQL, DataX, Sqoop, Flink JAR
Offline Data Operations | 3 | Aggregation, deduplication, sorting
Offline Multi-Table Synchronization | 1 | Batch synchronization of multiple tables
Offline Data Fusion | 1 | Table merging

4.2 Built-In Function Library: 84+ Functions

Category | Quantity | Examples
Numeric Functions | 27 | ABS, CEIL, FLOOR, ROUND, MOD, SQRT, EXP, LN, LOG, POWER, RAND, etc.
String Functions | 28 | CONCAT, SUBSTR, TRIM, REPLACE, REGEXP_LIKE, REGEXP_REPLACE, LEFT, RIGHT, LPAD, RPAD, etc.
Date and Time Functions | Several | Date formatting, date calculation, time difference calculation, etc.
System Functions | Several | System variables, environment information, etc.

4.3 AI Canvas Assistant

  • An intelligent assistant chat panel embedded in the visual modeling canvas;
  • Natural language description → automatic interpretation → automatic data source selection, operator placement, parameter configuration, and workflow connection on the canvas;
  • Supports AI model integration, including cloud-based models and locally deployed private models;
  • Covers AI-assisted scenarios such as data collection, data development, and data quality inspection;
  • Helps data engineers, data analysts, and business users lower the barrier to data development.

5. Data Standards Management

The Data Standards module is divided into four major functional areas: Standards Management, Reference Data, General Configuration, and Standard Implementation Assessment.

It covers the complete process: building the industry standard hierarchy, full-lifecycle management of business data standards, accumulating standard resources, configuring standardized templates, intelligent feature recognition, automatic data standard matching, full-domain compliance scanning, implementation effectiveness evaluation, and execution traceability.

This module supports traditional structured data standardization and governance for government and enterprise users. It is also designed to support standardized management of multimodal AI data such as text, images, audio, and video. The goal is to provide clean, unified, consistent, and compliant high-quality foundational data for automated modeling.

It helps solve common industry pain points such as inconsistent business definitions, chaotic field naming, inconsistent coding rules, difficulty in implementing standards, and lack of quantitative evidence for governance effectiveness.

Submodule | Core Capabilities
Standards Management | Supports the construction of standard systems including industry standards, enterprise standards, business standards, field standards, coding standards, etc.
Reference Data | Builds unified reference data resources such as administrative divisions, industry classifications, certificate types, status codes, enumeration values, etc.
General Configuration | Supports configuration of standard templates, naming rules, coding rules, data type mappings, and standard recognition rules
Standard Implementation Assessment | Supports automatic data standard matching, standard compliance detection, implementation rate statistics, issue tracing, and remediation closed-loop management

5.1 Standards Management

Standards Management is used to build an enterprise-level data standards system. It supports full-lifecycle management of standards, from definition and publication through reference and change management.

Core capabilities include:

  • Support for multi-level standard systems such as industry standards, enterprise standards, and business standards;
  • Maintenance of standard attributes such as field name, Chinese name, English name, data type, length, precision, value domain, coding rule, and business definition;
  • Standard classification, version, and status management;
  • Standard publishing, retirement, and change traceability;
  • Linkage between standards and data assets, data models, and data quality rules.

5.2 Reference Data

Reference Data is used to consolidate enterprise-wide base codes, enumerations, dictionaries, and value-domain resources. It solves problems such as inconsistent coding and inconsistent meanings across different systems.

Core capabilities include:

  • Maintenance of reference data such as administrative divisions, organizations, industry classifications, certificate types, personnel types, and business status codes;
  • Reference data grouping, version, and status management;
  • Linkage between reference data and field standards, quality rules, and data development tasks;
  • Unified value-domain validation to ensure consistent definitions across business systems and the data platform.

5.3 General Configuration

General Configuration supports rule-based, template-based, and automated standardization governance processes.

Core capabilities include:

  • Standard template configuration;
  • Field naming convention configuration;
  • Data type mapping configuration;
  • Coding rule configuration;
  • Intelligent feature recognition rule configuration;
  • Standard matching rule configuration;
  • Multi-scenario and multi-industry standard adaptation configuration.

5.4 Standard Implementation Assessment

Standard Implementation Assessment measures how effectively data standards are applied to real data assets. It helps enterprises move from “having standards” to “actually implementing standards”.

Core capabilities include:

  • Automatic standard matching for data assets;
  • Compliance scanning for field names, field types, field lengths, field comments, value domains, etc.;
  • Full-domain compliance scanning;
  • Standard implementation rate statistics;
  • Issue list generation;
  • Remediation tracking and execution traceability;
  • Quantitative evaluation of standard implementation effectiveness.
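
A compliance scan and implementation rate of this kind might be computed as in the sketch below. The naming convention (lower snake_case), the `FIELD_STANDARDS` entries, and the sample columns are assumptions made for illustration. A column counts as compliant only if it passes every check.

```python
import re

# Assumed naming convention: lower snake_case field names.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]*")

# Hypothetical field standards keyed by field name.
FIELD_STANDARDS = {
    "user_id": {"type": "BIGINT"},
    "id_card_no": {"type": "VARCHAR", "length": 18},
}


def scan_columns(columns):
    """Check each column against the standards; return (implementation rate, issues)."""
    issues = []
    compliant = 0
    for col in columns:
        problems = []
        if not NAME_PATTERN.fullmatch(col["name"]):
            problems.append("naming")
        spec = FIELD_STANDARDS.get(col["name"], {})
        if "type" in spec and col["type"] != spec["type"]:
            problems.append("type")
        if "length" in spec and col.get("length") != spec["length"]:
            problems.append("length")
        if not col.get("comment"):
            problems.append("comment")
        if problems:
            issues.append((col["name"], problems))
        else:
            compliant += 1
    rate = compliant / len(columns) if columns else 1.0
    return rate, issues


columns = [
    {"name": "user_id", "type": "BIGINT", "comment": "User identifier"},
    {"name": "id_card_no", "type": "VARCHAR", "length": 20, "comment": "ID card number"},
    {"name": "UserName", "type": "VARCHAR", "length": 64, "comment": ""},
]
rate, issues = scan_columns(columns)
print(f"implementation rate: {rate:.0%}")
for name, problems in issues:
    print(name, problems)
```

The issue list is what feeds remediation tracking; the rate is the quantitative figure that can be compared between scans.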

6. Data Quality Management Center

The Data Quality Management Center is based on DAMA standards. It builds a quality rule system around six major dimensions: completeness, consistency, accuracy, timeliness, uniqueness, and conformity. It supports scheduled batch quality checks, real-time streaming quality checks, and user-defined quality rules.

Rule Category | Quantity | Examples
Single-Table Structure Checks | 9 | Non-empty table, timestamp field, complete field comments, primary key integrity, duplicate data, referential integrity, last update time compliance, incremental data existence, incremental anomaly
Single-Table Field Content Checks | 50+ | Null values, full-width characters, value ranges, field length, date format, mobile phone number, ID card number, passport number, bank card number, military officer ID, email, unified social credit code, administrative division code, vehicle license plate, blood type, VIN code, tax number, etc.
Single-Table Conditional Checks | Several | Business condition combination validation
Multi-Table / Full-Database Structure Checks | Several | Cross-table consistency, full-database conformity
Multi-Table Dynamic Checks | Several | Cross-table dynamic logic validation
Real-Time Data Checks | Several | Real-time streaming data quality monitoring

Core capabilities include:

  • Quality rule configuration, rule grouping, and rule template management;
  • Offline batch quality validation;
  • Real-time data quality monitoring;
  • Quality task scheduling and exception alerts;
  • Quality report generation;
  • Closed-loop handling of quality issues;
  • Integration with the Data Standards module to automatically generate certain quality rules based on standards.
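
A few of the single-table field content checks can be sketched as plain functions bound to fields. The rule set, the field names, and the mobile number pattern (11 digits starting with 1, second digit 3 to 9, a common simplification for mainland China numbers) are assumptions for illustration, not the platform's built-in rules.

```python
import re
from datetime import datetime


def not_null(value):
    """Completeness: the value is present and not an empty string."""
    return value is not None and value != ""


def is_mobile(value):
    """Conformity: simplified mainland China mobile number pattern (assumed)."""
    return isinstance(value, str) and re.fullmatch(r"1[3-9]\d{9}", value) is not None


def is_date(value, fmt="%Y-%m-%d"):
    """Conformity: the value parses as a date in the expected format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except (TypeError, ValueError):
        return False


# Hypothetical rule binding: field name -> checks to run.
rules = {
    "name": [not_null],
    "phone": [not_null, is_mobile],
    "signup_date": [is_date],
}


def check_rows(rows, rules):
    """Run every bound check; return (row index, field, failed rule) tuples."""
    failures = []
    for index, row in enumerate(rows):
        for field, checks in rules.items():
            for check in checks:
                if not check(row.get(field)):
                    failures.append((index, field, check.__name__))
    return failures


rows = [
    {"name": "Li", "phone": "13812345678", "signup_date": "2024-05-01"},
    {"name": "", "phone": "12345", "signup_date": "2024-13-01"},
]
failures = check_rows(rows, rules)
for failure in failures:
    print(failure)
```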

7. Data Asset Management

Submodule | Core Capabilities
Asset Marketplace | A “data supermarket” for browsing, searching, and requesting data assets
Data Source Table Assets | Asset cataloging, business classification, lineage tracing, multidimensional evaluation
Metrics System | Atomic metrics, derived metrics, and composite metrics to build a three-level metrics system
API Assets | API browsing, request, and approval
File Management | Document storage, upload, and archiving
Intelligent Recognition | OCR recognition, document summarization, keyword extraction for multimodal data such as images, audio, video, and documents

The Data Asset Management Center helps enterprises transform data resources into data assets, turn data assets into services, and convert services into business value. It enables the construction of a unified data asset catalog, data asset marketplace, and asset operation system.
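
The three-level metrics system can be illustrated with a small sketch. It assumes the usual reading of the terms, which the source does not spell out: an atomic metric is a basic aggregation, a derived metric is an atomic metric restricted by qualifiers such as period or region, and a composite metric is arithmetic over other metrics. The order data and function names are hypothetical.

```python
orders = [
    {"region": "east", "month": "2024-05", "amount": 100.0},
    {"region": "east", "month": "2024-05", "amount": 50.0},
    {"region": "west", "month": "2024-05", "amount": 30.0},
    {"region": "east", "month": "2024-04", "amount": 60.0},
]


# Atomic metrics: basic aggregations with no business qualifiers.
def order_amount(rows):
    return sum(row["amount"] for row in rows)


def order_count(rows):
    return len(rows)


# Derived metric: an atomic metric restricted by qualifiers (dimension values, period).
def derived(atomic, rows, **qualifiers):
    selected = [r for r in rows if all(r[k] == v for k, v in qualifiers.items())]
    return atomic(selected)


east_may_amount = derived(order_amount, orders, region="east", month="2024-05")
east_may_count = derived(order_count, orders, region="east", month="2024-05")

# Composite metric: arithmetic over other metrics.
east_may_avg_order_value = east_may_amount / east_may_count
print(east_may_amount, east_may_count, east_may_avg_order_value)
```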


8. Data Sharing Service Center

Submodule | Core Capabilities
Automatic API Generation | Wizard-based one-click conversion of data tables into RESTful APIs
API Marketplace | API publishing, registration, version management, and traffic monitoring
Dynamic Data Masking | Automatic masking during API calls
Approval Workflow | Full lifecycle management: data request → approval → subscription → authorization
Interface Marketplace | API publishing/unpublishing management with customizable approval workflows

The Data Sharing Service Center provides governed data assets externally through APIs and an interface marketplace. It supports the full lifecycle of data requests, approvals, authorization, invocation, monitoring, and retirement.
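
Dynamic masking at API call time can be pictured as a per-field policy applied to each response row. The sketch below is an assumption-laden illustration: the function names, the policy layout, and the sample row are hypothetical. It covers character masking, hashing, and character replacement; SM4 encryption is left out because it needs a third-party library.

```python
import hashlib


def mask_chars(value: str, keep_head: int = 3, keep_tail: int = 4, mask: str = "*") -> str:
    """Character masking: keep the first and last few characters, mask the middle."""
    if len(value) <= keep_head + keep_tail:
        return mask * len(value)
    middle = len(value) - keep_head - keep_tail
    return value[:keep_head] + mask * middle + value[len(value) - keep_tail:]


def hash_value(value: str, salt: str = "") -> str:
    """Hash masking: replace the value with a salted SHA-256 digest."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def replace_chars(value: str, replacement: str = "X") -> str:
    """Character replacement: swap letters and digits, keep separators."""
    return "".join(replacement if ch.isalnum() else ch for ch in value)


def mask_response(row: dict, policy: dict) -> dict:
    """Apply the masking function configured for each field, if any."""
    return {k: policy[k](v) if k in policy else v for k, v in row.items()}


policy = {"phone": mask_chars, "email": replace_chars}
row = {"name": "Li", "phone": "13812345678", "email": "li@example.com"}
masked = mask_response(row, policy)
print(masked)
```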


9. Data Security and Compliance

Submodule | Core Capabilities
Classification and Grading | Automatic sensitive data scanning and data classification, supporting S1–S5 grading
Encryption | Supports Chinese national cryptographic algorithms SM2, SM3, and SM4
Data Masking | 4 masking algorithms: character masking, SM4 encryption, HASH, and character replacement
Dual-Sandbox Isolation | “Data Black Box · Model White Box”: production sandbox data is not visible; development sandbox uses only sample data; models can be published to production with one click
End-to-End Lineage | Full traceability from source systems to applications
Tamper-Resistant Audit | Full operation records with hash-based evidence preservation
Compliance | Supports compliance with laws and regulations such as the Data Security Law and Personal Information Protection Law of China

The Data Security and Compliance module runs through the entire process of data ingestion, development, governance, sharing, and application. It aims to keep data usable without being directly visible, controllable and auditable, and traceable and compliant.
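
Hash-based evidence preservation for audit records is commonly built as a hash chain, where each entry's hash covers the previous entry's hash. The sketch below shows that general technique with SHA-256. It is an assumption about how such a log could work, not a description of the platform's internals, and the record fields are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder hash that precedes the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with a canonical form of the record."""
    payload = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list, record: dict) -> None:
    """Append an audit record linked to the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    log.append({
        "record": record,
        "prev_hash": prev_hash,
        "hash": entry_hash(prev_hash, record),
    })


def verify(log: list) -> bool:
    """Recompute the chain; any edited, removed, or reordered entry breaks it."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log = []
append_entry(audit_log, {"user": "alice", "action": "export_table", "target": "orders"})
append_entry(audit_log, {"user": "bob", "action": "grant_role", "target": "analyst"})
intact_before = verify(audit_log)

audit_log[0]["record"]["action"] = "view_table"  # simulate tampering
intact_after = verify(audit_log)
print(intact_before, intact_after)
```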


10. Visual Data Warehouse Modeling

Submodule | Core Capabilities
Kimball Dimensional Modeling | Visual creation of dimension tables and fact tables
Drag-and-Drop Cube Design | Multidimensional cubes supporting slicing, roll-up, and drill-down
Three-Level Metrics System | Atomic metrics → derived metrics → composite metrics
Database-Agnostic Design | Supports any compatible database as the data warehouse backend, such as MySQL, Oracle, Doris, Greenplum, Hive, etc.

Visual data warehouse modeling helps enterprises build subject-domain models, dimensional models, fact models, and metrics systems in a low-code way, reducing the complexity of traditional data warehouse modeling.


11. BI Analytics and Visualization

Submodule | Core Capabilities
Built-In BI | Integrated based on the open-source DataEase project
Visual Dashboards | Drag-and-drop report creation with no coding required
Chart Types | Bar charts, line charts, pie charts, dashboards, large-screen visualizations
Self-Service Analytics | Business-user-friendly analytics interface
Self-Service AnalyticsBusiness-user-friendly analytics interface

The BI Analytics and Visualization module supports business analysis, operational monitoring, KPI dashboards, and large-screen data visualization. It provides self-service analytics capabilities for business users.


12. AI Intelligence Center

Submodule | Core Capabilities
Large Model Configuration | Connects to public cloud LLMs such as Tongyi Qianwen and ERNIE Bot, or privately deployed models
AI Agent | Data collection agents and data development agents with editable prompt templates
LangChain Orchestration | Multi-tool + LLM collaborative workflows
Planned Capabilities | API extensions, MCP / Model Context Protocol extensions, and Skills plugin mechanism

The AI Intelligence Center provides unified large language model access, agent orchestration, and intelligent assistance capabilities for the platform. It supports intelligent scenarios such as data collection, data development, data quality inspection, data asset cataloging, and knowledge Q&A.


13. Upgrade Path to a Trusted Data Space

Submodule | Core Capabilities
Zero-Trust Architecture | Connector management and automatic deployment
Sample Engine | Differential privacy, synthetic data, and format-preserving encryption
Space Management | Independent data spaces and compliant cross-space sharing
Blockchain Evidence Preservation | Tamper-resistant logs + blockchain-based evidence storage

Ottomi Nexus can be further upgraded into a trusted data space foundation, supporting secure data circulation, compliant sharing, and trusted collaboration among multiple parties.


14. Task Scheduling Engine

The Task Scheduling Engine is responsible for unified orchestration, scheduling, execution, and monitoring of tasks within the platform, including data collection, data development, quality checks, standard implementation assessment, data synchronization, and script execution.

Submodule | Core Capabilities
DolphinScheduler Integration | Provides distributed task scheduling and supports complex workflow orchestration
Scheduling Configuration | Supports schedule configuration by second, minute, hour, day, etc.
Dependency Orchestration | Supports complex upstream and downstream workflow dependencies
Monitoring and Alerts | Supports runtime log monitoring, task status monitoring, and exception alerts
Parallel Computing Engine | Uses concepts from SeaTunnel such as host, engine node, and resource group to enable cross-host, multi-node parallel computing
Resource Group Scheduling | Supports assigning business tasks to resource groups. The platform automatically schedules all cross-host compute nodes within the resource group for parallel execution

14.1 Distributed Task Scheduling

The platform integrates DolphinScheduler to provide task workflow orchestration, scheduled execution, dependency management, failure retry, backfill execution, and runtime monitoring.

Typical capabilities include:

  • Unified scheduling for data synchronization tasks, ETL tasks, SQL script tasks, Shell / Python / Flink scripts, and other script tasks;
  • Support for upstream and downstream task dependencies;
  • Support for task failure retries;
  • Support for task reruns and backfill;
  • Support for periodic task configuration;
  • Support for task runtime logs and execution status monitoring.
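
Dependency ordering and failure retry can be illustrated with Python's standard `graphlib` module. This sketch does not use the DolphinScheduler API; the task names, the `run_workflow` helper, and the retry policy are hypothetical and only model the behaviour described above.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, upstream, max_retries=2):
    """Run tasks after their upstream tasks, retrying each failed task.

    tasks:    task name -> callable
    upstream: task name -> set of task names that must finish first
    Returns (task name, attempts used) in execution order.
    """
    executed = []
    for name in TopologicalSorter(upstream).static_order():
        for attempt in range(1, max_retries + 2):
            try:
                tasks[name]()
                executed.append((name, attempt))
                break
            except Exception:
                if attempt == max_retries + 1:
                    raise  # retries exhausted: fail the whole workflow


    return executed


attempts = {"count": 0}


def flaky_transform():
    """Fails on the first call to simulate a transient error."""
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise RuntimeError("transient failure")


tasks = {
    "extract": lambda: None,
    "transform": flaky_transform,
    "quality_check": lambda: None,
    "publish": lambda: None,
}
upstream = {
    "extract": set(),
    "transform": {"extract"},
    "quality_check": {"transform"},
    "publish": {"quality_check"},
}
history = run_workflow(tasks, upstream)
print(history)
```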

14.2 Parallel Computing Engine

The platform includes a built-in parallel computing engine. It adopts core concepts from SeaTunnel, including host, engine node, and resource group, to provide cross-host and cross-node parallel execution capabilities for data synchronization, data transformation, and batch processing tasks.

Its core execution model is:

Business task → assigned to a resource group → automatically scheduled across all compute nodes in the group for parallel execution

Detailed explanation:

  • Host: A physical machine, virtual machine, or container runtime environment that hosts compute nodes;
  • Engine Node: A compute execution node deployed on a host and responsible for running the actual data processing tasks;
  • Resource Group: A collection of compute resources composed of multiple engine nodes. It can be divided by business domain, task type, environment, or resource specification;
  • Task Assignment: A business task can specify the resource group in which it will run;
  • Automatic Scheduling: After a task is submitted, the platform automatically schedules available compute nodes within the resource group;
  • Parallel Execution: Multiple cross-host compute nodes within the same resource group can process tasks in parallel, improving efficiency for large-scale data synchronization, transformation, and processing;
  • Elastic Scaling: Computing capacity can be expanded by adding hosts and engine nodes;
  • Resource Isolation: Different business tasks can be bound to different resource groups to prevent resource contention.
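
The execution model above can be simulated in a few lines. In this sketch, threads stand in for engine nodes, and the resource group contents, node names, and `run_on_group` helper are hypothetical. A task is split into parts, and the parts are spread round-robin over the nodes of the chosen resource group.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical resource group: engine nodes spread across two hosts.
resource_groups = {
    "etl_group": ["host-a/node-1", "host-a/node-2", "host-b/node-1"],
}


def sync_split(node, split):
    """Stand-in for processing one slice of a table on one engine node."""
    return node, sum(split)


def run_on_group(group, splits, work):
    """Distribute splits round-robin over the group's nodes and run them in parallel."""
    nodes = resource_groups[group]
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        futures = [
            pool.submit(work, nodes[i % len(nodes)], split)
            for i, split in enumerate(splits)
        ]
        return [future.result() for future in futures]


splits = [[1, 2], [3, 4], [5, 6], [7, 8]]
results = run_on_group("etl_group", splits, sync_split)
for node, total in results:
    print(node, total)
```

Adding a node to the group's list is the sketch's equivalent of elastic scaling, and binding different tasks to different groups corresponds to resource isolation.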

This capability is suitable for:

  • Large-scale table synchronization;
  • Concurrent extraction of multiple tables;
  • Batch file processing;
  • CDC data consumption and processing;
  • Parallel computing for offline ETL tasks;
  • Cross-system data migration;
  • Compute resource isolation across multiple business domains.

15. Operations and Maintenance Management

Submodule | Core Capabilities
Hardware Monitoring | Service status monitoring
Data Backup | Backup of configuration databases and configuration files
High Availability | Primary-standby architecture + automatic failover

The Operations and Maintenance Management module ensures stable platform operation. It supports deployment status monitoring, service health checks, configuration backup, failure recovery, and high-availability operation.


Summary

The core product philosophy of Ottomi Nexus can be summarized as follows:

  • “Data Black Box · Model White Box”: The dual-sandbox mechanism keeps data secure and controlled while keeping models transparent and auditable.
  • “Modeling Through Conversation”: The AI Canvas Assistant converts natural language instructions into visual workflows.
  • “All-in-One Package”: The platform can be quickly deployed with a single Docker Compose command.
  • “Standards First, Closed-Loop Governance”: Data standards, quality management, standard implementation assessment, and asset operations form a complete enterprise data governance loop.
  • “Multi-Source Ingestion, Unified Management”: Supports multiple ingestion methods including table extraction, CDC synchronization, API-based collection, and file collection.
  • “Parallel Computing, Elastic Scheduling”: Uses the concepts of hosts, engine nodes, and resource groups to enable cross-host, multi-node parallel execution.
  • “Foundation for Multimodal AI Data”: Provides standardized, asset-oriented, and intelligent processing capabilities for multimodal data such as text, images, audio, video, and documents.
  • “Enterprise-Grade Security and Compliance”: Builds secure and trusted data infrastructure with 6-level permission granularity, 4 data masking algorithms, national cryptographic algorithms, end-to-end auditing, and data classification and grading.

Ottomi Nexus 3.0 integrates data ingestion, data standards, data quality, data development, asset management, sharing services, AI intelligence, and task scheduling into one unified platform. It provides government and enterprise customers with complete capabilities from data resources to data assets, from data governance to AI applications, and from a standalone platform to a trusted data space.