Ottomi Multimodal AI Data Platform (Ottomi-Nexus 3.0) — Capability Overview
Ottomi Nexus is an end-to-end data processing platform built around the DataOps methodology. It is delivered as an all-in-one software package and deployed in containers. Designed for government and enterprise scenarios, it supports data governance, data development, data asset operations, multimodal AI data management, and trusted data space construction.
The platform provides integrated capabilities across the full data lifecycle, including data ingestion, standardization and governance, data quality management, data development and modeling, task scheduling, data asset services, and AI-powered applications.
The functional modules are organized into the following major areas:
1. Management Center
| Submodule | Core Capabilities |
|---|---|
| Account Management | User/member management, organizational unit management, role management |
| Permission System | RBAC + ABAC, with 6-level access granularity: System → Project → Data Source → Table → Row → Column |
| Row-Level Permission | Fine-grained row-level data access control |
| Log Management | Full-operation audit trail and tamper-resistant logs |
| AI Assistant Configuration | Large language model integration configuration and API key management |
| System Configuration | Notification channel configuration, supporting in-platform messages, email, WeCom / Enterprise WeChat, etc. |
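
As an illustration of how the row and column levels of the permission hierarchy might combine, the sketch below filters a result set with a row predicate and a column allow-list. The `Grant` class, the `apply_grant` helper, and the sample table are hypothetical and are not part of the Ottomi Nexus API.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


@dataclass(frozen=True)
class Grant:
    """Hypothetical grant: which table, rows, and columns a role may read."""
    table: str
    row_filter: Callable[[Row], bool]  # row-level rule (attribute-based)
    columns: frozenset[str]            # column-level allow-list


def apply_grant(grant: Grant, table: str, rows: list[Row]) -> list[Row]:
    """Return only the rows and columns the grant allows; nothing for other tables."""
    if table != grant.table:
        return []
    return [
        {col: value for col, value in row.items() if col in grant.columns}
        for row in rows
        if grant.row_filter(row)
    ]


orders = [
    {"id": 1, "region": "east", "amount": 120, "customer_phone": "13800000000"},
    {"id": 2, "region": "west", "amount": 80, "customer_phone": "13900000000"},
]
east_analyst = Grant(
    table="orders",
    row_filter=lambda row: row["region"] == "east",
    columns=frozenset({"id", "region", "amount"}),
)
visible = apply_grant(east_analyst, "orders", orders)
```

Here the analyst sees only the east-region order, without the phone column.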
2. Business Planning
| Submodule | Core Capabilities |
|---|---|
| Data Layering Design | ODS raw data layer → DWD standardized detail layer → ADS application metric layer |
| Business Domains and Subject Areas | Business domain creation and subject area definition to support enterprise-level data architecture planning |
| Project Workspace Management | Project creation, compute source management, member management |
| Dual-Sandbox Architecture | Strong isolation between development sandbox and production sandbox, or an integrated single-sandbox mode |
| Three Deployment Modes | Standard dual-sandbox deployment for large enterprise groups, flexible hybrid deployment for mid-sized organizations, and lightweight all-in-one deployment for smaller scenarios |
3. Data Ingestion Engine
The Data Ingestion Engine is designed for enterprise scenarios involving multi-source, heterogeneous data collection. It supports multiple ingestion methods, including databases, CDC, APIs, and files. It enables unified management and consolidation of structured, semi-structured, and multimodal data resources.
| Submodule | Core Capabilities |
|---|---|
| Source Database Management | Register heterogeneous data sources. Supports 40+ data sources including MySQL, PostgreSQL, Oracle, DB2, SQL Server, Dameng, Kingbase, OceanBase, TiDB, ArgoDB, Greenplum, ClickHouse, Doris, StarRocks, GBase, Hive, etc. |
| Database Type Management | Extensible through JDBC drivers, supporting custom enterprise data source integration |
| Table Extraction | Supports table-level data extraction, including full synchronization, incremental synchronization, and differential update synchronization |
| CDC Synchronization | Supports millisecond-level change data capture for MySQL CDC, Oracle CDC, PostgreSQL CDC, SQL Server CDC, MongoDB CDC, etc. |
| API-Based Collection | Automatically generates API collection tasks through UI configuration and retrieves data from source APIs. Supports HTTP methods such as GET, POST, PATCH, etc. |
| API Parameter Configuration | Supports URL parameters, body parameters, and request headers. Supports parameter transformation, including no transformation and Java script-based transformation |
| Body Parameter Formats | Supports multiple body formats including form-data, application/json, text/plain, etc. |
| Parameter Sources | API collection parameters can come from custom configuration or database table configuration, enabling dynamic parameterized collection |
| File Collection | Supports importing data files in various formats including CSV, TXT, XLS, XLSX, etc.; planned support for JSON, XML, and ORC file uploads |
| Sample Rules & Sample Engine | 5 sample generation strategies: binding sample rules, expression-based calculation, external table value-domain generation, base type generation, and generation based on original table data. Provides a three-layer rule system: basic rules, business rules, and special rules. Supports privacy-preserving computation transformation, allowing sample data to participate in computation |
| Resource Explorer | Data source browsing, table schema viewing, DDL copying, and data querying |
| Metadata Management | Automatic cataloging and asset publishing/unpublishing |
| AI-Powered Auto Cataloging | AI-assisted automatic cataloging and aggregation of source-side data assets |
3.1 Data Collection Capability Details
3.1.1 Table Extraction
Table extraction is designed for synchronizing data from traditional business system databases. It supports synchronizing table data from source databases to designated target systems within the platform. Typical use cases include historical data initialization, periodic data collection, and business system data consolidation.
Supported methods include:
- Full Synchronization: Extracts all data from the source table at once. Suitable for initial loading and historical data migration.
- Incremental Synchronization: Synchronizes only newly added or changed data based on incremental identifiers such as time fields, primary keys, or version numbers.
- Differential Update Synchronization: Compares source and target data, identifies differences, and performs insert or update operations accordingly.
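The incremental and differential methods above can be sketched in a few lines. This is a simplified in-memory illustration, assuming a time field named `updated_at` as the incremental identifier and `id` as the primary key. The function names are hypothetical and do not reflect the platform's internal implementation.

```python
from datetime import datetime


def incremental_rows(source, watermark):
    """Incremental sync: select rows changed after the last recorded watermark."""
    return [row for row in source if row["updated_at"] > watermark]


def differential_upsert(source, target):
    """Differential update: compare by primary key, then insert or update."""
    inserted, updated = 0, 0
    for row in source:
        key = row["id"]
        if key not in target:
            target[key] = dict(row)
            inserted += 1
        elif target[key] != row:
            target[key] = dict(row)
            updated += 1
    return inserted, updated


source = [
    {"id": 1, "name": "alpha", "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "name": "beta", "updated_at": datetime(2024, 1, 3)},
    {"id": 3, "name": "gamma", "updated_at": datetime(2024, 1, 5)},
]
target = {
    1: {"id": 1, "name": "alpha", "updated_at": datetime(2024, 1, 1)},
    2: {"id": 2, "name": "old-beta", "updated_at": datetime(2024, 1, 2)},
}

changed = incremental_rows(source, watermark=datetime(2024, 1, 2))
counts = differential_upsert(source, target)  # (inserted, updated)
```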
3.1.2 CDC Synchronization
CDC synchronization captures change logs from source databases to enable low-latency data change synchronization. It is suitable for real-time data warehouses, real-time metrics, asynchronous decoupling of business systems, and real-time ingestion into data lakes.
Supported capabilities include:
- Millisecond-level data synchronization latency;
- Capture of insert, update, and delete events;
- Support for MySQL CDC, Oracle CDC, PostgreSQL CDC, SQL Server CDC, and MongoDB CDC;
- Integration with real-time computing, real-time data quality validation, and real-time data services.
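To show what capturing insert, update, and delete events means for a downstream target, the sketch below replays a stream of change events against an in-memory replica. The event shape (`op`, `key`, `after`) is an assumption chosen for illustration. It is not the wire format of any particular CDC connector.

```python
def apply_change(target, event):
    """Apply one change event to a key-indexed replica."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        target[key] = event["after"]
    elif op == "delete":
        target.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")


events = [
    {"op": "insert", "key": 1, "after": {"id": 1, "status": "new"}},
    {"op": "insert", "key": 2, "after": {"id": 2, "status": "new"}},
    {"op": "update", "key": 1, "after": {"id": 1, "status": "paid"}},
    {"op": "delete", "key": 2, "after": None},
]

replica = {}
for event in events:
    apply_change(replica, event)
```

Events must be applied in log order. Replaying the same stream from an empty replica always yields the same final state.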
3.1.3 API-Based Collection
API-based collection is a tool that automatically generates API data collection tasks through visual configuration and retrieves data from source system APIs. Users can collect data from third-party APIs without writing complex code.
Supported capabilities include:
- Support for HTTP methods such as GET, POST, PATCH, etc.;
- Support for URL parameters, body parameters, and request header configuration;
- Support for parameter transformation, including no transformation and Java script-based transformation;
- Support for body parameter formats such as form-data, application/json, and text/plain;
- Parameters can come from custom configuration or database table configuration;
- Applicable to data collection from SaaS systems, business systems, government service interfaces, third-party open platforms, and similar scenarios.
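A UI-configured collection task ultimately maps onto an HTTP request. The sketch below shows one way such a configuration could be turned into a request object using Python's standard library. The configuration keys (`url_params`, `body_format`, and so on) and the example URL are hypothetical, and only the `application/json` and `text/plain` body formats are handled. The request is built but not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_request(config):
    """Translate a collection-task configuration into an HTTP request object."""
    url = config["url"]
    if config.get("url_params"):
        url = f"{url}?{urlencode(config['url_params'])}"

    headers = dict(config.get("headers", {}))
    body = None
    if config.get("body") is not None:
        body_format = config.get("body_format", "application/json")
        if body_format == "application/json":
            body = json.dumps(config["body"]).encode("utf-8")
        elif body_format == "text/plain":
            body = str(config["body"]).encode("utf-8")
        else:
            raise ValueError(f"unsupported body format in this sketch: {body_format}")
        headers["Content-Type"] = body_format

    return Request(url, data=body, headers=headers, method=config["method"])


task = {
    "method": "POST",
    "url": "https://api.example.com/v1/orders",  # placeholder endpoint
    "url_params": {"page": 1, "size": 50},
    "headers": {"Authorization": "Bearer <token>"},
    "body_format": "application/json",
    "body": {"status": "paid"},
}
request = build_request(task)
```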
3.1.4 File Collection
File collection is designed for batch file import and external file resource management. It supports importing local or remote files into the platform and converting them into processable data resources.
Supported formats include:
- Currently supported: CSV, TXT, XLS, XLSX, etc.;
- Planned support: JSON, XML, ORC, and other file upload resources;
- Can be integrated with data standards, quality checks, data development, asset cataloging, and other modules to standardize and govern file-based data.
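As a minimal sketch of converting an uploaded file into a processable data resource, the example below parses CSV text into records and rejects files that lack required columns. The `load_csv` helper and the column names are illustrative assumptions, not platform functions.

```python
import csv
import io


def load_csv(text, required_columns):
    """Parse CSV text into a list of records, checking the header first."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [col for col in required_columns if col not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Trim whitespace; treat absent trailing fields as empty strings.
    return [
        {key: (value or "").strip() for key, value in row.items() if key is not None}
        for row in reader
    ]


sample = "id,name,city\n1, Alice ,Hangzhou\n2,Bob,Shenzhen\n"
records = load_csv(sample, required_columns=["id", "name"])
```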
4. Development Center · Data Development
Core capabilities: a visual drag-and-drop ETL canvas combined with an AI assistant for “modeling through conversation”.
4.1 Development Component Library: 9 Categories, 95+ Components
| Category | Quantity | Representative Components |
|---|---|---|
| Real-Time Input | 7 | Kafka, MySQL CDC, Oracle CDC, SQL Server CDC, MongoDB CDC, PostgreSQL CDC, EventStore |
| Real-Time Output | 3 | Single-table output, StarRocks output, Kafka output |
| Offline Input | 14 | Single table, API, MongoDB, StarRocks, Excel, CSV, XML, Text, S3, JSON, logical table, FTP, SFTP, RabbitMQ |
| Offline Output | 9 | Text, Excel, CSV, XML, JSON, ORC, S3, FTP, SFTP |
| Data Transformation, Common for Real-Time and Offline | 19 | Outlier detection, unique ID generation, column-to-row conversion, NULL replacement, data filtering, value replacement, string trim / case conversion / split / concatenation / slicing, field filtering, field name mapping, advanced Java transformation, JsonPath extraction, function calculation, data encryption/decryption, data masking |
| Offline Scripts | 11 | Script management, SQL, Shell, Python, Flink, MR, FlinkSQL, HQL, DataX, Sqoop, Flink JAR |
| Offline Data Operations | 3 | Aggregation, deduplication, sorting |
| Offline Multi-Table Synchronization | 1 | Batch synchronization of multiple tables |
| Offline Data Fusion | 1 | Table merging |
4.2 Built-In Function Library: 84+ Functions
| Category | Quantity | Examples |
|---|---|---|
| Numeric Functions | 27 | ABS, CEIL, FLOOR, ROUND, MOD, SQRT, EXP, LN, LOG, POWER, RAND, etc. |
| String Functions | 28 | CONCAT, SUBSTR, TRIM, REPLACE, REGEXP_LIKE, REGEXP_REPLACE, LEFT, RIGHT, LPAD, RPAD, etc. |
| Date and Time Functions | Several | Date formatting, date calculation, time difference calculation, etc. |
| System Functions | Several | System variables, environment information, etc. |
4.3 AI Canvas Assistant
- An intelligent assistant chat panel embedded in the visual modeling canvas;
- Natural language description → automatic interpretation → automatic data source selection, operator placement, parameter configuration, and workflow connection on the canvas;
- Supports AI model integration, including cloud-based models and locally deployed private models;
- Covers AI-assisted scenarios such as data collection, data development, and data quality inspection;
- Helps data engineers, data analysts, and business users lower the barrier to data development.
5. Data Standards Management
The Data Standards module is divided into four major functional areas: Standards Management, Reference Data, General Configuration, and Standard Implementation Assessment.
It covers the complete process of industry standard hierarchy construction, full-lifecycle management of business data standards, standard resource accumulation, standardized template configuration, intelligent feature recognition, automatic data standard matching, full-domain compliance scanning, implementation effectiveness evaluation, and execution traceability.
This module supports traditional structured data standardization and governance for government and enterprise users. It is also designed to support standardized management of multimodal AI data such as text, images, audio, and video. The goal is to provide clean, unified, consistent, and compliant high-quality foundational data for automated modeling.
It helps solve common industry pain points such as inconsistent business definitions, chaotic field naming, inconsistent coding rules, difficulty in implementing standards, and lack of quantitative evidence for governance effectiveness.
| Submodule | Core Capabilities |
|---|---|
| Standards Management | Supports the construction of standard systems including industry standards, enterprise standards, business standards, field standards, coding standards, etc. |
| Reference Data | Builds unified reference data resources such as administrative divisions, industry classifications, certificate types, status codes, enumeration values, etc. |
| General Configuration | Supports configuration of standard templates, naming rules, coding rules, data type mappings, and standard recognition rules |
| Standard Implementation Assessment | Supports automatic data standard matching, standard compliance detection, implementation rate statistics, issue tracing, and remediation closed-loop management |
5.1 Standards Management
Standards Management is used to build an enterprise-level data standards system and supports full-lifecycle management from standard definition, publication, and reference to change management.
Core capabilities include:
- Support for multi-level standard systems such as industry standards, enterprise standards, and business standards;
- Maintenance of standard attributes such as field name, Chinese name, English name, data type, length, precision, value domain, coding rule, and business definition;
- Standard classification, version, and status management;
- Standard publishing, retirement, and change traceability;
- Linkage between standards and data assets, data models, and data quality rules.
5.2 Reference Data
Reference Data is used to consolidate enterprise-wide base codes, enumerations, dictionaries, and value-domain resources. It solves problems such as inconsistent coding and inconsistent meanings across different systems.
Core capabilities include:
- Maintenance of reference data such as administrative divisions, organizations, industry classifications, certificate types, personnel types, and business status codes;
- Reference data grouping, version, and status management;
- Linkage between reference data and field standards, quality rules, and data development tasks;
- Unified value-domain validation to ensure consistent definitions across business systems and the data platform.
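Unified value-domain validation amounts to checking field values against a shared code set. The sketch below illustrates the idea. The certificate-type codes are invented for the example and are not the platform's actual reference data.

```python
# Hypothetical reference data: certificate type codes and their meanings.
CERTIFICATE_TYPES = {
    "01": "Resident ID card",
    "02": "Passport",
    "03": "Military officer ID",
}


def validate_value_domain(rows, field, domain):
    """Return (row index, offending value) for values outside the reference domain."""
    return [
        (index, row.get(field))
        for index, row in enumerate(rows)
        if row.get(field) not in domain
    ]


people = [
    {"name": "A", "cert_type": "01"},
    {"name": "B", "cert_type": "9"},
    {"name": "C", "cert_type": None},
]
violations = validate_value_domain(people, "cert_type", CERTIFICATE_TYPES)
```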
5.3 General Configuration
General Configuration supports rule-based, template-based, and automated standardization governance processes.
Core capabilities include:
- Standard template configuration;
- Field naming convention configuration;
- Data type mapping configuration;
- Coding rule configuration;
- Intelligent feature recognition rule configuration;
- Standard matching rule configuration;
- Multi-scenario and multi-industry standard adaptation configuration.
5.4 Standard Implementation Assessment
Standard Implementation Assessment measures how effectively data standards are applied to real data assets. It helps enterprises move from “having standards” to “actually implementing standards”.
Core capabilities include:
- Automatic standard matching for data assets;
- Compliance scanning for field names, field types, field lengths, field comments, value domains, etc.;
- Full-domain compliance scanning;
- Standard implementation rate statistics;
- Issue list generation;
- Remediation tracking and execution traceability;
- Quantitative evaluation of standard implementation effectiveness.
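The scanning and rate statistics above can be illustrated with a small example. It assumes that a field is matched to a standard by name and that the implementation rate is the share of fields with no issues. Both are simplifying assumptions for this sketch, and the standards shown are hypothetical.

```python
# Hypothetical field standards keyed by standard field name.
FIELD_STANDARDS = {
    "user_id": {"type": "BIGINT", "length": None},
    "user_name": {"type": "VARCHAR", "length": 64},
    "id_card_no": {"type": "VARCHAR", "length": 18},
}


def scan_fields(fields, standards):
    """Compare each field with its standard and return a list of (field, issue)."""
    issues = []
    for field in fields:
        standard = standards.get(field["name"])
        if standard is None:
            issues.append((field["name"], "no matching standard"))
            continue
        if field["type"] != standard["type"]:
            issues.append((field["name"], f"type {field['type']} != {standard['type']}"))
        if standard["length"] is not None and field.get("length") != standard["length"]:
            issues.append((field["name"], f"length {field.get('length')} != {standard['length']}"))
    return issues


def implementation_rate(fields, issues):
    """Share of fields without any issue (assumed definition)."""
    failing = {name for name, _ in issues}
    return (len(fields) - len(failing)) / len(fields) if fields else 0.0


table_fields = [
    {"name": "user_id", "type": "BIGINT", "length": None},
    {"name": "user_name", "type": "VARCHAR", "length": 32},
    {"name": "usr_phone", "type": "VARCHAR", "length": 11},
    {"name": "id_card_no", "type": "VARCHAR", "length": 18},
]
issue_list = scan_fields(table_fields, FIELD_STANDARDS)
rate = implementation_rate(table_fields, issue_list)
```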
6. Data Quality Management Center
The Data Quality Management Center is based on DAMA standards. It builds a quality rule system around six major dimensions: completeness, consistency, accuracy, timeliness, uniqueness, and conformity. It supports scheduled batch quality checks, real-time streaming quality checks, and user-defined quality rules.
| Rule Category | Quantity | Examples |
|---|---|---|
| Single-Table Structure Checks | 9 | Non-empty table, timestamp field, complete field comments, primary key integrity, duplicate data, referential integrity, last update time compliance, incremental data existence, incremental anomaly |
| Single-Table Field Content Checks | 50+ | Null values, full-width characters, value ranges, field length, date format, mobile phone number, ID card number, passport number, bank card number, military officer ID, email, unified social credit code, administrative division code, vehicle license plate, blood type, VIN code, tax number, etc. |
| Single-Table Conditional Checks | Several | Business condition combination validation |
| Multi-Table / Full-Database Structure Checks | Several | Cross-table consistency, full-database conformity |
| Multi-Table Dynamic Checks | Several | Cross-table dynamic logic validation |
| Real-Time Data Checks | Several | Real-time streaming data quality monitoring |
Core capabilities include:
- Quality rule configuration, rule grouping, and rule template management;
- Offline batch quality validation;
- Real-time data quality monitoring;
- Quality task scheduling and exception alerts;
- Quality report generation;
- Closed-loop handling of quality issues;
- Integration with the Data Standards module to automatically generate certain quality rules based on standards.
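A few of the field content checks listed above can be expressed as simple predicates. The rule names, the check plan structure, and the mobile number pattern (an assumed format for mainland China numbers) are illustrative and are not the platform's built-in rule definitions.

```python
import re

# Assumed pattern: 11 digits, starting with 1, second digit 3 to 9.
MOBILE_PATTERN = re.compile(r"1[3-9]\d{9}")

RULES = {
    "not_null": lambda value: value not in (None, ""),
    "max_length_20": lambda value: value is None or len(str(value)) <= 20,
    "mobile_format": lambda value: value is None
    or MOBILE_PATTERN.fullmatch(str(value)) is not None,
}


def run_checks(rows, plan):
    """plan maps field -> rule names; returns {(field, rule): failing row indexes}."""
    failures = {}
    for field, rule_names in plan.items():
        for rule_name in rule_names:
            check = RULES[rule_name]
            bad = [i for i, row in enumerate(rows) if not check(row.get(field))]
            if bad:
                failures[(field, rule_name)] = bad
    return failures


rows = [
    {"name": "Li Lei", "mobile": "13812345678"},
    {"name": "", "mobile": "12345"},
    {"name": "Han Meimei", "mobile": None},
]
plan = {"name": ["not_null", "max_length_20"], "mobile": ["not_null", "mobile_format"]}
report = run_checks(rows, plan)
```

Format rules skip null values in this sketch so that completeness and conformity are reported separately.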
7. Data Asset Management
| Submodule | Core Capabilities |
|---|---|
| Asset Marketplace | A “data supermarket” for browsing, searching, and requesting data assets |
| Data Source Table Assets | Asset cataloging, business classification, lineage tracing, multidimensional evaluation |
| Metrics System | Atomic metrics, derived metrics, and composite metrics to build a three-level metrics system |
| API Assets | API browsing, request, and approval |
| File Management | Document storage, upload, and archiving |
| Intelligent Recognition | OCR, document summarization, and keyword extraction for multimodal data such as images, audio, video, and documents |
The Data Asset Management Center helps enterprises transform data resources into data assets, turn data assets into services, and convert services into business value. It enables the construction of a unified data asset catalog, data asset marketplace, and asset operation system.
8. Data Sharing Service Center
| Submodule | Core Capabilities |
|---|---|
| Automatic API Generation | Wizard-based one-click conversion of data tables into RESTful APIs |
| API Marketplace | API publishing, registration, version management, and traffic monitoring |
| Dynamic Data Masking | Automatic masking during API calls |
| Approval Workflow | Full lifecycle management: data request → approval → subscription → authorization |
| Interface Marketplace | API publishing/unpublishing management with customizable approval workflows |
The Data Sharing Service Center provides governed data assets externally through APIs and an interface marketplace. It supports the full lifecycle of data requests, approvals, authorization, invocation, monitoring, and retirement.
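Dynamic masking during an API call can be pictured as a per-field policy applied to each response record. The sketch below covers character masking, hashing, and character replacement. SM4 encryption is omitted because Python's standard library does not provide it. All function names and the policy structure are hypothetical.

```python
import hashlib


def mask_characters(value, keep_start=3, keep_end=4, mask="*"):
    """Character masking: keep the ends, hide the middle."""
    if len(value) <= keep_start + keep_end:
        return mask * len(value)
    hidden = len(value) - keep_start - keep_end
    return value[:keep_start] + mask * hidden + value[len(value) - keep_end:]


def hash_value(value, salt=""):
    """Hash masking: irreversible SHA-256 digest of the (optionally salted) value."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def replace_characters(value, replacement="X"):
    """Character replacement: swap every letter or digit, keep separators."""
    return "".join(replacement if ch.isalnum() else ch for ch in value)


def mask_response(record, policy):
    """Apply the field-level masking policy to one API response record."""
    return {
        field: policy[field](value) if field in policy else value
        for field, value in record.items()
    }


record = {"name": "Li Lei", "phone": "13812345678", "email": "lilei@example.com"}
policy = {"phone": mask_characters, "email": hash_value}
masked = mask_response(record, policy)
```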
9. Data Security and Compliance
| Submodule | Core Capabilities |
|---|---|
| Classification and Grading | Automatic sensitive data scanning and data classification, supporting S1–S5 grading |
| Encryption | Supports Chinese national cryptographic algorithms SM2, SM3, and SM4 |
| Data Masking | 4 masking algorithms: character masking, SM4 encryption, HASH, and character replacement |
| Dual-Sandbox Isolation | “Data Black Box · Model White Box”: production sandbox data is not visible; development sandbox uses only sample data; models can be published to production with one click |
| End-to-End Lineage | Full traceability from source systems to applications |
| Tamper-Resistant Audit | Full operation records with hash-based evidence preservation |
| Compliance | Supports compliance with laws and regulations such as the Data Security Law and Personal Information Protection Law of China |
The Data Security and Compliance module runs through the entire process of data ingestion, development, governance, sharing, and application. It ensures that data is usable but not directly visible, controllable and auditable, traceable and compliant.
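One common way to make an audit trail tamper-evident is a hash chain, in which each entry stores the hash of the previous one. The sketch below illustrates that general technique with SHA-256. It is an assumption about how hash-based evidence preservation could work, not a description of the platform's actual mechanism.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash preceding the first entry


def entry_hash(previous_hash, payload):
    """Hash the previous entry's hash together with the canonicalized payload."""
    body = json.dumps(payload, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256((previous_hash + body).encode("utf-8")).hexdigest()


def append_entry(log, payload):
    """Append an audit record linked to the previous record."""
    previous = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "prev": previous, "hash": entry_hash(previous, payload)})


def verify(log):
    """Recompute the chain; any edited or reordered entry breaks verification."""
    previous = GENESIS
    for entry in log:
        if entry["prev"] != previous or entry["hash"] != entry_hash(previous, entry["payload"]):
            return False
        previous = entry["hash"]
    return True


audit_log = []
append_entry(audit_log, {"user": "alice", "action": "export", "table": "orders"})
append_entry(audit_log, {"user": "bob", "action": "grant", "role": "analyst"})
intact = verify(audit_log)
```

Changing any stored payload afterwards causes `verify` to return `False`, because the recomputed hash no longer matches the stored one.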
10. Visual Data Warehouse Modeling
| Submodule | Core Capabilities |
|---|---|
| Kimball Dimensional Modeling | Visual creation of dimension tables and fact tables |
| Drag-and-Drop Cube Design | Multidimensional cubes supporting slicing, roll-up, and drill-down |
| Three-Level Metrics System | Atomic metrics → derived metrics → composite metrics |
| Database-Agnostic Design | Supports any compatible database as the data warehouse backend, such as MySQL, Oracle, Doris, Greenplum, Hive, etc. |
Visual data warehouse modeling helps enterprises build subject-domain models, dimensional models, fact models, and metrics systems in a low-code way, reducing the complexity of traditional data warehouse modeling.
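The three-level metrics system can be illustrated with plain functions. This sketch follows a common convention: an atomic metric is a base aggregation, a derived metric adds qualifiers such as a dimension filter, and a composite metric combines other metrics arithmetically. The metric names and sample data are hypothetical.

```python
orders = [
    {"region": "east", "amount": 120.0},
    {"region": "east", "amount": 80.0},
    {"region": "west", "amount": 50.0},
]


def atomic_order_amount(rows):
    """Atomic metric: total order amount."""
    return sum(row["amount"] for row in rows)


def atomic_order_count(rows):
    """Atomic metric: number of orders."""
    return len(rows)


def derived(atomic, rows, **filters):
    """Derived metric: an atomic metric restricted by dimension filters."""
    selected = [row for row in rows if all(row.get(k) == v for k, v in filters.items())]
    return atomic(selected)


def composite_average_order_value(rows, **filters):
    """Composite metric: derived amount divided by derived count."""
    count = derived(atomic_order_count, rows, **filters)
    return derived(atomic_order_amount, rows, **filters) / count if count else 0.0


east_amount = derived(atomic_order_amount, orders, region="east")
east_average = composite_average_order_value(orders, region="east")
```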
11. BI Analytics and Visualization
| Submodule | Core Capabilities |
|---|---|
| Built-In BI | Built on the open-source DataEase project |
| Visual Dashboards | Drag-and-drop report creation with no coding required |
| Chart Types | Bar charts, line charts, pie charts, dashboards, large-screen visualizations |
| Self-Service Analytics | Business-user-friendly analytics interface |
The BI Analytics and Visualization module supports business analysis, operational monitoring, KPI dashboards, and large-screen data visualization. It provides self-service analytics capabilities for business users.
12. AI Intelligence Center
| Submodule | Core Capabilities |
|---|---|
| Large Model Configuration | Connects to public cloud LLMs such as Tongyi Qianwen and ERNIE Bot, or privately deployed models |
| AI Agent | Data collection agents and data development agents with editable prompt templates |
| LangChain Orchestration | Multi-tool + LLM collaborative workflows |
| Planned Capabilities | API extensions, MCP / Model Context Protocol extensions, and Skills plugin mechanism |
The AI Intelligence Center provides unified large language model access, agent orchestration, and intelligent assistance capabilities for the platform. It supports intelligent scenarios such as data collection, data development, data quality inspection, data asset cataloging, and knowledge Q&A.
13. Upgrade Path to a Trusted Data Space
| Submodule | Core Capabilities |
|---|---|
| Zero-Trust Architecture | Connector management and automatic deployment |
| Sample Engine | Differential privacy, synthetic data, and format-preserving encryption |
| Space Management | Independent data spaces and compliant cross-space sharing |
| Blockchain Evidence Preservation | Tamper-resistant logs + blockchain-based evidence storage |
Ottomi Nexus can be further upgraded into a trusted data space foundation, supporting secure data circulation, compliant sharing, and trusted collaboration among multiple parties.
14. Task Scheduling Engine
The Task Scheduling Engine is responsible for unified orchestration, scheduling, execution, and monitoring of tasks within the platform, including data collection, data development, quality checks, standard implementation assessment, data synchronization, and script execution.
| Submodule | Core Capabilities |
|---|---|
| DolphinScheduler Integration | Provides distributed task scheduling and supports complex workflow orchestration |
| Scheduling Configuration | Supports schedule configuration by second, minute, hour, day, etc. |
| Dependency Orchestration | Supports complex upstream and downstream workflow dependencies |
| Monitoring and Alerts | Supports runtime log monitoring, task status monitoring, and exception alerts |
| Parallel Computing Engine | Uses concepts from SeaTunnel such as host, engine node, and resource group to enable cross-host, multi-node parallel computing |
| Resource Group Scheduling | Supports assigning business tasks to resource groups. The platform automatically schedules all cross-host compute nodes within the resource group for parallel execution |
14.1 Distributed Task Scheduling
The platform integrates DolphinScheduler to provide task workflow orchestration, scheduled execution, dependency management, failure retry, backfill execution, and runtime monitoring.
Typical capabilities include:
- Unified scheduling for data synchronization tasks, ETL tasks, SQL script tasks, Shell / Python / Flink scripts, and other script tasks;
- Support for upstream and downstream task dependencies;
- Support for task failure retries;
- Support for task reruns and backfill;
- Support for periodic task configuration;
- Support for task runtime logs and execution status monitoring.
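Dependency ordering and failure retry can be sketched with Python's standard `graphlib` module. This illustrates the concepts only. It does not use or represent the DolphinScheduler API, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies, max_retries=1):
    """Run tasks in dependency order, retrying a failed task up to max_retries times."""
    order = list(TopologicalSorter(dependencies).static_order())
    executed = []
    for name in order:
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                executed.append(name)
                break
            except Exception:
                if attempt == max_retries:
                    raise  # retries exhausted: fail the workflow
    return executed


attempts = {"load_dwd": 0}


def flaky_load():
    """Fails on the first call to simulate a transient error."""
    attempts["load_dwd"] += 1
    if attempts["load_dwd"] == 1:
        raise RuntimeError("transient failure")


tasks = {
    "extract_ods": lambda: None,
    "load_dwd": flaky_load,
    "build_ads": lambda: None,
    "quality_check": lambda: None,
}
# Each key depends on the tasks in its set (upstream tasks run first).
dependencies = {
    "load_dwd": {"extract_ods"},
    "build_ads": {"load_dwd"},
    "quality_check": {"load_dwd"},
}
executed = run_workflow(tasks, dependencies, max_retries=1)
```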
14.2 Parallel Computing Engine
The platform includes a built-in parallel computing engine. It adopts core concepts from SeaTunnel, including host, engine node, and resource group, to provide cross-host and cross-node parallel execution capabilities for data synchronization, data transformation, and batch processing tasks.
Its core execution model is:
Business task → assigned to a resource group → automatically scheduled across all compute nodes in the group for parallel execution
Detailed explanation:
- Host: A physical machine, virtual machine, or container runtime environment that hosts compute nodes;
- Engine Node: A compute execution node deployed on different hosts and responsible for actual data processing tasks;
- Resource Group: A collection of compute resources composed of multiple engine nodes. It can be divided by business domain, task type, environment, or resource specification;
- Task Assignment: A business task can specify the resource group in which it will run;
- Automatic Scheduling: After a task is submitted, the platform automatically schedules available compute nodes within the resource group;
- Parallel Execution: Multiple cross-host compute nodes within the same resource group can process tasks in parallel, improving efficiency for large-scale data synchronization, transformation, and processing;
- Elastic Scaling: Computing capacity can be expanded by adding hosts and engine nodes;
- Resource Isolation: Different business tasks can be bound to different resource groups to prevent resource contention.
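The execution model above can be approximated in a single process. In this sketch, threads stand in for engine nodes and a dictionary stands in for resource group configuration. The group names, node names, and the round-robin split are assumptions for illustration and do not describe the engine's real scheduling algorithm.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical resource groups: each lists engine nodes as "host/node".
RESOURCE_GROUPS = {
    "batch_sync": ["host-a/node-1", "host-a/node-2", "host-b/node-1"],
    "realtime": ["host-c/node-1"],
}


def split_round_robin(items, parts):
    """Distribute items across the given number of shards."""
    return [items[i::parts] for i in range(parts)]


def run_on_group(group, tables, work):
    """Assign a task to a resource group and run its shards on all nodes in parallel."""
    nodes = RESOURCE_GROUPS[group]
    shards = split_round_robin(tables, len(nodes))

    def run_shard(node, shard):
        return [(node, table, work(table)) for table in shard]

    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        shard_results = list(pool.map(run_shard, nodes, shards))
    return [item for result in shard_results for item in result]


tables = [f"orders_{i}" for i in range(5)]
results = run_on_group("batch_sync", tables, work=str.upper)
```

Binding a different task to the `realtime` group would keep it off the `batch_sync` nodes, which is the isolation property described above.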
This capability is suitable for:
- Large-scale table synchronization;
- Concurrent extraction of multiple tables;
- Batch file processing;
- CDC data consumption and processing;
- Parallel computing for offline ETL tasks;
- Cross-system data migration;
- Compute resource isolation across multiple business domains.
15. Operations and Maintenance Management
| Submodule | Core Capabilities |
|---|---|
| Hardware Monitoring | Service status monitoring |
| Data Backup | Backup of configuration databases and configuration files |
| High Availability | Primary-standby architecture + automatic failover |
The Operations and Maintenance Management module ensures stable platform operation. It supports deployment status monitoring, service health checks, configuration backup, failure recovery, and high-availability operation.
Summary
The core product philosophy of Ottomi Nexus can be summarized as follows:
- “Data Black Box · Model White Box”: The dual-sandbox mechanism keeps data secure and controlled while keeping models transparent and auditable.
- “Modeling Through Conversation”: The AI Canvas Assistant converts natural language instructions into visual workflows.
- “All-in-One Package”: The platform can be quickly deployed with a single Docker Compose command.
- “Standards First, Closed-Loop Governance”: Data standards, quality management, standard implementation assessment, and asset operations form a complete enterprise data governance loop.
- “Multi-Source Ingestion, Unified Management”: Supports multiple ingestion methods including table extraction, CDC synchronization, API-based collection, and file collection.
- “Parallel Computing, Elastic Scheduling”: Uses the concepts of hosts, engine nodes, and resource groups to enable cross-host, multi-node parallel execution.
- “Foundation for Multimodal AI Data”: Provides standardized, asset-oriented, and intelligent processing capabilities for multimodal data such as text, images, audio, video, and documents.
- “Enterprise-Grade Security and Compliance”: Builds secure and trusted data infrastructure with 6-level permission granularity, 4 data masking algorithms, national cryptographic algorithms, end-to-end auditing, and data classification and grading.
Ottomi Nexus 3.0 integrates data ingestion, data standards, data quality, data development, asset management, sharing services, AI intelligence, and task scheduling into one unified platform. It provides government and enterprise customers with complete capabilities from data resources to data assets, from data governance to AI applications, and from a standalone platform to a trusted data space.