Data Ingestion and Collaborative Mapping

Anzo connects to both internal and external data sources, including cloud or on-premises Hadoop-based data lakes, to rapidly ingest and catalog large volumes of structured and unstructured data. Ingestion runs through horizontally scaled, automated Extract, Transform and Load (ETL) processes that can be mapped to establish a Semantic Layer of business meaning.

Able to sustain extremely high parallel load rates, Anzo adds large amounts of rich data to Enterprise Knowledge Graphs or tabular targets in just minutes. Anzo ingests most structured data without manual mapping, automatically creating a graph model from the data structure or logical model. This model may be enriched by applying any available metadata, data dictionaries or taxonomies. Collaborative mapping capabilities allow business and data analysts to create sophisticated transformations, add business meaning and establish relationship linkages during data movement.
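The idea of deriving a graph model directly from a table's structure can be illustrated with a minimal Python sketch. This is not Anzo's implementation or API: the function name `rows_to_triples`, the `http://example.org/` base IRI and the sample `employee` rows are all hypothetical, chosen only to show how each row can become a typed node and each column a property.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def rows_to_triples(
    table: str,
    key: str,
    rows: Iterable[Dict[str, object]],
    base: str = "http://example.org/",  # hypothetical namespace
) -> List[Triple]:
    """Turn tabular rows into (subject, predicate, object) triples.

    The table name becomes a class, the key column identifies each
    node, and every other non-null column becomes a property.
    """
    class_iri = f"{base}{table}"
    triples: List[Triple] = []
    for row in rows:
        subject = f"{base}{table}/{row[key]}"
        triples.append((subject, "rdf:type", class_iri))
        for column, value in row.items():
            if column == key or value is None:
                continue  # the key is already in the subject; skip nulls
            triples.append((subject, f"{base}{table}#{column}", str(value)))
    return triples


# Hypothetical sample data standing in for a source table.
sample_rows = [
    {"id": 1, "name": "Ada", "dept": None},
    {"id": 2, "name": "Grace", "dept": "R&D"},
]
triples = rows_to_triples("employee", "id", sample_rows)
for triple in triples:
    print(triple)
```

In this sketch, enrichment from a data dictionary or taxonomy would amount to replacing the generated property names with curated ones before the triples are loaded.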

Anzo can also ingest to non-graph targets, using the Semantic Layer as a business-understandable canonical model in a virtual hub-and-spoke ETL that moves and transforms data between data environments such as data warehouses or Apache Hive. Point-to-point Apache Spark jobs that require no programming are generated automatically from reusable mappings between all the sources, targets and the Semantic Layer.
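The hub-and-spoke approach can be sketched in a few lines of Python: each source and each target is mapped once to the canonical model, and a point-to-point mapping is derived by composing the two. This is an illustration under stated assumptions only. The column names, the `compose` and `transform` helpers and the sample row are hypothetical, and the sketch uses plain dictionaries rather than the Spark jobs the product generates.

```python
from typing import Dict


def compose(
    source_to_canonical: Dict[str, str],
    canonical_to_target: Dict[str, str],
) -> Dict[str, str]:
    """Derive a direct source-to-target column mapping via the canonical model.

    Source columns whose canonical term has no counterpart in the
    target are dropped from the derived mapping.
    """
    return {
        source_col: canonical_to_target[term]
        for source_col, term in source_to_canonical.items()
        if term in canonical_to_target
    }


def transform(row: Dict[str, object], mapping: Dict[str, str]) -> Dict[str, object]:
    """Rename the columns of one row according to a derived mapping."""
    return {mapping[col]: value for col, value in row.items() if col in mapping}


# Hypothetical reusable mappings: one per spoke, each defined only once.
crm_to_canonical = {
    "cust_nm": "customerName",
    "cust_id": "customerId",
    "legacy_flag": "legacyFlag",
}
canonical_to_warehouse = {
    "customerName": "CUSTOMER_NAME",
    "customerId": "CUSTOMER_ID",
}

# The point-to-point mapping is derived, not hand-written.
crm_to_warehouse = compose(crm_to_canonical, canonical_to_warehouse)
result = transform({"cust_nm": "Acme", "cust_id": 7, "legacy_flag": "Y"}, crm_to_warehouse)
print(result)
```

The design benefit is that adding a new source or target requires one new mapping to the canonical model rather than one mapping per existing counterpart.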

Unlike approaches that only flatten data for the benefit of Big Data tools, Anzo's ingestion capabilities preserve the multi-dimensional data model sourced from upstream applications and relational databases. Unstructured data is processed in parallel through configurable text analytics and Natural Language Processing (NLP) pipelines and harmonized with data from multiple sources in the knowledge graph.

Ingested graph data or target tables come to rest in scalable shared file storage: HDFS, cloud buckets, NFS or Apache Hive. For organizations wary of duplicating data, virtual data sets are another option: data is pulled from the sources only on demand and ingested directly into memory for analytics, as needed.

Users have access to the full provenance and lineage of all data in the catalog through a friendly visual interface. Before ingestion, Anzo captures extensive metadata describing data sources, and this information gives data preparers and data consumers additional context in the data catalog or analytics.