An Automated Metadata Generation Method for Data Lake of Industrial WoT Applications

IEEE Transactions on Systems, Man, and Cybernetics: Systems · 2021
Cited by 18
ABS 3

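Summary

Proposes a bottom-up automated metadata generation method that annotates raw data as linked data through a data-driven framework, then uses self-organizing map-based online clustering and short-text-based entity discovery to build semantically rich, full-dimensional metadata for industrial IoT data lakes.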

Abstract

Recent trends in the Web of Things (WoT) have led to a data explosion. The data lake (DL), a flexible, on-demand architecture for managing heterogeneous data, has become a feasible solution for data management. Metadata modeling for DLs is the key basis for smart analysis and processing. However, the variety in the structures and semantics of industrial WoT data hinders metadata modeling and maintenance. Moreover, the lack of textual descriptions and the semantics hidden in value streams make it hard to construct semantic metadata automatically. The dynamic nature of WoT also requires metadata to evolve in a timely manner. To overcome these challenges, we propose an automated bottom-up metadata generation approach for DLs of WoT applications. Within a data-driven framework, raw data are annotated as linked data, and self-organizing map-based online clustering is applied to extract data characteristics in real time. To recognize entities, concepts, and relations, a semantics-based approach to entity discovery from short texts is proposed, tailored to the features of WoT data. Numerical analysis is performed to find hidden relations in the raw values. Full-dimensional metadata with rich semantic knowledge are finally built. Experiments on a real-world dataset verify the effectiveness of the methods, and a case study on an energy WoT system demonstrates the feasibility of the approach.

Internet of Things · Data lake · Metadata management · Semantic Web · Industrial applications