Data mining has conventionally focused on resident data stored in large data repositories. However, the growth of technologies such as wireless sensor networks has contributed to the emergence of data streams. One example of such a stream:

  • NASA’s Earth Observing System Data and Information System (EOSDIS) adds about 6.4 TB of data to its archives and distributes almost 28 TB worth of data to an average of 11,000 unique users around the world every day.

Characteristics of Data Streams:

  • Huge volumes of continuous data, potentially infinite
  • Fast-changing data that requires a fast, real-time response
  • Data stream methods can also be applied to massive non-streaming data
  • Random access is expensive, so single-scan algorithms are needed (each record can be looked at only once)
  • Only a summary of the data seen so far is stored
  • Most stream data are low-level or multi-dimensional in nature and need multi-level (ML) and multi-dimensional (MD) processing
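
The single-scan and summary-only constraints listed above can be illustrated with a small sketch. It assumes a stream of numeric readings and keeps only a constant-size summary (count, mean, variance, min, max), using Welford's online update as one possible single-pass technique. The names `StreamSummary`, `summarize`, and `sensor_readings` are hypothetical and chosen only for this illustration.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class StreamSummary:
    """Constant-size summary of all values seen so far (illustrative only)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0            # running sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    def update(self, x: float) -> None:
        # Welford's online update: each record is examined exactly once
        # and is not stored afterwards.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    @property
    def variance(self) -> float:
        # Population variance of the values seen so far.
        return self.m2 / self.count if self.count else 0.0


def summarize(stream: Iterable[float]) -> StreamSummary:
    """Consume the stream in a single pass and return its summary."""
    summary = StreamSummary()
    for x in stream:
        summary.update(x)
    return summary


def sensor_readings() -> Iterator[float]:
    # Hypothetical stand-in for an unbounded source; a generator cannot be
    # rewound, which mimics the "one look at each record" constraint.
    yield from (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)


summary = summarize(sensor_readings())
print(summary.count, summary.mean, summary.variance, summary.minimum, summary.maximum)
```

Memory use stays fixed regardless of how many records arrive, which is why the same approach can also be applied to massive non-streaming data that is too large to hold in memory.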

Examples of Data Streams:

  • Telecommunication calling records
  • Business: credit card transaction flows
  • Network monitoring and traffic engineering
  • Financial market: stock exchange
  • Engineering & industrial processes: power supply and manufacturing
  • Sensor, monitoring & surveillance: video streams, RFIDs
  • Security monitoring
  • Web logs and Web page click streams
  • Internet of Things

Difference between Streaming and Traditional Processing

[Figure: streaming vs. traditional processing (image.png)]