1.1 Big Data

2023-11-25 14:30:46

The problem of gathering Big Data
Solutions to problems of gathering data
Considering data storage formats and databases

The problem of gathering Big Data

Big data is characterised by four main problems, known as the four Vs.
Volume
- The sheer amount of data.
Variety
- The many different types of data from multiple sources, such as sensor readings, webpage entries and banking records. The data can be structured or unstructured.
Velocity
- The speed at which the data arrives and is processed.
Veracity
- How trustworthy and accurate the data is.

Solutions to problems of gathering data

Intelligent data processing
- Pre-processing
- Meta-data creation
Why you need metadata for Big Data Success

https://www.datasciencecentral.com/profiles/blogs/why-you-need-metadata-for-big-data-success
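
As a rough illustration of the two steps listed above, the sketch below cleans a small batch of records and then builds a metadata summary for it. The record layout (`source`, `value`), the function names `preprocess` and `build_metadata`, and the chosen metadata fields are all assumptions made for this example; they do not come from the linked article.

```python
import hashlib
import json
from datetime import datetime, timezone


def preprocess(records):
    """Drop records with no reading and normalise the remaining fields."""
    cleaned = []
    for rec in records:
        value = rec.get("value")
        if value is None:
            continue  # discard incomplete readings
        cleaned.append(
            {"source": rec["source"].strip().lower(), "value": float(value)}
        )
    return cleaned


def build_metadata(records):
    """Summarise a batch so it can be catalogued without rescanning the data."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return {
        "record_count": len(records),
        "sources": sorted({r["source"] for r in records}),
        "fields": sorted({key for r in records for key in r}),
        "checksum": hashlib.sha256(payload).hexdigest(),
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical raw input: inconsistent casing, whitespace and a missing value.
raw = [
    {"source": " Sensor-A ", "value": "21.5"},
    {"source": "sensor-b", "value": None},
    {"source": "Sensor-A", "value": 22},
]
cleaned = preprocess(raw)
metadata = build_metadata(cleaned)
print(metadata)
```

Keeping the summary separate from the data means a catalogue can answer questions such as "which sources does this batch cover?" without reading the batch itself.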

Considering data storage formats and databases

Traditionally, data is stored in what is called a relational database.
SQL is the standard query language for relational databases such as MySQL, Oracle and PostgreSQL.

Structured databases
Relational databases work by imposing a structure – relations between different data elements – on the data, and utilising this for efficient storage and querying.
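
A minimal sketch of this idea, using Python's built-in `sqlite3` module with an in-memory database. The `customer` and `account` tables and their columns are invented for illustration; the point is only that the schema declares a relation between the two tables and a query can then exploit it with a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Ask SQLite to enforce the declared foreign-key relation.
conn.execute("PRAGMA foreign_keys = ON")

# The structure is fixed up front: every account must point at a customer.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id),"
    " balance REAL NOT NULL)"
)

conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO account (customer_id, balance) VALUES (?, ?)",
    [(1, 100.0), (1, 250.5)],
)

# The relation lets one query combine both tables.
rows = conn.execute(
    "SELECT c.name, SUM(a.balance) FROM customer c"
    " JOIN account a ON a.customer_id = c.id GROUP BY c.id"
).fetchall()
print(rows)
```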

Unstructured data storage
A leading database for unstructured data is MongoDB, a type of system referred to as a document store, which has been gaining a significant following.

Document stores
All of these non-relational databases are grouped under the umbrella of NoSQL (Not Only SQL).

DB Rank

https://db-engines.com/en/ranking

NoSQL database

https://www.infoworld.com/article/3240644/what-is-nosql-databases-for-a-cloud-scale-future.html

Among the NoSQL databases, you will find four common models for storing data, which lead to four common types of NoSQL systems:
- Document databases (e.g. CouchDB, MongoDB).
  - Inserted data is stored in the form of free-form JSON structures or “documents,” where the data could be anything from integers to strings to free-form text. There is no inherent need to specify what fields, if any, a document will contain.
- Key-value stores (e.g. Redis, Riak).
  - Free-form values, from simple integers or strings to complex JSON documents, are accessed in the database by way of keys.
- Wide column stores (e.g. HBase, Cassandra).
  - Data is stored in columns instead of rows as in a conventional SQL system. Any number of columns (and therefore many different types of data) can be grouped or aggregated as needed for queries or data views.
- Graph databases (e.g. Neo4j).
  - Data is represented as a network or graph of entities and their relationships, with each node in the graph a free-form chunk of data.
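
To make the four models more concrete, here is a small sketch in which plain Python dictionaries and lists stand in for each kind of store. None of it uses the actual APIs of CouchDB, MongoDB, Redis, Riak, HBase, Cassandra or Neo4j, and the sample data is made up; it only mimics the shape of the data in each model, and the wide column part in particular is a simplification.

```python
import json

# Document model: free-form JSON documents; fields may differ per document.
documents = [
    json.loads('{"_id": 1, "name": "Ada", "tags": ["admin"]}'),
    json.loads('{"_id": 2, "name": "Bob", "age": 41}'),
]

# Key-value model: values of any shape, looked up only by their key.
kv_store = {"session:42": "Ada", "cart:42": json.dumps({"items": [3, 7]})}

# Wide column model (simplified): each row key has its own set of columns.
wide_rows = {
    "user:1": {"name": "Ada", "email": "ada@example.com"},
    "user:2": {"name": "Bob", "last_login": "2023-11-25"},
}

# Graph model: nodes hold free-form data, edges hold the relationships.
nodes = {
    "ada": {"kind": "person"},
    "bob": {"kind": "person"},
    "acme": {"kind": "company"},
}
edges = [("ada", "KNOWS", "bob"), ("ada", "WORKS_AT", "acme")]

# One typical access pattern per model.
with_age = [d["name"] for d in documents if "age" in d]
cart_items = json.loads(kv_store["cart:42"])["items"]
names = {key: cols["name"] for key, cols in wide_rows.items()}
ada_links = [(rel, dst) for src, rel, dst in edges if src == "ada"]
print(with_age, cart_items, names, ada_links)
```

The access patterns differ as much as the storage: documents are filtered by whatever fields they happen to have, key-value entries are fetched by exact key, wide rows are read one column at a time, and graphs are traversed along edges.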
