The problem of gathering Big Data
- Big data is characterised by four main problems, known as the four Vs.
- Volume
- The sheer amount of data.
- Variety
- The many different types of data from multiple sources, such as sensor readings, webpage entries and banking records. The data may be structured or unstructured.
- Velocity
- The speed at which data arrives and is processed.
- Veracity
- How accurate and trustworthy the data is.
Solutions to problems of gathering data
- Intelligent data processing
- Pre-processing
- Metadata creation
- Why you need metadata for Big Data Success
https://www.datasciencecentral.com/profiles/blogs/why-you-need-metadata-for-big-data-success
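As a rough illustration of the pre-processing and metadata creation steps listed above, the following minimal Python sketch filters out malformed readings and then builds a small metadata record for the cleaned batch. The function names, the `sensor-42` source label and the metadata fields are all hypothetical choices made for this example, not part of any particular tool.

```python
import hashlib
import json
from datetime import datetime, timezone


def preprocess(raw_readings):
    """Keep only readings that parse as numbers (a simple veracity check)."""
    clean = []
    for value in raw_readings:
        try:
            clean.append(float(value))
        except (TypeError, ValueError):
            continue  # drop malformed entries such as "n/a" or None
    return clean


def build_metadata(source, records):
    """Describe a batch of records so it can be found and checked later."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return {
        "source": source,                                # hypothetical label
        "record_count": len(records),
        "sha256": hashlib.sha256(payload).hexdigest(),   # integrity check
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


raw = ["21.5", "n/a", "22.1", None, "19.8"]
clean = preprocess(raw)
meta = build_metadata("sensor-42", clean)
print(clean)
print(meta["record_count"])
```

Storing the record count and a checksum next to the data is one simple way to detect later that a batch was truncated or altered; real pipelines typically record much richer metadata.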
Considering data storage formats and databases
- Traditionally, data is stored in what is called a relational database.
SQL is the standard query language for relational databases, including MySQL, Oracle and PostgreSQL.
Structured databases
Relational databases work by imposing a structure – relations between different data elements – on the data, and utilising this for efficient storage and querying.
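To make the idea of imposed structure concrete, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The `customers` and `orders` tables and their contents are invented for illustration; the point is only that the declared relation between the two tables is what the query uses to combine them.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The schema fixes the structure up front: every order must point at a customer.
conn.executescript("""
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    amount      REAL NOT NULL
);
""")

conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(1, 10.0), (1, 5.5), (2, 7.25)],
)

# The relation between the tables drives the join and the aggregation.
rows = conn.execute("""
SELECT c.name, SUM(o.amount)
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id, c.name
ORDER BY c.name
""").fetchall()

print(rows)
conn.close()
```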
Unstructured data storage
A leading database for unstructured data is MongoDB, a document store that has been gaining a significant following.
Document stores
All of these non-relational databases are grouped under the umbrella of NoSQL (Not Only SQL).
DB Rank
https://db-engines.com/en/ranking
NoSQL database
https://www.infoworld.com/article/3240644/what-is-nosql-databases-for-a-cloud-scale-future.html
- Among the NoSQL databases, you will find four common models for storing data, which lead to four common types of NoSQL systems:
- Document databases (e.g. CouchDB, MongoDB).
- Inserted data is stored in the form of free-form JSON structures or “documents,” where the data could be anything from integers to strings to freeform text. There is no inherent need to specify what fields, if any, a document will contain.
- Key-value stores (e.g. Redis, Riak).
- Free-form values, from simple integers or strings to complex JSON documents, are accessed in the database by way of keys.
- Wide column stores (e.g. HBase, Cassandra).
- Data is stored in columns instead of rows as in a conventional SQL system. Any number of columns (and therefore many different types of data) can be grouped or aggregated as needed for queries or data views.
- Graph databases (e.g. Neo4j).
- Data is represented as a network or graph of entities and their relationships, with each node in the graph a free-form chunk of data.
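The four models above can be pictured with plain Python data structures. This is only a conceptual sketch: it does not use any real database client, and the keys, column names and the `followed_by` helper are made up for the example.

```python
import json

# Document model: one self-describing JSON document per record.
document = json.loads('{"_id": "u1", "name": "Ada", "tags": ["admin", "dev"]}')

# Key-value model: a value (here a serialised document) looked up by its key.
key_value = {"user:u1": json.dumps(document)}

# Wide column model: each row key maps to its own set of named columns.
wide_column = {
    "u1": {"profile:name": "Ada", "stats:logins": 12},
    "u2": {"profile:name": "Grace"},  # rows need not share the same columns
}

# Graph model: nodes holding free-form data, plus labelled edges between them.
nodes = {"u1": {"name": "Ada"}, "u2": {"name": "Grace"}}
edges = [("u1", "FOLLOWS", "u2")]


def followed_by(user_id):
    """Return the names of the users that user_id follows."""
    return [
        nodes[dst]["name"]
        for src, rel, dst in edges
        if src == user_id and rel == "FOLLOWS"
    ]


print(document["tags"])
print(json.loads(key_value["user:u1"])["name"])
print(sorted(wide_column["u1"]))
print(followed_by("u1"))
```

The same user appears in all four shapes; which model suits a workload depends on whether the data is mostly read by key, by field, by column group or by relationship.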