The problem of gathering Big Data


  • Big data is characterised by four main problems, known as the four Vs.
  • Volume
    • The sheer amount of data.
  • Variety
    • The many different types of data from multiple sources, such as sensor readings, webpage entries and banking transactions. The data may be structured or unstructured.
  • Velocity
    • The speed at which data is generated and must be processed.
  • Veracity
    • How trustworthy the data is: its accuracy and reliability.


Solutions to problems of gathering data

  • Intelligent data processing
    • Pre-processing
    • Meta-data creation
  • Why you need metadata for Big Data Success

https://www.datasciencecentral.com/profiles/blogs/why-you-need-metadata-for-big-data-success
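As an illustration of the two processing steps listed above, the sketch below cleans a small batch of incoming records and then creates metadata describing the batch. It is a minimal sketch only: the function names `preprocess` and `create_metadata`, the source label `"sensor-feed"` and the chosen metadata fields (source, ingestion time, record count, field names) are hypothetical choices for illustration, not a standard.

```python
import json
from datetime import datetime, timezone


def preprocess(raw_records):
    """Hypothetical pre-processing step: drop empty records, trim whitespace
    from string values and normalise field names to lower case."""
    cleaned = []
    for record in raw_records:
        if not record:  # skip empty records
            continue
        cleaned.append({
            key.strip().lower(): (value.strip() if isinstance(value, str) else value)
            for key, value in record.items()
        })
    return cleaned


def create_metadata(records, source):
    """Hypothetical metadata step: describe where the batch came from, when it
    was ingested, how many records it holds and which fields appear in it."""
    fields = sorted({key for record in records for key in record})
    return {
        "source": source,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "record_count": len(records),
        "fields": fields,
    }


raw = [
    {" Sensor ": "t-101", "Reading": 21.5},
    {},
    {"sensor": "t-102", "reading": 22.1, "Unit ": " C "},
]
records = preprocess(raw)
metadata = create_metadata(records, source="sensor-feed")
print(json.dumps(metadata, indent=2))
```

Even this small amount of metadata lets later stages find out what a batch contains (for example, that only some records carry a `unit` field) without scanning the data itself.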

Considering data storage formats and databases

  • Traditionally, data is stored in what is called a relational database.
  • Relational databases such as MySQL, Oracle and PostgreSQL are queried using SQL (Structured Query Language), a standardised query language.

    Structured databases

  • Relational databases work by imposing a structure – relations between different data elements – on the data, and utilising this for efficient storage and querying.
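To make the idea of an imposed structure concrete, the sketch below defines two related tables and runs a query that joins them through that relation. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the server databases named above (MySQL, Oracle, PostgreSQL); the table and column names are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a relational database server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The schema is declared up front: every row must fit these columns,
# and each transaction is related to a customer through customer_id.
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute("""
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount REAL NOT NULL
    )
""")

cur.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                [(1, "Alice"), (2, "Bob")])
cur.executemany("INSERT INTO transactions (customer_id, amount) VALUES (?, ?)",
                [(1, 20.0), (1, 5.5), (2, 12.0)])
conn.commit()

# The relation between the tables is what makes this join possible.
rows = cur.execute("""
    SELECT c.name, SUM(t.amount)
    FROM customers AS c
    JOIN transactions AS t ON t.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Alice', 25.5), ('Bob', 12.0)]
conn.close()
```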

    Unstructured data storage

  • A leading database for unstructured data is MongoDB, a document store, which has been gaining a significant following.

    Document stores

  • All of these non-relational databases are grouped under the umbrella of NoSQL (Not Only SQL).
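For contrast with the relational approach, the sketch below imitates a document store in plain Python: documents with different fields live side by side in one collection and no schema is declared. This is a toy, in-memory illustration under that assumption; the `insert` and `find` helpers are hypothetical and are not MongoDB's API.

```python
import json

# A toy "collection": a list of JSON-compatible documents with no fixed schema.
collection = []


def insert(doc):
    """Hypothetical helper: store a JSON round-tripped copy of the document."""
    collection.append(json.loads(json.dumps(doc)))


def find(**criteria):
    """Hypothetical helper: return documents whose fields equal all criteria.
    A document that lacks a requested field never matches."""
    return [doc for doc in collection
            if all(key in doc and doc[key] == value for key, value in criteria.items())]


# Documents in the same collection can carry completely different fields.
insert({"_id": 1, "type": "sensor", "device": "t-101", "reading": 21.5})
insert({"_id": 2, "type": "webpage", "url": "/home", "text": "Welcome back!"})
insert({"_id": 3, "type": "sensor", "device": "t-102", "reading": 22.1, "unit": "C"})

sensors = find(type="sensor")
print([doc["device"] for doc in sensors])  # ['t-101', 't-102']
```

The trade-off is that the database no longer enforces structure; code reading the documents has to cope with fields that may be missing.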

DB Rank

https://db-engines.com/en/ranking

NoSQL databases

https://www.infoworld.com/article/3240644/what-is-nosql-databases-for-a-cloud-scale-future.html

  • Among the NoSQL databases, you will find four common models for storing data, which lead to four common types of NoSQL systems:
    • Document databases (e.g. CouchDB, MongoDB).
      • Inserted data is stored in the form of free-form JSON structures or “documents,” where the data could be anything from integers to strings to free-form text. There is no inherent need to specify what fields, if any, a document will contain.
    • Key-value stores (e.g. Redis, Riak).
      • Free-form values (from simple integers or strings to complex JSON documents) are accessed in the database by way of keys.
    • Wide column stores (e.g. HBase, Cassandra).
      • Data is stored in columns instead of rows as in a conventional SQL system. Any number of columns (and therefore many different types of data) can be grouped or aggregated as needed for queries or data views.
    • Graph databases (e.g. Neo4j).
      • Data is represented as a network or graph of entities and their relationships, with each node in the graph a free-form chunk of data.
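The remaining three models can be sketched with ordinary Python data structures, as below. These are in-memory illustrations of the data models only, not the APIs of Redis, Riak, HBase, Cassandra or Neo4j. The wide column part is loosely modelled on a row key → column family → column layout and omits timestamps and versioning; all keys, names and the `connect`/`hops` helpers are hypothetical.

```python
from collections import defaultdict, deque

# Key-value store: values of any shape, reachable only through their key.
kv = {}
kv["session:42"] = {"user": "alice", "cart": ["book", "pen"]}
kv["counter:visits"] = 1017

# Wide column store: row key -> column family -> {column: value}.
# Rows do not need to share the same columns.
wide = defaultdict(lambda: defaultdict(dict))
wide["user:alice"]["profile"]["email"] = "alice@example.com"
wide["user:alice"]["activity"]["2024-01-01"] = "login"
wide["user:bob"]["profile"]["email"] = "bob@example.com"

# Graph: each node holds a free-form chunk of data; edges are relationships.
nodes = {"alice": {"age": 30}, "bob": {"age": 25}, "carol": {"role": "admin"}}
edges = defaultdict(set)


def connect(a, b):
    """Hypothetical helper: record an undirected relationship between two nodes."""
    edges[a].add(b)
    edges[b].add(a)


def hops(start, goal):
    """Breadth-first search: number of relationships between two nodes, or None."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in edges[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


connect("alice", "bob")
connect("bob", "carol")

print(kv["session:42"]["cart"])           # ['book', 'pen']
print(sorted(wide["user:alice"].keys()))  # ['activity', 'profile']
print(hops("alice", "carol"))             # 2
```

Following relationships is the operation a graph model is built around: the `hops` query walks edges directly instead of joining tables.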
