Databases

The concept of a database has never really changed, however, the way in which data is stored and retrieved has seen some major changes over the years. We now have two main types of databases ‘SQL Relational Databases’ and ‘No SQL Databases.’ The difference between these two databases are complex from their infrastructure to the functionality, however, they can be summarised by the way in which data is stored hence the two terms SQL Relational Database and No SQL Database (also referred to as ‘Not Only SQL’).

The use of the SQL language is consistent across all relational databases (Oracle, MySQL, MsSQL etc…) this tutorial assumes that you are using Oracle 11g Express (free to download). However, although SQL can be used with all SQL databases, you need to be aware that each database has there own little ‘special way of doing it.’ For example, one database provides a data type of Integer where another offers Number or one database may use quotes and the other apostrophes. Having said this all SQL databases will use the same keywords and the SQL statement structure. Also, it is important to recognise that SQL commands can be divided into groups:

  • Data Manipulation Language (DML) is about adding, updating, modifying and deleting data.
  • Data Definition Language (DDL) is about the creating and altering the structure of the database
  • Data Control Language (DCL) is about controlling access to the data and database components
  • Transaction Control Language (DTL) is about controlling transaction within the database

Understanding the concept of a data store in the modern world is difficult especially given the complexity of infrastructure and functionality in relation to the intended purpose of the data store. I use the following term to help me remember the major factors in understanding the infrastructure and functionality and data stores, which I very briefly explain below.

D = BASE ACID Star Cap

 

Distribution

  • Homogeneous vs heterogeneous
  • Immediate consistency and accuracy vs Scalability and Resilience
  • The 3V’s: Data Volume, Velocity, and Variety
  • Data Security vs Time to Consistency
  • Global Transaction Manager (node assumes temporarily control)
  • Aggregate Stores
    • Document, Column Family, and Key-Value

Consistency Models: BASE vs ACID

  • BASE: Basic availability, soft state, eventual consistency
  • ACID: Atomic, consistency, isolated durable

Consistency models, in simple terms, are about the speed at which data is stored and acceptable. If we are using an ACID consistency model (SQL Relational Database) the data is immediately consistent across the data store and available to all. Whereas using a BASE consistency model (No SQL Database) the data will take time to filter across all the nodes which means that data is not immediately available to all and as such the consistency is not guaranteed.

Scalability and Participation Tolerance

  • Horizontal (> in Nodes)
  • Vertical (> in Capacity)
  • Elasticity (> and < scalability)

Transparency

  • Data replication
  • Fragmentation
    • Horizontal (subset of rows)
    • Vertical (subset of attributes)

Availability and Reliability

  • Acces to database
  • Database working

CAP Theorem (triangle)

Consistency, Availability and Partition Tolerance