History of Database
In 1970, a journal named Communications of ACM published a paper named ’A Relational Model of Data for Large Shared Data Banks’ by Dr. E. F. Codd. The concept of a Relational Database was given a theoretical and mathematical foundation in this study. A relational database is a type of information system that displays information in the form of rows in tables with one or more columns.
Dr. Codd’s study provided information on relational database aspects such as attributes, relations domains, and a data storage system with three properties.
- Logical Data Independence – Changes to one attribute have no impact on the other columns.
- Referential and Data Integrity – Primary Key(PK), Foreign Key(FK) etc.
- Ad Hoc Query – Data Retrieval.
However, there were several drawbacks to using these database systems. Data retrieval was impossible for the non-programmer unless a specific program was built. To complete simple tasks, numerous programs had to be modified.
Relational databases were developed in the 1980s, allowing users to analyze data through interactive queries and reports.
RDBMS (Relational Database Management Systems) is software that manages relational databases and is provided by vendors (for example, Oracle).
SQL is a non-procedural language that allows users to specify how they want to access and alter data. In the later stages of this dbms tutorial, we’ll learn more concepts about database management.
Logical Data Model
The core of any database application is the logical data model. It is a representation of data elements and their relationships. One of the most prevalent ways for constructing a logical data model is the entity-relationship model.
Logical Data Model.
Relational Database Theory
Some basic concepts of Relational Database Theory:
- Every entity has a set of attributes that describes the same. Eg. An entity named Topic would describe various topics covered in a subject.
- Each instance of an attribute is called a row. Eg. A row in the Topic entity would describe the specific topic in a course.
- Some of the entity’s attributes uniquely define a row. That’s the Primary Key.
Eg. Topic ID in topic entity would uniquely define a specific topic.
- Entities are linked to each other.
- A FK is an attribute in one entity whose values must exist as Primary Key in another entity.
An entity is implemented as a database table. Attributes of the same are implemented as Columns and a single set of column values as Row(a.k.a Record).
Order of Rows
There is no way to order the table. The order in which rows are obtained can only be specified by a programmer. This is one of the Relational Database’s power packs since it allows you to examine tables abstractly rather than physically.
Order of Columns
Columns, like rows, do not have any sorting. The columns are returned by SQL Query in the same order as they were when the table was created. They can, however, be stated in order. It is also possible to change the columns without impacting the others.
Each entity contains a collection of properties that define a row uniquely, as explained in the previous section. This helps in removing duplicate data from the database, implying that each table has a Primary Key. The term “data integrity” describes this concept. Eg: Each employee in a company has a distinct ID.
The Primary Key, as defined, helps in the unique identification of an entity instance. Students can be uniquely defined by Student ID, Course may be uniquely defined by CourseID, and Department may be uniquely defined by DepartmentID.
Can a primary key be Null?
There is a rigorous restriction in Relational Theory that no portion of PK can be null. This is self-evident since PK uniquely identifies a row; if it is Null, it cannot identify anything. Eg. a Course without a CourseID cannot be identified.
Foreign keys can be used to define relationships between tables. A foreign key can be one or more columns for which the PK of the second table contains a set of possible values. When values in a foreign key column are restricted to the referencing PK, referential integrity is established.
Association between two entities is defined using relationships. Generally in a relationship, entities are defined as Parent and Child(one as Parent and the other as Child).
Characteristics of relationship
- Identifying(IR) or Non-Identifying(NIR) : IR is the one wherein PK of parent forms a part of PK of the child. In NIR no such part exists.
- Cardinality : Cardinality is defined as the number of rows in the child entity that must be present for a single row in the parent entity to exist. Cardinality can be 0, 1, many. Cardinality may be mandatory.
- When a row in the parent entity must have a certain number of rows in the child entity, the relationship is considered mandatory. A relationship can be treated as optional when a row in the parent entity can exist without a linked row in the child entity.
- Integrity Rules define the action to be executed when a row in a parent/child entity is changed/modified in some way.
Null Values are the primary distinction between relational and non-relational database technologies. The term Null denotes that a column’s value is either not applicable or unassigned.
A relational database allows the user to fix the column value equal to null or perform a test if the column is null.
Normalization theory is basically the study of tables, columns, and the dependency of the same.
Following are the normalization objectives
- Minimize Data Redundancy.
- Avoid Anomalies.
- Reduction in data inconsistency.
- Designing data structures for easier maintenance.
Normalization theory specifies the expected arrangement of tables(relations) and columns(Attributes) as Normal Forms. There are 3 types of Normal Forms namely 1NF, 2NF, 3NF. Other NFS i,e. Boyce Codd and NFs are used for resolving more difficult normalization challenges.
Normalization Rule 1
A table is said to be in First Normal Form(1NF) if all of its attributes are atomic i,e. each attribute should consist of a single fact about the entity. eg. When recording a piece of information about a student, he/she might have given 1 or more contact numbers which will be stored in a single field. For a table to be in 1NF, this contact field should be changed accordingly to consist of only one entry(contact number in this case).
First Normal Form.
Normalization Rule 2
For an entity to be in Second NF(2NF), two conditions should be satisfied namely, the table must be in 1NF and all columns must depend on PK.
Second Normal Form.
Data redundancy is eliminated using 2NF.
Normalization Rule 3
To be in Third Normal Form(3NF), columns of a table must be entirely dependent upon the primary key. Entirely is the keyword here to be focused upon.
For example: Consider a table named course which consists of three columns namely CourseID, Department ID, and DeptChairPerson, Now notice that CourseID and DepartmentID can be used to define a course uniquely(both the columns are needed) unlike DeptChairPerson which can be defined only using DepartmentID. So it depends on part of the primary key not Entirely on the primary key.
This concept is often referred to as the Derived Column. In the above example, DeptChairPerson can be derived using DepartmentID.
DO’s and DON’Ts of Normalization
- DO ensure that the design is in 3NF.
- DON’T make assumptions about poor performance. Generate realistic test data and characterize the performance of DB.
- DO try to solve the performance issue with hardware improvements.
- DON’T denormalize tables unless there is a complete understanding of trade-offs.
With this, we are at the end of the DBMS Tutorial. This blog covers the basic concepts of database management systems. We hope that you were able to gain valuable insights from the same. Apart from a DBMS Tutorial, you can also learn by taking up various free online courses. Happy Learning!