DBMS is an abbreviation for Data Base Management System. The DBMS is a collection of interrelated data and a set of programs to access this data in a convenient and efficient way. It controls the organization, storage, retrieval, security and integrity of data in a database.
Levels/Abstraction of DBMS
The generalized architecture of DBMS is called ANSI/SP ARC model. The architecture is divided into three levels:
1. External level or View level/User view
It is the highest level of data abstraction. Here, each user has a different external view and it is described by means of a scheme called external scheme.
For Example- The student of University can view his detail in university record but cannot access other student details.
2. Conceptual level or Logical level
It describes data stored in database, and the relationship among the data.
Here we are not only concerned of how the data is stored but also how we can deal with the data.
3. Internal level / Storage level or Physical level
This is the lowest level of abstraction. It describes how the record is stored, what is the data storage structure and the method of accessing this data.
Schemas and Instances :
Schema: Schema can be defined as the design of a database. It can also be called as the plan of organizing a database.
For Example- Plan for construction of a building is schema of building construction.
Note- Schemas are not the actual data, these are meta-data.
Type of Schemas :
- Physical Schema: The design of database at physical level is called physical schema. It tells actually how data is physically organized in a database.
Logical Schema: The logical schema can be defined as the design of the database at the logical level. It is the overall logical structure of the database.
- Consumer schema:
Name Customer ID Account No
- Account schema:
Account No Account Type Available Balance Interest Rate
- Consumer schema:
- View Schema: View schema can be defined as the design of a view-level database that typically describes end-user interaction with database systems.
Instance: Database changes over time as information is inserted and deleted. The actual content of the database at a particular point of time is called Instance.
Note- Schema is fixed but the Instance keeps on changing .The Schema is only changed when the database is designed or updated.
A database model shows the logical structure of a database, including the relationship and the constraints that determine how data can be stored and accessed.
Types of database model
There are five types of database model.
- Object-Oriented Model
- Hierarchical Database Model
- Network Database Model
- Entity-Relationship Database Model
- Relational Database Model
Object oriented database model is a database management system that supports the Modelling and creation of data as objects.
ODBMS, which is an abbreviation for object-oriented database management system, is the data model in which data is stored as objects, which are instances of classes.
Hierarchical Database Model
The oldest model was the hierarchical database model, resembling an upside down tree. The files are linked in a parent-child fashion, with each parent being able to relate to more than one child, but each child being linked to only one parent.
Network Database Model
The network database model can be thought of as an upside-down tree where the member information is the branch related to the owner, which is the bottom of the tree. The network database model was a progression from the hierarchical database model and was designed to solve some of the problems with that model, especially the lack of flexibility. It addresses the need to model more complex relationships such as the many-to-many relationship that the hierarchical model could not handle.
The network model replaces the hierarchical tree with a graph thus allowing more general connections between nodes. The main difference between the network model and the hierarchical model is its ability to manage many to many relationships. In other words, it allows a record to have more than one parent.
Entity-Relationship Database Model
Entity-Relationship Model is a graphical representation of entities and their relationship to each other, typically used in computing in regard to the organization of data within database or information systems.
It is implemented with the use of the ER diagrams.
The ER diagram is a graphical representation of data that describes how data is communicated and related to each other. Any object such as entities, attributes of an entity, relationship sets, and other relationship attributes can be characterized using the ER diagram.
Later on, we will discuss more about E-R model.
Relational Database Model
The relational data model is widely used for data storage and processing. The relational database model was a huge step up from the network database model. Instead of relying on a parent-child for the owner-member relationship, the relational model allows any file to be linked to any other by means of a common field.
Data Module objectives
- Understand the models of the database management system.
- Become familiar with the main components of the database engine.
- Become familiar with internal components and database architecture.
- To understand the historical perspective.
There are mainly four types of database languages
Data definition language:
DDL is the language used to define and manipulate the schema of a database.
Data manipulation language:
DML is used to access and manipulate data in a database.
Data control language:
DCL is used to grant and revoke user access on a database
Transaction Control Language:
The Transaction Control Language command is used to manage transactions in the database. They are used to manage changes made by DML statements. It also allows statements to be grouped into logical transactions.
The table below shows the different database language commands used in SQL:
To have good understanding of ER model we first need to understand different types of keys.
Although keys are part of the RDBMS, we will go over the concept of keys in this part.
Types of keys
There are mainly six types of keys:
- Candidate key
- Primary key
- Super Key
- Composite key
- Alternate or secondary key
- Foreign key
|Roll No||Name||Mobile No||Email Id||Address|
|01||John||1542xxxx||[email protected]||H no-24, Noida, India|
|02||Maya||5452xxxx||[email protected]||H no-28, Patna, India|
|03||Karan||4574xxxx||[email protected]||H no-04, Ranchi, India|
Note: We will use this table to understand all the concepts of keys.
First, we need to understand all the terms related to this table.
|What we call in informal terms||What we call in formal terms|
|All possible column value||Domain|
|Table defination||Schema of relation|
A candidate key is a unique field or any combination of fields that uniquely identifies each record in the table. In another word, the attribute that uniquely identifies the record is called the candidate key.
Candidate keys are unique but cannot be null, that is, they cannot be left blank.
In the table given above Roll No, Mob No, Email Id can be Candidate key.
A primary key is a key that uniquely identifies a record in a table. This is a unique key and is selected from the candidate key.
In the table given above Roll No is a Primary key.
A set of attributes that can uniquely identify a record is called a super key. They are supersets of candidate keys.
In the table given above Roll No, Mobile No, Email ID, Roll No + name, roll no + mobile no, etc. can be a super key
A composite key is a combination of two or more columns in a table that you can use to uniquely identify each row in the table when you combine the columns Uniqueness is guaranteed, but it is retrieved individually and is not guaranteed to be unique.
In the table given above Roll No + Name, Name + Email Id, Roll No + Name + Email Id etc can be Composite key.
Note – What is difference between composite and super key?
-> The super key can also be of a single attribute, but the composite key cannot be of a single attribute.
Alternate or secondary key:
Any attitudes that remain after the selection of the primary key from the candidate key is called an alternate or secondary key.
A Foreign key is a column that creates a relationship between two tables. The purpose of a foreign key is to maintain data integrity and allow navigation between two different entity states. It acts as a cross-reference between the two tables because it refers to the main key of another table.
The below figure represents the same –
Above, you can see three tables. The primary key for students and course tables is the foreign key of the third table.
Peter Chen developed Entity Relationship Diagram (ERD) in 1976. Since then Charles Bachman and James Martin have made some refinements to the basics of ERD.
An ER diagram shows the relationship between sets of entities. A set of entities is a group of similar entities and these entities can have attributes.
The ER diagram is also called ER notation or Chen notation and occurs during the design phase.
Components of an ER diagram
ER diagram has three main components-
An entity is an object or a component of data in a simple word any real world object that has a property is an entity. An entity is represented as rectangle in an ER diagram.
The entity which has a primary key is also called as strong entity
In the following ER diagram we have two entities Student and Teacher.
Weak Entity: An entity that cannot be uniquely identified by its own attributes and which relies on the relationship with another entity is called a weak entity. This type of entity does not have a primary key. The weak entity is represented by a double rectangle.
An attribute describes the property of an entity. It is represented as an oval in an ER diagram.
a) Key Attribute-
The attribute that uniquely identifies each entity in the entity set is called a key attribute.
The key attribute is represented by an oval with underlying lines.
b) Partial key attribute or discriminator-
It is one or more attributes that uniquely identify a weak entity for a given owner entity, it’s called partial because it can’t be a primary key by itself, it needs another column, which is the foreign key of the owning entity.
The Partial key attribute is represented by an oval with underlying dotted lines.
c) Multivalued Attribute –
An attribute consisting of more than one value for a given entity. For example, Phone No.
Multivalued attribute is represented by double oval.
d) Composite Attribute –
An attribute composed of many other attributes is called as a composite attribute. For example, Address which consists of Street, City, State, and Country.
The composite attribute is represented by an oval comprising of ovals.
e) Derived Attribute –
An attribute that can be derived from other attributes of the entity type is called a derived attribute. for example.; Age (can be derived from date of birth).
Derived attribute is represented by a dotted oval.
The relationship connects two or more entities in an association. Relationship is represented by diamond shape in ER diagram.
Degree of a relationship set:
The number of different entity sets participating in a relationship set is called the degree of a relationship set.
There are three types of relationship set:
- Unary Relationship / One to One Relationship – When there is only one set of entities participating in a relationship, the relationship is called a unary relationship.
- Binary Relationship – When there are two defined entities participating in a relationship, the relationship is called a binary relationship.
- n-ary Relationship – When there are n entities participating together in a relation, the relation is called an n-ary relation.
The number of times an entity from a set of entities participates in a set of relationships is cardinality.
There are three types of Cardinality-
- One to one: When each entity in each entity set can only participate in the relationship once, the cardinality is one-to-one.
- Many to one/ One to Many: When entities from one set of entities can only participate in the relationship set once, and entities from another set of entities can participate in the relationship set more than once, the cardinality is many to one.
- Many to many: When entities from all entity sets can participate in the relationship more than once, the cardinality is many-to-many.
Normalization is a process of organizing data in the database to avoid data redundancy, insert anomaly, update anomaly, and delete anomaly. Thus, database normalization is a technique of database schema design, whereby an existing schema is modified to minimize redundancy and data dependency.
It is a bottom-up approach and a form of generalization.
Note- Let us know what is Anomalies in database management?
Anomalies are inconvenient or error-prone situations arising when processing tables.
There are three types of anomalies.
- Update anomalies: Incorrect data may need to be edited, which would involve many records to be edited, leading to the possibility of some changes being made incorrectly
- Delete anomalies: The data record can legitimately be deleted from a database, and deletion may result in the detection of the only instance of other, required data.
- Insert anomalies: The nature of a database may be such that it is not possible to add a required data item unless another unavailable data item is also added.
What is Normal forms(NF)?
The formal classifications used to describe a relational database level of normalization are called normal forms.
There are various level of normalization.
- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)
- Boyce-Codd Normal Form (BCNF)
- Forth Normal Form (4NF)
- Fifth Normal Form (5NF)
First Normal Form (1NF)
If a relation has a composite or multivariate attribute, it violates the first normal form, or a relation is in the first normal form if it does not have a composite or multilevel attribute.
A relation which is in 1 NF must follow:
- All attributes in a table must have an atomic value, that is, a unique value.
- All entries in a column must be of the same type.
- Each column must have a unique attribute.
- No two rows should be identical.
- All column attributes must be regular, that is, there must be no hidden values.
Second Normal Form (2NF)
The second normal form (2NF) is based on the concept of full functional dependency, i.e. the relationship should only depend on one key. To be in the second normal form, a relation must be in the first normal form and the relation must not contain any partial dependencies.
A relation which is in 2 NF must follow:
- 2NF = 1NF + full functional dependency
Consider the examples below.
In the example given, the table depends on two primary keys. To make the relation in a second normal form, we need to remove this flaw by splitting the table into two parts and making them dependent on a single primary key.
Third Normal Form (3NF)
For a relation to be in 3NF first, it must be in 2 NF and there should be no transitive dependencies for non-prime attributes.
A relation which is in 1 NF must follow:
- 3NF= 2NF + Non Transitivity Or Non Trivial Functional dependency
- If Alpha (α) tends to Beta (β) is a functional dependency, then alpha must be Super-key or Beta must be the Prime attribute.
Boyce-Codd Normal Form (BCNF)
Sometimes Third normal form (3NF) may not remove 100% redundancy because of X?Y functional dependency, if X is not a candidate key of the given relation. This can be solve by Boyce-Codd Normal Form (BCNF).
Boyce – Codd Normal Form (BCNF) is based on functional dependency that take into account all candidate keys in a relation.
A relation which is in BCNF must follow:
- BCNF= 3NF + Functional dependency
- If Alpha (α) tends to Beta (β) is a functional dependency, then alpha must be Super-key.
- The BCNF differs from the 3NF only when there are more than one candidate keys and the keys are composite and overlapping.
Forth Normal Form (4NF)
A relation will be in 4NF if it is in Boyce-Codd normal form and has no multivalued dependency.
For a dependency A → B, if for a single value of A, several values of B exist, then the relation will be a multi-valued dependency.
The given table is in 3NF, but the COURSE and HOBBY are two independent entity. Hence, there is no relationship between COURSE and HOBBY.
The student with Student Id, 1 contains two courses, Computer and Math and two hobbies, Cricket and Singing. So there is a Multi-valued dependency on Student Id, which leads to unnecessary repetition of data.
So to make the table into 4NF, we decompose it into two tables.
Fifth Normal Form (5NF)
5NF is also known as Project-join normal form (PJ/NF).
A relation is in 5NF if it is in 4NF and contains no join dependencies and the join must be lossless.
5NF is satisfied when all tables are divided into as many tables as possible to avoid redundancy.
A relation R is in 5th normal form if and only if each join dependency in R is a consequence of the candidate keys of R.