Assignment 2: Database Normalization using PostgreSQL with PostGIS Extension#
Objective:#
This assignment will guide you through the process of designing and normalizing a relational database to the first and second normal forms (1NF and 2NF) using PostgreSQL with the PostGIS extension. You’ll learn the concepts of normalization, how to reduce data redundancy, and improve data integrity. The focus will be on creating spatial tables that adhere to these normalization rules.
This assignment will solidify your understanding of database normalization principles, specifically focusing on the first two normal forms, and introduce you to working with spatial data in PostgreSQL using the PostGIS extension.
Tools Required:#
PostgreSQL with PostGIS extension
pgAdmin or any PostgreSQL client
Background on Normal Forms:#
First Normal Form (1NF): Requires that all table values are atomic (indivisible) and each column contains unique values. Essentially, no repeating groups or arrays are allowed within any single row, and each column must contain values of a single type.
Second Normal Form (2NF): Achieved when it is in the first normal form and all non-key attributes are fully functional and dependent on the primary key. It eliminates partial dependency.
Third Normal Form (3NF): A table is in 3NF if it is in 2NF and all its columns are not transitively dependent on the primary key.
Fourth Normal Form (4NF): A table is in 4NF if it is in 3NF and does not have any multi-valued dependencies.
Primary vs. Foreign Keys#
Understanding primary and foreign keys is crucial for database integrity and functionality, enabling the efficient organization, retrieval, and management of relational data.
Primary Key#
A primary key is a column (or a set of columns) in a database table that uniquely identifies each row in that table. The primary key has two main purposes:
Uniqueness: It ensures that no two rows have the same primary key value, guaranteeing the uniqueness of each record. This is crucial for accurately identifying and referencing specific data entries.
Indexing: The primary key automatically creates an index in the table, which speeds up data retrieval operations. An index allows the database system to find and access data efficiently without scanning the entire table.
Characteristics of a Primary Key:#
Uniqueness: Each value must be unique across all records in the table.
Not Null: A primary key cannot have a NULL value, as it must identify a record.
Stability: The value of a primary key should not change. While technically possible, updating primary keys is generally discouraged because they are used to reference rows in other tables.
Simplicity: Ideally, a primary key should be compact and consist of as few columns as necessary.
Example:#
Consider a table Users
with columns UserID
, Username
, and Email
. The UserID
could serve as the primary key because each user has a unique ID.
Foreign Key#
A foreign key is a column (or a set of columns) in one table that uniquely identifies a row in another table or the same table (in case of self-referencing). The foreign key establishes a relationship between two tables, often to enforce referential integrity.
Purpose of a Foreign Key:#
Referential Integrity: It ensures that the relationship between two tables remains consistent. For example, if table A references table B, a foreign key in table A prevents you from adding a record that does not have a corresponding value in table B.
Relationship Establishment: Foreign keys define how tables relate to each other, allowing for complex queries that span multiple tables.
Characteristics of a Foreign Key:#
Reference: It references the primary key or a unique key in another table.
Nullable: A foreign key can be NULL, indicating that a relationship may not exist for some rows.
Multiplicity: A single foreign key value can reference multiple rows in the related table, supporting one-to-many relationships.
Example:#
Continuing with the Users
table example, suppose there’s another table Orders
with columns OrderID
, OrderDate
, and UserID
. The UserID
in Orders
is a foreign key that references UserID
in the Users
table. This setup ensures that every order is linked to a user who placed it.
Primary vs. Foreign Key - Key Differences:#
Purpose: Primary keys uniquely identify records within a table. Foreign keys link records between tables.
Uniqueness: Primary key values must be unique within its table. Foreign key values do not need to be unique and often repeat in the referencing table.
Nullability: Primary keys cannot be NULL. Foreign keys can be NULL, depending on the relationship and design requirements.
Referential Role: Only foreign keys have a referential role, establishing a direct relationship between tables.
Assignment Overview#
In Part 1: Understanding and Designing for 1NF#
Task 1.1: Create a Non-Normalized Table
You will design a simple table that violates the 1NF to understand what not to do.
Example table:
Parks_Info
ID |
Park Name |
Facilities |
---|---|---|
1 |
Central Park |
Playground, Restroom, Picnic area |
2 |
Liberty Park |
Restroom, Picnic area |
3 |
Riverside Park |
Playground, Bike Path |
Task 1.2: Normalize the Table to 1NF
Modify the
Parks_Info
table to ensure it complies with 1NF.Split the
Facilities
column to remove the list of values, ensuring each row/column intersection has a single value.
In Part 2: Advancing to 2NF#
Task 2.1: Identify Partial Dependencies
Analyze the 1NF table to identify any partial dependencies, where some columns depend only on a part of the primary key in case of composite primary keys.
Task 2.2: Create Related Tables to Achieve 2NF
Eliminate partial dependencies by creating separate tables.
Establish relationships using foreign keys.
Setup:#
PostgreSQL and PostGIS:
Create a new database named
dataNorm
.Install the PostGIS extension by executing
CREATE EXTENSION postgis;
in yourdataNorm
.
Access Your Database:
Use pgAdmin or a command-line tool to connect to
dataNorm
.
Part 1: Creating a Non-Normalized Table and Normalizing to 1NF#
Manually Create a Non-Normalized Table:
Create a table named ‘Parks_Info’ with three columns: ID, ParkName, and Facilities.
‘ID’ is a serial number that automatically increments with each new row and is the primary key.
‘ParkName’ and ‘Facilities’ are text fields that can store up to 255 characters.
CREATE TABLE Parks_Info (
ID SERIAL PRIMARY KEY,
ParkName VARCHAR(255),
Facilities VARCHAR(255)
);
Insert data into the ‘Parks_Info’ table. Each row represents a park with its name and facilities.
‘Central Park’ has a playground, restroom, and picnic area.
‘Liberty Park’ has a restroom and picnic area.– ‘Riverside Park’ has a playground and bike path.
INSERT INTO Parks_Info (ParkName, Facilities) VALUES
('Central Park', 'Playground, Restroom, Picnic area'),
('Liberty Park', 'Restroom, Picnic area'),
('Riverside Park', 'Playground, Bike Path');
Normalize to 1NF:
Break down the
Facilities
column so each facility is in its own row.Create a new table ‘Parks’ to store information about parks with a more normalized structure.
‘ParkID’ is a unique identifier for each park, and ‘ParkName’ stores the name of the park.
CREATE TABLE Parks (
ParkID SERIAL PRIMARY KEY,
ParkName VARCHAR(255)
);
- Create a 'Facilities' table to separately store facilities available in each park.
- 'FacilityID' is a unique identifier for each facility.
- 'ParkID' links each facility to a specific park in the 'Parks' table, establishing a relationship between the two tables.
- 'FacilityName' describes the facility.
- A foreign key constraint ensures that 'ParkID' in 'Facilities' references a valid 'ParkID' in 'Parks'.
CREATE TABLE Facilities (
FacilityID SERIAL PRIMARY KEY,
ParkID INT,
FacilityName VARCHAR(255),
FOREIGN KEY (ParkID) REFERENCES Parks(ParkID)
);
Populate
Parks
andFacilities
with data split fromParks_Info
.
Part 2: Advancing to 2NF#
Identify and Remove Partial Dependencies:
Suppose you have a composite key and identify a column that depends only on part of that key, you’ll need to adjust your table structure.
Manually Create Related Tables:
Use foreign keys to link related data across tables.
Create a table ‘ParkFacilities’ to further normalize the data by separating out facility information.
This table will store a unique list of all types of facilities.
CREATE TABLE ParkFacilities (
FacilityID SERIAL PRIMARY KEY,
FacilityName VARCHAR(255)
);
Modify the ‘Facilities’ table to include a reference to ‘ParkFacilities’.
This involves adding a new column ‘ParkFacilityID’ to link to ‘ParkFacilities’.
ALTER TABLE Facilities ADD COLUMN ParkFacilityID INT;
Update the ‘Facilities’ table to set the new ‘ParkFacilityID’ column.
This matches each facility by name with its corresponding ID from ‘ParkFacilities’.
It effectively links the facility in ‘Facilities’ to a unique identifier in ‘ParkFacilities’.
ALTER TABLE Facilities
ADD CONSTRAINT fk_parkfacilityid FOREIGN KEY (ParkFacilityID) REFERENCES ParkFacilities(FacilityID);
Remove the now redundant ‘FacilityName’ column from ‘Facilities’, since the relationship is maintained via ‘FacilityID’.
Finally, add a foreign key constraint to ‘Facilities’ to enforce the relationship between ‘Facilities’ and ‘ParkFacilities’.
This ensures that every ‘FacilityID’ in ‘Facilities’ corresponds to a valid ‘FacilityID’ in ‘ParkFacilities’.
UPDATE Facilities
SET ParkFacilityID = (SELECT FacilityID FROM ParkFacilities WHERE FacilityName = Facilities.FacilityName;
-- Cleanup redundant columns
ALTER TABLE Facilities DROP COLUMN FacilityName;;
Assignment Submission Instructions:#
For this assignment, you will leverage Git Bash to version control your work and push your final submission to a GitHub repository. This approach will not only allow us to review your SQL scripts, report, and screenshots but also introduce you to best practices in code versioning and collaboration. Follow the steps below to prepare and submit your assignment.
Step 1: Prepare Your Submission#
Organize Your Project Folder:
Ensure your project folder is well-organized and contains only the files you intend to submit. Typically, this will include:
SQL scripts for creating and normalizing tables.
A Markdown or text file (
REPORT.md
orREPORT.txt
) containing your normalization report.Screenshots showing your final table structures in pgAdmin or the command-line outputs.
Write Your Report:
In
REPORT.md
orREPORT.txt
, explain the process and rationale behind normalizing your database to 1NF and then to 2NF. Include any challenges faced and how you addressed them.
Capture Screenshots:
Take screenshots of your table structures in pgAdmin or command-line outputs that demonstrate your database’s compliance with 1NF and 2NF. Save these screenshots in your project folder.
Step 2: Initialize a Git Repository#
Open Git Bash:
Navigate to your project folder in Git Bash. Use the
cd
command to change directories to your project folder.
Initialize the Repository:
Run
git init
to initialize a new Git repository in your project folder.
Create a .gitignore File (Optional but Recommended):
Create a
.gitignore
file to specify files or directories that Git should ignore, such as large data files or personal configuration files. Use thetouch .gitignore
command to create this file, then edit it with a text editor to add your rules.
Step 3: Add and Commit Your Files#
Add Files to the Repository:
Run
git add .
to stage all files in your project folder for commit. Alternatively, usegit add <file>
to add specific files.
Commit Your Changes:
Run
git commit -m "Submit database normalization assignment"
to commit your staged files to the repository with a descriptive message.
Step 4: Push to GitHub#
Create a GitHub Repository:
Go to GitHub and create a new repository named
DatabaseNormalizationAssignment
. Do not initialize it with a README, .gitignore, or license.
Link Your Local Repository to GitHub:
Follow the instructions provided by GitHub to push an existing repository from the command line, which will include:
Setting the remote repository with
git remote add origin <repository-URL>
.Pushing your commits to GitHub with
git push -u origin master
orgit push -u origin main
depending on your branch name.
Step 5: Submit Your Assignment#
Provide the GitHub Repository URL:
Submit the URL of your GitHub repository to your instructor and course assistant through your course’s submission system or as instructed by your course guidelines.
Tips for a Successful Submission:#
Double-check your repository to ensure it includes all required files and that your
.gitignore
is correctly configured.Review your report for clarity and completeness.
Make sure your GitHub repository is set to public so it can be accessed and reviewed by your instructor and course assistant.
By following these detailed submission instructions, you’ll not only demonstrate your understanding of database normalization but also gain practical experience in using Git and GitHub for version control and collaboration, skills that are invaluable in the field of software development and data management.
Tips for Success:#
Pay close attention to the definitions of each normal form and ensure your table structures comply with these rules.
Use clear and consistent naming conventions for tables and columns.
Test your SQL scripts incrementally to catch and correct errors early in the process.
Writing Your Report in Markdown#
Markdown is a lightweight markup language that allows you to add formatting elements to plaintext text documents. Here’s a quick guide on how to use Markdown to format your report, including how to insert screenshots, which is particularly useful for your database normalization assignment report hosted on GitHub.
Basic Markdown Syntax:#
Headers: Use
#
for a top-level header (similar to HTML<h1>
), and use multiple##
for sub-headers (<h2>
,<h3>
, etc.).## Database Normalization Assignment ### Part 1: Introduction
Emphasis:
Italic: Use single asterisks or underscores around the text
*italic*
or_italic_
.Bold: Use double asterisks or underscores
**bold**
or__bold__
.
Lists:
Unordered lists can be created using asterisks, plus signs, or hyphens interchangeably.
- Item 1 - Item 2
Ordered lists are created with numbers:
1. First item 2. Second item
Links: To create a link, wrap the link text in brackets
[ ]
, and then wrap the URL in parentheses( )
.[GitHub](https://github.com/)
Code Blocks: For inline code, use single backticks:
`SELECT * FROM table;`
. For larger code snippets or SQL queries, use triple backticks with an optional language identifier:SELECT * FROM Parks;
Inserting Screenshots in Your Markdown Report:#
To include screenshots in your GitHub-hosted report, you’ll first need to host the image file somewhere accessible over the internet. Since GitHub repositories can store binary files like images, you can directly upload your screenshots to your repository and reference them in your Markdown file.
Uploading Images:
Place your screenshots in a directory within your repository, such as
/images
.Use the GitHub interface to upload files or push them via Git commands.
Referencing Images in Markdown:
Once the images are in your repository, you can reference them using relative paths. Here’s the syntax for embedding an image:
![Alt text for the image](/images/screenshot.png)
Replace
/images/screenshot.png
with the path to your image file and provide suitable alt text. The alt text is important for accessibility, providing a text alternative to the image content.
Tips for Writing Your Report:#
Clarity and Conciseness: Keep your sentences clear and to the point. Markdown doesn’t affect the textual content but makes it more readable.
Consistent Formatting: Use the same style for headers, lists, and code blocks throughout your report for a cohesive look.
Preview Your Document: Use GitHub’s preview feature to review your document for formatting errors before finalizing it.
Assignment Rubric#
Total points available: 90#
SQL Scripts (20 Points)#
Correct Implementation of 1NF (10 Points)
10 Points: Table(s) perfectly normalized to 1NF with all columns atomic, no repeating groups, and primary keys correctly identified.
5-9 Points: Minor issues in normalization to 1NF, such as slight redundancy or improper atomicity.
0-4 Points: Major issues in normalization, with significant redundancy or lack of atomicity.
Correct Implementation of 2NF (10 Points)
10 Points: Table(s) perfectly normalized to 2NF with no partial dependencies and all non-key attributes fully functional and dependent on the primary key.
5-9 Points: Minor issues in achieving 2NF, with some partial dependencies not fully addressed.
0-4 Points: Major issues in achieving 2NF, with numerous partial dependencies and improper key usage.
Written Report (30 Points)#
Explanation of Normalization Process (15 Points)
15 Points: Provides a clear, detailed explanation of the normalization process, including reasoning behind each normalization step to 1NF and 2NF.
8-14 Points: Adequate explanation of the normalization process but may lack detail or clarity in certain areas.
0-7 Points: Explanation is vague, incomplete, or significantly lacks understanding of normalization concepts.
Challenges and Solutions (10 Points)
10 Points: Thoughtfully identifies challenges faced during the assignment and explains the solutions or strategies used to overcome them.
5-9 Points: Identifies some challenges with basic explanations of solutions, lacking depth or reflection.
0-4 Points: Minimal or no discussion of challenges faced, or solutions are not logical or relevant.
Formatting and Clarity (5 Points)
5 Points: Report is well-organized, using Markdown formatting effectively for headers, lists, code blocks, and images. Writing is clear and concise.
3-4 Points: Report is generally well-organized with minor issues in formatting or clarity.
0-2 Points: Report lacks organization, with significant issues in formatting, readability, or coherence.
Use of Git and GitHub (20 Points)#
Repository Organization (10 Points)
10 Points: Repository is well-organized with clear, logical structure; includes a
.gitignore
file; and all relevant files are present.5-9 Points: Repository is mostly organized but may lack a
.gitignore
file or have minor organizational issues.0-4 Points: Repository is disorganized, missing key files, or includes unnecessary files tracked.
Commit History (10 Points)
10 Points: Commit history is detailed with meaningful commit messages that accurately reflect each change.
5-9 Points: Commit history is present but messages may be vague or some changes poorly documented.
0-4 Points: Sparse commit history, nonsensical commit messages, or large commits that lack specificity.
Screenshots and Formatting (20 Points)#
Inclusion of Screenshots (10 Points)
10 Points: Relevant screenshots are included and clearly demonstrate the database structure and normalization results.
5-9 Points: Screenshots are included but may not clearly or completely demonstrate key aspects of the assignment.
0-4 Points: Few or no screenshots included, or screenshots are irrelevant/unrelated to the assignment requirements.
Markdown Formatting (10 Points)
10 Points: Markdown is used effectively throughout the report to enhance readability and organization; images are embedded properly with descriptive alt text.
5-9 Points: Markdown used with minor issues in effectiveness; images may lack alt text or be improperly formatted.
0-4 Points: Minimal or incorrect use of Markdown; images not embedded or poorly integrated into the report.