Assignment 3 - Normalizing Spatial Data in a Real Estate Database#
Objective#
You will design and normalize a simplistic database for a real estate platform that stores property details, including spatial data. The assignment will take you through creating an initial table structure and progressively normalizing it to the Third Normal Form (3NF) and Fourth Normal Form (4NF). Additionally, you will use PostgreSQL with the PostGIS extension for handling spatial data.
Part 1: Initial Setup#
Ensure PostgreSQL and PostGIS are installed and properly configured on your system.
Create a new database named
RealEstateDB
.CREATE DATABASE "RealEstateDB"; -- Use double quotations to preserve case sensitivity.
Connect to your database and enable the PostGIS extension.
-- Connect to RealEstateDB \c RealEstateDB -- If you defined your database without double quotation marks, then to connect, you'll type in the name in all lowercase (specific to PostgreSQL) \c realestatedb -- Enable PostGIS CREATE EXTENSION IF NOT EXISTS postgis;
Part 2: Creating the Initial Table#
Create an initial PropertyDetails
table that intentionally violates normalization principles.
CREATE TABLE PropertyDetails (
PropertyID SERIAL PRIMARY KEY,
Address VARCHAR(255),
City VARCHAR(100),
State VARCHAR(50),
Country VARCHAR(50),
ZoningType VARCHAR(100),
Utility VARCHAR(100),
GeoLocation GEOMETRY(Point, 4326), -- Spatial data type
CityPopulation INT
);
This table is in First Normal Form (1NF) because:
Each column holds atomic, indivisible values.
Rows are uniquely identifiable by
PropertyID
.
It’s also in Second Normal Form (2NF) by default since:
It’s already in 1NF.
It does not have a composite primary key, making partial dependency concerns non-applicable.
Part 3: Normalize to 3NF#
To achieve 3NF, remove transitive dependencies. This involves creating a new table for city demographics and adjusting the PropertyDetails
table accordingly.
Create
CityDemographics
Table:CREATE TABLE CityDemographics ( City VARCHAR(100) PRIMARY KEY, State VARCHAR(50), Country VARCHAR(50), CityPopulation INT );
Modify
PropertyDetails
Table:ALTER TABLE PropertyDetails DROP COLUMN CityPopulation, DROP COLUMN State, DROP COLUMN Country;
All attributes in
PropertyDetails
are now directly dependent on the primary key.CityDemographics
handles the transitive dependency ofCityPopulation
.
Part 4: Normalize to 4NF#
Address multi-valued dependencies by creating separate tables for ZoningType
and Utility
.
Create
PropertyZoning
andPropertyUtilities
Tables:CREATE TABLE PropertyZoning ( PropertyZoningID SERIAL PRIMARY KEY, PropertyID INT REFERENCES PropertyDetails(PropertyID), ZoningType VARCHAR(100) ); CREATE TABLE PropertyUtilities ( PropertyUtilityID SERIAL PRIMARY KEY, PropertyID INT REFERENCES PropertyDetails(PropertyID), Utility VARCHAR(100) );
Remove Columns from
PropertyDetails
:ALTER TABLE PropertyDetails DROP COLUMN ZoningType, DROP COLUMN Utility;
By separating
ZoningType
andUtility
into their tables, we eliminate multi-valued dependencies inPropertyDetails
.
Part 5: Spatial Data Manipulation#
Inserting and querying spatial data using PostGIS.
Insert a Property with Geolocation:
INSERT INTO PropertyDetails (Address, City, GeoLocation) VALUES ('123 Main St', 'Springfield', ST_GeomFromText('POINT(-89.6501483 39.7817213)', 4326));
Query Properties within a Radius:
SELECT Address, City FROM PropertyDetails WHERE ST_DWithin( GeoLocation, ST_GeomFromText('POINT(-89.6501483 39.7817213)', 4326), 10000 -- 10km radius );
Rubric#
Deliverables and Learning Objectives#
Enhance understanding of database normalization principles (1NF to 4NF) and spatial data manipulation using PostgreSQL and PostGIS. Submission includes SQL scripts and a Markdown report detailing how the normalization principles are applied.
Criteria |
Excellent (15-20) |
Good (10-14) |
Satisfactory (5-9) |
Needs Improvement (0-4) |
---|---|---|---|---|
3NF Compliance |
No transitive dependencies; all explanations are thorough and demonstrate a deep understanding. |
Minimal transitive dependencies; explanations are good but miss some nuances. |
Some transitive dependencies are not addressed; explanations lack clarity. |
Many transitive dependencies; explanations are missing or fail to demonstrate understanding. |
4NF Compliance |
Multi-valued dependencies are expertly resolved; the Markdown report provides a comprehensive explanation. |
Mostly resolved multi-valued dependencies; explanations are good but lack some detail. |
Incomplete resolution of multi-valued dependencies; explanations are basic or vague. |
Poor handling of multi-valued dependencies; lack of or incorrect explanations. |
SQL Script Quality |
Scripts are exemplary: well-organized, commented, and showcase advanced SQL and PostGIS usage. The Markdown report complements the scripts with detailed explanations. |
Scripts are well-written with adequate comments; the Markdown report explains well but could be more detailed. |
Scripts are functional with basic comments; the Markdown report is simplistic and lacks depth. |
Scripts lack organization and comments; the Markdown report is inadequate or missing. |
Spatial Data Handling |
Advanced manipulation and querying of spatial data, with the Markdown report detailing the use and benefits of PostGIS features. |
Good spatial data handling with a solid explanation of PostGIS features in the report. |
Basic spatial data queries; the report mentions PostGIS but lacks detailed explanation. |
Incorrect or inefficient spatial queries; the report fails to properly explain or justify the use of PostGIS. |
Markdown Report, Collaboration, and Documentation |
GitHub repository is organized with a comprehensive README.md and the report details how each SQL command satisfies normalization principles. Collaborators are invited for review. |
The report covers most normalization principles with examples but lacks some depth in explanations. The repository is organized with clear instructions and a good report. Minor issues in documentation or organization. |
The report addresses normalization principles but with limited examples and superficial explanations. The repository and documentation lack some organization, making review challenging. |
The report is incomplete, lacks clarity, or misses significant portions of normalization principles. Poorly managed repository with inadequate documentation and report. Collaborators are not properly invited or cannot access materials easily. |
Additional Notes for Students:#
Ensure SQL scripts are executable without errors and clearly formatted. Comments within the scripts should explain their purpose and how they apply to normalization principles.
The Markdown report should detail the journey from 1NF through to 4NF, providing specific examples from your SQL scripts and explaining why each step satisfies the corresponding normalization form.
Pay special attention to the clarity of your explanations in the Markdown report. The goal is to demonstrate your understanding of database normalization and spatial data management through detailed examples and explanations.
Include a section in your README.md on how to set up PostGIS and prerequisites for running your SQL scripts, alongside the Markdown report for normalization explanation.
Submission Instructions#
Push all SQL scripts and your Markdown report detailing the normalization process to your designated GitHub repository.
Ensure your README.md includes setup instructions and a brief overview of the repository contents.
Invite your instructor and course assistant to your repository as collaborators using their provided GitHub usernames.
Submit the link to your GitHub repository on Canvas before the deadline.