Assignment 3 - Normalizing Spatial Data in a Real Estate Database#

Objective#

You will design and normalize a simplistic database for a real estate platform that stores property details, including spatial data. The assignment will take you through creating an initial table structure and progressively normalizing it to the Third Normal Form (3NF) and Fourth Normal Form (4NF). Additionally, you will use PostgreSQL with the PostGIS extension for handling spatial data.

Part 1: Initial Setup#

  1. Ensure PostgreSQL and PostGIS are installed and properly configured on your system.

  2. Create a new database named RealEstateDB.

    CREATE DATABASE "RealEstateDB"; -- Use double quotations to preserve case sensitivity.
    
  3. Connect to your database and enable the PostGIS extension.

    -- Connect to RealEstateDB
    \c RealEstateDB
    
    -- If you defined your database without double quotation marks, then to connect, you'll type in the name in all lowercase (specific to PostgreSQL)
    \c realestatedb
    
    -- Enable PostGIS
    CREATE EXTENSION IF NOT EXISTS postgis;
    

Part 2: Creating the Initial Table#

Create an initial PropertyDetails table that intentionally violates normalization principles.

CREATE TABLE PropertyDetails (
    PropertyID SERIAL PRIMARY KEY,
    Address VARCHAR(255),
    City VARCHAR(100),
    State VARCHAR(50),
    Country VARCHAR(50),
    ZoningType VARCHAR(100),
    Utility VARCHAR(100),
    GeoLocation GEOMETRY(Point, 4326), -- Spatial data type
    CityPopulation INT
);

This table is in First Normal Form (1NF) because:

  • Each column holds atomic, indivisible values.

  • Rows are uniquely identifiable by PropertyID.

It’s also in Second Normal Form (2NF) by default since:

  • It’s already in 1NF.

  • It does not have a composite primary key, making partial dependency concerns non-applicable.

Part 3: Normalize to 3NF#

To achieve 3NF, remove transitive dependencies. This involves creating a new table for city demographics and adjusting the PropertyDetails table accordingly.

  1. Create CityDemographics Table:

    CREATE TABLE CityDemographics (
        City VARCHAR(100) PRIMARY KEY,
        State VARCHAR(50),
        Country VARCHAR(50),
        CityPopulation INT
    );
    
  2. Modify PropertyDetails Table:

    ALTER TABLE PropertyDetails DROP COLUMN CityPopulation, DROP COLUMN State, DROP COLUMN Country;
    
    • All attributes in PropertyDetails are now directly dependent on the primary key.

    • CityDemographics handles the transitive dependency of CityPopulation.

Part 4: Normalize to 4NF#

Address multi-valued dependencies by creating separate tables for ZoningType and Utility.

  1. Create PropertyZoning and PropertyUtilities Tables:

    CREATE TABLE PropertyZoning (
        PropertyZoningID SERIAL PRIMARY KEY,
        PropertyID INT REFERENCES PropertyDetails(PropertyID),
        ZoningType VARCHAR(100)
    );
    
    CREATE TABLE PropertyUtilities (
        PropertyUtilityID SERIAL PRIMARY KEY,
        PropertyID INT REFERENCES PropertyDetails(PropertyID),
        Utility VARCHAR(100)
    );
    
  2. Remove Columns from PropertyDetails:

    ALTER TABLE PropertyDetails DROP COLUMN ZoningType, DROP COLUMN Utility;
    
    • By separating ZoningType and Utility into their tables, we eliminate multi-valued dependencies in PropertyDetails.

Part 5: Spatial Data Manipulation#

Inserting and querying spatial data using PostGIS.

  1. Insert a Property with Geolocation:

    INSERT INTO PropertyDetails (Address, City, GeoLocation)
    VALUES ('123 Main St', 'Springfield', ST_GeomFromText('POINT(-89.6501483 39.7817213)', 4326));
    
  2. Query Properties within a Radius:

    SELECT Address, City
    FROM PropertyDetails
    WHERE ST_DWithin(
        GeoLocation,
        ST_GeomFromText('POINT(-89.6501483 39.7817213)', 4326),
        10000 -- 10km radius
    );
    

Rubric#

Deliverables and Learning Objectives#

Enhance understanding of database normalization principles (1NF to 4NF) and spatial data manipulation using PostgreSQL and PostGIS. Submission includes SQL scripts and a Markdown report detailing how the normalization principles are applied.

Criteria

Excellent (15-20)

Good (10-14)

Satisfactory (5-9)

Needs Improvement (0-4)

3NF Compliance

No transitive dependencies; all explanations are thorough and demonstrate a deep understanding.

Minimal transitive dependencies; explanations are good but miss some nuances.

Some transitive dependencies are not addressed; explanations lack clarity.

Many transitive dependencies; explanations are missing or fail to demonstrate understanding.

4NF Compliance

Multi-valued dependencies are expertly resolved; the Markdown report provides a comprehensive explanation.

Mostly resolved multi-valued dependencies; explanations are good but lack some detail.

Incomplete resolution of multi-valued dependencies; explanations are basic or vague.

Poor handling of multi-valued dependencies; lack of or incorrect explanations.

SQL Script Quality

Scripts are exemplary: well-organized, commented, and showcase advanced SQL and PostGIS usage. The Markdown report complements the scripts with detailed explanations.

Scripts are well-written with adequate comments; the Markdown report explains well but could be more detailed.

Scripts are functional with basic comments; the Markdown report is simplistic and lacks depth.

Scripts lack organization and comments; the Markdown report is inadequate or missing.

Spatial Data Handling

Advanced manipulation and querying of spatial data, with the Markdown report detailing the use and benefits of PostGIS features.

Good spatial data handling with a solid explanation of PostGIS features in the report.

Basic spatial data queries; the report mentions PostGIS but lacks detailed explanation.

Incorrect or inefficient spatial queries; the report fails to properly explain or justify the use of PostGIS.

Markdown Report, Collaboration, and Documentation

GitHub repository is organized with a comprehensive README.md and the report details how each SQL command satisfies normalization principles. Collaborators are invited for review.

The report covers most normalization principles with examples but lacks some depth in explanations. The repository is organized with clear instructions and a good report. Minor issues in documentation or organization.

The report addresses normalization principles but with limited examples and superficial explanations. The repository and documentation lack some organization, making review challenging.

The report is incomplete, lacks clarity, or misses significant portions of normalization principles. Poorly managed repository with inadequate documentation and report. Collaborators are not properly invited or cannot access materials easily.

Additional Notes for Students:#

  • Ensure SQL scripts are executable without errors and clearly formatted. Comments within the scripts should explain their purpose and how they apply to normalization principles.

  • The Markdown report should detail the journey from 1NF through to 4NF, providing specific examples from your SQL scripts and explaining why each step satisfies the corresponding normalization form.

  • Pay special attention to the clarity of your explanations in the Markdown report. The goal is to demonstrate your understanding of database normalization and spatial data management through detailed examples and explanations.

  • Include a section in your README.md on how to set up PostGIS and prerequisites for running your SQL scripts, alongside the Markdown report for normalization explanation.

Submission Instructions#

  1. Push all SQL scripts and your Markdown report detailing the normalization process to your designated GitHub repository.

  2. Ensure your README.md includes setup instructions and a brief overview of the repository contents.

  3. Invite your instructor and course assistant to your repository as collaborators using their provided GitHub usernames.

  4. Submit the link to your GitHub repository on Canvas before the deadline.