This assignment will guide students through setting up a PostGIS-enabled PostgreSQL database, visualizing and analyzing spatial data using QGIS, and managing their project with Git. The project will involve real-world spatial data to give students hands-on experience with the tools and processes used in spatial database projects.

Assignment 1: Urban Green Spaces Analysis#

Objective:#

The objective of this project is to analyze the distribution and accessibility of urban green spaces in a city of your choice. You will import spatial data into a PostGIS-enabled PostgreSQL database, perform spatial queries to analyze the data, and use QGIS for visualization and further spatial analysis. The project will be version-controlled using Git.

Tools and Technologies:#

  • PostgreSQL with PostGIS extension

  • pgAdmin (for database management)

  • QGIS

  • Git (and optionally Git Bash for Windows users)

Setup Instructions:#

  1. Install Git:

    • Download Git from Git’s official site and follow the installation instructions. Choose Git Bash as the terminal emulator if you’re on Windows.

  2. Create a GitHub account (if you don’t have one):

Data Sources:#

  • OpenStreetMap (OSM): Use the OSM data for your chosen city. You can download city-specific OSM data from websites like Geofabrik or BBBike.

  • Population Density Data: Optionally, you can also incorporate population density data to analyze the accessibility of green spaces for the city’s population. This data might be available from local government websites or global datasets like SEDAC’s Gridded Population of the World.

  • Other Sources: You are free to gather OSM or population density data from a preferred source if you choose to do so.


Assignment Overview:#

You will complete the following six (two are optional) task for the assignment. Details for each task are shared in the next subsection.

  • Data Exploration and Cleaning:

    • Explore the imported datasets using pgAdmin and QGIS.

    • Identify and clean any anomalies or inconsistencies in the green spaces data (e.g., overlapping areas, incorrect classifications).

  • Spatial Analysis:

    • Query 1: Calculate the total area of green spaces within the city.

    • Query 2: Identify the top 5 largest green spaces in the city.

    • Query 3: (Optional) If population density data is available, calculate the average distance to the nearest green space for different population density quartiles.

  • Visualization (Optional):

    • Use QGIS to create visualizations of the green spaces distribution in your city. Highlight the top 5 largest green spaces and, if applicable, areas with poor accessibility to green spaces.

  • Advanced Analysis (Optional):

    • Perform a more detailed analysis, such as accessibility analysis using network roads data or an ecological impact assessment of green spaces.

The following examples provide a foundational approach to each task, showcasing how spatial SQL queries, GIS tools, and analytical techniques can be applied to real-world spatial data scenarios.

  • Given the nature of the assignment and the variability of the data based on the city chosen, these examples are illustrative and may require adaptation or additional troubleshooting on the part of the student.

Step 1: Setup and Initial Tasks:#

  1. Database Setup:

    • Create a new database for your project (e.g., urban_green_spaces) and enable the PostGIS extension using the SQL query window and run:

    CREATE EXTENSION postgis;
    
  2. Set Up Your Project Folder

    • Create a new folder on your computer where you will store your project files, e.g., UrbanGreenSpacesProject.

  3. Git Repository Setup:

    • Initialize a new Git repository in your project directory. Open Git Bash or your terminal, navigate to your project folder, and run:

    git init
    
  4. Create a GitHub Repository

    • Go to GitHub, click on “New repository”, name it UrbanGreenSpacesProject, and follow the instructions to push your local repository to GitHub.

Creating a .gitignore file is a straightforward process that helps you specify intentionally untracked files that Git should ignore. Files and directories listed in a .gitignore file won’t be added to your repository, which is useful for excluding sensitive information, system files, or large data files that shouldn’t be shared or tracked. Here’s how to create and set up a .gitignore file:

Using Command Line:#

  1. Navigate to Your Project Directory: Open your command line interface (CLI) and navigate to the root of your project directory using the cd command. For example:

    cd path/to/your/project
    
  2. Create the .gitignore File:

    • On Linux or macOS, use the touch command:

      touch .gitignore
      
    • On Windows, you might use the echo command if using Command Prompt:

      echo.>.gitignore
      

      Or, if you’re using PowerShell:

      New-Item .gitignore -type file
      
  3. Edit the .gitignore File: Open the .gitignore file in a text editor of your choice. You could use a command-line text editor like nano, vim, or notepad on Windows. For example, using nano:

    nano .gitignore
    

Using a Text Editor:#

Alternatively, you can create a .gitignore file directly in a text editor:

  1. Open Your Text Editor: Open your preferred text editor and create a new file.

  2. Save the File as .gitignore: Save the new file with the name .gitignore in the root of your project directory. Ensure the file is saved as plain text.

Configuring .gitignore:#

In your .gitignore file, you can specify individual files, directories, or patterns to ignore. Here are some common examples:

  • Ignore a Specific File: Simply write the name of the file:

    secret.txt
    
  • Ignore a Directory: Include the directory name followed by a slash:

    node_modules/
    
  • Ignore Files with a Specific Extension: Use an asterisk as a wildcard:

    *.log
    
  • Ignore All Files in a Directory: Combine the directory name with an asterisk:

    data/*
    
  • Exclude Specific Files or Directories: You can negate a pattern by prefixing it with an exclamation point, which can be useful if you want to ignore all files of a certain type except one or a few:

    !important.log
    

Final Steps:#

After adding the necessary rules to your .gitignore file, save and close the file. The next time you use git status or attempt to add files to your repository, Git will ignore the files and directories specified in .gitignore.

Remember to add and commit the .gitignore file to your repository so that the ignore rules are shared with any other users or contributors:

git add .gitignore
git commit -m "Add .gitignore file"

This ensures consistent behavior for all collaborators by preventing unwanted files from being tracked and shared through the repository.

Step 2: Data Preparation#

  1. Download Data:

    • Choose a city and download its OpenStreetMap (OSM) data from Geofabrik or BBBike.

  2. Import OSM Data to Your Database:

    • You can use tools like osm2pgsql for importing OSM data.

    • If using population density data, import this into your database as well.

    • Use the osm2pgsql tool to import the data. Instructions can be found here. The basic command looks like:

      osm2pgsql -d urban_green_spaces -U postgres -W your_osm_data.osm.pbf
      
    • You’ll be prompted for your postgres user’s password.

Approach:#

  • After importing the OSM data into the PostGIS database, use pgAdmin or SQL queries to explore the data. Identify the tables and columns relevant to green spaces, often tagged as leisure=park in OSM data.

  • Perform a query to detect any anomalies, such as green spaces with null or extremely large areas that could indicate data entry errors.

SQL Query Example:#

SELECT id, name, ST_Area(way) as area
FROM planet_osm_polygon
WHERE leisure = 'park'
ORDER BY area DESC;

Cleaning Actions:#

  • Suppose you find a park with an unusually large area, a closer inspection might reveal it’s due to incorrect boundary definitions. You could correct this by editing the boundary data directly in the database or using a GIS tool like QGIS.

  • Remove or correct any green spaces with null names or areas to ensure data quality.

  1. Create a green_spaces Table:

    • In pgAdmin, open a SQL query window for your urban_green_spaces database and run:

      CREATE TABLE green_spaces (
          id SERIAL PRIMARY KEY,
          name VARCHAR(255),
          location GEOMETRY(Point, 4326),
          area_sq_m NUMERIC
      );
      
  2. Populate green_spaces Table:

    • Still in the SQL query window, insert data into your green_spaces table from the OSM data. For example:

      INSERT INTO green_spaces (name, location, area_sq_m)
      SELECT name, ST_Centroid(way), ST_Area(way)
      FROM planet_osm_polygon
      WHERE leisure = 'park';
      

Step 3: Spatial Analysis#

  1. Analyze Green Space Distribution:

    • Write SQL queries to analyze the green spaces, such as finding the total number of parks, total area covered by parks, and average park size. For example:

      SELECT COUNT(*) AS total_parks, SUM(area_sq_m) AS total_area, AVG(area_sq_m) AS average_park_size
      FROM green_spaces;
      

SQL Query Example 1: Total Area of Green Spaces:#

SELECT SUM(ST_Area(way)) as total_green_space_area
FROM planet_osm_polygon
WHERE leisure = 'park';

Example Result:#

  • “The total area of green spaces in the city is approximately 15 square kilometers.”

SQL Query Example 2: Top 5 Largest Green Spaces:#

SELECT name, ST_Area(way) as area
FROM planet_osm_polygon
WHERE leisure = 'park'
ORDER BY area DESC
LIMIT 5;

Example Result:#

  • The query lists the top 5 largest green spaces by name and area, e.g., “Central Park: 3.41 sq km, Green Park: 2.58 sq km, …”

SQL Query Example 3: Average Distance to Nearest Green Space for Population Density Quartiles:#

This query is more complex and assumes the presence of population density data and requires the use of spatial functions to calculate distances.

WITH quartiles AS (
  SELECT ntile(4) OVER (ORDER BY population_density) as quartile, id
  FROM population_data
)
SELECT quartile, AVG(distance) as avg_distance_to_park
FROM (
  SELECT q.quartile, q.id, MIN(ST_Distance(p.geom, g.geom)) as distance
  FROM quartiles q
  JOIN population_data p ON q.id = p.id
  JOIN planet_osm_polygon g ON g.leisure = 'park'
  GROUP BY q.quartile, q.id
) as distances
GROUP BY quartile;

Example Result:#

  • “The average distance to the nearest green space increases from 0.5 km in the highest population density quartile to 2 km in the lowest quartile.”

  1. Commit Your SQL Scripts to Git:

    • Save your SQL queries in a file, e.g., analysis.sql, within your project folder. Add this file to your Git repository using:

      git add analysis.sql
      git commit -m "Add spatial analysis queries"
      

Step 4: Visualization with QGIS (Optional)#

Approach:#

  • Use QGIS to load the green spaces layer from your PostGIS database.

  • Apply a thematic style to differentiate the top 5 largest parks, perhaps using a distinct color or pattern.

  • If conducting accessibility analysis, use graduated symbols or heatmaps to represent areas with varying accessibility levels.

Example Visualization:#

  • A map showing all green spaces in the city with the top 5 largest highlighted. If population data was used, a heatmap overlay could indicate areas with lower accessibility to green spaces.

  1. Visualize Green Spaces:

    • Open QGIS and connect to your PostgreSQL database (Layer > Add Layer > Add PostGIS Layers…).

    • Add the green_spaces table as a layer and apply styling to visualize the parks.

  2. Export Your Map:

    • Once you’re happy with your map, export it as an image (Project > Import/Export > Export Map to Image…) and save it to your project folder.

  3. Commit Map to Git:

    • Add the exported map image to your Git repository and commit:

      git add your_map_image.png
      git commit -m "Add green spaces map visualization"
      

Step 5: Advanced Analysis (Optional)#

Approach:#

  • For a network analysis, you might calculate the actual walking distance to parks along the road network rather than straight-line distance.

  • Ecological impact assessments might involve overlaying green spaces with biodiversity data to assess their ecological value.

Network Analysis Example:#

SELECT name, AVG(distance) as avg_walking_distance
FROM (
  SELECT g.name, p.id, ST_Distance(ST_Transform(p.geom, network_srid), ST_Transform(g.geom, network_srid)) as distance
  FROM population_data p, planet_osm_polygon g, road_network r
  WHERE g.leisure = 'park'
  AND p.geom && r.geom
  AND ST_DWithin(ST_Transform(p.geom, network_srid), ST_Transform(r.geom, network_srid), 50) -- within 50 meters of a road
  ORDER BY p.id, distance
) as walking_distances
GROUP BY name;

Example Result:#

  • “The average walking distance to Central Park is 0.75 km, indicating good accessibility for nearby residents.”

Step 6: Finalize and Share Your Project#

  1. Write a Project Report:

    • Document your project in a Markdown file, README.md, including your objectives, methods, findings, and visualizations.

  2. Push to GitHub:

    • Push your local commits to your GitHub repository:

      git push origin master
      
  3. Share Your Project:

    • Share the link to your GitHub repository in the assignment submission on Canvas.

Documentation and Submission:#

  • Code and Queries: Document all SQL queries and any scripts used for data cleaning or analysis in your Git repository.

  • Analysis Report: Write a report detailing your methodology, analyses, and findings. Include screenshots from QGIS of your visualizations.

  • Repository Link: Submit the link to your Git repository containing all code, queries, and your analysis report.

Deliverables:#

  • A populated PostgreSQL database with PostGIS enabled and containing your analysis.

  • A Git repository containing:

    • SQL scripts used for analysis.

    • A QGIS project file with your visualizations.

    • A README.md documenting your project.

  • Visualizations and maps of urban green spaces within your chosen city (Optional).

Tips for Success:#

  • Start by familiarizing yourself with the tools and datasets. Small, incremental steps can help avoid overwhelming situations.

  • Use Git to commit changes regularly, especially after completing significant steps or solving complex problems.

  • Engage with your peers and instructors for ideas, troubleshooting, and feedback on Slack.

This assignment is designed to provide a practical introduction to spatial database development, integrating real-world data and tools used in the industry. It offers a foundation on which more complex spatial analyses and projects to be built throughout the semester.


Assignment Rubric#

This rubric is designed to evaluate the completeness, accuracy, depth of analysis, technical skills, and presentation quality of the student submissions.

1. Data Exploration and Cleaning (20 Points)#

  • Completeness (5 Points): Successfully imported and identified relevant data in the database.

    • 5: All relevant datasets imported; correctly identified green spaces.

    • 3: Most relevant datasets imported; minor issues in identifying green spaces.

    • 1: Relevant datasets partially imported; significant issues in identifying green spaces.

  • Accuracy (5 Points): Accurate identification and correction of data anomalies.

    • 5: Correctly identified and fixed all anomalies.

    • 3: Identified most anomalies; minor errors in corrections.

    • 1: Identified only some anomalies; significant errors in corrections.

  • Documentation (10 Points): Clear documentation of the exploration and cleaning process.

    • 10: Comprehensive documentation, including SQL queries and rationales for decisions.

    • 6: Adequate documentation with some details missing.

    • 3: Minimal documentation, lacking clarity or detail.

2. Spatial Analysis (30 Points)#

  • Query Complexity (10 Points): Complexity and correctness of SQL queries.

    • 10: Complex queries with correct syntax and logic, fulfilling all requirements.

    • 6: Moderately complex queries, minor syntax or logic errors.

    • 3: Simple queries, significant errors or omissions.

  • Analysis Depth (10 Points): Depth of spatial analysis performed.

    • 10: In-depth analysis with insightful findings, beyond basic requirements.

    • 6: Satisfactory analysis that meets basic requirements.

    • 3: Superficial analysis, lacking depth.

  • Accuracy (10 Points): Accuracy of results and conclusions drawn from the analysis.

    • 10: Highly accurate results, logical and supported conclusions.

    • 6: Mostly accurate results, some unsupported or illogical conclusions.

    • 3: Inaccurate results, flawed or illogical conclusions.

3. Visualization (Optional) (20 Bonus Points)#

  • Clarity and Aesthetics (10 Points): Visual clarity and aesthetic quality of the maps and charts produced.

    • 10: Highly clear and aesthetically pleasing visualizations.

    • 6: Clear visualizations with minor aesthetic issues.

    • 3: Poorly designed visualizations, lacking clarity or aesthetic appeal.

  • Relevance and Insightfulness (10 Points): Relevance of visualizations to the analysis and the insights they provide.

    • 10: Highly relevant, insightful visualizations that enhance understanding.

    • 6: Generally relevant visualizations with some insightful aspects.

    • 3: Visualizations are somewhat relevant but offer little insight.

4. Advanced Analysis (Optional) (10 Bonus Points)#

  • Complexity and Creativity (5 Points): Complexity and creativity of the advanced analysis.

    • 5: Highly complex and creative analysis, demonstrating advanced skills.

    • 3: Moderately complex, shows some creativity.

    • 1: Minimal complexity, lacking creativity.

  • Contribution to Overall Understanding (5 Points): How much the advanced analysis contributes to the overall understanding of the topic.

    • 5: Significant contribution, offering deep insights.

    • 3: Moderate contribution, some new insights.

    • 1: Minimal contribution, little to no new insights.

5. Documentation and Presentation (20 Points)#

  • Clarity and Organization (10 Points): Overall clarity and organization of the project report and code.

    • 10: Exceptionally clear and well-organized, logical flow, easy to follow.

    • 6: Clear and organized, minor issues with flow or organization.

    • 3: Disorganized or unclear, difficult to follow.

  • Use of Git (10 Points): Effective use of Git for version control, including commit messages and repository organization.

    • 10: Excellent use of Git, meaningful commit messages, well-organized repository.

    • 6: Adequate use of Git, some meaningful commit messages, generally organized.

    • 3: Minimal use of Git, unclear commit messages, poorly organized repository.

Total Points: 70 (100 with Bonus)#