8000 GitHub - delaney-data/SQL-DataAnalysis: Using SQL to build insights and analysis.
[go: up one dir, main page]

Skip to content

delaney-data/SQL-DataAnalysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

19 Commits
ย 
ย 

Repository files navigation

Data Analysis with SQL

A walk-through using intermediate SQL statements to provide analytical insights for Data Science or Business Intelligence from a database. This project assumes the reader knows the basic syntax and select statements.

Concepts covered:

  • Database
    • Altering tables, creating new columns
    • Inserting values, Arithmetic Operators
    • Views
  • Functions
    • Aggregating values
  • SQL statements using JOIN, GROUP BY, LIMIT

Utilities & Enviroments Used:

Table of Contents

  1. Context of the Dataset
  2. Database Schema
  3. Data Analysis - Question #1
  4. Data Analysis - Question #2
    • [In Progess]

The Problem [ The Digital Divide ]

As the technology in the world continues to advance and more services are exclusively accessed via the Internet, the impact of the Digital Divide (a gap between those who have access to digital technology and those who do not) can be felt harder to those on the receiving end.

The Digital Divide issue is complex and is a culmination of a Lack of Technology, a Lack of Internet Access and a Lack of Digital Literacy. The coronavirus crisis has shown that not being digitally connected, or not having internet access in the household, can result in negative affects that range from:

๐Ÿง‘โ€๐Ÿ’ป Lack of job opportunities & employment

  • Remote or Hybrid work
  • Applying to Jobs
  • Lack of Job-ready Skills (navigating the internet, programs, etc)

๐Ÿ“š Lack of educational resources

  • Online Schooling
  • Online Education resources

โ›” Restricted from internet-only services

  • Account Activation / Cancellation
  • Paying Bills Online
  • Paperless Documentation
  • Ordering Services (Food, Supplies, etc)
  • Smart Devices that require internet to fully function

Notes & Prerequisites

I will be analyzing data focusing specifically on the Lack of Internet.

Prior to this chapter, I have completed the prerequisite step of importing the dataset into PostgreSQL database.

A note on the dataset:

  • According to the source data, data in the file was only collected in US counties where the population was greater than 65K in the year 2016.

Content: This dataset contains data for counties with population over 65000, compiled from the 2016 ACS 1-year estimate. ACS 1-year estimates only summarize data for large geographic areas over 65000 population. It provides sufficient data for us to gain insight into internet use.

Database Schema

County_Pop Table

Column Name Data Type, Constraint Description
id_pop integer, primary key Used as primary key for the county population table.
county varchar Each county per state
state varchar Each state
p_total integer Total Population in numbers per county
p_white integer Population grouped by race: white
p_black integer Population grouped by race: black
p_asian integer Population grouped by race: asian
p_native integer Population grouped by race: native
p_hawaiian integer Population grouped by race: hawaiian
p_others integer Population grouped by race: other
percent_no_internet numeric Percentage without Internet
**pop_no_internet decimal Population in numbers without Internet

** I manually added new this column (as it did not exist in the dataset) and inserted values into the rows in order to solve our Analysis Question #1.

Education_Income Table

Column Name Data Type, Constraint Description
id integer, primary key Used as primary key for the education and income table.
county varchar Each county per state
state varchar Each state
p_below_middle_school integer Education level per population: Below Middle School
p_some_high_school integer Education level per population: Some High School
p_high_school_equivalent integer Education level per population: High School Equivalent
p_some_college integer Education level per population: Some College
p_bachelor_and_above integer Education level per population: Bachelors and above
p_below_poverty integer Income Level per population: Below Poverty
median_age numeric Median age of household per county
median_household_income numeric Median Household Income per county
median_rent_per_income numeric Median Rental Income per county

Data Analysis

Question 1: What are the top 10 states with the highest number of people without internet?