Data Analytics Notes
1. What are the characteristics of data in data analytics?
Data is the foundation of data analytics, and its characteristics play a crucial role in determining the
accuracy, reliability, and effectiveness of analytical outcomes. Below are the key characteristics:
- **Volume**: Large amounts of data require efficient storage and processing techniques.
- **Variety**: Includes structured, semi-structured, and unstructured data.
- **Velocity**: Speed at which data is generated and processed.
- **Veracity**: Accuracy and reliability of data.
- **Value**: Usefulness of data for decision-making.
- **Variability**: Changes in data patterns over time.
- **Completeness**: Ensures no critical data is missing.
- **Timeliness**: Data must be up-to-date.
- **Accessibility**: Ease of data retrieval and sharing.
- **Granularity**: Level of detail in the data.
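
Some of these characteristics can be measured directly. The sketch below is a minimal illustration, not a standard method: it scores **Completeness** and **Timeliness** on a tiny hypothetical dataset. The records, field names, and the 30-day freshness threshold are all assumptions made for the example.

```python
from datetime import date

# Hypothetical sample records; None marks a missing value.
records = [
    {"customer_id": 1, "email": "a@example.com", "updated": date(2024, 1, 10)},
    {"customer_id": 2, "email": None, "updated": date(2024, 1, 12)},
    {"customer_id": 3, "email": "c@example.com", "updated": date(2023, 6, 1)},
    {"customer_id": 4, "email": "d@example.com", "updated": date(2024, 1, 14)},
]


def completeness(rows, field):
    """Fraction of rows where `field` is present (not None)."""
    return sum(r[field] is not None for r in rows) / len(rows)


def timeliness(rows, field, as_of, max_age_days):
    """Fraction of rows whose `field` date is within `max_age_days` of `as_of`."""
    fresh = sum((as_of - r[field]).days <= max_age_days for r in rows)
    return fresh / len(rows)


# Assumed reference date and freshness threshold (30 days).
email_completeness = completeness(records, "email")
freshness = timeliness(records, "updated", date(2024, 1, 15), 30)
print(email_completeness, freshness)
```

One email is missing and one record is stale, so both scores come out at 3 of 4 rows.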
---
2. What are the steps in the discovery phase of data analytics?
- **Define the Business Problem**: Identify challenges and goals.
- **Identify Key Metrics**: Determine relevant KPIs.
- **Understand Data Requirements**: Identify necessary data sources.
- **Data Collection and Exploration**: Gather and examine data.
- **Assess Data Quality**: Check the data for missing values, errors, and inconsistencies that will need cleaning.
- **Identify Analytical Approaches**: Choose statistical or machine learning methods.
- **Develop Hypotheses**: Formulate and test assumptions.
- **Prepare a Project Plan**: Define timelines and responsibilities.
- **Communicate Findings**: Present initial insights.
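
As a rough illustration of the exploration and quality-assessment steps above, the following sketch profiles a single numeric column. The `order_values` data and the `profile` helper are hypothetical, and a real project would profile every relevant field.

```python
import statistics

# Hypothetical order values collected during discovery; None marks a missing entry.
order_values = [120.0, 80.0, None, 100.0, 300.0]


def profile(values):
    """Return basic descriptive statistics and a missing-value count."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "min": min(present),
        "max": max(present),
    }


summary = profile(order_values)
print(summary)
```

A mean well above the median, as here, hints at a skewed distribution or an outlier worth investigating before choosing an analytical approach.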
---
3. What are the steps in the data preparation phase of the data analytics lifecycle?
- **Data Collection**: Gather data from multiple sources.
- **Data Integration**: Merge data from different formats.
- **Data Cleaning**: Handle missing values, remove duplicates, and resolve inconsistencies.
- **Data Transformation**: Normalize and encode variables.
- **Feature Engineering**: Create new relevant features.
- **Data Reduction**: Reduce dataset size while preserving the information needed for analysis.
- **Data Validation**: Ensure correctness and consistency.
- **Data Storage**: Store processed data securely.
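
The cleaning and transformation steps above can be sketched in plain Python. This is a minimal example under stated assumptions: the `raw` rows, the field names, and the choices of mean imputation, min-max scaling, and one-hot encoding are illustrative, not the only valid options.

```python
# Hypothetical raw rows merged from two sources; None marks a missing value.
raw = [
    {"id": 1, "age": 20, "plan": "basic"},
    {"id": 2, "age": None, "plan": "pro"},
    {"id": 2, "age": None, "plan": "pro"},  # duplicate record
    {"id": 3, "age": 40, "plan": "basic"},
]


def deduplicate(rows, key):
    """Keep the first row for each key value; copy rows so `raw` is untouched."""
    seen, out = set(), []
    for r in rows:
        if r[key] not in seen:
            seen.add(r[key])
            out.append(dict(r))
    return out


def impute_mean(rows, field):
    """Replace missing values in `field` with the mean of the known values."""
    known = [r[field] for r in rows if r[field] is not None]
    mean = sum(known) / len(known)
    for r in rows:
        if r[field] is None:
            r[field] = mean
    return rows


def min_max_scale(rows, field):
    """Normalize `field` to the range [0, 1]."""
    values = [r[field] for r in rows]
    lo, hi = min(values), max(values)
    for r in rows:
        r[field] = (r[field] - lo) / (hi - lo) if hi > lo else 0.0
    return rows


def one_hot(rows, field):
    """Encode a categorical `field` as 0/1 indicator columns."""
    categories = sorted({r[field] for r in rows})
    for r in rows:
        value = r.pop(field)
        for c in categories:
            r[f"{field}_{c}"] = 1 if value == c else 0
    return rows


cleaned = impute_mean(deduplicate(raw, "id"), "age")
prepared = one_hot(min_max_scale(cleaned, "age"), "plan")
print(prepared)
```

Imputing the mean keeps the row for customer 2 instead of dropping it, which is one reason "handle" is often a better goal than "remove" for missing values.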
---
4. What is Linear Regression?
Linear Regression is a statistical method used to model the relationship between a dependent
variable (Y) and one or more independent variables (X).
- **Simple Linear Regression**: Y = mX + c (one independent variable; m is the slope and c is the intercept)
- **Multiple Linear Regression**: Y = b0 + b1X1 + b2X2 + ... + bnXn (multiple independent variables; b0 is the intercept and b1 to bn are the coefficients)
**Example**: Predicting sales based on advertising spend.
**Applications**: Used in business forecasting, finance, healthcare, and engineering.
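
To make the sales example concrete, here is a minimal sketch that fits Y = mX + c by ordinary least squares. The advertising and sales figures are made-up values chosen to lie exactly on a line, and `fit_simple_linear_regression` is a hypothetical helper name, not a library function.

```python
def fit_simple_linear_regression(xs, ys):
    """Return slope m and intercept c minimizing squared error for y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


# Hypothetical data: advertising spend (X) and sales (Y), with Y = 2X + 3 exactly.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [5.0, 7.0, 9.0, 11.0, 13.0]

m, c = fit_simple_linear_regression(ad_spend, sales)
predicted = m * 6.0 + c  # predicted sales for a spend of 6.0
print(m, c, predicted)
```

With real, noisy data the points would not fall exactly on the line, and the fitted m and c would be the values that minimize the total squared prediction error.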
---
5. What is a Digital Analytics Sandbox?
A Digital Analytics Sandbox is a secure, isolated environment where data, tools, and tracking setups can be tested
before they are applied in real-world (production) scenarios.
**Purpose**:
- Experimenting with analytics tools.
- Ensuring data accuracy before deployment.
- Enhancing data privacy and security.
**Example**:
A retail company tests a customer tracking system before launching it on its website.
**Applications**:
- Website & App Analytics
- E-commerce Optimization
- Marketing Campaign Analysis
- Big Data Processing