NUS GEA1000 Quantitative Guide

biz law

Uploaded by

christiinelhz


GEA1000-cheatsheet - Summary made.

Quantitative reasoning with data (National University of Singapore)


GEA1000 Summary
AY22/23 Sem 2
github.com/gerteck

1. Data Collection

Bias
• Selection Bias: Associated with the researcher's biased selection of units. Arises from an imperfect sampling frame (units excluded) or from non-probability sampling.
• Non-Response Bias: Associated with participants' non-participation, or non-disclosure of (sensitive) information.

Probability Sampling
Four types. Every unit has a known, non-zero probability of being selected (the probabilities need not be the same). A randomised mechanism supplies the element of chance that eliminates selection bias.
• SRS (Simple Random Sampling): All units are selected randomly without replacement, each with equal chance. Subject to non-response.
• Systematic Sampling: Apply a selection interval k and a random starting point within the first interval. The list should be in random order.
• Stratified Sampling: (some units from all groups) The population is divided into strata of similar nature; stratum sizes may vary. SRS is applied within each stratum.
• Cluster Sampling: (all units from only certain clusters) The population is divided into clusters. A fixed number of clusters is chosen using SRS, and all units in the chosen clusters are used.

Non-Probability Sampling
Selection is not done by randomisation but by human discretion. Broad types (not mutually exclusive) include Quota, Convenience, Judgement and Volunteer sampling.
• Convenience Sampling: Subjects most easily available to participate, e.g. mall surveys.
• Volunteer Sampling: Self-selected sample, biased and non-representative.

Approach + Generalizability Criteria
• Choose a sampling frame. (It should be larger than or equal to the target population; members of the target population must not be left out.)
• Sample from the sampling frame. (Decide if probability sampling within the sampling frame is feasible.)
• Remove unwanted units.
• Generalizability Criteria: Good sampling frame that covers the target population, probability-based sampling (needed to minimise selection bias), large sample size (helps to reduce the variability and random error of the sample estimate), minimal non-response rate.

Categorical Variables: Take category or label values (mutually exclusive; a unit cannot be placed in two different categories).
Ordinal Variables: Natural ordering; numbers represent order (e.g. Happiness).
Nominal Variables: No intrinsic ordering (e.g. Eye colour).
Numerical Variables:
Discrete Variables: Possible values form a set of numbers with "gaps", e.g. Number of siblings.
Continuous Variables: Can take on all possible values in an interval, e.g. Time.

Summary Statistics for Numerical Variables
Central Tendency Measures: Mean, Median, Mode.
Dispersion Measures: Standard deviation, Inter-quartile Range.
Mean: Adding a constant to every value changes the mean by that constant. Multiplying every value by a constant multiplies the mean by that constant.
Standard deviation: Roughly the typical distance between each point and the mean. A measure of the spread of the data.
Coefficient of Variation: Standard deviation divided by the mean.
Median: Middle value of the (ascending/descending ordered) data set. The overall median will always lie between the lowest and highest medians amongst all subgroups.
Quartile 1: 25th percentile value. Quartile 3: 75th percentile value. IQR: Q3 - Q1.
Mode: Value that appears the most often.

Experimental Study
Controlled experiment: manipulate the independent variable to observe the effect on the dependent variable. The goal is to provide evidence for a cause-effect relationship. Make sure the independent variable is the only factor that differs, through random assignment (uses probability to allocate subjects into treatment and control groups). By the law of probability, the groups will tend to be similar in all other aspects.
Placebo: Inactive substance. The placebo effect is likely caused by the psychology of believing.
Double Blinding: Patients and researchers are both unaware of the grouping.

Observational Study
Used when there are ethical issues. Observes individuals and measures the variable of interest, without direct manipulation of variables. Does not provide convincing evidence of a cause-effect relationship, only of association.

2. Categorical Data
Joint Rate: Chance of an event occurring out of all the possible outcomes.
Conditional Rate: Rate of success/failure found under a given condition (X), written Rate(Success|X).
Association: Positive / Negative Association. If there is no association, we write that
rate(A|B) = rate(A|NB)
Four comparisons are mathematically equivalent: [figure]
Symmetry Rule on Rates: rate(A|B) > rate(A|NB) if and only if rate(B|A) > rate(B|NA) (the same holds with < or = in place of >).
Basic Rule on Rates: The overall rate(A) will always lie between rate(A|B) and rate(A|NB).
• Simpson's Paradox: A phenomenon in which a trend appears in more than half of the groups of data but disappears or reverses when the groups are combined. Here, "disappears" means the two variables in question (say A and B) are no longer associated: the rate of A given B is now equal to the rate of A given not B.
• Confounder: A third variable that is associated with both the independent and dependent variables whose relationship is being investigated. (The association can be positive or negative.) Confounders can be addressed by slicing the data according to the confounding variable, or by randomized assignment (a general solution across all confounders).
• Observing Simpson's paradox implies that there is definitely a (third) confounding variable present. However, the existence of a confounder does not necessarily lead to Simpson's paradox, nor does not observing the paradox imply the lack of a confounder.

3. Numerical Data

Univariate EDA
Exploratory Data Analysis of univariate (one-variable) numerical data: consider the distribution, histograms and boxplots.
Describing Distributions (Overall Pattern + Deviations): Focus on the shape, centre and spread of the distribution, and on outliers. Shape (modes): unimodal vs. multimodal distribution (local maxima). Spread (standard deviation, range of distribution): low variability vs. high variability.
Median and Mode are robust statistics: outliers have little to no effect on these values (e.g. median salary).

Histograms
• Graphical representation that organises data points into ranges/bins. Useful for large data sets.
• Histogram vs. Bar Graph: A histogram shows the distribution of a numerical variable across a number line, but a bar graph makes comparisons across categories of a variable. The ordering of bars in a histogram cannot be changed, unlike in a bar graph. There are no gaps between bars in a histogram.

Boxplots
• Five Number Summary: Minimum, Q1 (25th), Median (Q2), Q3 (75th), Maximum.
• Outliers: Greater than Q3 + 1.5 × IQR or smaller than Q1 − 1.5 × IQR.
• Understanding boxplots: Shape, Centre and Spread.
Shape: left-skewed vs. right-skewed (greater variability of data in the lower and upper half respectively).
Centre: Described by the median. The cross represents the mean. We can compare the relative positions of the median and mean from the boxplot.
Spread: IQR gives us an idea of the spread of the middle 50% of the data set; used to compare across different distributions.
Boxplots vs. Histograms:
Histogram: Better sense of the shape of the distribution of a variable. Boxplot: Better identifies and indicates outliers. Bottom line: Used together to complement each other.

Bivariate EDA
Focus on the relationship between two variables in a population.
• Deterministic Relationship: The value of one variable can be determined exactly from the other (e.g. conversion of units of measurement, m ⇔ ft, °C ⇔ °F).
• Association (Non-Deterministic): Statistical relation; given the value of one variable, we can describe the average value of the other variable.
• Consider scatterplots (idea of pattern), correlation coefficients (check for linear relation) and regression analysis (fitting a line or curve to data).

Scatter Plots
Direction, Form, Strength and Outliers.
• Direction: Positive / Negative relationship or neither (curved).
• Form: General shape, classified as linear or non-linear.
• Strength: How closely the data follow the form.
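The ideas of conditional rates and Simpson's paradox from the Categorical Data section can be checked with a few lines of Python. This is only a sketch: the counts, the group names and the treatment labels X and Y are made up for illustration and do not come from the course material.

```python
from fractions import Fraction

# Hypothetical counts, made up for illustration: (successes, total) for
# treatments X and Y within each level of a third variable (the group).
counts = {
    "group 1": {"X": (90, 100), "Y": (800, 1000)},
    "group 2": {"X": (300, 1000), "Y": (20, 100)},
}


def group_rate(group, treatment):
    """rate(success | treatment) within a single group."""
    successes, total = counts[group][treatment]
    return Fraction(successes, total)


def overall_rate(treatment):
    """rate(success | treatment) after combining all groups."""
    successes = sum(counts[g][treatment][0] for g in counts)
    total = sum(counts[g][treatment][1] for g in counts)
    return Fraction(successes, total)


for g in counts:
    print(g, float(group_rate(g, "X")), float(group_rate(g, "Y")))
print("overall", float(overall_rate("X")), float(overall_rate("Y")))

# X has the higher success rate inside every group...
x_better_in_every_group = all(
    group_rate(g, "X") > group_rate(g, "Y") for g in counts
)
# ...yet Y has the higher rate once the groups are combined (a reversal).
y_better_overall = overall_rate("Y") > overall_rate("X")
```

In this made-up table the group is associated with both the treatment received and the success rate, which is what lets the trend reverse. The overall rate for each treatment still lies between its two group rates, as the basic rule on rates requires.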
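The five-number summary and the 1.5 × IQR outlier rule from the Boxplots notes can be sketched as follows. The data values are invented, and the quartile convention (the "inclusive" method of Python's `statistics.quantiles`) is an assumption; other conventions, possibly including the one used in the course, can give slightly different Q1 and Q3.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # statistics.quantiles sorts the data itself; with n=4 it returns the
    # three cut points Q1, Q2 (median) and Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


def outliers(data):
    """Values above Q3 + 1.5*IQR or below Q1 - 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]


# Hypothetical data set with one unusually large value.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]
print(five_number_summary(values))
print(outliers(values))
```

Because the median and quartiles depend on ranks rather than magnitudes, replacing 40 with an even larger number would leave Q1, the median and Q3 unchanged, which illustrates why they are called robust.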

Correlation Coefficient, r
The correlation coefficient r between two numerical variables is a measure of the linear association between them. It always lies between -1 and 1.
• Sign and Magnitude of r: The sign of r tells us the direction of the linear association. If r > 0, the association is positive: when one variable increases, the other tends to increase as well. If r < 0, the association is negative: when one variable increases, the other tends to decrease. If r = 1 or r = −1, there is perfect positive/negative linear association. When r = 0, there is no linear association. The magnitude of r tells us the strength of the linear association.
Approx: (0 - 0.3 weak, 0.3 - 0.7 moderate, 0.7 - 1 strong)
• Calculation of r:
$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
• Properties of r: r is not affected by adding a number to all values of a variable, or by multiplying all values of a variable by a positive number.
• Limitations of r: Association is not causation. r does not give any indication of non-linear association. Outliers can affect the correlation coefficient r significantly.

Linear Regression
If we believe that two variables are linearly associated, we may model the relationship by fitting a straight line to the observed data, known as linear regression.
• The slope of the line is the amount of change in Y when the value of X increases by 1.
• Finding Regression Line: Method of least squares: fit the line that minimises the sum of the squared error terms. Hence the line for predicting Y from X and the line for predicting X from Y are different and not interchangeable.
• Slope vs. Correlation Coefficient: The slope m of the regression line and the correlation coefficient r are related by:
$$m = \frac{s_y}{s_x}\, r$$
where s_y is the standard deviation for y and s_x is the standard deviation for x.
• Important to remember that the correlation coefficient is not necessarily equal to the gradient of the regression line.
• Extrapolation: Prediction beyond the observed range is dangerous (not advisable).
• Linear Regression on Non-Linear Models: Model the relationship indirectly (e.g. using properties of log) to form a linear relation.

4. Statistical Inference
Statistical Inference is the use of samples to draw inferences or conclusions about the population in question.

Probability
Probability as a mathematical means to reason about uncertainty.
• Sample Space: Collection of all possible outcomes of a probability experiment.
• Event: A subcollection of the sample space is an event.
• Rules of Probability: The probability of an event E, P(E), is between 0 and 1 inclusive. The probability of the entire sample space, P(S), is 1.
• If E and F are mutually exclusive events, then the probability of E union F is equal to the sum of the probabilities of E and F. That is, P(E ∪ F) = P(E) + P(F).
• Uniform Probability and Rates: Way of assigning probabilities to outcomes such that equal probability is assigned to every outcome in the finite sample space. Relevant in random sampling.

Conditional Probability and Independence
Conditional Probability is written using the notation P(E|F) and read as "probability of E given F".
$$P(E \mid F) = \frac{P(E \cap F)}{P(F)}$$
• Mutually Exclusive Events: No overlap between E and F, meaning they are not simultaneously possible. Then P(E ∩ F) = 0. If an event F itself cannot occur, then by convention P(E ∩ F) is also equal to 0.
• Law of Total Probability:
$$P(E) = P(E \mid F)\,P(F) + P(E \mid F^{c})\,P(F^{c})$$
• Analogy between Probability and Sampling: [figure]
• Conditional Probabilities: equivalent to conditional rate:
P(A|B) = rate(A|B)
• Independent Events: For independent events A and B, the probability of A is the same as the probability of A given B:
P(A) = P(A|B)
If we express the conditional probability P(A|B) as
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
then A and B being independent means that
P(A) × P(B) = P(A ∩ B)
which is an equivalent definition for two independent events.
• Independence as non-association: A and B are independent events whenever A and B are not associated with each other.
• Independent Probability Experiments: E.g. coin toss, where one instance is independent of the other.

Random Variables
A random variable is a numerical variable with probabilities assigned to each of the possible numerical values taken by the numerical variable. Conceived as a mathematical way to model data distribution.
• May be Discrete or Continuous Random Variables.
[Figure: visualisation of a discrete and a continuous random variable, respectively]
• For a discrete rv, the sum of the probabilities assigned to each outcome must equal 1. For a continuous rv, the area under the density curve is always equal to 1.

Normal Distributions
A class of continuous random variables. N(x, y). (bell curve god)
• Normal Distributions only differ by means and variances. (mean x, variance y).
• Common Properties: Bell-shaped curve, peak of the curve occurs at the mean, curve is symmetrical about the mean. (Mean = Mode = Median).

Confidence Intervals
Using a sample statistic to estimate the population parameter is subject to inaccuracies (bias / random error).
• A Confidence Interval is a range of values that is likely to contain a population parameter based on a certain degree of confidence. This degree of confidence is known as the confidence level and is usually expressed as a percentage (%).
• To construct confidence intervals for a population proportion:
$$p^{*} \pm z^{*} \times \sqrt{\frac{p^{*}(1 - p^{*})}{n}}$$
where:
p* = sample proportion
z* = "z-value" from standard normal distribution (table)
n = sample size
• To construct confidence intervals for a population mean µ:
$$\bar{x} \pm t^{*} \times \frac{s}{\sqrt{n}}$$
where:
x̄ = sample mean
t* = "t-value" from t-distribution (table)
s = sample standard deviation
n = sample size
• Interpreting Confidence Interval:
Two parts: Confidence Level (e.g. 95%) and Interval (e.g. 0.254 ± 0.0191 [margin of error]).
This means: we are 95% confident that the population proportion (the parameter in this case) of food transactions that are from Terrace (a certain category) lies within the confidence interval.
Idea of confidence level: about 95 out of 100 confidence intervals built from SRSs of the same size will contain the population parameter. (Exact value not known) (** Not a 95% chance: the chance is in the sampling procedure, not in the parameter.)
• Properties of CI: The larger the sample size, the smaller the random error and the narrower the CI. The higher the confidence level, the wider the CI. The CI is a way to quantify random error.

Hypothesis Testing
1. Null and Alternative Hypothesis.
• The null hypothesis usually asserts a stand of no effect / no difference. The alternative is what we wish to confirm and pit against the null hypothesis. (Mutually exclusive) e.g.
Null Hypothesis H₀: P(H) = 0.5
Alt. Hypothesis H₁: P(H) > 0.5
2. Collect data and determine test statistic.
• Testing usually involves some random variable and its probability distribution. (e.g. coin, vaccine safety)
3. Set level of significance and compute p-value.
• Significance level: How convincing the evidence must be to reject H₀.
• The lower the significance level, the greater the evidence needed. Commonly used is the 0.05 (5%) level of significance, or 0.1 (10%), or 0.01 (1%).
• p-value: Probability of obtaining a test result at least as extreme as the result observed, assuming the null hypothesis is true.
Also the probability of observing a test result that favours the alternative hypothesis at least as much as the one observed in the current sample, assuming the null hypothesis is true.
4. Compare p-value and level of significance.
• We reject the null hypothesis in favour of the alternative if
p-value < significance level
(under the null hypothesis, the observed result is logically very unlikely).
• However, if
p-value > significance level
we do not reject the null hypothesis (we cannot accept it; this does not mean H₀ is true; we don't know if the observation is due to chance, so the test is inconclusive).
• We only carry out a hypothesis test with sample data. When given population data, everything can be determined directly.

Common Hypothesis Tests: One-sample t-test and Chi-squared test. [figure]
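The computational formula for r and the slope relation m = (s_y / s_x) r from the correlation and regression notes can be sketched in plain Python. The five data points are invented for illustration, and the function names are hypothetical helpers, not part of any library.

```python
import math
import statistics


def correlation(xs, ys):
    """Correlation coefficient r using the sums-based formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


def regression_slope(xs, ys):
    """Slope of the least-squares line for predicting y from x: (s_y / s_x) * r."""
    return correlation(xs, ys) * statistics.stdev(ys) / statistics.stdev(xs)


# Hypothetical data, made up for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
m = regression_slope(xs, ys)
print(r, m)

# Rescaling x by a positive factor and shifting it leaves r unchanged.
r_rescaled = correlation([2 * x + 3 for x in xs], ys)
```

Here r is about 0.77 while the slope is 0.6, a small reminder that the correlation coefficient and the gradient of the regression line are generally different numbers.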

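The confidence-interval formula for a population proportion given in the Confidence Intervals notes can be sketched as below. The sample proportion 0.254 is the one quoted in the notes; the sample size of 2000 is a hypothetical value chosen only so that the margin of error comes out close to the quoted 0.0191, and `proportion_ci` is an illustrative helper name.

```python
import math
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence=0.95):
    """Return (lower, upper, margin) for a population proportion.

    Uses p* +/- z* x sqrt(p*(1 - p*) / n), with z* taken from the
    standard normal distribution instead of a printed table.
    """
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin


# n = 2000 is an assumption for illustration, not a figure from the notes.
lower, upper, margin = proportion_ci(0.254, 2000)
print(round(lower, 4), round(upper, 4), round(margin, 4))

# Properties of the CI: a larger sample narrows it, a higher level widens it.
margin_bigger_sample = proportion_ci(0.254, 8000)[2]
margin_99_percent = proportion_ci(0.254, 2000, confidence=0.99)[2]
```

Quadrupling the sample size halves the margin of error, because the margin shrinks with the square root of n.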

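The hypothesis-testing steps in the notes use a coin example with H₀: P(H) = 0.5 against H₁: P(H) > 0.5. A minimal sketch of the p-value step for that example follows. It uses an exact binomial calculation, which is a choice made here for illustration and is not the one-sample t-test or chi-squared test named in the notes; the observed result (15 heads in 20 tosses) is hypothetical.

```python
import math


def p_value_at_least(heads, tosses, p_null=0.5):
    """P(number of heads >= observed) assuming the null hypothesis is true."""
    return sum(
        math.comb(tosses, k) * p_null ** k * (1 - p_null) ** (tosses - k)
        for k in range(heads, tosses + 1)
    )


# Hypothetical observation: 15 heads out of 20 tosses.
p_value = p_value_at_least(15, 20)
print(p_value)

# Step 4: compare the p-value with the chosen level of significance.
reject_at_5_percent = p_value < 0.05
reject_at_1_percent = p_value < 0.01
```

With these made-up numbers the p-value is about 0.021, so the null hypothesis is rejected at the 5% level but not at the 1% level, which shows why the significance level has to be fixed before looking at the result.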