
Dev Unit 5

Unit V: MULTIVARIATE AND TIME SERIES ANALYSIS

Introducing a Third Variable - Causal Explanations - Three-Variable Contingency Tables and Beyond - Longitudinal Data - Fundamentals of TSA - Characteristics of time series data - Data Cleaning - Time-based indexing - Visualizing - Grouping - Resampling

Data Exploration and Visualization

1. Explain Third Variables.

A third variable problem occurs when an observed correlation between two variables can actually be explained by a third variable that hasn't been accounted for. When this third variable is not taken into account, the correlation between the two variables under study can be misleading and even confusing.

Example 1: Dogs & Fire Hydrants
A researcher observes that cities with more fire hydrants tend to also have more dogs. However, these two variables are only correlated because they both have a high correlation with a third variable: population size. Larger cities tend to have both more fire hydrants and more dogs. Conversely, smaller cities tend to have fewer fire hydrants and fewer dogs.

Example 2: Ice Cream Sales & Shark Attacks
A researcher finds that ice cream sales and shark attacks are highly positively correlated. However, these two variables are only correlated because they both have a high correlation with a third variable: temperature. When it's warmer out, more people buy ice cream and more people swim in the ocean, which explains why the values for both ice cream sales and shark attacks tend to increase during the same times of the year.

2. What is the relationship between variables?

A survey is conducted and an interesting statistical association between X and Y is discovered. There are two basic assumptions that have to be made if we wish to infer from this that X may cause Y. These involve the relationship between X and Y and other variables which might be operating. They are designed to ensure that when we compare groups which differ on X, we are comparing like with like.
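A minimal sketch of what "comparing like with like" means in the dogs-and-fire-hydrants example, written in Python with NumPy. All numbers are simulated and hypothetical: population is assumed to drive both counts, and the helper `residuals` is an illustrative name, not something from the source. Removing the part of each variable that population explains (a partial correlation) is one common way to hold a third variable constant.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 500 cities, where population drives both counts.
population = rng.uniform(10_000, 1_000_000, size=500)
hydrants = 0.002 * population + rng.normal(0, 100, size=500)
dogs = 0.1 * population + rng.normal(0, 5_000, size=500)


def residuals(y, x):
    """Return the part of y not linearly explained by x (least-squares fit)."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)


# Raw correlation between the two observed variables.
raw_r = np.corrcoef(hydrants, dogs)[0, 1]

# Partial correlation: correlate what is left after removing population.
partial_r = np.corrcoef(
    residuals(hydrants, population), residuals(dogs, population)
)[0, 1]

print(f"raw correlation:     {raw_r:.3f}")
print(f"partial correlation: {partial_r:.3f}")
```

Under these assumptions the raw correlation is strong while the partial correlation should be close to zero, which is the pattern a third-variable explanation predicts.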
Before giving an exposition of these assumptions, we need a bit more terminology: other variables can be causally prior to both X and Y, intervene between X and Y, or ensue from X and Y, as shown below.

Figure 11.2 Different causal relationships between variables.

3. What assumptions can be made in causal relationships between variables?

Three assumptions can be made in the case of a causal relationship between variables.

Assumption 1: "X is causally prior to Y". There is nothing in the data to tell us whether X causes Y or Y causes X, so we have to make the most plausible assumption we can, based on our knowledge of the subject matter and our theoretical framework. One particular problem comes from situations in which we suspect that X and Y influence each other in a reciprocating process. There is no way that non-experimental data can be made to yield two independent estimates of the effect of X on Y and of Y on X.

Assumption 2: "Related prior variables have been controlled". All other variables which affect both X and Y must be held constant. In an experiment, we can be sure that there are no third variables which give rise to both X and Y, because the only way in which the randomized control groups are allowed to vary is in terms of X.

Assumption 3: "All variables intervening between X and Y have been controlled". This assumption is not required before you can assume that there is a causal link between X and Y, but it is required if you aim to understand how X is causing Y.

4. What is a spurious relationship?

Spurious relationship: an association between two variables that appears to be causal but can in fact be explained by some third variable.

5. What is a control variable?

Control variables: the effects of potential "third variables" are mathematically controlled in the data analysis process to highlight the relationship between the independent and dependent variable; used to establish the criterion of nonspuriousness in nomothetic causal relationships.

6. Explain the causal relationship in detail.

We consider ways of holding a third variable constant while assessing the relationship between two others. We have now developed some experience of handling batches of data, summarizing features of their distributions, and investigating relationships between variables. We must now change gear somewhat and ask what it would take for such relationships to be treated as satisfactory explanations.

Hume suggested that 'We may define a cause to be an object followed by another, and where all the objects, similar to the first, are followed by objects similar to the second. Or, in other words, where, if the first object had not been, the second never had existed'.

Direct and indirect effects

Causality should not necessarily be understood as a simple process in which one factor or variable has an impact on another. For example, it is likely in many cases that two or more factors will tend to work together to produce an effect. Moreover, the factors or variables contributing to the effect may themselves be causally related. For this reason, we have to keep a clear idea in our heads of the relationships between the variables in the whole causal process. In investigating the causes of absenteeism from work, for example, researchers have found different contributory factors. We will consider two possible causal factors: being female and being in a low status job. Let us construct a causal path diagram depicting one possible set of relationships between these variables.

Figure 11.1 Causes of absenteeism.

The diagram in figure 11.1 represents a simple system of multiple causal paths. There is an arrow showing that those in low status jobs are more likely to go absent. Being female has a causal effect in two ways.
There is an arrow straight to absentee behaviour; this says that women are more likely to be absent from work than men, regardless of the kind of job they are in. This is termed a direct effect of gender on absenteeism. There is also another way in which being female has an effect: women are more likely to be in the kind of low status, perhaps unpleasant, jobs where absenteeism is more likely, irrespective of gender. We can say that being female therefore also has an indirect effect on absenteeism, through the type of work performed. Without some empirical evidence we cannot be sure that this 'model' of the relationships between the variables is correct.

Controlling the world to learn about causes

It is one thing to assert that causal processes exist in the world; it is quite another thing to find out what they are. Causal processes are not obvious. They hide in situations of complexity, in which effects may have been produced by several different causes acting together. When investigated, they will reluctantly shed one layer of explanation at a time, but only to reveal another deeper level of complexity beneath. For this reason, something that is accepted as a satisfactory causal explanation at one point in time can become problematic at another.

Researchers investigating the causes of psychological depression spent a long time carefully documenting how severe, traumatizing events that happen to people, such as bereavement or job loss, can induce it. Now that the causal effect of such life events has been established, the research effort is turning to ask how an event such as unemployment has its effect: is it through the loss of social esteem, through the decline of self-evaluation and self-esteem, through lack of cash, or through the sheer effect of inactivity?

Do opinion polls influence people?
Let us take an example to illustrate the different inferences which can be drawn from experiments and non-experiments. Some people believe that hearing the results of opinion polls before an election sways individuals towards the winning candidate. Imagine two ways in which empirical evidence could be collected for this proposition.

An experiment could be conducted by taking a largish group of electors, splitting them into two at random, telling half that the polls indicated one candidate would win and telling the other half that they showed a rival would win. As long as there were a substantial number of people in each group, the groups would start the experiment having the same political preferences on average, since the groups were formed at random. If they differed substantially in their subsequent support for the candidates, then we could be almost certain that the phony poll information they were fed contributed to which candidate they supported.

Alternatively, the proposition could be researched in a non-experimental way. A survey could be conducted to discover what individuals believed recent opinion polls showed, and to find out which candidates the individuals themselves supported. The preferences of those who believed that one candidate was going to win would be compared with those who believed that the rival was going to win. The hypothesis would be that the former would be more sympathetic to the candidate than the latter. If such a survey did reveal a strong relationship between individuals' perception of the state of public opinion and their own preferences, should we conclude that opinion polls have a causal effect on individual preferences, and consider banning polls in pre-election periods as a result?
Anyone who used this line of argument would be taken to task by the pollsters, who have a commercial interest in resisting such reasoning. They would deny that the effect in any way proves that polls influence opinion: it could, for instance, be that supporters of a right-wing candidate are of a generally conservative predisposition, and purchase newspapers which only report polls sympathetic to their candidate. In short, comparing individuals in a survey who thought that candidate A would win with those who believed that candidate B would win would not be comparing two groups similar in all other possible respects, unlike the experiment discussed above. An experiment would have a better chance of persuading people that the publication of opinion polls affected individual views.

Assumptions required to infer causes

Imagine a common situation. A survey is conducted and an interesting statistical association between X and Y is discovered. There are two basic assumptions that have to be made if we wish to infer from this that X may cause Y. These involve the relationship between X and Y and other variables which might be operating. They are designed to ensure that when we compare groups which differ on X, we are comparing like with like.

Before giving an exposition of these assumptions, we need a bit more terminology: other variables can be causally prior to both X and Y, intervene between X and Y, or ensue from X and Y, as shown in figure 11.2. These terms are only relative to the particular causal model in hand: in a different model we might want to explain what gave rise to the prior variable.

Figure 11.2 Different causal relationships between variables.

Let us discuss each of the two core assumptions in turn.

Assumption 1: X is causally prior to Y. There is nothing in the data to tell us whether X causes Y or Y causes X, so we have to make the most plausible assumption we can, based on our knowledge of the subject matter and our theoretical framework.

Assumption 2: Related prior variables have been controlled.
All other variables which affect both X and Y must be held constant. In an experiment, we can be sure that there are no third variables which give rise to both X and Y because the only way in which the randomized control groups are allowed to vary is in terms of X. No such assumption can be made with non-experimental data.

Assumption 3: All variables intervening between X and Y have been controlled. This assumption is not required before you can assume that there is a causal link between X and Y, but it is required if you aim to understand how X is causing Y.

Let us first consider a hypothetical example drawn from the earlier discussion of the causes of absenteeism. Suppose previous research had shown a positive bivariate relationship between low social status jobs and absenteeism. The question arises: is there something about such jobs that directly causes the people who do them to go off sick more than others? Before we can draw such a conclusion, two assumptions have to be made. There are many possible outcomes once the relationship between all three variables is considered at once, four of which are shown in figure 11.3.

Figure 11.3 The effect of job status on absenteeism: controlling a prior variable.
Figure 11.4 Outcome I from figure 11.3.
Figure 11.5 Outcome II from figure 11.3.
Figure 11.6 Outcome III from figure 11.3.

If the relationship disappears when a causally prior variable is brought under control, the original relationship is said to be spurious. By this we do not mean that the bivariate effect did not really exist, but rather that any causal conclusions drawn from it would be incorrect. We can now introduce another meaning for the verb 'to explain': in this situation, many researchers say that the proportion of females in a job 'explains' the relationship between the status of the job and absenteeism, in the sense that it accounts for it entirely. But what of the fourth situation, which is actually the most likely outcome?
It was the situation portrayed in figure 11.1.

Figure 11.7 Outcome IV from figure 11.3.

7. What is multiple and complex causality?

Many different component causes can add together to produce a particular outcome, a process sometimes known as multiple causality or complex causality.

8. What is thick trust and thin trust?

Trust and honesty based on personal experience and on knowing people well over a number of years is conceptualized as 'thick trust'. However, this is only possible with a relatively small number of people and is therefore not as useful within a community as 'thin trust'.

9. Three-variable contingency table and beyond

A three-way contingency table is a cross-classification of observations by the levels of three categorical variables.

10. Explain about Longitudinal Data in detail.

Longitudinal data, sometimes called panel data, is data that is collected through a series of repeated observations of the same subjects over some extended time frame, and is useful for measuring change. In most cases longitudinal data deals with human data. Longitudinal data effectively follows the same sample over time, which differs fundamentally from cross-sectional data: it follows the same subjects over some time, while cross-sectional data samples different subjects (whether individuals, firms, countries, or regions) at each point in time. A cross-sectional data set will always draw a new random sample.

Types of longitudinal studies

The three main types of longitudinal studies are:
1. Panel Study
2. Retrospective Study
3. Cohort Study

These methods help researchers to study variables and account for qualitative and quantitative data from the research sample.

1. Panel Study

In a panel study, the researcher uses data collection methods like surveys to gather information from a fixed number of variables at regular but distant intervals, often spanning a few years.
It is primarily designed for quantitative research, although you can use this method for qualitative data analysis.

When To Use Panel Study
If you want to have first-hand, factual information about the changes in a sample population, then you should opt for a panel study. For example, medical researchers rely on panel studies to identify the causes of age-related changes and their consequences.

Advantages of Panel Study
• It helps you identify the causal factors of changes in a research sample.
• It also allows you to witness the impact of these changes on the properties of the variables and information needed at different points of their existing relationship.
• Panel studies can be used to obtain historical data from the sample population.

Disadvantages of Panel Studies
• Conducting a panel study is pretty expensive in terms of time and resources.
• It might be challenging to gather the same quality of data from respondents at every interval.

2. Retrospective Study

In a retrospective study, the researcher depends on existing information from previous systematic investigations to discover patterns leading to the study outcomes. In other words, a retrospective study looks backward. It examines exposures to suspected risk or protection factors concerning an outcome established at the start of the study.

When To Use Retrospective Study
Retrospective studies are best for research contexts where you want to quickly estimate an exposure's effect on an outcome. They also help you to discover preliminary measures of association in your data. Medical researchers adopt retrospective study methods when they need to research rare conditions.

Advantages of Retrospective Study
• Retrospective studies happen at a relatively smaller scale and do not require much time to complete.
• It helps you to study rare outcomes when prospective surveys are not feasible.

Disadvantages of Retrospective Study
• It is easily affected by recall bias or misclassification bias.
• It often depends on convenience sampling, which is prone to selection bias.

3. Cohort Study

A cohort study entails collecting information from a group of people who share specific traits or have experienced a particular occurrence simultaneously. For example, a researcher might conduct a cohort study on a group of Black school children in the UK. During a cohort study, the researcher exposes some group members to a specific characteristic or risk factor. Then, she records the outcome of this exposure and its impact on the exposed variables.

When To Use Cohort Study
You should conduct a cohort study if you're looking to establish a causal relationship within your data sets. For example, in medical research, cohort studies investigate the causes of disease and establish links between risk factors and effects.

Advantages of Cohort Studies
• It allows you to study multiple outcomes that can be associated with one risk factor.
• Cohort studies are designed to help you measure all variables of interest.

Disadvantages of Cohort Studies
• Cohort studies are expensive to conduct.
• Throughout the process, the researcher has less control over variables.

11. Explain the Three-Variable Contingency Table and Beyond.

Causal path models for three variables

The set of paths of causal influence, both direct and indirect, that we want to begin to consider are represented in figure 12.5. In this causal model we are trying to explain social trust.
The base is therefore the belief that 'You can't be too careful'. The base categories selected for the explanatory variables are having lower levels of qualifications and not being a member of a voluntary organization, to try and avoid negative paths. Each arrow linking two variables in a causal path diagram represents the direct effect of one variable upon the other, controlling all other relevant variables. The rule for identifying the relevant variables was given in chapter 11: when we are assessing the direct effect of one variable upon another, any third variable which is likely to be causally connected to both variables and prior to one of them should be controlled. Coefficient b in figure 12.5 shows the direct effect of being in a voluntary association on the belief that most people can be trusted. To find its value, we focus attention on the proportion who say that most people can be trusted, controlling for level of qualifications.

Figure 12.5 Social trust by membership of voluntary association and level of qualifications: causal path diagram.

More complex models: going beyond three variables

Clearly there are likely to be many other factors or 'variables' that will have an influence, both on volunteering behavior and on social trust. For example, in the model discussed above we have not considered gender or age, and both of these may have an impact on all of the variables in our model. As can be seen from the discussion above, it becomes quite complicated even to calculate the direct and indirect causal paths when we have a simple model with three variables. We therefore need to go beyond these paper and pencil techniques if we are going to build more complex models that aim to compare the impact of a number of different explanatory variables on an outcome variable such as social trust.
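The paper-and-pencil calculation of a direct effect described above can be sketched in Python with pandas. The counts below are invented for illustration and are not the survey figures behind figure 12.5; averaging the within-level differences without weights is one simple summary, assumed here rather than taken from the source.

```python
import pandas as pd

# Hypothetical counts, invented purely for illustration (not from the source).
counts = pd.DataFrame(
    [
        ("lower", "non-member", 100, 400),
        ("lower", "member", 60, 140),
        ("higher", "non-member", 150, 250),
        ("higher", "member", 120, 80),
    ],
    columns=["qualifications", "membership", "trusts", "careful"],
)
counts["p_trust"] = counts["trusts"] / (counts["trusts"] + counts["careful"])

# Proportion trusting, by membership, within each level of qualifications.
table = counts.pivot(index="qualifications", columns="membership", values="p_trust")

# Effect of membership with the third variable (qualifications) held constant.
d_within = table["member"] - table["non-member"]

# Assumed summary: unweighted average of the within-level differences.
direct_effect = d_within.mean()

# For comparison: the bivariate effect, ignoring qualifications altogether.
totals = counts.groupby("membership")[["trusts", "careful"]].sum()
p_biv = totals["trusts"] / (totals["trusts"] + totals["careful"])
bivariate_effect = p_biv["member"] - p_biv["non-member"]

print(table)
print("direct effect (controlling for qualifications):", round(direct_effect, 4))
print("bivariate effect:", round(bivariate_effect, 4))
```

Any gap between the two printed effects reflects the part of the bivariate association that runs through, or is shared with, level of qualifications in this made-up table.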
The following section describes the conceptual foundations that underlie models to examine the factors influencing a simple dichotomous (two-category) variable.

Logistic regression models

Regression analysis is a method for predicting the values of a continuously distributed dependent variable from an independent, or explanatory, variable. The principles behind logistic regression are very similar, and the approach to building models and interpreting the models is virtually identical. However, whereas regression (more properly termed Ordinary Least Squares regression, or OLS regression) is used when the dependent variable is continuous, a binary logistic regression model is used when the dependent variable can only take two values. In many examples this dependent variable indicates whether an event occurs or not, and logistic regression is used to model the probability that the event occurs. In the example we have been discussing above, therefore, logistic regression would be used to model the probability that an individual believes that most people can be trusted. When we are just using a single explanatory variable, such as volunteering, the logistic regression can be written, in its standard form, as

    log(p / (1 - p)) = b0 + b1 x

where p is the probability that an individual believes that most people can be trusted, x indicates whether the individual volunteers, and b0 and b1 are the coefficients to be estimated.

12. What is the problem of attrition?

A major methodological issue for longitudinal studies in comparison with cross-sectional studies is the problem of attrition, i.e. the drop-out of participants through successive waves of a prospective study.

13. What is Time Series Analysis?

Time series data includes timestamps and is often generated while monitoring an industrial process or tracking business metrics. An ordered sequence of timestamped values at equally spaced intervals is referred to as a time series. Analysis of such a time series is used in many applications such as sales forecasting, utility studies, budget analysis, economic forecasting, inventory studies, and so on.

14. Give an example for Time Series Data Set

A collection of observations made sequentially in time.
Let's take an example of time
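As a hedged illustration of a small time series data set, the sketch below uses pandas. The dates, the values, and the name `sales` are hypothetical; it shows time-based indexing and resampling, two of the operations listed in the unit outline.

```python
import pandas as pd

# Hypothetical daily readings (values invented for illustration).
index = pd.date_range("2023-01-01", periods=10, freq="D")
series = pd.Series(
    [10, 12, 11, 13, 15, 14, 16, 18, 17, 19], index=index, name="sales"
)

# Time-based indexing: select a date range by label (both ends inclusive).
first_week = series.loc["2023-01-01":"2023-01-07"]

# Resampling: aggregate the daily values into weekly totals.
weekly = series.resample("W").sum()

print(first_week)
print(weekly)
```

Because the index is a `DatetimeIndex` at equally spaced daily intervals, the series matches the definition of a time series given above, and label-based date slicing works without any extra parsing.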
