M7 Homework

Uploaded by

ellenagere

1. Find or Build a Dataset


○​ Choose a dataset that includes a response variable and at least one
potential predictor variable.
○​ This dataset can be from a public source, your own creation, or previous
coursework.
○​ In your report, justify why a dynamic regression model is appropriate for
your case.

For my dataset, I used the "AI & Data Job Salaries and Skills Dataset
2024–2025" from Kaggle, which contains global job postings for AI-related roles.
I selected salary_usd as the response variable because the goal of this homework is
to forecast compensation trends over time, and salary is a directly measurable economic
outcome. As the predictor, I used the remote work ratio, because remote work
availability can plausibly influence salary levels. The remote work ratio is recorded for
each job posting and can be averaged by week, which makes it usable as a time-varying
predictor. A dynamic regression model is appropriate for this case because, once the
postings are aggregated weekly, both the response variable and the predictor variable
form weekly time series. Since the goal of this assignment is to forecast future salaries
from remote work availability, the model can capture how compensation and workplace
flexibility move together over time.
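The weekly aggregation described above can be sketched as follows. This is a minimal illustration on made-up postings, not the code used for the report: salary_usd and remote_ratio are the columns named above, while the posting date field, the Sunday week boundary, and the helper week_ending are assumptions introduced here.

```python
from collections import defaultdict
from datetime import date, timedelta

# Toy postings: (posting_date, salary_usd, remote_ratio). All values are made up.
postings = [
    (date(2025, 4, 21), 100000, 0),
    (date(2025, 4, 23), 120000, 100),
    (date(2025, 4, 28), 110000, 50),
    (date(2025, 5, 2), 130000, 100),
    (date(2025, 5, 4), 90000, 0),
]


def week_ending(d):
    """Return the Sunday that closes the week containing d (hypothetical helper)."""
    return d + timedelta(days=6 - d.weekday())  # Monday=0 ... Sunday=6


# Group postings by the week they fall in.
buckets = defaultdict(list)
for posted, salary, remote in postings:
    buckets[week_ending(posted)].append((salary, remote))

# Average salary and remote ratio within each week, in chronological order.
weekly = []
for week in sorted(buckets):
    rows = buckets[week]
    mean_salary = sum(s for s, _ in rows) / len(rows)
    mean_remote = sum(r for _, r in rows) / len(rows)
    weekly.append((week, mean_salary, mean_remote))

for week, mean_salary, mean_remote in weekly:
    print(week.isoformat(), mean_salary, mean_remote)
```

Each element of weekly is one observation of the weekly time series: the week-ending date, the average salary, and the average remote ratio.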

2.​ Fit a Regression Model


○​ Regress your response variable on the selected predictor(s).
○​ Include plots of the time series, fitted values, and regression coefficients (if
multiple).

A linear regression model, estimated by Ordinary Least Squares (OLS), was used
to quantify the linear relationship between the predictor and response variables over
time. With remote_ratio as the only predictor, this is a simple linear regression; it
shows how a change in remote work availability is associated with a change in weekly
average AI job salaries. OLS was selected because it provides unbiased and efficient
parameter estimates under the classical assumptions, including no autocorrelation in
the residuals, which was checked through the diagnostic testing in step 3. The same
framework extends to multiple linear regression if additional predictors are introduced,
while keeping the forecasting outcomes interpretable.
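For a single predictor, the OLS estimates have a closed form, which the sketch below implements. The function name fit_ols and the toy weekly values are assumptions for illustration only; they are not the report's data and will not reproduce its coefficients.

```python
def fit_ols(x, y):
    """Simple linear regression of y on x by ordinary least squares.

    Returns (intercept, slope, r_squared, fitted_values).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return intercept, slope, r_squared, fitted


# Made-up weekly averages: remote ratio (x) and salary in USD (y).
remote_ratio = [40.0, 45.0, 38.0, 50.0, 42.0]
salary_usd = [113000.0, 115500.0, 112000.0, 116000.0, 114500.0]

intercept, slope, r_squared, fitted = fit_ols(remote_ratio, salary_usd)
print(round(intercept, 2), round(slope, 2), round(r_squared, 4))
```

The residuals needed for the diagnostics in step 3 are the differences between the observed values and fitted.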
Observed vs Fitted Weekly AI Salaries

Regression Coefficient

Regression model results:


Intercept: 106474.88
Remote Ratio Coefficient: 180.47
R-squared: 0.0144
3.​ Check the Residuals
○​ Perform residual diagnostics (e.g., ACF plots, Ljung-Box test).
○​ Comment on whether residuals show autocorrelation or nonstationarity.

To check the residuals, I performed residual diagnostics using ACF plots, PACF
plots, and the Ljung-Box test. The Ljung-Box test at 10 lags returned a p-value of 0.700,
above the significance threshold of 0.05, so the null hypothesis of no residual
autocorrelation is not rejected. Furthermore, the ACF and PACF plots showed no spikes
beyond the confidence bounds, which supports the same conclusion.
Ljung-Box Test for Residual Autocorrelation:
lag   lb_stat    lb_pvalue
10    7.265207   0.700193
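As a rough check on how these numbers arise, the sketch below computes sample autocorrelations, the Ljung-Box statistic Q = n(n+2) Σ r_k² / (n−k), and the chi-square upper-tail probability. The function names are assumptions made here, and the closed-form tail probability is only valid for an even number of degrees of freedom (10 lags in this case). The residual series itself is not reproduced in the report, so only the p-value step is applied to the reported statistic.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def ljung_box_q(series, max_lag):
    """Ljung-Box Q statistic over lags 1 .. max_lag."""
    n = len(series)
    r = acf(series, max_lag)
    return n * (n + 2) * sum(rk ** 2 / (n - k) for k, rk in enumerate(r, start=1))


def chi2_sf_even(x, dof):
    """Upper-tail chi-square probability; closed form for even dof only."""
    if dof <= 0 or dof % 2:
        raise ValueError("closed form requires a positive even dof")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# p-value for the reported statistic: lb_stat = 7.265207 at 10 lags.
p_value = chi2_sf_even(7.265207, 10)
print(round(p_value, 4))
```

The resulting p-value agrees with the reported lb_pvalue of about 0.700, assuming the test used 10 degrees of freedom with no adjustment for estimated parameters.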

ACF Plot of Residuals:

PACF Plot of Residuals


4.​ Add ARIMA Error Structure (if needed)

No ARIMA error structure was needed because the residual diagnostics showed
no strong evidence of autocorrelation in the model residuals. The Ljung-Box test
resulted in a p-value of 0.70, and both the ACF and PACF plots showed residuals
behaving like white noise. An ARIMA error structure was therefore deemed unnecessary
for this dataset.

5.​ Forecast 3 Future Points


○​ Generate and interpret forecasts for the next 3 time points.
○​ Clearly state your assumptions (e.g., how you are forecasting future
predictor values).

Using the fitted dynamic regression model, I forecasted average AI job salaries
for the three weeks following the last entry in the dataset: May 11, May 18, and May 25,
2025. The predicted weekly salary is constant at approximately $114,088. This uniformity
arises because the model is a linear regression on remote_ratio with no ARIMA error
term, and remote_ratio is held constant over the forecast horizon, so every forecast
equals the same fitted value. No ARIMA error structure was added because the residuals
showed no autocorrelation. The forecast assumes that the predictor variable
remote_ratio remains constant, that there are no external shocks to the job market, and
that the model coefficients remain valid and stable over the forecast horizon. Given the
absence of autocorrelation in the residuals and the short time horizon, this approach
provides a reasonable forecast of salary expectations in the AI job market.
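The arithmetic behind the constant forecast can be sketched from the reported coefficients. The remote ratio held constant is not stated in the report, so the value below is backed out from the approximate $114,088 forecast and is an inference, not a reported figure. The variable names are assumptions made for this example.

```python
from datetime import date

INTERCEPT = 106474.88         # reported in the regression results
REMOTE_COEF = 180.47          # reported in the regression results
REPORTED_FORECAST = 114088.0  # approximate value quoted in the text

# Remote ratio implied by the reported forecast (derived here, not reported).
implied_remote_ratio = (REPORTED_FORECAST - INTERCEPT) / REMOTE_COEF

# Forecast dates as stated in the report.
forecast_dates = [date(2025, 5, 11), date(2025, 5, 18), date(2025, 5, 25)]

# With the predictor held constant and no ARIMA error term,
# every horizon gets the same point forecast.
forecasts = {
    d: INTERCEPT + REMOTE_COEF * implied_remote_ratio for d in forecast_dates
}

print(round(implied_remote_ratio, 2))
for d, value in forecasts.items():
    print(d.isoformat(), round(value, 2))
```

The implied remote ratio comes out near 42.2. The reported R-squared of 0.0144 means remote_ratio accounts for about 1.4% of the variance in weekly average salary, so these point forecasts are best read together with that limitation.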
3-Week Forecast of AI Job Salaries Plot
