HOME EXAMINATION
BAN432
Autumn, 2022
Start: 14 Nov, 09:00
End: 21 Nov, 14:00
THE HOME EXAMINATION SHOULD BE SUBMITTED IN WISEFLOW
You can find information on how to submit your paper here:
https://www.nhh.no/en/for-students/examinations/home-exams-and-assignments/
Your candidate number will be announced on StudentWeb. The candidate number
should be noted on all pages (not your name or student number). In case of group
examinations, the candidate numbers of all group members should be noted.
SUPPLEMENTARY REGULATIONS FOR HOME EXAMINATIONS
You can find supplementary regulations under the headline “Regulations”:
https://www.nhh.no/en/for-students/regulations/
Find more information in chapter 4.0 of the Supplementary provisions to the
regulations for full-time study programmes
Number of pages, including front page: 4
Number of attachments: 3 files:
(1) WSJdata.zip
(2) earningsCallsTranscripts.RData
(3) dailyReturn.csv
                           BAN432, fall 2022 – Final Project
Formalities
This final project will be handed out on 14 Nov, 2022 at 9:00 and has to be submitted on Wiseflow no later
than 21 Nov, 2022 at 14:00.
In addition to the required hand-in files, you will present your findings on 24 or 25 Nov. Presentations will be
strictly 5 minutes long, followed by a 15-minute question and answer session. If your group has not yet done so,
you have to sign up on Canvas for individual presentation slots. All members of the group have to sign up
for the same time slot, and all have to be present in person. This exam is group-based, and all group
members will receive the same grade for the report. However, the grade for the presentation will be given
individually.
Please note that all group members have to contribute equally to the exam. This implies that we expect all
group members in the oral exam to be capable of answering questions about the data, the model, and the interpretation.
Please submit the following three documents:
  • Report (.pdf) In the report, you should present, analyze, and interpret your results. You should
    provide numeric and written answers to all questions asked in the exam. Do not discuss your coding in
    the pdf as this should be found in the R-file. Please keep your answers short and precise. Only focus on
    questions specifically asked in the outline of the project. If you want, you can write this document in R
    Markdown, but make sure to submit it as a pdf, and to submit an .R coding file in addition.
  • Coding file (.R) Please describe your general coding approach for each task. You do not need to
    explain the functions used.
  • Presentation (.pdf) This is the presentation that you will hold on 24 or 25 Nov. You are not allowed to
    change the slides between your submission and the actual presentation. Make sure that you use your 5
    minutes of presentation time wisely. It is important that you provide an economic rationale for the choices you
    made during this project.
Your task
Goal: construct a sentiment dictionary targeted at newspaper articles
In this exam, you are supposed to construct a sentiment dictionary using newspaper articles. You will use
returns as signals of whether the market interpreted the news as positive or negative. This means you are not
allowed to pick terms solely based on human judgement. You are free to choose how to develop the sentiment
dictionary; however, you are given several clear tasks on what to investigate with the final sentiment measure.
Data provided
  • Wall Street Journal articles
  • Earnings calls
  • Return data
Task 1 - Explorative analysis:
Try to address the following questions:
  • What determines newspaper coverage regarding a given firm? Please consider daily returns and stock
    market volume as potential explanatory variables.
  • When, relative to the return reaction, do newspapers cover the news?
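One possible way to look at the timing question is an event-time profile: the average absolute return on the days surrounding each article. The sketch below is in Python purely for illustration (the submitted code file must still be an .R file). The function name `event_time_profile`, the dictionary-based data layout, and the toy numbers are assumptions made for this example and are not part of the provided data.

```python
from collections import defaultdict
from statistics import mean


def event_time_profile(returns, article_days, window=2):
    """Average absolute return at each offset around article days.

    returns: dict mapping firm -> list of daily returns, ordered by trading day
    article_days: dict mapping firm -> list of indices into that firm's list
    """
    by_offset = defaultdict(list)
    for firm, days in article_days.items():
        series = returns[firm]
        for day in days:
            for offset in range(-window, window + 1):
                idx = day + offset
                if 0 <= idx < len(series):  # skip offsets outside the sample
                    by_offset[offset].append(abs(series[idx]))
    return {k: mean(v) for k, v in sorted(by_offset.items())}


# Toy data (hypothetical): one firm, a large move on day 2, an article on day 3.
toy_returns = {"FIRM_A": [0.001, -0.002, 0.050, 0.004, -0.001, 0.002]}
toy_articles = {"FIRM_A": [3]}
profile = event_time_profile(toy_returns, toy_articles, window=2)
peak_offset = max(profile, key=profile.get)
print(profile, peak_offset)
```

In this toy example the largest absolute return sits one day before the article. On real data, where the peak falls is an empirical question, and the same idea can be applied to volume.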
Task 2 - Construct a sentiment dictionary
Construct a sentiment measure. Randomly select 50% of the firms (Group A) and continue only with this
sub-sample. Identify words that capture sentiment in a newspaper context. Use market returns to decide
which words are positive or negative. Use your insight from Task 1 to determine the right timing for measuring
returns.
  • Note that MNIR (multinomial inverse regression) is one tool to use for this task. However, it is not the only
    tool. Only use tools that you feel comfortable discussing in the exam. Other, potentially simpler, tools can
    work equally well.
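As a hedged illustration of one of the simpler alternatives (not MNIR), the sketch below splits firms reproducibly into two groups and scores each word by the share of its documents that coincided with a positive return. It is written in Python for illustration only; the submission itself must be in R. The names `split_firms` and `score_words`, the seed, the `min_docs` threshold, and all toy data are assumptions for this example.

```python
import random
from collections import Counter


def split_firms(firms, seed=432):
    """Randomly assign half of the firms to Group A and the rest to Group B."""
    shuffled = sorted(firms)  # sort first so the split is reproducible
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return set(shuffled[:half]), set(shuffled[half:])


def score_words(docs, min_docs=2):
    """Score each word by the share of its documents with a positive return.

    docs: list of (tokens, return) pairs; a document counts a word at most once.
    Words appearing in fewer than min_docs documents are dropped.
    """
    pos, total = Counter(), Counter()
    for tokens, ret in docs:
        for word in set(tokens):
            total[word] += 1
            if ret > 0:
                pos[word] += 1
    return {w: pos[w] / n for w, n in total.items() if n >= min_docs}


group_a, group_b = split_firms(["AAA", "BBB", "CCC", "DDD"])

# Toy documents (hypothetical tokens and returns).
toy_docs = [
    (["profit", "surge", "quarter"], 0.03),
    (["profit", "record", "quarter"], 0.02),
    (["lawsuit", "loss", "quarter"], -0.04),
    (["lawsuit", "probe"], -0.01),
]
scores = score_words(toy_docs)
positive = {w for w, s in scores.items() if s == 1.0}
negative = {w for w, s in scores.items() if s == 0.0}
print(positive, negative)
```

The cut-offs used here (share of exactly 1 or 0, at least two documents) only make sense for the toy data; on a real corpus the thresholds are among the parameters you would need to justify.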
Task 3 - Internal validity
Evaluate the dictionary on a set of WSJ articles.
  • Report the relationship between sentiment and return (e.g. in a regression and a plot) for the firms in Group
    A. Do the same for the remaining firms (Group B). Are there differences in performance, and why?
  • Split the firms in Group B into firms with many news articles and firms with few articles.
    Make sure that the two samples have an approximately equal number of unique firms. Run the same regression
    on both sub-groups. Are there differences, and why?
  • Change relevant parameters in the construction of your sentiment dictionary and show how performance
    behaves, both in-sample and out-of-sample.
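The evaluation steps above can be sketched as follows: a net-tone score per document, a univariate least-squares fit of return on sentiment, and a split of firms at the median article count. This is a minimal Python illustration under stated assumptions (the submission itself must be in R); the function names, the particular tone formula, and the toy inputs are hypothetical and only one of many reasonable choices.

```python
def sentiment(tokens, positive, negative):
    """Net tone: (positive hits - negative hits) / number of tokens."""
    if not tokens:
        return 0.0
    pos = sum(t in positive for t in tokens)
    neg = sum(t in negative for t in tokens)
    return (pos - neg) / len(tokens)


def ols_fit(x, y):
    """Slope and intercept of a univariate least-squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def split_by_coverage(article_counts):
    """Split firms into (many articles, few articles) halves of equal size."""
    ranked = sorted(article_counts, key=article_counts.get, reverse=True)
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]


# Toy inputs (hypothetical).
tone = sentiment(["profit", "lawsuit", "profit", "quarter"], {"profit"}, {"lawsuit"})
slope, intercept = ols_fit([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
many, few = split_by_coverage({"A": 10, "B": 3, "C": 7, "D": 1})
print(tone, slope, intercept, many, few)
```

Running the same fit separately on Group A, Group B, and the two coverage sub-groups gives slopes that can be compared directly; a full analysis would also report standard errors, which this sketch omits.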
Task 4 - External validity
  • Apply your sentiment dictionary to a corpus of earnings calls. Investigate in a regression whether return
    and sentiment correlate. In addition to considering the full sample, apply the same split into Group A
    and Group B. Does the dictionary perform as well as in Task 3?
Comments and suggestions
  • Do not be discouraged if the data is fairly messy; it usually is. It might be smart to open some of the WSJ
    files in your browser (they are HTML files) to understand their structure.
  • Please be aware that we only specify the raw data but not how to structure or limit the final corpus. Use
    economic judgement regarding the trade-off between corpus size and effort. Also, we have the following words of
    advice regarding working with large amounts of data:
        – You could select a sub-sample of the data. Even if you decide to use the whole data set, we would
          recommend trying your code on a small sub-sample before you apply it to the whole data set.
        – You could merge individual articles at different frequencies.
        – We would advise you to be careful with how many documents you hold in memory.
  • Your empirical approach has to be creative and convincing. We would like to point out that just using
    an off-the-shelf model might not be the most convincing approach. Make sure that you tailor the tools
    we covered in the course to the exact research question. You might want to combine several tools into one approach.
  • Only use tools that you understand well enough that you are comfortable with receiving questions
    about them in the oral exam. Often simple approaches work almost as well as complicated ones.
  • As we have seen in several lectures, the actual pre-processing steps matter. Make sure that you are
    deliberate in your choices and that you understand the impact of each pre-processing step on
    the final outcome.
  • Both guest lectures might be relevant for the task.
  • If you have questions during the exam, do not visit us in our offices; write an email instead. If we answer
    your request, we will do so on Canvas so that the information is public and distributed fairly.
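Regarding the advice on sub-samples and on how many documents to hold in memory, one option is to stream files out of the archive one at a time instead of loading everything at once. The sketch below is in Python for illustration only (the submission itself must be in R) and assumes, hypothetically, that the archive contains files ending in `.html`; the helper names and the toy archive are invented for this example and say nothing about the actual layout of WSJdata.zip.

```python
import io
import zipfile


def iter_articles(zip_source, suffix=".html"):
    """Yield (name, text) for one archive member at a time."""
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.lower().endswith(suffix):
                with archive.open(name) as handle:
                    yield name, handle.read().decode("utf-8", errors="replace")


def sample_every(items, step):
    """Keep every step-th item, a cheap way to test code on a sub-sample."""
    for i, item in enumerate(items):
        if i % step == 0:
            yield item


# Toy archive built in memory (hypothetical file names and contents).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for i in range(4):
        archive.writestr(f"article_{i}.html", f"<html><body>Text {i}</body></html>")
    archive.writestr("readme.txt", "not an article")

buffer.seek(0)
names = [name for name, _ in sample_every(iter_articles(buffer), step=2)]
print(names)
```

Because both helpers are generators, only one document is held in memory at any point, and the sampling step can be set to 1 once the code has been checked on the smaller sample.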
Good luck!