Code File Analysis

bussat-claire

STAGE 1
1. Exploratory data analysis, preprocessing and cleaning

<class 'pandas.core.frame.DataFrame'>
Int64Index: 11188 entries, 2 to 11095
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   company     11188 non-null  object
 1   content     11187 non-null  object
 2   datatype    11006 non-null  object
 3   date        11188 non-null  object
 4   domain      11096 non-null  object
 5   esg_topics  11188 non-null  object
 6   internal    11188 non-null  int64
 7   symbol      11187 non-null  object
 8   title       11188 non-null  object
 9   url         11096 non-null  object
dtypes: int64(1), object(9)
memory usage: 961.5+ KB
None
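The overview above can be reproduced with a short load step; a minimal sketch, assuming the raw data sits in a CSV file named esg_documents.csv (the actual file name is not given here):

import pandas as pd

df = pd.read_csv("esg_documents.csv")
print(df.info())  # df.info() prints the overview and returns None, hence the trailing 'None'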

2. Compare the number of ESG documents per company.


We also compute the average length of the content per document.
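Both comparisons can be done with simple groupbys; a minimal sketch, assuming the dataframe is named df as above:

docs_per_company = df["company"].value_counts()     # ESG documents per company
df["content_length"] = df["content"].str.len()      # content length in characters
avg_length = df.groupby("company")["content_length"].mean()
print(docs_per_company)
print(avg_length.sort_values(ascending=False))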

3. Cleaning
We can see that we have a lot of numbers, dates, symbols, punctuation, and website addresses. We are going to remove all of these and lemmatize our data to make it ready for analysis.

There are still a lot of stopwords and single letters lost in the text, which we are also going to remove while tokenizing the data at the same time.
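A possible shape of this cleaning step with NLTK; the regex patterns and column names are assumptions, not the notebook's exact code:

import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def clean_and_tokenize(text):
    text = re.sub(r"http\S+|www\.\S+", " ", text)   # drop website addresses
    text = re.sub(r"[^a-z\s]", " ", text.lower())   # drop numbers, dates, symbols, punctuation
    tokens = nltk.word_tokenize(text)
    # remove stopwords and stray single letters, lemmatize the rest
    return [lemmatizer.lemmatize(t) for t in tokens if t not in stop_words and len(t) > 1]

df["tokens"] = df["content"].fillna("").apply(clean_and_tokenize)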

4. General Analysis
Now we can check the average length of the content by datatype.
We can see that ESG comes last in terms of average content length, while annual reports come first.

5. Visualization
Now we will visualize which words appear most frequently.
We see that some of the most frequent words are verbs or citation text like 'et al'. To obtain something more relevant and precise we will work with TF-IDF vectorization, which will also be useful for further analysis.
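A minimal TF-IDF sketch with scikit-learn; max_features and the use of the built-in English stopword list are assumptions:

from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np

vectorizer = TfidfVectorizer(max_features=5000, stop_words="english")
tfidf = vectorizer.fit_transform(df["content"].fillna(""))

# terms with the highest average TF-IDF weight across the corpus
weights = np.asarray(tfidf.mean(axis=0)).ravel()
terms = vectorizer.get_feature_names_out()
print([terms[i] for i in weights.argsort()[::-1][:20]])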

6. Time Series
To check the topic distribution over time, we will first have a look at all the topics that we have in the data.
Then we are going to convert the data in the date column into date format.
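The date conversion in minimal form; errors='coerce' is an assumption to absorb malformed dates:

df["date"] = pd.to_datetime(df["date"], errors="coerce")
print(df["date"].min(), df["date"].max())  # covered timeframe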
STAGE 2
1. Data Annotation
Initialisation: !pip install transformers

2. Sample Annotation
We only tokenized words in stage 1, but now we are going to use sentiment analysis on sentences, so we tokenize our documents again, this time by sentence.

We will perform manual sentiment annotation on a sample of 500 random sentences from our dataset and extract it as a CSV.
We now have a sample of 500 manually annotated sentences and want to evaluate three LLMs on this data.
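A sketch of the sentence split and sampling step; the random seed and output file name are assumptions:

from nltk.tokenize import sent_tokenize

sentences = df["content"].fillna("").apply(sent_tokenize).explode().dropna()
sample = sentences.sample(n=500, random_state=42)
sample.to_frame(name="sentence").to_csv("annotation_sample.csv", index=False)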

3. Preprocessing
RoBERTa base for sentiment analysis
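A minimal way to run a RoBERTa-base sentiment model with transformers; the exact checkpoint is not named here, so cardiffnlp/twitter-roberta-base-sentiment is an assumption:

from transformers import pipeline

sentiment = pipeline("sentiment-analysis",
                     model="cardiffnlp/twitter-roberta-base-sentiment")
print(sentiment("The company exceeded its emission reduction targets."))
# e.g. [{'label': 'LABEL_2', 'score': ...}], where LABEL_2 = positive for this checkpoint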

Now we want to compare the results from the manually annotated data with the annotations from the model.
We see that in general both annotations have mostly neutral sentences, then positive sentences, and fewer negative sentences. However, the model produces far more neutral labels than the manual annotations, and therefore fewer positive and far fewer negative ones.

4. DistilBERT finetuned
No positive sentiment is identified in the 300 sentences used, which is not realistic.

5. Application to the entire dataset


Based on these results we will use the RoBERTa model.

STAGE 3-4
1. Split the labeled data: 70% train, 15% dev, 15% test (see the sketch after this list).
2. Train a sentiment model that gives scores from 0 (negative) to 1 (positive).
3. Compare average sentiment of internal vs. external texts for each company.
4. Sort companies by the sentiment gap.
5. Manually check if top-gap companies were involved in greenwashing.
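A sketch of step 1, using two chained train_test_split calls to obtain the 70/15/15 split; labeled_df and the random seed are placeholders:

from sklearn.model_selection import train_test_split

train, rest = train_test_split(labeled_df, test_size=0.30, random_state=42)
dev, test = train_test_split(rest, test_size=0.50, random_state=42)  # 15% / 15%
print(len(train), len(dev), len(test))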

1. Setup

2. Train-Test Split
3. Text Vectorization
- With BoW
- With TF-IDF
4. Training of Machine Learning Models
1st algorithm: SVM
2nd algorithm: Decision Tree
3rd algorithm: Naive Bayes
4th algorithm: Logistic Regression
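A sketch of training the four algorithms on the vectorized text; X_train, y_train, X_dev, y_dev are placeholders from the split and vectorization steps, and the hyperparameters are assumptions:

from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression

models = {
    "SVM": LinearSVC(),
    "Decision Tree": DecisionTreeClassifier(),
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, model.score(X_dev, y_dev))  # dev-set accuracy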

5. ML Models Evaluation


To evaluate each model's performance, there are several common metrics in use:
- Precision
- Recall
- F-score
- Accuracy
- Confusion matrix
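These metrics can all be computed with scikit-learn; y_dev and the fitted models dict are placeholders from the sketch above:

from sklearn.metrics import classification_report, confusion_matrix

y_pred = models["Logistic Regression"].predict(X_dev)
print(classification_report(y_dev, y_pred))  # precision, recall, F-score, accuracy
print(confusion_matrix(y_dev, y_pred))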

6. Test with a Pretrained Model


-Evaluation before finetuning
-Finetuning the pretrained model

7. Annotating the Full Dataset


Having trained our models and selected logistic regression as the best one, the aim is now to apply the model to the entire dataset to have it fully annotated for sentiment.
- With Logistic Regression
- With the TextBlob library (backup)
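The TextBlob backup in minimal form: polarity is a float in [-1, 1] that can be thresholded into the three classes; the thresholds are assumptions:

from textblob import TextBlob

def textblob_sentiment(sentence):
    polarity = TextBlob(sentence).sentiment.polarity
    if polarity > 0.1:
        return "positive"
    if polarity < -0.1:
        return "negative"
    return "neutral"

print(textblob_sentiment("The company significantly reduced its emissions."))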

8. Sentiment Analysis
- EDA
Participants then compare the average sentiment of internal vs. external data about a company. They sort the companies based on the difference between internal and external sentiment and do manual follow-up research to see if the companies with the biggest gap have been explicitly involved in greenwashing during the considered timeframe.
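A sketch of the gap computation, assuming a numeric 'sentiment' column from the annotation step and the 'internal' flag (1 = internal, 0 = external) from stage 1:

pivot = df.groupby(["company", "internal"])["sentiment"].mean().unstack()
pivot.columns = ["external", "internal"]      # columns 0 and 1 from the flag
pivot["gap"] = pivot["internal"] - pivot["external"]
print(pivot.sort_values("gap", ascending=False).head(10))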
- Comparison of internal vs. external data by company

- Companies with the biggest difference between internal / external data


- Follow-up on the results
Both of these companies are, based on our analysis, the top 2 with the largest differences between the weighted average sentiments of internal and external documents. We therefore went looking for scandals associated with their names and found the following: Beiersdorf AG was accused in 2002 of falsely proclaiming CO2-emission neutrality for the production of its products. This confirms the gap we found in our data between internal and external documents. Deutsche Bank AG had to let go of one of its executives in 2022 following a scandal related to allegedly sustainable funds that did not respect the promised sustainability criteria. This also shows that if a company publishes documents full of promises, the false pretenses can ultimately show up in the data.
Arian Contessotto
STAGE 1
1. Prerequisites and Load
Import Packages and Make Downloads
1.2 Load Data

2. Data Preprocessing
Data Cleaning
The data cleansing includes the following transformations:
• The columns 'domain' and 'url' are removed from the dataframe as they contain many null values and do not provide important information.
• The columns are reordered.
• The name of Munich Re is changed to Munich RE (looks nicer later on in graphical representations).
• The dataframe is sorted by the column 'company'.
• There are duplicates that need to be removed from the dataframe. 6 duplicates are full duplicates. For these, the first entry is kept
(keep='first'). There are also 600 duplicates based on the 'content' column. This is due to external reports that contain information about
several companies within one article. This means, for example, that a document with the same content is stored once for Allianz and once
for BMW. These contents would interfere with the analysis, so they must be completely removed from the dataframe (keep=False).
• There is a null value for the column 'content'. This row therefore provides no meaningful content for this project and is deleted.
• For the companies Fresenius (6) and Hannover R AG (2) only very few documents exist. Hannover R AG also has no external documents
in the dataframe. These numbers are considered as too small to lead to a meaningful analysis. Therefore, all lines concerning these two
companies are removed from the dataframe.
2.2 Text Preprocessing
The following text preprocessing steps have been considered. An explanation is given as to
why they are or are not applied:
• Language detection and removal:
• Lowercase:
• Expand contractions:
• Remove URL and email:
• Remove punctuation:
• Removal of numbers:
• Removal of emojis and emoticons:
• Spelling correction:
• Word tokenisation:
• Sentence tokenisation:
• Lemmatisation and stemming:

3. Exploratory Data Analysis


3.1.1 Internal / External Reports
3.1.2 Number of Reports by Company (The most reports are available for Adidas AG. The fewest reports are available for Munich RE.)
3.1.3 Number of Reports by Industry (Most reports come from the pharmaceutical and automotive industries.)
3.1.4 Number of Reports by Datatype (Business, general and tech reports.)
3.1.5 Number of ESG Topics (356) (Social and environment.)

3.2 Exploratory Text Analysis (In a first step, two basic features are calculated: length of content and polarity.)
3.2.1 Wordclouds (In order to get a feeling of which terms can be significant in which industry, wordclouds are created for all industries.)
3.2.2 Length of Reports (Most reports are between 400 and 499 words long. External reports tend to be short, internal reports long.)
3.2.3 N-Gramming
3.2.4 Polarity

4. TF-IDF Analysis (indicates how important a word is in a document or corpus of documents)

5. Time Series Analysis (comparing the occurrence of ESG issues over time)

STAGE 2
1. Manual Text Annotation
(For manual annotation, a classical sentiment classification approach was applied. That is, sentences were primarily classified according to negative (level = 0, label = negative), neutral (level = 0.5, label = neutral), and positive (level = 1, label = positive) meaning. For example:
• The strategy of the last years was very successful => Positive.
• The strategy of the last years was alright => Neutral.
• The strategy of the last years turned out to be disadvantageous for the company => Negative.
Where possible and useful, an attempt was made to classify sentiment in relation to ESG topics. However, this was often not possible at sentence level outside context, so a classical sentiment classification approach was considered the best approach. Classification in terms of greenwashing was also deemed impractical, as no information is available on this. Thus, it cannot be seriously assessed whether positive or negative sentences in relation to ESG topics are true or whether greenwashing is present.)

2. Application of Pre-Trained LLMs for Annotation


(Five different LLMs are applied to the annotated dataset. Four of them are applied with a zero-shot strategy. For one model, a few-shot prompting strategy was used.)
2.1 Zero-Shot-Classification
The following models were used for the zero-shot classification:
• distilbert-base-uncased-finetuned-sst-2-english:
• siebert/sentiment-roberta-large-english:
• ahmedrachid/FinancialBERT-Sentiment-Analysis:
• facebook/bart-large-mnli
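For facebook/bart-large-mnli, the zero-shot classification pipeline of transformers can be used directly; the other three checkpoints are ordinary sentiment pipelines applied without finetuning. A minimal sketch:

from transformers import pipeline

zero_shot = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = zero_shot("The strategy of the last years was very successful",
                   candidate_labels=["negative", "neutral", "positive"])
print(result["labels"][0], result["scores"][0])  # top label and its score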

2.2 Few-Shot-Classification with GPT-3


(For the few-shot classification, the GPT-3 language model was chosen because it is a question-answering model that allows the input of contextual information and can be used for sentiment classification, among other things.)
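A sketch of such a few-shot prompt against the legacy GPT-3 Completions API (openai<1.0); the model name, prompt wording and example sentences are assumptions:

import openai  # legacy SDK, openai<1.0

prompt = (
    "Classify each sentence as negative, neutral or positive.\n"
    "Sentence: The strategy of the last years was very successful. Sentiment: positive\n"
    "Sentence: The strategy of the last years was alright. Sentiment: neutral\n"
    "Sentence: The strategy turned out to be disadvantageous. Sentiment: negative\n"
    "Sentence: Emission targets were missed again this year. Sentiment:"
)
response = openai.Completion.create(model="text-davinci-003", prompt=prompt,
                                    max_tokens=2, temperature=0)
print(response["choices"][0]["text"].strip())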
3. Comparison and Evaluation of LLMs
(The evaluation of the different LLMs is done by comparing the distributions of the sentiment levels (0, 0.5 and 1). Furthermore, the sum of the absolute deviations between the LLM annotations and the manual annotations is computed and compared.)
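The deviation measure in minimal form, assuming a dataframe eval_df with columns 'manual' and 'model' holding the levels 0, 0.5 and 1:

total_abs_dev = (eval_df["manual"] - eval_df["model"]).abs().sum()
print(total_abs_dev)  # lower = closer to the manual annotation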
Manual annotation had mostly positive sentences.
BERT, RoBERTa, and BartLarge also predicted mostly positive sentiments.
FinancialBERT and GPT-3 predicted mostly neutral sentiments.
BERT and GPT-3 gave the most reasonable sentiment distributions (3 examples each).
BERT’s score range (0 to 1) is similar to manual annotation.
GPT-3 gave the most balanced sentiment distribution overall.
GPT-3 (with 3 example sentences) was the best performing model.
But GPT-3 is not free and using it fully would be expensive.

So, BERT (DistilBERT version) was chosen instead, as the second-best and free option.

5. Dataset Annotation
(Finally, each sentence token in the entire dataset is annotated using the BERT model as
described above.)

STAGE 3
1. Import Packages & Downloads

2. Model Finetuning
The evaluation for the model is based on the following conceptual approach:
1. Select multiple pretrained (Huggingface) models, based on previous stages
2. Train the selected models on a subset of the single sentences to keep the training time short
3. Compare the training outcomes of the different models on the subset and select the best
model

2.1 Finetune Model 1: distilbert-base-uncased
2.2 Finetune Model 2: roberta-base
2.3 Finetune Model 3: xlnet-base-cased
2.4 Finetune Model 4: flan-t5-base (not working correctly)
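A minimal finetuning sketch for the first candidate using the Trainer API; the dataset objects, column names and hyperparameters are assumptions. num_labels=1 yields a regression head, matching the 0-to-1 sentiment scores and the MSE/MAE/R2 evaluation below:

from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=1)

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, padding="max_length")

# hypothetical Hugging Face Datasets with a float 'label' column in [0, 1]
train_ds = train_dataset.map(tokenize, batched=True)
eval_ds = dev_dataset.map(tokenize, batched=True)

args = TrainingArguments(output_dir="out", num_train_epochs=3,
                         per_device_train_batch_size=16)
trainer = Trainer(model=model, args=args,
                  train_dataset=train_ds, eval_dataset=eval_ds)
trainer.train()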

3. Model Evaluation
3.1 Finetuning/Training Metrics
3.2 Inference/Test Metrics
(In general, all models demonstrate poor performance according to MSE, MAE and R2 on completely unseen, new data. Surprisingly, XLNet performs best of the three models on completely new data.)
(According to the metrics from the finetuning, we expect the best results from a RoBERTa model, even though the model did not show good inference performance. Therefore, RoBERTa will be finetuned on the complete sentence dataset.)
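The test metrics named above, computed with scikit-learn; y_test and preds are placeholders for the held-out labels and model predictions:

from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

print("MSE:", mean_squared_error(y_test, preds))
print("MAE:", mean_absolute_error(y_test, preds))
print("R2:", r2_score(y_test, preds))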

4. Full Training of selected Model

5. Evaluation of fully trained Model & Sentiment Prediction

6. Compare internal vs. external


(The finetuned model is now used to predict the sentiment of all sentences in the documents.)
(We have run the full RoBERTa training with two different stage 2 outputs, i.e., different class/sentiment distributions:
• the initial full training with a quite imbalanced training dataset,
• a second full training with a more balanced training dataset.
Both datasets were slightly adjusted in their discrete class distributions before training. Both finetuned RoBERTa models were used to perform a sentiment analysis on all sentences.)

8.1 Comparison of internal/external Sentiments on Company Level

(Finally, we compare the internal and external sentiment scores on a company level. We
display the results for both trained classifiers.
Classifier 1 is the finetuned RoBERTa model on dataset 1 (imbalanced dataset). Classifier 2
is the finetuned RoBERTa model on dataset 2 (more balanced dataset).)

(We can conclude from the sentiment analysis that neither classifier 1 nor classifier 2 detected any significant greenwashing patterns. Only for the company Qiagen is there possibly greenwashing, based on the result of classifier 2. However, since the classifiers achieved rather poor results in the model evaluation, this analysis should be treated with caution.)

STAGE-4
Alignment with Sustainable Development Goals

1. Build Embeddings

GPU NEEDED

2. SDG Alignment of DAX Companies

(We model SDG alignment as the similarity between the company-related texts and the SDG descriptions. In this section, we first define the similarity function using standard cosine similarity. We then perform some alignment analysis, including visualizations and interpretations. All analyses are executed at the company, sector and industry level.)
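A sketch of the alignment scoring with sentence-transformers; the checkpoint all-MiniLM-L6-v2, the SDG snippet texts and the company_texts list are assumptions:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sdg_texts = {"SDG 7": "Affordable and clean energy for all ...",
             "SDG 13": "Urgent action to combat climate change ..."}  # hypothetical snippets
sdg_emb = model.encode(list(sdg_texts.values()), convert_to_tensor=True)

# average the embeddings of all texts about one company, then score per SDG
company_emb = model.encode(company_texts, convert_to_tensor=True).mean(dim=0)
scores = util.cos_sim(company_emb, sdg_emb)[0]  # cosine similarity per SDG
for name, score in zip(sdg_texts, scores):
    print(name, float(score))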

3.1 Most Relevant SDGs for DAX Companies - Overview

3.2 Most Relevant SDGs for DAX Companies - Company Level

(In this analysis, we focus on the most important SDGs at the company level. First, we take a
closer look at a specific company defined by the variable COMPANY. We find the 'internal'
and 'external' embeddings for this company, average them, and measure their similarity to
each of the SDGs. We then aggregate and summarize the results for all companies by
displaying heatmaps.)
3.3 Most Relevant SDGs for DAX Companies - Sector Level

3.4 Most Relevant SDGs for DAX Companies - Industry Level
