Article
Extractive Arabic Text Summarization-Graph-Based Approach
Yazan Alaya AL-Khassawneh 1, * and Essam Said Hanandeh 2
1 Data Science and Artificial Intelligence Department, Zarqa University, Zarqa P.O. Box 13110, Jordan
2 Computer Information Systems Department, Zarqa University, Zarqa P.O. Box 13110, Jordan
* Correspondence: ykhassawneh@zu.edu.jo
Abstract: With the noteworthy expansion of textual data sources in recent years, easy, quick, and
precise text processing has become a key challenge. Automatic text summarization
is the process of condensing text documents into shorter summaries to facilitate verification of their
basic contents, which must be completed without losing vital information and features. The most
difficult information retrieval task is text summarization, particularly for Arabic. In this research,
we offer an automatic, general, and extractive Arabic single document summarizing approach with
the goal of delivering a sufficiently informative summary. The proposed model is based on a textual
graph to generate a coherent summary. Firstly, the original text is converted to a textual graph using
a novel formulation that takes into account sentence relevance, coverage, and diversity to evaluate
each sentence using a mix of statistical and semantic criteria. Next, a sub-graph is built to reduce
the size of the original text. Finally, unwanted and less weighted phrases are removed from the
summarized sentences to generate a final summary. We used Recall-Oriented Understudy for
Gisting Evaluation (ROUGE) as an evaluation metric to review our proposed technique and compare it with the
most advanced methods. Finally, a trial on the Essex Arabic Summaries Corpus (EASC) using the
ROUGE index showed promising results compared with the currently available methods.
2. Related Work
Luhn first proposed the idea of automatic text summarization in 1958, in the sense of
figuring out how words are distributed inside sentences and identifying the document’s
keywords [7]. Since then, numerous summarizing techniques have been created using vari-
ous methodologies and for various objectives. However, the majority of these approaches
can be seen as advancements over earlier strategies. In this section, we concentrate on the
Electronics 2023, 12, 437 3 of 17
studies using graph-based extraction techniques for a single document. We also explore
research that introduced Arabic text summarizing tools.
Recently, several graph-based approaches for summarizing single and multiple English
documents have been developed. Among these strategies are [8–11].
The researchers in ref. [12] used a graph-based approach to extractive summarization.
These researchers suggested a new summarization technique based on a hybrid
graph model. They proposed a hybrid similarity function
(H) that combines four different similarity measurements: cosine, Jaccard, word alignment,
and window-based similarity. The method makes use of a trainable summarizer and takes
into account a number of factors, and the effect of individual features on the summarization
task was also investigated.
In ref. [13], a graph reduction technique known as the Triangle Counting Method
is developed to select essential phrases in the text. The initial stage is to visualize a text
as a graph, where the phrases serve as the nodes, and the similarities between them
serve as the edges. Following the representation of the bit vector, the creation of triangles
comes next, and acquiring phrases based on the bit vector’s values comes last. This study
demonstrated that it is possible to change one graph into another with a significantly
smaller number of triangles. The adjacency matrix representation is simple to use and
speeds up implementation.
Two Arabic summarizing systems were created by El-Haj et al. [14,15]. The Arabic
Question-Based Single Text Summarizer System (AQBTSS) works with an Arabic doc-
ument and an Arabic query to provide a summary that is appropriate for the query of
the document. The second system, called the Arabic Concept-Based Text Summarization
System (ACBTSS), uses a set of words that reflect a certain concept as its input rather than
a user’s query. The first two phases of the two systems are the same: selecting a document
from the document collection that matches the user’s query and breaking the text up into
sentences. Both systems use the Vector Space Model (VSM) in the summarization phase,
where the weighting scheme is based on VSM and uses two measures, term frequency and
inverse document frequency. In AQBTSS, each sentence is compared to the user query
to find relevant sentences, whereas in ACBTSS, each sentence is matched against a set of
keywords that represent a given concept. A panel of 1500 users evaluated the readability of
the summaries of 251 articles produced by the two systems. The results
revealed that AQBTSS performed better than ACBTSS.
A platform for summarizing Arabic texts was proposed by [16] and includes the
following modules: tokenization, morphological analyzer, parser, relevant sentence extrac-
tion, and extract revision. A variety of texts (short, average, and long) were used in the
evaluation of this platform in terms of execution time, and it was found that the run
time of the platform's modules for a given text was influenced by its size, i.e., the shorter
the text, the lower the run time.
The Sakhr Summarizer is an Arabic summarization tool that extracts the key phrases
from the source text and summarizes them [17]. The Summarization engine makes use of
the Sakhr Corrector to automatically correct the input Arabic text for frequent grammatical
errors and the Keywords Extractor to find a prioritized list of keywords to accurately
identify the essential phrases.
Authors in ref. [18] suggested a different summary system—the Arabic Intelligent
Summarizer. The main machine-supervised learning technique is the foundation of this
system. There are two phases to the system. The learning phase, which uses SVMs, is
the first and instructs the algorithm on how to extract summary sentences. The users can
summarize a new document during the use phase—the second stage.
P.M. Sabuna and D.B. Setyohadi [19] describe the development of an abstractive
automatic summarization system for online discussion papers using the vector space
concept. The three modules that make up this system are point curation, point extraction,
and summary creation. By dependency parsing and examining the grammatical structure,
points are extracted. After choosing the topic points and the candidate points for the
summary, shorter points are derived from the less direct ones.
An extractive summary technique for Arabic texts has been developed in [20]. This
approach combines rhetorical structure theory (RST), one of the most popular theories in
natural language processing, with semantic data taken from the Arabic word net. The
quality of Arabic text summarization is improved using this method, which combines
linguistic selection methods with sentence feature selection methods. In order to determine
how closely related sentences are to the main title and subheadings, the suggested RST-
based method first constructs an initial summary and then uses the score of each sentence
in that summary.
The automatic Indonesian text summarizing system described in [21] generates sum-
maries by combining sentence scores and decision trees. The C4.5 algorithm is employed
in this system to pick the sentences that are of interest. After that, each sentence is scored
using a sentence-scoring approach that takes into account eight variables, including TF-IDF,
uppercase letters, proper nouns, cue phrases, numerical data, sentence length, sentence
position, and title similarity. Following the creation of a decision tree model using the
training data, the important sentences are identified, and the summary is prepared using
the model’s rules. A combined statistical-linguistic approach-based extractive summary
technique for Indian literature has been described in [22]. Preprocessing, sentence feature
extraction, and genetic algorithm (GA) for ranking sentences based on optimum feature
weights are the three primary components of this summarization method. A sentence fea-
ture vector serves as a representation for each sentence. The statistical-linguistic properties
of each sentence are analyzed, and a score is generated based on the importance of the
features in that sentence. The sentences are then ranked based on the findings. Sentence
characteristics accept values in the range of 0 to 1. After a predetermined number of gener-
ations in the GA, the fittest chromosome is chosen, and the Euclidean distance formula is
used to calculate the distance between each sentence score and the fittest chromosome. The
sentences are then arranged according to increasing distance. Finally, a summary is created
by selecting a specific number of the document’s top-ranked sentences, depending on the
level of summarization that is desired.
Authors in ref. [23] suggested a multi-morphological analysis-based extractive graph-
based approach for summarizing Arabic text. The original text was converted into a graph
using this suggested strategy. The sentences were represented as vertices, and the linkages
between the sentences were determined using the mutual nouns between the connected
phrases and the cosine similarity between the sentences based on Term Frequency-Inverse
Document Frequency (TF-IDF).
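This kind of graph construction can be sketched as follows. Note that the tokenization, the 0.1 edge threshold, and all function names here are our own illustrative assumptions, not details taken from ref. [23]:

```python
import math
from collections import Counter

def tfidf_vectors(sentences):
    """Build TF-IDF vectors for tokenized sentences (lists of terms)."""
    n = len(sentences)
    df = Counter(t for s in sentences for t in set(s))  # document frequency
    vecs = []
    for s in sentences:
        tf = Counter(s)
        vecs.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def build_graph(sentences, threshold=0.1):
    """Connect sentence pairs whose TF-IDF cosine exceeds the threshold."""
    vecs = tfidf_vectors(sentences)
    return [(i, j, cosine(vecs[i], vecs[j]))
            for i in range(len(vecs)) for j in range(i + 1, len(vecs))
            if cosine(vecs[i], vecs[j]) > threshold]
```

A fuller implementation would also add edges based on mutual nouns between connected sentences, which requires a part-of-speech tagger and is omitted here.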
The extractive Arabic text summarizing approach proposed by [24] employed the
Firefly algorithm. The proposed approach comprised four basic steps: (1) text preprocessing
techniques such as segmentation, tokenization, stop word elimination, and stemming;
(2) using a phrase’s structural features, such as the title similarity, sentence length, sentence
placement, and term TF-IDF weight, to calculate similarity scores; (3) creating a graph
of potential answers, where the vertices are the original document’s sentences, and the
edges are how close they are to one another; (4) choosing which sentences should be in
the summary using the Firefly algorithm. The suggested method was assessed using the
Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics on the EASC corpus.
The QUESTS system, which was suggested in [25,26], is an integrated query system
for producing extractive summaries from a collection of documents. In order to create
many subgraphs from the main graph, this system first creates an integrated graph of the
relationships between the sentences of all the input documents. Sentences that are more
closely related to the topic at hand and to one another make up these subgraphs. The
highest-scoring subgraph that is most pertinent to the query is chosen for inclusion in
the summary after the algorithm ranks the subgraphs using a scoring model.
As indicated in the third phase, nodes are ranked according to their final scores after
specifying the final scores for each node. The best sentences are then chosen to participate
in the final summary. Two of the most significant graph-based algorithms are
TextRank [27] and LexRank [28]. We briefly review each of these algorithms below.
The candidate sentences that might be included in the summary are all represented
in a graph by the multi-document summarization system called LexRank. If the similarity
between two sentences exceeds a certain level, they are connected in this graph
representation; the similarity is computed using the cosine function. After constructing
the network, the system conducts a random walk on the graph to identify the most
crucial sentences.
All graphs that are derived from natural language texts are ranked using the graph-
based model TextRank.
TextRank is a single document summarizing system that derives from the Google page
ranking [27,29] paradigm. Keywords and sentences are extracted using TextRank. To extract
sentences, a completely connected, undirected graph is used. An individual sentence is
regarded as a vertex (or node) in the graph. A similarity relation, calculated as a
function of shared concepts, is used to draw an edge between two sentences. Additionally, each
edge has a weight that reflects how significant the relationship is. The best sentences are
chosen after ranking each sentence according to its score.
For the text summarization task, given a document d, let G = (V, E) be an undirected graph
that represents d with the set of nodes V and the set of edges E. The nodes in this model
stand for the sentences in d. Every edge Eij has a weight
Wij that denotes how similar the nodes (sentences) Vi and Vj are to one another. Two
sentences are related if their similarity satisfies a threshold t.
Based on the relationships with other connected nodes, each node in the V graph is also
given a salient score. This score, which was determined using a ranking system, shows
how much information is contained in a sentence.
Figure 1. Overview of the proposed approach for triangle-graph-based summarization.
5.1. Data Pre-Processing
It is challenging to test and evaluate an automatic text summarization system since there
is no perfect summary for almost any specific document or set of related texts. Additionally,
as researchers typically gather their own data, the lack of standard Arabic datasets makes
the evaluation process more difficult and possibly subjective in some circumstances [30].
As far as we are aware, there are four Arabic extractive single-document datasets that are
available to the public. Some summaries are produced automatically by translating an
English corpus into Arabic using Google's translation service. When compared to human
translation, this method of dataset generation lowers the cost of creating an Arabic dataset.
However, doing so could result in a document of poor quality or have an impact on
semantics. Authors in [31] previously built Arabic summarizers to automatically produce
extractive summaries, which might be biased toward those summarizers. Finally, the
dataset in [15] was created using human-generated extractive summaries.

In order to test and assess the suggested strategy, the Essex Arabic Summaries Corpus
(EASC) [15] has been used. The EASC corpus, a published extractive summarization
corpus, was created by a team of scholars at Essex University. It has 153 articles on various
subjects that were compiled from Arabic newspapers and Wikipedia. There are five
separate reference summaries produced by five different humans for each article in the
EASC corpus. The one thing that sets this dataset apart from others is that it is the only
Arabic dataset that has been created by humans. This makes the evaluation more realistic
when compared to methods that rely on translated datasets or the output of summarizers
that have already been produced.
The first stage in practically all summarization methodologies is preprocessing. Its major
objective is to get the input text file ready for processing in subsequent phases. It primarily
creates a uniform representation of the input document.
Due to the complexity of the Arabic language, developing the NLP system is not
simple. The rich and intricate morphological and syntactic flexibility of Arabic is widely
known [32]. The preprocessing stage is essentially the same for all languages and often
entails normalization, tokenization, POS tagging, stemming/lemmatization, and stop-word
removal [33–35]. Since most texts produced in Arabic and saved in electronic form do not
have diacritical marks at first, the system deals with Arabic texts without them.
5.1.1. Tokenization
Tokenization, the first step in text preprocessing, divides input documents into units
of varying levels to make it easier to access all of the input document’s content. These
units can be tokens, sentences, paragraphs, numerals, or any other suitable unit [36]. The
proposed tokenization is a punctuation-based decomposition that begins by identifying
the paragraphs the document is made up of. The
newline character (\n) serves as the paragraph divider in this scenario. Following that,
paragraphs are divided into a collection of phrases using the full stop (.), question mark (?),
and exclamation mark (!). Finally, delimiters such as white space, semicolons, commas, and
quotations are used to separate these phrases into tokens. To handle the aforementioned
series of actions, we used the AraNLP tool with minimal modification [37].
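The paragraph, sentence, and token splitting described above can be sketched as follows. This is a simplified stand-in for the AraNLP tokenizer; the exact delimiter sets are our own assumptions:

```python
import re

def tokenize(document):
    """Split a document into paragraphs, sentences, and tokens.

    Paragraphs are separated by newline characters, sentences by the
    full stop (.), question mark (?), and exclamation mark (!), and
    tokens by whitespace, semicolons, commas, and quotations.
    """
    paragraphs = [p for p in document.split("\n") if p.strip()]
    result = []
    for p in paragraphs:
        sentences = [s.strip() for s in re.split(r"[.?!]", p) if s.strip()]
        result.append([re.split(r"[\s;,\"']+", s) for s in sentences])
    return result

doc = "First sentence. Second one!\nNew paragraph?"
```

A production tokenizer would also handle abbreviations and numeric expressions containing periods, which this sketch splits naively.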
5.1.2. Normalization
Some Arabic letters may take on several forms, while others may be used in place
of others because of similarities in their shapes. Writers also employ diacritical marks in
their writing. These result in a set of variations for the same term, which has an impact
on how some attributes, such as term frequency (TF), are computed. To avoid these
variations, a normalization technique is needed to harmonize the many spellings of the
same letter. The following activities are performed by the suggested normalization step
using the AraNLP tool [37]: (i) eliminating non-Arabic letters such as special symbols and
punctuation; (ii) removing diacritics; (iii) replacing the Alef variants أ, إ, and آ with ا, the
letter ى with ي, and ة with ه [38]; (iv) removing tatweel (the stretching character).
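A minimal sketch of these normalization steps follows. The Unicode code-point ranges and the order of operations are our own choices, not AraNLP's actual implementation:

```python
import re

def normalize(text):
    """Normalize Arabic text: strip diacritics and non-Arabic symbols,
    unify Alef/Ya/Ta-Marbuta variants, and remove tatweel (kashida)."""
    text = re.sub(r"[\u064B-\u0652]", "", text)             # diacritics
    text = re.sub(r"[\u0622\u0623\u0625]", "\u0627", text)  # آ/أ/إ -> ا
    text = text.replace("\u0649", "\u064A")                 # ى -> ي
    text = text.replace("\u0629", "\u0647")                 # ة -> ه
    text = text.replace("\u0640", "")                       # tatweel
    text = re.sub(r"[^\u0621-\u064A\s]", "", text)          # keep Arabic letters
    return text
```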
5.1.3. Stop Words Removal
Stop words are unimportant words that regularly appear in texts to build sentences,
such as pronouns, prepositions, conjunctions, etc. [39]. These words can be removed from
sentences without changing their main ideas since they are not informative (do not add
information). In fact, this phase is very important because several computations are based
on the frequency of the words in the sentence or document. Therefore, by eliminating stop
words, these calculations are made more pertinent and precise. Stop-words are eliminated
from the text using a variety of stop-list techniques, including the general stop-list, corpus-
based stop-list, and combined stop-list. The suggested strategy, which outperformed the
other two approaches, relies on a general stop-list created with the AraNLP tool [37,40].
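The filtering itself is straightforward; a sketch (the three-word stop-list here is purely illustrative, whereas a real system would use a general stop-list such as the one bundled with AraNLP):

```python
# a tiny illustrative stop-list: في (in), من (from), على (on)
STOP_WORDS = {"\u0641\u064A", "\u0645\u0646", "\u0639\u0644\u0649"}

def remove_stop_words(tokens):
    """Drop stop words so frequency-based features stay meaningful."""
    return [t for t in tokens if t not in STOP_WORDS]
```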
5.1.4. Stemming
Because Arabic is a highly inflectional and derivational language, words can take on a
wide variety of forms while still having the same action-related abstract meaning. Evidently,
this has an impact on a number of natural language processing techniques, including text
similarity analysis and developing bag-of-word models. Stemming, then, is the process of
deleting all or some affixes from a word, such as prefixes, infixes, and suffixes. In
other words, stemming reduces a word’s various forms and derivatives to a single, unified
form (such as a root or stem) from which all other forms can be derived. There are two
popular stemming techniques in Arabic: light stemming and morphological root-based
stemming [41]. When comparing these methods for text summarization, authors in ref. [42]
evaluated well-known Arabic stemmers, including the Khoja root stemmer. Their research
showed that root stemming outperforms light stemming for summarizing Arabic texts.
Based on those findings, we modified the Khoja root stemmer to handle the stemming
operation as a preprocessing task in the proposed study.
Sentence lengths: Sentences that are too short, such as lines containing only the date or
author, are eliminated. The normalized length of each sentence is calculated as:

$$SL(S_i) = \frac{\mathrm{CountLength}(S_{i,\,w\in\{1\ldots n\}})}{\mathrm{CountLength}(S_{j,\,w\in\{1\ldots m\}})} \qquad (2)$$

where the denominator is the length of the longest sentence S_j in the document.
Sentence positions: Sentences that appear earlier in their paragraphs are given higher
scores. Each sentence in a paragraph with n sentences is scored according to Equation (3).
Numerical data: Sentences containing numerical terms that convey significant statistical
data points in the text are candidates for the summary. The score of each sentence is
calculated as:

$$ND(S_i) = \frac{\mathrm{CountND}(S_i)}{\mathrm{CountLength}(S_{i,\,w\in\{1\ldots n\}})} \qquad (4)$$

Thematic words: Each sentence is scored by its count of thematic words, normalized by
the maximum over all sentences:

$$TW(S_i) = \frac{\mathrm{CountThematic}(S_i)}{\max(TW)} \qquad (5)$$
Sentence-to-sentence similarity: To determine commonalities between each sentence S_i
and every other sentence, token-matching algorithms are used. The similarities of all
sentences are stored in an [N][N] matrix, and the diagonal elements are set to zero because
sentences are not compared with themselves. The similarity score of each sentence is
evaluated as:

$$STS(S_i) = \frac{\sum_{k=1}^{n} \mathrm{Sim}(S_i, S_j)_k}{\max(\mathrm{Sim}(S_i, S_j)_k)} \qquad (6)$$
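The length, numerical-data, and thematic-word features (Equations (2), (4), and (5)) can be sketched as follows, assuming sentences are given as token lists; the function names are illustrative:

```python
def sentence_length_score(sentences, i):
    """SL(S_i): length of sentence i over the longest sentence (Eq. (2))."""
    return len(sentences[i]) / max(len(s) for s in sentences)

def numerical_data_score(sentence):
    """ND(S_i): proportion of numeric tokens in the sentence (Eq. (4))."""
    return sum(t.replace(".", "", 1).isdigit() for t in sentence) / len(sentence)

def thematic_word_score(sentences, thematic, i):
    """TW(S_i): thematic-word count of sentence i over the maximum (Eq. (5))."""
    counts = [sum(t in thematic for t in s) for s in sentences]
    m = max(counts)
    return counts[i] / m if m else 0.0
```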
To determine PR, a starting score of 1 is given to each node, and Equation (8) is applied
iteratively on the weighted graph G until the difference in scores between iterations for
all nodes is less than a threshold of 0.001. The salient scores of the sentences determine
the nodes’ weights. Nodes with higher scores correspond to sentences that are significant,
relevant to the document, and have strong relationships with other sentences. Each vertex
is given a score following the algorithm’s execution, and this score reflects the vertex’s
“importance” or “power” inside the graph. After that, the sentences are arranged in order
of their scores. Note that only the number of iterations necessary to reach convergence may
be impacted by the initial value choice; the final values are unaffected. Figure 2 shows a
graph representation built for a sample text; the blue lines are the edges between the
sentences of the text.
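A sketch of this weighted iterative ranking follows, assuming a symmetric sentence-similarity matrix as input. The damping factor of 0.85 is a conventional PageRank choice on our part; the text above specifies only the initial score of 1 and the 0.001 convergence threshold:

```python
def pagerank_weighted(weights, damping=0.85, tol=0.001):
    """Iteratively rank sentence nodes on a weighted graph.

    `weights` is a symmetric n x n matrix of edge weights (similarity,
    zero diagonal). Every node starts with score 1, and iteration stops
    when no score changes by more than `tol` between iterations.
    """
    n = len(weights)
    out_sum = [sum(row) or 1.0 for row in weights]  # guard isolated nodes
    scores = [1.0] * n
    while True:
        new = [(1 - damping) + damping *
               sum(weights[j][i] / out_sum[j] * scores[j] for j in range(n))
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, scores)) < tol:
            return new
        scores = new
```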
The next step is to create a list of triangles to represent the text. A procedure based
on De Morgan's laws is used to locate the triangles in the graph. Algorithm 2 shows how
this step is done.
After finding the nodes and edges representing the triangles in the main graph, we
can construct the reduced graph. Figure 3 shows the Triangle graph for the sample text
represented in Figure 2. The blue lines are the edges between the sentences of the text,
while the red lines are the edges that represent the reduced graph; only the red edges will
be used to create the summary.
In Figure 3, the red lines show the edges of the triangles from the main graph, while
the blue ones were ignored.
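The reduced graph can be obtained by keeping only the edges that close at least one triangle; a sketch, with edges represented as pairs of sentence indices (our own representation):

```python
def triangle_edges(edges):
    """Keep only edges that participate in at least one triangle.

    This yields the reduced graph (the red edges of Figure 3) from
    which the summary sentences are drawn.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # an edge (u, v) closes a triangle if u and v share a neighbor
    return {(u, v) for u, v in edges if adj[u] & adj[v]}

kept = triangle_edges([(0, 1), (1, 2), (0, 2), (2, 3)])
```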
$$\mathrm{Score}(S_i) = \sum_{k=1}^{m} \mathrm{Score}(F_k(S_i)) \qquad (7)$$

where Score(S_i) is the score of sentence S_i, Score(F_k(S_i)) is the score of feature k, and m is
the number of features used to score the sentences.
6. Experimental Results
The proposed experiment aims to produce the following outcomes: (i) assess the
proposed design of the chosen statistical and semantic features; (ii) assess the use of a
statistical summarization method on the Arabic texts; and (iii) assess the comparison of our
proposed method to other related works. As was already noted, the EASC dataset was used
for testing and assessing the suggested method. In order to calculate the precision, recall,
and F-score for each of the generated summaries for both summary methods, ROUGE-N
(i.e., ROUGE-1, ROUGE-2) was employed.
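A minimal sketch of the ROUGE-N computation over token lists follows; this illustrates the metric's n-gram-overlap definition and is not the official ROUGE toolkit:

```python
from collections import Counter

def rouge_n(candidate, reference, n=2):
    """ROUGE-N recall, precision, and F-score from n-gram overlap."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f = 2 * recall * precision / (recall + precision) if recall + precision else 0.0
    return recall, precision, f
```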
In order to produce the output summaries in score-based summarization, an input
threshold (summary ratio) needs to be modified. Finding the ideal ratio is challenging
because the corpus includes 153 documents, each of which has five human reference
summaries with a different ratio. To prevent this issue, the generated summaries are
modified using an adaptive ratio dependent on the length of the reference summary we are
comparing it to.
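One way to realize such an adaptive ratio is to cut the ranked candidate list at the token length of the reference summary being compared against; a sketch (the greedy cut-off is our own assumption):

```python
def trim_to_reference(ranked_sentences, reference_len):
    """Cut the ranked sentence list (token lists, best first) so the
    generated summary length matches the reference summary length."""
    summary, total = [], 0
    for s in ranked_sentences:
        if total + len(s) > reference_len:
            break
        summary.append(s)
        total += len(s)
    return summary
```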
A majority summary, or so-called gold-standard summary, was created by a voting
process among the five references to improve outcomes and avoid the problem of subjectiv-
ity. As a result, the statement was included in the gold-standard reference summary if it
appears in three or more of the five references [45].
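The majority-vote construction can be sketched as follows, assuming each reference summary is given as a set of sentence identifiers:

```python
from collections import Counter

def gold_standard(references):
    """Majority-vote gold summary: keep sentences appearing in at
    least 3 of the 5 human reference summaries."""
    votes = Counter(s for ref in references for s in set(ref))
    return {s for s, c in votes.items() if c >= 3}

refs = [{"s1", "s2"}, {"s1"}, {"s1", "s3"}, {"s2", "s3"}, {"s1"}]
```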
The results of the suggested approach are contrasted with those of other systems and
methods for relevant Arabic summarization in this section. With a brief description of
the summary type, summarization method, and features employed, Table 1 presents ten
similar summarization methods/systems. These systems were assessed using the Essex
Arabic corpus and the “gold-standard” summary, which stipulates that no more than 50%
of the original document’s words should be used in the summary.
Since ROUGE-N (N = 2) performs better for the evaluation of single-document
summarization, it was employed in the evaluation process as an automatic evaluation
metric for recall, precision, and F-score. Based on their published results in terms of recall,
precision, and F-score, Figure 4 compares the performance results of the proposed
summarization method to those of the related summarization methods/systems. The
suggested score-based strategy exceeds the competition in terms of recall, precision, and
F-score, with average improvements of 23%, 23%, and 24%, respectively (Figure 4). This is
due to the strength of the chosen features and the originality of their composition,
in addition to the use of appropriate and modern Arabic NLP techniques.
In this approach, we used six different features for each sentence, and we used cosine similarity
measurement to find the relations between the sentences (graph nodes) to represent the
graph; then we pruned the graph by finding the triangle sub-graph, and the sentences
forming this sub-graph were used to produce the summary. The scoring process of the sentences
was completed based on the values of the selected features. Based on the experimental
results of the proposed method, we can say that if we can identify significant similarity
measurements for representing relations between sentences and identify significant features
for text summarization, it can produce a good summary.
8. Conclusions
Because of the Internet’s incredible rise in data, it is more important than ever to have
an automated summarizing system that can reduce user time consumption and information
overload. Key sentences from the document’s major concepts should be retained in a decent
summary, and repetition should be minimized to create a summary that is information-
rich. Despite current efforts to develop text summarization techniques and formulate
representative characteristics, these formulations are still unable to adequately capture the
relevance, coverage, and diversity of a phrase. The method for extracting single document
summarization presented in this paper is general.
The score-based method makes use of a set of attributes that were selected and devel-
oped after a thorough examination of summarization techniques, Arabic text characteristics,
and writing styles. These characteristics range from statistics to semantically-based ones.
While keeping in mind that these sentences are varied and cover the entire notions of the
document, the adopted formulations aid in determining the value of sentences, which
is vital to the process of deciding whether to include them in the summary. We test the
suggested strategy using the EASC dataset. The system achieved an F-score of 0.617 for the
score-based method using ROUGE-2 as a performance metric.
The findings obtained demonstrate that our method outperforms the most cutting-
edge score-based algorithms, particularly in terms of precision. This is a result of the
proposed characteristics’ informative formulation, which aids in highlighting the signifi-
cance of the statement.
Author Contributions: Conceptualization, Y.A.A.-K. and E.S.H.; Methodology, Y.A.A.-K. and E.S.H.;
Software, Y.A.A.-K. and E.S.H.; Validation, Y.A.A.-K. and E.S.H.; Formal analysis, Y.A.A.-K. and
E.S.H.; Investigation, Y.A.A.-K. and E.S.H.; Resources, Y.A.A.-K. and E.S.H.; Data curation, Y.A.A.-K.
and E.S.H.; Writing—original draft, Y.A.A.-K. and E.S.H.; Writing—review & editing, Y.A.A.-K. and
E.S.H.; Visualization, Y.A.A.-K. and E.S.H.; Supervision, Y.A.A.-K. and E.S.H.; Project administration,
Y.A.A.-K. and E.S.H.; Funding acquisition, Y.A.A.-K. and E.S.H. All authors have read and agreed to
the published version of the manuscript.
Funding: This research was funded by the Deanship of Research in Zarqa University/Jordan. Grant
Number: 7252.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Slamet, C.; Atmadja, A.R.; Maylawati, D.S.; Lestari, R.S.; Darmalaksana, W.; Ramdhani, M.A. Automated text summarization for
Indonesian article using vector space model. IOP Conf. Ser. Mater. Sci. Eng. 2018, 288, 012037. [CrossRef]
2. Hosseinikhah, T.; Ahmadi, A.; Mohebi, A. A new Persian text summarization approach based on natural language processing
and graph similarity. Iran. J. Inf. Process. Manag. 2018, 33, 885–914.
3. Ozsoy, M.G.; Alpaslan, F.N.; Cicekli, I. Text summarization using latent semantic analysis. J. Inf. Sci. 2011, 37, 405–417. [CrossRef]
4. El-Kassas, W.S.; Salama, C.R.; Rafea, A.A.; Mohamed, H.K. Automatic text summarization: A comprehensive survey. Expert Syst.
Appl. 2021, 165, 113679. [CrossRef]
5. Talibali, L.; Riahi, N. An overview of automatic text summarization techniques. Int. J. Eng. Res. Technol. 2015, 28, 75–84.
6. Thakkar, K.; Dharaskar, R.; Chandak, M. Graph-Based Algorithms for Text Summarization. In Proceedings of the 2010 3rd
International Conference on Emerging Trends in Engineering and Technology (ICETET), Goa, India, 19–21 November 2010; pp.
516–519.
7. Luhn, H.P. The automatic creation of literature abstracts. IBM J. Res. Dev. 1958, 2, 159–165. [CrossRef]
8. AL-Khassawneh, Y.A. The use of Semantic Role Labelling with Triangle-Graph Based Text Summarization. Int. J. Emerg. Trends
Eng. Res. 2020, 8, 1162–1169. [CrossRef]
9. Belwal, R.C.; Rai, S.; Gupta, A. A new graph-based extractive text summarization using keywords or topic modeling. J. Ambient.
Intell. Humaniz. Comput. 2021, 12, 8975–8990. [CrossRef]
10. Li, Y.; Cheng, K. Single document Summarization based on Clustering Coefficient and Transitivity Analysis. In Proceedings of
the 10th International Conference on Accomplishments in Electrical and Mechanical Engineering and Information Technology,
Banja Luka, Bosnia and Herzegovina, 26–28 May 2011.
11. Mihalcea, R. Graph-based Ranking Algorithms for Sentence Extraction, Applied to Text Summarization. In Proceedings of the
42nd Annual Meeting of the Association for Computational Linguistics (ACL 2004), Barcelona, Spain, 21–26 July 2004.
12. AL-Khassawneh, Y.A.; Salim, N.; Jarrah, M. Improving triangle-graph based text summarization using hybrid similarity function.
Indian J. Sci. Technol. 2017, 10, 1–15. [CrossRef]
13. AL-Khassawneh, Y.A.; Salim, N.; Isiaka, O.A. Extractive text summarisation using graph triangle counting approach: Proposed
method. In Proceedings of the 1st International Conference of Recent Trends in Information and Communication Technologies in
Universiti Teknologi Malaysia, Johor, Malaysia, 12–14 September 2014; pp. 300–311.
14. El-Haj, M.; Kruschwitz, U.; Fox, C. Experimenting with Automatic Text Summarization for Arabic. In Proceedings of the 4th
Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics,
LTC’09, Poznan, Poland, 6–8 November 2009; pp. 365–369.
15. El-Haj, M.; Kruschwitz, U.; Fox, C. Using Mechanical Turk to Create a Corpus of Arabic Summaries in the Language Resources
(LRs) and Human Language Technologies (HLT) for Semitic Language. In Proceedings of the Workshop Held in Conjunction
with the 7th International Language Resources and Evaluation Conference (LREC), Valletta, Malta, 17–23 May 2010; pp. 36–39.
16. Ben Abdallah, M.; Aloulou, C.; Belguith, L. Toward a Platform for Arabic Automatic Summarization. In Proceedings of the
International Arab Conference on Information Technology (ACIT’08), Hammamet, Tunisia, 16–18 December 2008.
17. Sakhr Company. Available online: http://www.sakhr.com (accessed on 1 October 2022).
18. Boudabous, M.; Maaloul, M.; Belguith, L. Digital learning for summarizing Arabic documents. In Proceedings of the 7th
International Conference on NLP (IceTAL 2010), Reykjavik, Iceland, 16–18 August 2010.
19. Sabuna, P.M.; Setyohadi, D.B. Summarizing Indonesian text automatically by using sentence scoring and decision tree. In
Proceedings of the 2017 2nd International conferences on Information Technology, Information Systems and Electrical Engineering
(ICITISEE), Yogyakarta, Indonesia, 1–2 November 2017; Volume 9, pp. 1–6.
20. Abuobieda, A.; Salim, N.; Albaham, A.T.; Osman, A.H.; Kumar, Y.J. Text summarization features selection method using pseudo
genetic-based model. In Proceedings of the 2012 International Conference on Information Retrieval & Knowledge Management,
Kuala Lumpur, Malaysia, 13–15 March 2012; Volume 8, pp. 193–197.
21. Chowdary, C.R.; Sravanthi, M.; Kumar, P.S. A system for query specific coherent text multi-document summarization. Int. J. Artif.
Intell. Tools 2010, 19, 597–626. [CrossRef]
22. Thaokar, C.; Malik, L. Test model for summarization Hindi text using extraction method. In Proceedings of the 2013 IEEE
Conference on Information & Communication Technologies, Thuckalay, India, 11–12 April 2013; Volume 7, pp. 1138–1143.
23. Elbarougy, R.; Behery, G.; Khatib, A.E. Graph-Based Extractive Arabic Text Summarization Using Multiple Morphological
Analyzers. J. Inf. Sci. Eng. 2020, 36, 347–363.
24. Al-Abdallah, R.Z.; Al-Taani, A.T. Arabic Text Summarization using Firefly Algorithm. In Proceedings of the 2019 Amity
International Conference on Artificial Intelligence (AICAI), Dubai, United Arab Emirates, 4–6 February 2019; IEEE: Piscataway,
NJ, USA, 2019; pp. 61–65.
25. Riahi, N.; Ghazali, F.; Ghazali, M. Improving the efficiency of the Persian abstract synthesis system using pruning algorithms in
neural networks. In Proceedings of the First International Conference on Line and Language Processing Persian; Semnan University:
Semnan, Iran, 2012.
26. Shafiei, F.; Shamsifard, M. The automatic dictionary of Persian texts. In Proceedings of the 20th National Computer Society
Conference, Mashhad, Iran, 3 March 2015; Volume 1, pp. 931–936.
27. Mihalcea, R.; Tarau, P. Textrank: Bringing order into text. In Proceedings of the Conference on Empirical Methods in Natural
Language Processing (EMNLP 2004), Barcelona, Spain, 25–26 July 2004.
28. Erkan, G.; Radev, D.R. Lexrank: Graph-based lexical centrality as salience in text summarization. J. Artif. Intell. Res. 2004, 22,
457–479. [CrossRef]
29. Brin, S.; Page, L. The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst. 1998, 30, 107–117.
[CrossRef]
30. Al-Saleh, A.B.; Menai, M.E.B. Automatic Arabic text summarization: A survey. Artif. Intell. Rev. 2016, 45, 203–234. [CrossRef]
31. El-Haj, M.; Koulali, R. Kalimat a multipurpose Arabic corpus. In Proceedings of the 2nd Workshop on Arabic Corpus Linguistics
(WACL-2), Lancaster, UK, 22 July 2013.
32. Attia, M. Handling Arabic Morphological and Syntactic Ambiguity within the LFG Framework with a View to Machine
Translation. Ph.D. Thesis, School of Languages, Linguistics and Cultures, Faculty of Humanities, University of Manchester,
Manchester, UK, 2008.
33. Aries, A.; Zegour, D.E.; Hidouci, K.W. AllSummarizer system at MultiLing 2015: Multilingual single and multi-
document summarization. In Proceedings of the SIGDIAL 2015 Conference, Prague, Czech Republic, 2–4 September 2015;
pp. 237–244.
34. Litvak, M.; Vanetik, N.; Last, M.; Churkin, E. Museec: A multilingual text summarization tool. In Proceedings of the 54th
Annual Meeting of the Association for Computational Linguistics—System Demonstrations, Berlin, Germany, 7–12 August 2016;
pp. 73–78.
35. Thomas, S.; Beutenmüller, C.; de la Puente, X.; Remus, R.; Bordag, S. Exb text summarizer. In Proceedings of the SIGDIAL 2015
Conference, Prague, Czech Republic, 2–4 September 2015; pp. 260–269.
36. Attia, M. Arabic tokenization system. In Proceedings of the 5th Workshop on Important Unresolved Matters, Prague, Czech
Republic, 28 June 2007; pp. 65–72.
37. Althobaiti, M.; Kruschwitz, U.; Poesio, M. AraNLP: A Java-based library for the processing of Arabic text. In Proceedings of
the 9th International Conference on Language Resources and Evaluation, LREC 2014, Reykjavik, Iceland, 26–31
May 2014.
38. Ayedh, A.; Tan, G.; Alwesabi, K.; Rajeh, H. The effect of preprocessing on Arabic document categorization. Algorithms 2016, 9, 27.
[CrossRef]
39. Al-Shalabi, R.; Kanaan, G.; Jaam, J.M.; Hasnah, A.; Hilat, E. Stop-word removal algorithm for Arabic language. In Proceedings of
the 2004 International Conference on Information and Communication Technologies: From Theory to Applications, Damascus,
Syria, 23 April 2004.
40. El-Khair, I. Effects of stop words elimination for Arabic information retrieval: A comparative study. Int. J. Comput. Inform. Sci.
2006, 4, 119–133.
41. Mustafa, M.; Salah-Eldeen, A.; Bani-Ahmad, S.; Elfaki, A. A comparative survey on Arabic stemming: Approaches and challenges.
Intell. Inf. Manag. 2017, 9, 39–67. [CrossRef]
42. Alami, N.; Meknassi, M.; Ouatik, S.A.; Ennahnahi, N. Impact of stemming on Arabic text summarization. In Proceedings of the
2016 4th IEEE International Colloquium on Information Science and Technology (CiSt), Tangier, Morocco, 24–26 October 2016.
43. Brin, S.; Page, L. Reprint of: The anatomy of a large-scale hypertextual web search engine. Comput. Netw. 2012, 56, 3825–3833.
[CrossRef]
44. Morris, A.H.; Kasper, G.M.; Adams, D.A. The effects and limitations of automated text condensing on reading comprehension
performance. Inf. Syst. Res. 1992, 3, 17–35. [CrossRef]
45. El-Haj, M. Multi-Document Arabic Text Summarisation. Ph.D. Thesis, University of Essex, Colchester, UK, 2012.
46. Al-Radaideh, Q.; Afif, M. Arabic text summarization using aggregate similarity. In Proceedings of the 2009 International Arab
Conference on Information Technology (ACIT’2009), Sana’a, Yemen, 15–18 December 2009.
47. Haboush, A.; Al-Zoubi, M.; Momani, A.; Tarazi, M. Arabic text summarization model using clustering techniques. World Comput.
Sci. Inf. Technol. J. 2012, 2, 62–67.
48. AL-Khawaldeh, F.; Samawi, V. Lexical cohesion and entailment based segmentation for Arabic text summarization (lceas). World
Comput. Sci. Inf. Technol. J. 2015, 5, 51–60.
49. Oufaida, H.; Nouali, O.; Blache, P. Minimum redundancy and maximum relevance for single and multi-document Arabic text
summarization. J. King Saud Univ.-Comput. Inf. Sci. 2014, 26, 450–461. [CrossRef]
50. Binwahlan, M.S. Extractive Summarization Method for Arabic Text-ESMAT. Int. J. Comput. Trends Technol. IJCTT 2015, 21, 103–107.
[CrossRef]
51. Al-Radaideh, Q.A.; Bataineh, D.Q. A hybrid approach for Arabic text summarization using domain knowledge and genetic
algorithms. Cogn. Comput. 2018, 10, 651–669. [CrossRef]
52. Al-Abdallah, R.Z.; Al-Taani, A.T. Arabic single-document text summarization using particle swarm optimization algorithm.
Procedia Comput. Sci. 2017, 117, 30–37. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.