Corpus Analysis Using AntConc
Lecture 4
Language and Technology
Introduction
• Corpus analysis involves studying large collections of texts, or corpora, to
understand language patterns, frequencies, collocations, and other linguistic
phenomena. AntConc is one of the most popular and freely available tools for
conducting corpus analysis. It allows researchers to analyze text data and extract
meaningful information, making it widely used in linguistics, discourse analysis,
and other fields related to text studies.
• What is AntConc?
• AntConc is a software tool designed for analyzing language data within text
corpora. It is developed by Laurence Anthony and is used to perform a variety of
text analysis tasks, such as word frequency analysis, concordancing, collocation
analysis, and keyword extraction. The tool can handle multiple languages and text
formats and is compatible with both small and large corpora.
• Reference: Anthony, L. (2019). AntConc (Version 3.5.8) [Computer Software].
Tokyo, Japan: Waseda University. Available at:
https://www.laurenceanthony.net/software/antconc/
Types of Analysis Possible with AntConc
• Concordance Analysis: This involves finding all occurrences of a word or
phrase in a text along with their surrounding context. It is particularly
useful for understanding how words are used in specific contexts.
• Word Frequency Analysis: This is used to determine how often different
words appear in the corpus, which helps in identifying key terms or
common patterns.
• Collocation Analysis: This involves identifying words that tend to co-occur
frequently with a given word, providing insights into lexical patterns and
relationships between words.
• Keyword Analysis: AntConc can also be used to compare a target corpus to
a reference corpus, allowing for the identification of keywords, that is, words
that are significantly more frequent in the target corpus than in the reference corpus.
• Cluster/N-gram Analysis: This helps in identifying recurring sequences of
words (e.g., bigrams or trigrams) and is useful in studying collocations,
phrases, and patterns.
Concordance Tab & Concordance Plot Tab
The Concordance tab is where you can see all occurrences of a search term within your corpus. Each
occurrence is displayed with its surrounding context in Key Word in Context (KWIC) format.
Steps: Load the corpus files using the "File" menu.
• Enter the search term in the input box at the top.
• Click "Start" to display a KWIC view of all instances of the word.
• You can adjust the width of the context window around the word.
• Use: Ideal for studying how a word or phrase is used in different contexts.
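The KWIC view can be approximated in a few lines of Python. This is a minimal sketch, not AntConc's implementation: the tokenizer (lower-cased runs of letters), the function names `tokenize` and `kwic`, and the sample text are assumptions made for illustration, and the context width here is counted in words.

```python
import re

def tokenize(text):
    # Assumption: a token is a run of letters, lower-cased
    return re.findall(r"[a-z]+", text.lower())

def kwic(tokens, node, width=5):
    """Return one (left context, hit, right context) tuple per occurrence of node.
    `width` is the number of words shown on each side of the hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

text = ("Government support for innovation is critical. "
        "The innovation of the new product will boost market share.")
for left, hit, right in kwic(tokenize(text), "innovation", width=3):
    # Right-align the left context so the hits line up in a central column
    print(f"{left:>25}  [{hit}]  {right}")
```

Because punctuation is discarded, the context can run across sentence boundaries; changing `width` plays the role of widening or narrowing the context window.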
Concordance Plot Tab displays where and how often the search term appears in the corpus using a
visual plot. It allows you to see the distribution of occurrences of your term across the entire
corpus.
Steps: After entering a search term in the Concordance tab, click on the Concordance Plot tab.
• Each file is shown as a separate bar, and each vertical line within a bar marks the position of one
occurrence of the search term in that file.
• Use: Useful for understanding the distribution of words or phrases within a text or across multiple
texts.
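The idea behind the plot can be sketched by expressing each hit's position as a fraction of its file's length. The file names, the helper names `plot_positions` and `barcode`, and the 40-character text "barcode" are illustrative assumptions; AntConc draws its own graphical plot.

```python
import re

def plot_positions(files, node):
    """Map each file name to the relative positions (0.0 = start, 1.0 = end) of node."""
    positions = {}
    for name, text in files.items():
        tokens = re.findall(r"[a-z]+", text.lower())
        n = len(tokens)
        positions[name] = [i / n for i, tok in enumerate(tokens) if tok == node]
    return positions

def barcode(fractions, width=40):
    # One '|' per hit, placed proportionally along a bar of `width` characters
    bar = ["-"] * width
    for f in fractions:
        bar[min(int(f * width), width - 1)] = "|"
    return "".join(bar)

files = {
    "report_a.txt": "innovation drives growth and growth drives more growth",
    "report_b.txt": "costs rose while sales fell and innovation stalled",
}
for name, fracs in plot_positions(files, "innovation").items():
    print(f"{name:<14} hits={len(fracs):<3} {barcode(fracs)}")
```

Hits clustered at one end of a bar, or spread evenly along it, are the kind of distribution the plot makes visible at a glance.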
• Interpret the KWIC Results: The tool will generate a KWIC display showing
each instance of the word or phrase in the middle, along with a set number
of words before and after (the context).
• You can adjust the size of the window to display more or fewer words on either side
of the keyword, helping to refine your analysis.
• Analyze Patterns: Look for patterns of usage by examining the
concordances. Ask questions like:
• What words commonly appear near the keyword?
• How is the word used grammatically (noun, verb, etc.)?
• Does the word have different meanings in different contexts?
• Are there repeated phrases or collocations associated with the keyword?
• You can also refine the search with wildcards or regular expressions, for example to target
specific grammatical forms or related words.
• Interpreting Concordance Results: The main purpose of concordance
analysis is to observe how the meaning and function of a word change
based on the context in which it appears. This requires careful
interpretation of the results in order to gain useful insights.
• Below are some key ways to interpret concordance data:
• Contextual Meaning and Polysemy: Words often have multiple meanings depending on the context.
Concordance analysis helps you examine these meanings by looking at how the word is used across different contexts.
For instance, the word "bank" could refer to a financial institution or the side of a river.
• Connotation and Register: Certain words may carry different connotations based on the context. For example, the
word "young" might have positive connotations in one context (e.g., "a young talent") but could be neutral or even
negative in others (e.g., "a young offender").
• Collocational Patterns: Look at the words that frequently appear with the target word. These are known as collocates,
and they can reveal significant patterns about the meaning and usage of the word. For instance, in a corpus of
newspaper articles, the word "crisis" might frequently appear with words like "economic", "financial", or "global",
revealing a strong relationship between these concepts.
• Syntactic and Grammatical Usage: Concordance analysis allows researchers to study the grammatical behavior of
words. For instance, if you search for the verb "run", the surrounding context will help identify whether it is used in the
sense of "manage" (e.g., "run a business") or in the physical sense (e.g., "run fast").
• It also helps in identifying patterns in verb forms, noun phrases, or prepositional structures. For example, you
could search for the verb "agree" to see which preposition typically follows it: "to" ("agree to terms"), "on"
("agree on a plan"), or "with" ("agree with someone").
• Comparing Contexts: By examining the KWIC results, you can compare how the same word is used in different texts or
genres. For example, comparing how "innovation" is used in business writing versus scientific writing may reveal
differences in connotations, tone, and frequency of usage.
• Discourse and Pragmatic Functions: Concordance analysis can also reveal how words function pragmatically in
discourse. For example, in a spoken corpus, words like "you know" or "I mean" may appear frequently in casual
conversations as discourse markers, signaling hesitation or emphasis rather than conveying literal meaning.
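Questions such as which preposition follows "agree" can be explored by grouping concordance lines on the first word to the right of the hit (R1), similar in spirit to sorting KWIC lines on their right context. This sketch assumes a simple letters-only tokenizer; the function name `r1_profile` and the sample sentences are made up for illustration.

```python
import re
from collections import Counter

def r1_profile(text, forms):
    """Count the word immediately to the right (R1) of every hit.
    `forms` is a set of word forms treated as one search term."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok in forms)

sample = ("They agree to the terms. We agreed on a plan. "
          "She agrees with him. They agree to meet.")
profile = r1_profile(sample, {"agree", "agrees", "agreed", "agreeing"})
print(profile.most_common())
```

Since punctuation is dropped, a hit at the end of a sentence would pick up the first word of the next one, so unexpected R1 words are worth checking in the full KWIC line.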
Example of Concordance Analysis
• Let’s consider an example where you want to study the word "innovation" in a corpus of business
and scientific articles.
• Step 1: Load the Corpus:
• Assume you have a corpus of business and scientific articles in .txt format. Load them into AntConc.
• Step 2: Concordance Search:
• Enter "innovation" in the Concordance tab search box and run the search.
• Step 3: KWIC Analysis:
• The KWIC output will show each occurrence of the word "innovation" in its context. For example, you might
see entries like:
• "The innovation of the new product will boost market share."
• "Government support for innovation is critical in the tech sector."
• Step 4: Collocation Patterns:
• You may notice that "innovation" frequently collocates with words like "technology", "development", "new",
and "support".
• Step 5: Interpretation:
• From this data, you might conclude that "innovation" is often associated with technology and development, and
that it appears frequently in contexts discussing economic or governmental support. You might also see
differences in how the word is used in business vs. scientific contexts.
Some More Examples
• Concordance Analysis of "Sustainability" in Environmental Texts: You
could analyze how the term "sustainability" is used in academic papers,
media articles, and corporate reports. You might find that it co-occurs with
terms like "energy", "environment", and "policy", showing how the term is
associated with specific topics.
• Concordance Analysis of Gendered Pronouns in Fiction: A study of gendered
pronouns (e.g., "he", "she", "they") in fiction could reveal patterns of
gender representation. For example, you might find that male pronouns
are more frequently used in certain types of fiction or that female
characters are associated with specific roles or activities.
• Concordance Analysis of Academic Writing: You might conduct a
concordance analysis on the word "however" in academic papers to study
how it is used to introduce contrasting ideas. The surrounding context
would help you observe the structures in which "however" appears and
how it contributes to the overall argumentation in the text.
Concordance Analysis of Gendered Pronouns in Fiction
• Concordance analysis of gendered pronouns in fiction provides an insightful look into
how language reflects and perpetuates gender representation, power dynamics, and
social roles. By focusing on pronouns like "he," "she," "they," and others, researchers
can examine the distribution and context in which these pronouns are used, offering
insights into how gender is constructed in narrative texts.
• Objective of the Analysis: The main objective of conducting a concordance analysis of
gendered pronouns in fiction is to:
• Identify how often male and female pronouns are used.
• Understand how gender is represented through language in different narrative contexts.
• Study the roles, actions, and attributes associated with male and female characters.
• Examine whether gender bias or stereotypical gender roles are present in the text.
• Explore the use of non-binary or gender-neutral pronouns like "they" in modern fiction.
Steps for Conducting the Concordance Analysis
• Select a Corpus of Fiction Texts:
• First, choose a corpus of fiction texts that are relevant to your research. The corpus
could consist of modern novels, classical literature, or specific genres like fantasy or
romance. The corpus should ideally include works from different time periods or
authors to provide diverse insights.
• For example, a corpus could include novels like "Pride and Prejudice" by Jane Austen,
"The Great Gatsby" by F. Scott Fitzgerald, or modern works like "Harry Potter" by J.K.
Rowling.
• Load the Corpus into AntConc:
• After preparing the text files, load them into AntConc.
• Ensure that the text is in plain text format (.txt) for compatibility.
• Search for Gendered Pronouns:
• In the Concordance tab, enter the pronouns you want to analyze (e.g., "he", "she",
"they").
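• Run separate searches for each pronoun, or capture several forms in one search. Be
careful with wildcards: "he*" matches any word beginning with "he" (e.g., "her," "here,"
"help") but not "him" or "his." A regular-expression search (with AntConc's Regex
option enabled) such as \b(he|him|his|himself)\b targets the male forms precisely.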
• Analyze the KWIC (Key Word in Context) Results: For each pronoun search,
AntConc will display the results in a KWIC format, showing the pronoun in the
middle of the screen with the surrounding context on either side.
• Adjust the context window if necessary to view more or fewer words around the
target pronoun.
• Look for Patterns in Gender Representation: Examine how "he" and "she" are
used in different contexts.
• For example:
• What kinds of verbs are used after male vs. female pronouns?
• Are male characters often associated with action, leadership, or power, while
female characters are more frequently described in terms of appearance,
emotion, or passivity?
• Do the pronouns cluster around specific roles (e.g., male characters being leaders
or adventurers, female characters being caregivers or passive observers)?
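• If the fiction includes non-binary or gender-neutral pronouns like "they," examine
how these pronouns are used and whether they are associated with traditional
male or female roles, or if they represent more progressive or varied gender
identities. Because "they" is usually plural, each hit must be checked in context to
separate singular, gender-neutral uses from plural ones.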
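The pronoun searches and the question about verbs after male vs. female pronouns can be sketched as follows. The pronoun sets, helper names (`count_forms`, `words_after`), and sample text are illustrative assumptions; the word after "he" or "she" is only a rough proxy for the verb, so real results still need manual checking in the KWIC view. The last lines show why a plain "he*" prefix pattern over-matches.

```python
import re
from collections import Counter

# Illustrative pronoun sets; adjust them to your research questions
MALE = {"he", "him", "his", "himself"}
FEMALE = {"she", "her", "hers", "herself"}

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def count_forms(tokens, forms):
    # Total hits for every form in the set
    return sum(1 for tok in tokens if tok in forms)

def words_after(tokens, subject):
    # Word right after a subject pronoun: often, though not always, a verb
    return Counter(tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok == subject)

text = "He led the army. She waited here for him. He decided, and she listened."
tokens = tokenize(text)
print("male:", count_forms(tokens, MALE), "female:", count_forms(tokens, FEMALE))
print("after he:", words_after(tokens, "he").most_common())
print("after she:", words_after(tokens, "she").most_common())

# A "he*"-style prefix match catches "here" but misses "him"
print("he*:", [tok for tok in tokens if tok.startswith("he")])
```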
Interpret the Findings
• Frequency of Pronouns: Are male pronouns used more frequently
than female pronouns? This could suggest an imbalance in gender
representation.
• Role Assignments: Do male pronouns co-occur with active, powerful
roles, while female pronouns are associated with passive or
supportive roles?
• Changes Over Time or Across Genres: If your corpus includes texts
from different time periods or genres, you may notice shifts in how
gendered pronouns are used. For example, older fiction may depict
more rigid gender roles, while contemporary fiction might show more
diverse gender representations.
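When comparing pronoun frequencies across time periods or genres, raw counts can mislead if the subcorpora differ in size. A common remedy is to normalise counts, for example per 10,000 tokens. The function name `per_10k`, the counts, and the corpus sizes below are hypothetical.

```python
def per_10k(count, corpus_tokens):
    """Normalised frequency: occurrences per 10,000 tokens."""
    return count * 10_000 / corpus_tokens

# Hypothetical figures: the older subcorpus has more raw hits of "she",
# but it is also much larger
older = per_10k(1_200, 400_000)
modern = per_10k(900, 250_000)
print(f"older: {older:.1f} per 10k, modern: {modern:.1f} per 10k")
```

In this made-up case the modern subcorpus uses "she" more often relative to its size, even though its raw count is lower.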
Interpreting Concordance Results: Gender and Representation
• Concordance analysis of gendered pronouns in fiction can reveal several
interesting trends and patterns in terms of representation and gender roles. Here
are some typical areas of focus and how to interpret them:
• Frequency of Male vs. Female Pronouns:
• A common observation in traditional literature is that male pronouns ("he," "him," "his")
tend to appear more frequently than female pronouns. This could indicate that male
characters are given more prominence in the narrative, which may reflect a historical bias in
literature where men often take central roles as protagonists, heroes, or decision-makers.
• Female pronouns ("she," "her") may appear less frequently, especially in older fiction.
However, in modern fiction, the balance may shift as gender roles evolve and authors strive
to represent women more equitably.
• Roles and Actions Associated with Pronouns:
• Concordance results allow you to observe the verbs and adjectives commonly associated
with male vs. female pronouns. For example:
• If "he" is often paired with action verbs like "fought," "decided," or "led," this suggests an active,
authoritative role for male characters.
• If "she" is more frequently paired with verbs like "cried," "listened," or "waited," this may indicate a
more passive or emotional portrayal of female characters.
• Such patterns can reinforce traditional gender roles where men are depicted as strong and assertive,
while women are shown as emotional or subservient.
• Stereotyping and Gender Bias: Concordance analysis can help uncover
gender stereotyping in fiction.
• For instance: Male characters may be frequently described in terms of their
achievements or physical strength, while female characters might be
described in terms of their beauty or relationship status.
• If male pronouns are often found in contexts of intellectual work (e.g., "he
discovered," "he solved"), while female pronouns are associated with
domestic tasks (e.g., "she cooked," "she cared"), this may indicate a
stereotypical division of labor.
• Representation of Non-Binary Characters: In recent fiction, you might
encounter the use of gender-neutral pronouns like "they" for non-binary or
gender-fluid characters. The concordance analysis of these pronouns can
reveal how inclusive the text is in its representation of gender.
• For example, if "they" appears alongside diverse roles and actions, this
could suggest that non-binary characters are given the same range of
possibilities as their male and female counterparts.
• If "they" is used infrequently or only for secondary roles, this might point
to a limited representation of non-binary identities.
Advantages of Concordance Analysis
• Contextual Insights: Concordance analysis shows how words function
in different contexts, providing a more nuanced understanding of
their meaning and usage.
• Grammatical and Syntactic Patterns: It allows for the study of
grammatical structures, collocations, and phrase patterns.
• Quantitative and Qualitative Analysis: Researchers can quantify the
frequency of words and phrases while also analyzing their qualitative
meaning.
• Comparative Studies: Concordance analysis is highly effective for
comparing the usage of words across different genres, registers, or
time periods.
Limitations of Concordance Analysis
• Contextual Narrowness: The KWIC display can only show a limited
number of words before and after the keyword, which might obscure
broader textual meaning. However, this can be mitigated by using the
File View tab to examine a word in its full textual context.
• Time-Consuming: If the corpus is large, manual analysis of
concordance results can be time-consuming, especially if the keyword
appears frequently.
File View Tab & Clusters/N-Grams Tab
File View tab allows you to see the full text of the files loaded into the program. It
highlights any occurrences of the search term.
Steps: Select a result from the Concordance tab to view the term in the full text.
• The entire document will be shown with highlighted occurrences.
Use: This is useful when you want to see the word within its entire textual context rather
than just in the limited KWIC window.
Clusters/N-Grams Tab helps you identify common phrases (N-Grams) or clusters of words
that occur together frequently.
Steps: Enter a word or phrase in the input box.
• Set the number of words you want to analyze in the clusters (e.g., bigrams, trigrams).
• Click "Start" to see the most frequent word sequences in your corpus.
Use: Useful for identifying frequently used phrases or collocations.
Cluster/N-Gram Analysis in Corpus Linguistics
• Cluster/N-Gram Analysis is a method used to identify sequences of words
(multi-word units) that frequently occur together in a text or corpus. These
word clusters (also known as N-Grams) can provide insights into recurring
phrases, collocations, and patterns of language use.
• The analysis is particularly useful in studies of phraseology, language
teaching, and discourse analysis, as it reveals commonly used expressions
or formulaic language.
• In AntConc, the Cluster/N-Gram Tab helps in identifying recurring word
sequences of various lengths (e.g., bigrams, trigrams, and longer N-Grams).
• This can be helpful in understanding the structure of texts, identifying key
phrases, or studying lexical bundles that are frequently used in specific
genres or registers.
Explanation of the Cluster/N-Gram Tab in AntConc
• The Cluster/N-Gram Tab in AntConc is specifically designed to help users find
word clusters (also referred to as N-Grams) within a corpus.
• Interface Features:
• Search Field: Enter a specific word or leave the field blank to generate clusters or
N-Grams based on all words in the corpus.
• Minimum and Maximum Cluster Size: Define the minimum and maximum
number of words in each cluster or N-Gram. For example, if you set it to 2-3,
AntConc will display bigrams (two-word sequences) and trigrams (three-word
sequences).
• This allows you to focus on clusters of different sizes, depending on your research needs.
• Frequency Cut-off: You can set a frequency threshold to limit the output to
clusters or N-Grams that appear a minimum number of times in the corpus. This
helps eliminate less significant clusters that appear only once or twice.
• Sort Option: You can sort the results by frequency (most frequent N-Grams first)
or by the first word of the N-Gram.
• Results Window: Once you click "Start," the results will be displayed in the
window, showing each N-Gram, its frequency, and its range (the number of files in which it occurs).
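The settings described above (minimum and maximum cluster size, frequency cut-off, sorting, and a search term) can be mirrored in a short Python sketch. It is not AntConc's code: the tokenizer, the function name `ngrams`, and the sample texts are assumptions, `search` here keeps only N-Grams that begin with the given word, and each row also reports range, the number of files an N-Gram occurs in.

```python
import re
from collections import Counter, defaultdict

def ngrams(files, min_size=2, max_size=3, min_freq=1, search=None):
    """Return (n-gram, frequency, range) rows, most frequent first.
    range = number of files in which the n-gram occurs.
    If `search` is given, only n-grams starting with that word are kept."""
    freq = Counter()
    found_in = defaultdict(set)
    for name, text in files.items():
        tokens = re.findall(r"[a-z]+", text.lower())
        for n in range(min_size, max_size + 1):
            for i in range(len(tokens) - n + 1):
                if search is not None and tokens[i] != search:
                    continue
                gram = " ".join(tokens[i:i + n])
                freq[gram] += 1
                found_in[gram].add(name)
    rows = [(g, f, len(found_in[g])) for g, f in freq.items() if f >= min_freq]
    rows.sort(key=lambda row: (-row[1], row[0]))  # frequency, then alphabetical
    return rows

files = {
    "a.txt": "Customer service matters. Customer service wins.",
    "b.txt": "Market share and customer service.",
}
for gram, f, rng in ngrams(files, min_size=2, max_size=3, min_freq=2):
    print(f"{f:>3} {rng:>3}  {gram}")
```

This sketch ignores sentence boundaries, so sequences such as "matters customer" are counted too; a frequency cut-off usually filters most of these out.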
Steps to Perform Cluster/N-Gram Analysis in AntConc
• Step 1: Load the Corpus
• Open AntConc and load the corpus (text files) that you wish to analyze. The corpus can
consist of multiple text files or a single large text, depending on your study.
• Step 2: Open the Cluster/N-Gram Tab: Select the Cluster/N-Gram Tab from the toolbar.
• Step 3: Set Cluster/N-Gram Parameters
• Minimum and Maximum Cluster Size: Choose the minimum and maximum number of
words you want to examine in each cluster. For instance, to analyze bigrams and
trigrams, set the minimum size to 2 and the maximum size to 3.
• Search Term: You can leave this field empty to search for all clusters in the corpus, or you
can enter a specific word to see clusters that begin with that word.
• Frequency Cut-off: Set a threshold for how often an N-Gram must appear to be included
in the results (e.g., set a cut-off of 5 to only include N-Grams that appear at least five
times).
• Sorting Option: Choose whether to sort by frequency or alphabetically by the first word
in the N-Gram.
• Step 4: Start the Analysis
• Click Start to generate the list of clusters or N-Grams that meet the criteria
you set.
• The results will appear in the window, showing the N-Gram itself, along
with its frequency (i.e., how many times it appears in the corpus).
• Step 5: Analyze the Results
• Once the clusters/N-Grams are generated, you can:
• Look for patterns: Identify frequently occurring word combinations. For example, in
a business English corpus, you might see N-Grams like "customer service", "market
share", or "business strategy".
• Examine common phrases: These recurring phrases can reveal important topics, key
concepts, or formulaic expressions in the corpus.
• Compare across texts or genres: If you are working with multiple corpora, you can
compare the results to see how frequently certain word combinations appear in
different texts or genres.
Cluster/N-Gram Analysis: Business English Corpus
• Step 1: Load the Business English Corpus
• Assume you have a corpus consisting of business reports, emails, and articles in plain text format.
• Load this corpus into AntConc.
• Step 2: Set the Parameters in the Cluster/N-Gram Tab
• Set the minimum cluster size to 2 and the maximum cluster size to 3 to focus on bigrams and
trigrams.
• Set the frequency cut-off to 5, meaning that only N-Grams that appear at least five times in the
corpus will be shown.
• Leave the Search Term field blank to analyze all word clusters in the corpus.
• Step 3: Generate the Results
• Click Start to begin the analysis.
• The results might show frequent bigrams and trigrams like:
• "customer service" (20 occurrences)
• "market share" (15 occurrences)
• "financial statement" (12 occurrences)
• "business strategy" (10 occurrences)
• Step 4: Interpret the Results
• "Customer service" and "market share" are highly frequent bigrams, indicating that these topics are central
to the corpus of business texts. You can infer that these concepts are commonly discussed in business
contexts.
• "Financial statement" and "business strategy" are also frequent bigrams, indicating the relevance of
financial and strategic topics in business communication.
How Can Cluster/N-Gram Analysis Be Interpreted?
1. Identifying Key Concepts and Terminology
• By analyzing recurring N-Grams, you can identify the key concepts and terminology of a particular genre or
field. For example, in the business English corpus, frequent clusters like "financial statement" or "market
analysis" indicate the significance of financial and analytical topics in business discourse.
2. Understanding Common Collocations and Formulaic Language
• Cluster/N-Gram analysis helps reveal common collocations (words that frequently appear together). These
clusters may represent formulaic language or fixed expressions typical of a specific genre.
• For example, in academic writing, you might find clusters like "according to the study" or "as shown in
Table", which reflect conventional structures used in research papers.
3. Comparing Clusters Across Corpora
• By performing N-Gram analysis on different corpora, you can compare the frequency of specific phrases
across genres or time periods. For instance, you could compare clusters in academic writing and media
articles to see how phrasing differs between more formal and less formal writing styles.
Advantages of Cluster/N-Gram Analysis
• Uncovers Phrase Patterns: It helps identify common expressions and
lexical bundles that are characteristic of a particular genre or
discourse.
• Simplifies Large-Scale Analysis: Instead of manually searching for
recurring phrases, the tool automatically identifies frequent word
sequences in large corpora.
• Genre-Specific Insights: It provides insights into language conventions
used in specialized fields, such as business, law, or academia.
Collocates Tab
• This tab is designed to identify words that frequently co-occur with a
search term within a specific window size (e.g., within five words to
the left or right of the search term).
• Steps:
• Enter a search term.
• Define the collocation window (number of words before and after).
• Click "Start" to view the collocates along with their frequency.
• Use: Collocation analysis helps in understanding word associations
and the behavior of words in context.
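A window-based collocate count, a rough analogue of the frequency figures the Collocates tab reports, can be sketched like this. The tokenizer, the function name `collocates`, and the sample sentence are assumptions; AntConc additionally reports statistics such as MI, which this sketch leaves out.

```python
import re
from collections import Counter

def collocates(text, node, left=5, right=5, min_freq=1):
    """Count words occurring within `left` words before and `right` words after node."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
            counts.update(window)
    return [(word, f) for word, f in counts.most_common() if f >= min_freq]

text = "Market share rose as market growth slowed; market share fell."
print(collocates(text, "market", left=1, right=1))
```

With a window of one word on each side, "share" stands out as the most frequent collocate of "market" in this toy sentence.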
Collocation Analysis
• Collocation analysis is a key method in corpus linguistics that involves
studying the co-occurrence of words—how certain words tend to appear
together more frequently than would be expected by chance. This can
reveal meaningful associations between words, highlighting semantic,
syntactic, or idiomatic relationships. By examining collocations, researchers
gain insights into language patterns, word behavior, and phraseology that
go beyond single-word frequency analysis.
• In AntConc, collocation analysis helps you understand the relationships
between words and identify how particular terms are typically used in
context. Words that frequently co-occur are often linked by recurring
semantic or grammatical patterns, such as common verb-object relationships
(e.g., "make a decision") or adjective-noun pairs (e.g., "strong leader").
What Are Collocations?
• Collocations are pairs or groups of words that are statistically likely to
appear together in natural language. For example:
• Verb-Noun Collocation: "make a decision," "give advice"
• Adjective-Noun Collocation: "heavy rain," "strong argument"
• Adverb-Verb Collocation: "deeply regret," "strongly recommend"
• Noun-Noun Collocation: "data analysis," "traffic jam"
Types of Collocation Relationships
• Lexical Collocations: Words that often appear together to form meaningful
units, such as "fast food" or "commit a crime."
• Grammatical Collocations: Words that follow certain grammatical patterns,
like verbs and prepositions, such as "depend on" or "interested in."
Steps for Performing Collocation Analysis in AntConc
• Step 1: Load the Corpus
• Begin by opening AntConc and loading your text corpus (e.g., academic papers, novels, newspaper articles)
in plain text format (.txt).
• A well-structured corpus that aligns with your research focus will yield more relevant collocation insights. For
example, if you're studying medical discourse, you could load a corpus of medical journals and reports.
• Step 2: Open the Collocates Tab
• Click on the "Collocates" tab in AntConc. This tab is specifically designed to help you identify words that
frequently co-occur with a target word, also known as collocates.
• Collocates will be words that are typically found within a defined window of the target word (e.g., five words
before or after the keyword).
• Step 3: Set Parameters for Collocation Search
• Enter the Target Word: In the search box, type the word whose collocates you want to explore. For example,
if you’re analyzing business texts and want to study collocations of "market," enter "market" as the target
word.
• Define the Collocation Window: Set the window size, which determines how many words to the left and
right of the target word should be considered when looking for collocates. A common choice is a window of
5 words on either side, but this can be adjusted based on the context you're interested in.
• Frequency Cut-off: Set a frequency threshold if you only want to examine collocates that appear frequently.
This helps you avoid collocates that occur rarely and might not be relevant to your analysis.
• Step 4: Start the Collocation Search
• Click Start to begin the search. AntConc will generate a list of words that frequently appear
around your target word, along with the frequency of co-occurrence.
• Step 5: Analyze the Collocates
• Once the results are displayed, you will see the collocates of your target word, along with their
frequencies.
• The frequency indicates how often the target word and its collocates appear together within the window size
you defined.
• You can sort the collocates by frequency or statistical measures like Mutual Information (MI), which indicates
how strongly two words are associated.
• For example, if you search for collocates of "market" in a business corpus, you might find:
• "share" (as in "market share")
• "growth" (as in "market growth")
• "analysis" (as in "market analysis")
• Step 6: Interpret the Results
• The collocation patterns will give you insight into the typical usage of the word in context. For
example:
• "market" may commonly appear with words like "share" and "analysis," suggesting that these are key topics
in business discourse.
• "policy" in a political discourse corpus might collocate with terms like "foreign," "public," or "economic,"
indicating specific types of policies discussed.
Collocation Example: Analyzing "Sustainability" in Environmental Texts
• Let’s consider an example where you want to analyze the collocations of the word
"sustainability" in a corpus of environmental science papers. You are interested
in understanding how "sustainability" is framed and what concepts or issues are
commonly associated with it.
• Step 1: Load the Environmental Corpus
• Load a corpus of environmental science papers or reports into AntConc.
• Step 2: Search for Collocates of "Sustainability"
• Open the Collocates Tab and type "sustainability" as the target word.
• Set the window size to 5 words on either side, so that AntConc captures any
words frequently appearing within five words to the left or right of
"sustainability."
• Step 3: Set Frequency Threshold
• Set a frequency cut-off of 3, meaning only collocates that appear at least 3 times
in the vicinity of "sustainability" will be included in the results. This filters out
infrequent words that are less likely to have a strong association.
• Step 4: Analyze the Collocates
• The results might show that "sustainability" frequently collocates with words
like:
• "environmental" (as in "environmental sustainability")
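• "development" (as in "sustainability and development"; the phrase "sustainable development" would only be captured by also searching for "sustainable" or a wildcard such as "sustainab*")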
• "energy" (as in "sustainability of energy sources")
• "policy" (as in "sustainability policy")
• Step 5: Interpret the Results
• From this analysis, you might conclude that "sustainability" is closely associated
with topics like environmental protection, development, and energy resources
in the corpus. These collocates reveal key concepts and debates in environmental
discourse, emphasizing how sustainability is linked to broader issues of
development and policy.
• For example, the frequent co-occurrence of "development" suggests that the
concept of sustainability is often discussed in relation to economic growth or
infrastructure planning, while "energy" indicates a focus on sustainable energy
sources.
Advanced Collocation Measures: Mutual Information (MI)
• In addition to basic frequency counts, AntConc and other tools can provide
statistical measures like Mutual Information (MI) to assess how strongly
two words are associated. The MI score compares the actual co-occurrence
of two words with their expected co-occurrence by chance:
• A high MI score indicates a strong association between the words (i.e.,
they appear together more frequently than expected by chance).
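• An MI score near zero indicates that the words co-occur about as often as
chance would predict, and a negative score indicates that they co-occur less
often than expected.
• Because MI rewards rare pairs, a word pair seen only once or twice can
receive a very high score, so MI is usually combined with a minimum frequency
cut-off.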
• For example:
• "sustainable development" might have a high MI score, indicating that
these two words often appear together in environmental discourse.
• "sustainable energy" may also show a high MI score, revealing its
importance in discussions about renewable energy.
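The comparison of observed and expected co-occurrence can be made concrete. A standard form of MI is log2(O / E), where O is the observed co-occurrence count, E = f(node) * f(collocate) / N, and N is the corpus size in tokens. AntConc's exact formula may differ in details (for example, how it adjusts for window size), and the function name and all counts below are hypothetical.

```python
import math

def mutual_information(observed, f_node, f_coll, corpus_size):
    """Pointwise MI in bits: log2(observed / expected co-occurrence)."""
    expected = f_node * f_coll / corpus_size
    return math.log2(observed / expected)

N = 1_000_000  # hypothetical corpus size in tokens
# "sustainable" 500 times, "development" 2,000 times, together 100 times
print(round(mutual_information(100, 500, 2_000, N), 2))
# Co-occurring exactly as often as chance predicts gives MI = 0
print(mutual_information(1, 500, 2_000, N))
# Two words seen once each, co-occurring once: a very high score from a single hit
print(round(mutual_information(1, 1, 1, N), 2))
```

The last line illustrates why MI is usually paired with a frequency cut-off: a single chance co-occurrence of two rare words outscores a well-attested collocation.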
Applications of Collocation Analysis
• 1. Studying Lexical Bundles
• In academic writing, collocation analysis can reveal common lexical bundles, such
as "on the other hand" or "in the context of." These bundles reflect formulaic
language typical of specific genres.
• 2. Understanding Word Behavior
• Collocation patterns help linguists understand how specific words behave
grammatically or semantically. For example, "run" may collocate with different
words depending on its meaning: "run a company" (manage) vs. "run fast"
(move quickly).
• 3. Exploring Semantic Fields
• Collocation analysis helps in mapping semantic fields, showing how words cluster
around certain topics or concepts. For instance, in a corpus of news articles,
words like "government", "policy", and "election" might frequently co-occur,
showing how these topics are interrelated in political discourse.
Word List Tab & Keyword List Tab
Word List tab generates a list of all the words in your corpus, sorted by frequency.
This can be useful for identifying the most common words and understanding the
general lexicon of the corpus.
Steps: Load your corpus files.
• Click "Start" to generate the word list.
• You can sort the list by frequency or alphabetically.
• Use: Essential for frequency analysis and identifying key terms.
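The core of the Word List tab, counting every word type and sorting the counts, can be sketched as follows. The tokenizer and the `word_list` name are assumptions; AntConc's token definition and sorting options are configurable and may differ.

```python
import re
from collections import Counter

def word_list(text, by="frequency"):
    """Return (word, count) pairs sorted by frequency (ties alphabetical) or alphabetically."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    if by == "frequency":
        return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return sorted(counts.items())

text = "The cat and the hat sat on the mat."
for rank, (word, count) in enumerate(word_list(text), start=1):
    print(rank, count, word)
```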
Keyword List Tab is used to identify keywords—words that are statistically more
frequent in your target corpus compared to a reference corpus.
Steps: Load both the target and reference corpora.
• Click "Start" to generate a list of keywords.
• The tool will display the keyness value, which indicates how significant each
keyword is compared to the reference corpus.
Use: Helps in identifying words that are characteristic of a particular genre or text.
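A widely used keyness statistic is log-likelihood (LL), which AntConc offers among its keyness measures. The sketch below applies the usual two-corpus LL formula to a single word; the function names and counts are hypothetical, and AntConc's own settings may apply further adjustments. LL itself is never negative, so the helper `keyness` compares relative frequencies to attach a sign: positive if the word is relatively more frequent in the target corpus, negative if it is relatively rarer.

```python
import math

def log_likelihood(a, b, c, d):
    """Two-corpus log-likelihood for one word.
    a, b: frequency of the word in the target and reference corpus
    c, d: total tokens in the target and reference corpus"""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keyness(a, b, c, d):
    # Attach a sign: positive if relatively more frequent in the target corpus
    ll = log_likelihood(a, b, c, d)
    return ll if a / c >= b / d else -ll

# Hypothetical counts: 50 hits in a 10,000-token target corpus
# versus 10 hits in a 10,000-token reference corpus
print(round(keyness(50, 10, 10_000, 10_000), 2))
```

Ranking every word in the target word list by this value gives a keyword list; a word whose relative frequency is identical in both corpora scores zero.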