Dialogue Act Tagging

This repository implements two dialogue act taggers using Conditional Random Fields (CRFs).

Baseline tagger features:

Speaker change indicator.
First utterance indicator.
A feature for each token in the utterance.
A feature for each part-of-speech in the utterance.
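
The baseline features above could be extracted roughly as follows. This is a minimal sketch, not the code in baseline_tagger.py: the `Utterance` record, its field names, and the feature-name prefixes (`TOKEN_`, `POS_`) are assumptions made for illustration, and the objects returned by hw2_corpus_tool.py may be shaped differently.

```python
from collections import namedtuple

# Hypothetical utterance record; the real fields in hw2_corpus_tool.py may differ.
Utterance = namedtuple("Utterance", ["speaker", "tokens", "pos_tags"])


def baseline_features(dialogue):
    """Return one list of string features per utterance in the dialogue."""
    feature_seq = []
    prev_speaker = None
    for i, utt in enumerate(dialogue):
        feats = []
        if i == 0:
            # First utterance indicator.
            feats.append("FIRST_UTTERANCE")
        elif utt.speaker != prev_speaker:
            # Speaker change indicator (relative to the previous utterance).
            feats.append("SPEAKER_CHANGE")
        # One feature per token and one per part-of-speech tag.
        feats.extend("TOKEN_" + tok for tok in utt.tokens)
        feats.extend("POS_" + pos for pos in utt.pos_tags)
        feature_seq.append(feats)
        prev_speaker = utt.speaker
    return feature_seq


# Made-up dialogue, for illustration only.
dialogue = [
    Utterance("A", ["Hello", "."], ["UH", "."]),
    Utterance("B", ["Hi", "."], ["UH", "."]),
    Utterance("B", ["How", "are", "you", "?"], ["WRB", "VBP", "PRP", "."]),
]
features = baseline_features(dialogue)
```

Each dialogue becomes a sequence of feature lists, one per utterance, which is the shape a sequence labeller such as a CRF expects.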

Advanced tagger features:

Length of utterance.
Bigrams of tokens.
Various features extracted from the token strings, e.g. IS_UPPER to indicate whether a token is in upper case.
A feature for each token and a feature for each POS, like in the baseline.
Non-word sounds in the utterance, e.g. <Laughter>
Non-word sounds in the previous utterance, e.g. <Laughter>
Speaker change indicator: as in the baseline.
A feature for each part-of-speech in the previous utterance.
A bias feature.
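
A sketch of how the advanced features listed above might be computed for one dialogue. It is an illustration under stated assumptions, not the code in advanced_tagger.py: the `Utterance` record, the regular expression used to detect non-word sounds, and all feature-name prefixes are hypothetical.

```python
import re
from collections import namedtuple

# Hypothetical utterance record; the real fields in hw2_corpus_tool.py may differ.
Utterance = namedtuple("Utterance", ["speaker", "text", "tokens", "pos_tags"])

# Assumed markup for non-word sounds, e.g. <Laughter>.
NON_WORD = re.compile(r"<([A-Za-z_]+)>")


def advanced_features(dialogue):
    """Return one list of string features per utterance in the dialogue."""
    feature_seq = []
    prev = None
    for utt in dialogue:
        # Bias feature and utterance length (in tokens).
        feats = ["BIAS", "LENGTH_%d" % len(utt.tokens)]
        if prev is not None and utt.speaker != prev.speaker:
            feats.append("SPEAKER_CHANGE")
        # Token-string feature: at least one token is fully upper case.
        if any(tok.isupper() for tok in utt.tokens):
            feats.append("IS_UPPER")
        # Unigram, bigram, and part-of-speech features.
        feats.extend("TOKEN_" + tok for tok in utt.tokens)
        feats.extend(
            "BIGRAM_%s_%s" % pair for pair in zip(utt.tokens, utt.tokens[1:])
        )
        feats.extend("POS_" + pos for pos in utt.pos_tags)
        # Non-word sounds in the current utterance.
        feats.extend("SOUND_" + s for s in NON_WORD.findall(utt.text))
        if prev is not None:
            # Context from the previous utterance: its sounds and POS tags.
            feats.extend("PREV_SOUND_" + s for s in NON_WORD.findall(prev.text))
            feats.extend("PREV_POS_" + pos for pos in prev.pos_tags)
        feature_seq.append(feats)
        prev = utt
    return feature_seq


# Made-up dialogue, for illustration only.
dialogue = [
    Utterance("A", "I know <Laughter>.", ["I", "know", "."], ["PRP", "VBP", "."]),
    Utterance("B", "Yeah.", ["Yeah", "."], ["UH", "."]),
]
features = advanced_features(dialogue)
```

The features drawn from the previous utterance give the tagger local context beyond what the CRF's label transitions already capture.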

Data set

The data set is the Switchboard (SWBD) corpus, a collection of phone dialogues between volunteers on predetermined topics. The tags in the corpus are the SWBD-DAMSL dialogue acts. See the annotation manual for details.

Conditional Random Field

A Python interface to CRFsuite is used; see its documentation for installation and usage.
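
A minimal sketch of training and tagging with a CRFsuite binding. It assumes the binding is the python-crfsuite package (imported as `pycrfsuite`), which this README does not name explicitly. The hyperparameter values, the helper names, and the example tags are illustrative, not the settings used in this repository.

```python
def split_dialogue(tagged_dialogue):
    """Split [(features, act_tag), ...] into parallel feature and label sequences."""
    xseq = [feats for feats, _ in tagged_dialogue]
    yseq = [tag for _, tag in tagged_dialogue]
    return xseq, yseq


def train_crf(tagged_dialogues, model_path):
    """Train a CRF, treating each dialogue as one sequence of utterances."""
    import pycrfsuite  # assumed binding: the python-crfsuite package

    trainer = pycrfsuite.Trainer(verbose=False)
    for tagged in tagged_dialogues:
        xseq, yseq = split_dialogue(tagged)
        trainer.append(xseq, yseq)
    # Illustrative values: L1 and L2 regularisation and an iteration cap.
    trainer.set_params({"c1": 1.0, "c2": 1e-3, "max_iterations": 50})
    trainer.train(model_path)


def tag_dialogue(model_path, xseq):
    """Predict one dialogue act tag per utterance with a trained model."""
    import pycrfsuite  # assumed binding: the python-crfsuite package

    tagger = pycrfsuite.Tagger()
    tagger.open(model_path)
    return tagger.tag(xseq)


# Made-up example: two utterances with illustrative tags.
example = [
    (["FIRST_UTTERANCE", "TOKEN_Hello"], "fp"),
    (["SPEAKER_CHANGE", "TOKEN_Hi"], "fp"),
]
xseq, yseq = split_dialogue(example)
```

Calling `train_crf([example], "model.crfsuite")` and then `tag_dialogue("model.crfsuite", xseq)` would return one predicted tag per utterance. The model file name here is arbitrary.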

Credit

hw2_corpus_tool.py is written by Christopher Wienberg, a previous TA for CSCI544 at USC.
