jude-miller-dev/Vocabulary-list-generator-for-the-text
Main function: extract words and their frequencies from a .txt file and generate a vocabulary list.

Project Introduction

A lightweight C++ program that extracts words from an English text file, counts how often each word occurs, and generates two vocabulary lists: one sorted alphabetically in ascending order and one sorted by frequency in descending order. Sorting uses quicksort with a random pivot, which makes the quadratic worst case unlikely regardless of the order of the input. The program is intended for algorithm experiments and for learning text frequency analysis.
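
The extraction and counting step described above could be sketched roughly as follows. This is a minimal illustration and not the code in main.cpp: the function name countWords, the rule that a word is a maximal run of ASCII letters, and the folding to lower case are all assumptions. Under this rule a hyphenated term such as "self-contained" counts as two words, which may differ from what the actual program does.

```cpp
#include <cctype>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper (not taken from main.cpp): reads text from `in`,
// treats every maximal run of ASCII letters as one word, folds it to
// lower case, and returns a map from word to occurrence count.
std::map<std::string, int> countWords(std::istream& in) {
    std::map<std::string, int> freq;
    std::string word;
    char ch;
    while (in.get(ch)) {
        // Cast to unsigned char first: passing a negative char to
        // std::isalpha / std::tolower is undefined behaviour.
        unsigned char uc = static_cast<unsigned char>(ch);
        if (std::isalpha(uc)) {
            word += static_cast<char>(std::tolower(uc));
        } else if (!word.empty()) {
            ++freq[word];  // a non-letter ends the current word
            word.clear();
        }
    }
    if (!word.empty()) ++freq[word];  // flush a word that ends at EOF
    return freq;
}

// Convenience overload for counting words in an in-memory string.
std::map<std::string, int> countWords(const std::string& text) {
    std::istringstream in(text);
    return countWords(in);
}
```

To process a file, an std::ifstream opened on Input.txt can be passed to the first overload. Because std::map keeps its keys ordered, iterating over the result already visits the words alphabetically; the sketch uses it only for counting and leaves the sorting to a separate step.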

Project Structure

Algorithm-Design-Practice-Experiment/
├── Input.txt       # English text file to be processed
├── main.cpp        # Core source code file
└── main.exe        # Compiled executable file

Input.txt

In mathematics and computer science, an algorithm is a self-contained step-by-step set of operations to be performed. Algorithms exist that perform calculation, data processing, and automated reasoning.
An algorithm is an effective method that can be expressed within a finite amount of space and time and in a well-defined formal language for calculating a function. Starting from an initial state and initial input (perhaps empty), the instructions describe a computation that, 
.....

Execution Results

Word Frequency
a 15
algorithm 5
algorithms 10
ambiguities 1
ambiguous 1
amount 1
an 7
and 12
......

Analysis

Time complexity (quicksort): O(n log n) on average and O(n^2) in the worst case, where n is the number of entries being sorted.
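
A quicksort with a random pivot, applied to (word, frequency) pairs, might look like the sketch below. It is an illustration under stated assumptions rather than the repository's implementation: the names Entry, byWordAsc, byFrequencyDesc and quickSort are hypothetical, the Lomuto partition scheme is one of several possible choices, and breaking frequency ties alphabetically is an assumption.

```cpp
#include <random>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, int>;  // (word, frequency)

// Alphabetical ascending order.
bool byWordAsc(const Entry& a, const Entry& b) { return a.first < b.first; }

// Frequency descending; ties are broken alphabetically (an assumption).
bool byFrequencyDesc(const Entry& a, const Entry& b) {
    if (a.second != b.second) return a.second > b.second;
    return a.first < b.first;
}

// Sorts v[lo..hi] (inclusive bounds) with a randomly chosen pivot.
template <typename Less>
void quickSort(std::vector<Entry>& v, int lo, int hi, Less less,
               std::mt19937& rng) {
    if (lo >= hi) return;  // zero or one element: already sorted

    // Pick a pivot uniformly from [lo, hi] and park it at the end.
    std::uniform_int_distribution<int> pick(lo, hi);
    std::swap(v[pick(rng)], v[hi]);
    const Entry pivot = v[hi];

    // Lomuto partition: entries that order before the pivot go left.
    int store = lo;
    for (int i = lo; i < hi; ++i) {
        if (less(v[i], pivot)) {
            std::swap(v[i], v[store]);
            ++store;
        }
    }
    std::swap(v[store], v[hi]);  // pivot lands in its final position

    quickSort(v, lo, store - 1, less, rng);
    quickSort(v, store + 1, hi, less, rng);
}

// Convenience wrapper that sorts the whole vector.
template <typename Less>
void quickSort(std::vector<Entry>& v, Less less) {
    std::mt19937 rng(std::random_device{}());
    quickSort(v, 0, static_cast<int>(v.size()) - 1, less, rng);
}
```

Calling quickSort(entries, byWordAsc) and then quickSort(entries, byFrequencyDesc) on a vector of entries would yield the two orderings in turn. Because the pivot index is drawn at random, no fixed input order can reliably force the O(n^2) case, although that case remains possible.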

About

A C++ program that extracts words from English text, counts their frequencies, and generates two vocabulary lists: one in ascending alphabetical order and one in descending order of frequency. Sorting uses quicksort with a random pivot.