ICB Lecturenote 7

Chapter 7 focuses on working with text and data files in Linux/Unix systems, covering commands for viewing, sorting, and editing files. Key tools discussed include 'cat' for file manipulation, 'sort' for organizing data, and 'uniq' for filtering duplicates. By the end of the chapter, readers will be proficient in handling various file operations efficiently.

Chapter 7

Playing with Text and Data Files

Working with text and data files is an essential part of using Linux/Unix
systems.

These systems store most of their configuration, logs, and data in text files.
Knowing how to view, analyze, edit, and manipulate such files quickly
can save time and improve efficiency.

In this chapter, we will learn:

• How to view and analyze text files in different ways.
• Commands to sort, search, split, and compare file contents.
• Basic and advanced editing tools like Pico and Vim.

By the end of this chapter, you will be able to:

• Open and read files in various formats.
• Filter and organize data.
• Create and edit files efficiently.

7.1.1 A Quick Start: cat


Definition:
cat (short for concatenate) is a Linux command used to view the contents
of files, combine multiple files, and even create new files.
Common Uses:
1. View the content of a file
cat filename.txt

Displays the full content of filename.txt on the terminal.


2. Combine and display multiple files
cat file1.txt file2.txt

Shows the contents of file1.txt followed by file2.txt.


3. Create a new file
cat > newfile.txt

 Type your content.


 Press CTRL+D to save and exit.
4. Append text to an existing file
cat >> existing.txt

 Type new content.


 Press CTRL+D to save without overwriting.
Example:
cat fruits.txt

Output:
Apple
Banana
Mango
Orange
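The four uses of cat above can be tried end-to-end in one short session. A minimal sketch follows; the here-documents stand in for typing text and pressing CTRL+D, so the same commands work in a non-interactive script:

```shell
# Work in a scratch directory so no real files are touched
tmp=$(mktemp -d); cd "$tmp"

# 3. Create a new file (the here-document replaces typing + CTRL+D)
cat > fruits.txt <<'EOF'
Apple
Banana
EOF

# 4. Append two more lines; >> keeps the existing content
cat >> fruits.txt <<'EOF'
Mango
Orange
EOF

# 1. View the content: prints all four lines
cat fruits.txt

# 2. Combine and display two files one after the other
cp fruits.txt copy.txt
cat fruits.txt copy.txt
```

Interactively, `cat > fruits.txt` alone would wait for you to type the lines yourself until CTRL+D signals the end of input.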

7.1.2 Text Sorting


Definition:
sort is a Linux command used to arrange the lines of a text file in alphabetical or
numerical order. It can also reverse the order or sort based on specific fields.
Common Uses:
1. Sort alphabetically
sort filename.txt

Arranges lines in ascending (A–Z) order.


2. Sort in reverse order

sort -r filename.txt

Arranges lines in descending (Z–A) order.


3. Sort numerically

sort -n numbers.txt

Sorts lines as numbers instead of text.


4. Sort by a specific column (useful for tables)

sort -k2 data.txt

Sorts based on the second column.


Example:
cat fruits.txt
Mango
Apple
Orange
Banana

Command:
sort fruits.txt

Output:
Apple
Banana
Mango
Orange

Example 2 – Sorting and Removing Duplicates


File: names.txt
Rahul
Anita
Rahul
Suman
Anita
Command:

sort -u names.txt

Output:

Anita
Rahul
Suman

You can combine options, e.g., sort -nr for numeric sorting in reverse order.
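The sort options above can be compared side by side. This sketch uses small sample files invented for illustration (numbers.txt and data.txt are not from the chapter):

```shell
tmp=$(mktemp -d); cd "$tmp"

printf '10\n2\n33\n4\n' > numbers.txt

sort numbers.txt      # text order:    10, 2, 33, 4  ("10" sorts before "2")
sort -n numbers.txt   # numeric order: 2, 4, 10, 33
sort -nr numbers.txt  # numeric, reversed: 33, 10, 4, 2

# -k2 sorts a table by its second column (from field 2 to end of line)
printf 'mango 3\napple 1\norange 2\n' > data.txt
sort -k2 data.txt     # apple 1 / orange 2 / mango 3
```

The first command shows why -n matters: without it, sort compares lines as text, so "10" comes before "2".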

7.1.3 Extract Unique Lines

Definition:
The uniq command in Linux is used to filter out repeated lines from a file.
However, it only removes consecutive duplicates, so files should be sorted first for
best results.

Common Uses:
1. Remove consecutive duplicates

uniq filename.txt

Displays the file content with consecutive duplicates removed.


2. Count occurrences of each line

uniq -c filename.txt
Shows how many times each line appears.

3. Show only duplicate lines

uniq -d filename.txt

Displays only the lines that appear more than once.


4. Show only unique lines

uniq -u filename.txt

Displays lines that appear exactly once.


Example – Removing Duplicates
File: names.txt
Anita
Anita
Rahul
Rahul
Suman

Command:
uniq names.txt

Output:
Anita
Rahul
Suman

Example – Counting Occurrences


uniq -c names.txt

Output:
2 Anita
2 Rahul
1 Suman

For accurate results with all duplicates removed, combine sort with uniq:
sort names.txt | uniq
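All of the uniq options above can be demonstrated on the unsorted names.txt from the earlier sort example; sorting first is what makes the duplicates adjacent:

```shell
tmp=$(mktemp -d); cd "$tmp"

# names.txt from the example above, deliberately unsorted
printf 'Rahul\nAnita\nRahul\nSuman\nAnita\n' > names.txt

uniq names.txt            # no adjacent duplicates, so nothing is removed
sort names.txt | uniq     # sorted first: Anita / Rahul / Suman
sort names.txt | uniq -c  # counts each line: 2 Anita, 2 Rahul, 1 Suman
sort names.txt | uniq -d  # only repeated lines: Anita, Rahul
sort names.txt | uniq -u  # only lines appearing exactly once: Suman
```

Note that plain `uniq names.txt` on the unsorted file prints all five lines, since no duplicate is next to its twin.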

Search Commands (in Vim)
• /regex → Searches forward for regex.
Example: /apple → moves the cursor to the first "apple".

• ?regex → Searches backward for regex.
Example: ?apple → searches upward for "apple".

• n → Jumps to the next match.
Example: after /apple, press n to go to the next "apple".

• N (Shift + n) → Jumps to the previous match.

Substitute Commands
1. Current line only
• :s/regex/xyz/
→ Replaces the first occurrence of regex in the current line.
Example: :s/apple/orange/
Line: apple is red → becomes → orange is red

• :s/regex/xyz/g
→ Replaces all occurrences of regex in the current line.
Example: :s/apple/orange/g
Line: apple apple pie → becomes → orange orange pie

• :s/regex/xyz/c
→ Asks for confirmation before each replacement in the line.
Example: :s/apple/orange/c
Vim will ask replace with orange? (y/n/a/q/l) for each match.

2. Whole file
• :%s/regex/xyz/g
→ Replaces all occurrences of regex in the whole file.
Example: :%s/apple/orange/g
File becomes:
orange is red
orange is sweet
banana is yellow
orange pie is tasty

• :%s/regex/xyz/gc
→ Same as above, but asks for confirmation before each replacement.

3. Between specific lines
• :x,ys/regex/xyz/g
→ Replaces between line x and line y.
Example: :2,3s/apple/orange/g
Only lines 2 and 3 will be checked:
apple is red
orange is sweet
banana is yellow
apple pie is tasty

Finding Matching Lines of Text using grep, egrep


1. Using grep
grep is used for basic text matching with ordinary regular expressions.

Example 1: Find lines containing "human"
grep "human" genomes.txt

Output:
H. sapiens (human) - 3,400,000,000 bp - 30,000 genes

Example 2: Find lines containing "human", with line numbers
grep -n "human" genomes.txt

Output:
1:H. sapiens (human) - 3,400,000,000 bp - 30,000 genes

2. Using egrep (or grep -E)
egrep allows extended regex patterns (such as alternation with |).

Example: Match multiple patterns (|)
egrep 'bacteria|human' genomes.txt

Output:
H. sapiens (human) - 3,400,000,000 bp - 30,000 genes
E. coli (bacteria) - 4,670,000 bp - 3237 genes
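The grep examples above can be reproduced with a stand-in genomes.txt reconstructed from the sample output (the yeast line is an invented extra, added so the alternation visibly filters something out):

```shell
tmp=$(mktemp -d); cd "$tmp"

# Minimal stand-in for the lecture's genomes.txt
cat > genomes.txt <<'EOF'
H. sapiens (human) - 3,400,000,000 bp - 30,000 genes
E. coli (bacteria) - 4,670,000 bp - 3237 genes
S. cerevisiae (yeast) - 12,100,000 bp - 6294 genes
EOF

grep "human" genomes.txt              # the one line containing "human"
grep -n "human" genomes.txt           # same line, prefixed with its number (1:)
grep -E 'bacteria|human' genomes.txt  # alternation; grep -E is the same as egrep
```

Here `grep -E` matches two of the three lines; the yeast line matches neither pattern and is dropped.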

Text File Comparisons using diff command

Suppose we have two files:

file1.txt
apple
banana
grapes
mango
file2.txt
apple
banana
orange
mango

1. Basic Comparison using diff


diff file1.txt file2.txt
Output:
3c3
< grapes
---
> orange
Meaning:
• Line 3 changed (c)

• In file1.txt it was grapes

• In file2.txt it is orange
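The comparison above can be reproduced directly. One detail worth knowing: diff exits with status 1 when the files differ, so in scripts its output is often captured rather than left to abort a `set -e` run:

```shell
tmp=$(mktemp -d); cd "$tmp"

# The two files from the example above
printf 'apple\nbanana\ngrapes\nmango\n' > file1.txt
printf 'apple\nbanana\norange\nmango\n' > file2.txt

# diff exits 1 when files differ; capture the output and ignore the status
diff file1.txt file2.txt > result.txt || true
cat result.txt   # 3c3 / < grapes / --- / > orange
```

Lines starting with < come from the first file, lines starting with > from the second.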

2. less command
• It’s faster and safer than opening large files in editors like nano or vi.
• It doesn’t modify files — read-only view.
• Allows scrolling, searching, and navigation easily.
Ex: less filename

Key / Command    Action
/word            Search forward for a word
?word            Search backward for a word
n                Repeat the last search in the same direction
N                Repeat the last search in the opposite direction
g                Go to the beginning of the file
G                Go to the end of the file
q                Quit less
• Open multiple files:
less file1.txt file2.txt
Then use:
• :n → Next file

• :p → Previous file

• Search for a word:
Inside less, type / followed by the word, e.g. /apple

3. Counting Characters, Words and Lines

Ex: wc genomes.txt

Output: 6 42 246 genomes.txt
(6 lines, 42 words and 246 characters)

The character count also includes the invisible line-break characters.
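A small sketch of wc and its per-count options, on a sample file invented for illustration:

```shell
tmp=$(mktemp -d); cd "$tmp"

printf 'Apple\nBanana\nMango\n' > fruits.txt

wc fruits.txt      # lines, words, bytes, then the filename
wc -l fruits.txt   # lines only
wc -w fruits.txt   # words only
wc -c fruits.txt   # bytes; for plain ASCII, one byte per character,
                   # and the three newlines are counted too (5+1 + 6+1 + 5+1 = 19)
```

Reading from standard input (`wc -l < fruits.txt`) prints just the number, without the filename.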
4. Splitting Files into Pieces
The split command is used to split a file into a series of smaller files.
split [options] filename [prefix]
filename → the file you want to split
prefix → (optional) name prefix for output files (default: x)

Ex: split -l 2 genomes.txt genomes.

Explanation:

split         The command to split files
-l 2          Split into chunks of 2 lines per file
genomes.txt   The input file to be split
genomes.      The prefix for output files
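Running the split example on a six-line stand-in file (the contents are invented; only the line count matters here) produces files named prefix plus an alphabetic suffix, genomes.aa, genomes.ab, genomes.ac:

```shell
tmp=$(mktemp -d); cd "$tmp"

printf 'line1\nline2\nline3\nline4\nline5\nline6\n' > genomes.txt

split -l 2 genomes.txt genomes.   # 2 lines per output file

ls genomes.??       # genomes.aa genomes.ab genomes.ac
cat genomes.aa      # the first two lines: line1 / line2
```

With 6 input lines and 2 lines per chunk, exactly three output files are created; the original genomes.txt is left unchanged.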

You might also like