Part 4: Mastering Text Processing in Linux with awk, sed, and grep

Introduction

In this part of the series, we look at three command-line tools for text processing in Linux: awk, sed, and grep. They are staples for searching, extracting, and transforming text such as log files, which makes them valuable to system administrators and developers alike.

awk Command

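awk is a versatile command-line tool for pattern scanning and text processing. It reads input one record at a time (by default, one line) and splits each record into fields (by default, on whitespace). The fields are available as $1, $2, and so on, and $0 holds the whole record.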
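
To make the record and field model concrete, here is a minimal sketch. It assumes a hypothetical log layout (date, time, level, event, status); the sample lines are made up for illustration and are not the app.log used in the examples.

```shell
# Hypothetical sample data: three whitespace-separated log lines.
log=$(mktemp)
printf '%s\n' \
  '2024-01-01 08:51:01 INFO mailbox_register ok' \
  '2024-01-01 08:51:02 INFO mailslot_create ok' \
  '2024-01-01 08:51:03 WARN mailbox_register retry' > "$log"

# NR is the current record (line) number, NF the number of fields in it,
# and $NF the last field of the record.
fields=$(awk '{print NR, NF, $NF}' "$log")
echo "$fields"

# -F changes the field separator; splitting on ':' makes the minutes field 2.
minutes=$(awk -F: 'NR == 1 {print $2}' "$log")
echo "$minutes"

rm -f "$log"
```

Each line is one record, so NR counts lines, while NF and the numbered fields depend on the separator in effect.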

Basic Examples

  1. Print the entire file:

     awk '{print}' app.log
    
  2. Print the first and second fields:

     awk '{print $1,$2}' app.log
    
  3. Print the first, second, and fourth fields:

     awk '{print $1,$2,$4}' app.log
    
  4. Print selected fields only from lines that match a pattern:

     awk '/mailbox_register/ {print $1,$2,$4}' app.log
    

Advanced Examples

  1. Count the lines that match a pattern:

     awk '/mailbox_register/ {count++} END {print count}' app.log
    

    Output: 13

  2. Display a custom message with the count:

     awk '/mailbox_register/ {count++} END {print "The Count of mailbox_register is: " count}' app.log
    

    Output: The Count of mailbox_register is: 13

  3. Filter records by time range:

     awk '$2 >= "08:51:00" && $2 <="08:51:04" {print $2,$3,$4}' app.log
    
  4. Print a range of lines (here, lines 2 through 9) with their line numbers:

     awk 'NR >=2 && NR <10 {print NR, $2}' app.log
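
Two details of the advanced examples are worth trying on a small file. The sketch below assumes a hypothetical four-line log with the time in field 2; the data is invented for illustration. It shows that the time filter is a plain string comparison, and that adding 0 to an unset counter prints 0 rather than an empty line.

```shell
# Hypothetical sample data with the time in field 2.
log=$(mktemp)
printf '%s\n' \
  '2024-01-01 08:50:59 INFO mailbox_register ok' \
  '2024-01-01 08:51:01 INFO mailslot_create ok' \
  '2024-01-01 08:51:03 WARN mailbox_register retry' \
  '2024-01-01 08:51:05 INFO mailbox_register ok' > "$log"

# String comparison matches chronological order here because HH:MM:SS
# is zero-padded and fixed width.
in_range=$(awk '$2 >= "08:51:00" && $2 <= "08:51:04" {print $2, $3}' "$log")
echo "$in_range"

# count+0 forces a numeric result, so a pattern with no matches prints 0
# instead of an empty line.
missing=$(awk '/no_such_pattern/ {count++} END {print count+0}' "$log")
echo "$missing"

rm -f "$log"
```

The string comparison would not be reliable for timestamps that are not zero-padded, such as 8:51:3.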
    

sed Command

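sed is a stream editor for filtering and transforming text. It reads input line by line, applies the given commands, and writes the result to standard output. The input file is left unchanged unless an in-place option such as GNU sed's -i is used.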

Basic Examples

  1. Print lines matching a pattern:

     sed -n '/mailslot_create/p' app.log
    
  2. Replace text in the output (the file itself is not modified):

     sed 's/mailslot_create/CREATE/g' app.log
    
  3. Print the line numbers of lines matching a pattern:

     sed -n -e '/mailbox_register/=' app.log
    
  4. Combine multiple operations:

     sed -n -e '/mailbox_register/=' -e '/INFO/p' app.log
    
  5. Replace text within a range of lines:

     sed '1,10 s/INFO/LOG/g' app.log
    
  6. Replace text, print only lines 1 to 10, and stop reading at line 11 (-n suppresses the automatic printing, so each line appears once):

     sed -n '1,10 s/INFO/LOG/g; 1,10p; 11q' app.log
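
The following sketch shows how sed treats its input and output. It assumes a hypothetical three-line log, and the temporary file names are illustrative. The substitution result is redirected to a second file so that the original can be compared with the edited copy.

```shell
# Hypothetical sample data.
log=$(mktemp)
out=$(mktemp)
printf '%s\n' \
  '08:51:01 INFO mailslot_create a' \
  '08:51:02 INFO mailbox_register b' \
  '08:51:03 INFO mailbox_register c' > "$log"

# Substitute on lines 1-2 only and write the result to a second file.
sed '1,2 s/INFO/LOG/g' "$log" > "$out"

# The original still has INFO on all three lines; the copy has LOG on two.
orig_info=$(grep -c INFO "$log")
new_log=$(grep -c LOG "$out")

# With -n, only what p prints is shown, so the matching line appears once.
matches=$(sed -n '/mailslot_create/p' "$log" | wc -l | tr -d ' ')
echo "$orig_info $new_log $matches"

rm -f "$log" "$out"
```

Redirecting to a new file keeps the original intact, which is a safe habit while experimenting with substitutions.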
    

grep Command

grep is used for searching plain-text data for lines that match a regular expression.

Basic Examples

  1. Search for a pattern:

     grep INFO app.log
    
  2. Case-insensitive search:

     grep -i info app.log
    
  3. Count the lines that match a pattern (case-insensitive):

     grep -i -c info app.log
    
  4. Count matching lines using awk (case-sensitive, unlike the grep -i example above):

     awk '/INFO/ {count++} END {print count}' app.log
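
Note that grep -c counts matching lines, not individual matches. The sketch below uses a hypothetical three-line file to show the difference. It relies on the -o option, which GNU and BSD grep both provide although POSIX does not specify it.

```shell
# Hypothetical sample data: the first line contains INFO twice.
log=$(mktemp)
printf '%s\n' \
  'INFO start INFO again' \
  'info lowercase' \
  'WARN none' > "$log"

lines=$(grep -c INFO "$log")        # lines containing INFO
lines_i=$(grep -i -c info "$log")   # lines containing info in any case
# -o prints each match on its own line, so wc -l counts every occurrence.
occurrences=$(grep -o INFO "$log" | wc -l | tr -d ' ')
echo "$lines $lines_i $occurrences"

rm -f "$log"
```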
    

Combining Commands

Piping ps into grep and awk is a common way to find and inspect processes:

  1. List all processes:

     ps aux
    
  2. Keep only lines containing a string, such as a user or command name (the grep process itself may also appear in the output):

     ps aux | grep ubuntu
    
  3. Extract the process ID, which is the second field of ps aux output:

     ps aux | grep ubuntu | awk '{print $2}'
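
Because grep matches the string anywhere on the line, the pipeline can pick up unrelated processes, including grep itself. The sketch below works on a hypothetical, abbreviated copy of ps aux output stored as text (real output has more columns), so the result is reproducible. It compares the substring match with an exact match on the USER column in awk.

```shell
# Hypothetical, abbreviated `ps aux` output saved as text for illustration.
sample='USER PID %CPU %MEM COMMAND
root 1 0.0 0.1 /sbin/init
ubuntu 1042 0.1 0.4 sshd: ubuntu@pts/0
ubuntu 1107 0.0 0.2 -bash
root 1350 0.0 0.1 grep ubuntu'

# Substring match: also catches the grep process itself (PID 1350).
loose=$(printf '%s\n' "$sample" | grep ubuntu | awk '{print $2}')

# Exact match on the USER column: only processes owned by ubuntu.
strict=$(printf '%s\n' "$sample" | awk '$1 == "ubuntu" {print $2}')

echo "loose:  $(echo $loose)"
echo "strict: $(echo $strict)"
```

On a live system the same idea would be ps aux | awk '$1 == "ubuntu" {print $2}', which also removes the need for a separate grep.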
    

Conclusion

Together, awk, sed, and grep cover most day-to-day text-processing needs on Linux: grep finds lines, sed edits streams, and awk works with fields and records. Practicing them on real log files and exploring their options is the quickest way to become fluent.


Practical Task

Create a directory, generate a log file, and apply the above commands to practice and solidify your understanding.

mkdir -p logs
printf '%s\n' "08:51:01 INFO :main: Starting process" "08:51:02 INFO :process: Running" "08:51:03 WARN :main: Low memory" > logs/app.log

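Experiment with the awk, sed, and grep commands on your generated app.log to see the results firsthand. In this sample file the timestamp is field 1 and the log level is field 2, and patterns such as mailbox_register do not occur, so adjust the field numbers and patterns in the earlier examples accordingly.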
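
As a starting point, the sketch below runs one command from each tool against the same three sample lines. It creates the file under a temporary directory, which is an assumption made so that the script cleans up after itself, and it uses field 1 for the time because of this file's layout.

```shell
# Build the practice log in a temporary working directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/logs"
log="$workdir/logs/app.log"
printf '%s\n' \
  '08:51:01 INFO :main: Starting process' \
  '08:51:02 INFO :process: Running' \
  '08:51:03 WARN :main: Low memory' > "$log"

# awk: in this file the timestamp is field 1, so filter on $1.
times=$(awk '$1 >= "08:51:02" {print $1}' "$log")

# grep: count the lines that contain INFO.
info_count=$(grep -c INFO "$log")

# sed: print only the lines where a substitution was made.
replaced=$(sed -n 's/WARN/WARNING/p' "$log")

echo "$times"
echo "$info_count"
echo "$replaced"

rm -rf "$workdir"
```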