Bash: Read File Line by Line

Problem

Let’s say we have a file named in.txt with the following content:

$ cat in.txt
    # comment
*

and we want to read it line by line and do something with each line.

The following could be a solution:

#!/bin/bash

for line in $(cat in.txt); do
    echo "line:$line"
done

Let’s see if it works:

$ ls
in.txt  null.txt  script.sh

$ ./script.sh
line:#
line:comment
line:in.txt
line:null.txt
line:script.sh

Hm, the * character was replaced by the filenames in the current directory, and the comment line was split in two. Why did that happen? First, some background on three shell expansions.

Command Substitution

Command substitution allows the output of a command to replace the command itself. It occurs when a command is enclosed as $(command) or `command`.

Word Splitting

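The shell scans the results of parameter expansion, command substitution, and arithmetic expansion that did not occur within double quotes for word splitting. The results are split at the characters in the IFS variable, which by default are space, tab, and newline.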

Filename Expansion

After word splitting, unless the -f option has been set, Bash scans each word for the characters *, ?, and [. If one of these characters appears, then the word is regarded as a pattern, and replaced with an alphabetically sorted list of filenames matching the pattern.

So, Bash treated the * character from in.txt as a pattern and replaced it with the names of all non-hidden files in the current directory. What if we put the $(cat in.txt) in double quotes?

$ cat script.sh
#!/bin/bash

for line in "$(cat in.txt)"; do
    echo "line:$line"
done

$ ./script.sh
line:    # comment
*

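A little better: the * is no longer expanded. But the double quotes also suppress word splitting, so the entire file content becomes a single word and the loop body runs only once. Neither attempt is usable. So, let’s remove the double quotes and instead set the -f option that the Bash man page mentions.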

$ cat script.sh
#!/bin/bash

set -f
for line in $(cat in.txt); do
    echo "line:$line"
done

$ ./script.sh
line:#
line:comment
line:*

Better now, but the # comment line is still split across two iterations, because word splitting breaks it at the space between # and comment. The leading spaces are lost for the same reason.
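The three attempts so far can be compared side by side. The following is a minimal sketch rather than part of the original script: it assumes a scratch directory created with mktemp, two hypothetical files a.txt and b.txt, and a variable named content standing in for $(cat in.txt).

```shell
#!/bin/bash
# Work in a throwaway directory so the glob has a known set of files to match.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch a.txt b.txt

content='x y *'   # hypothetical stand-in for $(cat in.txt)

# 1. Unquoted: split on whitespace, then * expands to the matching filenames.
unquoted=''
for word in $content; do unquoted="${unquoted}[${word}]"; done

# 2. Quoted: no splitting and no globbing, so the loop body runs exactly once.
quoted=''
for word in "$content"; do quoted="${quoted}[${word}]"; done

# 3. set -f: splitting still happens, but * stays literal.
set -f
noglob=''
for word in $content; do noglob="${noglob}[${word}]"; done
set +f

printf 'unquoted: %s\n' "$unquoted"   # unquoted: [x][y][a.txt][b.txt]
printf 'quoted:   %s\n' "$quoted"     # quoted:   [x y *]
printf 'noglob:   %s\n' "$noglob"     # noglob:   [x][y][*]

# Leave the scratch directory and remove it.
cd "$OLDPWD" && rm -rf "$tmpdir"
```

Each bracket pair marks one loop iteration, which makes it easy to see that none of the three variants yields one iteration per line.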

Solution 1

$ cat script.sh
#!/bin/bash

while read line; do
    echo "line:$line"
done < in.txt

$ ./script.sh
line:# comment
line:*

Nice! The wildcard * is no longer treated as a pattern, and the comment line is no longer split, but the leading spaces have been removed. This happens because read strips leading and trailing IFS whitespace (spaces and tabs by default) from the line. Fortunately, we can prevent this by setting IFS to an empty value for the read command, as the final solution below does.
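The trimming behaviour is easy to check in isolation. The sketch below assumes a temporary file created with mktemp and an illustrative line with four leading and two trailing spaces; the variable names trimmed and kept are made up for this example.

```shell
#!/bin/bash
# Illustrative input: four leading spaces and two trailing spaces.
tmpfile=$(mktemp)
printf '%s\n' '    # comment  ' > "$tmpfile"

# Default IFS: leading and trailing whitespace is stripped.
# (-r is unrelated to trimming; it only disables backslash processing.)
read -r trimmed < "$tmpfile"

# Empty IFS for this one command: the line is stored unchanged.
IFS= read -r kept < "$tmpfile"

printf '[%s]\n' "$trimmed"   # [# comment]
printf '[%s]\n' "$kept"      # [    # comment  ]

rm -f "$tmpfile"
```

Because IFS= is written as a prefix to read, the assignment applies only to that command and the shell's own IFS is left untouched.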

Final Solution

$ cat script.sh
#!/bin/bash

# IFS= stops `read` from trimming leading and trailing whitespace from each line
# -r stops `read` from treating backslashes as escape characters
# printf is safer than echo, which could interpret a string like -n as an option

while IFS= read -r line; do
    printf 'line:%s\n' "$line"
done < in.txt

$ ./script.sh
line:    # comment
line:*
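The effect of -r only shows up when the input contains backslashes, which in.txt does not. As a small sketch, assuming a temporary file and a hypothetical Windows-style path as input:

```shell
#!/bin/bash
# Illustrative input containing backslashes (not taken from in.txt).
tmpfile=$(mktemp)
printf '%s\n' 'C:\temp\new' > "$tmpfile"

# Without -r, each backslash is removed and the next character is kept literally.
IFS= read line_plain < "$tmpfile"

# With -r, backslashes are ordinary characters.
IFS= read -r line_raw < "$tmpfile"

printf '[%s]\n' "$line_plain"   # [C:tempnew]
printf '[%s]\n' "$line_raw"     # [C:\temp\new]

rm -f "$tmpfile"
```

Without -r, a backslash at the very end of a line would additionally act as a line continuation, so two input lines would be joined into one.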

Note

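The read command does more than read a line: after reading it, read splits the line into words using IFS. The first word is assigned to the first variable, the second word to the second variable, and so on. If there are more words than variables, the remaining words and the separators between them are assigned to the last variable. This is why a single variable combined with an empty IFS receives the whole line unchanged.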
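A short sketch of this splitting, using a made-up record and the hypothetical variable names name, age, and rest:

```shell
#!/bin/bash
# One input line with five words, read into three variables.
read -r name age rest <<EOF
alice 42 likes long walks
EOF

printf 'name=%s\n' "$name"   # name=alice
printf 'age=%s\n' "$age"     # age=42
printf 'rest=%s\n' "$rest"   # rest=likes long walks
```

The last variable absorbs everything that is left over, which makes this form handy for simple whitespace-separated records.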
