Shell Redirection
One of the key pieces of the Unix philosophy is the idea of making small, simple tools that do one thing extremely well and can easily connect with other such tools. The facility for this in Unix is redirection: the ability to manipulate the flow of input and output of shell commands.
Terminals
In Unix, almost everything is represented as a file. This was a design choice that enables any program that can deal with files to deal with any part of the system. In the bad old days, your interaction with the computer was through a dumb terminal (a screen and keyboard) attached via a serial cable. The dumb terminal is the epitome of the thin client: it can only talk to the server, it can't do any data processing on its own. The computer would have many serial lines coming into it, feeding these dumb terminals. Each of these was represented in /dev as a tty (teletype) file. Writing to this file sent data to that terminal's screen; reading from it pulled data from that terminal's keyboard. Terminal interaction is still pretty much the same today, except instead of dealing with a real serial terminal (tty) you're dealing with pseudo terminals (pts) over a network connection, such as an ssh session.
Streams
There's a networking protocol/technology thingie in the Unix world called STREAMS. This is not what we're talking about. In Unix, all character I/O (/dev/ devices of type 'c') is a stream: a flow of characters/bytes going from one place to another. Output from a program to the terminal is a stream, input from the terminal to a program is a stream, and many other 'flows' of data are streams.
Standard Streams
There are three standard streams that all programs get access to by default:
standard input (stdin) which by default goes from the terminal to the current foreground program
standard output (stdout) which by default goes from the current foreground program to the terminal
standard error (stderr) which by default also goes from the program to the terminal, but as a separate flow from stdout
These streams can be thought of as pipes with data flowing through them from one place to another. Each of them says 'by default' because you can reroute these flows with redirection.
Redirection
Redirection is essentially a trick in the Unix system. Programs ask for data to be sent to or read from the terminal, and it just seems to happen; the operating system is doing all the dirty work to make sure that it happens. Because the operating system is doing all the work, it can do it however it likes. It can discard all the data, and for the most part the program will be none the wiser. With redirection, the operating system gets the data to be read "from the terminal" from another source, or takes the data meant for the terminal and sends it someplace else. stdout and stderr are kept separate so that if you redirect regular program output, error messages will still be visible or can be routed separately.
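A small sketch of that rerouting (/nonexistent is just a path assumed not to exist, and /tmp/stdin_demo.txt is an example file):

```shell
# The operating system can reroute a stream without the program knowing:
# here ls's stdout is discarded, yet the error about /nonexistent (a path
# assumed not to exist) still reaches the terminal on stderr.
ls /etc /nonexistent > /dev/null
# The reverse also works: stdin can come from somewhere other than the
# keyboard, e.g. a file prepared in advance.
echo "canned input" > /tmp/stdin_demo.txt
wc -c < /tmp/stdin_demo.txt
```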
There are essentially two types of redirection: redirection with files and redirection with other programs. As you may guess, redirection with files involves reading input from a file rather than the terminal, or writing data to a file rather than the terminal. Redirection with programs involves making the output of one program the input of another program.
File Redirection
There are four symbols used in file redirection: >, >>, <, and <<.
command > file - Takes the stdout from command and dumps all the data into file, after it truncates the file, erasing all contents
command >> file - Takes the stdout from command and adds it to the end of file. This erases no data from the file
command < file - Takes the contents of file and feeds it to command as stdin. This could be used to save the commands that you would normally give to an interactive program to a file and "replay" them
command << TERMINATOR - Reads from the current stdin and feeds to command as stdin until TERMINATOR is found on a line by itself. This is an obscure redirection that is seldom used outside of scripts. This construction is referred to as a "HERE Document" because often, people would use HERE as the terminator to say the input ends HERE
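The four forms in action (the file /tmp/redir_demo.txt and the terminator HERE are just example names):

```shell
# >  truncate-and-write: the file now contains exactly one line
echo "first"  > /tmp/redir_demo.txt
# >> append: a second line is added, nothing is erased
echo "second" >> /tmp/redir_demo.txt
# <  feed the file to a command as its stdin
wc -l < /tmp/redir_demo.txt    # counts the 2 lines
# << here document: stdin for cat ends at the TERMINATOR line
cat << HERE
line one
line two
HERE
```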
Another item commonly associated with redirection that doesn't quite follow the same pattern is putting a command in backticks (grave accents), such as `id -u`. The output of the command in backticks directly replaces the text between the backticks; this is known as command substitution.
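For example, since id -u prints your numeric user ID:

```shell
# The backticked command runs first and its output replaces the backticks.
echo "You are UID `id -u`"
# Modern shells also support the $(...) form, which nests more cleanly.
echo "You are UID $(id -u)"
```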
Redirection With Pipes
Redirecting the output of one command to be the input of another is known as piping and is accomplished by the pipe (|) character. Placing the pipe between commands is all that is required to have the second command process the output of the first. Some common examples are:
ls -l | grep root
ps awfux | grep Mar
dmesg | tail
Redirecting stderr
When redirecting using >, >>, and |, you're grabbing stdout and not stderr. This is intentional: it lets you see errors while keeping them separate from the desired information. If you want to grab stderr instead, you use 2> and 2>>. The number comes from the fact that internally, open files are tracked by number (file descriptors), and stdin, stdout, and stderr are given 0, 1, and 2 respectively. There is no 2| operator; to pipe stderr you first merge it into stdout with 2>&1, as in command 2>&1 | other. Likewise, to send both streams to the same file, you can do command > file 2>&1.
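A sketch, with example filenames, of splitting and merging the two streams:

```shell
# Regular output to one file, errors to another (/nonexistent is assumed
# missing so ls has something to complain about).
ls /etc /nonexistent > out.txt 2> err.txt
# Merge both streams into one file: redirect stdout first, then point
# stderr (fd 2) at wherever stdout (fd 1) now goes.
ls /etc /nonexistent > all.log 2>&1
cat err.txt   # the error message was captured, not lost
```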
Essential Filters
In the Unix world, a command that reads a stream, manipulates it, and spits it back out is called a filter. Certain filters are so essential that you'll find yourself using them over and over again when investigating a complex issue. Below are some sample filters to help slice and dice your way through the flood of information that Unix commands can spit out.
more/less
More is your basic 'paging' program. It determines your screen size and stops output from flying by so that you get a chance to read it all. Spacebar moves you down a page; Enter moves you down a line. Most Linux distros today provide 'less' and make 'more' point to it. The less program does the same as 'more', and more: it lets you page up and down, scroll back through the stream, and search in the same way you do with vi.
grep
Possibly the most commonly used filter ever. grep (short for 'global regular expression print', after the ed command g/re/p) only outputs the lines that match given criteria in the form of a regular expression. With the -v option you can make grep print every line that doesn't match the supplied criteria. Most versions of grep allow you to provide multiple expressions as possible matches with -e, and you can pipe from one grep to another to provide additional filtering.
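Some sample invocations, using /etc/passwd as a handy input file:

```shell
# Lines in /etc/passwd mentioning root
grep root /etc/passwd
# Every line that does NOT mention nologin
grep -v nologin /etc/passwd
# Multiple alternative patterns with -e
grep -e root -e daemon /etc/passwd
```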
awk
Where grep is the strainer of the command line, awk is the paring knife. It can do an incredible amount of manipulation on a stream, but the most common use is to isolate a certain column of output with awk's print function. For example, to print the 3rd column of 'ps' output, do:
ps | awk '{print $3}'
You can specify the field/column separator with the -F option. To print the last two bytes of the local machine's MAC address you could do:
ifconfig | grep HW | awk '{print $5}' | awk -F: '{print $5$6}'
sed
The sed program is the stream editor. It allows you to do a text search and replace with regular expressions on a stream. In its simplest invocation you can use it to remove something:
cat | sed 's/myname//g'
The 's/myname//g' portion is a substitution. The format for this is 's/from/to/options'. In this case, 'myname' is changed to an empty string, which removes it. By default, sed only acts on the first instance of a match it sees on a line; the 'g' option means 'global', instructing sed to perform the substitution on every match on the line. Both sed and awk are extremely powerful, and getting into many of their uses is beyond the scope of this page.
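A quick way to see the default behavior versus 'g':

```shell
# Without 'g', only the first match on each line is replaced...
echo "red red red" | sed 's/red/blue/'
# prints: blue red red
# ...with 'g' (global), every match on the line is replaced.
echo "red red red" | sed 's/red/blue/g'
# prints: blue blue blue
```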
sort
Sort does exactly what the name implies: it sorts all the lines of input it receives. By default it sorts the conventional alphanumeric way, comparing characters rather than actual numeric values (so 2000 would appear before 30, because '2' sorts before '3'). It has options to make it smart with respect to numbers (-n) and other criteria, and you can make it sort by a specific column with the -k option. For example, to have a list of connections in netstat grouped so that connections from the same address appear together, do:
netstat -n | sort -k 5
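To see the difference the -n option makes:

```shell
# Lexicographic sort: 2000 comes before 30 because '2' < '3'
printf '30\n2000\n4\n' | sort
# prints: 2000, 30, 4 (one per line)
# Numeric sort with -n orders by actual value
printf '30\n2000\n4\n' | sort -n
# prints: 4, 30, 2000 (one per line)
```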
uniq
The uniq filter is almost self-explanatory. It removes duplicate lines from a stream, leaving only the first instance of each. Note that it only considers adjacent lines, so the input is usually sorted first, as in sort file | uniq.
tail/head
The tail and head filters are very frequently used. They allow you to see the first or last few lines of a file or stream. You can specify the number of lines with -n NUM or just -NUM. Conversely, to see everything but the last NUM lines you can do head -n -NUM (with GNU head), and to see everything from line NUM onward you can do tail -n +NUM. An incredibly useful feature of tail is that you can have it 'follow' the end of a file with the -f option, so that you see new output as it is added. This is incredibly useful for watching logs.
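A few examples against a throwaway file (the paths are just examples):

```shell
# Build a small example file with five lines
printf '1\n2\n3\n4\n5\n' > /tmp/lines.txt
head -n 3 /tmp/lines.txt    # first three lines: 1 2 3
tail -n 2 /tmp/lines.txt    # last two lines: 4 5
tail -n +4 /tmp/lines.txt   # everything from line 4 on: 4 5
# tail -f /var/log/syslog   # follow a growing log (Ctrl-C to stop)
```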
xargs
The xargs command isn't actually a filter, since it consumes a stream but doesn't necessarily spit anything out. You pipe a stream to xargs and then give it a command. The stream is read and its contents are passed to the command as arguments. For example, let's say we want to kill every process belonging to a certain user and we don't know about the pkill command:
ps aux | grep someuser | awk '{print $2}' | xargs kill
Here someuser stands in for the user's name; $2 is the PID column of ps aux output, which is what kill needs.
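A harmless way to watch the mechanics (no processes involved), with xargs gathering the stream's contents into an argument list:

```shell
# xargs collects the words from stdin and hands them all to echo
# as a single argument list.
printf 'one\ntwo\nthree\n' | xargs echo
# prints: one two three
```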