Unix environment

From genomewiki

Revision as of 20:55, 4 May 2010

Working in the UNIX environment

Editor

The most important tool will most likely be your editor. It doesn't matter which editor you use, but whatever it is, learn it well. vi and emacs are the most common editors in the Unix environment. Your choice of editor becomes critical when you use your shell command line in its editing mode, because the shell can emulate either vi or emacs key bindings. There are very good tutorials on the internet for your editor, and genomewiki has a VI quick start command listing. See also: Editor War.

Shell

There are two shells in common use on Unix: bash and tcsh. Next to your editor, your shell command line is the most critical element of your efficiency in Unix. Turn on the command line editing feature so that your command line recognizes your favorite editor's commands, and learn how to use it.
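A minimal sketch, assuming bash: the set builtin switches command line editing between vi and emacs key bindings, and the line would normally go in ~/.bashrc so that it applies to every interactive shell. In tcsh the equivalents are bindkey -v and bindkey -e, typically placed in ~/.tcshrc.

```shell
#!/usr/bin/env bash
# Switch bash command line editing to vi key bindings
# (Esc enters command mode, i returns to insert mode).
# Put this line in ~/.bashrc to make it permanent.
set -o vi

# For emacs-style editing, which is bash's default, use this instead:
#   set -o emacs

# List the vi and emacs options with their current on/off state.
set -o | grep -E '^(vi|emacs)[[:space:]]'
```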

Understand what stdin, stdout and stderr are and how to redirect them in compound shell commands. There are very good bash and tcsh tutorials on the internet. With command line editing you will never again compose a long command line, find a typo in it, and have to type the whole thing in again: use your command line editor to fix the typo quickly and repeat the corrected command.
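The sketch below is in bash syntax (tcsh uses different operators, for example >& to send both streams to one file). It uses ls, which writes the names it finds to stdout and its complaint about a missing file to stderr, to show where each stream ends up. The file names are made up for illustration.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so none of your own files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch present.txt

# stdout (file descriptor 1) goes to one file, stderr (fd 2) to another.
ls present.txt missing.txt > stdout.txt 2> stderr.txt

# Send both streams to the same file. Order matters: 2>&1 must come
# after > so that fd 2 is pointed at the file fd 1 already goes to.
ls present.txt missing.txt > both.txt 2>&1

# A pipe carries only stdout to the next command; stderr would still
# reach the terminal, so here it is discarded with 2>/dev/null.
ls present.txt missing.txt 2>/dev/null | wc -l

# stdin (fd 0) can be read from a file with <.
wc -l < stdout.txt
```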

Also, verify that you can easily cut and paste between your shell command line and other applications on your desktop. How this works depends on your desktop environment, and each desktop may use a different mechanism.

Regular Expressions

You will use regular expressions in your editor, your shell and just about every other command, so you need to know how to use them. A minimal familiarity with the basics gets you pretty far; keep a reference handy for the odd cases where you need the more extensive operations.
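A small sketch of the basics, using grep -E and sed -E on a made-up list of sequence names: ^ and $ anchor a match to the start and end of the line, [0-9] is a character class, + means one or more of the preceding item, and (a|b) is alternation. Editors such as vi and emacs use the same ideas in slightly different dialects.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
# Hypothetical chromosome and contig names, one per line.
printf '%s\n' chr1 chr22 chrX chrUn_gl000220 scaffold_7 > "$workdir/names.txt"

# "chr" followed only by one or more digits: chr1, chr22
grep -E '^chr[0-9]+$' "$workdir/names.txt"

# Also allow the sex and mitochondrial chromosomes: chr1, chr22, chrX
grep -E '^chr([0-9]+|X|Y|M)$' "$workdir/names.txt"

# Anything NOT starting with chr (-v inverts the match): scaffold_7
grep -v -E '^chr' "$workdir/names.txt"

# sed uses the same notation to edit text; this strips the chr prefix:
# 1, 22, X, Un_gl000220, scaffold_7
sed -E 's/^chr//' "$workdir/names.txt"
```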

grep

The grep command is used to find lines in text files, or in streaming output from previous pipeline commands. Given a pattern, grep prints every line that matches it. By default the pattern is a basic regular expression, so a plain string without special characters simply matches itself. The output can be inverted with the -v argument, printing only the lines that do not match.

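Example: select only the bed format lines from a custom track file so they can be used with hgLoadBed, by removing any lines that begin with track or browser. This is an example of the inverse function with the -v argument. The -E argument is also needed, because in grep's default basic regular expressions | is an ordinary character and the track and browser lines would not be removed:

grep -v -E "^(track|browser)" customTrack.txt > file.bed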

Alternatively, if it is known that all chromosome names start with "chr", select only those lines:

grep "^chr" customTrack.txt > file.bed

Both examples assume that only one track is defined in customTrack.txt; with several tracks, the bed lines of all of them would end up together in file.bed.
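The sketch below builds a tiny made-up customTrack.txt (one browser line, one track line and two bed lines) and runs both selections on it. It also shows the pitfall with the inverse match: without -E, the | in the pattern is taken literally, so nothing is removed.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A hypothetical one-track custom track file: two header lines, two bed lines.
printf '%s\n' \
  'browser position chr1:1-1000' \
  'track name=demo description="demo track"' > customTrack.txt
printf 'chr1\t100\t200\tfeature1\nchr1\t300\t400\tfeature2\n' >> customTrack.txt

# Inverse match with an extended regex: drop the track and browser lines.
grep -v -E "^(track|browser)" customTrack.txt > file.bed

# Positive match: keep only the lines that start with chr.
grep "^chr" customTrack.txt > chrOnly.bed

# For this file both approaches yield the same two bed lines.
cmp file.bed chrOnly.bed && echo "identical"

# Pitfall: in a basic regex "|" is literal, so no line matches and
# -v -c counts all 4 lines as kept.
grep -v -c "^track|^browser" customTrack.txt
```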

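By default grep interprets the pattern as a basic regular expression, in which characters such as |, + and ? have no special meaning. For extended regular expressions, use grep with the -E argument, or the egrep command, which is equivalent to grep -E (recent GNU grep releases warn that egrep is obsolescent). To match a literal string with no regular expression interpretation at all, use grep -F.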
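To compare the modes side by side, the sketch below searches a made-up two-line file for the text a.b. In a regular expression . matches any single character, so only -F restricts the search to a literal dot.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
printf '%s\n' 'a.b' 'axb' > "$workdir/demo.txt"

# Default basic regex: "." matches any character, so both lines match (2).
grep -c 'a.b' "$workdir/demo.txt"

# -F: fixed string, the dot is literal, so only "a.b" matches (1).
grep -c -F 'a.b' "$workdir/demo.txt"

# -E: extended regex, where ( ) and | work without backslashes.
# An escaped literal dot or an x between a and b: both lines match (2).
grep -c -E 'a(\.|x)b' "$workdir/demo.txt"
```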

To efficiently scan an entire directory hierarchy of files, use the following find | xargs grep pipeline:

find . -type f -print0 | xargs --null grep "<your string>"

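The -print0 argument to find, combined with the --null argument to xargs, separates the file names with NUL characters, so the pipeline works properly even if the file names include blanks, quotes or newlines. -0 is the short form of --null and is also accepted by BSD xargs. If you know none of your file names contain such characters, you can omit -print0 and --null.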
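The sketch below creates a small directory tree whose names contain blanks and runs the search both ways. It uses -0, the short form of --null that both GNU and BSD xargs accept, and grep -l, which prints only the names of matching files. The directory and file names are made up for illustration.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
mkdir -p "$workdir/sub dir"
printf 'chr1\t100\t200\n' > "$workdir/sub dir/has blank.bed"
printf 'no match here\n' > "$workdir/plain.txt"
cd "$workdir" || exit 1

# NUL-separated names survive the blanks: finds "./sub dir/has blank.bed".
found=$(find . -type f -print0 | xargs -0 grep -l "chr1")
printf '%s\n' "$found"

# Without -print0 and -0, xargs splits on blanks and grep is handed the
# nonexistent names "./sub", "dir/has" and "blank.bed", so nothing is found.
broken=$(find . -type f | xargs grep -l "chr1" 2>/dev/null)
printf 'broken pipeline found: [%s]\n' "$broken"
```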

See also

Unix-Haters Handbook