10 Practical Grep Command Examples for Developers

Sylvain Leroux

Brief: The grep command is used to find patterns in files. This tutorial shows some of the most common grep command examples that would be specifically beneficial for software developers.

Recently, I started working with Asciidoctor.js and on the Asciidoctor.js-pug and Asciidoctor-templates.js projects. It is not always easy to be immediately effective when you dig into a codebase of several thousand lines for the first time. But my secret weapon for finding my way through so many lines of code is the grep tool.

I am going to share with you how to use the grep command in Linux, with examples.

Using grep commands in Linux

If you look into the man page, you will see this short description for the grep tool: “print lines matching a pattern.” However, don’t be fooled by such a humble definition: grep is one of the most useful tools in the Unix toolbox and there are countless occasions to use it as soon as you work with text files.

It is always better to have real-world examples to learn how things work. So, I will use the Asciidoctor.js source tree to illustrate some of the grep capabilities. You can download that source tree from GitHub, and if you want, you may even check out the same changeset I used when writing this article. That will ensure you obtain results perfectly identical to those described in the rest of this article:

git clone https://github.com/asciidoctor/asciidoctor.js
cd asciidoctor.js
git checkout v1.5.6-rc.1

1. Find all occurrences of a string (basic usage)

Asciidoctor.js supports the Nashorn JavaScript engine for the Java platform. I do not know Nashorn, so I took that opportunity to learn more about it by exploring the parts of the project referencing that JavaScript engine.

As a starting point, I checked if there were some settings related to Nashorn in the package.json file describing the project dependencies:

sh$ grep nashorn package.json
    "test": "node npm/test/builder.js && node npm/test/unsupported-features.js && node npm/test/jasmine-browser.js && node npm/test/jasmine-browser-min.js && node npm/test/jasmine-node.js && node npm/test/jasmine-webpack.js && npm run test:karmaBrowserify && npm run test:karmaRequirejs && node npm/test/nashorn.js",

Yes, apparently there were some Nashorn-specific tests. So, let’s investigate that a little bit more.

2. Case insensitive search in a file set

Now, I want to have a closer look at the files from the ./npm/test/ directory explicitly mentioning Nashorn. A case-insensitive search (-i option) is probably better here since I need to find references to both nashorn and Nashorn (or any other combination of upper- and lower-case characters):

sh$ grep -i nashorn npm/test/*.js
npm/test/nashorn.js:const nashornModule = require('../module/nashorn');
npm/test/nashorn.js:log.task('Nashorn');
npm/test/nashorn.js:nashornModule.nashornRun('jdk1.8.0');

Indeed, case insensitivity was useful here. Otherwise, I would have missed the log.task('Nashorn') statement. No doubt I should examine the ../module/nashorn file required on the first line in greater detail later.
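
To see exactly what the -i option changes, here is a minimal sketch that does not need the Asciidoctor.js tree. It assumes a throwaway three-line file that merely mimics the matches shown above (the temporary directory and its content are made up for the illustration), and it uses the -c option, which counts matching lines and is covered later in this article:

```shell
# Made-up stand-in for npm/test/nashorn.js, created in a temporary directory
demo=$(mktemp -d)
printf '%s\n' \
  "const nashornModule = require('../module/nashorn');" \
  "log.task('Nashorn');" \
  "nashornModule.nashornRun('jdk1.8.0');" > "$demo/nashorn.js"

# Case-sensitive search: only lines containing lowercase "nashorn" are counted
sensitive=$(grep -c nashorn "$demo/nashorn.js")

# Case-insensitive search: the log.task('Nashorn') line is counted too
insensitive=$(grep -ci nashorn "$demo/nashorn.js")

echo "case-sensitive: $sensitive, case-insensitive: $insensitive"
rm -r "$demo"
```

The line that only the case-insensitive search finds is the one where the name is spelled with a capital N.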

3. Find non-matching files

By the way, are there any non-Nashorn specific files in the npm/test/ directory? To answer that question, we can use the “print non-matching files” option of grep (-L option):

sh$ grep -iL nashorn npm/test/*
npm/test/builder.js
npm/test/jasmine-browser-min.js
npm/test/jasmine-browser.js
npm/test/jasmine-node.js
npm/test/jasmine-webpack.js
npm/test/unsupported-features.js

Notice how with the -L option the output of grep has changed to display only filenames. So, none of the files above contain the string “nashorn” (regardless of the case). That does not mean they are not somehow related to that technology, but at least, the letters “n-a-s-h-o-r-n” are not present.

4. Finding patterns in hidden files and recursively in sub-directories

The last two commands used a shell glob pattern to pass the list of files to examine to the grep command. However, this has some inherent limitations: the star (*) will not match hidden files. Nor will it match files contained in sub-directories, if there are any.

A solution would be to combine grep with the find command instead of relying on a shell glob pattern:

# This is not efficient as it will spawn a new grep process for each file
sh$ find npm/test/ -type f -exec grep -iL nashorn \{} \;
# This may have issues with filenames containing space-like characters
sh$ grep -iL nashorn $(find npm/test/ -type f)

As I mentioned in the comments in the code block above, each of these solutions has drawbacks. Concerning filenames containing space-like characters, I'll let you investigate the -print0 option of the find command which, combined with xargs -0 (or with the grep -z option if you filter the file list itself), can mitigate that issue. Don’t hesitate to use the comment section at the end of this article to share your ideas on that topic!

Nevertheless, a better solution would use the “recursive” (-r) option of grep. With that option, you give the root of your search tree (the starting directory) on the command line instead of the explicit list of filenames to examine. With the -r option, grep will examine all files in the search directory, including hidden ones, and then it will recursively descend into any sub-directory:
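
As a possible starting point for that investigation, here is a small sketch, under the assumption that your find and xargs support the widely available (but non-POSIX) -print0 and -0 options. The directory and file names are invented for the demonstration. The second variant relies on the `{} +` form of -exec, which passes many filenames to a single grep process:

```shell
# Throwaway tree whose directory and file names contain spaces
demo=$(mktemp -d)
mkdir "$demo/sub dir"
echo 'Nashorn engine' > "$demo/sub dir/with space.js"
echo 'nothing here'   > "$demo/sub dir/other file.js"

# NUL-separated names: spaces in filenames are no longer word separators
via_xargs=$(find "$demo" -type f -print0 | xargs -0 grep -iL nashorn | sort)

# Alternative: let find batch the filenames itself, without any shell splitting
via_exec=$(find "$demo" -type f -exec grep -iL nashorn {} + | sort)

echo "$via_xargs"
echo "$via_exec"
rm -r "$demo"
```

Both commands should list only the file that does not mention Nashorn, with its full name intact.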

sh$ grep -irL nashorn npm/test/
npm/test/builder.js
npm/test/jasmine-browser-min.js
npm/test/jasmine-browser.js
npm/test/jasmine-node.js
npm/test/jasmine-webpack.js
npm/test/unsupported-features.js
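
The Asciidoctor.js test directory has neither hidden files nor sub-directories, so both approaches give the same list there. The following sketch uses a made-up directory layout (all names are hypothetical) to show where the glob and the -r option would differ:

```shell
# Throwaway tree with one hidden file and one nested file
demo=$(mktemp -d)
mkdir "$demo/sub"
echo 'uses Nashorn' > "$demo/match.js"
echo 'no engine'    > "$demo/visible.js"
echo 'no engine'    > "$demo/.hidden.js"
echo 'no engine'    > "$demo/sub/nested.js"

# Shell glob: the star skips .hidden.js and never looks inside sub/
with_glob=$(cd "$demo" && grep -iL nashorn ./*.js | LC_ALL=C sort)

# Recursive search: hidden and nested files are examined too
with_r=$(cd "$demo" && grep -irL nashorn . | LC_ALL=C sort)

echo "glob:      $with_glob"
echo "recursive: $with_r"
rm -r "$demo"
```

The output is piped through sort only to make the order predictable, since grep reports files in the order it encounters them.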

Actually, with that option, I could also start my exploration one level above to see if there are non-npm tests that target Nashorn too:

sh$ grep -irL nashorn npm/

I'll let you test that command by yourself to see its outcome; but as a hint, I can say you should see many more files in the output!

5. Filtering files by their name (using regular expressions)

So, there seem to be some Nashorn-specific tests in that project. Since Nashorn is Java, another question that could be raised would be “are there any Java source files in the project explicitly mentioning Nashorn?”.

Depending on the version of grep you use, there are at least two solutions to answer that question. The first one is to use grep to find all files containing the pattern “nashorn”, then pipe the output of that first command to a second grep instance filtering out non-Java source files:

sh$ grep -ir nashorn ./ | grep "^[^:]*\.java"
./spec/nashorn/AsciidoctorConvertWithNashorn.java:public class AsciidoctorConvertWithNashorn {
./spec/nashorn/AsciidoctorConvertWithNashorn.java:    ScriptEngine engine = engineManager.getEngineByName("nashorn");
./spec/nashorn/AsciidoctorConvertWithNashorn.java:    engine.eval(new FileReader("./spec/nashorn/asciidoctor-convert.js"));
./spec/nashorn/BasicJavascriptWithNashorn.java:public class BasicJavascriptWithNashorn {
./spec/nashorn/BasicJavascriptWithNashorn.java:    ScriptEngine engine = engineManager.getEngineByName("nashorn");
./spec/nashorn/BasicJavascriptWithNashorn.java:    engine.eval(new FileReader("./spec/nashorn/basic.js"));

The first half of the command should be understandable by now. But what about that “^[^:]*\.java” part?

Unless you specify the -F option, grep assumes the search pattern is a regular expression. That means, in addition to plain characters that will match verbatim, you have access to a set of metacharacters to describe more complex patterns. The pattern I used above will only match:

  • ^ the start of the line

  • [^:]* followed by a sequence of any characters except a colon

  • \. followed by a dot (the dot has a special meaning in regex, so I had to protect it with a backslash to express I want a literal match)

  • java and followed by the four letters “java”

In practice, since grep uses a colon to separate the filename from the matching line, I keep only lines having .java in the filename section. It is worth mentioning that it would also match .javascript filenames. This is something I'll let you try solving by yourself if you want.
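
If you want a hint for that exercise, one possible approach (a sketch, not the only answer) is to require the colon separator right after the extension. The three result lines below are invented to look like recursive grep output, and the sketch assumes no filename contains a colon itself:

```shell
# Invented lines in the "filename:matching text" format of a recursive grep
results=$(printf '%s\n' \
  './spec/Foo.java:engine = getEngineByName("nashorn");' \
  './doc/notes.javascript:nashorn notes' \
  './src/run.js:load("nashorn.java.txt");')

# Pattern from the article: also keeps the .javascript file
loose=$(printf '%s\n' "$results" | grep '^[^:]*\.java')

# Stricter pattern: ".java" must be immediately followed by the separating colon
strict=$(printf '%s\n' "$results" | grep '^[^:]*\.java:')

echo "loose:"
echo "$loose"
echo "strict:"
echo "$strict"
```

Notice that neither pattern keeps the run.js line, even though ".java" appears in its content: the [^:]* part cannot cross the first colon, so only the filename section is inspected.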

6. Filtering files by their name using grep

Regular expressions are extremely powerful. However, in that particular case, they seem overkill. Not to mention that, with the above solution, we spend time examining all files in search of the “nashorn” pattern, with most of the results being discarded by the second step of the pipeline.

If you are using the GNU version of grep, which is likely if you are using Linux, you have another solution with the --include option. This instructs grep to search only in files whose names match the given glob pattern:

sh$ grep -ir nashorn ./ --include='*.java'
./spec/nashorn/AsciidoctorConvertWithNashorn.java:public class AsciidoctorConvertWithNashorn {
./spec/nashorn/AsciidoctorConvertWithNashorn.java:    ScriptEngine engine = engineManager.getEngineByName("nashorn");
./spec/nashorn/AsciidoctorConvertWithNashorn.java:    engine.eval(new FileReader("./spec/nashorn/asciidoctor-convert.js"));
./spec/nashorn/BasicJavascriptWithNashorn.java:public class BasicJavascriptWithNashorn {
./spec/nashorn/BasicJavascriptWithNashorn.java:    ScriptEngine engine = engineManager.getEngineByName("nashorn");
./spec/nashorn/BasicJavascriptWithNashorn.java:    engine.eval(new FileReader("./spec/nashorn/basic.js"));
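
Since --include takes a glob rather than a regular expression, it does not have the .javascript issue of the previous solution. Here is a small sketch on an invented tree (the file names are hypothetical), assuming a grep that supports the --include option, as GNU grep does:

```shell
# Throwaway tree with three files that all mention nashorn
demo=$(mktemp -d)
mkdir "$demo/spec"
echo 'getEngineByName("nashorn");' > "$demo/spec/Run.java"
echo 'require("nashorn");'         > "$demo/spec/run.js"
echo 'nashorn notes'               > "$demo/spec/notes.javascript"

# Only files whose name matches the glob *.java are opened at all
java_only=$(cd "$demo" && grep -ir --include='*.java' nashorn . | sort)

echo "$java_only"
rm -r "$demo"
```

The glob *.java must match the whole file name, so notes.javascript is skipped without being read.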

7. Finding words

The interesting thing about the Asciidoctor.js project is that it is a multi-language project. At its core, Asciidoctor is written in Ruby, so, to be usable in the JavaScript world, it has to be “transpiled” using Opal, a Ruby to JavaScript source-to-source compiler. Another technology I did not know about before.

So, after having examined the Nashorn specifics, I assigned myself the task of better understanding the Opal API. As the first step in that quest, I searched for all mentions of the Opal global object in the JavaScript files of the project. It could appear in assignments (Opal =), member access (Opal.) or maybe even in other contexts. A regular expression would do the trick. However, once again, grep has a more lightweight solution for that common use case. Using the -w option, it will match only words, that is, patterns preceded and followed by a non-word character. Here, a non-word character is either the beginning of the line, the end of the line, or any character that is neither a letter, nor a digit, nor an underscore:

sh$ grep -irw --include='*.js' Opal .
...
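
To make the word rule concrete, here is a minimal sketch on four invented lines (they are not taken from the Asciidoctor.js sources):

```shell
# Invented sample lines: two use Opal as a word, two embed it in a longer name
sample=$(printf '%s\n' \
  'Opal = {};' \
  'Opal.load("core");' \
  'const opalBuilder = OpalBuilder.create();' \
  'var my_Opal = 1;')

# -w keeps a line only if "Opal" is delimited by non-word characters on both sides
whole_words=$(printf '%s\n' "$sample" | grep -iw Opal)

echo "$whole_words"
```

Only the first two lines are kept. In opalBuilder the match is followed by a letter, and in my_Opal it is preceded by an underscore, which counts as a word character.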

8. Coloring the output

I did not copy the output of the previous command since there are many matches. When the output is dense like that, you may wish to add a little bit of color to ease understanding. If this is not already configured by default on your system, you can activate that feature using the GNU --color option:

sh$ grep -irw --color=auto --include='*.js' Opal .
...

You should obtain the same long result as before, but this time the search string should appear in color if it was not already the case.

9. Counting matching lines or matching files

I mentioned twice that the output of the previous commands was very long. How long exactly?

sh$ grep -irw --include='*.js' Opal . | wc -l
86

That means we have a total of 86 matching lines in all the examined files. However, how many different files are matching? With the -l option, you can limit the grep output to the names of the matching files instead of displaying the matching lines. So that simple change will tell how many files are matching:

sh$ grep -irwl --include='*.js' Opal . | wc -l
20

If that reminds you of the -L option, no surprise: as is relatively common, lowercase/uppercase are used to distinguish complementary options. -l displays matching filenames. -L displays non-matching filenames. For another example, I'll let you check the manual for the -h/-H options.

Let’s close that parenthesis and go back to our results: 86 matching lines, 20 matching files. However, how are the matching lines distributed among the matching files? We can find out using the -c option of grep, which counts the number of matching lines per examined file (including files with zero matches):
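
Here is a short sketch of that -l/-L pair on two invented files (the names are hypothetical), showing that together the two options split the examined files into two disjoint lists:

```shell
# Throwaway directory: one file mentions Opal, the other does not
demo=$(mktemp -d)
echo 'Opal.load("core");' > "$demo/with.js"
echo 'console.log("hi");' > "$demo/without.js"

# -l: names of files with at least one matching line
matching=$(cd "$demo" && grep -rlw Opal . | sort)

# -L: names of files without any matching line
non_matching=$(cd "$demo" && grep -rLw Opal . | sort)

echo "matching:     $matching"
echo "non-matching: $non_matching"
rm -r "$demo"
```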

sh$ grep -irwc --include='*.js' Opal .
...

Often, that output needs some post-processing since it displays its results in the order in which the files were examined, and it also includes files without any match, something that usually does not interest us. The latter is quite easy to solve:

sh$ grep -irwc --include='*.js' Opal . | grep -v ':0$'

As for ordering things, you may add the sort command at the end of the pipeline:

sh$ grep -irwc --include='*.js' Opal . | grep -v ':0$' | sort -t: -k2n
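
If you do not have the source tree at hand, the following sketch runs the same pipeline on three invented files, so you can see what each stage contributes:

```shell
# Throwaway directory: files with three, one and zero lines mentioning Opal
demo=$(mktemp -d)
printf '%s\n' 'Opal.a();' 'Opal.b();' 'Opal.c();' > "$demo/many.js"
printf '%s\n' 'Opal.a();'                         > "$demo/one.js"
printf '%s\n' 'console.log("none");'              > "$demo/zero.js"

# -c prints "filename:count" for every examined file,
# grep -v ':0$' drops the files without any match,
# sort orders the rest numerically on the field after the colon
counts=$(cd "$demo" && grep -irwc --include='*.js' Opal . | grep -v ':0$' | sort -t: -k2n)

echo "$counts"
rm -r "$demo"
```

The file with a single match comes first and the file with three matches comes last, while zero.js does not appear at all.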

I'll let you check the sort command manual for the exact meaning of the options I used. Don’t forget to share your findings using the comment section below!

10. Finding the difference between two matching sets

If you remember, a few commands ago, I searched for the word “Opal”. However, if I search in the same file set for all occurrences of the string “Opal”, I obtain about twenty more answers:

sh$ grep -irw --include='*.js' Opal . | wc -l
86
sh$ grep -ir --include='*.js' Opal . | wc -l
105

Finding the difference between those two sets would be interesting. So, what are the lines containing the four letters “opal” in a row, but where those four letters do not form an entire word?

That question is not so easy to answer, because the same line can contain both the word Opal and some larger word containing those four letters. But as a first approximation, you may use this pipeline:

sh$ grep -ir --include='*.js' Opal . | grep -ivw Opal
./npm/examples.js:  const opalBuilder = OpalBuilder.create();
./npm/examples.js:  opalBuilder.appendPaths('build/asciidoctor/lib');
./npm/examples.js:  opalBuilder.appendPaths('lib');
...
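
To illustrate the limitation of that first approximation, here is a sketch on three invented lines. It also shows one possible refinement, which is my own assumption rather than part of the original pipeline: extracting each identifier containing “opal” with the -o option (supported by GNU and BSD grep, but not required by POSIX), then discarding the identifiers that are exactly “opal”:

```shell
# Invented sample: the third line has both the word Opal and a larger word
demo=$(mktemp -d)
printf '%s\n' \
  'Opal.load("core");' \
  'const opalBuilder = OpalBuilder.create();' \
  'Opal.use(opalBuilder);' > "$demo/sample.js"

# First approximation: drops every line containing the whole word,
# so the opalBuilder on the third line is lost
approx=$(grep -i opal "$demo/sample.js" | grep -ivw opal)

# Refinement: print each identifier containing "opal" on its own line (-o),
# then remove those that are exactly "opal" (-x matches the whole line)
tokens=$(grep -oiE '[[:alnum:]_]*opal[[:alnum:]_]*' "$demo/sample.js" | grep -vix opal)

echo "approximation:"
echo "$approx"
echo "identifiers:"
echo "$tokens"
rm -r "$demo"
```

The refinement reports identifiers rather than whole lines, so it answers a slightly different question, but it no longer misses the larger words hidden on lines that also contain the word Opal.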

Apparently, my next stop would be to investigate the opalBuilder object, but that will be for another day.

The last word

Of course, you will not understand a project organization, even less the code architecture, by just issuing a couple of grep commands! However, I find that command indispensable for identifying landmarks and starting points when exploring a new codebase. So, I hope this article helped you to understand the power of the grep command and that you will add it to your tool chest. No doubt you will not regret it!


