My thoughts as an enterprise Java developer.

Thursday, February 18, 2021

Command line tool Introduction

 

Goal

Introduce some command line tools and show how they can work together to solve common problems.

Notes

Many of the commands included with a *nix system are useful for research. Their options and behavior can differ between systems; the examples below were run on Mac OS X.

Read the links to learn more about the tools – they can do much more than is presented here.

There are more efficient ways to achieve the same results as some of these examples, but the examples were chosen as a simple introduction that focuses on how the tools can be used together.

The sample files are based on https://www.w3schools.com/xml/cd_catalog.xml.

 

Finding things in files

grep searches files for lines that match a pattern. It uses regular expressions.

Example: Find XML values of "EU"

grep ">EU<" *.xml

cd_catalog.xml: <COUNTRY>EU</COUNTRY>

cd_catalog.xml: <COUNTRY>EU</COUNTRY>

cd_catalog.xml: <COUNTRY>EU</COUNTRY>

cd_catalog.xml: <COUNTRY>EU</COUNTRY>

cd_catalog.xml: <COUNTRY>EU</COUNTRY>

cd_catalog2.xml: <COUNTRY>EU</COUNTRY>
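
To try the examples without downloading anything, a small sandbox can be built first. This is only a sketch: the two files below are hypothetical, trimmed-down stand-ins for the real catalog (the file names and CD entries are made up for illustration), and the scratch directory name is arbitrary.

```shell
# Create a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/cli-intro.XXXXXX")
cd "$workdir" || exit 1

# Hypothetical sample data: two tiny catalogs, not the real w3schools file.
cat > sample1.xml <<'EOF'
<CATALOG>
  <CD>
    <TITLE>First</TITLE>
    <COUNTRY>EU</COUNTRY>
    <YEAR>1985</YEAR>
  </CD>
  <CD>
    <TITLE>Second</TITLE>
    <COUNTRY>USA</COUNTRY>
    <YEAR>1988</YEAR>
  </CD>
</CATALOG>
EOF

cat > sample2.xml <<'EOF'
<CATALOG>
  <CD>
    <TITLE>Third</TITLE>
    <COUNTRY>EU</COUNTRY>
    <YEAR>1990</YEAR>
  </CD>
</CATALOG>
EOF

# More than one file is searched, so grep prefixes each match with its file name.
grep ">EU<" *.xml
```

This should print one EU line per sample file, each prefixed with the file name, in the same shape as the output above.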

Linking multiple commands together

A command line pipe passes the output of one command to another command as its input. This lets you link many simple programs together into complex solutions, in the same way that 26 letters can be put together to make millions of words.

Example: Find XML values of "EU" that are also in a COUNTRY element

grep ">EU<" *.xml | grep "<COUNTRY>"

cd_catalog.xml: <COUNTRY>EU</COUNTRY>

cd_catalog.xml: <COUNTRY>EU</COUNTRY>

cd_catalog.xml: <COUNTRY>EU</COUNTRY>

cd_catalog.xml: <COUNTRY>EU</COUNTRY>

cd_catalog.xml: <COUNTRY>EU</COUNTRY>

cd_catalog2.xml: <COUNTRY>EU</COUNTRY>
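
In the sample files every "EU" value happens to sit in a COUNTRY element, so the second grep changes nothing. A minimal sketch with made-up input (the REGION line is hypothetical and not part of the sample files) makes the effect of the second filter visible:

```shell
# Hypothetical input: the REGION line is invented to show the second filter at work.
lines='<COUNTRY>EU</COUNTRY>
<REGION>EU</REGION>
<COUNTRY>USA</COUNTRY>'

# One filter: both EU lines match.
printf '%s\n' "$lines" | grep ">EU<"

# Two filters linked by a pipe: only the line that passes both survives.
printf '%s\n' "$lines" | grep ">EU<" | grep "<COUNTRY>"
```

The first command prints the COUNTRY and REGION lines for EU; the second prints only the COUNTRY line.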

 

Check your command by looking at the first few results

head shows only the first lines of its input (10 by default).

Example: Find XML values of "EU" and look at the first few results.

grep ">EU<" *.xml | head

cd_catalog.xml: <COUNTRY>EU</COUNTRY>

cd_catalog.xml: <COUNTRY>EU</COUNTRY>

cd_catalog.xml: <COUNTRY>EU</COUNTRY>

cd_catalog.xml: <COUNTRY>EU</COUNTRY>

cd_catalog.xml: <COUNTRY>EU</COUNTRY>

cd_catalog2.xml: <COUNTRY>EU</COUNTRY>
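
The six matches above fit within the default limit, so nothing is cut off. A small sketch with longer input shows the limit and the -n option, assuming seq is available to generate numbers (it is on Mac OS X and most Linux systems):

```shell
# seq prints the numbers 1 to 20, one per line.

# Default: only the first 10 lines are kept.
seq 1 20 | head

# -n chooses how many lines to keep.
seq 1 20 | head -n 3
```

The first command stops at 10; the second prints only 1, 2 and 3.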

Make changes to results

sed edits each line of its input. The examples here use its search and replace command, which takes a regular expression.

Example: Find all file names with an XML value of "EU" (one line per match, so names can repeat).

grep ">EU<" *.xml | sed "s/:.*//"

cd_catalog.xml

cd_catalog.xml

cd_catalog.xml

cd_catalog.xml

cd_catalog.xml

cd_catalog2.xml
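
To see what the substitution does, it can help to run it on a single line. A minimal sketch, where the second command is an illustrative variation and not part of the examples in this post:

```shell
line='cd_catalog.xml:    <COUNTRY>EU</COUNTRY>'

# s/PATTERN/REPLACEMENT/ : the pattern ":.*" matches the first colon and
# everything after it; the empty replacement deletes that match.
printf '%s\n' "$line" | sed "s/:.*//"

# Two substitutions separated by ";" run one after the other on each line:
# first drop everything up to the opening tag, then drop the closing tag.
printf '%s\n' "$line" | sed "s/^.*<COUNTRY>//;s/<.*//"
```

The first command prints cd_catalog.xml and the second prints EU.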

Run a command against many files

xargs reads its input and passes it as arguments to another command, which lets you run that command against many files.

Example: Find all file names with an XML value of "EU" and find all YEAR XML nodes in those files (different lines from the EU).

grep ">EU<" *.xml | sed "s/:.*//" | xargs grep "<YEAR>" | head

cd_catalog.xml:    <YEAR>1985</YEAR>

cd_catalog.xml:    <YEAR>1988</YEAR>

cd_catalog.xml:    <YEAR>1982</YEAR>

cd_catalog.xml:    <YEAR>1990</YEAR>

cd_catalog.xml:    <YEAR>1997</YEAR>

cd_catalog.xml:    <YEAR>1998</YEAR>

cd_catalog.xml:    <YEAR>1973</YEAR>

cd_catalog.xml:    <YEAR>1990</YEAR>

cd_catalog.xml:    <YEAR>1996</YEAR>

cd_catalog.xml:    <YEAR>1987</YEAR>
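
A rough way to see what xargs builds is to let it run echo instead of grep. This sketch uses made-up file names; note that xargs splits its input on whitespace, so it assumes names without spaces:

```shell
# xargs appends the words it reads to the command.
# This effectively runs: echo one.xml two.xml three.xml
printf '%s\n' one.xml two.xml three.xml | xargs echo

# Repeated names are passed along as-is, so the command sees them more than once.
printf '%s\n' a.xml a.xml b.xml | xargs echo

# Removing the repeats first hands each name to the command only once.
printf '%s\n' a.xml a.xml b.xml | sort | uniq | xargs echo
```

This is why the same YEAR line can show up several times in the pipeline above: the file name reaches grep once per "EU" match.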

Sort the results

sort allows you to sort your results.

Example: Find all file names with an XML value of "EU" and find all years in those files.

grep ">EU<" *.xml | sed "s/:.*//" | xargs grep "<YEAR>" | sed "s/:.*\">/:/;s/<\/.*//" | sort | head

cd_catalog.xml: <YEAR>1968

cd_catalog.xml: <YEAR>1968

cd_catalog.xml: <YEAR>1968

cd_catalog.xml: <YEAR>1968

cd_catalog.xml: <YEAR>1968

cd_catalog.xml: <YEAR>1971

cd_catalog.xml: <YEAR>1971

cd_catalog.xml: <YEAR>1971

cd_catalog.xml: <YEAR>1971

cd_catalog.xml: <YEAR>1971
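
By default sort compares lines as text. That works for four-digit years, but a short sketch with made-up numbers of different lengths shows where it matters:

```shell
# Default sort compares character by character, so "10" and "100" come before "9".
printf '%s\n' 9 10 100 | sort

# -n compares the values as numbers instead.
printf '%s\n' 9 10 100 | sort -n
```

The first command prints 10, 100, 9; the second prints 9, 10, 100.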

Remove duplicates

uniq removes duplicate results, but only when they are right next to each other. That is why it is usually run after sort.

Example: Find all file names with an XML value of "EU" and find all unique sorted years in those files.

grep ">EU<" *.xml | sed "s/:.*//" | xargs grep "<YEAR>" | sed "s/:.*\">/:/;s/<\/.*//" | sort | uniq | head

cd_catalog.xml: <YEAR>1968

cd_catalog.xml: <YEAR>1971

cd_catalog.xml: <YEAR>1973

cd_catalog.xml: <YEAR>1982

cd_catalog.xml: <YEAR>1983

cd_catalog.xml: <YEAR>1985

cd_catalog.xml: <YEAR>1987

cd_catalog.xml: <YEAR>1988

cd_catalog.xml: <YEAR>1990

cd_catalog.xml: <YEAR>1991
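
The "right next to each other" rule is easy to check with a few made-up lines:

```shell
# Without sorting, the two "1990" lines are not adjacent, so uniq keeps both.
printf '%s\n' 1990 1985 1990 | uniq

# After sorting they sit next to each other and collapse into one line.
printf '%s\n' 1990 1985 1990 | sort | uniq
```

The first command still prints three lines; the second prints 1985 and 1990.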

Count duplicates

uniq can also count duplicates with the -c option.

Example: Find all file names with an XML value of "EU" and count unique years per file.

grep ">EU<" *.xml | sed "s/:.*//" | xargs grep "<YEAR>" | sed "s/:.*\">/:/;s/<\/.*//" | sort | uniq | sed "s/:.*//" | uniq -c

15 cd_catalog.xml

3 cd_catalog2.xml
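
The counts printed by uniq -c are right-aligned with leading spaces, and the amount of padding differs between implementations. A small sketch with made-up file names strips the padding with sed so the result looks the same everywhere:

```shell
# uniq -c prefixes each line with how many times it occurred in a row.
# sed removes the leading spaces that pad the count.
printf '%s\n' a.xml a.xml a.xml b.xml | uniq -c | sed "s/^ *//"
```

This prints "3 a.xml" and "1 b.xml". The padding is also why a pattern that matches a count needs to allow for leading spaces, for example "^ *3 ".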

Copy intermediate results to a file

tee stores the results of a command in a file while still passing them on to the next command.

Example: Find all file names with an XML value of "EU", count unique years per file, and store in yearCounts.txt.

grep ">EU<" *.xml | sed "s/:.*//" | xargs grep "<YEAR>" | sed "s/:.*\">/:/;s/<\/.*//" | sort | uniq | sed "s/:.*//" | uniq -c | tee yearCounts.txt

15 cd_catalog.xml

3 cd_catalog2.xml
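
A minimal sketch of tee on its own, using a hypothetical scratch file so nothing in the current directory is overwritten:

```shell
# Hypothetical scratch file for the saved copy.
out=$(mktemp "${TMPDIR:-/tmp}/tee-demo.XXXXXX")

# tee writes its input to the file AND passes it on to the next command.
printf '%s\n' a b c | tee "$out" | grep "b"

# The file holds all three lines, not just the one grep kept.
cat "$out"
```

The pipeline prints only b, while the file contains a, b and c. The saved copy reflects the data at the point where tee sits in the pipeline, not the final result.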

Count the number of results

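wc counts the lines, words, and bytes of its input. With the -l option it prints only the number of lines, which here is the number of results.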

Example: Find all file names with an XML value of "EU" and count how many files have exactly 3 unique years.

grep ">EU<" *.xml | sed "s/:.*//" | xargs grep "<YEAR>" | sed "s/:.*\">/:/;s/<\/.*//" | sort | uniq | sed "s/:.*//" | uniq -c | tee yearCounts.txt | grep "^ *3 " | sed "s/^ *3 //;s/\.xml$//" | tee 3years.txt | wc -l

1
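
As one instance of the "more efficient ways" mentioned in the notes, grep can count matching lines itself. A short sketch with made-up input compares the two approaches:

```shell
# Made-up input with two EU lines.
lines='<COUNTRY>EU</COUNTRY>
<COUNTRY>USA</COUNTRY>
<COUNTRY>EU</COUNTRY>'

# wc -l counts the lines it receives. Some systems pad the number with
# spaces, so sed strips them here.
printf '%s\n' "$lines" | grep ">EU<" | wc -l | sed "s/ //g"

# grep -c counts matching lines itself, so the pipe to wc is optional.
printf '%s\n' "$lines" | grep -c ">EU<"
```

Both commands print 2. wc -l remains the more general tool, since it can count the output of any pipeline.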
