Goal
Introduce some command line tools and show how they can work together to solve common problems.
Notes
Many commands included with a *nix system are useful for research. The commands can differ between systems; the examples below work on Mac OS X.
Read the links to learn more about the tools – they can do much more than is presented here.
There are more efficient ways to achieve the same results as some of these examples, but the examples were chosen as a simple introduction that focuses on how the tools can be used together.
Sample files based on https://www.w3schools.com/xml/cd_catalog.xml.
Finding things in files
grep searches files for lines that match a pattern. It uses regular expressions.
Example: Find XML values of "EU"
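A minimal sketch of this search, assuming it matches the first stage of the full pipeline at the end of this post. The two sample files and their contents are made up for illustration.

```shell
# Hypothetical sample data: two tiny catalog files in a scratch directory.
cd "$(mktemp -d)" || exit
printf '<COUNTRY>EU</COUNTRY>\n<YEAR>1990</YEAR>\n' > a.xml
printf '<COUNTRY>USA</COUNTRY>\n<YEAR>1985</YEAR>\n' > b.xml

# Print every line that contains >EU<, prefixed with the file it came from.
result=$(grep ">EU<" *.xml)
echo "$result"
```

Including the > and < in the pattern limits the match to a whole element value, so a longer value that merely contains the letters EU is not matched.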
Linking multiple commands together
A command line pipe (written |) allows you to pass the results of one command to another command. This allows you to link many simple programs together into complex solutions, in the same way that 26 letters can be put together to make millions of words.
Example: Find XML values of "EU" that also are in the N1 segment
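The original command for this example is not shown, and the made-up sample data below has no N1 segment. As a stand-in, this sketch assumes the second filter is a COUNTRY tag; only the shape of the pipe (one grep feeding another) is the point.

```shell
# Hypothetical sample data: EU appears as a COUNTRY in one file
# and as a COMPANY in the other.
cd "$(mktemp -d)" || exit
printf '<COUNTRY>EU</COUNTRY>\n<COMPANY>Polydor</COMPANY>\n' > a.xml
printf '<COUNTRY>UK</COUNTRY>\n<COMPANY>EU</COMPANY>\n' > b.xml

# The first grep finds every EU value; the second keeps only the COUNTRY lines.
# (COUNTRY is an assumed stand-in for the N1 segment named in the example.)
result=$(grep ">EU<" *.xml | grep "<COUNTRY>")
echo "$result"
```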
Check your command by looking at the first few results
head shows only the first 10 lines of the results by default.
Example: Find XML values of "EU" and look at the first few results.
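A small sketch of this check, assuming twelve made-up files that all contain an EU value so there is something to cut off.

```shell
# Hypothetical sample data: twelve files, each with one matching line.
cd "$(mktemp -d)" || exit
for i in 1 2 3 4 5 6 7 8 9 10 11 12; do
  printf '<COUNTRY>EU</COUNTRY>\n' > "cd$i.xml"
done

# All 12 files match, but head passes along only the first 10 lines.
result=$(grep ">EU<" *.xml | head)
echo "$result"

# -n asks for a different number of lines.
first3=$(grep ">EU<" *.xml | head -n 3)
echo "$first3"
```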
Make changes to results
sed allows you to change lines. I use its search-and-replace feature, which uses regular expressions.
Example: Find all file names with an XML value of "EU".
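A sketch of this step using the same grep and sed stages that open the full pipeline at the end of this post. The three sample files are made up for illustration.

```shell
# Hypothetical sample data: two of the three files contain an EU value.
cd "$(mktemp -d)" || exit
printf '<COUNTRY>EU</COUNTRY>\n' > a.xml
printf '<COUNTRY>USA</COUNTRY>\n' > b.xml
printf '<COUNTRY>EU</COUNTRY>\n' > c.xml

# grep prints file:line; s/:.*// deletes everything from the first colon
# to the end of the line, leaving only the file name.
result=$(grep ">EU<" *.xml | sed "s/:.*//")
echo "$result"
```

grep -l would print the matching file names directly, but the sed version shows how search and replace can reshape any command's output.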
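Run a command against many files
xargs takes a list of items from its input, such as file names, and passes them as arguments to another command. This allows you to run a command against many files.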
Example: Find all file names with an XML value of "EU" and find all YEAR XML nodes in those files (different lines from the EU).
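A sketch of this step, following the first three stages of the full pipeline at the end of this post. The sample files are made up for illustration.

```shell
# Hypothetical sample data: a.xml and c.xml contain EU, b.xml does not.
cd "$(mktemp -d)" || exit
printf '<COUNTRY>EU</COUNTRY>\n<YEAR>1990</YEAR>\n' > a.xml
printf '<COUNTRY>USA</COUNTRY>\n<YEAR>1985</YEAR>\n' > b.xml
printf '<COUNTRY>EU</COUNTRY>\n<YEAR>1991</YEAR>\n' > c.xml

# The first two stages list the files that contain EU.
# xargs then runs: grep "<YEAR>" a.xml c.xml
result=$(grep ">EU<" *.xml | sed "s/:.*//" | xargs grep "<YEAR>")
echo "$result"
```

xargs splits its input on whitespace, so file names that contain spaces need extra care.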
Sort the results
sort allows you to sort your results.
Example: Find all file names with an XML value of "EU" and find all years in those files.
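A sketch of this step on made-up sample files. The sed expression that trims the tags here is a simplified assumption for tags without attributes; it is not the exact expression used in the full pipeline at the end of this post.

```shell
# Hypothetical sample data: a.xml repeats a year, c.xml lists years out of order.
cd "$(mktemp -d)" || exit
printf '<COUNTRY>EU</COUNTRY>\n<YEAR>1990</YEAR>\n<YEAR>1990</YEAR>\n' > a.xml
printf '<COUNTRY>USA</COUNTRY>\n<YEAR>1985</YEAR>\n' > b.xml
printf '<COUNTRY>EU</COUNTRY>\n<YEAR>1991</YEAR>\n<YEAR>1988</YEAR>\n' > c.xml

# Find the YEAR lines in files that contain EU, trim each one to file:year
# (assumed simplification), then sort the lines.
result=$(grep ">EU<" *.xml | sed "s/:.*//" | xargs grep "<YEAR>" |
  sed "s/<YEAR>//;s/<\/.*//" | sort)
echo "$result"
```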
Remove duplicates
uniq allows you to remove duplicate results (that are right next to each other). This is often used after sort.
Example: Find all file names with an XML value of "EU" and find all unique sorted years in those files.
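A sketch of why sort comes first. To keep it short, the file:year lines that the earlier stages would produce are replaced by made-up stand-in data.

```shell
# Stand-in for the file:year lines from the earlier stages (hypothetical data).
lines=$(printf 'c.xml:1991\na.xml:1990\nc.xml:1988\na.xml:1990\n')

# Without sort, the two a.xml:1990 lines are not adjacent, so uniq keeps both.
unsorted=$(printf '%s\n' "$lines" | uniq)

# After sort the repeats sit next to each other and uniq drops the extra copy.
result=$(printf '%s\n' "$lines" | sort | uniq)
echo "$result"
```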
Count duplicates
uniq can also count duplicates when given the -c option.
Example: Find all file names with an XML value of "EU" and count years per file.
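A sketch of the counting step, using the same sed and uniq -c stages as the full pipeline at the end of this post. The input lines are made-up stand-ins for the sorted, de-duplicated output of the earlier stages.

```shell
# Stand-in for the sorted, de-duplicated file:year lines (hypothetical data).
lines=$(printf 'a.xml:1990\nc.xml:1988\nc.xml:1991\n')

# Strip the year so only file names remain, then count adjacent repeats.
# Each output line is a count followed by a file name.
result=$(printf '%s\n' "$lines" | sed "s/:.*//" | uniq -c)
echo "$result"
```

The counts are padded with leading spaces, and the amount of padding differs between systems, which is why later patterns such as "^ *3 " allow any number of spaces.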
Copy intermediate results to a file
tee allows you to store the results of a command in a file while still passing the results on to the next command.
Example: Find all file names with an XML value of "EU", count unique years per file, and store in yearCounts.txt.
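A sketch of tee in the middle of a pipe. The list of file names is made-up stand-in data for the earlier stages, and the final grep is only there to show that the results keep flowing after tee.

```shell
cd "$(mktemp -d)" || exit

# Stand-in for the per-year file name list from the earlier stages (hypothetical data).
names=$(printf 'a.xml\nc.xml\nc.xml\n')

# tee writes every count line to yearCounts.txt and also passes it down the pipe,
# where grep keeps only the line for c.xml.
result=$(printf '%s\n' "$names" | uniq -c | tee yearCounts.txt | grep "c.xml")
echo "$result"

# The file holds the complete, unfiltered counts.
cat yearCounts.txt
```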
Count the number of results
wc allows you to count the results. The -l option counts lines.
Example: Find all file names with an XML value of "EU", count how many have exactly 3 unique years.
grep ">EU<" *.xml |
sed "s/:.*//" |
xargs grep "<YEAR>" |
sed "s/:.*\">/:/;s/<\/.*//" |
sort | uniq | sed "s/:.*//" |
uniq -c | tee yearCounts.txt | grep "^ *3 " |
sed "s/^ *3 //;s/\.xml$//" |
tee 3years.txt | wc -l
1
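The last few stages can be tried on their own. This sketch assumes a made-up yearCounts.txt in the format that uniq -c produces, so the file names and counts here are illustrative only.

```shell
cd "$(mktemp -d)" || exit

# Hypothetical counts file: two of the three files have exactly 3 unique years.
printf '   3 a.xml\n   1 c.xml\n   3 d.xml\n' > yearCounts.txt

# Keep the lines with a count of 3, strip the count and the .xml extension,
# save the remaining names to 3years.txt, and count them.
result=$(grep "^ *3 " yearCounts.txt | sed "s/^ *3 //;s/\.xml$//" |
  tee 3years.txt | wc -l)
echo "$result"
```

Some versions of wc pad the number with leading spaces, so trim it before comparing it with another value.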