Logfile Requests Matching a Specific Pattern
I needed to parse Apache logfiles to get a list of IP addresses and dates for requests matching a list of specific patterns, and found that you can do this easily from within the shell:
cat */*access.log.200?.?? | grep -i <pattern> | sed -e "s/\"GET.*$//g"
This reads all logfiles, extracts those lines that match the pattern, and then cuts off each line starting from the HTTP request, leaving only the fields IP, IdentUser (usually empty), AuthUser (also usually empty), and Date/Time.
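To see what the grep and sed stages do, here is a minimal sketch run on two made-up log lines instead of real files. The sample lines, the addresses, and the pattern feed are assumptions for illustration only. The sed expression is the single-quoted equivalent of the one above.

```shell
# Two hypothetical lines in Apache log format (invented, not from a real log).
sample='192.0.2.10 - - [12/Aug/2005:14:03:21 +0200] "GET /Feed.xml HTTP/1.1" 200 1234
192.0.2.11 - - [12/Aug/2005:14:03:22 +0200] "GET /index.html HTTP/1.1" 200 512'

# grep -i keeps only lines matching the pattern, ignoring case;
# sed then deletes everything from the opening quote of the GET request onward.
result=$(printf '%s\n' "$sample" | grep -i 'feed' | sed -e 's/"GET.*$//g')

printf '%s\n' "$result"
```

Only the first sample line matches, and what remains of it is the IP address, the two user fields, and the bracketed timestamp.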
OS X users can append the pbcopy command, so that the output will not be printed but copied to the clipboard (PasteBoard):
cat */*access.log.200?.?? | grep -i <pattern> | sed -e "s/\"GET.*$//g" | pbcopy
Context:
- The logfiles were distributed among several subdirectories (hence cat */*access…. instead of the simpler cat *access….)
- The filename convention used was <subdomain>.<domain>-access.log.YYYY.MM (e.g. “cow.bull-access.log.2005.08”)
- The patterns I was searching for could be expressed with a simple regular expression
- Sort order of the output didn’t matter (otherwise I would have needed to use a more verbose loop instead of relying on filename globbing)
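If the order had mattered, one possible loop (a sketch only, not something I actually ran against these logs) would walk the files month by month rather than in glob order. The function name requests_by_month is made up for this example, and it assumes the YYYY.MM suffix from the filename convention above, which sorts chronologically as plain text. Lines within a single month still come out in per-file order.

```shell
# Hypothetical helper: print the trimmed matches grouped by month, oldest first.
# Usage: requests_by_month <pattern>   (run from the directory above the log subdirectories)
requests_by_month() {
    pattern=$1
    # List all logfiles, reduce each name to its YYYY.MM suffix, and sort the unique months.
    printf '%s\n' */*access.log.200?.?? |
        sed -e 's/.*access\.log\.//' |
        sort -u |
        while read -r month; do
            # Emit every subdirectory's logfile for this month before moving on to the next.
            cat */*access.log."$month"
        done |
        grep -i -e "$pattern" |
        sed -e 's/"GET.*$//g'
}
```

Calling requests_by_month with the same pattern as before yields the same lines as the one-liner, only grouped by month.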
