Edit your HTML files with a one-line perl program
If you maintain many HTML documents on a Unix WWW server, you may
sometimes want to make the same change to several files at once. Doing
so by hand in a text editor can be tedious, but one time-saving option
is to edit your files in place with a perl "one-liner". Best of all, you
don't have to be a perl expert to do it.
Warning: Be sure to try this on a dummy copy of your
files before you use it to edit the real thing! Since the editing happens
in place, a mistake can be tricky to undo, even if you use the backup
-i.bak option: running the command a second time overwrites each .bak
file with the already-edited version.
Examples
- Change the hostname "xyz.rice.edu" to "abc.rice.edu":
perl -i.bak -p -e 's/xyz\.rice\.edu/abc.rice.edu/ig' *.html
- Change localhost URLs to remote URLs:
perl -i.bak -p \
-e 's#file://localhost/localpath/#http://riceinfo.rice.edu/remotepath/#ig' \
*.html
- Insert a department name at the beginning of every <TITLE>:
perl -i.bak -p \
-e 's#<title>#<title>Rice Fooology Dept.: #i' *.html
- Insert a maintainer signature at the end of every file (before
the closing </BODY> tag):
perl -i.bak -p \
-e 's#</body>#<p>\n<address>-- Jane Doe (jdoe\@rice.edu) 1999.12.31</address>\n</body>#i' \
*.html
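Because every example above passes -i.bak, an edit that went wrong can usually be undone by moving each backup back over its edited file, provided the command was run only once. The following is a minimal sketch under that assumption; the working directory and the file page.html are hypothetical stand-ins for your own files.

```shell
# Sketch: undo an in-place edit by moving each .bak file back over its original.
# The directory and file name are made up for illustration.
workdir=$(mktemp -d)
printf '<title>Home</title>\n' > "$workdir/page.html"

# An edit we then decide to undo.
perl -i.bak -p -e 's#<title>#<title>Rice Fooology Dept.: #i' "$workdir"/*.html

# Restore: strip the .bak suffix from every backup, overwriting the edited file.
for backup in "$workdir"/*.html.bak; do
    mv -- "$backup" "${backup%.bak}"
done

restored=$(cat "$workdir/page.html")
```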
Anatomy of a perl one-line substitution command
perl -i[.backup-extension] -p -e 's#pat1#pat2#ig' files
- -i[.backup-extension]
- Tells perl to run the command on the named files in-place, i.e.,
using the named files both as input and output. If a backup extension
is provided, the unmodified version of each file will be saved with
the extension appended.
Example: -i.bak
- -p
- Tells perl to wrap your one-line program in a loop that reads the input
one line at a time and prints each line after the program has run on it.
- -e
- The one-line program follows.
- 's#pat1#pat2#ig'
- The perl "substitution" operator. Matches the pattern pat1 and replaces
the matched text with pat2. The "#" used to delimit the patterns can be
replaced by almost any punctuation character that isn't found in pat1 or
pat2 (the first example above uses "/"). The perl pattern matching used
in pat1 is very powerful and somewhat complex; the main pitfall to
remember is that you may need to escape special characters such as "."
with a preceding backslash, e.g. "xyz\.rice\.edu". The trailing "i" flag
means to ignore case when matching pat1. The trailing "g" flag means to
apply the substitution to every match on the same line (without the "g"
it will only be applied to the leftmost pattern match on each line).
- files
- The file(s) on which the command should be run. In an HTML context,
you probably want to specify a pattern in the shell to match your HTML
files, taking into account any subdirectories you also want to include.
Examples:
*.html (HTML files in current dir)
*.html blah/*.html (HTML files in current dir and subdir "blah")
*.html */*.html (HTML files in current dir and all subdirs one level deep)
{.,*,*/*,*/*/*}/*.html (HTML files in current dir and all subdirs up to three levels deep; needs a shell with brace expansion, such as csh or bash)
For more information
- Pertinent sections of the perl man page (if unavailable on the Web,
type man perl at the Unix prompt)
- Learning Perl (the llama book) by Randal Schwartz
- Programming Perl (the camel book) by Larry Wall and Randal
Schwartz
-- Prentiss Riddle (riddle@rice.edu)
1996.08.23