Edit your HTML files with a one-line perl program

If you maintain a number of HTML documents on a Unix WWW server, you may sometimes want to make the same change to a number of files. Doing so by hand in a text editor can be tedious, but one time-saving option is to edit your files in place with a perl "one-liner". Best of all, you don't have to be a perl expert to do it.

Warning: Be sure to try this on a dummy copy of your files before you use it to edit the real thing! Since the editing happens in place, a mistake can be tricky to undo, even if you use the backup -i.bak option.

Examples

Change the hostname "xyz.rice.edu" to "abc.rice.edu":
perl -i.bak -p -e 's/xyz\.rice\.edu/abc.rice.edu/ig' *.html

Change localhost URLs to remote URLs:
perl -i.bak -p \
-e 's#file://localhost/localpath/#http://riceinfo.rice.edu/remotepath/#ig' \
*.html

Insert a department name at the beginning of every <TITLE>:
perl -i.bak -p \
-e 's#<title>#<title>Rice Fooology Dept.: #i' *.html

Insert a maintainer signature at the end of every file (before the closing </BODY> tag):
perl -i.bak -p \
-e 's#</body>#<p>\n<address>-- Jane Doe (jdoe\@rice.edu) 1999.12.31</address>\n</body>#i' \
*.html


Anatomy of a perl one-line substitution command

perl -i[.backup-extension] -p -e 's#pat1#pat2#ig' files
-i[.backup-extension]
Tells perl to run the command on the named files in-place, i.e., using the named files both as input and output. If a backup extension is provided, the unmodified version of each file will be saved with the extension appended.
Example: -i.bak

-p
Tells perl to wrap an input loop around your one-line program: it reads the files one line at a time, runs the program on each line, and prints the result.

-e
Tells perl that the next argument is the program to run, i.e., the one-liner itself.

's#pat1#pat2#ig'
The perl substitution operator. It searches each line for the pattern pat1 and replaces the match with the text pat2. The "#" delimiter can be swapped for another punctuation character that does not occur in pat1 or pat2; the examples on this page use "/" for the hostname and "#" where the text itself contains slashes. Perl pattern matching is powerful and somewhat complex; the main pitfall is that characters with a special meaning in patterns must be escaped with a preceding backslash to be matched literally, e.g. the "." in "xyz\.rice\.edu". An "@" in either pat1 or pat2 needs the same treatment, as in "jdoe\@rice.edu". The trailing "i" flag ignores case when matching pat1. The trailing "g" flag applies the substitution to every match on a line; without it, only the leftmost match on each line is replaced. The single quotes around the whole expression keep the shell from interpreting any of its characters.

files
The file(s) on which the command should be run. In an HTML context, you probably want to specify a pattern in the shell to match your HTML files, taking into account any subdirectories you also want to include. Examples:
*.html                  (HTML files in current dir)
*.html blah/*.html      (HTML files in current dir and subdir "blah")
*.html */*.html         (HTML files in current dir and all subdirs one level deep)
{.,*,*/*,*/*/*}/*.html  (HTML files in current dir and all subdirs up to three levels deep; requires a shell with brace expansion, such as csh or bash)


-- Prentiss Riddle (riddle@rice.edu) 1996.08.23

This page updated 10/25/00
© 2000 Rice University