You're viewing a comment by Stuart Marmorstein and its responses.

April 29, 2013, 01:21

hi Peter,

Thanks to you, I can get many tasks done requiring searches through groups of files using wildcards as in:

perl -pi.bak -e "BEGIN{@ARGV=<*.ini>} /77345/ && s/Humble/Kingwood/"

which changes the name of the town, Humble, to Kingwood in a group of .ini files if the zip code is 77345.

Next question for a practical application:

I use a command line program in Win 7 called getmail, which downloads emails from one of my accounts and saves them as .txt files in the form msg*.txt.

I would like to have a Perl one-liner that can read through these files looking for certain text strings that would identify them to me as SPAM.

How could I get a one-liner to take the FILENAME of the offending file and write it to a line in another text file that I could use to delete SPAM emails?

I have done something similar with sgrep, and it seemed to work for a while, but lately some inconsistent results are cropping up.

All the best--Stuart

Comment Responses

May 08, 2013, 17:39

Hi Stuart,

I've been busy, so please forgive my late reply.

You can solve this problem with a Perl one-liner, but it gets a bit complicated. At this point it would be better to move from one-liners to writing Perl programs.

Does each email get saved as a separate msg*.txt file? If so, you can do the following:

perl -e "
  my @files = <msg*.txt>;
  open my $out, '>', 'spam-emails.txt' or die 'failed opening spam-emails.txt: ' . $!;

  for my $file (@files) {
    my $contents = do {
      open my $fh, '<', $file or die 'error opening ' . $file . ': ' . $!;
      local $/; <$fh>
    };
    $_ = $contents;
    if (/spam_keyword_1/ && /spam_keyword_2/ && /spam_keyword_3/) {
      print $out qq/$file\n/;
    }
  }
"

What happens here is that we first set up the @files array to contain the filenames of all emails matching msg*.txt.
Then we open the 'spam-emails.txt' file, which will hold the list of filenames that are spam emails.
Next we loop over the filenames and read the contents of each email into the $contents variable.
Then we check whether the contents match all three of spam_keyword_1, spam_keyword_2, and spam_keyword_3 (the conditions are joined with &&), and if so we print the filename of the spam email to spam-emails.txt.

You can create your own list of spam_keywords.
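
If you do move to a script file, here is a minimal sketch of the same idea, not a definitive implementation. The sub names (slurp, is_spam, list_spam) and the @spam_keywords array are my own inventions for illustration. It assumes the keywords are literal strings rather than regular expressions (hence \Q...\E), and, like the one-liner, it flags a file only when every keyword is present. A script file also sidesteps command-line quoting, which matters on Windows, where cmd.exe does not continue a quoted string across lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical keyword list; replace these with your own strings.
my @spam_keywords = ('spam_keyword_1', 'spam_keyword_2', 'spam_keyword_3');

# Read a whole file into one string.
sub slurp {
    my ($file) = @_;
    open my $fh, '<', $file or die "error opening $file: $!";
    local $/;    # no record separator, so <$fh> returns the entire file
    my $contents = <$fh>;
    close $fh;
    return defined $contents ? $contents : '';
}

# True only if the text contains every keyword (assumption: literal match).
# An empty keyword list never counts as spam.
sub is_spam {
    my ($text, @keywords) = @_;
    return 0 unless @keywords;
    for my $kw (@keywords) {
        return 0 unless $text =~ /\Q$kw\E/;
    }
    return 1;
}

# Write the names of matching files to $out_file, one per line,
# and return the list of names that were written.
sub list_spam {
    my ($out_file, $keywords, @files) = @_;
    my @spam = grep { is_spam(slurp($_), @$keywords) } @files;
    open my $out, '>', $out_file or die "failed opening $out_file: $!";
    print {$out} "$_\n" for @spam;
    close $out or die "failed closing $out_file: $!";
    return @spam;
}

# Run only when executed directly, not when loaded from another file.
list_spam('spam-emails.txt', \@spam_keywords, glob('msg*.txt')) unless caller;
```

Save it under any name you like (say find-spam.pl, a made-up name) in the directory that holds the msg*.txt files and run it with: perl find-spam.pl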
