Re: Split the FAQ?

---------

Matt Welsh (mdw%merengue@merengue.oit.unc.edu)
Thu, 21 Jul 1994 20:09:27 EDT


I find the following to be a bit more flexible. It splits the
text file only on blank lines, and it uses two thresholds: a part is
cut at the first blank line after 1500 lines, unless 100 or fewer lines
would remain. It takes as arguments the file to split, the name of a
file containing the entire header (along with Archive-name), and the
original source file, whose date is used for the Last-modified line.
It prepends the header to each part, with the appropriate part number
in the Subject and Archive-name lines. It'll even print a line for your
post-faq control file, if you uncomment a line.

It's not the most efficient beast, but it works. Hack to taste.
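
As a rough illustration of the splitting rule described above, here is a minimal sketch. It is not part of the posted script: the helper name split_points, its argument order, and the sample data are assumptions made for this example only. The two thresholds play the same roles as $MAXCOUNT and $MINLINES in the script.

```perl
use strict;
use warnings;

# Hypothetical helper (not in mkpost.pl): return the 1-based line numbers
# of the blank lines at which the current part would end.
sub split_points {
    my ($lines, $max, $min) = @_;
    my $total = scalar @$lines;
    my ($count, $seen) = (0, 0);
    my @points;
    for my $line (@$lines) {
        $count++;
        $seen++;
        # Split only on a blank line, once the current part has reached
        # $max lines, and only if more than $min lines would remain.
        if ($line =~ /^$/ && $count >= $max && ($total - $seen) > $min) {
            push @points, $seen;
            $count = 0;
        }
    }
    return @points;
}

# Tiny demonstration with thresholds of 3 and 1 instead of 1500 and 100.
my @text = ("a\n", "b\n", "\n", "c\n", "\n", "d\n", "e\n", "\n", "f\n", "g\n");
print join(",", split_points(\@text, 3, 1)), "\n";   # prints 3,8
```

The blank line at position 5 is skipped because the part that started after line 3 has only two lines at that point, which is below the upper threshold.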

--
#!/usr/local/bin/perl

# mkpost.pl, M Welsh <mdw@cs.cornell.edu>
# Take three arguments: A file, a header, and an original file (for the
# date---useful if generating text from nroff or SGML source).
# Split the file into parts, appending header to each with appropriate
# Subject and Archive-Name fields, adding a Last-Modified field for the
# mtime of the orig file.

$MAXCOUNT = 1500; # Split here
$MINLINES = 100;  # Don't split under here

sub printheader {
  local($thestream,$theheader,$thepart,$totalparts,$modtime)=@_;

  open(HEADER,$theheader) || die "Can't open $theheader.";

  while (<HEADER>) {
    # (.*) stops before the newline, so $1 needs no chop
    # (and $1 is read-only in Perl 5).
    if ((/^Subject:\s*(.*)/) && ($totalparts > 1)) {
      print $thestream "Subject: $1 (part $thepart/$totalparts)\n";
    } elsif ((/^Archive-name:\s*(.*)/) && ($totalparts > 1)) {
      print $thestream "Archive-name: $1/part$thepart\n";
    } elsif ((/^Archive-Name:\s*(.*)/) && ($totalparts > 1)) {
      print $thestream "Archive-Name: $1/part$thepart\n";
    } else {
      print $thestream "$_";
    }
  }

  # Print last mod time
  print $thestream "Last-modified: $modtime\n\n";
  close HEADER;
}

# Print line for post_faq config file
sub printpostfaq {
  local($thestream,$fname,$partnum)=@_;
  local($pwd)=`pwd`;
  chop $pwd;

  if ($partnum > 1) {
    $l = $partnum - 1;
    print $thestream "$fname.$partnum $pwd/$fname.$partnum . 30 none 2 ";
    print $thestream "$fname.$l\n";
  } else {
    print $thestream "$fname.$partnum $pwd/$fname.$partnum . 30 none 2 .\n";
  }
}

$parts = 0; $thispart = 1;

($thefile,$theheader,$origfile)=@ARGV;

open(FILE,$thefile) || die "Can't open $thefile.";
if (! -f $origfile) { die "$origfile doesn't exist."; }

@months = ('Jan','Feb','Mar','Apr','May','Jun','Jul', 'Aug','Sep','Oct','Nov','Dec');

# Element 9 of stat() is the mtime (element 10 is the ctime)
($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime((stat($origfile))[9]);

# localtime() returns the year as an offset from 1900
$lastmod = "$mday $months[$mon] " . ($year + 1900);

$filecount = 0;
while (<FILE>) { $filecount++; }
seek(FILE,0,0);

# Pretend that we're doing it, to count actual parts produced
# Okay, so sue me. I'm being lazy.
$count = 0;
$totalcount = 0;
$totalparts = 1;
while (<FILE>) {
  $count++;
  $totalcount++;
  if ((/^$/) && ($count >= $MAXCOUNT) &&
      (($filecount-$totalcount)>$MINLINES)) {
    $count = 0;
    $totalparts++;
  }
}
seek(FILE,0,0);

$outfile = "$thefile.$thispart";
open(OUT,">$outfile") || die "Can't open $outfile.";
&printheader(OUT,$theheader,$thispart,$totalparts,$lastmod);

# Uncomment for post-faq control line
# &printpostfaq(STDOUT,$thefile,$thispart);

$count = 0;
$totalcount = 0;
while (<FILE>) {
  $count++;
  $totalcount++;

  # Only split on blank lines
  if ((/^$/) && ($count >= $MAXCOUNT) &&
      (($filecount-$totalcount)>$MINLINES)) {
    print OUT "\n---End of part $thispart/$totalparts---\n\n";

    # Start new part
    $count = 0;
    $thispart++;
    close(OUT);

    $outfile = "$thefile.$thispart";
    open(OUT,">$outfile") || die "Can't open $outfile.";
    &printheader(OUT,$theheader,$thispart,$totalparts,$lastmod);
    # &printpostfaq(STDOUT,$thefile,$thispart);

    print OUT "\n---This is part $thispart/$totalparts---\n\n";
    print OUT;
  }
  print OUT;
}

close FILE;
close OUT;
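
The header rewriting done by printheader can also be looked at on its own. The following is a minimal sketch under stated assumptions, not the posted code: the helper name rewrite_header_line and the sample header values are made up for illustration. It uses a case-insensitive match so that one branch covers both the Archive-name and Archive-Name spellings.

```perl
use strict;
use warnings;

# Hypothetical helper (not in mkpost.pl): rewrite one header line for a
# given part. Lines other than Subject and Archive-name pass through.
sub rewrite_header_line {
    my ($line, $part, $total) = @_;
    return $line if $total <= 1;        # single-part posts are left alone
    if ($line =~ /^Subject:\s*(.*)/) {
        # (.*) stops before the newline, so the capture has no trailing "\n"
        return "Subject: $1 (part $part/$total)\n";
    }
    if ($line =~ /^(Archive-name):\s*(.*)/i) {
        # $1 keeps whichever capitalization the header file used
        return "$1: $2/part$part\n";
    }
    return $line;
}

# Sample header lines (invented values) for part 2 of 3.
print rewrite_header_line("Subject: Example FAQ\n", 2, 3);
print rewrite_header_line("Archive-Name: example/faq\n", 2, 3);
print rewrite_header_line("Newsgroups: news.answers\n", 2, 3);
```

Keeping the rewrite in a function that returns a string, rather than printing directly, makes it easy to check the result for a few header lines before generating any files.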




---------

faq-admin@landfield.com

© Copyright The Landfield Group, 1997
All rights reserved