Split access.log file by dates using command line tools
Pure bash, making one pass through the access log:
while read -r; do    # -r keeps backslashes in the log line literal
    # Skip lines that have no [dd/Mon/yyyy: timestamp
    [[ $REPLY =~ \[(..)/(...)/(....): ]] || continue
    d=${BASH_REMATCH[1]}
    m=${BASH_REMATCH[2]}
    y=${BASH_REMATCH[3]}
    #printf -v fname "access.apache.%s_%s_%s.log" ${BASH_REMATCH[@]:1:3}
    printf -v fname "access.apache.%s_%s_%s.log" "$y" "$m" "$d"
    printf '%s\n' "$REPLY" >> "$fname"
done < access.log
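To see what the regex captures before running the loop on a real log, the match can be tried on a single line. This is a minimal sketch, assuming Apache's common log format; the helper name log_fname and the sample line are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the file name the loop above would pick for one log line.
log_fname() {
    local line=$1
    # Capture day, month abbreviation, and year from "[10/Oct/2000:13:55:36 ...".
    if [[ $line =~ \[(..)/(...)/(....): ]]; then
        printf 'access.apache.%s_%s_%s.log\n' \
            "${BASH_REMATCH[3]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[1]}"
    else
        return 1   # no timestamp found
    fi
}

# Made-up sample line in common log format.
sample='127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
log_fname "$sample"
```

For the sample line this prints access.apache.2000_Oct_10.log. Because the month stays a three-letter abbreviation, the resulting files do not sort chronologically in a directory listing.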
One way using awk:
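awk 'BEGIN {
    split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", months, " ")
    for (a = 1; a <= 12; a++)
        m[months[a]] = sprintf("%02d", a)
}
{
    # $4 is the timestamp field, e.g. [10/Oct/2000:13:55:36
    split($4, array, "[:/]")
    year = array[3]
    month = m[array[2]]
    # Parenthesize the target: unparenthesized concatenation after > is not portable
    print > (FILENAME "-" year "_" month ".txt")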
}' incendiary.ws-2010
This will output files like:
incendiary.ws-2010-2010_04.txt
incendiary.ws-2010-2010_05.txt
incendiary.ws-2010-2010_06.txt
incendiary.ws-2010-2010_07.txt
Against a 150 MB log file, the answer by chepner took 70 seconds on a 3.4 GHz 8-core Xeon E31270, while this method took 5 seconds.
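The awk script above groups lines by month, while the question title asks for a split by date. A per-day variant might look like the sketch below. It is an adaptation, not part of the original answer and not covered by the timing above; the input name access.log and the three sample lines are made up. Since a long log can span hundreds of days, the sketch keeps only one output file open at a time and appends with >>, which assumes no stale output files are left over from an earlier run.

```shell
# Work in a scratch directory so no old output files get appended to.
cd "$(mktemp -d)" || exit 1

# Made-up sample input: three lines over two days.
cat > access.log <<'EOF'
127.0.0.1 - - [30/Apr/2010:23:59:59 +0000] "GET /a HTTP/1.1" 200 10
127.0.0.1 - - [01/May/2010:00:00:01 +0000] "GET /b HTTP/1.1" 200 20
127.0.0.1 - - [01/May/2010:00:00:02 +0000] "GET /c HTTP/1.1" 404 30
EOF

awk 'BEGIN {
    split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", months, " ")
    for (a = 1; a <= 12; a++)
        m[months[a]] = sprintf("%02d", a)
}
{
    # $4 looks like "[30/Apr/2010:23:59:59"
    split($4, t, "[:/]")
    day = substr(t[1], 2)            # drop the leading "["
    out = FILENAME "-" t[3] "_" m[t[2]] "_" day ".txt"
    if (out != prev) {
        if (prev != "") close(prev)  # keep only one output file open
        prev = out
    }
    print >> out                     # append, so reopening a file is safe
}' access.log
```

With the sample input this writes access.log-2010_04_30.txt (one line) and access.log-2010_05_01.txt (two lines). Closing on every date change is cheap when the log is in chronological order, which access logs normally are.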
Original inspiration: "How to split existing apache logfile by month?"