Feb. 08, 2014

Getting the most popular pages from your Apache logfile

An Apache logfile can be huge and hard to read.
Here is a way to get a list of the most visited pages (or files) from an Apache logfile.

In this example, we only want to know the URLs from GET requests. We will use the wonderful Counter class from Python's collections module.
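To see what Counter does on its own, here is a minimal sketch. The list of URLs below is made-up illustration data, not taken from a real log. Counter tallies how often each item appears, and most_common(n) returns the n most frequent items as (item, count) pairs.

```python
import collections

# Made-up URLs, standing in for the paths pulled out of GET requests
urls = ["/index.html", "/about.html", "/index.html", "/blog/", "/index.html", "/blog/"]

# Counter maps each distinct item to the number of times it appears
counter = collections.Counter(urls)

# most_common(n) returns the n most frequent (item, count) pairs,
# ordered from most to least common
top_two = counter.most_common(2)
print(top_two)  # [('/index.html', 3), ('/blog/', 2)]
```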


import collections

clean_log = []

with open("yourlogfile.log", "r") as logfile:
    for line in logfile:
        try:
            # Copy the URL to the list.
            # We get the part between "GET " and "HTTP", without surrounding spaces
            clean_log.append(line[line.index("GET") + 4:line.index("HTTP")].strip())
        except ValueError:
            # str.index raises ValueError when "GET" or "HTTP" is missing,
            # so lines that are not GET requests are skipped
            pass

counter = collections.Counter(clean_log)

# Print the top 50 most popular URLs, each preceded by its hit count
for url, count in counter.most_common(50):
    print(count, url)
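Slicing between "GET" and "HTTP" works for most lines, but it can cut a URL short if the path itself contains "HTTP" (for example /HTTPguide.html). A regular expression that matches the whole quoted request line is one alternative. This is a minimal sketch assuming the log uses Apache's Common or Combined Log Format, where requests look like "GET /index.html HTTP/1.1"; the function name top_get_urls and the sample lines are hypothetical, made up for illustration.

```python
import collections
import re

# Matches a quoted request line such as "GET /index.html HTTP/1.1"
# and captures the URL. Assumes Common/Combined Log Format.
REQUEST_RE = re.compile(r'"GET (\S+) HTTP/[0-9.]+"')


def top_get_urls(lines, n=50):
    """Return the n most requested GET URLs as (url, count) pairs.

    top_get_urls is a hypothetical helper, not part of any library.
    """
    counter = collections.Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:  # non-GET requests and malformed lines are skipped
            counter[match.group(1)] += 1
    return counter.most_common(n)


# Made-up sample lines in Common Log Format
sample_log = [
    '127.0.0.1 - - [08/Feb/2014:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '127.0.0.1 - - [08/Feb/2014:10:00:01 +0000] "POST /login HTTP/1.1" 302 0',
    '127.0.0.1 - - [08/Feb/2014:10:00:02 +0000] "GET /index.html HTTP/1.1" 200 512',
    '127.0.0.1 - - [08/Feb/2014:10:00:03 +0000] "GET /about.html HTTP/1.1" 200 256',
]

print(top_get_urls(sample_log, 2))  # [('/index.html', 2), ('/about.html', 1)]
```

Because an open file yields one line at a time, the same function can be given the log file directly, e.g. top_get_urls(logfile) inside a with open("yourlogfile.log") block.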

Disclosure of Material Connection: Some of the links in the post above are “affiliate links.” This means if you click on the link and purchase the item, I will receive an affiliate commission. Regardless, PythonForBeginners.com only recommends products or services that we try personally and believe will add value to our readers.