
The Googlebot Tracker 4000 Is Live!


By Jon Waraas - First Published: March 24th, 2024

I built a Googlebot crawler tracker :)


Over the last several years, I feel as though I have become too reliant on using others' software for my web work.

I have mostly used WordPress for my websites and relied exclusively on 'over-the-counter' tracking software, such as Clicky, for data analytics.

But how reliable is that data?

So after some thought, I decided to start building my own tracking tools to supplement that data.

One of the first tools I built, The Googlebot Tracker 4000, is a method for tracking Googlebot crawlers on websites. Currently, the tool is designed to capture as much data from the Googlebot as possible for all URLs of a website.

I am currently testing it on this website (Waraas.Com), and have made all of the data public. You can go to The Googlebot Tracker 4000 page to see all of the Googlebot data that has been recorded so far.

Over time, I will analyze the data to fix any errors on the website, tweak 301s, make website changes, etc.

I will also update my blog with any fun or unique findings.

I Will Be Testing One Thing...

I am one of those SEO'ers who believe in the "Google crawl budget" theory. Essentially, this theory posits that Google allocates a certain 'bandwidth' to its crawlers, meaning a Google crawler will only crawl a limited number of pages/images per month.

The trick is to use your "Google crawl budget" wisely. But how?

That is one of the main things I will be testing with this experiment. I just need more data :)

What Does The Tracker Do?

Currently, the Googlebot tracker utilizes PHP sessions to 'log' the data into a SQL database.

Over the next week or so, I hope to add a function that will "log" everything, including all the data from the website images.

The Code

The first part of the code is the section below:

[Screenshot: the first part of the code for the Googlebot tracker]

The code determines whether the 'user' is a Googlebot. If confirmed, it captures the data and feeds it into the 'logGooglebotVisit()' function, which is located in the functions.php file, as illustrated in the photo below.

[Screenshot: the 'logGooglebotVisit()' function in the functions.php file]
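Since the code itself only appears in the screenshots, here is a minimal sketch of the same flow, written in Python rather than the site's PHP. The names `is_googlebot` and `capture_visit`, and the fields collected, are assumptions made for illustration and are not the tracker's actual code. It checks whether the user agent claims to be Googlebot and, if so, gathers the request details that a function like 'logGooglebotVisit()' would receive.

```python
from datetime import datetime, timezone
from typing import Optional


def is_googlebot(user_agent: str) -> bool:
    """Return True if the user agent string claims to be Googlebot."""
    # A user agent can be spoofed, so this is only a first-pass check.
    return "googlebot" in user_agent.lower()


def capture_visit(headers: dict, url: str, ip: str, status: int) -> Optional[dict]:
    """Collect request details for a Googlebot visit, or None for other visitors."""
    user_agent = headers.get("User-Agent", "")
    if not is_googlebot(user_agent):
        return None
    # Hypothetical set of fields; the real tracker may capture more or fewer.
    return {
        "url": url,
        "ip": ip,
        "user_agent": user_agent,
        "status": status,
        "crawled_at": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(capture_visit({"User-Agent": ua}, "/blog/", "66.249.66.1", 200))
```

Ordinary visitors return `None`, so only Googlebot requests would ever reach the logging step.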

As you can see in the screenshot below, the data is then added to a SQL table.

[Screenshot: the Googlebot data stored in the SQL table]
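For readers who cannot see the screenshot, the storage step might look roughly like the sketch below. It uses Python and SQLite rather than the site's PHP setup, and the table name `googlebot_visits`, its columns, and the function `log_googlebot_visit` are all assumptions for illustration, not the real schema.

```python
import sqlite3


def create_table(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) visits table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS googlebot_visits (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            url TEXT NOT NULL,
            ip TEXT NOT NULL,            -- stored as text so IPv4 and IPv6 both fit
            user_agent TEXT NOT NULL,
            status INTEGER NOT NULL,
            crawled_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
        """
    )


def log_googlebot_visit(conn: sqlite3.Connection, url: str, ip: str,
                        user_agent: str, status: int) -> int:
    """Insert one visit and return the new row id."""
    # Placeholders (?) keep request data out of the SQL string itself.
    cur = conn.execute(
        "INSERT INTO googlebot_visits (url, ip, user_agent, status) "
        "VALUES (?, ?, ?, ?)",
        (url, ip, user_agent, status),
    )
    conn.commit()
    return cur.lastrowid


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_table(conn)
    log_googlebot_visit(conn, "/blog/", "66.249.66.1", "Googlebot/2.1", 200)
    print(conn.execute("SELECT url, ip, status FROM googlebot_visits").fetchall())
```

Using query placeholders matters here because the URL and user agent come straight from the request and should never be pasted into the SQL text.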

The code is quite simple, but I will continue to enhance it over time.

Why Don't I Use Logs?

On Facebook, Jeff Kee essentially asked me, "Why use SQL when you can use logs instead?"

I am using SQL instead of logs because it is way easier for me to manage and display the data. I can do a lot more with SQL than I can with logs.


I've never attempted to display log data via HTML, which would require some time for me to learn. Therefore, I opted for SQL/PHP.
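As one example of what SQL makes easy, the sketch below counts crawls per URL with a single query. It is written in Python with SQLite rather than the site's PHP setup, and the `googlebot_visits` table, its columns, and the demo rows are assumptions for illustration only.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with a few made-up visits for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE googlebot_visits (url TEXT NOT NULL, crawled_at TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO googlebot_visits (url, crawled_at) VALUES (?, ?)",
        [
            ("/", "2024-03-24 10:00:00"),
            ("/about/", "2024-03-24 11:00:00"),
            ("/", "2024-03-25 09:00:00"),
        ],
    )
    conn.commit()
    return conn


def crawls_per_url(conn: sqlite3.Connection) -> list:
    """Return (url, hits, last_crawl) rows, most-crawled URLs first."""
    return conn.execute(
        """
        SELECT url, COUNT(*) AS hits, MAX(crawled_at) AS last_crawl
        FROM googlebot_visits
        GROUP BY url
        ORDER BY hits DESC, url ASC
        """
    ).fetchall()


if __name__ == "__main__":
    for url, hits, last_crawl in crawls_per_url(build_demo_db()):
        print(f"{url}: {hits} crawl(s), last seen {last_crawl}")
```

Getting the same summary out of raw log files would mean parsing and grouping the lines by hand, which is the kind of work the database does for free.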

What's Next?

Well, as you can see from the screenshots, I'm having an IP logging issue. The IP address isn't being stored in the database correctly.

I've updated the code and adjusted some of the SQL table settings. The challenge now is waiting for the Googlebots to return so I can test the changes. (Getting the Googlebots to return is a WHOLE other subject hehe)

This week, I want to add 2 new features:

Response Time - I want to add the "load time" of each URL that the Googlebot crawls.

Validate Google's IP - Google themselves recommend conducting a reverse DNS lookup on the Googlebot IP address to verify the authenticity of the Googlebot. I need to add this feature.
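Both planned features can be sketched briefly. The example below is in Python rather than the site's PHP, and the names `verify_googlebot_ip` and `timed` are hypothetical. The verification follows the reverse-then-forward DNS check that Google describes: look up the hostname for the IP, confirm it ends in googlebot.com or google.com, then resolve that hostname and confirm it maps back to the same IP. The lookup functions are passed in as parameters so the logic can be tried without real network calls.

```python
import socket
import time
from typing import Callable, List, Tuple

# Hostname suffixes checked here; the leading dot stops "evilgooglebot.com".
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse_lookup(ip: str) -> str:
    """Reverse DNS: IP address to hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(hostname: str) -> List[str]:
    """Forward DNS: hostname to IPv4 addresses (gethostbyname_ex is IPv4 only)."""
    return socket.gethostbyname_ex(hostname)[2]


def verify_googlebot_ip(
    ip: str,
    reverse: Callable[[str], str] = _reverse_lookup,
    forward: Callable[[str], List[str]] = _forward_lookup,
) -> bool:
    """Return True only if reverse and forward DNS both point at Google."""
    try:
        hostname = reverse(ip).lower().rstrip(".")
        if not hostname.endswith(GOOGLE_SUFFIXES):
            return False
        return ip in forward(hostname)
    except OSError:
        # Covers failed lookups (socket.herror and socket.gaierror).
        return False


def timed(handler: Callable[[], object]) -> Tuple[object, float]:
    """Run handler and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = handler()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


if __name__ == "__main__":
    # Stand-in resolvers so the demo runs offline.
    fake_reverse = lambda ip: "crawl-66-249-66-1.googlebot.com"
    fake_forward = lambda host: ["66.249.66.1"]
    print(verify_googlebot_ip("66.249.66.1", fake_reverse, fake_forward))
    print(timed(lambda: sum(range(1000))))
```

DNS lookups can be slow, so in practice the verification result would probably be cached per IP or run outside the page request, so it does not inflate the very load time being measured.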

In the near future, I hope to add another feature that will track ALL of the URLs that a Googlebot crawls, including the actual image files.

Currently, the Googlebot tracker is focused on identifying missing or bad images. Accessing a URL with a missing image leads to a 404 page, which is also what Google sees, enabling us to track these images. However, tracking for working images (status 200) is not yet available. I plan to add this feature in the future.
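To illustrate how missing images could be pulled out of the recorded data, here is a minimal sketch in Python rather than the site's PHP. The function names, the list of image extensions, and the `(url, status)` row shape are assumptions for illustration. It keeps only the URLs that look like image files and returned a 404.

```python
import os
from typing import Iterable, List, Tuple
from urllib.parse import urlparse

# Hypothetical list of extensions to treat as images.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg"}


def is_image_url(url: str) -> bool:
    """Return True if the URL path ends in a known image extension."""
    path = urlparse(url).path  # ignores any query string
    return os.path.splitext(path)[1].lower() in IMAGE_EXTENSIONS


def missing_images(visits: Iterable[Tuple[str, int]]) -> List[str]:
    """Return the unique image URLs that Googlebot requested and got a 404 for."""
    return sorted(
        {url for url, status in visits if status == 404 and is_image_url(url)}
    )


if __name__ == "__main__":
    demo = [
        ("/img/a.png", 404),
        ("/img/b.jpg", 200),
        ("/old-page/", 404),
        ("/img/a.png", 404),
    ]
    print(missing_images(demo))
```

Working images (status 200) never show up in this data if the web server serves them directly without running the tracking code, which matches the gap described above.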

While this is currently just an experiment, perhaps later on I could turn it into a WordPress plugin or something similar?

Thank you for reading :) If you have any questions or suggestions at all, please leave a comment below.


Ever since building my first website in 2002, I've been hooked on web development. I now manage my own network of eCommerce/content websites full-time. I'm also building a cabin inside an old ghost town. This is my personal blog, where I discuss web development, SEO, eCommerce, cabin building, and other personal musings.
