Looking for a basic and FREE web crawler for smaller sites? Take a look at what Greenflare can do for you!
Greenflare is an SEO web crawler that offers some basic information about the health of your website. It is a software package that you download to your computer to use.
It is available as a download for Windows and macOS.
More advanced users can also install it as a Python package or grab the source code.
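If you go the Python route, the install is a couple of commands. A minimal sketch, assuming Python 3 and pip are already on your machine (the package name is assumed here, so double-check it on the downloads page):

```shell
# Install Greenflare from PyPI (package name assumed; verify on the downloads page)
pip install greenflare

# Launch the crawler's interface
greenflare
```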
What Greenflare Offers
Greenflare is a basic crawler that offers some great information for identifying issues that may be preventing you from ranking well in the search engines.
Below are some of the things it can help you find.
Status Codes & Broken Links. It will deliver any URLs that have a 4xx or 3xx code. You can then redirect the broken ones so they go to a page that is in fact live.
Headings. It will pull data from all of your pages for your H1 and H2 headings so you can see if any are missing or duplicated.
Meta Data. Take a peek at your meta descriptions to see how you might look in the search results.
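To make the status-code report concrete, here is a small sketch of grouping crawl results by status family. This is not Greenflare's code, and the URLs and codes are made up; it just shows the kind of bucketing the report gives you:

```python
# Group crawled URLs by their HTTP status family.
# (Illustration only: URLs and codes below are made up, not real crawl output.)
def group_by_status(results):
    groups = {"2xx": [], "3xx": [], "4xx": [], "5xx": []}
    for url, code in results:
        family = f"{code // 100}xx"
        groups.setdefault(family, []).append(url)
    return groups

crawl = [
    ("https://example.com/", 200),
    ("https://example.com/old-page", 301),
    ("https://example.com/missing", 404),
]
groups = group_by_status(crawl)
print(groups["4xx"])  # the broken URLs you would want to redirect
```

The 4xx bucket is your redirect to-do list; the 3xx bucket shows which internal links should be updated to point straight at the final page.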
How To Install Greenflare
Go to the download Greenflare page here. Choose which package you want to download. I chose the macOS bundle and downloaded it with no problem.
But when I tried to open it, I got a warning that macOS could not open the dmg file to install it since it was not from an “identified developer.”
You may see this for both Windows and macOS, and you can find instructions for getting past it on the downloads page.
Once installed, the crawler will open and look like the below.
How To Use Greenflare
Now that you have it installed, you will want to adjust some settings to run the crawl the way you need it.
At the top of the screen you will see 4 tabs: Crawl, Settings, Exclusion and Extractions.
This is the screen where the magic happens. Once you complete your custom settings on the other 3 tabs, you will come back here and add your URL to be crawled.
IMPORTANT: Be sure to enter your URL exactly as it is. For example, if your site uses www, make sure it is in the URL when you add it.
Once you start the crawl, you will get a popup asking where to store the data from your crawl, so enter your file name and click Save. Then the crawl will start.
Once the crawl begins, you will start seeing the interface populate with data. The crawl was quick for this website, under a minute, but this is a smaller site.
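To see why the exact form matters, here is a quick sketch with Python's urllib.parse (example.com is just a placeholder): the www and non-www versions are different hostnames, so a crawler that starts on the wrong one may immediately hit a redirect instead of your real pages.

```python
from urllib.parse import urlparse

# The www and non-www forms have different hostnames,
# so a crawler treats them as different starting points.
a = urlparse("https://www.example.com/")
b = urlparse("https://example.com/")
print(a.netloc)               # www.example.com
print(b.netloc)               # example.com
print(a.netloc == b.netloc)   # False
```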
The settings page offers several settings that you should review.
- Crawler. These settings control the speed of the crawl; the default of 5 is usually the recommendation. You can also limit the number of URLs crawled if you have a huge site. The User-Agent can be changed, but the default Greenflare agent is recommended.
- HTTP Basic Auth. Enter a username and password if the site is password protected.
- Proxy. A proxy allows you to crawl your website while reducing the risk of your IP address being blocked, because the crawl goes out through anonymous, random IPs rather than yours. This is not required, but it is recommended if you do a lot of crawling.
- On-Page. Choose all or just one; control what you want the crawler to collect from Page Title, Meta Description, H1 or H2.
- Links. Run a crawl of just your external links, or also crawl Canonicals, Pagination or Hreflang.
- Directives. Select from Canonical Tag, Canonical HTTP Header, Meta Robots or X-Robots-Tag
- Robots. Let the crawler know whether to follow the robots.txt rules, follow blocked redirects or check blocked URLs.
- Miscellaneous. Choose whether to crawl Unique Inlinks or Respect Nofollow
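As a sketch of what “following the robots.txt rules” means in practice, Python's standard library can evaluate the same kind of rules a crawler checks. The rules and URLs here are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: block everything under /private/ for all crawlers.
rules = """User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# A crawler that respects robots.txt skips the disallowed path.
print(rp.can_fetch("Greenflare", "https://example.com/private/page"))  # False
print(rp.can_fetch("Greenflare", "https://example.com/blog/post"))     # True
```

If you tell Greenflare to ignore robots.txt, it simply skips this check and crawls everything it can reach.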
This setting allows you to exclude specific URLs from being crawled. For example, if you have a shop and a blog, maybe you only want to crawl the blog right now.
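Here is a sketch of that kind of exclusion as a regular-expression filter. The URLs are hypothetical, and Greenflare's own pattern syntax may differ; this just shows the idea of dropping shop URLs so only the blog gets crawled:

```python
import re

# Exclude any URL whose path starts with /shop/ (made-up URLs).
exclude = re.compile(r"^https://example\.com/shop/")

urls = [
    "https://example.com/blog/first-post",
    "https://example.com/shop/item-1",
    "https://example.com/blog/second-post",
]
to_crawl = [u for u in urls if not exclude.match(u)]
print(to_crawl)  # only the blog URLs remain
```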
A more advanced feature that allows you to enter queries or patterns to have Greenflare crawl specific items on the website.
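To illustrate the kind of targeted extraction this enables, here is a small sketch using Python's built-in HTML parser to pull every H1 from a page. The sample HTML is made up, and Greenflare's extractions are configured in its interface rather than in code; this just shows what “extract a specific item” looks like under the hood:

```python
from html.parser import HTMLParser

# Collect the text of every <h1> on a page (illustration only).
class H1Extractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        if self.in_h1 and data.strip():
            self.headings.append(data.strip())

extractor = H1Extractor()
extractor.feed("<html><body><h1>Welcome</h1><p>Intro</p><h1>About</h1></body></html>")
print(extractor.headings)  # ['Welcome', 'About']
```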
Greenflare was started by developer Benjamin Görler of Germany. As a self-proclaimed technical guy, he just wanted to create something to give back to the community.