Advanced Examples

Advanced examples of how to use check_site_simple, including some tips for tuning.

For brevity, some parameters such as --url are replaced by an ellipsis (...) in the examples below.

Crawl a whole site

$ ./check_site_simple ... --crawl

This crawls the whole site and raises an alert if at least one page could not be retrieved successfully or a broken link was found.

Example Output (text) if ok:

CHECK_SITE OK - 657 pages checked

Example Output (text) if errors were found:

CHECK_SITE CRITICAL - 657 pages checked, 2 pages with error
https://example.com/www/ (ERROR: 404 Not Found resources/price-list_2017_09.pdf)
https://example.com/ (ERROR: 403 Access Denied  internal/access-list.php)

To get rid of the alert for the internal access-list page, you would probably remove the link to that internal page from the site. If you want to keep the link for whatever reason, you can instead exclude this area from the check:

$ ./check_site_simple ... --crawl --ignore="/internal/"

Given the above example, this lowers the number of checked pages, because the excluded area is no longer crawled, and reduces the number of errors found by one:

CHECK_SITE CRITICAL - 646 pages checked, 1 page with error
https://example.com/www/ (ERROR: 404 Not Found resources/price-list_2017_09.pdf)
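
If the text output is post-processed by a script, the summary line can be split into its parts. The following is a minimal sketch in shell, assuming the first output line always has the form shown in the examples above. The function name parse_summary is hypothetical and not part of check_site_simple.

```shell
#!/bin/sh
# Hypothetical helper (not part of check_site_simple): split a summary line
# such as "CHECK_SITE CRITICAL - 657 pages checked, 2 pages with error"
# into "STATUS PAGES ERRORS". Assumes the format from the example output.
parse_summary() {
    line=$1
    # The second word is the status (OK, CRITICAL, ...).
    status=$(printf '%s\n' "$line" | awk '{print $2}')
    # The number directly before "pages checked".
    pages=$(printf '%s\n' "$line" |
        sed -n 's/.* - \([0-9][0-9]*\) pages checked.*/\1/p')
    # The number before "page(s) with error"; missing in the OK case.
    errors=$(printf '%s\n' "$line" |
        sed -n 's/.*, \([0-9][0-9]*\) pages\{0,1\} with error.*/\1/p')
    printf '%s %s %s\n' "$status" "$pages" "${errors:-0}"
}

# Possible use, with the elided parameters left as in the examples:
#   parse_summary "$(./check_site_simple ... --crawl | head -n 1)"
```

The error count defaults to 0 when the line carries no error part, so the OK case yields three fields as well.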

Make sure that something does show up on a site

$ ./check_site_simple ... --crawl --contains="My Company Ltd." --contains="office@my.com"

This crawls the whole site and checks that your company's name and contact data (email address) appear on each page. Any page that does not meet this requirement is listed.

Make sure that something does not (re)appear on a site

The following will also check that some old company name is not present anymore on your site:

$ ./check_site_simple ... \
    --crawl \
    --contains    "My Company Ltd." \
    --no-contains "Old Company Name"

The same syntax can be used to catch common typos in the names of persons, organizations, products and the like. On our own site, for example, we check for "plugns" and "check_side".
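
When the list of unwanted strings grows, it can be kept in one place and expanded into options. The following is a minimal sketch in POSIX shell, assuming that --no-contains may be given several times, as the phone-number example in this document does. The variable name typos is hypothetical.

```shell
#!/bin/sh
# Strings that must never appear, one per line (taken from the text above).
typos='plugns
check_side'

# Collect the options in the positional parameters so that entries
# containing spaces stay intact as single arguments.
set --
while IFS= read -r typo; do
    # Skip empty lines; add one --no-contains option per entry.
    [ -n "$typo" ] && set -- "$@" --no-contains "$typo"
done <<EOF
$typos
EOF

# Show the resulting options, one argument per line.
printf '%s\n' "$@"

# Possible use, with the elided parameters left as in the examples:
#   ./check_site_simple ... --crawl "$@"
```

Passing "$@" in double quotes hands each string to the check as one argument, even if it contains spaces.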

New phone-numbers

To make sure that old contact data, such as a former phone number, has been removed from every single page and that the new data shows up everywhere, use this check:

$ ./check_site_simple ... \
    --crawl \
    --contains    "My Company Ltd." \
    --no-contains "Old Company Name" \
    --contains    "+43-1-123456" \
    --no-contains "0664-234567"

Since this check runs on a regular basis, it also ensures that outdated contact data does not reappear by accident in the future (out of habit, from snippets or templates that have not been updated yet, and the like).
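
A script that runs the check regularly may want to react to the result without parsing the text. The following is a minimal sketch, assuming that check_site_simple follows the usual Nagios plugin exit code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); this document does not confirm that. The function name state_name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: map a plugin exit code to its state name.
# Assumption: check_site_simple uses the Nagios plugin convention
# 0 = OK, 1 = WARNING, 2 = CRITICAL, anything else = UNKNOWN.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Possible use, with the elided parameters left as in the examples:
#   ./check_site_simple ... --crawl
#   echo "site check finished with state $(state_name "$?")"
```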