Introduction

ahcrawler includes a command line tool.
It is located in the ./bin/ subdirectory.
With it you can

  • list current profiles
  • (re)index a website profile
  • delete data of a profile
  • flush all data of all profiles

It was written to be used in cronjobs and for manual indexing.

Called without parameters, it shows a help text.

[Screenshot: ahcrawler :: cli]

Basic rules
Most commands you will need consist of 3 parameter blocks:

cli.php [action] [for which data] [and which profile]

Each parameter has a short and a long variant; the long variants are more readable.

Actions

--action [name of action] or -a [name of action]

Known actions are:

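  • list - list all existing profiles
  • index - start the crawler to (re)index searchindex or ressources
  • update - start the crawler to retry items of searchindex or ressources that failed in the last run
  • reindex - delete the existing data of a profile and start the indexer (searchindex and ressources in one step)
  • empty - remove existing data of a profile
  • flush - drop data for ALL profiles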

Data

--data [name] or -d [name]

Valid data items are:

  • searchindex - the database of the web content for a website search; this is always the first data item you need to fill
  • ressources - the resources used by your website (links, images, css, js files)
  • search - the search terms entered by your visitors (if you use the search form)
  • all - short for searchindex + ressources

List profiles

With the list action you can find out the ids of your profiles.
You will need these ids for the parameter --profile (or -p) in other actions.

Example:

cli.php --action list

(Re-) Create the index of a website

With the reindex action you can delete existing indexed data and start the indexer. This is the simplest way to update a profile. It handles both data stores in a single step: it deletes and updates

  • searchindex
  • ressources

The --profile parameter defines the profile to handle.

Example:

cli.php --action reindex --profile 1

Remark: On shared hosting with a limited execution time you can split the work into separate calls: by action (empty, then index, then update) and by data item (searchindex and ressources), looping over all profiles.
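
The split described in the remark can be sketched as a small shell script. This is only an illustration under assumptions: the helper name print_split_steps, the AHC_CLI variable, the "php ./bin/cli.php" call and the profile ids are not part of ahcrawler. The function prints one command per line instead of running it, so each step can be started as its own short call.

```shell
#!/bin/sh
# Sketch: print the single steps of a split (re)index run.
# Assumption: AHC_CLI is how cli.php is started on your system; adjust it.
AHC_CLI="${AHC_CLI:-php ./bin/cli.php}"

# print_split_steps <profile id>...   (hypothetical helper name)
# Prints one command per line: per profile and data item: empty, index, update.
print_split_steps() {
    for profile in "$@"; do
        # searchindex comes first: it is the first data item that must be filled
        for data in searchindex ressources; do
            for action in empty index update; do
                echo "$AHC_CLI --action $action --data $data --profile $profile"
            done
        done
    done
}

# Example with assumed profile ids 1 and 2 (take real ids from "--action list"):
# print_split_steps 1 2
```

Each printed line is one short-running call; on a host with a strict time limit they could be scheduled as separate jobs.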

Create index of a website

With the index action you can start the indexer.
Remark: to delete already indexed data you need to call the "empty" action (see below).

The --profile parameter defines the profile to handle.
The --data parameter is used to tell what to index.

  • searchindex - the database of the web content for a website search; this is always the first data item you need to fill
  • ressources
  • all (searchindex + ressources)

Example:

cli.php --action index --data all --profile 1
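
The example above uses the long parameter names. As a sketch, the mapping between the short variants (-a, -d, -p) and the long ones can be written as a small shell function. The helper name expand_opts is hypothetical and only prints text; it does not call cli.php.

```shell
#!/bin/sh
# Sketch: expand the short parameters to their long variants.
# expand_opts is a hypothetical helper for illustration; it only prints text.
expand_opts() {
    out=""
    for arg in "$@"; do
        case "$arg" in
            -a) arg="--action" ;;
            -d) arg="--data" ;;
            -p) arg="--profile" ;;
        esac
        # append with a single space between the words
        out="${out:+$out }$arg"
    done
    printf '%s\n' "$out"
}

expand_opts -a index -d all -p 1
# prints: --action index --data all --profile 1
```

So "cli.php -a index -d all -p 1" is the short form of the example call.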

Remark:
If the website was crawled before, you may want to delete the data of a single profile first (action empty) or flush all indexed content of all profiles (action flush).

Update index of a website

With the update action you can start the indexer to recheck all items that failed in the last run and have an error status.
Use the update command only after a full index of a website profile; it can be repeated.

The --profile parameter defines the profile to handle.
The --data parameter is used to tell what to index.

  • searchindex
  • ressources

Example:

cli.php --action update --data ressources --profile 1
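
A full index followed by repeated update runs could be scripted like this. It is a minimal sketch under assumptions: the names run_cli, index_then_update, AHC_CLI and AHC_DRY_RUN are made up for this example, and the number of update passes (2 by default) is an arbitrary choice. With AHC_DRY_RUN=1 the commands are printed instead of executed.

```shell
#!/bin/sh
# Sketch: one full index run, then a few update passes for failed items.
# Assumption: AHC_CLI is how cli.php is started on your system; adjust it.
AHC_CLI="${AHC_CLI:-php ./bin/cli.php}"

# run_cli <parameters>...   (hypothetical wrapper)
# Prints the command if AHC_DRY_RUN is set and not empty, otherwise runs it.
run_cli() {
    ${AHC_DRY_RUN:+echo} $AHC_CLI "$@"
}

# index_then_update <profile id> [update passes]   (hypothetical helper)
index_then_update() {
    profile="$1"
    passes="${2:-2}"
    run_cli --action index --data all --profile "$profile"
    i=1
    while [ "$i" -le "$passes" ]; do
        # update is called per data item: searchindex first, then ressources
        run_cli --action update --data searchindex --profile "$profile"
        run_cli --action update --data ressources --profile "$profile"
        i=$((i + 1))
    done
}

# Example (profile id 1 is an assumption):
# AHC_DRY_RUN=1 index_then_update 1
```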

Empty data of a website profile

With the empty action you can delete all entries of the given profile id. This command initiates a DELETE in the database table(s) for all items with the given profile id.

The --profile parameter defines the profile to handle.
The --data parameter is used to tell what to delete.

  • searchindex
  • ressources
  • all (searchindex + ressources)
  • search - be careful: in most cases you do not want this
  • full (searchindex + ressources + search) - be careful: in most cases you do not want this

Example:

cli.php --action empty --data searchindex --profile 1
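
Because emptying search or full also removes the search terms of your visitors, a wrapper script can ask for an explicit confirmation for these two data items. The following is a sketch, not a feature of ahcrawler: empty_profile, run_cli, AHC_CLI, AHC_DRY_RUN and the --yes argument are all assumptions of this example.

```shell
#!/bin/sh
# Sketch: guard the empty action against deleting visitor search terms by accident.
# Assumption: AHC_CLI is how cli.php is started on your system; adjust it.
AHC_CLI="${AHC_CLI:-php ./bin/cli.php}"

# run_cli <parameters>...   (hypothetical wrapper)
# Prints the command if AHC_DRY_RUN is set and not empty, otherwise runs it.
run_cli() {
    ${AHC_DRY_RUN:+echo} $AHC_CLI "$@"
}

# empty_profile <data item> <profile id> [--yes]   (hypothetical helper)
# "--yes" is an argument of this wrapper only; it is never passed to cli.php.
empty_profile() {
    data="$1"
    profile="$2"
    confirm="${3:-}"
    case "$data" in
        searchindex|ressources|all)
            ;;
        search|full)
            if [ "$confirm" != "--yes" ]; then
                echo "refusing to empty '$data' without --yes" >&2
                return 1
            fi
            ;;
        *)
            echo "unknown data item: $data" >&2
            return 2
            ;;
    esac
    run_cli --action empty --data "$data" --profile "$profile"
}

# Example (profile id 1 is an assumption):
# AHC_DRY_RUN=1 empty_profile searchindex 1
```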

Flush data of all website profiles

With the flush action you can delete all data of all profiles. This command initiates a DROP TABLE command in the database.
Use the flush command if you have created a search index and a ressources scan and want to rebuild them from scratch.

The --profile parameter is not needed - dropping tables affects all profiles.
The --data parameter is used to tell what to delete.

  • searchindex
  • ressources
  • all (searchindex + ressources)
  • search - be careful: in most cases you do not want this
  • full (searchindex + ressources + search) - be careful: in most cases you do not want this

Example:

cli.php --action flush --data all
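
A rebuild from scratch, a flush followed by a fresh index of every profile, might look like the sketch below. The names rebuild_all, run_cli, AHC_CLI and AHC_DRY_RUN are assumptions of this example, and the profile ids have to be taken from the list action. With AHC_DRY_RUN=1 the commands are only printed.

```shell
#!/bin/sh
# Sketch: drop searchindex and ressources for all profiles, then index each profile again.
# Assumption: AHC_CLI is how cli.php is started on your system; adjust it.
AHC_CLI="${AHC_CLI:-php ./bin/cli.php}"

# run_cli <parameters>...   (hypothetical wrapper)
# Prints the command if AHC_DRY_RUN is set and not empty, otherwise runs it.
run_cli() {
    ${AHC_DRY_RUN:+echo} $AHC_CLI "$@"
}

# rebuild_all <profile id>...   (hypothetical helper)
rebuild_all() {
    # flush needs no --profile: it affects all profiles
    run_cli --action flush --data all
    for profile in "$@"; do
        run_cli --action index --data all --profile "$profile"
    done
}

# Example (profile ids 1 and 2 are assumptions):
# AHC_DRY_RUN=1 rebuild_all 1 2
```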






Copyright © 2015-2020 Axel Hahn
project page: GitHub (en)
Axel's website (de)