Senior Developer

Apache Solr: boost Drupal search results at query-time

For a recent project, we needed to boost certain terms in the Solr search results produced by the Apache Solr Search module, which integrates Solr with Drupal. If a user arrives at our website from a search engine, we can pick up the terms they originally searched for and then boost any documents containing those terms in our own search results, regardless of what they search for on our site. (For example, if a user searches Google for 'Ski Holidays' and then comes to our site, ski-related items should be more prominent than they would be otherwise.)
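One possible way to pick up those terms, and to fill in the MYMODULE_get_term() placeholder used in step 3 below, is sketched here. It is a minimal sketch, assuming the search engine passes the user's terms in a `q` query-string parameter of the referrer URL (not every engine does this). The function names and the `mymodule_boost_terms` session key are hypothetical, not part of Drupal or the Apache Solr Search module. In Drupal you might call MYMODULE_remember_terms($_SESSION, $_SERVER['HTTP_REFERER'], $_SERVER['HTTP_HOST']) on each page request when a referrer is set.

```php
<?php
/**
 * Hypothetical helper: pull the search terms out of a search-engine referrer
 * URL, assuming the engine puts them in a 'q' query-string parameter.
 * Referrers from our own host are ignored, because Drupal without clean URLs
 * also uses ?q= for its own paths.
 *
 * @return string|null
 *   The search terms, or NULL if none were found.
 */
function MYMODULE_terms_from_referrer($referrer, $own_host) {
  $parts = parse_url($referrer);
  if (empty($parts['host']) || empty($parts['query'])
      || strcasecmp($parts['host'], $own_host) === 0) {
    return NULL;
  }
  // parse_str() URL-decodes values, so 'Ski+Holidays' becomes 'Ski Holidays'.
  parse_str($parts['query'], $params);
  if (!isset($params['q']) || !is_string($params['q'])) {
    return NULL;
  }
  $terms = trim($params['q']);
  return $terms === '' ? NULL : $terms;
}

/**
 * Hypothetical helper: remember the first search-engine terms seen in this
 * session, so later searches on our site keep boosting them.
 */
function MYMODULE_remember_terms(array &$session, $referrer, $own_host) {
  if (!isset($session['mymodule_boost_terms'])) {
    $terms = MYMODULE_terms_from_referrer($referrer, $own_host);
    if ($terms !== NULL) {
      $session['mymodule_boost_terms'] = $terms;
    }
  }
  return isset($session['mymodule_boost_terms']) ? $session['mymodule_boost_terms'] : NULL;
}
```

Keeping only the first terms seen means that later visits from other referrers in the same session don't change what gets boosted; you may prefer to overwrite them instead.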

Boosting terms (or biasing towards certain fields) at index-time is well documented. It was less obvious how to do this at query-time, which is what we needed, since the boost depends on terms picked up during the user's session. Here's how we did it, although if you know of a better way, please do leave a comment below!

  1. Implement hook_apachesolr_modify_query(), which takes two parameters: &$query and $caller. The full code is in step 3.
  2. Let's say the term we want to boost is $term. We want to boost it in every Solr field where it may be found. We could get a list of all the fields in the Solr index, but we only really want the ones that contain text and that correspond to Drupal fields, so use the following helper function, replacing MYMODULE with your module's name. (This is written for Drupal 7; you will need to adapt it if you are using Drupal 6.)
    /**
    * Helper function to map Drupal field names to Solr field names.
    *
    * @param $field_name
    *   The Drupal field name. If omitted, return all field mappings.
    * @param $solr_field_type_filter
    *   An array of the Solr field types that are wanted (e.g. string, text, float,
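    *   integer). Solr fields that aren't in this list will not be returned.
    *
    * @return
    *   If $field_name is given, the matching Solr field name (or NULL if it
    *   isn't indexed); otherwise an array of all mappings, keyed by Drupal
    *   field name.
    */
    function _MYMODULE_get_solr_field_name($field_name = NULL, $solr_field_type_filter = array()) {
      // drupal_static() is Drupal 7's per-request static cache.
      $map = &drupal_static(__FUNCTION__ . ':' . implode(',', $solr_field_type_filter));
      if (is_null($map)) {
        // Start from an empty array so that callers can always loop over it.
        $map = array();
        // You may want to restrict this to the entity types you are searching.
        $indexed_fields = apachesolr_entity_fields('node') + apachesolr_entity_fields('user');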
        foreach ($indexed_fields as $index_name => $solr_field) {
          if (isset($solr_field['field']['field_name'])) {
            if (empty($solr_field_type_filter) || in_array($solr_field['index_type'], $solr_field_type_filter)) {
              $map[$solr_field['field']['field_name']] = $index_name;
            }
          }
        }
      }

      if (isset($field_name)) {
        return isset($map[$field_name]) ? $map[$field_name] : NULL;
      }
      return $map;
    }
  3. Use the following code to boost $term across all the relevant Solr fields, replacing MYMODULE with your module's name:
    /**
    * Implements hook_apachesolr_modify_query().
    *
    * Boost terms in searches.
    */
    function MYMODULE_apachesolr_modify_query(&$query, $caller) {
      // @TODO: Replace this with something to set $term to what we want to boost:
      $term = MYMODULE_get_term();

      $boost = 10;  // You may want to change this boost value.
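      if (!empty($term) && !empty($boost)) {
        // Quote the term as a phrase, escaping backslashes and double quotes.
        // Otherwise, for a multi-word term such as "Ski Holidays", only the
        // first word would be searched in each field.
        $term = '"' . str_replace(array('\\', '"'), array('\\\\', '\\"'), $term) . '"';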
        if (!isset($query->params['bq'])) {
          $query->params['bq'] = array();
        }
        $fields = _MYMODULE_get_solr_field_name(NULL, array('text', 'string'));
        // Boost term in all of the available fields to be searched, plus
        // taxonomy_names which seems to have to be handled as a special case.
        foreach ($fields as $f) {
          $query->params['bq'][] = $f . ':' . $term . '^' . $boost;
        }
        // The following don't map to Drupal fields, but other information:
        $query->params['bq'][] = 'entity:' . $term . '^' . $boost;
        $query->params['bq'][] = 'bundle_name:' . $term . '^' . $boost;
        $query->params['bq'][] = 'language:' . $term . '^' . $boost;
        $query->params['bq'][] = 'title:' . $term . '^' . $boost;
        $query->params['bq'][] = 'path_alias:' . $term . '^' . $boost;
        $query->params['bq'][] = 'taxonomy_names:' . $term . '^' . $boost;

        // HTML tags get their own fields too, so boost these:
        $tags_to_boost = array_unique(variable_get('apachesolr_tags_to_index', array(
          'h1' => 'tags_h1',
          'h2' => 'tags_h2_h3',
          'h3' => 'tags_h2_h3',
          'h4' => 'tags_h4_h5_h6',
          'h5' => 'tags_h4_h5_h6',
          'h6' => 'tags_h4_h5_h6',
          'u' => 'tags_inline',
          'b' => 'tags_inline',
          'i' => 'tags_inline',
          'strong' => 'tags_inline',
          'em' => 'tags_inline',
          'a' => 'tags_a'
        )));
        foreach ($tags_to_boost as $tag) {
          $query->params['bq'][] = $tag . ':' . $term . '^' . $boost;
        }
      }
    }
    Let's analyse what's going on in this code. First, we get hold of the term and the amount to boost it by, and if both are non-empty, our code gets to work. The bq (boost query) parameter is an array in which each item boosts the term in one field. Each item looks something like this: im_39_field_my_field:my_term^10, where im_39_field_my_field is the Solr field and my_term is the term being boosted. This tells Solr to boost documents in which 'my_term' occurs in 'my field', with a weight of 10. Boost queries only affect ranking: they add to the score of matching documents but don't change which documents the user's search returns.
    Importantly, we then also add the Solr fields that don't correspond to Drupal fields, such as title and taxonomy_names (a catch-all field containing the names of all indexed taxonomy terms), as well as the fields the module creates for HTML tags such as h1 and strong. Without these, the term won't be boosted if it appears in the title or is also a taxonomy term name (highly likely if you tag your documents).
  4. Since the $query object is passed in by reference, we don't return anything; we've modified the query directly.
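To try the clause-building logic from step 3 outside Drupal, here is a minimal standalone sketch of it. MYMODULE_build_boost_queries() is a hypothetical helper, not part of the Apache Solr Search module. It takes a plain list of Solr field names, quotes the term as a phrase (escaping backslashes and double quotes) so that multi-word terms stay within each field, and skips duplicate field names such as the shared tags_inline field.

```php
<?php
/**
 * Hypothetical helper: build one Solr boost query (bq) clause per field,
 * e.g. title:"Ski Holidays"^10.
 *
 * @param array $solr_fields
 *   Solr field names to boost the term in; duplicates are ignored.
 * @param string $term
 *   The term to boost; it is quoted as a phrase.
 * @param int|float $boost
 *   The boost weight appended after '^'.
 */
function MYMODULE_build_boost_queries(array $solr_fields, $term, $boost) {
  // Escape backslashes first, then double quotes, and wrap in quotes.
  $phrase = '"' . str_replace(array('\\', '"'), array('\\\\', '\\"'), $term) . '"';
  $clauses = array();
  // Several HTML tags share one Solr field (e.g. tags_inline), so de-duplicate.
  foreach (array_unique($solr_fields) as $field) {
    $clauses[] = $field . ':' . $phrase . '^' . $boost;
  }
  return $clauses;
}

// Example: two fields, one of them listed twice.
print_r(MYMODULE_build_boost_queries(array('title', 'tags_inline', 'tags_inline'), 'Ski Holidays', 10));
```

Quoting the term as a phrase means a document only gets the boost when the words appear together; if you would rather boost each word on its own, build one clause per word instead.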

And you're done! Search results containing $term will now rank higher than they would otherwise, which got me quite excited :-) Do read the Solr documentation on boosting, as your use case may differ significantly. You could quite easily adapt these steps to boost any Solr field with certain values, but be aware that if the boost doesn't depend on anything about the user's session or request (that is, it would always apply), you probably want to boost at index-time instead.


Edit: Added further Solr fields