
Elasticsearch and Ruby on Rails: Full-Text Search with Autocompletion


Ihor T.
RoR Developer

In this article, we will learn how to use Elasticsearch along with Ruby on Rails by creating a very simple API that provides search capabilities, including full-text search with autocomplete and filtering. In addition, I would like to dockerize all the services we use so that we don’t spend a lot of time configuring and installing them, but can focus more on the development itself.

To summarize, by the end of this article, we will have an application with the following search features:

  • Dockerized Rails application with Elasticsearch service running using docker-compose.
  • Ability to make a full-text query with the autocomplete feature.
  • Ability to filter records by certain criteria.
  • Access to all search functionality through the Rails API.

All code from the article is available in my GitHub repository here.

Now let’s read a little theory in order to better understand what we are doing. There won’t be much, I promise.

What is Elasticsearch?

Elasticsearch is an extremely fast, open-source, JSON-based search engine. It allows you to store, search, and analyze data in milliseconds. The service is designed for complex search queries and requirements.

Mapping - the process of determining how a document and its fields are stored and indexed.

Indexing - the act of storing data in Elasticsearch. An Elasticsearch cluster can consist of different indexes, which in turn contain different types.

Analysis process - the process of converting text into tokens or terms that are placed in an inverted index. The analysis is performed by an analyzer, which can be of two types: a built-in analyzer or a custom analyzer defined for each index.

Analyzer - a package of three building blocks, each of which modifies the input stream.
The analyzer includes:

  • character filters;
  • tokenizer;
  • token filters;

The document indexing process can be represented as follows (a short hands-on example is shown after the list):

  1. Character filters. First of all, the text goes through one or more character filters. A character filter takes the original text field and transforms the value by adding, removing, or changing characters. For example, it can remove HTML markup from the text. The full list of character filters can be found here.

  2. Tokenizer. The tokenizer then splits the text into tokens, usually words. For example, a text of 5 words is split into an array of 5 tokens. An analyzer can only have one tokenizer. The standard tokenizer is applied by default. It splits text on whitespace and also removes most punctuation characters like dots, commas, semicolons, etc. You can find a list of all available tokenizers here.

  3. Token filters. Token filters are similar to character filters. The main difference is that token filters operate on a stream of tokens, while character filters operate on a stream of characters. There are various token filters. The lowercase token filter is the simplest. A complete list of all available token filters can be found here.
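
To see this pipeline in action, here is a minimal sketch that runs a sentence through the standard tokenizer and the lowercase token filter using the _analyze API. It assumes the elasticsearch Ruby gem and a node listening on localhost:9200 (we set both up later in the article):

require 'elasticsearch'

# Assumes an Elasticsearch node is reachable on localhost:9200.
client = Elasticsearch::Client.new(host: 'http://localhost:9200')

# The _analyze API runs text through a tokenizer and token filters
# and returns the tokens that would end up in the inverted index.
response = client.indices.analyze(
  body: {
    tokenizer: 'standard',
    filter: ['lowercase'],
    text: 'I love Elasticsearch!'
  }
)

puts response['tokens'].map { |t| t['token'] }.inspect
# => ["i", "love", "elasticsearch"]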

Inverted index. The results of the analysis end up in an inverted index. The purpose of the inverted index is to store text in a structure that allows very fast full-text searches. When we perform a full-text search, we actually query the inverted index, not the documents as they were defined when they were indexed.

Each text field has its own inverted index. An inverted index includes all unique terms that appear in any document covered by the index.

Let’s look at two sentences. The first is "I love Ruby!" and the second is "I love Elasticsearch!".

In the inverted index, they will be stored like this:
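
Term            Documents
i               Document 1, Document 2
love            Document 1, Document 2
ruby            Document 1
elasticsearch   Document 2

Here Document 1 is "I love Ruby!" and Document 2 is "I love Elasticsearch!"; the standard analyzer lowercases the terms and drops the exclamation marks.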

If we search for "Love", we can see that both documents contain this term, because the query goes through the same analysis and ends up as the term "love".

What is Kibana?

Kibana is a tool that mainly allows the visualization of Elasticsearch data. In our case, we will use Kibana to see the effects of our code on Elasticsearch.

Enough theory, now let’s get to the fun part!

Generate a Rails Application

Along with the PostgreSQL database, we will be using Rails in API mode and Ruby version 2.6.3. Of course, you can use any version of Ruby you want, but I would recommend that you follow my setup, otherwise you may run into unexpected problems.

rvm use 2.6.3
rails new elasticsearch_rails --api -T -d postgresql
cd elasticsearch_rails
bundle install

Dockerizing a Rails Application

First of all, we need to add the Dockerfile.dev file to the root of the project.

# Layer 0. Download base ruby image.
FROM ruby:2.6.3-slim

# Layer 1. Updating packages and installing the software needed by the web server. Cleaning up afterwards to reduce the image size.
RUN apt-get update -qq && apt-get install --no-install-recommends -y \
  build-essential libpq-dev && \
  apt-get clean && \
  rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

# Layer 2. Creating an environment variable which is used further in the Dockerfile.
ENV APP_HOME /backend

# Layer 3. Adding config options for bundler.
RUN echo "gem: --no-rdoc --no-ri" > /etc/gemrc

# Layer 4. Creating and specifying the directory in which the application will be placed.
WORKDIR $APP_HOME

# Layer 5. Copying Gemfile and Gemfile.lock.
COPY Gemfile Gemfile.lock ./

# Layer 6. Installing dependencies.
RUN bundle check || bundle install --jobs 20 --retry 5

# Layer 7. Copying full application.
COPY . .

# Layer 8. Make file executable
RUN chmod +x ./dev-docker-entrypoint.sh

# Layer 9. Entrypoint script that prepares the database (create, migrate, seed).
ENTRYPOINT ["./dev-docker-entrypoint.sh"]

# Layer 10. Command to run application.
CMD ["rails", "s", "-p", "3000", "-b", "0.0.0.0"]

Next, we need to add the dev-docker-entrypoint.sh file to the root of the project. We need this so that our database and all migrations are created automatically when we start the project.

#!/bin/sh
set -e

if [ -f tmp/pids/server.pid ]; then
  rm tmp/pids/server.pid
fi

bundle check || bundle install

bundle exec rake db:create
bundle exec rake db:migrate
bundle exec rake db:seed

exec "$@"

And grant this file execution rights, otherwise we may run into a permissions-related error.

chmod +x dev-docker-entrypoint.sh

Don’t forget to add the .dockerignore file to the root of the project. It will speed up and simplify builds by excluding files from the build context that are not needed for the build.

/.bundle

# Ignore all logfiles and tempfiles.
/log/*
/tmp/*
!/log/.keep
!/tmp/.keep

# Ignore uploaded files in development.
/storage/*
!/storage/.keep
.byebug_history

# Ignore master key for decrypting credentials and more.
/config/master.key
coverage
.idea/*
.dockerignore

We are now ready to add a docker-compose.yml file, which contains two services: backend, our Ruby on Rails application, and db, our PostgreSQL database.

version: '3.4'

x-backend:
  &backend
  build:
    context: .
    dockerfile: Dockerfile.dev
  environment:
    RAILS_ENV: development
    DB_USERNAME: postgres
    DB_PASSWORD: secret
    DB_HOST: db
    DB_PORT: 5432
    DB_NAME: es_db
    SECRET_KEY_BASE: STUB
  stdin_open: true
  tty: true
  volumes:
    - .:/backend:rw
    - bundle_cache:/bundle

services:
  backend:
    <<: *backend
    ports:
      - 3000:3000/tcp
    depends_on:
      - db

  db:
    image: postgres:11.2
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: secret
    restart: always
    volumes:
      - postgres:/var/lib/postgresql/data

volumes:
  bundle_cache:
  postgres:

The last thing we want to do is update our config/database.yml so that our database can be properly connected to our server.

# config/database.yml
default: &default
  adapter: postgresql
  encoding: unicode
  username: <%= ENV["DB_USERNAME"] %>
  password: <%= ENV["DB_PASSWORD"] %>
  host: <%= ENV["DB_HOST"] %>
  port: <%= ENV["DB_PORT"] %>
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>

development:
  <<: *default
  database: <%= ENV["DB_NAME"] %>_development

test:
  <<: *default
  database: <%= ENV["DB_NAME"] %>_test

production:
  <<: *default
  database: <%= ENV["DB_NAME"] %>_production

Now it’s time to start the project.

docker-compose up -d

Once all services are up and running, you can hit the http://localhost:3000 URL in your browser and see that your application is actually running.

Creating test data

Before integrating Elasticsearch, let’s create some dummy data so we have something to play with. First, open a bash session inside the Rails application container.

docker-compose exec backend bash

And we can add a database table Article with title and category columns.

rails g model Article title category

We can then update the db/seeds.rb file with test data.

# db/seeds.rb
if ActiveRecord::Base.connection.table_exists? 'articles'
  Article.where(title: 'I love Ruby!', category: 'ruby').first_or_create
  Article.where(title: 'I love Elasticsearch!', category: 'elasticsearch').first_or_create
  Article.where(title: "Why it's always sunny in California?", category: 'other').first_or_create
end

In the code above, we only create 3 entries, but that’s enough to keep things simple.
Our Article table looks like this:

title                                  category
I love Ruby!                           ruby
I love Elasticsearch!                  elasticsearch
Why it’s always sunny in California?   other

The last thing we need to do is restart the Rails application.

docker-compose stop
docker-compose up -d

Adding Elasticsearch and Kibana to docker-compose

Our docker-compose.yml with Kibana and Elasticsearch services should look like this:

version: '3.4'

x-backend:
  &backend
  build:
    context: .
    dockerfile: Dockerfile.dev
  environment:
    RAILS_ENV: development
    DB_USERNAME: postgres
    DB_PASSWORD: secret
    DB_HOST: db
    DB_PORT: 5432
    DB_NAME: es_db
    SECRET_KEY_BASE: STUB
  stdin_open: true
  tty: true
  volumes:
    - .:/backend:rw
    - bundle_cache:/bundle

services:
  backend:
    <<: *backend
    ports:
      - 3000:3000/tcp
    depends_on:
      - db
      - es

  db:
    image: postgres:11.2
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: secret
    restart: always
    volumes:
      - postgres:/var/lib/postgresql/data

  es:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.5.0
    environment:
      - node.name=es
      - cluster.name=cluster
      - discovery.seed_hosts=es
      - bootstrap.memory_lock=true
      - discovery.type=single-node
      - ES_JAVA_OPTS=-Xms512m -Xmx512m
      - http.port=9200
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - es:/usr/share/elasticsearch/data
    restart: always
    ports:
      - 9200:9200

  kibana:
    image: docker.elastic.co/kibana/kibana:7.5.0
    environment:
      - ELASTICSEARCH_HOSTS=http://es:9200
    depends_on:
      - es
    ports:
      - 5601:5601

volumes:
  bundle_cache:
  postgres:
  es:

To test that all services are running, we need to restart docker-compose again.

docker-compose stop
docker-compose up -d

And go to the following links to check that everything is up: Elasticsearch at http://localhost:9200 and Kibana at http://localhost:5601.

Integrating Elasticsearch with Ruby on Rails

To integrate Elasticsearch into a Rails application, we need to add two gems to the Gemfile.

# Gemfile
gem 'elasticsearch-model'
gem 'elasticsearch-rails'

We then want to rebuild the backend service to install our gems.

docker-compose stop
docker-compose build backend

We are now ready to add real functionality to the Article model. For this purpose, we will use a so-called concern.

Create a new file app/models/concerns/searchable.rb and add the following code:

# app/models/concerns/searchable.rb
module Searchable
  extend ActiveSupport::Concern

  included do
    include Elasticsearch::Model
    include Elasticsearch::Model::Callbacks
  end
end

And we include Searchable in the Article model:

# app/models/article.rb
class Article < ApplicationRecord
  include Searchable
end

Create a config/initializers/elasticsearch.rb file so that the Rails application knows how to connect to the Elasticsearch service.

# config/initializers/elasticsearch.rb
Elasticsearch::Model.client = Elasticsearch::Client.new(
  port: ENV.fetch('ES_PORT') { 9200 },
  host: ENV.fetch('ES_HOST') { 'http://es' }
)

To recap, at this stage we have done the following:

  • Using the Elasticsearch::Model module - we add Elasticsearch integration to the model.
  • With Elasticsearch::Model::Callbacks - we add callbacks. Why is it important? Each time an object is saved, updated, or deleted, the corresponding indexed data is also updated accordingly.
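
Once the services are up again (docker-compose up -d), you can quickly check from the Rails console that the application can actually reach Elasticsearch. A small sketch (ping simply returns true or false):

docker-compose exec backend rails c # open rails console
--------------------------
Elasticsearch::Model.client.ping            # => true when the es service is reachable
Elasticsearch::Model.client.cluster.health  # cluster name, status, number of nodes, etc.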

The last thing we need to do is index the Article model.

Open the Rails console and run:

docker-compose up -d # start all services in the background
docker-compose exec backend rails c # open rails console
--------------------------
Article.import(force: true) # index Article model

force: true recreates the index: if it already exists, it is deleted first and then built again before the records are imported.

To check if an index has been built, open the Kibana dev tools at http://localhost:5601 (the Dev Tools section) and paste:

GET _cat/indices?v

As you can see, we have created an articles index.

Since the index was created automatically, the default configuration was applied to all fields.
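
You can also inspect the mapping Elasticsearch generated for us from the Rails console. A small sketch (the exact output depends on the Elasticsearch version, but it will look roughly like this):

Article.__elasticsearch__.client.indices.get_mapping(index: 'articles')
# => {"articles"=>{"mappings"=>{"properties"=>{
#      "id"=>{"type"=>"long"},
#      "title"=>{"type"=>"text", "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}},
#      "category"=>{"type"=>"text", "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}},
#      "created_at"=>{"type"=>"date"},
#      "updated_at"=>{"type"=>"date"}}}}}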

Now do a search. Open Kibana developer tools and paste:

GET articles/_search
{
  "query": {
    "match_all": {}
  }
}

You can find more information about Elasticsearch Query DSL here.

We are interested in the hits attribute of the JSON response, and especially its _source attribute. As you can see, all columns of our model are serialized, such as id, title, category, created_at, and updated_at. But we don’t want fields like id or updated_at to be indexed. We want our search to work only on the title and category fields, allowing us to use full-text search with autocomplete for the title field, as well as filter records by the category field.

We can also make a test request through the Rails application. Open the Rails console and paste:

docker-compose exec backend rails c # open rails console
--------------------------
results = Article.search('love')
results.map(&:title) # ["I love Ruby!", "I love Elasticsearch!"]
--------------------------
results = Article.search('ruby')
results.map(&:title) # ["I love Ruby!"]
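
Because we included Elasticsearch::Model::Callbacks, the index is kept in sync automatically. A quick sketch from the same console ("I love Kibana!" is just a throwaway record; note that a new document becomes searchable only after Elasticsearch refreshes the index, which happens about once a second by default):

article = Article.create(title: 'I love Kibana!', category: 'other')
# after a moment for the index refresh...
Article.search('kibana').map(&:title) # ["I love Kibana!"]

article.destroy # the delete callback removes the document from the index as well
Article.search('kibana').map(&:title) # []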

Creating Elasticsearch Autocomplete and Filtering

First, we need to delete the previous index. Open the Rails console:

docker-compose exec backend rails c # open rails console
--------------------------
Article.__elasticsearch__.delete_index! # remove index

Then edit the app/models/concerns/searchable.rb file to look like this:

# app/models/concerns/searchable.rb
module Searchable
  extend ActiveSupport::Concern

  included do
    include Elasticsearch::Model
    include Elasticsearch::Model::Callbacks

    # Every time our entry is created, updated, or deleted, we update the index accordingly.
    after_commit on: %i[create update] do
      __elasticsearch__.index_document
    end

    after_commit on: %i[destroy] do
      __elasticsearch__.delete_document
    end

    # We serialize our model's attributes to JSON, including only the title and category fields.
    def as_indexed_json(_options = {})
      as_json(only: %i[title category])
    end

    # Here we define the index configuration
    settings settings_attributes do
      # We apply mappings to the title and category fields.
      mappings dynamic: false do
        # for the title we use our own autocomplete analyzer that we defined below in the settings_attributes method.
        indexes :title, type: :text, analyzer: :autocomplete
        # the category must be of the keyword type since we're only going to use it to filter articles.
        indexes :category, type: :keyword
      end
    end

    def self.search(query, filters)
      # lambda function adds conditions to the search definition.
      set_filters = lambda do |context_type, filter|
        @search_definition[:query][:bool][context_type] |= [filter]
      end

      @search_definition = {
        # we indicate that there should be no more than 5 documents to return.
        size: 5,
        # we define an empty query with the ability to dynamically change the definition
        # Query DSL https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html
        query: {
          bool: {
            must: [],
            should: [],
            filter: []
          }
        }
      }

      # match all documents if the query is blank.
      if query.blank?
        set_filters.call(:must, match_all: {})
      else
        set_filters.call(
          :must,
          match: {
            title: {
              query: query,
              # fuzziness means you can make one typo and still match your document.
              fuzziness: 1
            }
          }
        )
      end

      # the system will return only those documents that pass this filter
      set_filters.call(:filter, term: { category: filters[:category] }) if filters[:category].present?

      # and finally we pass the search query to Elasticsearch.
      __elasticsearch__.search(@search_definition)
    end
  end

  class_methods do
    def settings_attributes
      {
        index: {
          analysis: {
            analyzer: {
              # we define a custom analyzer with the name autocomplete.
              autocomplete: {
                # type should be custom for custom analyzers.
                type: :custom,
                # we use a standard tokenizer.
                tokenizer: :standard,
                # we apply two token filters.
                # autocomplete filter is a custom filter that we defined above.
                # and lowercase is a built-in filter.
                filter: %i[lowercase autocomplete]
              }
            },
            filter: {
              # we define a custom token filter with the name autocomplete.

              # Autocomplete filter is of edge_ngram type. The edge_ngram tokenizer divides the text into smaller parts (grams).
              # For example, the word “ruby” will be split into [“ru”, “rub”, “ruby”].

              # edge_ngram is useful when we need to implement autocomplete functionality. However, the so-called "completion suggester" is another way to implement suggestions.
              autocomplete: {
                type: :edge_ngram,
                min_gram: 2,
                max_gram: 25
              }
            }
          }
        }
      }
    end
  end
end

First, let’s apply all recent changes by recreating the index. Open the Rails console and run:

docker-compose exec backend rails c # open rails console
--------------------------
reload! # if the console was already open, you need to reload it to load the latest code changes
Article.import(force: true) # index Article model
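
To see what the autocomplete analyzer does with a title, you can run a word through it with the _analyze API from the same console. A small sketch (articles is the default index name derived from the model):

Article.__elasticsearch__.client.indices.analyze(
  index: 'articles',
  body: { analyzer: 'autocomplete', text: 'Ruby' }
)['tokens'].map { |t| t['token'] }
# => ["ru", "rub", "ruby"]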

It is now time to run the following queries to ensure the project is working correctly. Open the Rails console:

Article.search('love', {}).map(&:title) # ["I love Ruby!", "I love Elasticsearch!"]
Article.search('ruby', {}).map(&:title) # ["I love Ruby!"]

Even if we make a typo, the fuzzy matching still finds the document:

Article.search('rube', {}).map(&:title) # ["I love Ruby!"]

As you may remember, we have one filter defined, which is used to filter articles by category. Let’s see if filtering by category works:

# without filter
Article.search('love', {}).map(&:title) # ["I love Ruby!", "I love Elasticsearch!"]

# with category filter
Article.search('love', { category: 'ruby' }).map(&:title) # ["I love Ruby!"]
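
For reference, this is roughly the search definition that the method above builds and sends to Elasticsearch for the last call (the query "love" filtered by category "ruby"):

{
  size: 5,
  query: {
    bool: {
      must: [{ match: { title: { query: 'love', fuzziness: 1 } } }],
      should: [],
      filter: [{ term: { category: 'ruby' } }]
    }
  }
}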

Creating a search API

We need to add a controller to handle search API requests, so let’s name it SearchController in app/controllers/api/search_controller.rb.

# app/controllers/api/search_controller.rb
module Api
  class SearchController < ApplicationController
    def search
      results = Article.search(search_params[:q], search_params)

      articles = results.map do |r|
        r['_source'].merge('id' => r['_id'])
      end

      render json: { articles: articles }, status: :ok
    end

    private

    def search_params
      params.permit(:q, :category)
    end
  end
end
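
A note on what Article.search returns here: with elasticsearch-model, the result is a response object whose entries wrap the raw Elasticsearch hits, which is why the controller reads r['_source'] and r['_id'] directly and avoids an extra database query. If you ever need full ActiveRecord objects instead, the response also exposes records. A small sketch from the Rails console:

results = Article.search('love', {})
results.map { |r| r['_source'] } # raw document hashes straight from Elasticsearch
results.records.to_a             # the matching Article records loaded from PostgreSQL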

And, of course, we need to update the config/routes.rb file to route requests to the correct controller and action.

# config/routes.rb
Rails.application.routes.draw do
  namespace :api do
    get :search, controller: :search
  end
end

Restart your Rails application.

docker-compose restart backend

Go to http://localhost:3000/api/search?q=love&category=ruby in your browser. You should see the following output.

// GET http://localhost:3000/api/search?q=love&category=ruby
{
  "articles": [
    {
      "title": "I love Ruby!",
      "category": "ruby",
      "id": "1"
    }
  ]
}

That’s all for today, I hope you enjoyed reading my article, see you in the next posts!

Code from the article is here.