Performing queries in .NET with NEST


First of all, I created a document model in C# named EsOrganisation with some basic fields:

    [ElasticsearchType(Name = "organisation")]
    public class EsOrganisation
    {
        public Guid Id { get; set; }
        public DateTimeOffset CreatedDate { get; set; }
        public DateTimeOffset? UpdatedDate { get; set; }
        public int OrganisationTypeId { get; set; }
        public string OrganisationName { get; set; }
        public List<string> OrganisationAliases { get; set; }
        public List<string> OrganisationKeywords { get; set; }
        public List<int> Products { get; set; }
    }

I also created a factory that returns a Nest.ElasticClient. To keep things simple, just bear in mind that whenever I call client.SearchAsync() the client has already been instantiated and configured.
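The factory itself isn't shown in the post. As a rough sketch, assuming NEST 6.x and a single Elasticsearch node, it could look like the following; the class name ElasticClientFactory, the node URI and the default index name are hypothetical. It caches one client because ElasticClient is thread-safe and intended to be reused.

```csharp
using System;
using Elasticsearch.Net;
using Nest;

// Hypothetical factory: the original post does not show its implementation.
public class ElasticClientFactory
{
    private readonly Lazy<ElasticClient> _client;

    public ElasticClientFactory(Uri node, string defaultIndex)
    {
        // Lazy<T> is thread-safe by default, so the client is built only once.
        _client = new Lazy<ElasticClient>(() =>
        {
            var settings = new ConnectionSettings(new SingleNodeConnectionPool(node))
                // Index used when a request such as SearchAsync<EsOrganisation>() doesn't name one.
                .DefaultIndex(defaultIndex);
            return new ElasticClient(settings);
        });
    }

    // ElasticClient is thread-safe, so one shared instance can serve every request.
    public ElasticClient GetClient() => _client.Value;
}
```

Creating the client doesn't contact the cluster; the first request does. Usage would be along the lines of `var client = new ElasticClientFactory(new Uri("http://localhost:9200"), "organisations").GetClient();`.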

Structured vs Unstructured Search

Structured vs. unstructured search refers to how the filters are applied. Structured search deals with data such as dates, times or numbers, which are matched against a range or an exact value: a document either matches or it doesn't, there is no partial match. Strings can be structured too, like the labels on a post: either the post has the label or it doesn't. Unstructured search, on the other hand, is about partial matches, and that's where the score comes into play to determine how relevant each match is.

Adding pagination

            var skipAmount = 20;
            var takeAmount = 10;
            var q1 = await client.SearchAsync<EsOrganisation>(s => s
                    .From(skipAmount)
                    .Size(takeAmount)
            );
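From() and Size() map directly to Elasticsearch's from/size parameters, so a page number has to be converted first. A minimal sketch, assuming 1-based page numbers (the Paging class is hypothetical), which also guards against the index.max_result_window setting that by default rejects requests where from + size exceeds 10,000:

```csharp
using System;

// Hypothetical helper: converts a 1-based page number into From/Size values.
public static class Paging
{
    // Default of the index.max_result_window index setting; adjust if your index overrides it.
    public const int MaxResultWindow = 10_000;

    public static (int From, int Size) ToFromSize(int page, int pageSize)
    {
        if (page < 1) throw new ArgumentOutOfRangeException(nameof(page), "Pages start at 1.");
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize), "Page size must be positive.");

        // long arithmetic avoids int overflow for very large page numbers.
        long from = (long)(page - 1) * pageSize;
        if (from + pageSize > MaxResultWindow)
            throw new ArgumentException($"from + size must not exceed {MaxResultWindow}.");

        return ((int)from, pageSize);
    }
}
```

Page 3 with a page size of 10 gives From = 20 and Size = 10, the same values as the example above: `var (from, size) = Paging.ToFromSize(3, 10);` and then `.From(from).Size(size)`.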

Filtering by integer fields

            var pageSize = 10; // page size used in the queries below

            // Search for documents that have a certain productId
            var q2 = await client.SearchAsync<EsOrganisation>(s => s.Size(pageSize)
                    .Query(q => q.Term(c => c.Field(p => p.Products).Value(3)))
            );

            // Search for documents whose Products contain any of the productIds 1, 2, 3 or 4
            var q3 = await client.SearchAsync<EsOrganisation>(s => s.Size(pageSize)
                    .Query(q => q.Terms(c => c.Field(p => p.Products).Terms(1, 2, 3, 4)))
            );

            // or
            var myList = new List<int>() {1, 2, 3, 4};
            var q4 = await client.SearchAsync<EsOrganisation>(s => s.Size(pageSize)
                    .Query(q => q.Terms(c => c.Field(p => p.Products).Terms(myList)))
            );

Filtering by dates

            // Date range: year 2017
            var d1 = await client.SearchAsync<EsOrganisation>(sr => sr
                    .Query(q => q.DateRange(r => r
                            .Field(f => f.CreatedDate)
                            .GreaterThanOrEquals(new DateTime(2017, 01, 01))
                            .LessThan(new DateTime(2018, 01, 01))
                    ))
            );
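If the same kind of range is needed in several places, it can be wrapped in small helpers. A sketch assuming NEST 6.x; the DateFilters class is hypothetical and the model is trimmed so the snippet compiles on its own. The second helper uses NEST's DateMath, so the window ("now-30d") is resolved by Elasticsearch when the query runs instead of being computed on the client.

```csharp
using System;
using Elasticsearch.Net;
using Nest;

// Trimmed copy of the EsOrganisation model, so this snippet compiles on its own.
public class EsOrganisation
{
    public DateTimeOffset CreatedDate { get; set; }
}

// Hypothetical helpers that build reusable date filters.
public static class DateFilters
{
    // Whole calendar year: [1 Jan year, 1 Jan year + 1), same shape as the d1 query.
    public static QueryContainer CreatedInYear(int year) =>
        new QueryContainerDescriptor<EsOrganisation>().DateRange(r => r
            .Field(f => f.CreatedDate)
            .GreaterThanOrEquals(new DateTime(year, 1, 1))
            .LessThan(new DateTime(year + 1, 1, 1)));

    // Relative window such as "now-30d", evaluated by Elasticsearch at query time.
    public static QueryContainer CreatedInLastDays(int days) =>
        new QueryContainerDescriptor<EsOrganisation>().DateRange(r => r
            .Field(f => f.CreatedDate)
            .GreaterThanOrEquals(DateMath.Now.Subtract(days + "d")));
}
```

They plug into a search as `sr.Query(q => DateFilters.CreatedInYear(2017))`.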


Filtering strings – Unstructured queries

Unstructured queries allow partial matches, and how well a document matches is counted into its score, which determines which documents rank higher. Match(), Prefix() and MatchPhrasePrefix() are all unstructured queries.

            // Match whole words: by default a document matches if it contains at least one of the words (OR);
            // matching more of them raises the score
            var t1 = await client.SearchAsync<EsOrganisation>(s => s
                    .Query(q => q.Match(m => m.Field(f => f.OrganisationName)
                            .Query("one two three")))
            );
            
            // Starts with: Prefix is a term-level query, so its value is not analysed and is compared
            // with single indexed terms; a multi-word value won't match
            var t3 = await client.SearchAsync<EsOrganisation>(s => s
                    .Query(q => q.Prefix(m => m.Field(f => f.OrganisationName)
                        .Value("one")
                        //.Value("one two") <- finds nothing with a standard analyser: no indexed term contains a space
                        ))
            );

            // Phrase match: the words must appear in this order, and the last word is matched as a prefix
            var t4 = await client.SearchAsync<EsOrganisation>(s => s
                    .Query(q => q.MatchPhrasePrefix(m => m.Field(f => f.OrganisationName)
                        .Query("one two thr")))
            );

            // Slop: how many position moves are allowed, so the words can be further apart or out of order
            var t5 = await client.SearchAsync<EsOrganisation>(s => s
                    .Query(q => q.MatchPhrasePrefix(m => m.Field(f => f.OrganisationName)
                        .Slop(5)
                        .Query("three one two")))
            );

            // MaxExpansions limits how many terms the last word ("three") may expand to (default 50).
            // It is not a result limit like Size(); a lower value makes the query cheaper but may miss matches.
            var t6 = await client.SearchAsync<EsOrganisation>(s => s
                    .Query(q => q.MatchPhrasePrefix(m => m.Field(f => f.OrganisationName)
                        .MaxExpansions(10)
                        .Query("one two three")))
            );
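Match() can also be made stricter than its default OR behaviour. A sketch assuming NEST 6.x (MatchExamples is a hypothetical class and the model is trimmed): Operator(Operator.And) requires every word, while MinimumShouldMatch() sets how many of the words must be present.

```csharp
using System;
using Elasticsearch.Net;
using Nest;

// Trimmed copy of the EsOrganisation model, so this snippet compiles on its own.
public class EsOrganisation
{
    public string OrganisationName { get; set; }
}

// Hypothetical examples of tuning how many words a match query needs.
public static class MatchExamples
{
    // Default behaviour: any of the words may match (OR); more matching words => higher score.
    public static QueryContainer AnyWord(string text) =>
        new QueryContainerDescriptor<EsOrganisation>()
            .Match(m => m.Field(f => f.OrganisationName).Query(text));

    // Every word must be present, in any order.
    public static QueryContainer AllWords(string text) =>
        new QueryContainerDescriptor<EsOrganisation>()
            .Match(m => m.Field(f => f.OrganisationName).Query(text).Operator(Operator.And));

    // Middle ground: at least two of the words must be present.
    public static QueryContainer AtLeastTwoWords(string text) =>
        new QueryContainerDescriptor<EsOrganisation>()
            .Match(m => m.Field(f => f.OrganisationName).Query(text).MinimumShouldMatch(2));
}
```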

Boolean queries

Boolean queries combine more than one criterion using AND, OR and NOT operators.

When creating Boolean queries we can also add filters. A filter works like a Must() clause except that it doesn't contribute to the score, so scoring is cheaper and Elasticsearch can cache frequently used filters. So put structured conditions into a Filter() and unstructured ones into a Must(), where they contribute to the score.

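Operators
&&: AND (criteria combined as Must clauses)
||: OR (criteria combined as Should clauses)
!: NOT (the criterion becomes a MustNot clause)
+: filter; the criterion becomes a Filter clause and is not used to calculate the score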

Using ANDs and ORs inside a Query with operators:

            var s4 = await client.SearchAsync<EsOrganisation>(sr => sr
                    .Query(q => +q.Terms(c => c.Field(p => p.Products).Terms(products)) && (
                                q.Match(m => m.Field(f => f.OrganisationName).Query(query)) ||
                                q.Match(m => m.Field(f => f.OrganisationAliases).Query(query))) && 
                                !q.Match(m => m.Field(f => f.OrganisationKeywords).Query(query))
                    )
            );
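Under the hood these operators produce a bool query. Spelling the same logic out with NEST's object initializer syntax can help when reading the generated JSON. A sketch assuming NEST 6.x (OperatorEquivalent is hypothetical and the model is trimmed); NEST may arrange the clauses of the operator version slightly differently, but the matching logic should be the same.

```csharp
using System.Collections.Generic;
using System.Linq;
using Nest;

// Trimmed copy of the EsOrganisation model, so this snippet compiles on its own.
public class EsOrganisation
{
    public string OrganisationName { get; set; }
    public List<string> OrganisationAliases { get; set; }
    public List<string> OrganisationKeywords { get; set; }
    public List<int> Products { get; set; }
}

// Hypothetical: +products && (name || aliases) && !keywords, written as an explicit bool query.
public static class OperatorEquivalent
{
    public static BoolQuery Build(IEnumerable<int> products, string query) => new BoolQuery
    {
        // +terms -> filter clause: restricts results, no scoring
        Filter = new QueryContainer[]
        {
            new TermsQuery { Field = Infer.Field<EsOrganisation>(p => p.Products), Terms = products.Cast<object>() }
        },
        // (name || aliases) -> nested bool whose should clauses need at least one match
        Must = new QueryContainer[]
        {
            new BoolQuery
            {
                Should = new QueryContainer[]
                {
                    new MatchQuery { Field = Infer.Field<EsOrganisation>(p => p.OrganisationName), Query = query },
                    new MatchQuery { Field = Infer.Field<EsOrganisation>(p => p.OrganisationAliases), Query = query }
                },
                MinimumShouldMatch = 1
            }
        },
        // !keywords -> must_not clause
        MustNot = new QueryContainer[]
        {
            new MatchQuery { Field = Infer.Field<EsOrganisation>(p => p.OrganisationKeywords), Query = query }
        }
    };
}
```

It can be passed straight to a search: `sr.Query(q => OperatorEquivalent.Build(products, query))`.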


Extracting the search filters

Extracting filters into variables is useful when you want to reuse them or build a query dynamically:

            var productFilter = new QueryContainerDescriptor<EsOrganisation>()
                  .Terms(c => c.Field(p => p.Products).Terms(products));
            var matchNameFilter = new QueryContainerDescriptor<EsOrganisation>()
                  .Match(m => m.Field(f => f.OrganisationName).Query(query));
            var matchAliasFilter = new QueryContainerDescriptor<EsOrganisation>()
                  .Match(m => m.Field(f => f.OrganisationAliases).Query(query));
            var matchKeywordFilter = new QueryContainerDescriptor<EsOrganisation>()
                  .Match(m => m.Field(f => f.OrganisationKeywords).Query(query));

            var s3 = await client.SearchAsync<EsOrganisation>(sr => sr
                  .Query(q => +productFilter && (matchNameFilter || matchAliasFilter) && !matchKeywordFilter)
            );
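The examples below use phrasePrefixNameFilter, phrasePrefixAliasFilter and phrasePrefixKeywordFilter, which aren't defined in the post. Judging by their names and the Boost() example further down, they are MatchPhrasePrefix versions of the filters above. A sketch assuming NEST 6.x, a slop of 2 (taken from the later examples) and a hypothetical PhrasePrefixFilters class:

```csharp
using System;
using System.Collections.Generic;
using Elasticsearch.Net;
using Nest;

// Trimmed copy of the EsOrganisation model, so this snippet compiles on its own.
public class EsOrganisation
{
    public string OrganisationName { get; set; }
    public List<string> OrganisationAliases { get; set; }
    public List<string> OrganisationKeywords { get; set; }
}

// Hypothetical helper returning the three phrase-prefix filters used in the examples.
public static class PhrasePrefixFilters
{
    public static (QueryContainer Name, QueryContainer Alias, QueryContainer Keyword) Build(string query, int slop = 2)
    {
        var name = new QueryContainerDescriptor<EsOrganisation>()
            .MatchPhrasePrefix(m => m.Field(f => f.OrganisationName).Slop(slop).Query(query));
        var alias = new QueryContainerDescriptor<EsOrganisation>()
            .MatchPhrasePrefix(m => m.Field(f => f.OrganisationAliases).Slop(slop).Query(query));
        var keyword = new QueryContainerDescriptor<EsOrganisation>()
            .MatchPhrasePrefix(m => m.Field(f => f.OrganisationKeywords).Slop(slop).Query(query));
        return (name, alias, keyword);
    }
}
```

Deconstructing gives the variable names used below: `var (phrasePrefixNameFilter, phrasePrefixAliasFilter, phrasePrefixKeywordFilter) = PhrasePrefixFilters.Build(query);`.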

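Some examples using extracted filters (phrasePrefixNameFilter, phrasePrefixAliasFilter and phrasePrefixKeywordFilter are MatchPhrasePrefix versions of the name, alias and keyword filters). Note how each method behaves: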

            // This works: productFilter restricts the results without affecting the score
            var b2 = await client.SearchAsync<EsOrganisation>(sr => sr
                    .Query(q => q.Bool(b => b
                        .Must(phrasePrefixNameFilter || phrasePrefixAliasFilter || phrasePrefixKeywordFilter)
                        .Filter(productFilter)
                        ))
            );

            // All three are required as a MUST
            var b3 = await client.SearchAsync<EsOrganisation>(sr => sr
                    .Query(q => q.Bool(b => b
                        // This works as an AND
                        .Must(phrasePrefixNameFilter, phrasePrefixAliasFilter, phrasePrefixKeywordFilter)
                        ))
            );

            // Should clauses work as ORs that add to the score.
            // In a bool with only Should clauses at least one of them has to match anyway; once the bool also has
            // a Must or Filter clause, Should clauses become optional, so MinimumShouldMatch(1) keeps the OR (see b4B).
            // Matching more than one clause should give a higher score (in my tests the difference wasn't noticeable).
            var b4 = await client.SearchAsync<EsOrganisation>(sr => sr
                    .Query(q => q.Bool(b => b
                        .Should(phrasePrefixNameFilter, phrasePrefixAliasFilter, phrasePrefixKeywordFilter)
                        .MinimumShouldMatch(1) //match at least one, then sort by relevancy
                        ))
            );

            // Adding a filter restricts the results without affecting the score
            var b4B = await client.SearchAsync<EsOrganisation>(sr => sr
                    .Query(q => q.Bool(b => b
                        .Should(phrasePrefixNameFilter, phrasePrefixAliasFilter, phrasePrefixKeywordFilter)
                        .MinimumShouldMatch(1)
                        .Filter(productFilter)
                        ))
            );

Building queries dynamically

Now imagine that you can't know how many filters or conditions you'll have until run time, because the query depends on several conditions. Extracting the filters as shown before helps, but you also need to attach the ANDs, ORs, etc. dynamically, and the operators (&&, ||, !) don't help here because you don't even know how many filters you will be attaching.

To achieve that, let's use the Bool query together with arrays of filters. Remember that a Bool query can hold Must and Should clauses, and even other Bool queries. So we can effectively build Bool -> Must AND Must(Bool -> Should OR Should OR Should).

Step 1: A simple array of filters

Let's start by attaching an array of filters that can grow or shrink dynamically.
Note: I'm using the phrase-prefix filters from the examples under “Extracting the search filters”.

            // Add filters to Array of filters
            var listOfFilters = new QueryContainer[] {phrasePrefixNameFilter, phrasePrefixAliasFilter, phrasePrefixKeywordFilter};

            // Create Bool Query as object, set array of filters to the Should property
            var boolQuery1 = new BoolQuery {
                Name = "boolQuery",
                Should = listOfFilters,
                MinimumShouldMatch = 1,
                Filter = new QueryContainer [] { productFilter }
            };

            // This works!
            var b8 = await client.SearchAsync<EsOrganisation>(sr => sr.Query(q => boolQuery1));
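Since the array is just a collection, it can be filled conditionally from whatever inputs are available at run time. A sketch assuming NEST 6.x; ConditionalQuery and its parameters are hypothetical and the model is trimmed. Structured inputs go into Filter and the free-text input into Must, following the filter-vs-must advice above.

```csharp
using System.Collections.Generic;
using System.Linq;
using Nest;

// Trimmed copy of the EsOrganisation model, so this snippet compiles on its own.
public class EsOrganisation
{
    public int OrganisationTypeId { get; set; }
    public string OrganisationName { get; set; }
    public List<int> Products { get; set; }
}

// Hypothetical builder: null or empty inputs simply don't add a clause.
public static class ConditionalQuery
{
    public static BoolQuery Build(int? organisationTypeId, IReadOnlyCollection<int> products, string text)
    {
        var filters = new List<QueryContainer>(); // structured: no scoring
        var musts = new List<QueryContainer>();   // unstructured: scored

        if (organisationTypeId.HasValue)
            filters.Add(new TermQuery { Field = Infer.Field<EsOrganisation>(p => p.OrganisationTypeId), Value = organisationTypeId.Value });

        if (products != null && products.Count > 0)
            filters.Add(new TermsQuery { Field = Infer.Field<EsOrganisation>(p => p.Products), Terms = products.Cast<object>() });

        if (!string.IsNullOrWhiteSpace(text))
            musts.Add(new MatchQuery { Field = Infer.Field<EsOrganisation>(p => p.OrganisationName), Query = text });

        return new BoolQuery { Filter = filters, Must = musts };
    }
}
```

If none of the inputs is set, the bool ends up with no clauses, which should match every document.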

Step 2: A complex array with groups of ANDs and ORs

Let’s take that to another level of complexity, a foreach that will add filters to our query:
Remember: I created a model called EsOrganisation.

            var client = _clientFactory.GetClient();

            var listOfGroups = new List<Func<QueryContainerDescriptor<EsOrganisation>, QueryContainer>>();
            foreach (var rmGroup in groups)
            {
                var filtersList = new List<QueryContainer>();
                foreach (var rmFilter in rmGroup.Filters)
                {
                    // This is a method I call to generate a filter dynamically, just think on the filters above as example of what it generates.
                    var filter = SharedFilters.GetFilter(rmFilter.AggregateName, rmFilter.FieldName, rmFilter.OperationName, rmFilter.Value);
                    filtersList.Add(filter);
                }
                var group = QueryBuilder.BuildGroupAsFuncOf<EsOrganisation>(filtersList.ToArray());
                listOfGroups.Add(group);
            }
            var productFilter = SharedFilters.GetProductsFilter<EsOrganisation>(products);

            var results = await client.SearchAsync<EsOrganisation>(sr => sr
                    .Query(q => q.Bool(b => b
                            .Must(productFilter)
                            .Should(listOfGroups)
                            .MinimumShouldMatch(1)
                    ))
                    // Return only the Id field in _source, which reduces the data sent back (can help performance)
                    .Source(s => s.Includes(f => f.Fields(o => o.Id)))
            );
            var orgs = results.Documents;

        //---------------------------------------
        // This is the function that joins the filters into a group of ANDs (which is a Bool Query with an array of MUSTs)
        public static Func<QueryContainerDescriptor<T>, QueryContainer> BuildGroupAsFuncOf<T>(QueryContainer[] filters) where T : class
        {
            return q => q.Bool(bl => bl.Must(filters));
        }

Some explanation:
Reading the code from the inside out: the inner foreach creates a list of QueryContainers holding the filters of one group; these filters act as ANDs inside the group. Right after the inner foreach, that list is turned into a group of ANDs and added to the list of groups.

Once outside the outer foreach, I build the main Bool query, which puts each group inside a Should (ORs), so the groups are Bool queries themselves. I also add an extra filter to the Must property because, in my case, every dynamic query has at least that one filter. Finally I set MinimumShouldMatch to 1: since the bool also has a Must clause, the Should clauses would otherwise be optional. With that, the query is built.

Why the List of Func<>? Attaching a list of QueryContainers (the filters) to a Bool query worked. But when I tried to join a list of Bool queries together (which is what each group is: a Bool query with the filters inside its Must), it didn't work. I tried different approaches, but Should() accepts either individual QueryContainers (a params array) or a collection of Func<QueryContainerDescriptor<T>, QueryContainer>, not a List<QueryContainer>. So I converted each QueryContainer[] into a Func, which let me pass the list of groups to Should() as a parameter.

As a side note, I ran into the same issue when creating filters and joining them together, so I ended up with this helper method, which converts a BoolQueryDescriptor into a Func:

        public Func<QueryContainerDescriptor<T>, QueryContainer> GetAsFuncOf<T>(BoolQueryDescriptor<T> descriptor) where T : class
        {
            return q => q.Bool(bl => descriptor);
        }
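An alternative that avoids converting the groups into Func<> is NEST's object initializer syntax: there, BoolQuery.Should and BoolQuery.Must are plain IEnumerable<QueryContainer> properties, so a list of Bool queries can be assigned directly. A sketch assuming NEST 6.x; GroupedQuery and its signature are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using Nest;

// Hypothetical builder: each inner collection is one group of ANDs, and the groups are ORed together.
public static class GroupedQuery
{
    public static BoolQuery Build(IEnumerable<IEnumerable<QueryContainer>> groups, QueryContainer productFilter)
    {
        // Wrap every group in its own bool query with the filters as Must clauses (AND).
        var shoulds = groups
            .Select(g => (QueryContainer)new BoolQuery { Must = g.ToList() })
            .ToList();

        return new BoolQuery
        {
            Must = new[] { productFilter }, // required in every dynamic query
            Should = shoulds,               // OR between the groups
            MinimumShouldMatch = 1          // needed because Must makes Should optional
        };
    }
}
```

The result is used like any other query: `sr.Query(q => GroupedQuery.Build(groups, productFilter))`.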

Queries that won’t work

Here are some query attempts that won't work; it's useful to know what you can't do:

            // Calling Query() twice to combine queries -> Fails
            var s4 = await client.SearchAsync<EsOrganisation>(sr => sr
                .Query(q => productFilter) // WARNING: This one is overridden by the second, DON'T DO THIS
                .Query(q => phrasePrefixNameFilter || phrasePrefixAliasFilter || phrasePrefixKeywordFilter)
            );

            // Attempting an OR between Fields -> Fails
            var phrasePrefixInAllFields = new QueryContainerDescriptor<EsOrganisation>()
                .MatchPhrasePrefix(m => m
                    .Field(f => f.OrganisationName)
                    .Field(f => f.OrganisationAliases)
                    // Again, this last Field method overrides the two previous ones, so THIS CAN'T BE DONE
                    .Field(f => f.OrganisationKeywords)
                    .Slop(2)
                    .Query(query)
            );
            var s6 = await client.SearchAsync<EsOrganisation>(sr => sr.Query(q => phrasePrefixInAllFields));

            // WARNING: This won't work, second Must overrides the first!!
            var b1 = await client.SearchAsync<EsOrganisation>(sr => sr
                    .Query(q => q.Bool(b => b
                        .Must(productFilter)
                        .Must(phrasePrefixNameFilter || phrasePrefixAliasFilter || phrasePrefixKeywordFilter)
                        ))
            );

            // This doesn't work as last Should overrides previous ones
            var b6 = await client.SearchAsync<EsOrganisation>(sr => sr
                    .Query(q => q.Bool(b => b
                        .Should(phrasePrefixNameFilter)
                        .Should(phrasePrefixAliasFilter)
                        .Should(phrasePrefixKeywordFilter)
                        .MinimumShouldMatch(1)
                        .Filter(productFilter)
                        ))
            );

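Conclusion: Field(), Must(), Should() and Query() can each be called only once on the same descriptor, because a later call replaces the earlier one. To combine several, use subqueries or, for Must() and Should(), pass all the clauses in a single call (e.g. .Must(a, b, c)).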
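To run one phrase-prefix search over several fields, Elasticsearch's multi_match query takes a list of fields, and NEST exposes it as MultiMatch() with Fields(). A sketch assuming NEST 6.x (MultiFieldSearch is hypothetical, model trimmed). With the phrase_prefix type each field is queried as a match_phrase_prefix, and, as with best_fields, the score comes from the best-matching field.

```csharp
using System;
using System.Collections.Generic;
using Elasticsearch.Net;
using Nest;

// Trimmed copy of the EsOrganisation model, so this snippet compiles on its own.
public class EsOrganisation
{
    public string OrganisationName { get; set; }
    public List<string> OrganisationAliases { get; set; }
    public List<string> OrganisationKeywords { get; set; }
}

// Hypothetical: the working counterpart of the failed phrasePrefixInAllFields attempt.
public static class MultiFieldSearch
{
    public static QueryContainer PhrasePrefixInAllFields(string query) =>
        new QueryContainerDescriptor<EsOrganisation>()
            .MultiMatch(mm => mm
                .Fields(fs => fs                 // all fields in one Fields() call
                    .Field(f => f.OrganisationName)
                    .Field(f => f.OrganisationAliases)
                    .Field(f => f.OrganisationKeywords))
                .Type(TextQueryType.PhrasePrefix) // match_phrase_prefix on each field
                .Slop(2)
                .Query(query));
}
```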

Boosting a field

When performing unstructured queries, you can make some fields more relevant than others: Boost() multiplies the score contributed by that query.

            var phrasePrefixKeywordFilter = new QueryContainerDescriptor<EsOrganisation>()
                .MatchPhrasePrefix(m => m
                .Boost(3) // make this field three times more important when calculating score
                .Field(f => f.OrganisationKeywords)
                .Slop(2)
                .Query(query)
                );
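Boost() on a query boosts the whole clause. Inside a multi_match, individual fields can be boosted instead by passing a boost to Field(), which NEST serialises in the field^boost form. A sketch assuming NEST 6.x; BoostedSearch is hypothetical, the model is trimmed and the weight of 3 is just an example.

```csharp
using System;
using System.Collections.Generic;
using Elasticsearch.Net;
using Nest;

// Trimmed copy of the EsOrganisation model, so this snippet compiles on its own.
public class EsOrganisation
{
    public string OrganisationName { get; set; }
    public List<string> OrganisationAliases { get; set; }
    public List<string> OrganisationKeywords { get; set; }
}

// Hypothetical: one multi-field query where keyword matches count three times as much.
public static class BoostedSearch
{
    public static QueryContainer WeightedPhrasePrefix(string query) =>
        new QueryContainerDescriptor<EsOrganisation>()
            .MultiMatch(mm => mm
                .Fields(fs => fs
                    .Field(f => f.OrganisationName)
                    .Field(f => f.OrganisationAliases)
                    .Field(f => f.OrganisationKeywords, 3)) // per-field boost
                .Type(TextQueryType.PhrasePrefix)
                .Query(query));
}
```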

Deleting documents using Kibana

To delete the entire index, send this request (for example from the Kibana Dev Tools console):

DELETE index-name/

To delete one or more documents, take the search query and replace the _search endpoint with _delete_by_query. Note that _delete_by_query has to be sent with POST rather than GET:

// So instead of this:
GET index-name/eventArticle/_search
{
  "query": {
    "match": {
      "longHeadlines": "Hello world"
    }
  }
}

// You request this:
POST index-name/eventArticle/_delete_by_query
{
  "query": {
    "match": {
      "longHeadlines": "Hello world"
    }
  }
}
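The same delete-by-query can be sent from NEST with DeleteByQueryAsync(). A sketch assuming NEST 6.x; Cleanup and DeleteByNameAsync are hypothetical, the trimmed model keeps the ElasticsearchType attribute from the top of the post, and the match on OrganisationName mirrors the longHeadlines example. Running the same query through _search first is a cheap way to check what would be deleted.

```csharp
using System;
using System.Text;
using System.Threading.Tasks;
using Elasticsearch.Net;
using Nest;

// Trimmed copy of the EsOrganisation model, so this snippet compiles on its own.
[ElasticsearchType(Name = "organisation")]
public class EsOrganisation
{
    public string OrganisationName { get; set; }
}

// Hypothetical helper wrapping NEST's delete-by-query call.
public static class Cleanup
{
    // Deletes every document in `index` whose OrganisationName matches `text`; returns the number deleted.
    public static async Task<long> DeleteByNameAsync(IElasticClient client, string index, string text)
    {
        var response = await client.DeleteByQueryAsync<EsOrganisation>(d => d
            .Index(index)
            .Query(q => q.Match(m => m.Field(f => f.OrganisationName).Query(text))));

        // IsValid is false for both transport errors and errors returned by Elasticsearch.
        if (!response.IsValid)
            throw new InvalidOperationException(response.DebugInformation);

        return response.Deleted;
    }
}
```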

