The “Expert Search” web page is reached from a link on the “Advanced Search” web page. Enter the entire query in the text area called “Search Builder”. Clicking the “Search” button displays the search results. The “Test the query” button only displays the number of items found, in the history link under the Search Builder.
Each search is recorded, and a history of your queries is saved. You can combine your searches by clicking on the identifier of each query (#n).
From the second query added in the “Historic Search Combinatorics” field onwards, you have to specify the boolean operator between the queries (OR, AND or NOT). Clicking the “Combine” button displays the search results.
How to write queries in the search builder?
The application uses the Elasticsearch 5 search engine, which offers numerous possibilities: combining field names with boolean or proximity operators, regular expressions, fuzzy search, and relevance boosting.
If the query is written in the builder without specifying a particular field, the search is made on “all fields” of the record. The query is analyzed as a series of terms and operators. A term may be a single word, such as enamel, or a phrase enclosed in double quotes, such as "enamel paintings". For a phrase, the search tests for the presence of all its terms in the listed order, that is, the first term followed by the second one ("term1 term2").
Warning: Avoid copying and pasting terms or expressions from text editors, since this can replace the straight double quotes with typographic ones. The search engine misinterprets these, which leads to aberrant results.
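As an illustration of the warning above, pasted text can be cleaned before it goes into the Search Builder. The following is a minimal sketch in Python. The function name straighten_quotes and the choice of characters to replace are assumptions made for this example and are not part of the application.

```python
# Map common typographic quotation marks to their straight equivalents.
# The set of characters handled here is an assumption for illustration.
QUOTE_MAP = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u201e": '"',  # low double quotation mark
    "\u00ab": '"',  # left-pointing guillemet
    "\u00bb": '"',  # right-pointing guillemet
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})


def straighten_quotes(query: str) -> str:
    """Return the query with typographic quotes replaced by straight ones."""
    return query.translate(QUOTE_MAP)


# A phrase query pasted from a word processor.
pasted = "au.\\*:\u201cSmith John\u201d"
print(straighten_quotes(pasted))
```

The cleaned string can then be pasted into the Search Builder as usual.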
Use of Field Names
If you wish to search for enamel or painting in the title, the query will be:
ti.\*:(enamel OR painting)
or simply
ti.\*:(enamel painting)
since the default operator linking the terms is “OR”.
To search for exactly "Smith John" in the author field, write:
au.\*:"Smith John"
For further information, see: List of Searchable Fields in Expert Search
Reserved characters
Some characters are used for the processing of queries by the search engine.
The reserved characters are: + - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /
If you need to use any of these characters literally in your query (and not as operators), escape them with a leading backslash. For instance, to search for (1+1)=2, you would need to write your query as \(1\+1\)\=2.
Warning: Failing to escape these special characters correctly could lead to a syntax error which prevents your query from running.
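Escaping by hand is error-prone, so it can help to do it programmatically before pasting a literal string into the Search Builder. The sketch below assumes Python. The function name escape_query_text is hypothetical. It puts a backslash in front of every reserved character from the list above. The Elasticsearch documentation notes that < and > cannot be escaped at all, so this sketch simply drops them.

```python
# Reserved characters taken from the list above. "&&" and "||" are handled
# by escaping each "&" and "|" individually.
RESERVED = set('+-=&|!(){}[]^"~*?:\\/')


def escape_query_text(text: str) -> str:
    """Escape literal text so the query parser does not read it as operators."""
    out = []
    for ch in text:
        if ch in "<>":
            # Assumption of this sketch: these two characters are removed,
            # because a backslash does not neutralise them.
            continue
        if ch in RESERVED:
            out.append("\\")
        out.append(ch)
    return "".join(out)


print(escape_query_text("(1+1)=2"))
```

The function is meant for literal text only. Applying it to a complete query would also escape the operators you intend to use.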
Boolean operators
By default, all terms are optional, as long as one term matches. A search for rupestrian art will find any document that contains one or more of rupestrian or art. The default operator is “OR”. Elasticsearch supports the usual operators “AND”, “OR” and “NOT”. However, their use is less simple than it seems, because “NOT” takes precedence over “AND”, which takes precedence over “OR”.
It is possible to enclose groups of terms in brackets. Between these terms, the operator is then the default operator, “OR”.
Brackets make it possible to form subqueries. In the following example:
((art AND prehistoric) OR (painting AND prehistoric) OR prehistoric)
the term prehistoric is associated with art or painting or with no term.
It is also possible to write queries with the “+” (this term must be present) and “-” (this term must not be present) operators. All other terms are optional.
With these operators, the preceding query, extended to exclude contemporary, will be written:
art painting +prehistoric -contemporary
prehistoric must be present, contemporary must not be present, and art, painting, or both are optional.
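The behaviour of “+” and “-” can be sketched in a few lines of Python. This is a simplified model for illustration, not the engine's implementation. The function names build_query and matches are hypothetical, documents are reduced to plain lists of words, and relevance scoring is ignored.

```python
def build_query(optional=(), must=(), must_not=()):
    """Assemble a query string using the + and - prefixes."""
    parts = list(optional)
    parts += ["+" + term for term in must]
    parts += ["-" + term for term in must_not]
    return " ".join(parts)


def matches(doc_terms, optional=(), must=(), must_not=()):
    """Simplified matching rule for a single document."""
    terms = set(doc_terms)
    # Any excluded term rejects the document.
    if any(t in terms for t in must_not):
        return False
    # When required terms exist, all of them must be present;
    # optional terms then only influence the ranking.
    if must:
        return all(t in terms for t in must)
    # Without required terms, at least one optional term has to match.
    # (A query made only of excluded terms is not modelled here.)
    return any(t in terms for t in optional)


query = build_query(["art", "painting"], ["prehistoric"], ["contemporary"])
print(query)
print(matches(["prehistoric", "tools"], ["art", "painting"],
              ["prehistoric"], ["contemporary"]))
```

In this model a document containing only prehistoric is accepted, while one containing both prehistoric and contemporary is rejected.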
Proximity searches
While a phrase query (e.g. "French paintings") expects all of the terms in exactly the same order, a proximity query allows the specified words to be further apart or in a different order.
In the following example:
"French paintings"~3
the terms French and paintings can be separated by three terms at most. Since the allowed distance exceeds 2, the order of the terms does not have to be maintained. The results of the query might therefore include documents containing, for example, "Paintings of French masters".
Fuzziness
The fuzzy search operator “~” finds similar terms with a maximum of two changes, where a change is the insertion, deletion or substitution of a single character, or the transposition of two adjacent characters. It operates starting from the third letter of the word.
For example, Bernstein~ will find records with Bernstejn.
By default, up to 2 changes are allowed in the word, but the search can be restricted to the addition, deletion or substitution of a single letter. This has to be specified explicitly, as in the following example:
egyptian~1 will find égyptien, egyption, egyplian and egyphtians.
A search with:
egyptian~2 will broaden the search by allowing the addition or substitution of two characters, and will also find other terms, such as Egypten, égyptienne, Egyptens, Egyptisk and égyptisant.
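The number of “changes” between two words can be computed with an edit-distance function. The sketch below, in Python, uses the optimal string alignment variant of the Damerau-Levenshtein distance, which counts insertions, deletions, substitutions and transpositions of adjacent characters. It is an illustration only. The function name edit_distance is hypothetical, and the search engine also applies its own text analysis (lowercasing, accent folding, stemming) before comparing terms, which this sketch does not reproduce.

```python
def edit_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between two words.

    Each insertion, deletion, substitution or transposition of two
    adjacent characters counts as one change.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all characters of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all characters of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


for candidate in ["egyption", "egyplian", "egypten"]:
    print(candidate, edit_distance("egyptian", candidate))
```

A candidate at distance 1 is within reach of egyptian~1, while a candidate at distance 2 requires egyptian~2.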
Ranges
Ranges can be specified for date, numeric or string fields. Inclusive ranges are specified with square brackets [min TO max] and exclusive ranges with curly brackets {min TO max}.
For example, to search for documents published before 2000, write:
py.\*:{* TO 2000}
Curly and square brackets can be combined:
py.\*:[2000 TO 2008}
returns the records with publication date 2000, 2001, 2002, 2003, 2004, 2005, 2006 and 2007.
This can be written with the boolean operators as:
py.\*:(>=2000 AND <2008)
or with the “+” operator as:
py.\*:(+>=2000 +<2008)
An interval with only one fixed limit can be written:
py.\*:>2000
py.\*:>=2000
py.\*:<2000
py.\*:<=2000
for documents with a publication date respectively after 2000, in or after 2000, before 2000, and in or before 2000.
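The difference between inclusive and exclusive limits can be checked with a small Python sketch. The function name in_range and its parameters are hypothetical and only model the bracket notation described above. None stands for an open limit, written * in a query.

```python
def in_range(value, low=None, high=None, include_low=True, include_high=True):
    """Return True if value lies in the range.

    Square brackets correspond to include_low/include_high = True,
    curly brackets to False. None means the limit is open (*).
    """
    if low is not None:
        if value < low or (value == low and not include_low):
            return False
    if high is not None:
        if value > high or (value == high and not include_high):
            return False
    return True


years = range(1998, 2011)

# Model of [2000 TO 2008}: lower limit included, upper limit excluded.
print([y for y in years if in_range(y, 2000, 2008, include_high=False)])

# Model of {* TO 2000}: everything strictly before 2000.
print([y for y in years if in_range(y, None, 2000, include_high=False)])
```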
Boosting
The boost operator “^” makes one term more relevant than another. For instance, if you want to find all documents about middle ages, but are especially interested in early middle ages, write:
("early middle ages")^4
The search strategy
ti.\*:"middle ages" OR ti.\*:"early middle ages"^4
will show the records related to early middle ages first.
The default boost value is 1, but it can be any positive floating point number. Values greater than 1 increase the relevance of the searched term in the results, while values between 0 and 1 reduce it. Boosts can also be applied to phrases or to groups.
Regular expressions
These are character strings that use a defined syntax to describe a set of possible character strings. Depending on the syntax used, parts of a word can be made optional or repeated, which defines a set of words with similar spellings. Regular expressions used in search queries make it possible to find results despite spelling mistakes. They have to be enclosed within slashes, as follows:
au.\*:/joh?n(ath[oa]n)/
For further information, see: Search by Regular Expressions
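To see which spellings the pattern in the example covers, it can be tried outside the application. The sketch below uses Python's re module. This is an assumption made for illustration, since the search engine has its own regular expression syntax, but this particular pattern only uses constructs common to both. re.fullmatch is used because the engine matches a pattern against a whole indexed term. The list of candidate names is invented for the example.

```python
import re

# The pattern from the example, without the enclosing slashes.
pattern = re.compile(r"joh?n(ath[oa]n)")

# Hypothetical indexed terms, assumed to be lowercased.
candidates = ["john", "jonathan", "johnathan", "jonathon", "johnathon", "jonathen"]

# Keep only the terms that the pattern matches in full.
matched = [name for name in candidates if pattern.fullmatch(name)]
print(matched)
```

The h is optional and the last vowel may be o or a, but the group ath[oa]n itself is required, so john alone is not matched.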
Index and Subfields
Each index used in simple, advanced or expert search has been configured with subindexes, in order to make searching easier and to allow documents to be retrieved despite, for example, typing errors in the query.
Subindexes are “fold”, “rich”, “stem” and “raw”.
fold separates the words of the text, converts them to lowercase and translates each character into its ASCII equivalent.
rich separates the words of the text without further processing.
stem separates the words of the text and applies stemming to them according to the Porter algorithm.
raw applies no processing.
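The effect of the “fold” subindex can be approximated in Python with the standard unicodedata module. This is a rough sketch and not the analyzer used by the application. The function name fold is hypothetical, words are split on whitespace only, and characters without a Unicode decomposition (for example ø) are left unchanged, whereas a full ASCII-folding filter would also convert them.

```python
import unicodedata


def fold(text):
    """Approximate the fold processing: split, lowercase, strip accents."""
    # Decompose accented characters into base character + combining mark.
    decomposed = unicodedata.normalize("NFKD", text.lower())
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Simplification: split on whitespace only.
    return stripped.split()


print(fold("Pigmenté Égyptien"))
```

With this processing, a query typed without accents or capital letters can still match accented or capitalised text in the record.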
Each index includes one or several subindexes. “Standard” searches are carried out on all the subindexes configured for the index. In Elasticsearch syntax, the star indicates that all subindexes are queried.
In the expert search, for example, with ti.\*: the subindexes “fold”, “rich” and “stem” will be queried, because they have been set for the “ti” index.
For further information, see: Index and Subindex List
It is possible to make queries by specifying a particular subindex, provided it has been set for the queried index. In this case the Elasticsearch syntax will be, for example, for a search with the “rich” subindex in the title:
ti.rich:pigment
The use of the stem subindex will retrieve documents with pigmente, pigments, pigmented, pigmentation or pigmenté in the title.
To use subindexes with a group of words, an exact-phrase search is required, so that the search engine does not interpret the space between the words as an “OR”.
ti.fold:"bronze sculpture"