Subindex Types
There are four subindex types: fold, stem, rich and raw.
fold
: the text is tokenised (i.e. split into separate words), every word is normalised to lower case, and alphabetic, numeric, and symbolic Unicode characters outside the “Basic Latin” Unicode block (the 128 ASCII characters) are converted to their ASCII equivalents where one exists.
rich
: the text is tokenised (i.e. split into separate words) and nothing else; case and accented characters are kept as they are.
stem
: the text is tokenised (i.e. split into separate words), then English stemming is applied with the Porter algorithm.
raw
: no processing is applied; the text is indexed exactly as it is.
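
The four treatments can be pictured with a minimal Python sketch. It is an illustration only, not the indexer's actual implementation: the function names are hypothetical, the tokeniser is assumed to split on runs of word characters, the folding step is an approximation based on Unicode decomposition (characters with no decomposition, such as `ø`, are simply dropped here), and the stemmer is passed in as a callable because the Porter algorithm is not part of the Python standard library.

```python
import re
import unicodedata
from typing import Callable, List


def tokenise(text: str) -> List[str]:
    # Assumption: a token is any run of Unicode word characters.
    return re.findall(r"\w+", text)


def ascii_fold(token: str) -> str:
    # Approximate ASCII folding: decompose accented characters (NFKD),
    # then discard every code point that is not plain ASCII.
    decomposed = unicodedata.normalize("NFKD", token)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def fold(text: str) -> List[str]:
    # Tokenise, lower-case, fold to ASCII; drop tokens that fold to nothing.
    folded = (ascii_fold(token.lower()) for token in tokenise(text))
    return [token for token in folded if token]


def rich(text: str) -> List[str]:
    # Tokenise only: case and accents are preserved.
    return tokenise(text)


def stem(text: str, stemmer: Callable[[str], str]) -> List[str]:
    # Tokenise, then apply the supplied stemming function to each token.
    return [stemmer(token) for token in tokenise(text)]


def raw(text: str) -> List[str]:
    # No processing: the whole value is kept as a single term (assumption).
    return [text]
```

With these definitions, `fold("Café Noir")` gives `["cafe", "noir"]`, `rich("Café Noir")` gives `["Café", "Noir"]`, and `raw("Café Noir")` gives `["Café Noir"]`. For `stem`, any Porter implementation exposing a word-to-stem function can be passed in, for example the `stem` method of NLTK's `PorterStemmer`.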
Index | Verbalization | fold | rich | stem | raw |
---|---|---|---|---|---|
au | Author | yes | yes | no | yes |
ti | Title | yes | yes | yes | no |
co | Conference | yes | yes | no | yes |
ex | Exhibition | yes | yes | no | yes |
jo | Journal | yes | yes | no | yes |
ab | Abstract | yes | yes | yes | no |
kw | Keyword | yes | yes | yes | yes |
dt | Document Type | no | yes | no | yes |
py | Publication Year | no | no | no | yes |
la | Language | yes | yes | no | yes |
is | ISBN/ISSN | yes | yes | no | yes |
id | Identifier | yes | yes | no | yes |
pu | Publisher | yes | yes | no | yes |
pc | Publication Country | yes | yes | no | yes |
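
For scripts that need to check the table above before building a query, the same information can be held in a small lookup structure. The sketch below is a plain transcription of the table; the names `SUBINDEXES`, `has_subindex` and `indexes_with` are hypothetical helpers, not part of any API described here.

```python
from typing import Dict, FrozenSet, List

# Available subindexes per index code, transcribed from the table above.
SUBINDEXES: Dict[str, FrozenSet[str]] = {
    "au": frozenset({"fold", "rich", "raw"}),
    "ti": frozenset({"fold", "rich", "stem"}),
    "co": frozenset({"fold", "rich", "raw"}),
    "ex": frozenset({"fold", "rich", "raw"}),
    "jo": frozenset({"fold", "rich", "raw"}),
    "ab": frozenset({"fold", "rich", "stem"}),
    "kw": frozenset({"fold", "rich", "stem", "raw"}),
    "dt": frozenset({"rich", "raw"}),
    "py": frozenset({"raw"}),
    "la": frozenset({"fold", "rich", "raw"}),
    "is": frozenset({"fold", "rich", "raw"}),
    "id": frozenset({"fold", "rich", "raw"}),
    "pu": frozenset({"fold", "rich", "raw"}),
    "pc": frozenset({"fold", "rich", "raw"}),
}


def has_subindex(index: str, subindex: str) -> bool:
    # True when the given index offers the given subindex type.
    return subindex in SUBINDEXES.get(index, frozenset())


def indexes_with(subindex: str) -> List[str]:
    # All index codes offering the given subindex type, sorted alphabetically.
    return sorted(code for code, kinds in SUBINDEXES.items() if subindex in kinds)
```

For instance, `indexes_with("stem")` returns `["ab", "kw", "ti"]`, which matches the three rows marked "yes" in the stem column.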