"Infix" is common to arithmetic and logical formulas and uses parentheses () to group operators and operands as well as define their precedence.
- An infix-notation sentence is constructed as
- [<field>/](<Infix-Term1> <BinaryOperator> <Infix-Term2>)[:<weight>]
Infix Notation Terms are
- Terms with an optional postfix unary operator and/or an optional field specification prefix: [<Field Name>/]<Term>[<InfixPostfixTermOp>]
- Valid Infix sentences, grouped with ( and ).
- The prefix Field/ is used to specify that the sentence should be a member of the named field or path. It may contain "wildcards" ("Glob Expression Syntax").
- The prefix Field!/ is used to specify that the content is not in (is exclusive to) the field.
- The prefix Field// is used to specify that the terms are in a common instance of the field (container). The expression FOO//(A and B) is equivalent to (A and:FOO B).
- The optional weight (a floating point number) specifies a weighting in the ranking of hits matching the expression.
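The sentence shape above can be assembled mechanically. The following is a minimal sketch and not part of the engine: the helper names `term` and `sentence` are hypothetical, and they only build query strings that follow the grammar described here.

```python
def term(text, op="", field=None):
    """Build [<field>/]"<text>"[<op>]; op is a postfix operator such as '=' or ':3'."""
    prefix = f"{field}/" if field else ""
    return f'{prefix}"{text}"{op}'


def sentence(left, operator, right, field=None, weight=None):
    """Build [<field>/](<left> <operator> <right>)[:<weight>]."""
    prefix = f"{field}/" if field else ""
    suffix = f":{weight}" if weight is not None else ""
    return f"{prefix}({left} {operator} {right}){suffix}"


# Field prefix on each term.
print(sentence(term("dog", field="title"), "and", term("cat", field="subject")))
# Field prefix on the whole sentence.
print(sentence(term("molecular"), "near", term("biology"), field="title"))
# Postfix weight on a single term.
print(sentence(term("dog", op=":3"), "or", term("cat")))
```

The three printed strings are `(title/"dog" and subject/"cat")`, `title/("molecular" near "biology")` and `("dog":3 or "cat")`.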
A Term is
- A word
- A literal phrase enclosed in " marks.
A word is a series of characters. These may also contain special characters called "wildcards", evaluated according to the rules of the "Glob Expression Syntax".
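To give a feel for wildcard terms, here is a small sketch using Python's standard `fnmatch` module, which implements shell-style globbing. This is only an illustration of the general idea: the engine's own Glob Expression Syntax may differ in its details, and the word list is made up.

```python
from fnmatch import fnmatchcase

# Hypothetical indexed words.
words = ["hedgehog", "hedge", "hedging", "ledger"]


def glob_matches(pattern, candidates):
    """Return the candidates that match a shell-style glob pattern."""
    return [w for w in candidates if fnmatchcase(w, pattern)]


print(glob_matches("hedg*", words))   # '*' matches any run of characters
print(glob_matches("?edge*", words))  # '?' matches exactly one character
```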
Infix postfix term operators (InfixPostfixTermOp)

| Op | Function |
|---|---|
| . | Exact match. |
| ~ | Phonetic match. |
| = | Case-dependent match. |
| :nnn | Weight of nnn, a non-zero integer value (positive or negative). |
A weight is a non-zero number
- Positive := the larger, the more relevant.
- Negative := the larger, the less relevant.
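The postfix operators and weights can be picked apart with a small parser. This is a sketch under stated assumptions, not the engine's tokenizer: the name `parse_term` is hypothetical, a term carries at most one postfix operator, and only simple words and "quoted phrases" are handled.

```python
import re

# A quoted phrase or a bare word, followed by at most one postfix operator.
TERM_RE = re.compile(
    r'^(?P<body>"[^"]*"|[^\s"]+?)(?:(?P<flag>[.~=])|:(?P<weight>-?\d+))?$'
)

FLAGS = {".": "exact", "~": "phonetic", "=": "case-dependent"}


def parse_term(text):
    """Split a term into its text, match mode and optional integer weight."""
    m = TERM_RE.match(text)
    if m is None:
        raise ValueError(f"not a term: {text!r}")
    weight = m.group("weight")
    if weight is not None and int(weight) == 0:
        raise ValueError("a weight must be non-zero")
    return {
        "term": m.group("body").strip('"'),
        "match": FLAGS.get(m.group("flag"), "default"),
        "weight": int(weight) if weight is not None else None,
    }


print(parse_term('"dog":3'))
print(parse_term('"Titus"='))
print(parse_term("hedge*~"))
```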
Terms may be preceded by ! (or NOT) for unary negation of the sentence statement. We have, additionally, the special unary operator REDUCE to reduce the set according to the number of its different term hits (REDUCE:0).
Supported Binary Operators

| Operator | Description |
|---|---|
| OR | Union, the set of all elements in either of two sets |
| AND | Intersection, the set of all elements in both sets |
| ANDNOT | Elements in the one set but NOT in the other |
| XOR | Exclusive Union, elements in either but not both |
| ADJ | Matching terms are adjacent to one another (as stored on the file system) |
| NEAR[:num] | Matching terms in the sets are within num of one another as stored on the file system (file offsets). An integer num is a distance in bytes (octets); a fraction of 1 is a percentage of the length of the record. |
| PEER | Elements in the same (unnamed) final tree leaf node |
| PEERa | PEER after |
| PEERb | PEER before |
| XPEER | Elements exclusive to the same (unnamed) final tree leaf node |
| AND:field | Elements in the same node instance of field |
| BEFORE, AFTER | In fielded records it is like an ordered PEER. |
| BEFORE:field, AFTER:field | With a named field it is like an ordered AND:field. |
| BEFORE:num, AFTER:num | Like NEAR:num, but ordered. |
| FOLLOWS, PRECEDES | Within some ordered elements of one another |
| FAR | Elements a "good distance" away from each other |
| NEAR | Elements "near" one another. |
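The two readings of num in NEAR[:num] (bytes versus fraction of the record length) can be sketched as follows. The function names are hypothetical, and two details are assumptions rather than documented behaviour: the distance is measured between the start offsets of the hits, and the comparison is inclusive.

```python
def near_window(num, record_length):
    """Largest allowed distance, in bytes, for NEAR:num within one record."""
    if 0 < num < 1:
        # A fraction of 1 is read as a share of the record length.
        return num * record_length
    # Otherwise num is taken as a byte (octet) count.
    return int(num)


def is_near(offset_a, offset_b, num, record_length):
    """A NEAR:num B for two term hits given as byte offsets."""
    return abs(offset_a - offset_b) <= near_window(num, record_length)


def is_before(offset_a, offset_b, num, record_length):
    """A BEFORE:num B: like is_near, but A must come first."""
    return offset_a < offset_b and is_near(offset_a, offset_b, num, record_length)


print(is_near(100, 140, 50, 1000))     # True: 40 bytes apart, window of 50 bytes
print(is_near(100, 140, 0.01, 1000))   # False: window is 1% of 1000 = 10 bytes
print(is_before(140, 100, 50, 1000))   # False: near enough, but in the wrong order
```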
The following special operators are also available:
- A Nor B := NOT (A OR B)
- A Xnor B := NOT (A XOR B)
- A Nand B := NOT (A AND B)
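The set operators map directly onto ordinary set algebra. A minimal sketch with Python sets, where the record ids and the `universe` of all indexed records are made up for illustration:

```python
universe = {1, 2, 3, 4, 5, 6}   # hypothetical ids of all records in the index
a = {1, 2, 3}                   # records matching term A
b = {3, 4}                      # records matching term B

results = {
    "OR": a | b,                    # union
    "AND": a & b,                   # intersection
    "ANDNOT": a - b,                # in A but not in B
    "XOR": a ^ b,                   # in either, but not both
    "NOR": universe - (a | b),      # NOT (A OR B)
    "XNOR": universe - (a ^ b),     # NOT (A XOR B)
    "NAND": universe - (a & b),     # NOT (A AND B)
}

for name, hits in results.items():
    print(f"A {name} B = {sorted(hits)}")

# A and not(B) gives the same set as A andnot B, without building the complement.
print(a & (universe - b) == a - b)
```

The negated forms need the complement against the whole index, which is why ANDNOT is the cheaper way to express "A but not B".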
Examples:

| Query | Meaning |
|---|---|
| ("dog" and "cat") | The set of all records containing both "dog" AND "cat" |
| "love" and not("war") | The set not("war") contains the complement of "war", namely all records that don't have the term "war". This query, however, effectively is the same as (the more efficient expression) "love" andnot "war". |
| ("dog":3 or "cat") | Search for records containing either "dog" OR "cat" but consider the term "dog" to be 3 times as relevant to the search as the term "cat". |
| (title/"dog" and subject/"cat") | Search for records containing both "dog" within the TITLE field AND "cat" within the SUBJECT field. |
| title/("molecular" near "biology") | Search for records containing "molecular" in the proximity of "biology" in the TITLE field. |
| (abstract/"short subject" and title/"In the hotel") | Search for records containing both the phrase "short subject" within the ABSTRACT field AND the phrase "In the hotel" in the TITLE field. |
| act/(line/(hedgehog)) | Search for the term "hedgehog" within the field LINE within the field ACT. |
| ("Titus"= or "Lartius"=):2 or "Roman" | Search for case-dependent "Titus" or "Lartius" (e.g. "Titus" matches "Titus" but not "titus" or "TITUS"), boost the weight by a factor of 2 (double scores), and take the union with the set of results for "Roman". This is equivalent to the search expression: ("Titus"=):2 or ("Lartius"=):2 or "Roman". |
| !("money" and:line "war") and ("money" and:act "war") | The terms "money" and "war" should occur in the same act (element) but not in the same line (element). |
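The last example is easier to follow on concrete data. The sketch below evaluates it over a made-up list of term occurrences. The data model (tuples of record, act instance, line instance, term) and the function name `and_in_field` are hypothetical, and instance ids are assumed to be unique within a record.

```python
from collections import defaultdict

# Hypothetical occurrences: (record id, act instance, line instance, term)
occurrences = [
    ("r1", "act1", "line1", "money"),
    ("r1", "act1", "line2", "war"),    # same act, different lines
    ("r2", "act1", "line1", "money"),
    ("r2", "act1", "line1", "war"),    # same act and same line
    ("r3", "act1", "line1", "money"),
    ("r3", "act2", "line9", "war"),    # different acts
]


def and_in_field(term_a, term_b, level):
    """Records where both terms occur in the same instance of the container."""
    index = {"act": 1, "line": 2}[level]
    seen = defaultdict(set)  # (record, container instance) -> terms found there
    for occ in occurrences:
        seen[(occ[0], occ[index])].add(occ[3])
    return {rec for (rec, _), terms in seen.items() if {term_a, term_b} <= terms}


all_records = {occ[0] for occ in occurrences}
same_line = and_in_field("money", "war", "line")
same_act = and_in_field("money", "war", "act")

# !("money" and:line "war") and ("money" and:act "war")
print(sorted((all_records - same_line) & same_act))  # ['r1']
```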
Note that we can combine field name qualifications to search (partially unknown) structure paths.
In our Shakespeare example (SGML/XML markup of Shakespeare's works by Jon Bosak) we have as paths to LINES where things are said:
- PLAY\ACT\EPILOGUE\SPEECH\LINE
- PLAY\ACT\PROLOGUE\SPEECH\LINE
- PLAY\PROLOGUE\SPEECH\LINE
- PLAY\INDUCT\SCENE\SPEECH\LINE
- PLAY\INDUCT\SPEECH\LINE
- PLAY\ACT\SCENE\SPEECH\LINE
We can then write nested field search specifications such as ACT/(SCENE/(LINE/(....))).
By combining features one can define some very interesting and powerful query sentences such as: ACT/(SPEECH/("spot" peer "out")) to find records where "spot" and "out" are in the same container that's in a speech within an act.
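One way to picture how nested field qualifications select among the structure paths listed above is an ordered, gap-tolerant match of field names against each path. This is a sketch of that reading only. It assumes that nesting means "contained somewhere inside", with any number of intermediate elements, which is an interpretation and not a statement about how the engine resolves paths. The function name `path_matches` is hypothetical.

```python
PATHS = [
    r"PLAY\ACT\EPILOGUE\SPEECH\LINE",
    r"PLAY\ACT\PROLOGUE\SPEECH\LINE",
    r"PLAY\PROLOGUE\SPEECH\LINE",
    r"PLAY\INDUCT\SCENE\SPEECH\LINE",
    r"PLAY\INDUCT\SPEECH\LINE",
    r"PLAY\ACT\SCENE\SPEECH\LINE",
]


def path_matches(path, nested_fields):
    """True if the field names occur in the path in outer-to-inner order."""
    remaining = iter(path.split("\\"))
    # 'name in remaining' consumes the iterator up to the match, so each
    # later name has to be found after the previous one.
    return all(name in remaining for name in nested_fields)


def select(nested_fields):
    return [p for p in PATHS if path_matches(p, nested_fields)]


print(select(["ACT", "SCENE", "LINE"]))   # ACT/(SCENE/(LINE/(...)))
print(select(["ACT", "SPEECH"]))          # ACT/(SPEECH/(...))
```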
Complex Attribute Search
In RSS we have some complex attribute fields such as:
<category domain="http://www.fool.com/cusips">MSFT </category>
where we'd like to search in the domain of the category as well as its content.
The field describing the domain is called category@domain (which contains the fool.com URL), while the content is MSFT and is defined in the field category. These are technically two different fields. They are, however, related. In IB we have a special neo-field "." (called "dot"). It refers to a special kind of relation between the content of complex fields and the context of the attributes. It allows us to express search queries relating the two.
Example: <law domain="Bavaria">Bayerische Gesetz zum Schutz der Gesundheit</law>
To search for laws with domain "bavaria" we'd search law@domain/bavaria.
To search for laws about "Gesundheit" we'd search law/Gesundheit.
Now if we'd like to have all the laws in domain "Bavaria" that are about Gesundheit we'd search: (law@domain/Bavaria and:. law/Gesundheit).
Note that and:. in the query above means that both matches are in the same place.
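The relation between the two fields can be pictured with a short sketch using Python's standard `xml.etree.ElementTree`. It is not how IB indexes documents. The sample elements (other than the first) are made up, the helper names are hypothetical, and matching is assumed to be case-insensitive on whole words.

```python
import xml.etree.ElementTree as ET

# The first <law> is the example from the text; the other two are made up.
doc = ET.fromstring(
    "<laws>"
    '<law domain="Bavaria">Bayerische Gesetz zum Schutz der Gesundheit</law>'
    '<law domain="Hessen">Gesetz zum Schutz der Gesundheit</law>'
    '<law domain="Bavaria">Gesetz zum Schutz der Natur</law>'
    "</laws>"
)


def fields(element):
    """Derive the content field and one tag@attribute field per attribute."""
    derived = {f"{element.tag}@{name}": value for name, value in element.attrib.items()}
    derived[element.tag] = element.text or ""
    return derived


def same_place(element, attr_field, attr_value, content_word):
    """Both conditions must hold on the same element, as with and:."""
    derived = fields(element)
    attr_ok = derived.get(attr_field, "").lower() == attr_value.lower()
    words = derived[element.tag].lower().split()
    return attr_ok and content_word.lower() in words


# (law@domain/Bavaria and:. law/Gesundheit)
hits = [law.text for law in doc.iter("law")
        if same_place(law, "law@domain", "Bavaria", "Gesundheit")]
print(hits)
```

Only the first element satisfies both conditions at once; a plain and over the whole document would be satisfied by the second and third elements together.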
See also: RPN Queries