Full DART Search Syntax Guide

This article contains confidential information, which may include trade secrets and other proprietary information. The reader acknowledges that this information has been developed by Lighthouse Global as valuable trade secrets. All information contained herein shall remain the exclusive property of Lighthouse Global and shall be disclosed only to persons who have a need to know. The recipient agrees not to copy or reproduce in any form any information supplied herein without prior written permission from an authorized representative of Lighthouse Global. The recipient further agrees to provide security for this document to a reasonable degree so that unauthorized disclosure is prevented.

Words and Phrases / Exact Match

By default, DART search matches the exact text. Searching for a word or a phrase returns exact matches of that word or phrase. Different forms of the same word or phrase, spelling corrections/variations, and conceptually related text are not matched. For example:


advertise => Matches the exact text "advertise" within a document. Does not match text such as "advertising," "advertised," "advertisement," "marketing," "sales," etc.


world health organization => Matches the exact text "world health organization" within a document. Does not match text such as "worldwide health," "healthy world," "organization world health," "world health center," "world health org," "W.H.O.," etc.


Phrases do not need to be inside quotation marks. Running a phrase such as  world united does not return documents that simply contain both words or that contain those two words in the reverse order; it returns only those documents that contain the exact phrase "world united."
DART search syntax is case insensitive.


WHO => Matches text such as "WHO," "who," or "Who" within a document.


who => Matches text such as "WHO," "who," or "Who" within a document.

Default Stream

Words and phrases run over the "default stream," which is the designated set of document fields over which all search terms run by default in DART. The default stream contains the body of the document as well as a set of key metadata fields: email_subject, email_from, email_to, email_cc, email_bcc, attendees, attach_names, author, file_name, doc_subject, and title. All other metadata fields (such as file_path, doc_type, file_extension, custodian, etc.) are not included in the default stream.
You may request customized changes to the default stream for your project.

Booleans

Boolean operators connect and define relationships among the elements in your search terms. (Elements are defined as individual words, phrases, terms, conditions, grouped combinations, etc. within a search term.) The three basic Boolean operators are AND, OR, and NOT. Using OR broadens your search results, while using AND or NOT narrows your search results.

Syntax Description Example search term(s)
OR Returns documents that satisfy either or both of the elements on either side of the operator.

Alternative syntax: brackets (see below)
budget OR cost => Returns documents that contain just budget, just cost, or both budget and cost anywhere in the document.

budget OR budgets OR budgeted OR budgeting => Returns documents that contain one or more of the following words anywhere in the document: budget, budgets, budgeted, budgeting.
AND Returns documents that satisfy both elements on either side of the operator. budget AND cost => Returns documents that contain both budget and cost anywhere in the document.

budget AND cost AND proposal => Returns documents that contain budget and cost and proposal. Does not return documents that contain only one or two of these words.
NOT Returns documents that satisfy the element on the left of  NOT and that do not satisfy the element on the right of NOT.

May also be used without an element on the left; this usage returns all documents that do not satisfy the element on the right of  NOT. Note: Using NOT without an element on the left can be computationally expensive. It can result in slow searches that drain system resources.

Alternative syntax:  AND NOT
budget NOT proposal => Returns documents that contain budget and that do not contain proposal. Does not return documents that contain both budget and proposal (or that contain proposal and not budget).

budget AND NOT proposal => Alternative syntax; returns the same results as the example above.

NOT proposal => Returns all documents that do not contain proposal. Note: Using NOT in this way can be computationally expensive.

Brackets (alternative syntax for OR)

Brackets allow you to specify the exact word variation you want to search for in a way that is more compact and less redundant than using Boolean ORs. Using bracket syntax, you can embed the variation directly within the word, and you can separate lists of variants by commas rather than ORs.
Syntax Description Example search term(s)
{ } Contains list of optional variants.

Alternative syntax:  OR
bank{s} => Returns the same documents as the search term bank OR banks. Does not match banked or banking.

colo{u}r{s} => Returns the same documents as the search term color OR colors OR colour OR colours. Does not match colo or colos or coloring or colourful.

{il}legal => Returns the same documents as the search term legal OR illegal. Does not match legally or illegally.
[ ] Contains list of mandatory variants.

Alternative syntax:  OR
sav[e, es, ed, ing] => Returns the same documents as the search term save OR saves OR saved OR saving. Does not match sav or savings.
gr[a, e]y => Returns the same documents as the search term gray OR grey. Does not match gry or graey.
[budget, cost, proposal] => Returns the same documents as the search term budget OR cost OR proposal. Does not match budgets or proposals.
Curly and square brackets are also effective when used in combination. sav[e{s, d}, ing{s}] => Returns the same documents as the search term save OR saves OR saved OR saving OR savings. Does not match sav.

{re}buil[d, t] => Returns the same documents as the search term build OR built OR rebuild OR rebuilt. Does not match buil or rebuilding.

[budget{s}, cost{s}, proposal{s}] => Returns the same documents as the search term budget OR budgets OR cost OR costs OR proposal OR proposals. Does not match budgeting or costly.

Brackets vs. OR

When Boolean OR is used instead of brackets, search terms can quickly become long and difficult to read. Brackets are a faster, easier, and more reader-friendly way to add precise variation to a word or concept within a search term:
[ad{s, vertisement{s}}, commercial{s}, market segment{s}]
versus
ad OR ads OR advertisement OR advertisements OR commercial OR commercials OR market segment OR market segments
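
The equivalence between bracket syntax and Boolean OR can be illustrated with a short Python sketch. This is not DART's implementation; the function names are hypothetical, and the sketch assumes that variants are trimmed of surrounding spaces, so it does not model the optional-space form described in the next section.

```python
def expand_brackets(term: str) -> list[str]:
    """Expand {optional} and [mandatory] variant lists into plain terms."""
    variants, _ = _sequence(term, 0, stop="")
    return list(dict.fromkeys(variants))  # drop duplicates, keep order


def _sequence(s: str, i: int, stop: str) -> tuple[list[str], int]:
    """Read literal text and bracket groups until a character in `stop`."""
    results = [""]
    while i < len(s) and s[i] not in stop:
        if s[i] in "{[":
            optional = s[i] == "{"
            options, i = _alternatives(s, i + 1, "}" if optional else "]")
            if optional:
                options = [""] + options  # curly brackets may contribute nothing
            results = [r + o for r in results for o in options]
        else:
            results = [r + s[i] for r in results]
            i += 1
    return results, i


def _alternatives(s: str, i: int, close: str) -> tuple[list[str], int]:
    """Read a comma-separated variant list up to the closing bracket."""
    options: list[str] = []
    while True:
        parsed, i = _sequence(s, i, stop="," + close)
        options.extend(p.strip() for p in parsed)  # assumption: trim spaces
        if i >= len(s):
            raise ValueError("unbalanced bracket in search term")
        if s[i] == close:
            return options, i + 1
        i += 1  # skip the comma


print(" OR ".join(expand_brackets("colo{u}r{s}")))
print(" OR ".join(expand_brackets("sav[e{s, d}, ing{s}]")))
```

Under these assumptions, the two calls print the same OR lists given for colo{u}r{s} and sav[e{s, d}, ing{s}] in the table above.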

Optional Space

Curly brackets can also be used to represent an optional space or optional special character/symbol (such as a hyphen) inside a word. For example, the search term  e{ }mail matches all of the following strings of text:

  • email
  • e-mail
  • e mail

Wildcards and Stemming

A wildcard operator is a special symbol that matches one or more other characters in the text. Stemming identifies the predictable grammatical variations associated with a root word. Wildcards and stemming allow you to add variation to a word in a way that is faster and easier than using brackets or Boolean OR. However, wildcards and stemming do not let you specify or control the exact variation that is or is not captured, and they may capture unanticipated and/or undesired variation.

Syntax

Description Example search term(s)
* Unlimited wildcard: Matches zero or more characters. May be used at the beginning or end of a word or embedded within a word.

Note: Using the unlimited wildcard at the beginning or embedded within a word can be very computationally expensive. It can result in slow searches that drain system resources. It is recommended that this wildcard be used only at the end of a word. It should also be avoided at the end of words that consist of one, two, or even three letters, as this is also very computationally expensive.
bank* => Matches bank, banks, banked, banking, bankers, bankrupt, bankruptcies, bankable, bankofamerica.com, bank2000, Bankwitz, etc.

cost* => Matches cost, costs, costed, costing, costly, costar, costumes, Costco, Costanza, Costello, cost100k, Costa Rica, etc.
? Single-character wildcard: Matches exactly one character (a letter or a number). May be used at the beginning or end of a word, or embedded within a word.

More than one single-character wildcard in a row may be used to specify the exact number of characters to be matched. This wildcard may also be used in curly brackets to match either no characters or exactly one character.
s??n => Matches soon, seen, shun, etc. Does not match shining, sin, or sn.

?diaz => Matches adiaz, jdiaz, rdiaz, etc. Does not match diaz or ramondiaz.
{?}{?}johns[o, e]n* => Matches johnson, ajohnson, abjohnsen, cajohnson1, rjohnsens25, johnson25th, etc. Does not match bobjohnson.
+ Digit wildcard: Matches exactly one digit. May be used at the beginning or end of a word, or embedded within a word.

More than one digit wildcard in a row may be used to specify the exact number of characters to be matched. This wildcard may also be enclosed within curly brackets to match either no characters or exactly one character.
1++ => Matches 100, 122, 159, etc. Does not match 1 or 1001.

m{ }robert{+} => Matches mrobert, m.robert, m-robert1, m_robert5, etc. Does not match mrobert10 or m.roberts.

en{ }+++{+}{+} => Matches en-123, EN1234, EN_12345, etc. Does not match en12 or EN-1234a.
~ Stemming: Matches morphological variations of a root word, including the root word itself. Can only be used at the end of a word.

Stemming uses an algorithm that predicts morphological variations of a root word. The algorithm is good, but it is not perfect. Stemming may not always include all desired morphological variations of a word. For example, irregular word forms may be missed, or noun forms may not be included for words deemed to be a verb and vice versa.

Use the Term Expander in DSR to see which variations are included when adding stemming to a particular word.
bank~ => Matches bank, banks, banked, banking, and bankings.

banking~ => Matches bank, banks, banked, banking, and bankings.

cost~ => Matches cost, costs, costed, costing, costly, and costful.

costs~ => Matches cost, costs, costed, costing, costly, and costful.

Exercising Caution with Wildcards

It is important to exercise caution and use your discretion with wildcards, especially with the unlimited wildcard. The unlimited wildcard presents the highest risk of matching unanticipated and/or undesired variation, as well as the highest risks related to the computational expense. In situations where you need to use a wildcard operator at the beginning of a word, embedded within a word, or at the end of a word that consists of three or fewer letters, consider using the single-character or digit wildcard instead of the unlimited wildcard.
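
To see why the unlimited wildcard captures so much more than the single-character and digit wildcards, it can help to think of each wildcard as a regular expression applied to one token at a time. The Python sketch below is only an approximation under stated assumptions: the character classes are inferred from the descriptions above, the helper names are hypothetical, and brackets and stemming are not handled.

```python
import re

# Assumed character classes: the guide says "?" matches a letter or a number
# and "+" matches a digit; "*" is modelled here as any run of letters/digits.
_WILDCARDS = {"*": "[a-z0-9]*", "?": "[a-z0-9]", "+": "[0-9]"}


def wildcard_to_regex(term: str) -> re.Pattern:
    """Translate a single-word wildcard term into a case-insensitive regex."""
    pattern = "".join(_WILDCARDS.get(ch, re.escape(ch)) for ch in term)
    return re.compile(pattern, re.IGNORECASE)


def matches(term: str, token: str) -> bool:
    """True when the whole token is matched by the wildcard term."""
    return wildcard_to_regex(term).fullmatch(token) is not None


for term, token in [("s??n", "soon"), ("s??n", "sin"), ("1++", "1001"), ("bank*", "Bankwitz")]:
    print(term, token, matches(term, token))
```

Because the unlimited wildcard translates to an open-ended repetition, it accepts any continuation of the word, whereas s??n and 1++ fix the token length exactly.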

Proximity (Basic)

Proximity operators allow you to specify the number of tokens that may occur between two or more elements in your search term.

Tokens

The search index defines a token as a word containing letters and/or numbers. Tokens are separated from each other by white space, and the search index defines all non-alphanumeric characters (punctuation, symbols, special characters, etc.) as white space. The token count can be different from the word count within a string of text. For example, the following string of text contains 19 words:


"Hi! Please email this to John at jmartin@enron.com (he's out of the office now, but he'll be back soon)."


The search index tokenizes this text into 23 tokens for search:


"Hi Please email this to John at jmartin enron com he s out of the office now but he ll be back soon"

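The difference between the word count and the token count can be reproduced with a few lines of Python. This is a minimal sketch rather than the actual search index: it assumes tokens are runs of ASCII letters and digits, and the function name is hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split text into tokens, treating every non-alphanumeric character as white space."""
    return re.findall(r"[A-Za-z0-9]+", text)


text = ("Hi! Please email this to John at jmartin@enron.com "
        "(he's out of the office now, but he'll be back soon).")

words = text.split()      # whitespace-separated words
tokens = tokenize(text)   # tokens as described above

print(len(words), len(tokens))
print(" ".join(tokens))
```

The email address and the two contractions account for the four extra tokens: jmartin@enron.com becomes three tokens, and he's and he'll become two tokens each.
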
Syntax Description Example search term(s)
w/<n> Bi-directional proximity: Matches text where element A occurs within N or fewer tokens of element B, in either direction.
Alternative Syntax:  /<n>
[budget*, cost*] w/10 proposal* => Matches text such as:
  • These proposals will cost too much.
  • The budget is not finalized. We are waiting for the proposal to be approved.
[budget*, cost*] /10 proposal* => Same search term as above with alternative syntax.
Adrian w/0 Diaz => Matches Adrian Diaz or Diaz, Adrian. Does not match Adrian R. Diaz.
f/<n> Uni-directional proximity: Matches text where element A is followed by element B, within N or fewer tokens. [blue, white] f/3 card{s} => Matches text such as The blue and green cards are offered to preferred customers.

Does not match  I found the lost card under the white chair.
window(size = <n>, ) Window proximity: Matches text where the specified elements occur within a window of N or fewer tokens. If no window size is specified, the default size is 100.
Alternative syntax:  w(size = <n>, )
window(size = 10, account{s}, status, David f/1 Lee) => Matches text such as What is the status of David M. Lee's account?
w(size = 10, account{s}, status, David f/1 Lee) => Same search term as above with alternative syntax.

Additional Considerations for Proximity

Token count:

For bi-directional and uni-directional proximity, zero to N tokens may occur between the elements. The elements in the term are not included in the token count. The search term  approv* /5 proposal* matches the following text:
"The proposal was reviewed by Maria for approval."


For window proximity, the elements are included in the token count. The search term w(size=5, approv*, proposal*) allows for only three additional words in the window of five words. It does not match the text above, and it does not match the following text:


"The project has been approved by marketing, and the proposal for next steps is ready."

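The two counting rules above can be compared side by side in Python. This sketch is an illustration under assumptions, not DART's matching logic: elements are reduced to simple prefixes (standing in for approv* and proposal*), tokenization is ASCII-only, and all function names are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def positions(tokens: list[str], prefix: str) -> list[int]:
    return [i for i, tok in enumerate(tokens) if tok.startswith(prefix)]


def within(tokens: list[str], a: str, b: str, n: int) -> bool:
    """Bi-directional proximity: at most n tokens BETWEEN the two elements."""
    return any(abs(i - j) - 1 <= n
               for i in positions(tokens, a)
               for j in positions(tokens, b) if i != j)


def in_window(tokens: list[str], size: int, *prefixes: str) -> bool:
    """Window proximity: all elements inside one span of `size` tokens, elements included."""
    for start in range(len(tokens)):
        span = tokens[start:start + size]
        if all(any(tok.startswith(p) for tok in span) for p in prefixes):
            return True
    return False


first = tokenize("The proposal was reviewed by Maria for approval.")
second = tokenize("The project has been approved by marketing, and the proposal for next steps is ready.")

print(within(first, "approv", "proposal", 5))     # five tokens between the elements
print(in_window(first, 5, "approv", "proposal"))  # the span covers seven tokens
print(in_window(second, 5, "approv", "proposal")) # the span covers six tokens
```

In this model the first text satisfies the /5 term because exactly five tokens separate the elements, while neither text fits a window of five once the elements themselves are counted.
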
Bi-directional expansions:

When a search term contains two or more bi-directional proximity operators, the search does not match every possible proximity combination within a certain window size. For example, the search term budget* w/10 cost* w/10 proposal* results in four combinations, or expansions:

  • budget* f/10 cost* f/10 proposal*
  • proposal* f/10 budget* f/10 cost*
  • cost* f/10 budget* f/10 proposal*
  • proposal* f/10 cost* f/10 budget*

The bi-directional operators in this search term are treated from left to right, resulting in the above set of four expansions (for more information on this topic, see Grouping).


To match every possible proximity combination, use the window operator. For example, each of the following texts is matched by w(size=20, budget, cost*, proposal*) but is not matched by budget* w/10 cost* w/10 proposal*:


"The cost of this proposal is not within the budget."


"The budget is low, and the costs associated with these materials are high. Please account for this in your proposal."

Different proximity sizes within the same search term:

While the window operator is a better option when you want to match every possible proximity combination within a certain window size, the bi-directional and uni-directional proximity operators are better options when you want to be able to specify the proximities between particular elements in a given search term. Using different proximities within a search term can help you model and target relationships between words. For example:


meet* w/5 Anderson w/50 Baker f/2 account*


The search term above captures language related to meetings and "Anderson" when they have a closer linguistic relationship, perhaps within the same clause. It captures "Baker" in more of a contextual relationship with the Anderson meeting, perhaps in the same paragraph. It ensures that "Baker" occurs before "account," likely as an adjective, and it allows for a few words to occur between "Baker" and "account" to capture possible variations like "the Baker and Harper accounts."

Metadata

The metadata operator allows you to search outside of the default stream and allows you to target content in a specific metadata field. You may use the metadata field's long name or short name in the syntax. For example, the search terms  m/email_from: John and m/from: John both match "John" in the email_from field. For a list of all the field names in DART, go to File in the DSR menu and choose Metadata Info.

Syntax Description Example search term(s)
m/<field> : Returns documents where the specified value is contained in the specified field. m/title: budget => Returns documents that contain the text budget in the title field.
Returns documents that contain no text other than  budget in the title field, and also returns documents that contain additional text in the title field, such as November budget or monthly budget meeting.
m/<field> = Returns documents where the specified value is equal to the value in the specified field. m/title = budget => Returns documents where the full text in the title field is budget.
Does not return documents that contain additional text in the title field, such as  November budget or monthly budget meeting.
m/<field> > Returns documents where the value in the specified field is greater than the specified value. m/text_size > 750 => Returns documents where the text_size is larger than 750 bytes.
m/<field> >= Returns documents where the value in the specified field is greater than or equal to the specified value. m/date_sent >= 8/1/2005 => Returns documents where the date_sent is on or after Aug 1, 2005.
m/<field> < Returns documents where the value in the specified field is less than the specified value.
Also returns documents where the field is empty.
m/date_last_mod < 1/1/2012 => Returns documents where the date_last_mod is before January 1, 2012.
Also returns documents where the date_last_mod field is empty.
m/<field> <= Returns documents where the value in the specified field is less than or equal to the specified value.

Also returns documents where the field is empty.
m/family_count <= 5 => Returns documents where the family_count value is 0-5.

Also returns documents where the family_count field is empty.
m/<field> >< Returns documents where the value in the specified field falls between A and B, inclusive. m/date_created >< 1/1/2010 and 12/31/2011 => Returns documents where the date_created is on or after Jan 1, 2010 and is on or before Dec 31, 2011.
m/<field> = empty Returns documents where the specified field is empty. Empty is defined as: the field is null, the field is empty, or the field contains only whitespace (spaces or tabs). m/file_name = empty => Returns documents where the file_name field is empty.
m/<field> != empty Returns documents where the specified field is not empty. Empty is defined as: the field is null, the field is empty, or the field contains only whitespace (spaces or tabs). m/author != empty => Returns documents where the author field is not empty.
m/body Matches text in the body of the document, but does not match text in any metadata fields (including metadata fields in the default stream and outside the default stream). m/body: outlook => Returns documents that contain the text outlook in the body. Does not match text in the metadata fields.
m/notbody Matches text in all metadata fields (including metadata fields in the default stream and outside the default stream), but does not match text in the body of the document. m/notbody: outlook => Returns documents that contain the text outlook in any metadata field. Does not match text in the document body.
m/all Matches text in the body of the document as well as in all metadata fields (including metadata fields in the default stream and outside the default stream). m/all: outlook => Returns documents that contain the text outlook anywhere. Matches text in the document body and all metadata fields.
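
The difference between the contains operator (:), the equals operator (=), the empty condition, and the less-than comparison can be summarized in a small Python sketch. The function names are hypothetical, and case-insensitive comparison of field text is an assumption carried over from the general statement that DART search is case insensitive.

```python
from typing import Optional


def is_empty(value: Optional[str]) -> bool:
    """Empty as defined above: null, empty, or only spaces/tabs."""
    return value is None or value.strip(" \t") == ""


def field_contains(value: Optional[str], text: str) -> bool:
    """Models m/<field>: text (the text occurs somewhere in the field)."""
    if is_empty(value):
        return False
    words, target = value.lower().split(), text.lower().split()
    return any(words[i:i + len(target)] == target
               for i in range(len(words) - len(target) + 1))


def field_equals(value: Optional[str], text: str) -> bool:
    """Models m/<field> = text (the full field text equals the value)."""
    return not is_empty(value) and value.strip().lower() == text.strip().lower()


def field_less_than(value: Optional[int], bound: int) -> bool:
    """Models m/<field> < bound, which also returns documents with an empty field."""
    return value is None or value < bound


print(field_contains("monthly budget meeting", "budget"))  # contains: extra text allowed
print(field_equals("monthly budget meeting", "budget"))    # equals: extra text not allowed
print(field_less_than(None, 5))                            # empty field is also returned
```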

Offset Metadata

Offset metadata is a special kind of metadata that classifies key areas of text within the body of a document. To see the offset metadata highlighted on a document, press the Show Highlighting button above the document and choose from the list, or right-click on the document, select Show Highlighting, and choose from the list (available only in Extracted Text Viewer).
Syntax Description Example search term(s)
m/eheader Matches text within the email headers present in the body of the document. This includes: to, from, cc, bcc, date, and subject. Does not match text in email metadata fields (the email_from field, the email_to field, the email_subject field, etc.). Use @ in the syntax to target specific fields within the email headers: from, to, cc, bcc, date, and subj.

Alternative syntax:  m/email_header

m/eheader: Harrison => Returns documents that contain the text Harrison in any email header present in the body of the document. Does not match text in the metadata fields.
m/email_header: Harrison => Same search term as above with alternative syntax.
m/email_header@from: Harrison => Returns documents that contain the text Harrison in the from field in any email header present in the body of the document. Does not match text in the metadata fields.
m/ebody Matches text within the email body.
The email body is defined as the text in the body of an email that is not part of the email headers. An email is defined as a document that a) contains at least one email header, and b) has an email header located at the top of the document.
Alternative syntax:  m/email_body
m/ebody: Rodriguez => Returns documents that contain the text Rodriguez in the email body. Does not match text within the email headers.
m/email_body: Rodriguez => Same search term as above with alternative syntax.
m/repeated Matches text within repeated content.
Repeated content is defined as exact strings of text that occur across a large number of documents in the corpus (thresholds may vary).
Text identified as repeated content often includes: email footers, email signatures, disclaimer language (often in email footers), boilerplate language (often in contracts), etc.
Alternative syntax:  m/repeated_content
m/repeated: confidential => Returns documents that contain the text confidential within the repeated content.
m/repeated_content: confidential => Same search term as above with alternative syntax.
m/bodytrim Matches text within the trimmed body.
The trimmed body is defined as the text in the document body that is not part of the email headers or repeated content.
Alternative syntax:  m/body_trimmed
m/bodytrim: Nicolas => Returns documents that contain the text Nicolas in the trimmed body of an email. Does not match text within the email headers or repeated content.
m/body_trimmed: Nicolas => Same search term as above with alternative syntax.

Email Segment offset metadata

There is a fifth type of offset metadata for email segments, which combines aspects of all four offset metadata types listed above into a single search operator. The syntax for email segment search is detailed in a separate guide: “Email Segment Sub-Streams and Syntax.”

Quotation Marks

Syntax Description Example search term(s)
" " Query text enclosed in quotation marks is treated as exact text, including any syntax that is present. All symbols/special characters enclosed in quotes are treated as whitespace. "forget me not my love" => Returns documents that contain the phrase forget me not my love.
versus
forget me not my love => Returns documents that contain the phrase forget me and that do not contain the phrase my love.
m/title = "empty" => Returns documents where the full text in the title field is empty.
versus
m/title = empty => Returns documents where the title field is empty.

Quotation marks are helpful when you want to search for a word that without quotation marks would act as a search operator, such as OR, AND, NOT, or empty. Quotation marks can also be helpful if there are symbols/special characters in your search term that are part of DART syntax, but you want the search term to run as an exact text match. DART syntax no longer works as search syntax when enclosed in quotation marks; it is treated as whitespace instead. For example, consider the following search term enclosed in quotation marks:  "review w/10 [account*, deal{s}]"

This search term matches the following text: 

"review w 10 account deal s"

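The effect of quotation marks on syntax characters follows directly from the token definition in the Proximity section. As a minimal Python sketch (assuming ASCII alphanumeric tokens; the function name is hypothetical):

```python
import re


def quoted_tokens(query: str) -> list[str]:
    """Treat everything inside the quotes as text; symbols act as white space."""
    inner = query.strip().strip('"')
    return re.findall(r"[A-Za-z0-9]+", inner)


print(" ".join(quoted_tokens('"review w/10 [account*, deal{s}]"')))
```

The slash, brackets, asterisk, and comma all disappear, leaving the six tokens shown in the matched text above.
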
Previously Executed Searches

The History tab in DSR records a temporary history of the search terms you have executed in the selected search scope. The history persists for seven days (or you may delete history items at any time). Each search term saved in the history is assigned a number. You can reference these history numbers in DART syntax, which allows you to represent a previously executed search term by a number rather than rewriting the query text into a new search term.

Syntax Description Example search term(s)
#<n> References a previously executed search from the History tab in DSR. This can be run on its own, or it can be used as part of another search term.
Note: This syntax is specific to DSR search, and it cannot be used in Lace Jobs.
tax{es} /10 fil[e, ing*] NOT #14 => #14 represents the 14th search term in the DSR History tab.
#line<n> References a line in the Execute Multiple Queries tab in DSR. This can be run on its own, or it can be used as part of another search term.
Note: This syntax is specific to DSR search, and it cannot be used in Lace Jobs.
[budget, cost*, proposal*]
review w/5 #line1
w(size=50, review, board*, #line1)
=> For purposes of this example, imagine there are three search terms in the Execute Multiple Queries tab. #line1 references the search term on the first line in that tab.

NOTIN

NOTIN allows you to run a search term and make particular exclusions to that search term at the level of individual hits within documents. (Boolean NOT, by contrast, makes exclusions at the document level: any document containing the element on the right side of NOT is excluded from the results.) NOTIN disregards only the specific hits where the element on the left side of NOTIN occurs within the pattern or context on the right side of NOTIN. If the element on the left side of NOTIN occurs in any other pattern or context anywhere in a document, the document is returned by the search term. See the examples in the table below.
Syntax Description Example search term(s)
NOTIN Returns documents when the element on the left side of  NOTIN occurs in any pattern or context other than the context described by the element on the right side of NOTIN.
Note: NOTIN can be a computationally expensive operation. Therefore, only one NOTIN operator is allowed per search term.
bank{s} NOTIN Bank of America => Matches the word bank or banks, except when the word occurs in the phrase Bank of America.
Davidson NOTIN [Laura, Rob{ert}] /1 Davidson => Matches the word Davidson, except when the word occurs within text hit by the search term [Laura, Rob{ert}] /1 Davidson, such as Laura A. Davidson or Davidson, Rob.
attorney{s} w/5 communicat* NOTIN attorney client communication => Matches text hit by the search term attorney{s} /5 communicat*, except when any of that matched text is part of the phrase attorney client communication.

Proximity may be used on either side of NOTIN: (communicat* w/10 confidential*) NOTIN w(size = 50, [e{ }mail , communication, message], [privileged, confidential], [intended, error, indicated, designated])
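
The contrast between hit-level exclusion (NOTIN) and document-level exclusion (NOT) can be sketched in Python. Regular expressions stand in for the DART elements here, which is an assumption made purely for illustration; the function names and sample sentences are hypothetical.

```python
import re


def spans(pattern: str, text: str) -> list[tuple[int, int]]:
    """Character spans of every case-insensitive hit of the pattern."""
    return [m.span() for m in re.finditer(pattern, text, re.IGNORECASE)]


def notin(left: str, right: str, text: str) -> bool:
    """Hit level: keep the document if any left-hand hit lies outside every right-hand hit."""
    excluded = spans(right, text)
    return any(not any(s <= start and end <= e for s, e in excluded)
               for start, end in spans(left, text))


def boolean_not(left: str, right: str, text: str) -> bool:
    """Document level: any right-hand hit excludes the whole document."""
    return bool(spans(left, text)) and not spans(right, text)


LEFT = r"\bbanks?\b"             # stands in for bank{s}
RIGHT = r"\bbank of america\b"   # stands in for Bank of America

doc = "The bank near Bank of America is closed."
print(notin(LEFT, RIGHT, doc), boolean_not(LEFT, RIGHT, doc))
```

In this sketch the sample document is kept by the NOTIN logic, because one occurrence of "bank" falls outside the excluded phrase, but it is dropped by the NOT logic.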

Grouping

Grouping is the use of parentheses to resolve the search ambiguity that can exist in complex search terms.

Syntax Description Example search term(s)
( ) Dictates which actions are performed first. Without grouping, the default operator precedence is applied (see below). (budget* AND cost*) OR proposal* => Runs as:
  • budget* AND cost*
  • proposal*
budget* AND (cost* OR proposal*) => Runs as:
  • budget* AND cost*
  • budget* AND proposal*

Operator Precedence

The default precedence for DART operator syntax is:


1. Grouping (parentheses)
2. OR, brackets
3. Uni-directional Proximity
4. Bi-directional Proximity (smaller first, or if the same size, left to right)
5. AND
6. NOT
7. NOTIN
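
The effect of grouping on Boolean operators can be illustrated with Python sets, where each set holds the IDs of the documents hit by one element. The document IDs below are invented for the example. Note that Python's own operator precedence differs from the list above, so the parentheses are written explicitly in both cases.

```python
# Hypothetical document IDs hit by each element.
budget = {1, 2, 3}
cost = {2, 3, 4}
proposal = {3, 5}

# (budget* AND cost*) OR proposal*
and_grouped = (budget & cost) | proposal

# budget* AND (cost* OR proposal*)
or_grouped = budget & (cost | proposal)

print(sorted(and_grouped))
print(sorted(or_grouped))
```

Because OR ranks above AND in the list above, an ungrouped budget* AND cost* OR proposal* would presumably behave like the second grouping; adding parentheses removes any doubt.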

Additional Considerations for Grouping

Metadata Operator:

Everything that follows the metadata operator is included in the scope of the  m/ operator, unless grouping is used. For example:

  • m/custodian_all: Meyers NOT Lee => Returns documents that contain "Meyers" in the custodian_all field, and that do not also contain "Lee" in the same field.

versus

  • (m/custodian_all: Meyers) NOT Lee => Returns documents that contain "Meyers" in the custodian_all field, and that do not also contain "Lee" anywhere else in the document (anywhere in the default stream, to be exact).
  • m/repeated: e{ }mail AND privileged => Returns documents that contain both "email" and "privileged" within the repeated content in the body of the document.

versus

  • (m/repeated: e{ }mail) AND privileged => Returns documents that contain the word "email" within the repeated content in the body of the document, and that also contain the word "privileged" anywhere else in the document (anywhere in the default stream, to be exact).

Bi-directional Proximity:

When a search term contains two bi-directional proximity operators, the operator with smaller proximity takes precedence:
investigat* w/50 close~ w/5 account* runs as investigat* w/50 (close~ w/5 account*) =>

  • investigat* f/50 close~ f/5 account*
  • close~ f/5 account* f/50 investigat*
  • investigat* f/50 account* f/5 close~
  • account* f/5 close~ f/50 investigat*


To change this operator precedence, use grouping:  (investigat* w/50 close~) w/5 account* =>

  • investigat* f/50 close~ f/5 account*
  • account* f/5 investigat* f/50 close~
  • close~ f/50 investigat* f/5 account*
  • account* f/5 close~ f/50 investigat*


When a search term contains two bi-directional proximity operators of the same size, the operator on the left takes precedence:
budget* w/10 cost* w/10 proposal* runs as (budget* w/10 cost*) w/10 proposal* =>

  • budget* f/10 cost* f/10 proposal*
  • proposal* f/10 budget* f/10 cost*
  • cost* f/10 budget* f/10 proposal*
  • proposal* f/10 cost* f/10 budget*


To change this operator precedence, use grouping:  budget* w/10 (cost* w/10 proposal*) =>

  • budget* f/10 cost* f/10 proposal*
  • cost* f/10 proposal* f/10 budget*
  • budget* f/10 proposal* f/10 cost*
  • proposal* f/10 cost* f/10 budget*


If more than two bi-directional operators are used in the same search term, grouping must be used to specify the desired operator precedence.
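
The expansions listed in this section can be generated mechanically once the grouping is fixed. The Python sketch below models a grouped search term as nested tuples of the form (left, n, right); this representation and the function name are assumptions made for illustration, not part of DART.

```python
def expand_bidirectional(expr) -> list[str]:
    """Expand nested (left, n, right) bi-directional terms into uni-directional ones."""
    if isinstance(expr, str):
        return [expr]
    left, n, right = expr
    expansions = []
    for l in expand_bidirectional(left):
        for r in expand_bidirectional(right):
            expansions.append(f"{l} f/{n} {r}")  # left element first
            expansions.append(f"{r} f/{n} {l}")  # right element first
    return expansions


# (budget* w/10 cost*) w/10 proposal*
for line in expand_bidirectional((("budget*", 10, "cost*"), 10, "proposal*")):
    print(line)

# investigat* w/50 (close~ w/5 account*)
for line in expand_bidirectional(("investigat*", 50, ("close~", 5, "account*"))):
    print(line)
```

In this model each bi-directional operator doubles the number of expansions, so two operators yield four, and the grouped elements always stay adjacent, which is why some orderings are never produced.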

Proximity and Booleans:

When combining proximity and Boolean operators in the same search term, some grouping combinations are not allowed.

Proximity + AND:

The following grouping combination of proximity and AND is not allowed because it can be difficult to ascertain the intent of the underlying search logic:

  • marketing w/5 (San Francisco AND New York)

The search term above should be rewritten in one of the following ways, depending on the intended logic, so that the intent of the underlying search logic is more readily apparent:

  • (marketing w/5 (San Francisco OR New York)) AND (San Francisco AND New York)
  • ((marketing w/5 San Francisco) AND New York) OR ((marketing w/5 New York) AND San Francisco)
  • (marketing w/5 San Francisco) AND New York
  • (marketing w/5 New York) AND San Francisco
  • (marketing w/5 New York) AND (marketing w/5 San Francisco)


Proximity + NOT:
The following grouping combination of proximity and NOT is not allowed because it can be difficult to ascertain the intent of the underlying search logic:

  • marketing w/5 (Washington NOT Washington D{ }C)

The search term above should be rewritten so that the intent of underlying search logic is more readily apparent:

  • (marketing w/5 Washington) NOT Washington D{ }C

NOT excludes at the document level. If the intention of the search term is to exclude at the level of individual hits, NOTIN should be used instead of NOT. In most cases, NOTIN can be used instead of NOT to exclude at the hit level rather than at the document level, but there are some exceptions when proximity and grouping are involved (see below).

Proximity + NOTIN:

The following grouping combination of proximity and NOTIN is not allowed because of the computationally expensive nature of NOTIN:

  • marketing w/5 (Washington NOTIN Washington D{ }C)

However, the search term above can be rewritten slightly (see below), and the rewritten version returns nearly the same results that would be returned by the search term above:

  • (marketing w/5 Washington) NOTIN (marketing w/5 Washington D{ }C)

Word Order:

Grouping can trump word order when used with bi-directional proximity in certain combinations:

  • bank account w/10 investigation questions =>
    • bank account f/10 investigation questions
    • investigation questions f/10 bank account

versus

  • bank (account w/10 investigation) questions =>
    • bank account f/10 investigation questions
    • bank investigation f/10 account questions

Case Sensitivity

These operators allow you to run searches with sensitivity to case.

Syntax Description Example search term(s)
allcaps( ) Matches when all letters in the word or phrase are upper case. allcaps(SAT) => Matches the text SAT. Does not match sat, Sat, SaT, etc.
firstcap( ) Matches when the first letter in the word (or the first letter in each word of a phrase) is upper case and all other letters are lower case.
This operator does not match words that contain only one letter, such as A or A30 (the allcaps operator must be used to capture A or A30).
firstcap(Mark Jones) => Matches the text Mark Jones. Does not match mark jones, MARK JONES, maRk jOneS, etc.
nocaps( ) Matches when all letters in the word or phrase are lower case. nocaps(abc100) => Matches the text abc100. Does not match Abc100, ABC100, aBc100, etc.


By default, DART syntax is not case sensitive. For example, the following search terms do not contain case sensitive operators; therefore, they all match text without sensitivity to case, and they all return exactly the same results:

  • BUDGET AND COST
  • Budget AND Cost
  • budget and cost
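
The three case operators described in this section can be approximated as simple predicates. The Python sketch below is illustrative only: the function names mirror the operators but are not DART code, and the rule that firstcap needs at least two letters per word is inferred from the statement that it does not match A or A30.

```python
def _letters(word):
    """Return only the alphabetic characters of a word."""
    return [ch for ch in word if ch.isalpha()]


def allcaps(text):
    """True when every word has letters and all of them are upper case."""
    words = text.split()
    return bool(words) and all(
        _letters(w) and all(c.isupper() for c in _letters(w)) for w in words
    )


def nocaps(text):
    """True when every word has letters and all of them are lower case."""
    words = text.split()
    return bool(words) and all(
        _letters(w) and all(c.islower() for c in _letters(w)) for w in words
    )


def firstcap(text):
    """True when each word has 2+ letters: first upper case, the rest lower case."""
    words = text.split()

    def ok(word):
        ls = _letters(word)
        return len(ls) >= 2 and ls[0].isupper() and all(c.islower() for c in ls[1:])

    return bool(words) and all(ok(w) for w in words)


print(firstcap("Mark Jones"), firstcap("A30"), allcaps("A30"), nocaps("abc100"))
```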

Diacritics

When the correct foreign language settings are in place for a document population's search index, this operator allows searching for the presence or absence of diacritics on a word. Diacritics include notations such as accent marks, tildes, umlauts, etc.
Syntax Description Example search term(s)
diacritic( ) Searches for the presence or absence of diacritics on a word.
Alternative syntax:  diac( )
diacritic(está) => Matches the text está. Does not match esta.
diacritic(esta) => Matches the text esta. Does not match está.
diac(baño) => Matches the text baño. Does not match bano.


Without this operator, and with the correct foreign language settings in place, search terms are blind to diacritics: they match the word both with and without diacritics. For example, the search term está matches both "está" and "esta." The search term diacritic(está) matches only the word "está," and the search term diacritic(esta) matches only the word "esta."
Note: If the correct foreign language settings are  not in place in the search index, the diacritic operator does not work, and any letters in the text with diacritics on them become part of the ignore class of characters and are treated as white space. For example, without the right foreign language settings, the word "baño" within the text is treated as "ba o" during search, and the word "está" is treated as "est" during search.
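
The three behaviors described above (diacritic-blind matching, diacritic-sensitive matching, and the fallback when language settings are missing) can be sketched in Python with the standard unicodedata module. This is an illustration under stated assumptions, not DART's indexing logic; all function names are hypothetical, and the fallback assumes that only ASCII letters and digits survive.

```python
import re
import unicodedata


def strip_diacritics(text):
    """Decompose characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches_default(term, word):
    """Diacritic-blind comparison (plain search term, correct language settings)."""
    return strip_diacritics(term).casefold() == strip_diacritics(word).casefold()


def matches_diacritic(term, word):
    """Diacritic-sensitive comparison, as with diacritic( ) or diac( )."""
    def nfc(s):
        return unicodedata.normalize("NFC", s).casefold()
    return nfc(term) == nfc(word)


def tokens_without_language_settings(word):
    """Fallback: accented letters fall into the ignore class and become white space."""
    composed = unicodedata.normalize("NFC", word)
    return re.sub(r"[^A-Za-z0-9]", " ", composed).split()


print(matches_default("está", "esta"))
print(matches_diacritic("está", "esta"))
print(tokens_without_language_settings("baño"))
```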

Proximity (Advanced)

These proximity operators allow you to specify additional conditions, such as requiring that the elements occur within the same sentence, requiring an exact number of tokens between elements, requiring that an element occur near the beginning or end of a document, or requiring that the elements do not occur near each other.
Syntax Description Example search term(s)
s/<n> Matches text where the element on the left occurs within N or fewer tokens of the element on the right, within the same sentence. Alternative syntax: sw/<n>
meeting s/5 proposal* => Matches text such as: The proposal meeting is tomorrow.
Does not match: The meeting is tomorrow. The proposal should be ready by then.
meeting sw/5 proposal* => Same search term as above with alternative syntax.
sf/<n> Matches text where the element on the left is followed by the element on the right, within N or fewer tokens, within the same sentence. blue sf/2 packag* => Matches text such as: The new flavor comes in blue packaging.
Does not match: I prefer the blue. The other packaging looks outdated.
e/<n> Matches text where the element on the left occurs exactly N words away from the element on the right. Alternative syntax: ew/<n>
Jeff e/1 Jerry => Matches text such as: "Jeff and Jerry" or "Jerry or Jeff".
Does not match: "Jeff Jerry" or "Jerry Jeff".
Jeff ew/1 Jerry => Same search term as above with alternative syntax.
ef/<n> Matches text where the element on the left is followed by the element on the right, within exactly N words. Mary ef/1 Rogers => Matches text such as: "Mary-Anne Rogers" or "Mary Jordan-Rogers".
Does not match: "Mary Rogers".
es/<n> Matches text where the element on the left occurs exactly N words away from the element on the right, within the same sentence. Alternative syntax: esw/<n>
meeting es/3 proposal* => Matches text such as: We'll discuss the proposal again in the meeting tomorrow.
Does not match: The meeting is tomorrow. The proposal should be ready by then.
meeting esw/3 proposal* => Same search term as above with alternative syntax.
esf/<n> Matches text where the element on the left is followed by the element on the right, within exactly N words, within the same sentence. blue esf/2 packag* => Matches text such as: The new flavors come in blue and green packaging.
Does not match: I prefer the blue. The other packaging looks outdated.
#start f/<n> Matches text where the element on the right occurs within N or fewer words from the start of the document body or from the start of a default stream metadata field. #start f/10 Securities "and" Exchange Commission => Matches the phrase Securities and Exchange Commission when it occurs within 10 words of the start of the document.
f/<n> #end Matches text where the element on the left occurs within N or fewer words from the end of the document body or from the end of a default stream metadata field. marketing director f/10 #end => Matches the phrase marketing director when it occurs within 10 words of the end of the document.
NOT + proximity operator Matches text where the element on the left does not occur within N or fewer words of the element on the right (or when the element on the left is not followed by the element on the right, within N or fewer words). marketing NOT /1 director => Matches the word marketing when it does not occur within one word of director.
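
The exact-distance operators can be illustrated with a short Python sketch. It assumes, based on the examples in the table above, that "exactly N words away" means exactly N intervening words and that punctuation such as hyphens is treated as white space. The function name exact_gap is hypothetical, and sentence boundaries (es/, esf/) are not modeled.

```python
import re


def tokenize(text):
    """Lowercase and split on non-alphanumerics (hyphens act as white space)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def exact_gap(text, left, right, n, ordered=False):
    """True when `left` and `right` occur with exactly n words between them.

    ordered=False models e/<n> (either order); ordered=True models ef/<n>
    (left must come first).
    """
    tokens = tokenize(text)
    left_pos = [i for i, t in enumerate(tokens) if t == left.lower()]
    right_pos = [i for i, t in enumerate(tokens) if t == right.lower()]
    for i in left_pos:
        for j in right_pos:
            gap = (j - i - 1) if ordered else (abs(j - i) - 1)
            if gap == n:
                return True
    return False


print(exact_gap("Jeff and Jerry", "Jeff", "Jerry", 1))
print(exact_gap("Jeff Jerry", "Jeff", "Jerry", 1))
print(exact_gap("Mary-Anne Rogers", "Mary", "Rogers", 1, ordered=True))
```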

Variant Sets

A "variant set" is a special element in DART syntax that acts as a stand-in or shorthand for a specific bracketed set of elements. For example, suppose a variant set named "key people" is created and contains the following people: Chen, Goldman, Kapoor, Navarro, and Parker. Using variant set syntax, the search term meet* w/10 <<key people>> runs as meet* w/10 [Chen, Goldman, Kapoor, Navarro, Parker]. This variant set can be used across multiple search terms, and it represents the same set of elements across all search terms.

Variant sets can also be updated at any time. For example, a few weeks after this variant set is created and in use, Chen is no longer considered a key person, and Carter is added to the list of key people. The underlying syntax for the <<key people>> variant set is then updated to [Carter, Goldman, Kapoor, Navarro, Parker]. All existing and new search terms with this variant set now run with the updated name list. So meet* w/10 <<key people>> now runs as meet* w/10 [Carter, Goldman, Kapoor, Navarro, Parker].
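
The substitution described above amounts to a lookup-and-replace step. The Python sketch below is a hypothetical illustration of that idea (the dictionary and the function expand_variant_sets are not DART features): updating the stored list changes how every search term that references the variant set runs.

```python
import re

# Hypothetical store of variant sets, keyed by name.
variant_sets = {"key people": ["Chen", "Goldman", "Kapoor", "Navarro", "Parker"]}


def expand_variant_sets(term, sets):
    """Replace each <<name>> with its bracketed element list."""
    def replace(match):
        name = match.group(1).strip()
        return "[" + ", ".join(sets[name]) + "]"
    return re.sub(r"<<(.+?)>>", replace, term)


term = "meet* w/10 <<key people>>"
print(expand_variant_sets(term, variant_sets))

# Later update: Chen is removed and Carter is added.
variant_sets["key people"] = ["Carter", "Goldman", "Kapoor", "Navarro", "Parker"]
print(expand_variant_sets(term, variant_sets))
```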

Syntax Description Example search term(s)
<<vs name>> Stands in for a specific bracketed set of elements (a "variant set"), such that the same set of elements can be represented in shorthand and used across multiple search terms. meet* w/10 <<key people>> => Runs as meet* w/10 [Chen, Goldman, Kapoor, Navarro, Parker]


For more information on how to create, use, and maintain variant sets, see the separate DART guide "Variant Sets."

Token Classes


A "token class" is a special element in DART syntax that allows you to search for a set of tokens that have been defined as belonging to the same class. There are nine token classes available in DART syntax: number, currency, percent, date, time, symbol, email, credit card number, and social security number.

Although the search index ignores all non-alphanumeric characters and treats them as whitespace, this does not apply to token classes. Token classes can search for text that includes symbols, punctuation, and/or special characters.


Note: The chart below contains brief descriptions and examples that attempt to explain at a basic level how each token class works. However, the full syntax for token classes is not necessarily represented below, and the full capability of each token class operator can be more complex than what is represented below.

Syntax Description Example search term(s)
tc(number) Matches any numeral, except numerals written out entirely in letters.
Subclass: exacttext
tc(number) => Matches "10", "75k", "5 million", "350.56", "-4.956", "415-555-8301", etc. Does not match "10a", "ten", etc.
tc(number = 70000) => Matches "70000", "70,000", "70k", "70 thousand", etc. Does not match "70", "700", "seventy thousand", etc.
tc(number, exacttext = 415-555-8301) => Matches "415-555-8301". Does not match "415 555 8301", "415.555.8301", "415-555/8301", etc.
tc(currency) Matches any numeral expressing currency, limited to dollars, euros, pounds, or yen. Does not match numerals written out entirely in letters.
Subclass: type
tc(currency) => Matches "$10", "$ 50k", "800 USD", "80 dollars", "3 bucks", "600 euros", "€5 billion", "£80M", "900 GBP", "¥1694", "500k JPY", "3 million yen", etc. Does not match "$", "80", "eighty dollars", etc.
tc(currency >= 500) => Matches "$500", "€501", "£600", "¥1694", "50k dollars", "8 hundred yen", etc. Does not match "505", "$200", "six hundred dollars", etc.
tc(currency, type = euro) => Matches "60 euros", "€5 billion", "500k€", "12,000 EUR", etc. Does not match "$500", "£80M", "900 GBP", "¥1694", "fifty euros", etc.
tc(percent) Matches any numeral expressing a percentage. Does not match numerals written out entirely in letters.
tc(percent) => Matches "10%", "50 %", "80 percent", "95 pct", "15 per cent", etc. Does not match "%", "10", "fifty percent", etc.
tc(percent <= 65) => Matches "40 percent", "65%", "5 pct", etc. Does not match "20", "75%", "nine percent", etc.
tc(date) Matches any token combination expressing a date. The date matched must contain a day, month, and year. Does not match dates written out entirely in letters.
Subclasses: year, month, day
tc(date) => Matches "Mar 25, 2011", "20 May 2008", "08/10/2004", "5/15/09", "25-10-11", "2010-04-02", etc. Does not match "January", "December 20th", "January first two thousand ten", etc.
tc(date = 3/25/11) => Matches "3/25/11", "03/25/2011", "March 25, 2011", "3-25-11", "25-03-11", etc. Does not match "March 25", "March twenty fifth two thousand eleven", etc.
tc(date, year >< 2004 and 2006) => Matches "Sept 5 2004", "7-18-05", "7/1/2006", etc. Does not match "2004", "May 2005", etc.
tc(time) Matches any token combination expressing time. Does not match time written out entirely in letters.
Subclasses: h, m, s
tc(time) => Matches "3pm", "5:56am", "8:30 PM", "11:05:16", "21:28:32", "noon", "midnight", etc. Does not match "3", "1:5", "ten o'clock", etc.
tc(time >< 7pm and 11pm) => Matches "7pm", "22:56:02", "8:30 p.m.", etc. Does not match "6:59pm", "05:35:20", etc.
tc(symbol) Matches any of the following symbols: $, €, £, ¥, or %.
tc(symbol = dollar) => Matches $$s$$$$s, etc. Does not match "$70", "dollar", etc.
tc(symbol = %) => Matches %%s%%%%%%%s, etc. Does not match "70%", "percent", etc.
tc(email) Matches any email address.
Subclasses: account, domain, zone
tc(email = m.roberts@bofa.com) => Matches "m.roberts@bofa.com". Does not match "m_roberts@bofa.com", "m.roberts.bofa.com", "m/roberts/bofa/com", etc.
tc(email: gold*) => Matches "goldiehawn@hotmail.com", "ron.goldstone@bofa.com", "mroberts@goldmansachs.com", etc. Does not match "m.roberts@bofa.com", etc.
tc(email, account: robert*) => Matches "robert.lee@bankofamerica.com", "matt.robertson@bofa.com", "m_roberts_surfer@gmail.com", etc. Does not match "mattroberts@bofa.com", "jhobbs@robert.com", etc.
tc(email, domain = bofa or bankofamerica) => Matches "mroberts@bofa.com", "jose.garcia@bankofamerica.com", "ktanaka@bofa.com", etc. Does not match "mroberts@gmail.com", etc.
tc(ccn) Matches any numeral expressing a credit card number. Does not match credit card numbers written out entirely in letters.
Subclass: type (visa, mc, disc, amex)
tc(ccn) => Matches "4111 1111 1111 1111", "5500 0000 0000 0004", etc. Does not match "4111", "Credit Card", etc.
tc(ccn, type = visa) => Matches text that fits the Visa credit card pattern type.
tc(ssn) Matches any numeral expressing a social security number. Does not match social security numbers written out entirely in letters.
tc(ssn) => Matches "219-09-9999", etc. Does not match "219 09 0999", "Social Security", etc.
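
To give a rough sense of why tc(number = 70000) can match several surface forms, the Python sketch below normalizes a few simple numeric expressions to a single value before comparing. It is a toy model under stated assumptions, not the DART token class: the suffix table and the names parse_number and number_equals are hypothetical, and forms such as phone-number-like strings are not handled.

```python
import re

# Hypothetical magnitude suffixes recognized by this sketch.
MULTIPLIERS = {"k": 1_000, "thousand": 1_000, "m": 1_000_000,
               "million": 1_000_000, "billion": 1_000_000_000}

NUMBER_RE = re.compile(r"(-?\d[\d,]*(?:\.\d+)?)\s*([A-Za-z]+)?")


def parse_number(text):
    """Return the numeric value of a simple expression, or None if unrecognized."""
    match = NUMBER_RE.fullmatch(text.strip())
    if not match:
        return None  # e.g. numbers written out entirely in letters
    value = float(match.group(1).replace(",", ""))
    suffix = (match.group(2) or "").lower()
    if suffix:
        if suffix not in MULTIPLIERS:
            return None  # e.g. "10a"
        value *= MULTIPLIERS[suffix]
    return value


def number_equals(text, target):
    """Rough analogue of tc(number = target) for a single candidate token."""
    return parse_number(text) == target


for candidate in ["70000", "70,000", "70k", "70 thousand", "700", "seventy thousand"]:
    print(candidate, number_equals(candidate, 70000))
```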

Computational Expense

Token classes can be computationally expensive because they search for a large variety of tokens that tend to have very high occurrence rates across documents. For example, searching for every number, every mention of currency, or every email address across a large number of documents would likely generate a very high number of hits. Search terms that generate a very high number of hits result in slow search execution and drain system resources. Running a token class on its own is also generally too broad and returns too many results.
For most scenarios, token classes are best used as an element within a search term. For example:

  • (inventory w/10 tc(number > 50)) AND shipment*
  • projected revenue w/50 tc(currency > 100k)
  • (m/email_from: tc(email, domain = bofa)) AND (m/email_dist: tc(email, domain = enron))
  • stash* w/5 tc(symbol = $)

Custom Token Classes

You may request customized token classes for your project. An administrator will need to assess the feasibility of creating and implementing the request (using regular expressions, etc.). Examples of custom token classes that could be created and implemented for a project include:

  • tc(phone number): Could match either of the following token patterns, where N is any digit:
    • (NNN) NNN-NNNN
    • NNN-NNN-NNNN
  • tc(hashtag): Could match any token that contains # as its first character.
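
Because custom token classes may be built from regular expressions, the two examples above could be prototyped as patterns like the following. This Python sketch is only a starting point for discussion with an administrator; the pattern names are hypothetical, and the actual DART implementation may differ.

```python
import re

# (NNN) NNN-NNNN or NNN-NNN-NNNN, where N is any digit.
PHONE_RE = re.compile(r"\(\d{3}\) \d{3}-\d{4}|\d{3}-\d{3}-\d{4}")

# A token whose first character is "#" (the "#" must start the token).
HASHTAG_RE = re.compile(r"(?<!\S)#\S+")

print(bool(PHONE_RE.fullmatch("(415) 555-8301")))
print(bool(PHONE_RE.fullmatch("415-555-8301")))
print(HASHTAG_RE.findall("Filed under #budget and #Q3 (see item#4)"))
```

In the last line, item#4 is not returned because the # is not the first character of the token.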

Flags

Flags are used in DART to "mark" a document (often referred to as "flagging" or "tagging"). Flags are organized in DSR under flag folders. Flags can be applied at the document level, or they can be used to highlight sequences of text in the document. When flags are used to highlight sequences of text, they are called annotations. The flag operator allows you to search for documents marked by flags at the document level or the annotation level. This syntax also allows you to search for documents marked by any flag under a specified flag folder.

The flag operator syntax is relatively robust and can get complex. Use the Query Builder in DSR for help building out a complex search term using the flag operator.

Syntax Description Example search term(s)
F(flag = ) Returns documents marked with the specified flag.
Additional syntax can be used to return documents where the flag is not marked/set (notset).
Additional conditions can be included to specify the user who applied the flag (user), or the date the flag was applied (date).
F(flag = Accounting) => Returns documents marked with the Accounting flag.
F(flag = Reviewed, notset) => Returns documents that are not marked with the Reviewed flag.
F(flag = Maybe, user = rgreen, date > 3/5/2015) => Returns documents marked with the Maybe flag, where the Maybe flag was applied by user rgreen after March 5, 2015.
F(folder = ) Returns documents marked by any flag under the specified flag folder.
By default, flags located under subfolders are not included. Additional syntax can be used to return documents that have been marked by flags under subfolders (sf).
Additional conditions can be included to specify the number of flags marked on a document (count), the user who applied the flag (user), or the date the flag was applied (date).
F(folder = Issue Codes) => Returns documents marked with at least one flag located under the Issue Codes folder (not including subfolders).
F(folder = Potentially Privileged, sf) => Returns documents marked with at least one flag located under the Potentially Privileged folder, including flags located under subfolders.
F(Folder = Search Topics, count >= 2) => Returns documents marked by two or more flags located under the Search Topics folder (not including flags located under subfolders).

Document Flag vs. Annotation Flag

By default, the flag operator returns both documents marked at the document level and documents marked at the annotation level. Additional syntax can be used to specify those marked at the document level (d) or those marked at the annotation level (a). For example:

  • f(flag = Accounting) returns documents marked by the Accounting flag either at the document level, the annotation level, or both.
  • f(flag = Accounting, d) returns documents marked by the Accounting flag at the document level.
    • Returns a document that is marked by the Accounting flag both at the document level and the annotation level. Does not return a document marked only at the annotation level.
  • f(flag = Accounting, a) returns documents marked by the Accounting flag at the annotation level.
    • Returns a document that is marked by the Accounting flag both at the document level and the annotation level. Does not return a document marked only at the document level.
  • f(flag = Accounting, d) NOT f(flag = Accounting, a) returns documents marked by the Accounting flag at the document level, only when the document is not also marked by the Accounting flag at the annotation level.
    • Does not return a document that is marked by the Accounting flag both at the document level and the annotation level.
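
The four cases above behave like set operations. The Python sketch below models them with two hypothetical sets of document IDs (documents flagged at the document level and documents flagged at the annotation level); it is an illustration of the logic, not of how DART stores flags.

```python
# Hypothetical document IDs marked by the Accounting flag.
doc_level = {101, 102, 103}       # marked at the document level
annotation_level = {103, 104}     # marked at the annotation level

# f(flag = Accounting): document level, annotation level, or both
either = doc_level | annotation_level

# f(flag = Accounting, d) and f(flag = Accounting, a)
d_only_syntax = doc_level
a_only_syntax = annotation_level

# f(flag = Accounting, d) NOT f(flag = Accounting, a)
doc_but_not_annotation = doc_level - annotation_level

print(sorted(either))
print(sorted(doc_but_not_annotation))
```

Document 103, which is marked at both levels, is returned by the d form and by the a form, but not by the combined NOT form.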

Annotation Status and Text

Additional conditions can be included to specify criteria unique to annotations: the annotation status (status), or the text highlighted by the annotation (text). For example:

  • f(flag = Accounting, Status = Accepted) returns documents marked by the Accounting flag at the annotation level, and where the status of the annotation is Accepted.
  • f(flag = Accounting, Text: audit* w/10 question*) returns documents marked by the Accounting flag at the annotation level, and where this annotation highlights text that is matched by the search term audit* w/10 question*.

Flag History

Additional syntax can be used to search the flag history (fh). Searching the flag history returns any document that has ever been marked by the specified flag, even if the flag has since been removed. For example:

  • f(flag = Maybe, fh) returns any document that has ever been marked by the Maybe flag. This includes documents that are still marked by the Maybe flag, as well as documents that were once but are no longer marked by the Maybe flag.
  • f(flag = Maybe, fh) NOT f(flag = Maybe) returns documents that were once but are no longer marked by the Maybe flag.

Vetting Count and Stack

"Vetting" refers to the process of reviewing a document for responsiveness, and marking the document as Responsive (R), Not Responsive (N), or Maybe (M). A special set of flags called "categories" are used in DART for vetting. Categories contain R, N, and M flags, and these flags represent the responsiveness designation applied to a document. A document that does not contain an R, N, or M flag is called "un-vetted."

Category flags have two extra search conditions that can be used in the flag operator: vetting count and stack (these conditions are not available for regular/non-category flags in DART). Vetting count and stack track each time a document's vetting is changed by a different user. This helps facilitate the vetting cross-check and Maybe resolution processes. Vetting count and stack can be used to identify when two users disagreed on the responsiveness designation of a single document, or when one user resolved a Maybe designation on a document applied by another user. However, vetting count and stack do not track when a single user changed their mind while vetting, or when a single user resolved their own Maybe designation.

Vetting count and stack must be used in conjunction with the flag history syntax because they search the history of a flag. Vetting count must be used with flag folder syntax, and stack must be used with flag syntax. For example:

  • f(folder = Enron, fh, vcount >= 3) returns documents where the vetting has been changed by a different user three or more times.
    • Returns a document where User A vetted the document M, then User B changed the vetting on this document to N, and then User C changed the vetting on this document to R.
    • Returns a document where User A vetted the document R, then User B changed the vetting on this document to N, and then User A changed the vetting on this document back to R.
    • Does not return a document where User A vetted the document R, then User B changed the vetting on this document to N, and then User B changed the vetting on this document back to R (User B changing his/her mind).
  • f(flag = M-Enron, fh, stack = 1) AND f(flag = N-Enron, fh, stack = 2) returns documents where the first vet was M, and then the second vet was N by a different user.
    • Returns a document where User A vetted the document M, and then User B changed the vetting on this document to N.
    • Does not return a document where User A vetted the document M, and then User A changed the vetting on this document to N (User A resolving his/her own Maybe designation).
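
One way to reason about the examples above is to model the vetting history as a stack that only grows when a different user changes the vetting. The Python sketch below is an assumption, not DART's documented behavior: in particular, it assumes that a change by the same user overwrites that user's latest entry. The function names are hypothetical.

```python
def vetting_stack(history):
    """Collapse a vetting history into stack entries.

    history is a list of (user, designation) events in time order.
    Assumption: a consecutive change by the same user replaces the top entry
    instead of adding a new one.
    """
    stack = []
    for user, designation in history:
        if stack and stack[-1][0] == user:
            stack[-1] = (user, designation)   # same user changed their mind
        else:
            stack.append((user, designation))  # a different user changed the vetting
    return stack


def vcount(history):
    """Vetting count under this model: the number of stack entries."""
    return len(vetting_stack(history))


def stack_at(history, position):
    """Designation at a 1-based stack position, or None if it does not exist."""
    stack = vetting_stack(history)
    return stack[position - 1][1] if 1 <= position <= len(stack) else None


print(vcount([("A", "M"), ("B", "N"), ("C", "R")]))
print(vcount([("A", "R"), ("B", "N"), ("B", "R")]))
print(stack_at([("A", "M"), ("B", "N")], 2))
```

Under this model, the first two vcount >= 3 examples have a count of 3 and the third has a count of 2, which agrees with the returned and not-returned cases listed above.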

Notes

Notes can be added to a document in DART. The notes operator allows you to search for documents associated with specified note conditions.
The notes operator syntax is relatively robust and can get complex. Use the Query Builder in DSR for help building out a complex search term using the notes operator.

Syntax Description Example search term(s)
N( ) Returns documents associated with the specified note conditions.
Note conditions include: the number of notes on the document (count), the date the note was created (date), the user who created the note (user), the note type (type), and the note text (text).
N(count > 0) => Returns documents with at least one note.
N(user = jjackson, date >< 2/1/2016 and 2/28/2016) => Returns documents with one or more notes created by user jjackson, created February 1st through 28th, 2016.
N(type = client-facing, text: follow{ }up) => Returns documents with a client-facing note, and where the text of this note is matched by the search term follow{ }up.

Collections

A collection is a set of documents in DART that acts as the search scope for search terms. The collection operator allows you to reference a collection in your search term.
Syntax Description Example search term(s)
C( ) Returns documents that exist within the specified collection name or ID, when those documents also exist within the search scope.
Similar to searching for a list of document IDs within a specified search scope.
budget* w/10 proposal* AND c(name = Delivery01) => Returns documents with text matched by budget* w/10 proposal*, when the document exists within the collection named Delivery01 (and when the document exists within the search scope selected at the time of running the search term).
budget* w/10 proposal* NOT c(id = 10025) => Returns documents with text matched by budget* w/10 proposal*, when the document does not exist within collection ID 10025 (the document must still exist within the search scope selected at the time of running the search term).

Collection Operator vs. Collection Search Scope

There are at least two collections involved when the collection operator is used in a search term: a collection associated with the collection operator, and a collection associated with the search scope. The results returned by a search term using a collection operator are always a subset of the documents that exist within the search scope (you can think of it as searching for a "collection within a collection"). For example:

  • Collection_01 contains five documents: 101, 102, 103, 104, and 105.
  • Collection_02 contains five million documents. Documents 101, 102, 103 are contained in Collection_02, but documents 104 and 105 are not.
  • The following search term is run over Collection_02: c(name = Collection_01).
  • The search term returns the following results: documents 101, 102, and 103.
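
The "collection within a collection" behavior in this example is a set intersection. The Python sketch below uses a small stand-in for Collection_02 (the real collection holds five million documents); the extra IDs and variable names are made up for illustration.

```python
# Collection referenced by the collection operator.
collection_01 = {101, 102, 103, 104, 105}

# Stand-in for the search scope; it contains 101-103 but not 104 or 105.
collection_02 = {101, 102, 103, 200, 201, 202}

# Running c(name = Collection_01) over Collection_02 returns only the
# documents present in both.
results = collection_01 & collection_02
print(sorted(results))
```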

Lace Jobs

Lace Jobs run batches of search terms over any collection in DART. Documents returned by a Lace Job are referred to as "tagged." Lace Jobs can be used to generate excerpts on tagged documents in DSR.
Once Lace Jobs are complete, the Lace operator allows you to reference them in your search terms. This syntax can also be used to pull up excerpts in DSR.
Syntax Description Example search term(s)
L( ) Returns documents that were tagged (i.e., returned) by the specified Lace Job.
Additional syntax can be used to return documents that were not tagged (i.e., not returned) by the Lace Job (nt), or to pull up excerpts associated with the documents tagged by a Lace Job (e).
L(job = 275) => Returns documents tagged by Lace Job 275.
L(job = 275, t) => Same search term as above with alternative syntax.
L(job = 275, nt) => Returns documents that were not tagged by Lace Job 275.
L(job = 275, e) => Returns documents tagged by Lace Job 275, and pulls up excerpts associated with Lace Job 275.

Additional Lace and Excerpts Features in DSR

Running a search term that contains a Lace operator pulls up two additional features in DSR: Lace View and Term Hits Tab. These additional features display information related to the Lace Job ID referenced in the Lace operator.


When excerpts are generated for a Lace Job, adding an 'e' to the Lace operator syntax pulls up additional features in DSR related to excerpts generated by the job: Excerpts View, Excerpts Tab, and excerpts highlighting.


In the examples below, excerpts have been generated for Lace Job 275:

  • L(job = 275) pulls up the Lace View and the Term Hits Tab.
  • L(job = 275, e) pulls up the Lace View, the Term Hits Tab, the Excerpts View, the Excerpts Tab, and excerpts highlighting.

In the examples below, excerpts have not been generated for Lace Job 275:

  • L(job = 275) pulls up the Lace View and the Term Hits Tab.
  • L(job = 275, e) prompts an error message that reads "Excerpts have not yet been generated for this job."

Specifying Additional Features in Lace Job Results

There are four additional features related to Lace Job results that can be specified in this syntax:

  • L(job = 275, terms <= 3) specifies that a document must be hit by 3 or fewer different terms in this Lace Job.
  • L(job = 275, hits > 5) specifies that a document must be hit more than 5 times (by any term) in this Lace Job.
  • L(job = 275, nodecount = 1) specifies that a document must be hit by terms in only one Term Management node in this Lace Job.
  • L(job = 275, expansions > 10) specifies that a document must be hit by more than 10 expansions (from any term) in this Lace Job.
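
Each of the four conditions above acts as a filter on per-document statistics from the Lace Job. The Python sketch below models that idea with a hypothetical LaceDocStats record and made-up numbers; it does not reflect how DART stores Lace Job results.

```python
from dataclasses import dataclass


@dataclass
class LaceDocStats:
    """Hypothetical per-document statistics for one Lace Job."""
    doc_id: int
    terms: int        # number of different terms that hit the document
    hits: int         # total hits from any term
    nodecount: int    # number of Term Management nodes with hitting terms
    expansions: int   # number of expansions that hit the document


def filter_docs(stats, predicate):
    """Return the IDs of documents whose statistics satisfy the predicate."""
    return [s.doc_id for s in stats if predicate(s)]


job_275 = [
    LaceDocStats(1, terms=2, hits=8, nodecount=1, expansions=4),
    LaceDocStats(2, terms=5, hits=6, nodecount=2, expansions=12),
    LaceDocStats(3, terms=3, hits=3, nodecount=1, expansions=11),
]

print(filter_docs(job_275, lambda s: s.terms <= 3))        # terms <= 3
print(filter_docs(job_275, lambda s: s.hits > 5))          # hits > 5
print(filter_docs(job_275, lambda s: s.nodecount == 1))    # nodecount = 1
print(filter_docs(job_275, lambda s: s.expansions > 10))   # expansions > 10
```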