Chapter 9. Text Search and Replace

The GenericTextDocument service supports the XSearchable and XReplaceable interfaces (see Chapter 5, Fig. 26), which are the entry points for doing regular expression based search and replace inside a document.

XSearchable.createSearchDescriptor() builds a search description (an ordinary string or a regular expression). The search is executed with XSearchable.findAll() or findFirst() and findNext().

XReplaceable works in a similar way but with a replace descriptor which combines a replacement string with the search string. XReplaceable.replaceAll() performs search and replacement, but the XSearchable searching methods are available as well. This is shown in Fig. 69.

Fig. 69: The XSearchable and XReplaceable Interfaces.

The following code fragment utilizes the XSearchable and XSearchDescriptor interfaces:

searchable = Lo.qi(XSearchable, doc)
srch_desc = searchable.createSearchDescriptor()
srch_desc.setSearchString("colou?r")

XReplaceable and XReplaceDescriptor objects are configured in a similar way, as shown in the examples.

XSearchDescriptor and XReplaceDescriptor contain get and set methods for their strings. But a lot of the search functionality is expressed as properties in their SearchDescriptor and ReplaceDescriptor services. Fig. 70 summarizes these arrangements.

Fig. 70: The SearchDescriptor and ReplaceDescriptor Services.

The next code fragment accesses the SearchDescriptor properties, and switches on regular expression searching:

srch_props = Lo.qi(XPropertySet, srch_desc, raise_err=True)
srch_props.setPropertyValue("SearchRegularExpression", True)

Alternatively, Props.set_property() can be employed:

Props.set_property(srch_desc, "SearchRegularExpression", True)
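The effect of the SearchRegularExpression switch can be tried outside Office on an ordinary string. The following is only a sketch: Python's re module stands in for Office's own regular expression engine, and find_first() and its regular_expression parameter are hypothetical names chosen for this illustration, not part of the Office API.

```python
import re
from typing import Optional, Tuple

text = "The colour of the sky; the color of the sea; colou?r as typed."


def find_first(
    text: str, search: str, regular_expression: bool = False
) -> Optional[Tuple[int, str]]:
    """Return (start offset, matched text) for the first hit, or None."""
    # Without the regular expression switch the search string is literal,
    # so re.escape() neutralizes characters such as '?'.
    pattern = search if regular_expression else re.escape(search)
    match = re.search(pattern, text)
    return None if match is None else (match.start(), match.group())


# Literal search: only the exact characters "colou?r" can match.
print(find_first(text, "colou?r"))
# Regular expression search: the 'u' is optional, so "colour" matches first.
print(find_first(text, "colou?r", regular_expression=True))
```

With the switch off, the question mark is just another character to look for; with it on, the same string matches both spellings.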

Once a search descriptor has been created (i.e. its string is set and any properties configured), then one of the findXXX() methods in XSearchable can be called.

For instance, XSearchable.findFirst() returns the text range of the first matching element (or None), as in:

srch = searchable.findFirst(srch_desc)

if srch is not None:
    match_tr = Lo.qi(XTextRange, srch)

The example programs, Text Replace and Italics Styler, demonstrate search and replacement. Text Replace uses XSearchable to find the first occurrence of a regular expression and XReplaceable to replace multiple occurrences of other words.

Italics Styler calls XSearchable’s findAll() to find every occurrence of a phrase.

9.1 Finding the First Matching Phrase

Text Replace repeatedly calls XSearchable.findFirst() with regular expressions taken from a tuple. The first matching phrase for each expression is reported. For instance, the call:

words = ("(G|g)rit", "colou?r",)
find_words(doc, words)

prints the following when bigStory.doc is searched:

Searching for first occurrence of '(G|g)rit'
  - found 'Grit'
    - on page 1
    - at char position: 8
Searching for first occurrence of 'colou?r'
  - found 'colour'
    - on page 5
    - at char position: 12

Three pieces of information are printed for each match: the text that matched, its page location, and its character position calculated from the start of the document. The character position could be obtained from a text cursor or a text view cursor, but a page cursor is needed to access the page number. Therefore, the easiest approach is to use a text view cursor and a linked page cursor.

The code for find_words():

def find_words(doc: WriteDoc, words: Sequence[str]) -> None:
    # get the view cursor and link the page cursor to it
    tvc = doc.get_view_cursor()
    tvc.goto_start()
    searchable = doc.qi(XSearchable, True)
    search_desc = searchable.createSearchDescriptor()

    for word in words:
        print(f"Searching for first occurrence of '{word}'")
        search_desc.setSearchString(word)

        search_props = Lo.qi(XPropertySet, search_desc, raise_err=True)
        search_props.setPropertyValue("SearchRegularExpression", True)

        search = searchable.findFirst(search_desc)

        if search is not None:
            match_tr = Lo.qi(XTextRange, search)

            tvc.goto_range(match_tr)
            print(f"  - found '{match_tr.getString()}'")
            print(f"    - on page {tvc.get_page()}")
            # tvc.gotoStart(True)
            tvc.go_right(len(match_tr.getString()), True)
            print(f"    - at char position: {len(tvc.get_string())}")
            Lo.delay(500)

find_words() gets the text view cursor (tvc) from WriteDoc.get_view_cursor():

tvc = doc.get_view_cursor()

It then moves the cursor to the start of the document. The view cursor object also supplies the page number (tvc.get_page()), so no separate page cursor has to be created and linked to it.

There is only one view cursor in an application, so when the text view cursor moves, so does the page cursor, and vice versa.

The XSearchable and XSearchDescriptor interfaces are instantiated, and a for-loop searches for each word in the supplied sequence. If XSearchable.findFirst() returns a matching text range, it’s passed to the cursor’s goto_range() method to update the position of the cursor.

After the page number has been printed, the cursor is moved to the right by the length of the current match string. The True argument extends the selection as the cursor moves:

tvc.go_right(len(match_tr.getString()), True)
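The idea of reporting the first match for each expression, together with its character offset from the start of the text, can be sketched on a plain string. This is an illustration only: first_matches() is a hypothetical helper, Python's re module stands in for the Office search engine, and no page number is available because a string has no layout.

```python
import re
from typing import List, Optional, Sequence, Tuple


def first_matches(
    text: str, patterns: Sequence[str]
) -> List[Tuple[str, Optional[str], Optional[int]]]:
    """For each pattern, return (pattern, matched text, start offset)."""
    results = []
    for pattern in patterns:
        match = re.search(pattern, text)
        if match is None:
            results.append((pattern, None, None))
        else:
            # match.start() counts the characters that precede the match
            results.append((pattern, match.group(), match.start()))
    return results


sample = "True Grit is a film. Its colour palette is muted."
for pattern, found, pos in first_matches(sample, ("(G|g)rit", "colou?r", "metre")):
    print(f"Searching for first occurrence of '{pattern}'")
    if found is None:
        print("  - not found")
    else:
        print(f"  - found '{found}'")
        print(f"    - at char position: {pos}")
```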

9.2 Replacing all the Matching Words

Text Replace also contains a method called replace_words(), which takes two string sequences as arguments:

uk_words = ("colour", "neighbour", "centre", "behaviour", "metre", "through")
us_words = ("color", "neighbor", "center", "behavior", "meter", "thru")

replace_words() cycles through the sequences, replacing all occurrences of the words in the first sequence (e.g. uk_words) with the corresponding words in the second sequence (e.g. us_words). For instance, every occurrence of colour is replaced by color. The program reports the number of changes made for each pair:

Change all occurrences of ...

  colour -> color
    - no. of changes: 1
  neighbour -> neighbor
    - no. of changes: 2
  centre -> center
    - no. of changes: 2
  behaviour -> behavior
    - no. of changes: 0
  metre -> meter
    - no. of changes: 0
  through -> thru
    - no. of changes: 4

Since replace_words() doesn’t report page and character positions, its code is somewhat shorter than find_words():

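def replace_words(
    doc: WriteDoc, old_words: Sequence[str], new_words: Sequence[str]
) -> int:
    replaceable = doc.qi(XReplaceable, True)
    # a replace descriptor holds both the search and the replacement string
    replace_desc = Lo.qi(XReplaceDescriptor, replaceable.createReplaceDescriptor())

    total = 0
    print("Change all occurrences of ...")
    for old, new in zip(old_words, new_words):
        replace_desc.setSearchString(old)
        replace_desc.setReplaceString(new)
        # replaceAll() runs once per pair and returns the number of changes
        count = replaceable.replaceAll(replace_desc)
        print(f"  {old} -> {new}")
        print(f"    - no. of changes: {count}")
        total += count
    return total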

The XReplaceable and XReplaceDescriptor interfaces are created in a similar way to their search versions. The replace descriptor has two set methods, one for the search string, the other for the replacement string.
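The replace-and-count pattern can be tried on an ordinary string before running it against a document. The sketch below is an assumption-laden stand-in: replace_words_in_text() is a hypothetical helper, and re.subn() plays the role of replaceAll() by returning both the new text and the number of substitutions.

```python
import re
from typing import Sequence, Tuple


def replace_words_in_text(
    text: str, old_words: Sequence[str], new_words: Sequence[str]
) -> Tuple[str, int]:
    """Replace each old word with its partner; return (new text, total changes)."""
    total = 0
    print("Change all occurrences of ...")
    for old, new in zip(old_words, new_words):
        # re.escape() makes the search literal; the lambda keeps the
        # replacement literal too (no backslash interpretation).
        text, count = re.subn(re.escape(old), lambda _m, new=new: new, text)
        print(f"  {old} -> {new}")
        print(f"    - no. of changes: {count}")
        total += count
    return text, total


uk_words = ("colour", "neighbour", "centre", "behaviour", "metre", "through")
us_words = ("color", "neighbor", "center", "behavior", "meter", "thru")
sample = "The colour of my neighbour's centre is the colour of the town centre."

new_text, changes = replace_words_in_text(sample, uk_words, us_words)
print(new_text)
print(f"Total changes: {changes}")
```

The replacement is carried out once per word pair, which is what makes a per-pair count possible.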

9.3 Finding all Matching Phrases

The Italics Styler example is run with a file name and one or more pairs of a word and a color:

python start.py --show --file "cicero_dummy.odt" --word pleasure green --word pain red

The program opens the file and uses XSearchable’s findAll() method to find all occurrences of each word in the document. The matching strings are italicized and colored, and the changed document is saved as “italicized.doc”. These changes are not performed using XReplaceable methods.

Fig. 71 shows a fragment of the resulting document, with the words “pleasure” and “pain” changed in the text. The search ignores case.
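A command line with repeated word and color pairs can be handled with Python's standard argparse module. The sketch below is one possible arrangement and is not necessarily how the example's start.py is written; parse_args() is a hypothetical helper, and the reading of --show as "make the Office window visible" is an assumption.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Italicize and color words")
    # Assumed meaning: show the Office window while the program runs
    parser.add_argument("--show", action="store_true")
    parser.add_argument("--file", required=True, help="document to open")
    # Each --word takes exactly two values; repeating it appends another pair
    parser.add_argument(
        "--word", nargs=2, action="append", metavar=("WORD", "COLOR")
    )
    args = parser.parse_args(argv)
    args.word = args.word or []  # no --word given: use an empty list
    return args


args = parse_args(
    ["--show", "--file", "cicero_dummy.odt",
     "--word", "pleasure", "green", "--word", "pain", "red"]
)
for word, color in args.word:
    print(f"{word!r} will be styled in {color}")
```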

Fig. 71: A Fragment of The Italicized Document.

The Italics Styler example also outputs matching details (partial output):

No. of matches: 17
  - found: 'pleasure'
    - on page 1
    - starting at char position: 85
  - found: 'pleasure'
    - on page 1
    - starting at char position: 319
  - found: 'pleasure'
    - on page 1
    - starting at char position: 350
  - found: 'pleasure'
    - on page 1
    - starting at char position: 408
  :
Found 17 results for "pleasure"
Searching for all occurrences of 'pain'
No. of matches: 15
  - found: 'pain'
    - on page 1
    - starting at char position: 107
  - found: 'pain'
    - on page 1
    - starting at char position: 548
  - found: 'pain'
    - on page 1
    - starting at char position: 578
  - found: 'pain'
    - on page 1
    - starting at char position: 647
    :
Found 15 results for "pain"

As with Text Replace, the printed details include the page and character positions of the matches.

The searching in Italics Styler is performed by italicize_all(), which bears a close resemblance to find_words():

def italicize_all(doc: WriteDoc, phrase: str, color: Color) -> int:
    # cursor = Write.get_view_cursor(doc) # can be used when visible
    cursor = doc.get_cursor()
    cursor.goto_start()
    page_cursor = doc.get_view_cursor()
    result = 0
    try:
        searchable = doc.qi(XSearchable, True)
        search_desc = searchable.createSearchDescriptor()
        print(f"Searching for all occurrences of '{phrase}'")
        phrase_len = len(phrase)
        search_desc.setSearchString(phrase)
        # If SearchWords==True, only complete words will be found.
        Props.set(search_desc, SearchCaseSensitive=False, SearchWords=True)

        matches = searchable.findAll(search_desc)
        result = matches.getCount()

        print(f"No. of matches: {result}")

        font_effect = Font(i=True, color=color)

        for i in range(result):
            match_tr = Lo.qi(XTextRange, matches.getByIndex(i))
            if match_tr is not None:
                cursor.goto_range(match_tr, False)
                print(f"  - found: '{match_tr.getString()}'")
                print(f"    - on page {page_cursor.get_page()}")
                cursor.goto_start(True)
                print(
                    f"    - starting at char position: {len(cursor.get_string()) - phrase_len}"
                )

                font_effect.apply(match_tr)

    except Exception:
        raise
    return result

After the search descriptor string has been defined, the SearchCaseSensitive property in SearchDescriptor is set to False and SearchWords is set to True:

search_desc.setSearchString(phrase)
Props.set(search_desc, SearchCaseSensitive=False, SearchWords=True)

Switching off case sensitivity allows the search to match text that mixes upper and lower case letters, such as “Pleasure”, while SearchWords restricts matches to complete words. Many other search variants, such as the use of search similarity parameters, are described in the SearchDescriptor documentation (lodoc SearchDescriptor service).
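The combined effect of a case-insensitive, whole-word search can be approximated on a plain string. This is only a sketch: find_all() and its parameters are hypothetical, and the regular expression \b boundary is an approximation that may not agree with Office's own notion of a complete word in every case.

```python
import re
from typing import List, Tuple


def find_all(
    text: str, phrase: str, case_sensitive: bool = False, whole_words: bool = True
) -> List[Tuple[int, str]]:
    """Return (start offset, matched text) for every occurrence of phrase."""
    pattern = re.escape(phrase)  # the phrase itself is matched literally
    if whole_words:
        # \b requires a word boundary on each side of the phrase
        pattern = rf"\b{pattern}\b"
    flags = 0 if case_sensitive else re.IGNORECASE
    return [(m.start(), m.group()) for m in re.finditer(pattern, text, flags)]


text = "Pleasure and pain: no pleasure without displeasure."
matches = find_all(text, "pleasure")
print(f"No. of matches: {len(matches)}")
for pos, found in matches:
    print(f"  - found: '{found}'")
    print(f"    - starting at char position: {pos}")
```

With whole_words switched off, the "pleasure" inside "displeasure" is reported as well.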

XSearchable.findAll() returns an XIndexAccess collection, which is examined element-by-element inside a for-loop. The text range for each element is obtained by applying Lo.qi():

match_tr = Lo.qi(XTextRange, matches.getByIndex(i))

The matching page and character position are reported in much the same way as in find_words() in Text Replace, except that a text cursor tracks the match while the view cursor (page_cursor) supplies the page number.

XTextRange is part of the TextRange service, which inherits ParagraphProperties and CharacterProperties. These properties are changed to adjust the character color and style of the selected range:

font_effect.apply(match_tr)

This sets the CharColor property to the specified color and the CharPosture property to italic.

The color passed on the command line can be an integer color such as 16711680 or any color name (case-insensitive) in CommonColor.
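An integer color packs the red, green and blue components into a single number, so 16711680 is 0xFF0000, which is pure red. The sketch below shows one way a command line color argument could be interpreted. It is an illustration only: parse_color(), to_rgb() and the small NAMED_COLORS table are hypothetical stand-ins, and the table is not the real contents of CommonColor.

```python
from typing import Tuple

# Hypothetical stand-in for a named color table (illustrative values only)
NAMED_COLORS = {
    "BLACK": 0x000000,
    "RED": 0xFF0000,
    "BLUE": 0x0000FF,
    "WHITE": 0xFFFFFF,
}


def parse_color(arg: str) -> int:
    """Accept an integer color or a case-insensitive color name."""
    try:
        return int(arg)
    except ValueError:
        pass  # not a number, so try it as a name
    try:
        return NAMED_COLORS[arg.upper()]
    except KeyError:
        raise ValueError(f"unknown color: {arg!r}") from None


def to_rgb(color: int) -> Tuple[int, int, int]:
    """Split a packed color into its (red, green, blue) components."""
    return (color >> 16) & 0xFF, (color >> 8) & 0xFF, color & 0xFF


for arg in ("16711680", "red", "Blue"):
    color = parse_color(arg)
    print(f"{arg} -> {color} -> RGB {to_rgb(color)}")
```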