Chapter 3. Examining

This chapter looks at ways to examine the state of the Office application and a document. A document will be examined in three different ways: the first retrieves properties about the file, such as its author, keywords, and when it was last modified. The second and third approaches extract API details, such as what services and interfaces it uses. This can be done by calling functions in OooDev Utility classes or by utilizing the Development Tools built into Office. See Fig. 24.

3.1 Examining Office

It’s sometimes necessary to examine the state of the Office application, for example to determine its version number or installation directory. There are two main ways of finding this information, using configuration properties and path settings.

3.1.1 Examining Configuration Properties

Configuration management is a complex area, which is explained reasonably well in Chapter 15 of the developer’s guide and online at OpenOffice Configuration Management; only the basics are explained here. The easiest way of accessing the relevant online section is by typing: loguide "Configuration Management".

Office stores a large assortment of XML configuration data as .xcd files in the \share\registry directory. They can be programmatically accessed in three steps: first, a ConfigurationProvider service is created, which represents the configuration database tree. Next, the tree is examined with a ConfigurationAccess service, which is supplied with the path to the node of interest. Finally, configuration properties are accessed by name with the XNameAccess interface.

These steps are hidden inside Info.get_config() which requires at most two arguments – the path to the required node, and the name of the property inside that node.

The two most useful paths seem to be /org.openoffice.Setup/Product and /org.openoffice.Setup/L10N, which are hardwired as constants in the Info class. The simplest version of get_config() looks along both paths by default, so the programmer only has to supply a property name when calling the method.
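The lookup described above can be sketched without a running Office. This is only an illustration of the idea, not the code inside Info.get_config(): FakeProvider, FakeAccess and NodeArg are hypothetical stand-ins for the ConfigurationProvider service, a ConfigurationAccess node, and the UNO PropertyValue that carries the node path, and returning None for a missing property is an assumption made for the sketch.

```python
from typing import Any, Dict, NamedTuple, Optional, Sequence

# Node paths named in the text.
NODE_PRODUCT = "/org.openoffice.Setup/Product"
NODE_L10N = "/org.openoffice.Setup/L10N"


class NodeArg(NamedTuple):
    """Hypothetical stand-in for the UNO PropertyValue holding the node path."""
    Name: str
    Value: str


class FakeAccess:
    """Hypothetical stand-in for a ConfigurationAccess node (XNameAccess subset)."""

    def __init__(self, props: Dict[str, Any]) -> None:
        self._props = props

    def hasByName(self, name: str) -> bool:
        return name in self._props

    def getByName(self, name: str) -> Any:
        return self._props[name]


class FakeProvider:
    """Hypothetical stand-in for a ConfigurationProvider with a tiny config tree."""

    def __init__(self, tree: Dict[str, Dict[str, Any]]) -> None:
        self._tree = tree

    def createInstanceWithArguments(self, service: str, args: Sequence[NodeArg]) -> FakeAccess:
        # Step 2: open the node whose path was passed as the "nodepath" argument.
        return FakeAccess(self._tree[args[0].Value])


def get_config(
    provider: Any,
    prop_name: str,
    node_paths: Sequence[str] = (NODE_PRODUCT, NODE_L10N),
) -> Optional[Any]:
    """Return the first value found for prop_name, trying each node path in order."""
    for path in node_paths:
        access = provider.createInstanceWithArguments(
            "com.sun.star.configuration.ConfigurationAccess",
            (NodeArg("nodepath", path),),
        )
        # Step 3: look the property up by name.
        if access.hasByName(prop_name):
            return access.getByName(prop_name)
    return None  # assumption: the sketch reports a miss as None


# Step 1 (faked): a provider representing the configuration tree.
provider = FakeProvider({
    NODE_PRODUCT: {"ooName": "LibreOffice", "ooSetupVersion": "7.3"},
    NODE_L10N: {"ooLocale": "en-US"},
})
print(get_config(provider, "ooName"))    # found under the Product node
print(get_config(provider, "ooLocale"))  # found under the L10N node
```

Trying the node paths in a fixed order is what lets a caller pass only the property name.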

Many other property names, which don’t seem that useful, are documented with the Info class. One way of finding the most current list is to browse main.xcd in \share\registry.

Calls to get_config() that supply only a property name are illustrated in the Office Info Demo example:

# in demo
import platform

from ooodev.utils.info import Info
from ooodev.utils.lo import Lo

with Lo.Loader(Lo.ConnectSocket(headless=True)) as loader:
    print(f"OS Platform: {platform.platform()}")
    print(f"OS Version: {platform.version()}")
    print(f"OS Release: {platform.release()}")
    print(f"OS Architecture: {platform.architecture()}")

    print(f"\nOffice Name: {Info.get_config('ooName')}")
    print(f"\nOffice version (long): {Info.get_config('ooSetupVersionAboutBox')}")
    print(f"Office version (short): {Info.get_config('ooSetupVersion')}")
    print(f"\nOffice language location: {Info.get_config('ooLocale')}")
    print(f"System language location: {Info.get_config('ooSetupSystemLocale')}")

    print(f"\nWorking Dir: {Info.get_paths('Work')}")
    print(f"\nOffice Dir: {Info.get_office_dir()}")
    print(f"\nAddin Dir: {Info.get_paths('Addin')}")
    print(f"\nFilters Dir: {Info.get_paths('Filter')}")
    print(f"\nTemplates Dirs: {Info.get_paths('Template')}")
    print(f"\nGallery Dir: {Info.get_paths('Gallery')}")

Example output:

OS Platform: Linux-5.15.0-41-generic-x86_64-with-debian-bookworm-sid
OS Version: #44-Ubuntu SMP Wed Jun 22 14:20:53 UTC 2022
OS Release: 5.15.0-41-generic
OS Architecture: ('64bit', 'ELF')

Office Name: LibreOffice

Office version (long): 7.3.4.2
Office version (short): 7.3

Office language location: en-US
System language location:

Working Dir: file:///home/user/Documents

Office Dir: /usr/lib/libreoffice

Addin Dir: file:///usr/lib/libreoffice/program/addin

Filters Dir: file:///usr/lib/libreoffice/program/filter
...

3.1.2 Examining Path Settings

Path settings store directory locations for parts of the Office installation, such as the whereabouts of the gallery and spellchecker files. A partial list of predefined paths is accessible from within LibreOffice, via the Tools menu: Tools, Options, LibreOffice, Paths. But the best source of information is the developer’s guide, in the “Path Organization” section of chapter 6, or at OpenOffice Path Organization, which can be accessed using: loguide "Path Organization"

One issue is that path settings come in two forms: a string holding a single directory path, or a string made up of several paths separated by semicolons (;). Additionally, the directories are returned in URI format (i.e. they start with file:///).
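Both forms can be handled with the standard library alone. The sketch below is not part of OooDev: split_path_setting() and uri_to_posix_path() are hypothetical helper names, and the second sample value is made up to show a multi-path setting. It assumes POSIX-style paths; Windows file URIs with drive letters would need extra handling.

```python
from typing import List
from urllib.parse import unquote, urlparse


def split_path_setting(value: str) -> List[str]:
    """Split a path setting into its individual file URIs (one or many)."""
    return [part for part in value.split(";") if part]


def uri_to_posix_path(uri: str) -> str:
    """Turn a file:/// URI into a plain POSIX-style path string."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError(f"not a file URI: {uri}")
    # unquote() decodes escapes such as %20 back into spaces.
    return unquote(parsed.path)


# One single-path value and one (made-up) multi-path value.
single = "file:///usr/lib/libreoffice/program/addin"
multiple = "file:///usr/lib/libreoffice/share/template;file:///home/user/My%20Templates"

for setting in (single, multiple):
    print([uri_to_posix_path(uri) for uri in split_path_setting(setting)])
```

Splitting first and converting second means the same code path serves both forms of setting.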

Info.get_paths() hides the creation of a PathSettings service, and the accessing of its properties.

Probably the most common Office forum question about paths is how to determine Office’s installation directory. Unfortunately, that isn’t one of the paths stored in the PathSettings service, but the information is accessible via one of the other paths. It’s possible to retrieve the path for Add-ins (which is \program\addin), and move up the directory hierarchy two levels. This trick is implemented by Info.get_office_dir().
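The "move up two levels" trick can be sketched in a few lines of standard-library Python. This is an illustration of the idea rather than the body of Info.get_office_dir(); office_dir_from_addin() is a hypothetical name, and the sketch assumes a POSIX-style file URI like the one in the example output above.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def office_dir_from_addin(addin_uri: str) -> str:
    """Derive the installation directory from the Add-in path URI."""
    addin_path = PurePosixPath(unquote(urlparse(addin_uri).path))
    # .../program/addin: parents[0] is .../program, parents[1] is the install root.
    return str(addin_path.parents[1])


print(office_dir_from_addin("file:///usr/lib/libreoffice/program/addin"))
# prints: /usr/lib/libreoffice
```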

Examples of using Info.get_office_dir() and Info.get_paths() appear in Office Info Demo:

print(f"\nOffice Dir: {Info.get_office_dir()}")
print(f"\nAddin Dir: {Info.get_paths('Addin')}")
print(f"\nFilters Dir: {Info.get_paths('Filter')}")
print(f"\nTemplates Dirs: {Info.get_paths('Template')}")
print(f"\nGallery Dir: {Info.get_paths('Gallery')}")

3.2 Getting and Setting Document Properties

Document properties are the information that’s displayed when you right-click on a file icon and select “Properties” from the menu, as in Fig. 19.

Fig. 19: A Properties Dialog in Windows 10 for algs.odp.

If you select the “Details” tab, a list of properties appears like those in Fig. 20.

Fig. 20: Details Properties List for algs.odp.

An issue with document properties is that the Office API for manipulating them has changed. The old interfaces were XDocumentInfoSupplier and XDocumentInfo, but these have been deprecated, and replaced by XDocumentPropertiesSupplier and XDocumentProperties. This wouldn’t really matter except that while OpenOffice retains those deprecated interfaces, LibreOffice has removed them.

3.2.1 Reporting OS File Properties

The Doc Properties example prints the document properties by calling: Info.print_doc_properties(doc).

print_doc_properties() converts the document to an XDocumentPropertiesSupplier interface, and extracts the XDocumentProperties object:

@classmethod
def print_doc_properties(cls, doc: object) -> None:
    try:
        doc_props_supp = mLo.Lo.qi(XDocumentPropertiesSupplier, doc, True)
        dps = doc_props_supp.getDocumentProperties()
        cls.print_doc_props(dps=dps)
        ud_props = dps.getUserDefinedProperties()
        mProps.Props.show_obj_props("UserDefined Info", ud_props)
    except Exception as e:
        mLo.Lo.print("Unable to get doc properties")
        mLo.Lo.print(f"    {e}")
    return

Although the XDocumentProperties interface belongs to a DocumentProperties service, that service does not contain any properties. Instead its data is stored inside XDocumentProperties as attributes, which the underlying API exposes through get/set methods and which Python reads and assigns directly by name. For example, the author is obtained by reading XDocumentProperties.Author.

As a consequence, print_doc_props() consists of a long list of attribute reads inside print statements:

print("Document Properties Info")
print("  Author: " + dps.Author)
print("  Title: " + dps.Title)
print("  Subject: " + dps.Subject)
print("  Description: " + dps.Description)
print("  Generator: " + dps.Generator)

keys = dps.Keywords
print("  Keywords: ")
for keyword in keys:
    print(f"  {keyword}")

print("  Modified by: " + dps.ModifiedBy)
print("  Printed by: " + dps.PrintedBy)
print("  Template Name: " + dps.TemplateName)
print("  Template URL: " + dps.TemplateURL)
print("  Autoload URL: " + dps.AutoloadURL)
print("  Default Target: " + dps.DefaultTarget)
# and more ...
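Because every value is read by attribute name, the same report could also be produced with a loop over the names. The sketch below is an alternative formulation, not the code used by print_doc_props(): format_doc_props() is a hypothetical helper, and a SimpleNamespace with made-up values stands in for a real XDocumentProperties object.

```python
from types import SimpleNamespace
from typing import Any, List, Sequence

# A few of the string-valued attributes printed by the listing above.
TEXT_ATTRS = ("Author", "Title", "Subject", "Description", "Generator")


def format_doc_props(dps: Any, names: Sequence[str] = TEXT_ATTRS) -> List[str]:
    """Build the report lines by reading each attribute by name."""
    lines = ["Document Properties Info"]
    for name in names:
        lines.append(f"  {name}: {getattr(dps, name)}")
    # Keywords is a sequence of strings, so it is joined rather than concatenated.
    lines.append("  Keywords: " + ", ".join(dps.Keywords))
    return lines


# Stand-in object with made-up values; a real call would pass the
# XDocumentProperties object obtained from the document.
dps = SimpleNamespace(
    Author="Amour Spirit",
    Title="Examples",
    Subject="Example",
    Description="",
    Generator="LibreOffice/7.3",
    Keywords=("algorithms", "slides"),
)
print("\n".join(format_doc_props(dps)))
```

Returning the lines instead of printing them directly keeps the helper easy to check and to redirect to a log.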

However, user-defined file properties are accessed with an XPropertyContainer, as can be seen back in print_doc_properties().

3.2.2 Setting Document Properties

Document properties are set by assigning to the same attributes, as in Info.set_doc_props(), which sets the file’s subject, title, and author properties:

@staticmethod
def set_doc_props(doc: object, subject: str, title: str, author: str) -> None:
    try:
        dp_supplier = mLo.Lo.qi(XDocumentPropertiesSupplier, doc, True)
        doc_props = dp_supplier.getDocumentProperties()
        doc_props.Subject = subject
        doc_props.Title = title
        doc_props.Author = author
    except Exception as e:
        raise mEx.PropertiesError("Unable to set doc properties") from e

This method is called at the end of Doc Properties:

Info.set_doc_props(doc, "Example", "Examples", "Amour Spirit")

After the properties are changed, the document must be saved; otherwise the changes will be lost when the document is closed.

The changed properties appear in the “Document Statistics” list shown in Fig. 21.

Fig. 21: “Document Statistics” Properties List for algs.odp.

3.3 Examining a Document for API Information

After programming with the Office API for a while, you may start to notice that two coding questions keep coming up. They are:

  1. For the service I’m using at the moment, what are its properties?

  2. When I need to do something to a document (e.g. close an XComponent instance), which interface should I cast XComponent to by calling Lo.qi()?

The first question arose in Chapter 2, when properties had to be set in loadComponentFromURL() and storeToURL(). Unfortunately the LibreOffice documentation for OfficeDocument doesn’t list all the properties associated with the service. Have a look for yourself by typing lodoc OfficeDocument service, which unfortunately takes you to its IDL page rather than to the documentation itself. You’ll then need to click on the OfficeDocument link in the “Classes” section to reach the documentation. OfficeDocument’s “Public Attributes” section only lists three properties. There is an OfficeDocument Member List, which is a little more helpful but can be challenging to decipher.

The second problem is also only partly addressed by the LibreOffice documentation. The pages helpfully include inheritance tree diagrams that can be clicked on to jump to the documentation about other services and interfaces. But the diagrams don’t make a distinction between “contains” relationships (for interfaces in a service) and the two kinds of inheritance (for services and for interfaces).

These complaints have appeared frequently in the Office forums. Two approaches for easing matters are often suggested. One is to write code to print out details about a loaded document, which is my approach in the next subsection. A second technique is to install an Office extension for browsing a document’s structure. Since LibreOffice 7.2 there is also Development Tools. Section 3.3.2, Examining a Document Using Development Tools, looks at these options.

3.3.1 Printing Programming Details about a Document

The messy job of collecting service, interface, property, and method information about a document is hidden inside the Info and Props utility classes. The five main methods for retrieving details can be understood by considering their position in the Service and Interface Relationship diagram in Fig. 22.

Fig. 22: Methods to Investigate the Service and Interface Relationships and Hierarchies.

The methods are shown in action in the Doc Info example, which loads a document and prints information about its services, interfaces, methods, and properties. The relevant code fragment:

with BreakContext(Lo.Loader(Lo.ConnectSocket(headless=True))) as loader:
    fnm = args.fnm_doc
    doc_type = Info.get_doc_type(fnm=fnm)
    print(f"Doc type: {doc_type}")
    Props.show_doc_type_props(doc_type)

    try:
        doc = Lo.open_doc(fnm=fnm, loader=loader)
    except Exception:
        print(f"Could not open '{fnm}'")
        raise BreakContext.Break

    if args.service is True:
        print()
        print(" Services for this document: ".center(80, "-"))
        for service in Info.get_services(doc):
            print(f"  {service}")
        print()
        print(f"{Lo.Service.WRITER} is supported: {Info.is_doc_type(doc, Lo.Service.WRITER)}")
        print()

        print("  Available Services for this document: ".center(80, "-"))
        # start=1 so that i ends up as the number of items, not the last index
        for i, service in enumerate(Info.get_available_services(doc), start=1):
            print(f"  {service}")
        print(f"No. available services: {i}")

    if args.interface is True:
        print()
        print(" Interfaces for this document: ".center(80, "-"))
        for i, intfs in enumerate(Info.get_interfaces(doc), start=1):
            print(f"  {intfs}")
        print(f"No. interfaces: {i}")

    if args.xdoc is True:
        print()
        print(f" Method for interface: com.sun.star.text.XTextDocument ".center(80, "-"))

        for i, meth in enumerate(Info.get_methods("com.sun.star.text.XTextDocument"), start=1):
            print(f"  {meth}()")
        print(f"No. methods: {i}")

    if args.property is True:
        print()
        print(" Properties for this document: ".center(80, "-"))
        for i, prop in enumerate(Props.get_properties(doc), start=1):
            print(f"  {Props.show_property(prop)}")
        print(f"No. properties: {i}")

    if args.doc_meth is True:
        print()
        print(f" Method for entire document ".center(80, "-"))

        for i, meth in enumerate(Info.get_methods_obj(doc), start=1):
            print(f"  {meth}()")
        print(f"No. methods: {i}")

    print()

    prop_name = "CharacterCount"
    print(f"Value of {prop_name}: {Props.get_property(doc, prop_name)}")

    Lo.close_doc(doc)

When a Word file is examined by this program, only three services are found: OfficeDocument, GenericTextDocument, and TextDocument, which correspond to the text document part of the hierarchy in Chapter 1, Fig. 9. That doesn’t seem so bad until you look at the output from the other Info and Props methods: the document can call 206 other available services and 69 interfaces, and manipulate 40 properties.

In the code above only the methods available to XTextDocument are printed:

for i, meth in enumerate(Info.get_methods("com.sun.star.text.XTextDocument"), start=1):
    print(f"  {meth}()")
print(f"No. methods: {i}")

Nineteen methods are listed, collectively inherited from the interfaces in XTextDocument’s inheritance hierarchy shown in Fig. 23.

Fig. 23: Inheritance Hierarchy for XTextDocument.

A similar diagram appears on the XTextDocument documentation webpage, but is complicated by also including the inheritance hierarchy for the TextDocument service. Note, the interface hierarchy is also textually represented in the “List all members” section of the documentation.

The code fragment also prints all the document’s property names and types by calling Props.show_property(). If you only want to know about one specific property then use Props.get_property(), which requires a reference to the document and the property name:

prop_name = "CharacterCount"
print(f"Value of {prop_name}: {Props.get_property(doc, prop_name)}")

File Types

Another group of utility methods lets a programmer investigate a file’s document type. Info.get_doc_type() gets the document type from the file path, and Props.show_doc_type_props() shows the doc type information.

with BreakContext(Lo.Loader(Lo.ConnectSocket(headless=True))) as loader:
    fnm = args.fnm_doc
    doc_type = Info.get_doc_type(fnm=fnm)
    print(f"Doc type: {doc_type}")
    Props.show_doc_type_props(doc_type)

Example output:

Doc type: writer8
Properties for 'writer8':
ClipboardFormat: Writer 8
DetectService: com.sun.star.comp.filters.StorageFilterDetect
Extensions: odt
Finalized: False
Mandatory: False
MediaType: application/vnd.oasis.opendocument.text
Name: writer8
Preferred: True
PreferredFilter: writer8
UIName: Writer 8
UINames: [
    en-US = Writer 8
]
URLPattern: private:factory/swriter

3.3.2 Examining a Document Using Development Tools

It’s hardly surprising that Office developers have wanted to make the investigation of services, interfaces, and properties associated with documents and other objects easier. There are several extensions that do this, such as MRI - UNO Object Inspection Tool and APSO - Alternative Script Organizer for Python.

Since LibreOffice 7.2 we have the advantage of using Development Tools, which inspects objects in LibreOffice documents and shows supported UNO services, as well as available methods, properties, and implemented interfaces. This feature, seen in Fig. 24, also allows you to explore the document structure using the Document Object Model (DOM).

Fig. 24: LibreOffice Development Tools.