Chapter 24. Complex Data Manipulation

This chapter looks at a variety of less common text manipulation techniques, including the sorting of data, generating data based on examples, and the use of borders, headlines, hyperlinks, and annotations in cells.

24.1 Sorting Data

Sorting is available through SheetCellRange’s XSortable interface. There are four basic steps required for sorting a block of cells:

  1. Obtain an XSortable interface for the cell range;

  2. Specify the sorting criteria as a TableSortField sequence;

  3. Create a sort descriptor;

  4. Execute the sort.

These steps are illustrated by the data_sort.py example, which begins by building a small table:

# in data_sort.py
def main(self) -> None:
    loader = Lo.load_office(Lo.ConnectSocket())

    try:
        doc = CalcDoc(Calc.create_doc(loader))

        doc.set_visible()

        sheet = doc.get_sheet(0)

        # create the table that needs sorting
        vals = (
            ("Level", "Code", "No.", "Team", "Name"),
            ("BS", 20, 4, "B", "Elle"),
            ("BS", 20, 6, "C", "Sweet"),
            ("BS", 20, 2, "A", "Chcomic"),
            ("CS", 30, 5, "A", "Ally"),
            ("MS", 10, 1, "A", "Joker"),
            ("MS", 10, 3, "B", "Kevin"),
            ("CS", 30, 7, "C", "Tom"),
        )
        sheet.set_array(values=vals, name="A1:E8")  # or just "A1"

        # 1. obtain an XSortable interface for the cell range
        source_range = sheet.get_range(range_name="A1:E8")
        x_sort = source_range.qi(XSortable, True)

        # 2. specify the sorting criteria as a TableSortField array
        sort_fields = (self._make_sort_asc(1, True), self._make_sort_asc(2, True))

        # 3. define a sort descriptor
        props = Props.make_props(
            SortFields=Props.any(*sort_fields), ContainsHeader=True
        )

        Lo.wait(2_000)  # wait so user can see original before it is sorted
        # 4. do the sort
        print("Sorting...")
        x_sort.sort(props)

        if self._out_fnm:
            doc.save_doc(fnm=self._out_fnm)

        msg_result = MsgBox.msgbox(
            "Do you wish to close document?",
            "All done",
            boxtype=MessageBoxType.QUERYBOX,
            buttons=MessageBoxButtonsEnum.BUTTONS_YES_NO,
        )
        if msg_result == MessageBoxResultsEnum.YES:
            doc.close_doc()
            Lo.close_office()
        else:
            print("Keeping document open")

    except Exception:
        Lo.close_office()
        raise

The unsorted table is shown in Fig. 199.

An Unsorted Table.

Fig. 199 :An Unsorted Table.

The table is sorted so that its rows are in ascending order depending on their “Code” column values. When two rows have the same code number then the sort uses the “No.” column. Fig. 200 shows the result of applying these two sorting criteria:

The Sorted Table, Using Two Sort Criteria.

Fig. 200 :The Sorted Table, Using Two Sort Criteria.

The four sorting steps mentioned above are implemented like so:

# in data_sort.py
# ...
# 1. obtain an XSortable interface for the cell range
source_range = Calc.get_cell_range(sheet=sheet, range_name="A1:E8")
xsort = Lo.qi(XSortable, source_range, True)

# 2. specify the sorting criteria as a TableSortField array
sort_fields = (self._make_sort_asc(1, True), self._make_sort_asc(2, True))

# 3. define a sort descriptor
props = Props.make_props(SortFields=Props.any(*sort_fields), ContainsHeader=True)

Lo.wait(2_000)  # wait so user can see original before it is sorted
# 4. do the sort
print("Sorting...")
xsort.sort(props)
# ...

The A1:E8 cell range referenced using the XCellRange interface is converted to XSortable. This interface is defined in Office’s util module, not in sheet or table, probably because it’s also used in text documents for sorting tables.

The two sorting criteria are represented by two TableSortField objects in tuple. The _make_sort_asc() function is defined in data_sort.py as:

# in data_sort.py
def _make_sort_asc(self, index: int, is_ascending: bool) -> TableSortField:
    return TableSortField(Field=index, IsAscending=is_ascending, IsCaseSensitive=False)

Note

Because TableSortField is imported with ooouno library (from ooo.dyn.table.table_sort_field import TableSortField) TableSortField can be created using Keyword arguments. This a feature added by ooouno library for uno structs. Normally uno objects only take positional only arguments.

A sort descriptor is an array of PropertyValue objects which affect how XSortable.sort() executes. The most commonly used properties are SortFields and ContainsHeader. SortFields is assigned the sorting criteria (i.e. the TableSortField tuple), and the ContainsHeader boolean specifies whether the sort should exclude the first row because it contains header text.

The sort descriptor properties are defined in a number of classes (SortDescriptor2, TableSortDescriptor2, and TextSortDescriptor2), which are most easily accessed from the XSortable documentation page.

24.2 Generating Data

Cell data is generated by supplying numbers to a function which treats them as the initial values in a arithmetic (or geometric) series. The function employs the series to churn out as many more numbers as are needed to fill a given cell range.

A series is created by the XCellSeries interface, which is part of the SheetCellRange service (see Fig. 201).

The Cell Range Services

Fig. 201 :The Cell Range Services.

Several examples of how to use XCellSeries' two methods, fillAuto() and fillSeries(), are contained in the filler.py example described next.

filler.py starts by filling a blank sheet with an assortment of data, which will be used by the XCellSeries methods to initialize several series. The original sheet is shown in Fig. 202.

The filler.py Sheet before Data Generation

Fig. 202 :The filler.py Sheet before Data Generation.

The simpler of the two XCellSeries methods, XCellSeries.fillAuto(), requires a cell range, fill direction, and how many cells should be examined as ‘seeds’. For example, rows 7, 8, and 9 of Fig. 202 are filled using:

# in Filler._fill_series() of filler.py
# set first two values of three rows

# ascending integers: 1, 2
sheet.set_val(cell_name="B7", value=2)
sheet.set_val(cell_name="A7", value=1)

# dates, decreasing by month
sheet.get_cell(cell_name="A8").set_date(day=28, month=2, year=2015)
sheet.get_cell(cell_name="B8").set_date(day=28, month=1, year=2015)

# descending integers: 6, 4
sheet.set_val(cell_name="A9", value=6)
sheet.set_val(cell_name="B9", value=4)

# get cell range series
series = sheet.get_range(range_name="A7:G9").get_cell_series()

# use first 2 cells for series, and fill to the right
series.fillAuto(FillDirection.TO_RIGHT, 2)

The supplied cell range (A7:G9) includes the seed values, and the cells to be filled.

It’s converted into an XCellSeries interface by Calc.get_cell_series(), which is defined as:

# in Calc class
@staticmethod
def get_cell_series(sheet: XSpreadsheet, range_name: str) -> XCellSeries:
    cell_range = sheet.getCellRangeByName(range_name)
    series = Lo.qi(XCellSeries, cell_range, True)
    return series

XCellSeries.fillAuto() can be supplied with four possible fill directions (TO_BOTTOM, TO_RIGHT, TO_TOP, and TO_LEFT) which also dictate which cells are examined for seeds. By setting the direction to be TO_RIGHT, seed cells in the left-hand parts of the rows are examined. The numerical (2) in the call to fillAuto() shown above specifies how many of those cells will be considered in order to automatically determine the series used for the generated cell values.

Fig. 203 shows the result of filling rows 7, 8, and 9.

Row Filling Using X Cell Series fill Auto method

Fig. 203 :Row Filling Using XCellSeries.fillAuto().

If XCellSeries.fillAuto() doesn’t guess the correct series for the data generation, then XCellSeries.fillSeries() offers finer control over the process. It supports five modes: SIMPLE, LINEAR , GROWTH, DATE, and AUTO.

SIMPLE switches off the series generator, and the seed data is copied unchanged to the other blank cells. AUTO makes Office generate its data series automatically, so performs in the same way as fillAuto(). LINEAR , GROWTH, and DATE give more control to the programmer.

24.2.1 Using the LINEAR Mode

Rows 2 and 3 of the spreadsheet contain the numbers 1 and 4 (see Fig. 202). By using the LINEAR mode, a step, and a stopping value, it’s possible to specify an arithmetic series. For example:

# in Filler._fill_series() of filler.py
# ...
sheet.set_val(cell_name="A2", value=1)
sheet.set_val(cell_name="A3", value=4)

# Fill 2 rows; the 2nd row is not filled completely since
# the end value is reached
series = sheet.get_range(range_name="A2:E3").get_cell_series()
series.fillSeries(FillDirection.TO_RIGHT, FillMode.LINEAR, Calc.NO_DATE, 2, 9)
                # ignore date mode; step == 2; end at 9

The Calc.NO_DATE argument means that dates are not being generated. The 2 value is the step, and 9 is the maximum. The resulting rows 2 and 3 are shown in Fig. 204.

Data Generation Using the LINEAR Mode

Fig. 204 :Data Generation Using the LINEAR Mode.

Note that the second row is incomplete since the generated values for those cells (10 and 12) exceeded the stopping value.

If no stopping value is required, then the last argument can be replaced with Calc.MAX_VALUE.

24.2.2 Using the DATE Mode

If XCellSeries.fillSeries() is called using the DATE mode then it’s possible to specify whether the day, weekday, month, or year parts of the seed date are changed by the series. For example, the seed date at the start of row 4 (20th Nov. 2015) can be incremented one month at a time with the code:

# in Filler._fill_series() of filler.py
# ...
sheet.get_cell(cell_name="A4").set_date(day=20, month=11, year=2015)

# fill by adding one month to date
series = sheet.get_range(range_name="A4:E4").get_cell_series()
series.fillSeries(
    FillDirection.TO_RIGHT,
    FillMode.DATE,
    FillDateMode.FILL_DATE_MONTH,
    1,
    Calc.MAX_VALUE,
)

The result is shown in Fig. 205.

Data Generation Using the DATE Mode

Fig. 205 :Data Generation Using the DATE Mode.

When the month is incremented past 12, it resets to 1, and the year is incremented.

24.2.3 Using the GROWTH Mode

Whereas the LINEAR mode is for creating arithmetic series (i.e. ones incrementing or decrementing in steps), GROWTH mode is for geometric progressions where the ‘step’ value is repeatedly multiplied to the seed.

In the following example, the seed in G6 (10; see Fig. 205) is used in a geometric progression using multiples of 2. The series is placed in cells going up the sheet starting from G6. The code:

# in Filler._fill_series() of filler.py
# ...
sheet.set_val(cell_name="G6", value=10)

# Fill from  bottom to top with a geometric series (*2)
series = sheet.get_range(range_name="G2:G6").get_cell_series()
series.fillSeries(
    FillDirection.TO_TOP, FillMode.GROWTH, Calc.NO_DATE, 2, Calc.MAX_VALUE
)

The resulting sheet is shown in Fig. 206.

Data Generation Using the GROWTH Mode.

Fig. 206 :Data Generation Using the GROWTH Mode.

24.3 Cells with Fancy Text

The cell_texts.py example brings together a few techniques for manipulating text in cells, namely the addition of borders, headlines, hyperlinks, and annotations. The sheet ends up looking like Fig. 207.

Text manipulation in a Sheet.

Fig. 207 :Text manipulation in a Sheet.

24.3.1 Creating a Border and Headline

cell_texts.py draws a decorative border and headline by calling:

# in cell_texts.py
Calc.highlight_range(
    sheet=sheet.component,
    range_name="A2:C7",
    headline="Cells and Cell Ranges",
)

Calc.highlight_range() adds a light blue border around the specified cell range (A2:C7), and the string argument is added to the top-left cell of the range. It’s intended to be a headline, so is drawn in dark blue, and the entire top row is made light blue to match the border. The method is implemented as:

# in Calc class (simplified)
@classmethod
def highlight_range(cls, sheet: XSpreadsheet, headline: str, range_name: str) -> XCell:
    cls.add_border(sheet=sheet, range_name=range_name, color=CommonColor.LIGHT_BLUE)
    addr = cls..get_address(sheet=sheet, range_name=range_name)
    header_range = Calc.get_cell_range(
        sheet=sheet,
        col_start=addr.StartColumn,
        row_start=addr.StartRow,
        col_end=addr.EndColumn,
        row_end=addr.StartRow
        )
    Props.set(header_range, CellBackColor=CommonColor.LIGHT_BLUE)

    # add headline text to the first cell of the row
    first_cell = cls.get_cell(cell_range=headerRange, col=0, row=0)
    cls.set_val(value=headline, cell=first_cell)

    # make text dark blue and bold
    Props.set(first_cell, CharColor=CommonColor.DARK_BLUE, CharWeight=FontWeight.BOLD)

The three-argument add_border() method calls the four-argument version which was described back in 22.2.3 Adding Borders. It passes it a bitwise composition of all the border constants:

The cell range for the top row is extracted from the larger range supplied to Calc.highlight_range(). The easiest way of doing this is to get the address of the larger range as a CellRangeAddress object, and use its row and column positions. The header cell range uses the same row index for its starting and finishing rows:

# part of Calc.highlight_range() (simplified)
addr = Calc.get_address(sheet=sheet, range_name=range_name)
header_range = Calc.getCellRange(
    sheet=sheet,
    col_start=addr.StartColumn,
    row_start=addr.StartRow,
    col_end=addr.EndColumn,
    row_end=addr.StartRow
)

Perhaps the most confusing part of Calc.highlight_range() is how the first cell of the header range is referenced:

first_cell = cls.get_cell(cell_range=headerRange, col=0, row=0)

This is a somewhat different use of get_cell() than previous examples, which have always found a cell within a sheet. For instance:

cell = Calc.getCell(sheet=sheet, col=0, row=0)

The definition for this version of get_cell() is:

# in Calc class (overload method, simplified)
@classmethod
def get_cell(cls, cell_range: XCellRange, col: int, row: int) -> XCell:
    return cell_range.getCellByPosition(col, row)

See also

A position in a cell range (i.e. a (column, row) coordinate) is defined relative to the cell range. This means that the call: first_cell = cls.get_cell(cell_range=headerRange, col=0, row=0) is requesting the top-left cell in headerRange. Since the headerRange covers A2:C2, (0, 0) means the A2 cell.

24.3.3 Printing the Cell’s Text

The cell’s text is accessed via its XText interface:

# in cell_texts.py
def _print_cell_text(self, cell: XCell) -> None:
    txt = Lo.qi(XText, cell, True)
    print(f'Cell Text: "{txt.getString()}"')
    # ...

The call to XText.getString() returns all the text, which is printed as:

Cell Text: "Text in first line. And a hypertext"

The text can also be examined by moving a text cursor through it:

cursor = txt.createTextCursor()

However, it was surprising to discover that this text cursor can not be converted into a sentence or paragraph cursor. Both the following calls return None:

sent_cursor = Lo.qi(XSentenceCursor, cursor)
para_cursor = Lo.qi(XParagraphCursor, cursor)

24.3.4 Adding an Annotation

Cells can be annotated, which causes a little yellow text box to appear near the cell, linked to the cell by an arrow (as in Fig. 207). Creating a new annotation is a two-step process: the XSheetAnnotationsSupplier interface is used to access the collection of existing annotations, and a new one is added by supplying the annotation text and the address of the cell where its arrow will point. These steps are performed by the first half of Calc.add_annotation():

# in Calc class
@classmethod
def add_annotation(
    cls, sheet: XSpreadsheet, cell_name: str, msg: str, is_visible=True
) -> XSheetAnnotation:
    # add the annotation
    addr = cls.get_cell_address(sheet=sheet, cell_name=cell_name)
    anns_supp = Lo.qi(XSheetAnnotationsSupplier, sheet, True)
    anns = anns_supp.getAnnotations()
    anns.insertNew(addr, msg)

    # get a reference to the annotation
    xcell = cls.get_cell(sheet=sheet, cell_name=cell_name)
    ann_anchor = Lo.qi(XSheetAnnotationAnchor, xcell, True)
    ann = ann_anchor.getAnnotation()
    ann.setIsVisible(is_visible)
    return ann

Annotation creation doesn’t return a reference to the new annotation object. For that it’s necessary to examine the cell pointed to by the annotation. XCell is converted into a XSheetAnnotationAnchor, which has a getAnnotation() method for returning the annotation (if one exists).

XSheetAnnotation has several methods for obtaining information about the position, author, and modification date of the annotation. setIsVisible() allows its visibility to be switched on and off.