# Chapter 24. Complex Data Manipulation

This chapter looks at a variety of less common text manipulation techniques, including the sorting of data, generating data based on examples, and the use of borders, headlines, hyperlinks, and annotations in cells.

## 24.1 Sorting Data

Sorting is available through SheetCellRange’s XSortable interface. There are four basic steps required for sorting a block of cells:

1. Obtain an XSortable interface for the cell range;

2. Specify the sorting criteria as a TableSortField sequence;

3. Create a sort descriptor;

4. Execute the sort.

These steps are illustrated by the data_sort.py example, which begins by building a small table:

```# in data_sort.py
def main(self) -> None:

try:

doc.set_visible()

sheet = doc.get_sheet(0)

# create the table that needs sorting
vals = (
("Level", "Code", "No.", "Team", "Name"),
("BS", 20, 4, "B", "Elle"),
("BS", 20, 6, "C", "Sweet"),
("BS", 20, 2, "A", "Chcomic"),
("CS", 30, 5, "A", "Ally"),
("MS", 10, 1, "A", "Joker"),
("MS", 10, 3, "B", "Kevin"),
("CS", 30, 7, "C", "Tom"),
)
sheet.set_array(values=vals, name="A1:E8")  # or just "A1"

# 1. obtain an XSortable interface for the cell range
source_range = sheet.get_range(range_name="A1:E8")
x_sort = source_range.qi(XSortable, True)

# 2. specify the sorting criteria as a TableSortField array
sort_fields = (self._make_sort_asc(1, True), self._make_sort_asc(2, True))

# 3. define a sort descriptor
props = Props.make_props(
)

Lo.wait(2_000)  # wait so user can see original before it is sorted
# 4. do the sort
print("Sorting...")
x_sort.sort(props)

if self._out_fnm:
doc.save_doc(fnm=self._out_fnm)

msg_result = MsgBox.msgbox(
"Do you wish to close document?",
"All done",
boxtype=MessageBoxType.QUERYBOX,
buttons=MessageBoxButtonsEnum.BUTTONS_YES_NO,
)
if msg_result == MessageBoxResultsEnum.YES:
doc.close_doc()
Lo.close_office()
else:
print("Keeping document open")

except Exception:
Lo.close_office()
raise
```

The unsorted table is shown in Fig. 199.

The table is sorted so that its rows are in ascending order depending on their “Code” column values. When two rows have the same code number then the sort uses the “No.” column. Fig. 200 shows the result of applying these two sorting criteria:

The four sorting steps mentioned above are implemented like so:

```# in data_sort.py
# ...
# 1. obtain an XSortable interface for the cell range
source_range = Calc.get_cell_range(sheet=sheet, range_name="A1:E8")
xsort = Lo.qi(XSortable, source_range, True)

# 2. specify the sorting criteria as a TableSortField array
sort_fields = (self._make_sort_asc(1, True), self._make_sort_asc(2, True))

# 3. define a sort descriptor
props = Props.make_props(SortFields=Props.any(*sort_fields), ContainsHeader=True)

Lo.wait(2_000)  # wait so user can see original before it is sorted
# 4. do the sort
print("Sorting...")
xsort.sort(props)
# ...
```

The `A1:E8` cell range referenced using the XCellRange interface is converted to XSortable. This interface is defined in Office’s util module, not in sheet or table, probably because it’s also used in text documents for sorting tables.

The two sorting criteria are represented by two TableSortField objects in tuple. The `_make_sort_asc()` function is defined in data_sort.py as:

```# in data_sort.py
def _make_sort_asc(self, index: int, is_ascending: bool) -> TableSortField:
return TableSortField(Field=index, IsAscending=is_ascending, IsCaseSensitive=False)
```

Note

Because TableSortField is imported with ooouno library (`from ooo.dyn.table.table_sort_field import TableSortField`) `TableSortField` can be created using Keyword arguments. This a feature added by ooouno library for `uno structs`. Normally `uno` objects only take positional only arguments.

A sort descriptor is an array of PropertyValue objects which affect how `XSortable.sort()` executes. The most commonly used properties are `SortFields` and `ContainsHeader`. `SortFields` is assigned the sorting criteria (i.e. the TableSortField tuple), and the `ContainsHeader` boolean specifies whether the sort should exclude the first row because it contains header text.

The sort descriptor properties are defined in a number of classes (SortDescriptor2, TableSortDescriptor2, and TextSortDescriptor2), which are most easily accessed from the XSortable documentation page.

## 24.2 Generating Data

Cell data is generated by supplying numbers to a function which treats them as the initial values in a arithmetic (or geometric) series. The function employs the series to churn out as many more numbers as are needed to fill a given cell range.

A series is created by the XCellSeries interface, which is part of the SheetCellRange service (see Fig. 201).

Several examples of how to use `XCellSeries'` two methods, `fillAuto()` and `fillSeries()`, are contained in the filler.py example described next.

filler.py starts by filling a blank sheet with an assortment of data, which will be used by the XCellSeries methods to initialize several series. The original sheet is shown in Fig. 202.

The simpler of the two XCellSeries methods, `XCellSeries.fillAuto()`, requires a cell range, fill direction, and how many cells should be examined as ‘seeds’. For example, rows `7`, `8`, and `9` of Fig. 202 are filled using:

```# in Filler._fill_series() of filler.py
# set first two values of three rows

# ascending integers: 1, 2
sheet.set_val(cell_name="B7", value=2)
sheet.set_val(cell_name="A7", value=1)

# dates, decreasing by month
sheet.get_cell(cell_name="A8").set_date(day=28, month=2, year=2015)
sheet.get_cell(cell_name="B8").set_date(day=28, month=1, year=2015)

# descending integers: 6, 4
sheet.set_val(cell_name="A9", value=6)
sheet.set_val(cell_name="B9", value=4)

# get cell range series
series = sheet.get_range(range_name="A7:G9").get_cell_series()

# use first 2 cells for series, and fill to the right
series.fillAuto(FillDirection.TO_RIGHT, 2)
```

The supplied cell range (`A7:G9`) includes the seed values, and the cells to be filled.

It’s converted into an XCellSeries interface by `Calc.get_cell_series()`, which is defined as:

```# in Calc class
@staticmethod
def get_cell_series(sheet: XSpreadsheet, range_name: str) -> XCellSeries:
cell_range = sheet.getCellRangeByName(range_name)
series = Lo.qi(XCellSeries, cell_range, True)
return series
```

`XCellSeries.fillAuto()` can be supplied with four possible fill directions (`TO_BOTTOM`, `TO_RIGHT`, `TO_TOP`, and `TO_LEFT`) which also dictate which cells are examined for seeds. By setting the direction to be `TO_RIGHT`, seed cells in the left-hand parts of the rows are examined. The numerical (2) in the call to `fillAuto()` shown above specifies how many of those cells will be considered in order to automatically determine the series used for the generated cell values.

Fig. 203 shows the result of filling rows `7`, `8`, and `9`.

If `XCellSeries.fillAuto()` doesn’t guess the correct series for the data generation, then `XCellSeries.fillSeries()` offers finer control over the process. It supports five modes: `SIMPLE`, `LINEAR` , `GROWTH`, `DATE`, and `AUTO`.

`SIMPLE` switches off the series generator, and the seed data is copied unchanged to the other blank cells. `AUTO` makes Office generate its data series automatically, so performs in the same way as `fillAuto()`. `LINEAR` , `GROWTH`, and `DATE` give more control to the programmer.

### 24.2.1 Using the LINEAR Mode

Rows `2` and `3` of the spreadsheet contain the numbers `1` and `4` (see Fig. 202). By using the `LINEAR` mode, a step, and a stopping value, it’s possible to specify an arithmetic series. For example:

```# in Filler._fill_series() of filler.py
# ...
sheet.set_val(cell_name="A2", value=1)
sheet.set_val(cell_name="A3", value=4)

# Fill 2 rows; the 2nd row is not filled completely since
# the end value is reached
series = sheet.get_range(range_name="A2:E3").get_cell_series()
series.fillSeries(FillDirection.TO_RIGHT, FillMode.LINEAR, Calc.NO_DATE, 2, 9)
# ignore date mode; step == 2; end at 9
```

The `Calc.NO_DATE` argument means that dates are not being generated. The `2` value is the step, and `9` is the maximum. The resulting rows `2` and `3` are shown in Fig. 204.

Note that the second row is incomplete since the generated values for those cells (`10` and `12`) exceeded the stopping value.

If no stopping value is required, then the last argument can be replaced with `Calc.MAX_VALUE`.

### 24.2.2 Using the DATE Mode

If `XCellSeries.fillSeries()` is called using the `DATE` mode then it’s possible to specify whether the day, weekday, month, or year parts of the seed date are changed by the series. For example, the seed date at the start of row `4` (`20th Nov. 2015`) can be incremented one month at a time with the code:

```# in Filler._fill_series() of filler.py
# ...
sheet.get_cell(cell_name="A4").set_date(day=20, month=11, year=2015)

# fill by adding one month to date
series = sheet.get_range(range_name="A4:E4").get_cell_series()
series.fillSeries(
FillDirection.TO_RIGHT,
FillMode.DATE,
FillDateMode.FILL_DATE_MONTH,
1,
Calc.MAX_VALUE,
)
```

The result is shown in Fig. 205.

When the month is incremented past `12`, it resets to `1`, and the year is incremented.

### 24.2.3 Using the GROWTH Mode

Whereas the `LINEAR` mode is for creating arithmetic series (i.e. ones incrementing or decrementing in steps), `GROWTH` mode is for geometric progressions where the ‘step’ value is repeatedly multiplied to the seed.

In the following example, the seed in `G6` (10; see Fig. 205) is used in a geometric progression using multiples of `2`. The series is placed in cells going up the sheet starting from `G6`. The code:

```# in Filler._fill_series() of filler.py
# ...
sheet.set_val(cell_name="G6", value=10)

# Fill from  bottom to top with a geometric series (*2)
series = sheet.get_range(range_name="G2:G6").get_cell_series()
series.fillSeries(
FillDirection.TO_TOP, FillMode.GROWTH, Calc.NO_DATE, 2, Calc.MAX_VALUE
)
```

The resulting sheet is shown in Fig. 206.

## 24.3 Cells with Fancy Text

The cell_texts.py example brings together a few techniques for manipulating text in cells, namely the addition of borders, headlines, hyperlinks, and annotations. The sheet ends up looking like Fig. 207.

### 24.3.1 Creating a Border and Headline

cell_texts.py draws a decorative border and headline by calling:

```# in cell_texts.py
Calc.highlight_range(
sheet=sheet.component,
range_name="A2:C7",
headline="Cells and Cell Ranges",
)
```

`Calc.highlight_range()` adds a light blue border around the specified cell range (`A2:C7`), and the string argument is added to the top-left cell of the range. It’s intended to be a headline, so is drawn in dark blue, and the entire top row is made light blue to match the border. The method is implemented as:

```# in Calc class (simplified)
@classmethod
def highlight_range(cls, sheet: XSpreadsheet, headline: str, range_name: str) -> XCell:
sheet=sheet,
)

# add headline text to the first cell of the row
first_cell = cls.get_cell(cell_range=headerRange, col=0, row=0)

# make text dark blue and bold
Props.set(first_cell, CharColor=CommonColor.DARK_BLUE, CharWeight=FontWeight.BOLD)
```

The three-argument `add_border()` method calls the four-argument version which was described back in 22.2.3 Adding Borders. It passes it a bitwise composition of all the border constants:

The cell range for the top row is extracted from the larger range supplied to `Calc.highlight_range()`. The easiest way of doing this is to get the address of the larger range as a CellRangeAddress object, and use its row and column positions. The header cell range uses the same row index for its starting and finishing rows:

```# part of Calc.highlight_range() (simplified)
sheet=sheet,
)
```

Perhaps the most confusing part of `Calc.highlight_range()` is how the first cell of the header range is referenced:

```first_cell = cls.get_cell(cell_range=headerRange, col=0, row=0)
```

This is a somewhat different use of `get_cell()` than previous examples, which have always found a cell within a sheet. For instance:

```cell = Calc.getCell(sheet=sheet, col=0, row=0)
```

The definition for this version of `get_cell()` is:

```# in Calc class (overload method, simplified)
@classmethod
def get_cell(cls, cell_range: XCellRange, col: int, row: int) -> XCell:
return cell_range.getCellByPosition(col, row)
```

A position in a cell range (i.e. a (column, row) coordinate) is defined relative to the cell range. This means that the call: `first_cell = cls.get_cell(cell_range=headerRange, col=0, row=0)` is requesting the top-left cell in `headerRange`. Since the `headerRange` covers `A2:C2`, (`0`, `0`) means the `A2` cell.

### 24.3.3 Printing the Cell’s Text

The cell’s text is accessed via its XText interface:

```# in cell_texts.py
def _print_cell_text(self, cell: XCell) -> None:
txt = Lo.qi(XText, cell, True)
print(f'Cell Text: "{txt.getString()}"')
# ...
```

The call to `XText.getString()` returns all the text, which is printed as:

```Cell Text: "Text in first line. And a hypertext"
```

The text can also be examined by moving a text cursor through it:

```cursor = txt.createTextCursor()
```

However, it was surprising to discover that this text cursor can not be converted into a sentence or paragraph cursor. Both the following calls return `None`:

```sent_cursor = Lo.qi(XSentenceCursor, cursor)
para_cursor = Lo.qi(XParagraphCursor, cursor)
```

### 24.3.4 Adding an Annotation

Cells can be annotated, which causes a little yellow text box to appear near the cell, linked to the cell by an arrow (as in Fig. 207). Creating a new annotation is a two-step process: the XSheetAnnotationsSupplier interface is used to access the collection of existing annotations, and a new one is added by supplying the annotation text and the address of the cell where its arrow will point. These steps are performed by the first half of `Calc.add_annotation()`:

```# in Calc class
@classmethod
cls, sheet: XSpreadsheet, cell_name: str, msg: str, is_visible=True
) -> XSheetAnnotation:
# add the annotation
anns_supp = Lo.qi(XSheetAnnotationsSupplier, sheet, True)
anns = anns_supp.getAnnotations()
Annotation creation doesn’t return a reference to the new annotation object. For that it’s necessary to examine the cell pointed to by the annotation. XCell is converted into a XSheetAnnotationAnchor, which has a `getAnnotation()` method for returning the annotation (if one exists).
XSheetAnnotation has several methods for obtaining information about the position, author, and modification date of the annotation. `setIsVisible()` allows its visibility to be switched on and off.