LIBREOFFICE: Text conversion
Generate Writer, Draw, Impress, Calc and Base documents from text files with a bit of help from Andrew Davison and some command-line magic.
Generate Writer, Draw, Impress, Calc and
Base documents from text files with a bit of help from Andrew Davison and a mix of command line operations.
One advantage of using Libreoffice is that it enables documents to be created and edited via a GUI, so why supply it with mere text files? One reason is to simplify the task of note-taking on a low-powered machine, or on a device with a small screen. Under those circumstances, trying to create documents inside Libreoffice becomes an exercise in frustration. Also, a GUI may be unavailable in some situations, such as when the note-taker is logged into a remote machine via a command-line SSH terminal.
However, Libreoffice is such a useful tool that hardware deprivations during the note-taking process shouldn’t discourage us from importing the notes into the app at a later date. This article looks at a few text-based formats which can be read by
Libreoffice’s Writer, Draw, Impress, Calc and Base tools – perhaps after being processed by other command line operations first.
Writer
Of course, notes can be written in plain old ASCII, but there’s a more powerful format which only requires a little more work: Markdown. It was originally designed for easy conversion into HTML, but it’s been extended over the years, and there’s now a proliferation of variants to choose from. Fortunately, the Pandoc converter tool (https://pandoc.org) supports the main ones, such as Commonmark, Multimarkdown, Markdown Extra, and restructuredtext, a competitor to Markdown favoured by the Python community. Details about the Markdown versions understood by
Pandoc can be found in its user guide (http://bit.ly/ lxf258pandoc), or you might prefer to learn by reading (and modifying) an example, such as the excellent one by meleyal (https://gist.github.com/meleyal/ 5782256), which is used here.
Pandoc supports the conversion of Markdown into ODT (the document format used by Libreoffice Writer), and also offers the possibility of making the output more beautiful by employing styles specified in an OTT template file. The following example uses rapport.ott, which comes from Libreoffice’s templates website (https://extensions.libreoffice.org/templates):
$ pandoc examp.md --reference-odt=rapport.ott -o examp.odt
Figure 1 (bottom left) shows part of the Markdown file and its corresponding ODT translation.
Libreoffice is far from a slouch in the conversion business itself, and can easily convert an ODT file into a large number of formats. Of course, a popular one is PDF, which is done from the command line using:
$ soffice --convert-to pdf:writer_pdf_export examp.odt
The format for the --convert-to option is
Outputfile-extension[:filtername] . There doesn’t seem to be an up-to-date list of possible filter names; the most recent is for Openoffice 3.0 at http://bit.ly/ lxf258ooo. The filter names appear in the second column of the table, under the ‘API Name’ heading.
Another little-known switch option is --infilter= ”Filtername” , for example --infilter=”rich Text Format” ). It isn’t used much, as Libreoffice does a good job of guessing a file’s input filter based on its filename extension. A complete list of command-line arguments for Libreoffice can be found at http://bit. ly/lxf258switches.
Draw
The file examp.md contains an inline image, a resized PNG, specified by:
![Plane Clipart with transparency](bigplanet.png
“icon”){ width=25% }
This image in the resulting ODT document can be resized, moved and rotated by Libreoffice Write, but for more extensive editing, the image should be converted to an SVG file and imported into Libreoffice Draw.
SVG is itself a text-based format, and so falls under the remit of this article, but the reality is that most SVG documents are too difficult to manipulate except via a dedicated GUI tool, such as Draw or Inkscape (https:// inkscape.org), an SVG editor.
Fortunately, there are several text formats for defining graphics that are much less complex than SVG. We’ll briefly look at Graphviz (www.graphviz.org),
Brian Kernighan’s venerable pic language, and plantuml (http://plantuml.com) for UML diagrams.
As its name suggests, Graphviz focuses on the generation of graph and network diagrams. For details, see Mihalis Tsoukalos’ tutorial in LXF219. The Linux installation comes with a GUI graph editor called dotty,
but also command-line tools for creating various kinds of graph, including dot for directed graphs. As an example, the graph.dot file contains: digraph { node [shape=ellipse]; a b c e; // nodes
// labelled, weighted edges a -> b[label=”0.2”,weight=”0.2”]; a -> c[label=”0.4”,weight=”0.4”]; c -> b[label=”0.6”,weight=”0.6”]; c -> e[label=”0.6”,weight=”0.6”]; e -> e[label=”0.1”,weight=”0.1”]; e -> b[label=”0.7”,weight=”0.7”];
}
This can be converted into a PNG file using
$ dot -Tpng graph.dot -o graph.png
dot’s -T argument can take other output formats, including svg . As mentioned above, SVG files can be edited inside Libreoffice Draw.
Brian Kernighan’s pic language dates from the early 1980s, but it’s still a great tool for creating box-andarrow diagrams, such as flow charts and circuit schematics. It lives on as gpic, part of GNU’S groff. The Linux package plotutils includes an implementation, and there are several excellent tutorials on it, including Kernighan’s original manual (https://archive.org/ details/pic-graphics-language) and Eric Raymond’s more recent ‘Making Pictures With GNU PIC’ article (http://bit.ly/lxf258pic). See boxes.pic for a short example, which looks like this:
.PS ellipse “document”; arrow; box “\fipic\fp(1)” arrow; box width 1.2 “\figtbl\fp(1) or \figeqn\fp(1)” “(optional)” dashed; arrow; box “\figtroff\fp(1)”; arrow; ellipse “Postscript”
.PE
It’s converted into a PNG file using
$ pic2plot -Tpng boxes.pic > boxes.png
The PNG output may be a little pixelated, so it’s worth experimenting with increasing the generated bitmap size (which is 570x570 by default), and using a slightly smaller font size (the default is 0.0175). This changes the command to:
$ pic2plot -Tpng -f 0.015 --bitmap-size 1200x1200 boxes. pic > boxes.png
A full list of pic2plot command line options can be found at http://bit.ly/lxf258plot. The best way to manipulate an image on Linux is with the Imagemagick command line utilities (see https://imagemagick. org/script/command-line-tools.php). The pic2plotgenerated PNG image is rather large, and the figure is surrounded by a lot of empty space. The following call to Imagemagick’s convert automatically trims that space, leaving a border of 20 pixels, and scales the image by 75 per cent (which improves its resolution):
$ convert boxes.png -fuzz 1% -trim -bordercolor white -border 20 +repage -resize 75% boxestrim.png Plantuml (http://plantuml.com) focuses on creating UML figures, such as class and sequence diagrams, although it also supports a selection of other diagram types such as Gantt charts and mindmaps. The following example defines a small state diagram: @startuml
[*] --> State1
State1 --> [*]
State1 : this is a string
State1 : this is another string
State1 -> State2
State2 --> [*]
@enduml Plantuml is invoked from a JAR file using Java: $ java -jar plantuml.jar -tpng states.txt Plantuml can also be called from inside Libreoffice Writer as an extension (available from https://extensions. libreoffice.org/extensions/libo_plantuml). It appears as a toolbar called Plantuml, and can translate Plantuml text embedded in a document into an image.
Impress
Although Pandoc is mostly a tool for converting text documents, it also supports a number of slide formats, including LATEX Beamer, HTML Slidy and Dzslides. Sadly, the developers haven’t yet got around to Impress’s ODP format. However, some versions of Pandoc do support Microsoft Powerpoint, and Libreoffice can easily convert from Powerpoint to ODP. Pandoc only added Powerpoint conversion in version 2.0.5, which is fine on Windows where the current version has reached 2.6.3, but the most recent Linux release is still at 1.17.2. In other words, if you want
Pandoc to handle Powerpoint, you’ll have to use it on Windows. However, this inconvenience may have resolved itself by the time you read this article since it looks like Debian will soon support Pandoc 2.2.1.
The slides Markdown example used here, habits. md, comes from the Pandoc user’s guide, online at http://bit.ly/lxf258slide. The conversion ‘dance steps’ from Markdown to Powerpoint to ODP is achieved with two commands:
$ pandoc habits.md -o habits.pptx --reference-doc=ref. pptx
$ soffice --convert-to odp habits.pptx
The --reference-doc option applies a template to the conversion so that the resultant PPTX looks nicer.
Calc
The de facto text format for spreadsheets is CSV. For example, nums.csv contains four lines with their fields separated by commas:
Sally Whittaker,2018,mccarren House,312,3.75,=d1+e1 Belinda Jameson,2017,cushing House,148,3.52
Jeff Smith,2018,prescott House,17-d,3.20
Sandy Allen,2019,oliver House,108,3.48
Note that the last field of the first line is an equation,
=D1+E1 . After conversion to a spreadsheet, the first column is labelled A, so D1 and E1 refer to the fourth and fifth columns of the first row, which contain 312 and 3.75. The equation’s spreadsheet cell (F1) will therefore display 315.75.
Converting CSV into a Libreoffice Calc ODS file requires an --infilter option:
$ soffice --convert-to ods --infilter=”csv:44,34,utf8” nums.csv
The two numerical arguments of --infilter specify the ASCII code for the CSV’S field separator (44 is ASCII for , ), while 34 is the code (“) used to quote text. Details on Calc’s infilter option can be found at http:// bit.ly/lxf258filter. If you’re planning to use CSV extensively, it’s a good idea to install csvkit (https:// csvkit.readthedocs.io/en/latest), which offers a range of CSV tools, including csvlook, csvgrep and
csvcut. csvkit is a Python package, so it is installed using pip.
As you might expect, csvlook is a nice way to view a CSV file:
$ csvlook -d ‘,’ -H nums.csv
| a | b | c | d | e | f |
| --------------- | ----- | -------------- | ---- | ---- | ------ |
| Sally Whittaker | 2,018 | Mccarren House | 312 | 3.75 | =D1+E1 |
| Belinda Jameson | 2,017 | Cushing House | 148 | 3.52 |
|
| Jeff Smith | 2,018 | Prescott House | 17-D | 3.20 | |
| Sandy Allen | 2,019 | Oliver House | 108 | 3.48 | | -d specifies the CSV’S field delimiter, while -H means that there isn’t a header line in the file. Dummy header names (‘a, ‘b’ and so on) are used instead.
Base
Another reason for installing csvkit is that its csvsql command makes it possible to convert a CSV file into an SQL database. This is useful because SQL can be employed as a text-based format for writing databases, via its CREATE TABLE and INSERT INTO commands.
The CSV code in nums2.csv is a variant of the earlier example, but with a header and without the formulae:
Name,year,house,room,cost
Sally Whittaker,2018,mccarren House,312,3.75 Belinda Jameson,2017,cushing House,148,3.52
Jeff Smith,2018,prescott House,17-d,3.20
Sandy Allen,2019,oliver House,108,3.48
A CREATE TABLE operation for this code can be generated using csvkit:
$ csvsql -d ‘,’ -i mysql nums2.csv > nums2.sql
The result in nums2.sql is:
CREATE TABLE nums2 (
`Name` VARCHAR(15) NOT NULL,
`Year` DECIMAL(38, 0) NOT NULL,
`House` VARCHAR(14) NOT NULL,
`Room` VARCHAR(4) NOT NULL,
`Cost` DECIMAL(38, 2) NOT NULL
);
It’s also a good idea to add a conditional DROP TABLE before the CREATE:
$ echo “DROP TABLE IF EXISTS nums2;
$(cat nums2.sql)” > nums2.sql
csvkit doesn’t have a way to generate INSERT INTO operations, but it’s fairly easy to implement the task in Python. The following geninserts.py outputs an
INSERT INTO tuple for each row of a supplied CSV file (excluding the header row): import sys, os, csv if len(sys.argv) != 2: print(” Usage: getinserts
PRINT(“INSERT INTO”, nm, “VALUES”) rowcount = 2 # line 1 is the header line for row in csvreader: vals = ‘, ‘.join(‘\’{0}\’’.format(fld) for fld in row) if rowcount == numlines: # last line of data
print(” (”, vals, “);“) else: print(” (”, vals, “),“) rowcount+=1 csvfile.close()
It’s called like so:
$ python3 geninserts.py nums2.csv >> nums2.sql
The SQL script in nums2.sql becomes:
DROP TABLE IF EXISTS nums2;
CREATE TABLE nums2 (
`Name` VARCHAR(15) NOT NULL,
`Year` DECIMAL(38, 0) NOT NULL,
`House` VARCHAR(14) NOT NULL,
`Room` VARCHAR(4) NOT NULL,
`Cost` DECIMAL(38, 2) NOT NULL
);
INSERT INTO nums2 VALUES
( ‘Sally Whittaker’, ‘2018’, ‘Mccarren House’, ‘312’, ‘3.75’ ),
( ‘Belinda Jameson’, ‘2017’, ‘Cushing House’, ‘148’, ‘3.52’ ),
( ‘Jeff Smith’, ‘2018’, ‘Prescott House’, ‘17-D’, ‘3.20’ ), ( ‘Sandy Allen’, ‘2019’, ‘Oliver House’, ‘108’, ‘3.48’ ); This script can be used to build a nums2 table in most SQL engines (after a few preliminaries). When using MYSQL (or Mariadb) it’s necessary to first create a non-superuser account and an empty database. The MYSQL commands for creating the user ad are:
> create user ad;
> grant all privileges on * . * to ad;
By granting ad all privileges, there’s no need to supply a password to use the account. This is obviously unsafe, but simplifies matters during testing. Now an empty database called numsdb can be created at the command line:
$ mysql -u ad -e “create database numsdb;”
This database will hold the nums2 table and its data uploaded from nums2.sql:
$ mysql -u ad numsdb < nums2.sql
Check the resulting table by querying it:
$ mysql -u ad -e “use numsdb; select * from nums2;”
The output is:
+-----------------+------+----------------+------+------+
| Name | Year | House | Room | Cost | +-----------------+------+----------------+------+------+
| Sally Whittaker | 2018 | Mccarren House | 312 | 3.75 |
| Belinda Jameson | 2017 | Cushing House | 148 | 3.52 |
| Jeff Smith | 2018 | Prescott House | 17-D | 3.20 |
| Sandy Allen | 2019 | Oliver House | 108 | 3.48 | +-----------------+------+----------------+------+------+
The preceding steps converted a CSV file into SQL and imported it into MYSQL as a numsdb database holding a single nums2 table. The final step is to link
Libreoffice Base to that external database. Base connects to MYSQL via a JDBC Driver called libmysql-java. This is installed on Linux in the usual way, but its JAR file must also be added to the class path of Libreoffice’s JRE. This is done via Libreoffice’s
Tools > Options menu. Navigate to Libreoffice > Advanced, and press the Class Path button. Select Add Archive and select the path /usr/share/java/mysql. jar. Then restart Libreoffice.
The newly installed JDBC driver can now be linked to the numsdb database by completing Base’s Database Wizard. On the first screen click the ‘Connect to an existing database’ button, and choose ‘MYSQL’ in the drop-down list. On the second screen, select the ‘Connect using JDBC’ button. Perhaps the trickiest stage is setting up the JDBC connection on the third screen, which is shown in the screenshot top-left.