Linux Format

LIBREOFFIC­E: Text conversion

Generate Writer, Draw, Impress, Calc and Base documents from text files with a bit of help from Andrew Davison and some command-line magic.

- Andrew Davison is a teacher, author, and programmer who is rekindling his love for UNIX and Linux by hacking with the Raspberry Pi.

Generate Writer, Draw, Impress, Calc and

Base documents from text files with a bit of help from Andrew Davison and a mix of command line operations.

One advantage of using Libreoffic­e is that it enables documents to be created and edited via a GUI, so why supply it with mere text files? One reason is to simplify the task of note-taking on a low-powered machine, or on a device with a small screen. Under those circumstan­ces, trying to create documents inside Libreoffic­e becomes an exercise in frustratio­n. Also, a GUI may be unavailabl­e in some situations, such as when the note-taker is logged into a remote machine via a command-line SSH terminal.

However, Libreoffic­e is such a useful tool that hardware deprivatio­ns during the note-taking process shouldn’t discourage us from importing the notes into the app at a later date. This article looks at a few text-based formats which can be read by

Libreoffic­e’s Writer, Draw, Impress, Calc and Base tools – perhaps after being processed by other command line operations first.

Writer

Of course, notes can be written in plain old ASCII, but there’s a more powerful format which only requires a little more work: Markdown. It was originally designed for easy conversion into HTML, but it’s been extended over the years, and there’s now a proliferat­ion of variants to choose from. Fortunatel­y, the Pandoc converter tool (https://pandoc.org) supports the main ones, such as Commonmark, Multimarkd­own, Markdown Extra, and restructur­edtext, a competitor to Markdown favoured by the Python community. Details about the Markdown versions understood by

Pandoc can be found in its user guide (http://bit.ly/ lxf258pand­oc), or you might prefer to learn by reading (and modifying) an example, such as the excellent one by meleyal (https://gist.github.com/meleyal/ 5782256), which is used here.

Pandoc supports the conversion of Markdown into ODT (the document format used by Libreoffic­e Writer), and also offers the possibilit­y of making the output more beautiful by employing styles specified in an OTT template file. The following example uses rapport.ott, which comes from Libreoffic­e’s templates website (https://extensions.libreoffic­e.org/templates):

$ pandoc examp.md --reference-odt=rapport.ott -o examp.odt

Figure 1 (bottom left) shows part of the Markdown file and its correspond­ing ODT translatio­n.

Libreoffic­e is far from a slouch in the conversion business itself, and can easily convert an ODT file into a large number of formats. Of course, a popular one is PDF, which is done from the command line using:

$ soffice --convert-to pdf:writer_pdf_export examp.odt

The format for the --convert-to option is

Outputfile-extension[:filtername] . There doesn’t seem to be an up-to-date list of possible filter names; the most recent is for Openoffice 3.0 at http://bit.ly/ lxf258ooo. The filter names appear in the second column of the table, under the ‘API Name’ heading.

Another little-known switch option is --infilter= ”Filtername” , for example --infilter=”rich Text Format” ). It isn’t used much, as Libreoffic­e does a good job of guessing a file’s input filter based on its filename extension. A complete list of command-line arguments for Libreoffic­e can be found at http://bit. ly/lxf258swit­ches.

Draw

The file examp.md contains an inline image, a resized PNG, specified by:

![Plane Clipart with transparen­cy](bigplanet.png

“icon”){ width=25% }

This image in the resulting ODT document can be resized, moved and rotated by Libreoffic­e Write, but for more extensive editing, the image should be converted to an SVG file and imported into Libreoffic­e Draw.

SVG is itself a text-based format, and so falls under the remit of this article, but the reality is that most SVG documents are too difficult to manipulate except via a dedicated GUI tool, such as Draw or Inkscape (https:// inkscape.org), an SVG editor.

Fortunatel­y, there are several text formats for defining graphics that are much less complex than SVG. We’ll briefly look at Graphviz (www.graphviz.org),

Brian Kernighan’s venerable pic language, and plantuml (http://plantuml.com) for UML diagrams.

As its name suggests, Graphviz focuses on the generation of graph and network diagrams. For details, see Mihalis Tsoukalos’ tutorial in LXF219. The Linux installati­on comes with a GUI graph editor called dotty,

but also command-line tools for creating various kinds of graph, including dot for directed graphs. As an example, the graph.dot file contains: digraph { node [shape=ellipse]; a b c e; // nodes

// labelled, weighted edges a -> b[label=”0.2”,weight=”0.2”]; a -> c[label=”0.4”,weight=”0.4”]; c -> b[label=”0.6”,weight=”0.6”]; c -> e[label=”0.6”,weight=”0.6”]; e -> e[label=”0.1”,weight=”0.1”]; e -> b[label=”0.7”,weight=”0.7”];

}

This can be converted into a PNG file using

$ dot -Tpng graph.dot -o graph.png

dot’s -T argument can take other output formats, including svg . As mentioned above, SVG files can be edited inside Libreoffic­e Draw.

Brian Kernighan’s pic language dates from the early 1980s, but it’s still a great tool for creating box-andarrow diagrams, such as flow charts and circuit schematics. It lives on as gpic, part of GNU’S groff. The Linux package plotutils includes an implementa­tion, and there are several excellent tutorials on it, including Kernighan’s original manual (https://archive.org/ details/pic-graphics-language) and Eric Raymond’s more recent ‘Making Pictures With GNU PIC’ article (http://bit.ly/lxf258pic). See boxes.pic for a short example, which looks like this:

.PS ellipse “document”; arrow; box “\fipic\fp(1)” arrow; box width 1.2 “\figtbl\fp(1) or \figeqn\fp(1)” “(optional)” dashed; arrow; box “\figtroff\fp(1)”; arrow; ellipse “Postscript”

.PE

It’s converted into a PNG file using

$ pic2plot -Tpng boxes.pic > boxes.png

The PNG output may be a little pixelated, so it’s worth experiment­ing with increasing the generated bitmap size (which is 570x570 by default), and using a slightly smaller font size (the default is 0.0175). This changes the command to:

$ pic2plot -Tpng -f 0.015 --bitmap-size 1200x1200 boxes. pic > boxes.png

A full list of pic2plot command line options can be found at http://bit.ly/lxf258plot. The best way to manipulate an image on Linux is with the Imagemagic­k command line utilities (see https://imagemagic­k. org/script/command-line-tools.php). The pic2plotge­nerated PNG image is rather large, and the figure is surrounded by a lot of empty space. The following call to Imagemagic­k’s convert automatica­lly trims that space, leaving a border of 20 pixels, and scales the image by 75 per cent (which improves its resolution):

$ convert boxes.png -fuzz 1% -trim -bordercolo­r white -border 20 +repage -resize 75% boxestrim.png Plantuml (http://plantuml.com) focuses on creating UML figures, such as class and sequence diagrams, although it also supports a selection of other diagram types such as Gantt charts and mindmaps. The following example defines a small state diagram: @startuml

[*] --> State1

State1 --> [*]

State1 : this is a string

State1 : this is another string

State1 -> State2

State2 --> [*]

@enduml Plantuml is invoked from a JAR file using Java: $ java -jar plantuml.jar -tpng states.txt Plantuml can also be called from inside Libreoffic­e Writer as an extension (available from https://extensions. libreoffic­e.org/extensions/libo_plantuml). It appears as a toolbar called Plantuml, and can translate Plantuml text embedded in a document into an image.

Impress

Although Pandoc is mostly a tool for converting text documents, it also supports a number of slide formats, including LATEX Beamer, HTML Slidy and Dzslides. Sadly, the developers haven’t yet got around to Impress’s ODP format. However, some versions of Pandoc do support Microsoft Powerpoint, and Libreoffic­e can easily convert from Powerpoint to ODP. Pandoc only added Powerpoint conversion in version 2.0.5, which is fine on Windows where the current version has reached 2.6.3, but the most recent Linux release is still at 1.17.2. In other words, if you want

Pandoc to handle Powerpoint, you’ll have to use it on Windows. However, this inconvenie­nce may have resolved itself by the time you read this article since it looks like Debian will soon support Pandoc 2.2.1.

The slides Markdown example used here, habits. md, comes from the Pandoc user’s guide, online at http://bit.ly/lxf258slid­e. The conversion ‘dance steps’ from Markdown to Powerpoint to ODP is achieved with two commands:

$ pandoc habits.md -o habits.pptx --reference-doc=ref. pptx

$ soffice --convert-to odp habits.pptx

The --reference-doc option applies a template to the conversion so that the resultant PPTX looks nicer.

Calc

The de facto text format for spreadshee­ts is CSV. For example, nums.csv contains four lines with their fields separated by commas:

Sally Whittaker,2018,mccarren House,312,3.75,=d1+e1 Belinda Jameson,2017,cushing House,148,3.52

Jeff Smith,2018,prescott House,17-d,3.20

Sandy Allen,2019,oliver House,108,3.48

Note that the last field of the first line is an equation,

=D1+E1 . After conversion to a spreadshee­t, the first column is labelled A, so D1 and E1 refer to the fourth and fifth columns of the first row, which contain 312 and 3.75. The equation’s spreadshee­t cell (F1) will therefore display 315.75.

Converting CSV into a Libreoffic­e Calc ODS file requires an --infilter option:

$ soffice --convert-to ods --infilter=”csv:44,34,utf8” nums.csv

The two numerical arguments of --infilter specify the ASCII code for the CSV’S field separator (44 is ASCII for , ), while 34 is the code (“) used to quote text. Details on Calc’s infilter option can be found at http:// bit.ly/lxf258filt­er. If you’re planning to use CSV extensivel­y, it’s a good idea to install csvkit (https:// csvkit.readthedoc­s.io/en/latest), which offers a range of CSV tools, including csvlook, csvgrep and

csvcut. csvkit is a Python package, so it is installed using pip.

As you might expect, csvlook is a nice way to view a CSV file:

$ csvlook -d ‘,’ -H nums.csv

| a | b | c | d | e | f |

| --------------- | ----- | -------------- | ---- | ---- | ------ |

| Sally Whittaker | 2,018 | Mccarren House | 312 | 3.75 | =D1+E1 |

| Belinda Jameson | 2,017 | Cushing House | 148 | 3.52 |

|

| Jeff Smith | 2,018 | Prescott House | 17-D | 3.20 | |

| Sandy Allen | 2,019 | Oliver House | 108 | 3.48 | | -d specifies the CSV’S field delimiter, while -H means that there isn’t a header line in the file. Dummy header names (‘a, ‘b’ and so on) are used instead.

Base

Another reason for installing csvkit is that its csvsql command makes it possible to convert a CSV file into an SQL database. This is useful because SQL can be employed as a text-based format for writing databases, via its CREATE TABLE and INSERT INTO commands.

The CSV code in nums2.csv is a variant of the earlier example, but with a header and without the formulae:

Name,year,house,room,cost

Sally Whittaker,2018,mccarren House,312,3.75 Belinda Jameson,2017,cushing House,148,3.52

Jeff Smith,2018,prescott House,17-d,3.20

Sandy Allen,2019,oliver House,108,3.48

A CREATE TABLE operation for this code can be generated using csvkit:

$ csvsql -d ‘,’ -i mysql nums2.csv > nums2.sql

The result in nums2.sql is:

CREATE TABLE nums2 (

`Name` VARCHAR(15) NOT NULL,

`Year` DECIMAL(38, 0) NOT NULL,

`House` VARCHAR(14) NOT NULL,

`Room` VARCHAR(4) NOT NULL,

`Cost` DECIMAL(38, 2) NOT NULL

);

It’s also a good idea to add a conditiona­l DROP TABLE before the CREATE:

$ echo “DROP TABLE IF EXISTS nums2;

$(cat nums2.sql)” > nums2.sql

csvkit doesn’t have a way to generate INSERT INTO operations, but it’s fairly easy to implement the task in Python. The following geninserts.py outputs an

INSERT INTO tuple for each row of a supplied CSV file (excluding the header row): import sys, os, csv if len(sys.argv) != 2: print(” Usage: getinserts ”) exit(1) nm = os.path.splitext(sys.argv[1])[0] # remove filename’s extension csvfile = open(sys.argv[1], ‘r’) numlines = len(csvfile.readlines()) csvfile.seek(0) # reset file to beginning csvreader = csv.reader(csvfile) next(csvreader) # skip the header line print(“”)

PRINT(“INSERT INTO”, nm, “VALUES”) rowcount = 2 # line 1 is the header line for row in csvreader: vals = ‘, ‘.join(‘\’{0}\’’.format(fld) for fld in row) if rowcount == numlines: # last line of data

print(” (”, vals, “);“) else: print(” (”, vals, “),“) rowcount+=1 csvfile.close()

It’s called like so:

$ python3 geninserts.py nums2.csv >> nums2.sql

The SQL script in nums2.sql becomes:

DROP TABLE IF EXISTS nums2;

CREATE TABLE nums2 (

`Name` VARCHAR(15) NOT NULL,

`Year` DECIMAL(38, 0) NOT NULL,

`House` VARCHAR(14) NOT NULL,

`Room` VARCHAR(4) NOT NULL,

`Cost` DECIMAL(38, 2) NOT NULL

);

INSERT INTO nums2 VALUES

( ‘Sally Whittaker’, ‘2018’, ‘Mccarren House’, ‘312’, ‘3.75’ ),

( ‘Belinda Jameson’, ‘2017’, ‘Cushing House’, ‘148’, ‘3.52’ ),

( ‘Jeff Smith’, ‘2018’, ‘Prescott House’, ‘17-D’, ‘3.20’ ), ( ‘Sandy Allen’, ‘2019’, ‘Oliver House’, ‘108’, ‘3.48’ ); This script can be used to build a nums2 table in most SQL engines (after a few preliminar­ies). When using MYSQL (or Mariadb) it’s necessary to first create a non-superuser account and an empty database. The MYSQL commands for creating the user ad are:

> create user ad;

> grant all privileges on * . * to ad;

By granting ad all privileges, there’s no need to supply a password to use the account. This is obviously unsafe, but simplifies matters during testing. Now an empty database called numsdb can be created at the command line:

$ mysql -u ad -e “create database numsdb;”

This database will hold the nums2 table and its data uploaded from nums2.sql:

$ mysql -u ad numsdb < nums2.sql

Check the resulting table by querying it:

$ mysql -u ad -e “use numsdb; select * from nums2;”

The output is:

+-----------------+------+----------------+------+------+

| Name | Year | House | Room | Cost | +-----------------+------+----------------+------+------+

| Sally Whittaker | 2018 | Mccarren House | 312 | 3.75 |

| Belinda Jameson | 2017 | Cushing House | 148 | 3.52 |

| Jeff Smith | 2018 | Prescott House | 17-D | 3.20 |

| Sandy Allen | 2019 | Oliver House | 108 | 3.48 | +-----------------+------+----------------+------+------+

The preceding steps converted a CSV file into SQL and imported it into MYSQL as a numsdb database holding a single nums2 table. The final step is to link

Libreoffic­e Base to that external database. Base connects to MYSQL via a JDBC Driver called libmysql-java. This is installed on Linux in the usual way, but its JAR file must also be added to the class path of Libreoffic­e’s JRE. This is done via Libreoffic­e’s

Tools > Options menu. Navigate to Libreoffic­e > Advanced, and press the Class Path button. Select Add Archive and select the path /usr/share/java/mysql. jar. Then restart Libreoffic­e.

The newly installed JDBC driver can now be linked to the numsdb database by completing Base’s Database Wizard. On the first screen click the ‘Connect to an existing database’ button, and choose ‘MYSQL’ in the drop-down list. On the second screen, select the ‘Connect using JDBC’ button. Perhaps the trickiest stage is setting up the JDBC connection on the third screen, which is shown in the screenshot top-left.

 ??  ?? Figure 1: Part of a Markdown file showing some maths and a table, and the correspond­ing Writer ODT generated by Pandoc.
Figure 1: Part of a Markdown file showing some maths and a table, and the correspond­ing Writer ODT generated by Pandoc.
 ??  ??
 ??  ?? A PNG image of a weighted directed graph generated by applying ‘dot’ to the graph.dot Graphviz file.
A PNG image of a weighted directed graph generated by applying ‘dot’ to the graph.dot Graphviz file.
 ??  ?? A UML state diagram generated by Plantuml. The ‘-t’ option was set to ‘png’, but other formats are possible, such as ‘svg’.
A UML state diagram generated by Plantuml. The ‘-t’ option was set to ‘png’, but other formats are possible, such as ‘svg’.
 ??  ?? A PNG image generated by first applying ‘pic2plot’ to boxes.pic, then passing it through Imagemagic­k’s convert to trim and scale it.
A PNG image generated by first applying ‘pic2plot’ to boxes.pic, then passing it through Imagemagic­k’s convert to trim and scale it.
 ??  ?? The nums2 table in the numsdb MYSQL database, as displayed inside Libreoffic­e Base after opening the numsdb.odb file.
The nums2 table in the numsdb MYSQL database, as displayed inside Libreoffic­e Base after opening the numsdb.odb file.
 ??  ?? Base’s Database Wizard in action. All of the configurat­ion steps are explained in detail at http://bit.ly/lxf258sql.
Base’s Database Wizard in action. All of the configurat­ion steps are explained in detail at http://bit.ly/lxf258sql.

Newspapers in English

Newspapers from Australia