Linux Format

Jenkins for big data

Ramanathan Muthaiah explores the basics of accessing Jenkins via Python, which opens up a whole new world of opportunities.

Continuous integration (CI), to quote Martin Fowler (http://bit.ly/CIMartinFowler), is a “. . . software development practice where members of a team integrate their work frequently . . . leading to multiple integrations per day. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible.” Explaining why this practice helps, he adds, “Many teams find that this approach leads to significantly reduced integration problems and allows a team to develop cohesive software more rapidly.”

Various CI tools are available, both open source (eg Jenkins/Hudson and Travis) and commercial (eg TeamCity and Bamboo). Each tool has its pros and cons, along with its own ecosystem of plugins for monitoring and for integration with tools for source code management (eg Git), bug tracking (eg Jira) and code review (eg Gerrit). For an extensive list of supported features, licence types and other details, see http://bit.ly/CIComparison.

In this article, we’ll focus on developing a Python-based tool from the ground up for monitoring various parameters of a CI pipeline. For the sake of brevity and discussion, we’ll call this tool ciproject.py and to build it we’ll consider projects hosted at https://builds.apache.org.

Note: Prior programming experience in Python would be handy. The code examples in this tutorial use Python v2.7.3. To implement the use-cases we’ve outlined, we’ve used a handful of Python modules from the standard library, such as urllib2, logging, sys, collections, json and time.

Various projects hosted at the Apache Build Infrastructure (ABI) have their CI infrastructure driven by Jenkins 2.7.2 (as of September 2016). To start with, let’s list some basic use-cases for this tool. As with any command-line tool, it would also be nice to have a few extra options, such as verbosity.

Dump: For listing all the projects hosted in ABI.

Query: For listing those projects in ABI that match a specific string.

Show: For displaying basic information on a specific project in ABI, as requested by the user. The information may include the status of the last build, what event triggered the last build, at what time the build was triggered and the status of the last ten builds (a basic indicator of the project’s health).

Option: For turning the proxy on and off.

Verbose: For setting the desired level of debug output.

In this article, we’ll focus on exploring Jenkins’ Python REST API (remote access API). We’ll also assume that ABI is reachable both from web browsers and programmatically (via scripts). Handling user input and passing command-line values to the relevant (user-defined) functions are managed in ciproject.py. If you recall, this is the user-facing program that we’ll invoke from the command line.

Interactions with ABI, processing and refining the data retrieved from ABI, proxy handling and outputting debug messages are managed in citool.py, which has the necessary abstract class definitions. We’ve done it this way to isolate the data-handling logic from the main program and to keep maintenance of ciproject.py, which we’re treating as the main program, to a minimum.

In ciproject.py (http://bit.ly/CIProject), user input is managed using the argparse module. The Python class file, citool.py (http://bit.ly/CITool), which holds the complete set of class definitions, uses the following modules: urllib2 (for opening URLs), re (for pattern matching), sys (for a graceful exit), collections (for custom data structures), logging (to trace program flow and any warnings or errors, which is essential when debugging) and time (to convert Unix timestamps to a human-readable format).

Of course, many of these modules have not been used to their fullest potential, eg the sys module could be used to exit cleanly when the program is interrupted during its execution (with the Ctrl+C key combination).
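As a minimal sketch of that idea (not code from ciproject.py), a Ctrl+C can be caught as a KeyboardInterrupt and turned into an exit status. The names run_safely and main are hypothetical, and the exit status of 130 is simply the usual shell convention for an interrupted program.

```python
import sys


def run_safely(main):
    """Run main() and turn a Ctrl+C into an exit status.

    'main' stands in for the tool's entry point; both names are
    hypothetical and do not come from ciproject.py.
    """
    try:
        main()
    except KeyboardInterrupt:
        # Ctrl+C raises KeyboardInterrupt in the main thread
        sys.stderr.write("Interrupted by user, exiting\n")
        return 130  # common shell convention for an interrupted program
    return 0
```

The main program would then end with something like sys.exit(run_safely(main)), so that an interrupted run leaves a short message rather than a traceback.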

If you’re accessing ABI from behind a proxy or firewall (typical of enterprise networks), then the proxy setting should be modified to hold the appropriate value for the proxy URL. Here’s the snippet of code in citool.py that sets the proxy URL along with the port number:

    if self.proxyset == "ON":
        # proxy settings for urllib2
        proxy = urllib2.ProxyHandler({'https': 'proxy-url-goes-here:port_number'})
        opener = urllib2.build_opener(proxy)
        urllib2.install_opener(opener)
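The same switch can be sketched as a small helper that returns an opener instead of installing one globally. This is an illustration under assumptions: make_opener and proxy_url are hypothetical names, the host and port are placeholders, and the import fallback is only there so the sketch also runs on Python 3, where the urllib2 classes live in urllib.request.

```python
try:
    import urllib2 as urlrequest         # Python 2.7, as used in the article
except ImportError:
    import urllib.request as urlrequest  # Python 3 home of the same classes


def make_opener(proxyset, proxy_url=None):
    """Build a URL opener, routing HTTPS through proxy_url when the proxy is on.

    make_opener and proxy_url are hypothetical names, not part of citool.py.
    """
    if proxyset.upper() == "ON" and proxy_url:
        proxy = urlrequest.ProxyHandler({"https": proxy_url})
        return urlrequest.build_opener(proxy)
    # Default opener; it still honours any proxy environment variables
    return urlrequest.build_opener()


opener = make_opener("on", "proxy.example.com:3128")  # placeholder host and port
```

Comparing with proxyset.upper() means that both "on" from the command line and "ON" inside the class are treated the same way.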

Jenkins’ REST API

The Jenkins wiki describes three flavours of remote access API: XML, JSON (with JSONP support) and Python. These APIs are “offered in a REST-like style. That is, there is no single entry point for all features, and instead they are available under the ‘.../api/’ URL where the ‘...’ portion is the data that it acts on”. For a full explanation of the APIs, read http://bit.ly/JenkinsRemoteAccessAPIs. In the following sections, we’ll be using the Python REST API to query ABI.

However, the same can be achieved using the other APIs. For demonstration purposes, a code snippet that uses the JSON API is available here: http://bit.ly/CITool.
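To give a flavour of the JSON route, here is a minimal sketch that parses a response of the kind served under api/json. It is not the snippet from the link above: the job_names helper is a hypothetical name and the sample payload is a hand-written illustration, not data fetched live from ABI.

```python
import json

# Hand-written sample shaped like the top level of a Jenkins JSON response;
# the entries are illustrative and were not fetched from ABI.
SAMPLE_JSON = """
{
  "jobs": [
    {"name": "Abdera-trunk", "url": "https://builds.apache.org/job/Abdera-trunk/"},
    {"name": "Accumulo-1.6", "url": "https://builds.apache.org/job/Accumulo-1.6/"}
  ]
}
"""


def job_names(payload):
    """Return the job names found in a JSON-encoded Jenkins response."""
    return [job["name"] for job in json.loads(payload)["jobs"]]


print(job_names(SAMPLE_JSON))
```

The appeal of JSON is that json.loads only ever builds data structures, so the response is never executed as code.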

Before we start coding the main functionality, let’s spend some time building the help options. As already mentioned, we need to be able to turn the proxy on or off, set the verbosity level, and invoke the options that list all projects, search for a project (or projects) or show the details of a specific project. Using the argparse module, these are defined as shown in the snippet below:

    import argparse

    if __name__ == "__main__":
        parser = argparse.ArgumentParser(description="Help to track the CI status of projects hosted at apache.org")
        parser.add_argument("-v", "--verbosity", type=int, default=0, choices=[0, 1, 2], help="print debugging output")
        parser.add_argument("-p", "--proxy", default="off", choices=["on", "off"], help="Jenkins access outside the corporate network")
        parser.add_argument("-d", "--dump", metavar="all", action="store", help="List all Apache project")
        parser.add_argument("-q", "--query", metavar="project-name", action="store", help="List projects that match this project name")
        parser.add_argument("-s", "--show", metavar="project-name", action="store", help="List build status for the specified project")
        parsed_args = parser.parse_args()

Now, let’s have a quick look at the output of the help menu. The listing below shows the various arguments and the list of valid options they accept:

    $ python ciproject.py -h
    usage: ciproject.py [-h] [-v {0,1,2}] [-p {on,off}] [-d all] [-q project-name] [-s project-name]

    Help to track the CI status of projects hosted at apache.org

    optional arguments:
      -h, --help            show this help message and exit
      -v {0,1,2}, --verbosity {0,1,2}
                            print debugging output
      -p {on,off}, --proxy {on,off}
                            Jenkins access outside the corporate network
      -d all, --dump all    List all Apache project
      -q project-name, --query project-name
                            List projects that match this project name.
      -s project-name, --show project-name
                            List build status for the specified project.
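To see what the parser hands back to the program, the same options can be wrapped in a function and fed a sample command line. This is a sketch rather than the code in ciproject.py: build_parser is a hypothetical name, and the list passed to parse_args stands in for what would normally come from the shell.

```python
import argparse


def build_parser():
    """Return a parser with the article's options (build_parser is a hypothetical name)."""
    parser = argparse.ArgumentParser(
        description="Help to track the CI status of projects hosted at apache.org")
    parser.add_argument("-v", "--verbosity", type=int, default=0, choices=[0, 1, 2],
                        help="print debugging output")
    parser.add_argument("-p", "--proxy", default="off", choices=["on", "off"],
                        help="Jenkins access outside the corporate network")
    parser.add_argument("-d", "--dump", metavar="all",
                        help="List all Apache project")
    parser.add_argument("-q", "--query", metavar="project-name",
                        help="List projects that match this project name")
    parser.add_argument("-s", "--show", metavar="project-name",
                        help="List build status for the specified project")
    return parser


# Stands in for running: python ciproject.py -p on -q hbase
args = build_parser().parse_args(["-p", "on", "-q", "hbase"])
print(args.proxy, args.query, args.verbosity, args.dump)
```

Options that were not given on the command line fall back to their defaults, so args.verbosity is 0 and args.dump is None here. That is what lets the main program decide which of the dump, query or show actions was requested.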

Listing all projects

A word of caution: the list of projects hosted at ABI is huge, so expect a lot of scrolling output when this option is invoked on the command line. To implement it, we need access to all the projects at ABI, so we’ll harvest the data that’s made available via Jenkins’ Python REST API, using the root (top level) of the remote access API, ie https://builds.apache.org/api/python?pretty=true. Sticking to our original intention of separating the core logic from the user-facing program, the code below lives in citool.py:

    class Citool(object):
        # Base class that implements query of jenkins
        def __init__(self, proxyset, verbosity):
            """
            URL parts that shall be used by the various methods
            If proxy is set, then change proxy URL to match your corporate's setting
            Works with Python v2.7.x, not tried in v3.x
            """
            self.pyapi = 'api/python?pretty=true'
            self.buildurl = 'https://builds.apache.org/'
            self.proxyset = proxyset
            self.verbosity = verbosity
        .....
        def query(self, *thisProject):
            """
            If 'thisProject' is empty, list all project names setup in Tcloud Jenkins
            If 'thisProject' is invalid, quit with message.
            If 'thisProject' is given, return project's tcloud jenkins URL.
            """
            logging.debug("Python API for CI tool: %s" % (self.buildurl + self.pyapi))
            allProjects = eval(urllib2.urlopen(self.buildurl + self.pyapi).read())

Skipping over the obvious variable definitions for ABI’s build URL and its REST API, let’s jump to the query function. Here the URL is constructed and passed to urllib2, and the response is evaluated as a Python expression using eval. The result is stored in the Python object allProjects, which becomes the de facto object from which we extract the data needed to meet our requirements.
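Because eval will execute whatever expression the server sends back, a more cautious variant is worth sketching. The standard library’s ast.literal_eval accepts only Python literals (dictionaries, lists, strings, numbers, True, False and None) and rejects anything else. The sample response below is hand-written for illustration, and parse_python_api is a hypothetical name rather than a function from citool.py.

```python
import ast

# Hand-written sample in the style of the Python API output; not fetched from ABI.
SAMPLE_RESPONSE = """{
  "description" : None,
  "jobs" : [
    {"name" : "Abdera-trunk", "url" : "https://builds.apache.org/job/Abdera-trunk/"},
    {"name" : "Accumulo-1.6", "url" : "https://builds.apache.org/job/Accumulo-1.6/"}
  ],
  "useSecurity" : True
}"""


def parse_python_api(text):
    """Turn Python-API text into a dictionary without executing any code."""
    return ast.literal_eval(text)


all_projects = parse_python_api(SAMPLE_RESPONSE)
print(all_projects["jobs"][0]["name"])
```

With this approach a response containing a function call, such as __import__('os'), raises ValueError instead of being run.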

The de facto object, allProjects, is a nested dictionary. Using its jobs key, we can list all the projects at ABI:

    if len(thisProject) == 0:
        logging.info("Dumping the names of all projects hosted at builds.apache.org")
        for project in allProjects['jobs']:
            print project.get('name')

Now, we’ll execute the tool with a bunch of arguments and valid values:

    $ python ciproject.py -v 0 -p on -d all

Verbosity (-v) is set to level 0, proxy (-p) is set to on and the dump argument (-d) is given the value all. Interestingly, the output includes some useful debug info highlighting which line within the user-defined function is being executed. Below is a sample of the output listing all the projects. It’s curtailed to show only a few projects, as the entire list is very long:

    {{ 08/22/2016 02:51:46 PM == INFO ==Module:citool Function:query Line:59 }} Dumping the names of all projects hosted at builds.apache.org
    Abdera-trunk
    Accumulo-1.6
    Accumulo-1.7
    Accumulo-1.8
    Accumulo-Master
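Debug lines of this shape can be produced with the logging module’s format strings. The sketch below is a guess at a configuration that reproduces the sample output, not the one in citool.py. The mapping from verbosity to log level is also an assumption: level 0 is mapped to INFO because INFO lines appear at -v 0, and anything higher is mapped to DEBUG. The names configure_logging and level_for are hypothetical.

```python
import logging

# %(module)s, %(funcName)s and %(lineno)d are standard LogRecord attributes.
LOG_FORMAT = ("{{ %(asctime)s == %(levelname)s =="
              "Module:%(module)s Function:%(funcName)s Line:%(lineno)d }} %(message)s")
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def level_for(verbosity):
    """Assumed mapping: 0 shows INFO and above, 1 or 2 also show DEBUG."""
    return logging.INFO if verbosity == 0 else logging.DEBUG


def configure_logging(verbosity):
    """Set up the root logger once, early in the program."""
    logging.basicConfig(level=level_for(verbosity),
                        format=LOG_FORMAT, datefmt=DATE_FORMAT)


configure_logging(0)
logging.info("Dumping the names of all projects hosted at builds.apache.org")
```

Since the format uses %-style placeholders, the doubled braces are printed literally, which matches the {{ ... }} wrapper seen in the output.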

Search for project

It’s obvious from the previous section that the project listing spans many lines of output, and looking for useful information in it can be quite painful. It would be helpful to be able to search for projects based on user input. As we know, all the projects are accessible via allProjects, the de facto Python object. Iterating over the list in allProjects['jobs'], we’ll try to match each project name against the input string. If a match is found, the complete name of the matched project is stored in a Python list. Here’s the code snippet for achieving this:

    logging.info("Collecting names of all projects...")
    for i in allProjects['jobs']:
        projects.append(i['name'])
    logging.info("Checking %s in project list..." % (self.projectString))
    lookupStr = re.compile(self.projectString, re.IGNORECASE)
    for i in projects:
        lookupResult = re.findall(lookupStr, i)
        logging.debug("Lookup results: %s" % (lookupResult))
        if len(lookupResult) != 0:
            matched.append(i)
    for prj in matched:
        print("{0} project matched with query string".format(prj))

Now, let’s execute the tool with the option to query for a specific project. To query ‘hbase’, we use Python’s regular expression module to fetch the projects whose names contain this string (the match is not case sensitive). The results are collected and displayed on the standard output.

    $ python ciproject.py -p on -q hbase

The value for proxy is set to on and the verbose option is skipped entirely this time. Here’s the output:

    {{ 08/23/2016 09:28:28 AM == INFO ==Module:citool Function:showProjects Line:185 }} Collecting names of all projects...
    {{ 08/23/2016 09:28:28 AM == INFO ==Module:citool Function:showProjects Line:189 }} Checking hbase in project list...
    Flume-1.6-HBase-98 project matched with query string
    Flume-trunk-hbase-1 project matched with query string
    HBase Website Link Ckecker project matched with query string
    HBase-0.94 project matched with query string
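The matching step can be tried in isolation, without contacting ABI. The sketch below condenses it into one function; match_projects is a hypothetical name and the list of names is a small sample taken from the outputs shown in this article.

```python
import re


def match_projects(names, pattern):
    """Return the names that contain 'pattern', ignoring case."""
    lookup = re.compile(pattern, re.IGNORECASE)
    return [name for name in names if lookup.search(name)]


# Sample names taken from the outputs shown in the article
names = ["Abdera-trunk", "Flume-1.6-HBase-98", "Flume-trunk-hbase-1", "HBase-0.94"]
for prj in match_projects(names, "hbase"):
    print("{0} project matched with query string".format(prj))
```

Note that the user’s input is treated as a regular expression, so a query such as 0.9 would also match 0-9 or 0x9. Wrapping the input in re.escape would make the search literal, if that is preferred.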

We’ve finally arrived at the last use-case we listed, ie displaying basic information about a specific project. First, the user-provided project name is validated by comparing it against the name of each project, as shown in this code snippet:

    logging.info("Checking %s in project list..." % (thisProject[0]))
    for i in allProjects['jobs']:
        if thisProject[0] == i['name']:
            logging.info("Matched {0} with {1}".format(thisProject[0], i['name']))
            logging.info("Project URL to access more info is {}".format(i['url']))
            return i['url']

If the user’s input is found to be valid, the program goes on to retrieve and display information about the project, ie the last completed build and the status of the last ten builds.

    self.projectName = projectName
    projectUrl = self.query(self.projectName)
    newBuildurl = projectUrl + "/" + self.pyapi
    projectInfo = eval(urllib2.urlopen(newBuildurl).read())
    self.showLatestBuild(projectInfo)
    self.showLastTen(projectInfo)

Information on the last completed build includes the event that triggered the build and its completion status. For the sake of brevity, we’ll skip discussing some of the functions, eg fetching the build time (converting Unix time to a human-readable format) and determining the build cause (whether the build was triggered by a timer or by the latest commit in one or more of the repositories relevant to the project):

    buildStartedAt, buildEndedAt = self.getBuildTime(lastBuildInfo)
    startedBy = self.getBuildCause(lastBuildInfo)
    if lastBuildInfo['building'] == False and lastBuildInfo['result'] == "SUCCESS":
        print("Build was started by {0} at {1} and completed in {2}".format(startedBy, buildStartedAt, buildEndedAt))
        print("And the build passed without any errors")
    if lastBuildInfo['building'] == False and lastBuildInfo['result'] == "FAILURE":
        print("Build was started by {0} at {1} and completed in {2}".format(startedBy, buildStartedAt, buildEndedAt))
        print("And the build failed with errors")
    if lastBuildInfo['building'] == False and lastBuildInfo['result'] == "ABORTED":
        print("Build was started by {0} at {1} and completed in {2}".format(startedBy, buildStartedAt, buildEndedAt))
        print("And the build was aborted")
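For readers curious about the two skipped helpers, here is one way they might look. This is a sketch under assumptions, not the code from citool.py. It relies on Jenkins reporting a build’s timestamp and duration in milliseconds and listing trigger descriptions under actions and causes, it formats times in UTC so that the result doesn’t depend on the machine’s time zone, and the sample build dictionary is hand-written.

```python
import time


def get_build_time(build_info):
    """Return (start time, duration) as human-readable strings.

    Jenkins gives 'timestamp' and 'duration' in milliseconds.
    Durations of 24 hours or more would wrap around with this format.
    """
    started = time.strftime("%a %b %d %H:%M:%S",
                            time.gmtime(build_info["timestamp"] / 1000.0))
    duration = time.strftime("%H:%M:%S",
                             time.gmtime(build_info["duration"] / 1000.0))
    return started, duration


def get_build_cause(build_info):
    """Return what triggered the build, eg 'an SCM change' or 'timer'."""
    prefix = "Started by "
    for action in build_info.get("actions", []):
        if not action:
            continue  # Jenkins often includes empty action entries
        for cause in action.get("causes", []):
            text = cause.get("shortDescription", "")
            if text:
                return text[len(prefix):] if text.startswith(prefix) else text
    return "an unknown cause"


# Hand-written sample shaped like a Jenkins build record
sample_build = {
    "timestamp": 1452665411000,
    "duration": 3887000,
    "actions": [{}, {"causes": [{"shortDescription": "Started by an SCM change"}]}],
}
print(get_build_time(sample_build))
print(get_build_cause(sample_build))
```

Stripping the "Started by " prefix is what lets the result drop straight into a sentence such as "Build was started by {0} at {1}".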

To obtain the status of the last ten builds, we’ll make use of the project builds’ remote access API; for HBase-0.94 this is https://builds.apache.org/job/HBase-0.94/api/python?pretty=true. With this kind of project-specific API it’s possible to automate various things, eg you can announce build availability or inform project members when a critical event occurs that may affect a major public release:

    allBuilds = projectInfo['builds']
    for b in allBuilds:
        if counter <= 10:
            thisBuildInfo = eval(urllib2.urlopen(b['url'] + self.pyapi).read())
            buildUrls.append(b['url'])
            if thisBuildInfo['building'] == True:
                buildResult.append("Build in progress")
            else:
                buildResult.append(thisBuildInfo['result'])
            counter += 1
    for job, status in zip(buildUrls, buildResult):
        print("Status of build job, {0} is, {1}".format(job, status))
        buildStats.update({job : status})

With the coding done, we’ll now query the details for HBase-0.94:

    $ python ciproject.py -p on -s HBase-0.94

First, there will be confirmation that the project exists (if the project name input by the user is valid) and then the project’s build info will be printed on the standard output:

    {{ 08/23/2016 02:39:09 PM == INFO ==Module:citool Function:query Line:64 }} Checking HBase-0.94 in project list...
    {{ 08/23/2016 02:39:09 PM == INFO ==Module:citool Function:query Line:67 }} Matched HBase-0.94 with HBase-0.94
    {{ 08/23/2016 02:39:09 PM == INFO ==Module:citool Function:query Line:68 }} Project URL to access more info is https://builds.apache.org/job/HBase-0.94/
    {{ 08/23/2016 02:39:10 PM == INFO ==Module:citool Function:showLatestBuild Line:121 }} Last completed build of HBase-0.94 is https://builds.apache.org/job/HBase-0.94/1483/
    Build was started by an SCM change at Wed Jan 13 06:10:11 and completed in 01:04:47
    And the build failed with errors
    Status of build job, https://builds.apache.org/job/HBase-0.94/1483/ is, FAILURE
    Status of build job, https://builds.apache.org/job/HBase-0.94/1482/ is, FAILURE
    Status of build job, https://builds.apache.org/job/HBase-0.94/1481/ is, FAILURE
    Status of build job, https://builds.apache.org/job/HBase-0.94/1480/ is, FAILURE
    Status of build job, https://builds.apache.org/job/HBase-0.94/1479/ is, SUCCESS
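Once the build statuses are collected in a dictionary, a quick health indicator is only a few lines away. The sketch below tallies the results with collections.Counter. The summarise function is a hypothetical name and the data is copied from the sample output above rather than fetched live.

```python
from collections import Counter

# Build results copied from the sample output in the article
build_stats = {
    "https://builds.apache.org/job/HBase-0.94/1483/": "FAILURE",
    "https://builds.apache.org/job/HBase-0.94/1482/": "FAILURE",
    "https://builds.apache.org/job/HBase-0.94/1481/": "FAILURE",
    "https://builds.apache.org/job/HBase-0.94/1480/": "FAILURE",
    "https://builds.apache.org/job/HBase-0.94/1479/": "SUCCESS",
}


def summarise(stats):
    """Return (tally of results, share of successful builds)."""
    tally = Counter(stats.values())
    total = sum(tally.values())
    ratio = float(tally["SUCCESS"]) / total if total else 0.0
    return tally, ratio


tally, ratio = summarise(build_stats)
for result, count in tally.most_common():
    print("{0}: {1}".format(result, count))
print("Share of successful builds: {0:.0%}".format(ratio))
```

A Counter returns zero for results it has never seen, so the same function works for projects with no successful builds at all.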

With this basic understanding of how to use Jenkins’ Python REST API, we’d encourage you to experiment with your own ideas to become familiar with what you can do. You never know, you might come up with some innovative ideas and be able to share them with the rest of the community.

Avid users of Jenkins may be aware of existing projects that exploit the REST API to provide capabilities such as automation (remote control) of common Jenkins tasks related to jobs and the retrieval of the latest results. A few such projects are listed below along with links to documentation. Note: Compared to the topics we’ve covered, many of these features would be considered advanced.

Project Python-Jenkins: https://pypi.python.org/pypi/python-jenkins/0.4.13, Repo: https://git.openstack.org/cgit/openstack/python-jenkins.

Project jenkinsapi: This is forked from another GitHub project, https://github.com/ramonvanalteren/jenkinsapi, Repo: https://github.com/salimfadhley/jenkinsapi, Doc: http://pypi.python.org/pypi/jenkinsapi.

Project AutoJenkins, Repo: https://github.com/txels/autojenkins, Doc: http://autojenkins.readthedocs.io/en/latest.

If you want to go further with the ciproject.py tool, it can be extended with matplotlib, using the data collected from ABI to plot simple graphs that visualise the health of the continuous integration pipeline and a project’s trend as it evolves and grows. To reduce the flakiness that can arise from depending on external data sources, and thus increase robustness, a unit-testing framework can be built using unittest (part of the standard library). The tool can also be integrated with third-party APIs (eg Twilio) to send out text messages when a critical event occurs. LXF
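As a starting point for the unit-testing idea, the sketch below tests a project lookup against canned data instead of a live server, which is what removes the dependency on an external data source. It is an illustration under assumptions: find_project_url is a hypothetical stand-alone version of the lookup logic, and the canned dictionary is hand-written in the shape of the top-level API response.

```python
import unittest

# Canned data in the shape of the top-level API response; no network needed
CANNED_PROJECTS = {
    "jobs": [
        {"name": "HBase-0.94", "url": "https://builds.apache.org/job/HBase-0.94/"},
        {"name": "Abdera-trunk", "url": "https://builds.apache.org/job/Abdera-trunk/"},
    ]
}


def find_project_url(all_projects, name):
    """Return the URL of the project called 'name', or None if it isn't listed."""
    for job in all_projects["jobs"]:
        if job["name"] == name:
            return job["url"]
    return None


class FindProjectUrlTest(unittest.TestCase):
    def test_known_project(self):
        self.assertEqual(find_project_url(CANNED_PROJECTS, "HBase-0.94"),
                         "https://builds.apache.org/job/HBase-0.94/")

    def test_unknown_project(self):
        self.assertIsNone(find_project_url(CANNED_PROJECTS, "no-such-project"))
```

Saved in a file of its own, the tests can be run with python -m unittest followed by the module name.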

The official website, https://jenkins.io, has plenty of documentation.
The nested data structure from Jenkins’ Python REST API requires a certain level of patience if you want to extract data.
Ramanathan Muthaiah began his career, in the mid-90s, flirting with legacy Unix systems. After assembling a PC on a shoestring budget running Slackware on a 486 processor and getting very excited, the Unix fever has never gone away.