Jenkins for big data
Ramanathan Muthaiah explores the basics of accessing Jenkins via Python, which opens up a whole new world of opportunities.
Continuous integration (CI), to quote Martin Fowler (http://bit.ly/CIMartinFowler), is a “... software development practice where members of a team integrate their work frequently ... leading to multiple integrations per day. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible.” Explaining why this practice helps, he adds, “Many teams find that this approach leads to significantly reduced integration problems and allows a team to develop cohesive software more rapidly.”
Various CI tools are available in the market—both open source (eg Jenkins/Hudson and Travis) and commercial (eg TeamCity and Bamboo). Each tool has several pros and cons along with its ecosystem of plugins for monitoring; integration with various tools for source code management (eg Git); bug tracking (eg Jira) and code review (eg Gerrit). (For an extensive list of supported features, licence types and other details, see http://bit.ly/CIComparison).
In this article, we’ll focus on developing a Python-based tool from the ground up for monitoring various parameters of a CI pipeline. For the sake of brevity, we’ll call this tool ciproject.py, and to build it we’ll consider projects hosted at https://builds.apache.org.
Note: Prior programming experience in Python would be handy. The code examples in this tutorial use Python v2.7.3. To cover the use-cases outlined below, we’ve used a handful of modules from the Python standard library, such as urllib2, logging, sys, collections, json and time.
Various projects hosted at Apache Build Infrastructure (ABI) have their CI infrastructure driven by Jenkins 2.7.2 (as of September 2016). To start with, let’s list some basic use-cases for this tool:

Dump: For listing all the projects hosted in ABI.
Query: For listing those projects in ABI that match a specific string.
Show: For displaying basic information on a specific project in ABI, as requested by the user. The information may include the status of the last build, what event triggered the last build, at what time the build was triggered and the status of the last ten builds (a basic indicator of the project’s health).

In addition, as in any command-line tool, it would be nice to have a few more options:

Option: For turning the proxy on and off.
Verbose: For setting the desired level of debug output.
In this article, we’ll focus on exploring Jenkins’ Python REST API (remote access API). We’ll also assume that access to ABI works both from web browsers and programmatically (via scripts). Handling of user input and passing of command-line values to the relevant (user-defined) functions are managed in ciproject.py, the user-facing program that we’ll invoke from the command line.
Interactions with ABI, processing and refining the data retrieved from ABI, proxy handling and outputting debug messages are managed in citool.py, which holds the necessary class definitions. We’ve done it this way to isolate the data-handling logic from the main program, ciproject.py, and to keep maintenance of that main program to a minimum.
In ciproject.py (http://bit.ly/CIProject), user inputs are managed using the argparse module. In the Python class file, citool.py (http://bit.ly/CITool), which has the complete set of class definitions, the following modules are used: urllib2 (for opening or accessing URLs), re (for pattern matching), sys (for a graceful exit), collections (for a custom data structure), logging (to trace program flow that may induce warnings or errors, which is essential for debugging the program) and time (to convert Unix timestamps to a human-readable format).
Of course, many of these modules have not been used to their fullest potential, eg the sys module could also be used to handle abnormal interruptions received by the program during its execution (such as the Ctrl+C key combination).
If the reader is accessing ABI from behind a proxy or firewall (typical of enterprise networks), then the proxy setting should be modified to hold the appropriate value for the proxy URL. Here’s the snippet of code, in citool.py, that sets the proxy URL along with the port number:

    if self.proxyset == "ON":
        # proxy settings for urllib2
        proxy = urllib2.ProxyHandler({'https': 'proxy-url-goeshere:port_number'})
        opener = urllib2.build_opener(proxy)
        urllib2.install_opener(opener)
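For readers on Python 3, where urllib2 was folded into urllib.request, a comparable proxy setup might look like the sketch below. The helper name build_opener_for and the placeholder proxy address are hypothetical and not part of citool.py. The sketch returns the opener instead of installing it globally, which keeps it easy to test.

```python
import urllib.request


def build_opener_for(proxyset, proxy_url="proxy.example.com:8080"):
    """Return a urllib opener, routed through an HTTPS proxy when proxyset is "on".

    Hypothetical helper: the name and the placeholder proxy address are
    assumptions for illustration, not taken from citool.py.
    """
    if proxyset.lower() == "on":
        handler = urllib.request.ProxyHandler({"https": proxy_url})
    else:
        # An empty mapping tells urllib not to auto-detect proxies
        # from the environment.
        handler = urllib.request.ProxyHandler({})
    return urllib.request.build_opener(handler)


opener = build_opener_for("on")
# Calling urllib.request.install_opener(opener) would make this opener
# the default for every later urllib.request.urlopen() call.
```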
Jenkins’ REST API
The Jenkins wiki describes three flavours of remote access API: XML, JSON (with JSONP support) and Python. These APIs are “offered in a REST-like style. That is, there is no single entry point for all features, and instead they are available under the ‘.../api/’ URL where the ‘...’ portion is the data that it acts on”. For a full explanation of the APIs, read http://bit.ly/JenkinsRemoteAccessAPIs. In the following sections, we’ll be using the Python REST API to query ABI.
However, the same can be achieved using the other APIs. For demonstration purposes, a code snippet that uses the JSON API is available here: http://bit.ly/CITool.
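As a rough illustration of the JSON flavour, the sketch below parses a response with the standard json module. To stay runnable offline, it works on a small hand-written sample shaped like a top-level api/json response (a jobs list whose entries carry name and url). The sample values and the helper name job_names are assumptions for illustration, not code from citool.py.

```python
import json

# Hand-written sample, shaped like a top-level Jenkins api/json response.
SAMPLE_JSON = """
{
  "jobs": [
    {"name": "Abdera-trunk", "url": "https://builds.apache.org/job/Abdera-trunk/"},
    {"name": "HBase-0.94", "url": "https://builds.apache.org/job/HBase-0.94/"}
  ]
}
"""


def job_names(json_text):
    """Return the job names found in a Jenkins api/json payload."""
    data = json.loads(json_text)
    return [job["name"] for job in data.get("jobs", [])]


print(job_names(SAMPLE_JSON))
```

Against a live server, the text passed to job_names would be the body of a request to the server’s api/json URL rather than a canned string.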
Before we start coding the main functionality, let’s spend some time building the help options. As already mentioned, provision is needed to: turn the proxy on or off; set the verbosity level; and invoke the respective options to list all projects, search for a project (or projects) or show details of a specific project. Using the argparse module, these are defined as shown in the snippet below:

    import argparse

    if __name__ == "__main__":
        parser = argparse.ArgumentParser(description="Help to track the CI status of projects hosted at apache.org")
        parser.add_argument("-v", "--verbosity", type=int, default=0, choices=[0, 1, 2], help="print debugging output")
        parser.add_argument("-p", "--proxy", default="off", choices=["on", "off"], help="Jenkins access outside the corporate network")
        parser.add_argument("-d", "--dump", metavar="all", action="store", help="List all Apache project")
        parser.add_argument("-q", "--query", metavar="projectname", action="store", help="List projects that match this project name")
        parser.add_argument("-s", "--show", metavar="projectname", action="store", help="List build status for the specified project")
        parsed_args = parser.parse_args()
Now, let’s have a quick look at the output of the help menu. The listing below shows the various arguments and the list of valid options they accept:

    $ python ciproject.py -h
    usage: ciproject.py [-h] [-v {0,1,2}] [-p {on,off}] [-d all]
                        [-q project-name] [-s project-name]

    Help to track the CI status of projects hosted at apache.org

    optional arguments:
      -h, --help            show this help message and exit
      -v {0,1,2}, --verbosity {0,1,2}
                            print debugging output
      -p {on,off}, --proxy {on,off}
                            Jenkins access outside the corporate network
      -d all, --dump all    List all Apache project
      -q project-name, --query project-name
                            List projects that match this project name.
      -s project-name, --show project-name
                            List build status for the specified project.
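The argument definitions can be tried out without touching the network by handing argparse an explicit list instead of the real command line. The sketch below wraps the same options in a function; the function name build_parser is an assumption for illustration and does not appear in ciproject.py.

```python
import argparse


def build_parser():
    """Build the command-line parser (build_parser is a hypothetical name)."""
    parser = argparse.ArgumentParser(
        description="Help to track the CI status of projects hosted at apache.org")
    parser.add_argument("-v", "--verbosity", type=int, default=0, choices=[0, 1, 2],
                        help="print debugging output")
    parser.add_argument("-p", "--proxy", default="off", choices=["on", "off"],
                        help="Jenkins access outside the corporate network")
    parser.add_argument("-d", "--dump", metavar="all",
                        help="List all Apache project")
    parser.add_argument("-q", "--query", metavar="projectname",
                        help="List projects that match this project name")
    parser.add_argument("-s", "--show", metavar="projectname",
                        help="List build status for the specified project")
    return parser


# Equivalent to running: python ciproject.py -p on -q hbase
args = build_parser().parse_args(["-p", "on", "-q", "hbase"])
print(args.proxy, args.query, args.verbosity)
```

Options that are not supplied fall back to their defaults, so args.verbosity is 0 here and args.dump and args.show are None.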
Listing all projects
A word of caution: the list of projects hosted at ABI is huge, so expect a lot of scrolling output when this option is invoked on the command line. To achieve this functionality, we need access to all the projects at ABI. For this, we shall harvest the data that’s made available via Jenkins’ Python REST API. We’ll use the root or top level of the remote access API, ie https://builds.apache.org/api/python?pretty=true. Sticking to our original intention of separating the core logic from the user-facing program, the code below is available in citool.py:

    class Citool(object):
        # Base class that implements query of jenkins
        def __init__(self, proxyset, verbosity):
            """
            URL parts that shall be used by the various methods
            If proxy is set, then change proxy URL to match your corporate's setting
            Works with Python v2.7.x, not tried in v3.x
            """
            self.pyapi = 'api/python?pretty=true'
            self.buildurl = 'https://builds.apache.org/'
            self.proxyset = proxyset
            self.verbosity = verbosity
        .....
        def query(self, *thisProject):
            """
            If 'thisProject' is empty, list all project names setup in Tcloud Jenkins
            If 'thisProject' is invalid, quit with message.
            If 'thisProject' is given, return project's tcloud jenkins URL.
            """
            logging.debug("Python API for CI tool: %s" % (self.buildurl + self.pyapi))
            allProjects = eval(urllib2.urlopen(self.buildurl + self.pyapi).read())

Skipping certain obvious variable definitions for ABI’s build URL and its REST API, let’s jump to the query function. Here the URL is constructed and passed on to urllib2, and the text that comes back is treated as a Python expression using eval. The output is stored in the Python object allProjects, which becomes the de-facto object from which we extract the data needed to meet our requirements.
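Because eval executes whatever text the server returns, a more cautious variant is to parse the response with ast.literal_eval, which accepts only Python literals (dictionaries, lists, strings, numbers, True, False and None). The sketch below is an alternative to the approach described above, not the code in citool.py. It runs on a hand-written sample shaped like the api/python response, and the helper name parse_python_api is hypothetical.

```python
import ast

# Hand-written sample, shaped like a top-level Jenkins api/python response.
SAMPLE_PYTHON_API = """{
  "jobs": [
    {"name": "Abdera-trunk", "url": "https://builds.apache.org/job/Abdera-trunk/"},
    {"name": "Accumulo-1.6", "url": "https://builds.apache.org/job/Accumulo-1.6/"},
  ],
  "useSecurity": True,
  "description": None,
}"""


def parse_python_api(text):
    """Parse a Python-literal payload without executing arbitrary code."""
    parsed = ast.literal_eval(text)
    if not isinstance(parsed, dict):
        raise ValueError("expected a dictionary at the top level")
    return parsed


all_projects = parse_python_api(SAMPLE_PYTHON_API)
print(len(all_projects["jobs"]))
```

Anything that is not a plain literal, such as a function call smuggled into the response, makes ast.literal_eval raise an exception instead of running it.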
The de-facto object, allProjects, is a dictionary with several keys. Using one of them, jobs, we shall list all the projects at ABI:

    if len(thisProject) == 0:
        logging.info("Dumping the names of all projects hosted at builds.apache.org")
        for project in allProjects['jobs']:
            print project.get('name')

Now, we’ll execute the tool with a bunch of arguments and valid values:

    $ python ciproject.py -v 0 -p on -d all

Verbosity (-v) is set to level 0, proxy (-p) is set to on and the dump argument (-d), which lists the projects, is given the value all. Interestingly, the output includes some useful debug info highlighting which line within the user-defined function is being executed. Below is a sample of the output listing all the projects. It’s curtailed to show only a few projects as the entire list is very long:

    {{ 08/22/2016 02:51:46 PM == INFO ==Module:citool Function:query Line:59 }} Dumping the names of all projects hosted at builds.apache.org
    Abdera-trunk
    Accumulo-1.6
    Accumulo-1.7
    Accumulo-1.8
    Accumulo-Master
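The debug prefix seen in this output (timestamp, level, module, function and line number) can be produced with the standard logging module. The format strings below are a guess reconstructed from the sample output and may differ from what citool.py actually uses. The names LOG_FORMAT, DATE_FORMAT, make_formatter and configure_logging are hypothetical.

```python
import logging

# Assumed format strings, reconstructed from the sample output.
LOG_FORMAT = ("{{ %(asctime)s == %(levelname)s =="
              "Module:%(module)s Function:%(funcName)s Line:%(lineno)d }} %(message)s")
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def make_formatter():
    """Return a formatter that mimics the prefix shown in the sample output."""
    return logging.Formatter(fmt=LOG_FORMAT, datefmt=DATE_FORMAT)


def configure_logging(level=logging.INFO):
    """Attach a stream handler with the formatter to the root logger."""
    handler = logging.StreamHandler()
    handler.setFormatter(make_formatter())
    root = logging.getLogger()
    root.setLevel(level)
    root.addHandler(handler)
```

After calling configure_logging(), any logging.info(...) call is printed with the module, function and line number it was issued from.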
Search for project
It’s pretty obvious from the previous section that the project listing spans many lines of output, and looking for useful information in it can be quite painful. Under such circumstances, it would be helpful to be able to search for projects based on user input. As we know, all the projects are accessible via allProjects, the de-facto Python object. Iterating over one of the object’s entries, ie allProjects['jobs'], we shall try to match each project name with the input string. If a match is found, then the complete name of the matched project is recorded and stored in a Python list. Here’s the code snippet for achieving this:

    logging.info("Collecting names of all projects...")
    for i in allProjects['jobs']:
        projects.append(i['name'])
    logging.info("Checking %s in project list..." % (self.projectString))
    lookupStr = re.compile(self.projectString, re.IGNORECASE)
    for i in projects:
        lookupResult = re.findall(lookupStr, i)
        logging.debug("Lookup results: %s" % (lookupResult))
        if len(lookupResult) != 0:
            matched.append(i)
    for prj in matched:
        print("{0} project matched with query string".format(prj))

Now, let’s execute the tool with the valid option to query for a specific project. To query ‘hbase’, we’ll use Python’s regular expression module to fetch the matching projects that contain this string (not case sensitive). The results are collected and displayed on the standard output:

    $ python ciproject.py -p on -q hbase

The value for proxy is set to on and the verbose option is skipped entirely this time. Here’s the output:

    {{ 08/23/2016 09:28:28 AM == INFO ==Module:citool Function:showProjects Line:185 }} Collecting names of all projects...
    {{ 08/23/2016 09:28:28 AM == INFO ==Module:citool Function:showProjects Line:189 }} Checking hbase in project list...
    Flume-1.6-HBase-98 project matched with query string
    Flume-trunk-hbase-1 project matched with query string
    HBase Website Link Ckecker project matched with query string
    HBase-0.94 project matched with query string
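The matching step can be tried in isolation on a plain list of names. The sketch below is a condensed variant rather than the code from citool.py: the function name find_projects is hypothetical, and it escapes the query with re.escape so that characters such as the dot are matched literally, which is a design choice of this sketch.

```python
import re


def find_projects(names, query):
    """Return the names that contain the query string, ignoring case."""
    # re.escape treats the query as literal text rather than as a pattern.
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    return [name for name in names if pattern.search(name)]


names = ["Abdera-trunk", "Flume-1.6-HBase-98", "Flume-trunk-hbase-1", "HBase-0.94"]
for name in find_projects(names, "hbase"):
    print("{0} project matched with query string".format(name))
```

Dropping the re.escape call would let users pass full regular expressions, at the cost of surprising matches when a project name contains pattern characters.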
We’ve finally arrived at the last use-case we listed, ie displaying basic information about a specific project. First, the user-provided project name is validated by comparing it against each project name, as shown in this code snippet:

    logging.info("Checking %s in project list..." % (thisProject[0]))
    for i in allProjects['jobs']:
        if thisProject[0] == i['name']:
            logging.info("Matched {0} with {1}".format(thisProject[0], i['name']))
            logging.info("Project URL to access more info is {}".format(i['url']))
            return i['url']

If the user’s input is determined to be valid, the program goes on to retrieve and display information about the project, ie the last completed build and the status of the last ten builds:

    self.projectName = projectName
    projectUrl = self.query(self.projectName)
    newBuildurl = projectUrl + "/" + self.pyapi
    projectInfo = eval(urllib2.urlopen(newBuildurl).read())
    self.showLatestBuild(projectInfo)
    self.showLastTen(projectInfo)
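The two steps above, an exact-name lookup followed by construction of the project-specific API URL, can be sketched as standalone functions. The names project_url and api_url are hypothetical, and the sample dictionary is hand-written. Unlike the snippet above, the sketch strips any trailing slash before appending the API suffix so that the resulting URL never contains a double slash.

```python
def project_url(all_projects, name):
    """Return the URL of the job whose name matches exactly, or None."""
    for job in all_projects.get("jobs", []):
        if job.get("name") == name:
            return job.get("url")
    return None


def api_url(base_url, api_suffix="api/python?pretty=true"):
    """Append the remote access API suffix to a job or build URL."""
    return base_url.rstrip("/") + "/" + api_suffix


# Hand-written sample data for illustration.
sample = {"jobs": [{"name": "HBase-0.94",
                    "url": "https://builds.apache.org/job/HBase-0.94/"}]}
url = project_url(sample, "HBase-0.94")
print(api_url(url))
```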
Information on the last completed build includes the event that triggered the build and its completion status. For the sake of brevity, we’ll skip discussing some of the functions, eg fetching the build time (converting Unix time to a human-readable format) and determining the build cause (whether the build was triggered by a timer or by the latest commit in one or more repositories relevant to the given project):

    buildStartedAt, buildEndedAt = self.getBuildTime(lastBuildInfo)
    startedBy = self.getBuildCause(lastBuildInfo)
    if lastBuildInfo['building'] == False and lastBuildInfo['result'] == "SUCCESS":
        print("Build was started by {0} at {1} and completed in {2}".format(startedBy, buildStartedAt, buildEndedAt))
        print("And the build passed without any errors")
    if lastBuildInfo['building'] == False and lastBuildInfo['result'] == "FAILURE":
        print("Build was started by {0} at {1} and completed in {2}".format(startedBy, buildStartedAt, buildEndedAt))
        print("And the build failed with errors")
    if lastBuildInfo['building'] == False and lastBuildInfo['result'] == "ABORTED":
        print("Build was started by {0} at {1} and completed in {2}".format(startedBy, buildStartedAt, buildEndedAt))
        print("And the build was aborted")
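Since the time-conversion helper is skipped in the text, here is one way it might work. This is a sketch under stated assumptions, not the getBuildTime from citool.py: it assumes the build record carries timestamp (start time in milliseconds since the Unix epoch) and duration (in milliseconds), and it formats the start time in UTC. The names format_build_time, describe_build and RESULT_MESSAGES are hypothetical. A lookup table stands in for the three near-identical if blocks.

```python
import time

# One message per result, replacing the repeated if blocks.
RESULT_MESSAGES = {
    "SUCCESS": "And the build passed without any errors",
    "FAILURE": "And the build failed with errors",
    "ABORTED": "And the build was aborted",
}


def format_build_time(build_info):
    """Return (start time, duration) as human-readable strings.

    Assumes 'timestamp' and 'duration' are both given in milliseconds.
    """
    started = time.strftime("%a %b %d %H:%M:%S",
                            time.gmtime(build_info["timestamp"] // 1000))
    hours, remainder = divmod(build_info["duration"] // 1000, 3600)
    minutes, seconds = divmod(remainder, 60)
    return started, "{0:02d}:{1:02d}:{2:02d}".format(hours, minutes, seconds)


def describe_build(build_info, started_by):
    """Return the lines to print for a build record."""
    if build_info.get("building"):
        return ["Build is still in progress"]
    started, duration = format_build_time(build_info)
    result = build_info.get("result")
    return [
        "Build was started by {0} at {1} and completed in {2}".format(
            started_by, started, duration),
        RESULT_MESSAGES.get(result, "And the build finished with result {0}".format(result)),
    ]
```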
To obtain the status of the last ten builds, we’ll make use of the project builds’ remote access API; for HBase-0.94, this is https://builds.apache.org/job/HBase-0.94/api/python?pretty=true. With this kind of project-specific API, it’s possible to automate various things, eg you can communicate the build availability or inform project members if any critical event has occurred that may impact a major public release:

    allBuilds = projectInfo['builds']
    for b in allBuilds:
        if counter <= 10:
            thisBuildInfo = eval(urllib2.urlopen(b['url'] + self.pyapi).read())
            buildUrls.append(b['url'])
            if thisBuildInfo['building'] == True:
                buildResult.append("Build in progress")
            else:
                buildResult.append(thisBuildInfo['result'])
            counter += 1
    for job, status in zip(buildUrls, buildResult):
        print("Status of build job, {0} is, {1}".format(job, status))
        buildStats.update({job : status})

With the coding done, we’ll now query the details for HBase-0.94:

    $ python ciproject.py -p on -s HBase-0.94
First, there will be confirmation that the project exists (if the project name input by the user is valid) and then the project’s build info will be printed on the standard output:

    {{ 08/23/2016 02:39:09 PM == INFO ==Module:citool Function:query Line:64 }} Checking HBase-0.94 in project list...
    {{ 08/23/2016 02:39:09 PM == INFO ==Module:citool Function:query Line:67 }} Matched HBase-0.94 with HBase-0.94
    {{ 08/23/2016 02:39:09 PM == INFO ==Module:citool Function:query Line:68 }} Project URL to access more info is https://builds.apache.org/job/HBase-0.94/
    {{ 08/23/2016 02:39:10 PM == INFO ==Module:citool Function:showLatestBuild Line:121 }} Last completed build of HBase-0.94 is https://builds.apache.org/job/HBase-0.94/1483/
    Build was started by an SCM change at Wed Jan 13 06:10:11 and completed in 01:04:47
    And the build failed with errors
    Status of build job, https://builds.apache.org/job/HBase-0.94/1483/ is, FAILURE
    Status of build job, https://builds.apache.org/job/HBase-0.94/1482/ is, FAILURE
    Status of build job, https://builds.apache.org/job/HBase-0.94/1481/ is, FAILURE
    Status of build job, https://builds.apache.org/job/HBase-0.94/1480/ is, FAILURE
    Status of build job, https://builds.apache.org/job/HBase-0.94/1479/ is, SUCCESS
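The last-ten-builds loop can also be written so that the network call is passed in as a function, which makes it possible to try the logic with canned data. This is a sketch, not the showLastTen method from citool.py: the names last_builds and fake_fetch are hypothetical, and the sample records are hand-written.

```python
def last_builds(project_info, fetch_build, limit=10):
    """Return (build URL, status) pairs for the most recent builds.

    fetch_build is any callable that takes a build URL and returns
    that build's record as a dictionary.
    """
    stats = []
    for build in project_info.get("builds", [])[:limit]:
        info = fetch_build(build["url"])
        status = "Build in progress" if info.get("building") else info.get("result")
        stats.append((build["url"], status))
    return stats


# Canned data standing in for the responses from the build server.
sample_project = {"builds": [{"url": "https://builds.apache.org/job/HBase-0.94/%d/" % n}
                             for n in range(1483, 1471, -1)]}


def fake_fetch(url):
    """Pretend that build 1479 passed and every other build failed."""
    return {"building": False, "result": "SUCCESS" if "/1479/" in url else "FAILURE"}


for job, status in last_builds(sample_project, fake_fetch):
    print("Status of build job, {0} is, {1}".format(job, status))
```

Slicing the list with [:limit] caps the number of requests without a separate counter variable.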
With this basic understanding of how to use Jenkins’ Python REST API, we’d encourage you to experiment with your own ideas to become familiar with what you can do. You never know, you might have some innovative ideas and be able to share them with the rest of the community.
Avid users of Jenkins may be aware of existing projects that exploit the REST API to provide capabilities such as automation (remote control) of common Jenkins tasks related to jobs and retrieval of the latest results. A few such projects are listed below along with links to documentation. Note: Compared to the topics we’ve covered, many of these features would be considered advanced.
Project Python-Jenkins: https://pypi.python.org/pypi/python-jenkins/0.4.13, Repo: https://git.openstack.org/cgit/openstack/python-jenkins.
Project jenkinsapi: This is forked from another GitHub project, https://github.com/ramonvanalteren/jenkinsapi, Repo: https://github.com/salimfadhley/jenkinsapi, Doc: http://pypi.python.org/pypi/jenkinsapi.
Project AutoJenkins: Repo: https://github.com/txels/autojenkins, Doc: http://autojenkins.readthedocs.io/en/latest.
If you want to go further with the ciproject.py tool, it can be extended to use matplotlib, taking the data collected from ABI to plot simple graphs that visualise the health of the continuous integration pipeline and a project’s trend as it evolves and grows. To reduce the flakiness that can arise from depending on external data sources, and thus increase robustness, a unit-testing framework can be built using unittest (part of the standard library). The tool can also be integrated with third-party APIs (eg Twilio) to send out text messages when critical events occur.
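As a starting point for the unit-testing idea, the sketch below shows how a test can avoid the external data source by feeding the parsing code a canned response. The function job_names, the test class and the canned payload are all assumptions for illustration and are not part of ciproject.py or citool.py.

```python
import ast
import unittest

# Canned payload standing in for a response from the build server.
CANNED_RESPONSE = ('{"jobs": [{"name": "HBase-0.94", '
                   '"url": "https://builds.apache.org/job/HBase-0.94/"}]}')


def job_names(fetch):
    """Return job names from the payload produced by the fetch callable."""
    payload = ast.literal_eval(fetch())
    return [job["name"] for job in payload.get("jobs", [])]


class JobNamesTest(unittest.TestCase):
    def test_names_from_canned_response(self):
        # No network access: the fetch callable returns the canned text.
        self.assertEqual(job_names(lambda: CANNED_RESPONSE), ["HBase-0.94"])

    def test_empty_listing(self):
        self.assertEqual(job_names(lambda: "{'jobs': []}"), [])
```

Saved in a file whose name starts with test_, the tests can be run with python -m unittest from the same directory.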