OpenSource For You

Exploring Front-end Computer Vision

Computer vision tasks include methods for acquiring, processing, analysing and understanding digital images and, in general, deal with the extraction of multi-dimensional data from the real world in order to produce numerical or symbolic information.


Computer vision (CV) is a discipline that relates to the visual perception and analysis of computers and cameras. The visual input method for computers is a camera, and most computer vision algorithms focus on extracting interesting features from the images or videos it captures. Computer vision has many applications in robotics. For example, the preliminary versions of Stanford University's Stanley (a self-driving car) used a pair of stereo cameras for visual perception.

Technology today is shifting to a more cloud and Internet oriented setting, and traditional software is being replaced by Web apps. If everything is eventually going to be ported to a Web platform, it would be wise to start incorporating the Web into upcoming technologies. Similarly, one could think of shifting CV to a browser platform as well. In fact, various libraries provide browser based support for computer vision; these include Tracking.js. First, let it be clear that a browser based system, for this article, refers to front-end code only, involving just HTML5, CSS and JavaScript.

Basic computer vision and the browser

Computations are carried out upon images, with the fundamental unit being a pixel. Algorithms involve mathematical operations on a pixel or a group of pixels. This article addresses a few commonly used CV algorithms and their ports to a front-end system. To start with, basic concepts like images and the canvas need to be understood.

An HTML image element refers to the '<img>' tag, which essentially adds an image to a Web page. Similarly, to process or display any graphical units, the '<canvas></canvas>' element is used. Each of these elements has attributes such as height, width, etc, and is referred to via an ID.

The computation part is done using JavaScript (JS). A JS file can be included either in the head or the body of an HTML document. It contains the functions that implement the aforementioned operations. For drawing any content upon a canvas, a 2D rendering reference called the context has to be created.

Here's how to access images, as well as canvas and context, from JS:

//getting image, canvas and context
var im = document.getElementById("image_id");
var canvas = document.getElementById("canvas_id");
var context = canvas.getContext("2d");

//accessing a rectangular set of pixels through the context interface
var pixel = context.getImageData(x, y, width, height);

//displaying image data
context.putImageData(image, start_point_x, start_point_y);

Using a local Web cam from a browser

Accessing a Web cam from the browser first requires user consent. Pages loaded from local files with a URL pattern such as file:/// are not allowed to access the camera; regular https:// URLs are permitted to access media.

Whenever this feature is executed, the user's consent will be required. Any image or video captured by a camera is essentially media. Hence, there has to be a media object to set up, initialise and handle any data received by the Web cam. This seamless integration is possible because of the media APIs provided by the browser.

To access the Web cam with a media API, use this code:

navigator.getUserMedia = ( navigator.getUserMedia ||
                           navigator.webkitGetUserMedia ||
                           navigator.mozGetUserMedia ||
                           navigator.msGetUserMedia );

In the above code, navigator.getUserMedia will be set if the media API exists. To get control of the media (here, the camera), use the following code:


navigator.getUserMedia(
    { video: true },
    handle_video,
    report_error
);

On the successful reception of a frame, the handle_video handler is called. In case of any error, report_error is called.

To display a frame, use the following code:

var video_frame = document.getElementById("myVideo");
video_frame.src = window.URL.createObjectURL(stream);
// stream is a default parameter provided to handle_video

For further details regarding camera interfacing with the browser, refer to the browser's media API documentation.

The basic image processing algorithms

JS stores an image as a linear array in RGBA format. Each image can be split into its respective channels, as shown below:

var image = context.getImageData(0, 0, canvas.width, canvas.height);
var num_pixels = / 4;

for (var i = 0; i < num_pixels; i++) {
    var red_component_pixel =[i*4 + 0];
    var green_component_pixel =[i*4 + 1];
    var blue_component_pixel =[i*4 + 2];
}


Computatio­n of gray scale images

A gray scale image is one in which all colour components are normalised to carry equal weightage. In an 8-bit image, a pixel appears gray when its red, green and blue values are equal.

A simple formula creates a weighted sum of the colour components to yield a gray pixel:

gray[pixel] = 0.21*red_component_pixel + 0.72*green_component_pixel + 0.07*blue_component_pixel

On applying the above formula to each pixel, split into its components, one gets an equivalent gray pixel.
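The conversion described above can be sketched as a small, self-contained helper. This is illustrative only; the function name and the in-place update are assumptions, not part of any library:

```javascript
// Hypothetical helper: convert an RGBA pixel array (like the
// member of getImageData's result) to gray in place, using the
// weights from the formula above. The alpha channel is left untouched.
function toGrayscale(data) {
    for (var i = 0; i < data.length; i += 4) {
        var gray = 0.21 * data[i] + 0.72 * data[i + 1] + 0.07 * data[i + 2];
        data[i] = data[i + 1] = data[i + 2] = Math.round(gray);
    }
    return data;
}
```

After running this over, a call to context.putImageData(image, 0, 0) would display the gray result on the canvas.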

Computatio­n of binary and inverted images

A binary image is in black and white (BW). The conversion of an image from colour to BW is done through a process called thresholdi­ng, which classifies each pixel as white or black based on its value. If the value is greater than a particular threshold, it will be set to 255, else 0.

if(red_component_pixel > threshold_red &&
   green_component_pixel > threshold_green &&
   blue_component_pixel > threshold_blue){
    //make the pixel white[pixel] = 255;
}else{
    //make the pixel black[pixel] = 0;
}

Just as a photograph has a negative, inverting the colour space of an image converts it into a negative. This can be done simply by subtracting each pixel value from 255.
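The inversion can be sketched as follows (an illustrative helper, not library code; the function name is an assumption):

```javascript
// Hypothetical helper: invert an RGBA pixel array in place by
// subtracting each colour component from 255; alpha is preserved.
function invert(data) {
    for (var i = 0; i < data.length; i += 4) {
        data[i]     = 255 - data[i];
        data[i + 1] = 255 - data[i + 1];
        data[i + 2] = 255 - data[i + 2];
    }
    return data;
}
```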

The tracking.js library

According to GitHub, tracking.js is a lightweight JS library that offers a variety of computer vision algorithms with HTML5 and JS. Some of the algorithms implemented are for colour tracking, face detection and feature descriptors, along with other utility functions. To set up tracking.js for your Web page, include build/tracking.js inside your '<head>'. For more details, visit the tracking.js documentation, which is highly detailed and illustrated.

Colour tracker using tracking.js: To initialise a colour tracker, first use the following commands:

var myTracker = new tracking.ColorTracker(['yellow']);
myTracker.on("track", color_tracking_callback);
var mT = tracking.track("#myVideo", myTracker);

In the above code snippet, color_tracking_callback is a callback which will receive a list of all possible locations where the given colour is present. Each location is a rectangle object comprising the attributes x, y, width and height, where x and y are the starting point of the rectangle.

The natural action for tracking is to draw a bounding box around the region we are interested in. Therefore, the drawBoundingBox function plots a rectangle around the region of interest. The context variable is used here to perform the canvas drawing methods, and context.stroke() eventually prints the rectangle on the canvas.

function color_tracking_callback(list_rect){
    list_rect.forEach(drawBoundingBox);
}


function drawBoundingBox(rect){
    context.beginPath();
    context.strokeStyle = "red";
    context.lineWidth = "2";
    context.rect(rect.x, rect.y, rect.width, rect.height);
    context.stroke();
}



Starting and pausing the tracking process

To start and pause the tracking process, tracking.js provides the start() and stop() methods.

mT.stop(); //to stop tracking
mT.start(); //to start tracking

Setting up custom colours for tracking

As seen, the input to a colour tracker is a list of probable colours (e.g., ['yellow']). As the definition suggests, a colour tracker must be able to track colours, including user-specified ones. Tracking.js provides a method, registerColor, that handles custom colours.

tracking.ColorTracker.registerColor('<color_name>', callback_color);

The callback_color callback receives the red, green and blue values as input arguments. Since this is a custom colour, one has to define the RGB ranges. If the RGB arguments fall within the range, the function returns true; else it returns false.

function callback_color(r, g, b){
    if(r > r_low && r < r_high &&
       g > g_low && g < g_high &&
       b > b_low && b < b_high){
        return true;
    }
    return false;
}


Here, r_low, r_high, etc, refer to the lower and upper bounds of the threshold values, respectively. Having registered the colour, one can simply append color_name to the colour list passed to tracking.ColorTracker(color_list).

Face tagging using tracking.js

Facebook has this feature whereby one can tag one's friends. There are different sets of mathematical frameworks developed to perform visual recognition as well as detection, of which one of the most robust options is the Viola-Jones detection framework.

A brief introduction to Viola-Jones: Each human face has multiple features, with many significant discernible visual differences, which are the inputs that help in face detection. These are encoded as Haar-like features organised into cascades (which you can look up in /src/detection/training/haar). Examples of significant variations in facial features include:

Location of the eyes and nose

Size of the eyes, nose, etc

Mutual contrast between facial features

By training over such features, the detection framework is made to locate areas of an image containing regions that satisfy the above constraints, thereby aiding in face detection.

To integrate your front-end with face detection, tracking.js provides another script, located in build/data/facemin.js, which loads the Viola-Jones parameters over trained data. Include the facemin.js as well as the tracking.min.js files in your page.

To initialise and use the face tracker, type:

var face_tracker = new tracking.ObjectTracker("face");
face_tracker.setInitialScale(param_for_block_scaling);
face_tracker.setEdgesDensity(classifier_param_for_edges_inside_block);
face_tracker.setStepSize(block_step_size);

var mTracker = tracking.track("#myVideo", face_tracker, {camera: true});
face_tracker.on("track", handle_faces);

The function handle_faces is a callback fired for handling detected regions. As mentioned earlier, tracking.js returns a list containing Rect objects. In the application discussed, the detected faces are tagged via a JavaScript prompt. Once the prompt value is taken, the face is identified and tracked with the given name, as well as indexed for UI purposes. The complete code can be obtained at //githublink. If a face is detected initially, or there is a state change of tracking (stop/start), the prompt is re-called and the data is stored within an array. For tracking purposes, each newly obtained Rect object is compared with the previously recorded faces, and the nearest face is chosen on the basis of minimum Euclidean distance. If no previous face is close enough, the tag is requested again and the data recalculated.

Feature extraction and matching

In simple terms, any significant discernible parts of the image can be defined as a feature. These can be corner points, edges or even a group of vectors oriented independently.

The process of extracting such information is called feature extraction. Various implementations exist for feature extraction and descriptors, such as SIFT and SURF (feature descriptors) and FAST (corner detection). Tracking.js implements BRIEF (Binary Robust Independent Elementary Features) and FAST (Features from Accelerated Segment Test). The input to the system is first converted to a gray image. The following code extracts corner points (points of interest) based on FAST.

var gray = tracking.Image.grayscale(input_image, width, height);
var corners = tracking.Fast.findCorners(gray, width, height);

Each feature point can be referred to as a location. But to be able to perform any operations, these locations are converted into descriptors, which can be considered as a list of vectors that define a given feature. Comparison operators are applied upon these vectors. To find descriptors, tracking.js uses the BRIEF framework to compute descriptor vectors from the given feature points.

var descriptors = tracking.Brief.getDescriptors(gray, width, corners);

Having got the points of interest from an image as well as their descriptor­s, we can design a scenario wherein one can track based on templates. Given a video frame and a fixed image, features can be used to match and identify where the fixed image can be located. However, there can be false positives.

// calculate the matching points between the scene and the target image
var matches = tracking.Brief.reciprocalMatch(corner_scene, descriptor_scene,
                                             corner_target, descriptor_target);
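Binary descriptors such as BRIEF's are typically compared by Hamming distance, i.e., the number of differing bits between two descriptors. A minimal pure-JS sketch of that comparison, for descriptors stored as arrays of 32-bit words (illustrative only, not tracking.js internals):

```javascript
// Hamming distance between two binary descriptors of equal length,
// each given as an array of 32-bit integer words.
function hammingDistance(descA, descB) {
    var dist = 0;
    for (var i = 0; i < descA.length; i++) {
        // XOR leaves a 1 bit wherever the descriptors differ
        var xor = (descA[i] ^ descB[i]) >>> 0;
        // count the set bits in the XOR word
        while (xor) {
            dist += xor & 1;
            xor >>>= 1;
        }
    }
    return dist;
}
```

The smaller the distance, the better two feature points match; a reciprocal match keeps only pairs that are each other's best match in both directions.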

matches.sort(function(a, b){
    // matches can be further filtered by using a sorting function:
    // either sort according to the number of matches found,
    //     return b.length - a.length;
    // or sort according to the confidence value:
    return b.confidence - a.confidence;
});

The matches obtained can be sorted on the basis of their length, i.e., the number of matches obtained, and on their confidence value, as to how well the points match. Having arranged the matches, efficient matching of the target template image and the scene image can be carried out. It is simply a task of graphics now. Just iterate over the two arrays and mark the appropriate feature points on the canvas, as follows:

function plot_matches(matches){
    for (var i = 0; i < matches.length; i++) {
        var color = "red";
        context.lineWidth = "2px";
        context.fillStyle = color;
        context.strokeStyle = color;
        context.beginPath();
        context.arc(matches[i].keypoint1[0], matches[i].keypoint1[1], 4, 0, 2*Math.PI);
        context.stroke();
    }
}



The above function plots the matches only for the scene image, since the reference context is made with respect to one canvas element. For plotting matches on the target template image, a context reference has to be made to its respective canvas element.

Computer vision on the front-end can be used for various applications, not only to produce image effects but also for browser based gesture control, etc. The advent of JavaScript libraries has helped to make this possible. The code can be found at trackingjs_ofy.

Figure 1: Displaying a frame
Figure 2: Grayscale image
Figure 3: Binary image
Figure 4: Inversion
Figure 5: Yellow colour based tracking of the book
Figure 6: Multiple colour region tracking
Figure 7: Face detection, tagging and tracking
Figure 8: Feature points
Figure 9: Matching via features
