Computer vision expert Scott Krig, of Krig Research, explains how understanding ‘feature descriptor metrics’ can help optimise image processing in industrial vision.
A better vision pipeline can be built for machine vision applications by understanding the various methods available for feature description in image processing. In addition, by using the best optimisation methods and tools, the system will be of higher quality, and be portable to newer and faster platforms.
‘Feature descriptor metrics’ are used to describe local features in images, as well as larger features in regions and whole images – examples include logos, facial features, or machine parts. They are designed using several approaches, such as ‘local feature metrics’, which are used to describe and recognise small groups of pixels, and in some cases to reconstruct entire images from sets of local features. Descriptor methods such as SIFT, SURF and ORB are used to create the feature signatures surrounding specific interest points.
Then there are ‘global feature metrics’, which cover whole images and larger regions, for example using statistical or structural texture metrics, or alternative ‘Basis Space metrics’ such as Fourier and Zernike moments. ‘Polygon Shape metrics’ describe polygon perimeter, centroid, area, and other shape moments.
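As an illustration of the polygon shape metrics mentioned above, the following minimal sketch computes area, perimeter and centroid from a list of vertices using the standard shoelace formula. The function name `polygon_metrics` and the example rectangle are assumptions made for this illustration; they are not taken from reference [1].

```python
import math


def polygon_metrics(vertices):
    """Return (area, perimeter, centroid) for a simple polygon.

    `vertices` is a list of (x, y) tuples ordered around the boundary.
    """
    n = len(vertices)
    if n < 3:
        raise ValueError("a polygon needs at least three vertices")

    signed_area2 = 0.0  # twice the signed area (shoelace sum)
    cx = cy = 0.0
    perimeter = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        signed_area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
        perimeter += math.hypot(x1 - x0, y1 - y0)

    if signed_area2 == 0:
        raise ValueError("degenerate polygon with zero area")

    # Centroid = sum / (6 * signed area) = sum / (3 * signed_area2)
    centroid = (cx / (3.0 * signed_area2), cy / (3.0 * signed_area2))
    return abs(signed_area2) / 2.0, perimeter, centroid


# Hypothetical example: a 4 x 3 rectangle
area, perimeter, centroid = polygon_metrics([(0, 0), (4, 0), (4, 3), (0, 3)])
print(area, perimeter, centroid)  # 12.0 14.0 (2.0, 1.5)
```

In a real pipeline the vertices would typically come from a contour extraction step, and further shape moments could be added in the same loop.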
A wide range of feature metrics is used in practice (see reference [1] for a survey, taxonomy and analysis of over 100 methods, and figure 1 for a basic taxonomy). Colour, greyscale, depth maps and multispectral sensors are some of the imaging options upon which the descriptors are based.
Figure 1: A simplified, dimensional taxonomy of feature description attributes (see reference [1]).
When designing the vision pipeline, the basic operations used in the algorithms will provide guidance for processor selection and choice of software language. For example, pixel processing or point processing is data parallel when each pixel is processed in the same way, so a data parallel machine such as a VLIW DSP might be best suited. Many CPUs also have data parallel SIMD instructions, which can be run across multiple CPU cores for further data parallel optimisation. GPUs have SIMD instructions too, and a GPU that is powerful enough and contains enough data parallel SIMT engines may also be a good choice.
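To make the idea of a data parallel point operation concrete, here is a minimal sketch in plain Python. It applies a gamma lookup table to every pixel independently, and then checks that processing two tiles separately gives the same result as processing the whole image. The function names and the gamma example are assumptions chosen for illustration. Plain Python is used only to show the data access pattern; a production system would map the same pattern onto SIMD, DSP or GPU code.

```python
def build_gamma_lut(gamma):
    """Precompute an 8-bit lookup table for gamma correction."""
    return [round(255.0 * (v / 255.0) ** gamma) for v in range(256)]


def apply_point_op(image, lut):
    """Apply the same lookup to every pixel.

    No output pixel depends on any other pixel, so rows or tiles can be
    processed in any order, or at the same time on separate lanes or cores.
    """
    return [[lut[p] for p in row] for row in image]


# Hypothetical 4 x 2 greyscale image with 8-bit pixels
image = [[0, 64], [128, 255], [10, 20], [30, 40]]
lut = build_gamma_lut(0.5)

whole = apply_point_op(image, lut)
# Process the top and bottom halves independently, then join them.
tiled = apply_point_op(image[:2], lut) + apply_point_op(image[2:], lut)
print(whole == tiled)  # True
```

Operations that need neighbouring pixels, such as convolution, do not split this cleanly, which is one reason the type of operation guides the choice of processor.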
Feature description is affected by image pre-processing, so both pre-processing and feature description should be considered together. For example, image pre-processing can be used to reduce noise, or normalise contrast, prior to feature description to enhance results. The fundamental operations in the vision pipeline are often best suited to a particular type of processor resource, such as a SIMD or VLIW machine, so software algorithms can be written accordingly to optimise compute performance. See reference [1], chapter 8, for example vision pipelines, decomposed into hypothetical operations and compute resource assignments as per figure 2.
Figure 2: A simplified example vision pipeline and assignment of operations to compute resources (see reference [1]).
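The effect of pre-processing on feature description, noted above, can be sketched with a simple min-max contrast stretch. In this hypothetical example, two captures of the same pattern with different contrast become identical after normalisation, so any descriptor computed afterwards sees the same input. The function name `stretch_contrast` and the pixel values are assumptions made for this illustration.

```python
def stretch_contrast(image, out_min=0, out_max=255):
    """Linearly rescale pixel values so they span out_min..out_max."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A flat image has no contrast to stretch.
        return [[out_min for _ in row] for row in image]
    scale = (out_max - out_min) / (hi - lo)
    return [[round(out_min + (p - lo) * scale) for p in row] for row in image]


# Hypothetical captures of the same pattern under different lighting
high_contrast = [[50, 100], [150, 200]]
low_contrast = [[100, 125], [150, 175]]

print(stretch_contrast(high_contrast))  # [[0, 85], [170, 255]]
print(stretch_contrast(low_contrast))   # [[0, 85], [170, 255]]
```

Min-max stretching is sensitive to outlier pixels, so a noise reduction step or a percentile-based variant may be preferable in practice.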
Optimisations can be based on a priori knowledge of the data patterns and data precision in the algorithms. This knowledge helps to choose a processor with a suitable instruction set, which in turn indicates a suitable programming language. For example, data parallel languages such as graphics pixel shaders and compute shaders, or compute languages such as OpenCL, can be used for coding the data parallel parts of the algorithms.
Optimisations are also, of course, based on profiling the real system. In some cases, optimisations are possible using off-the-shelf optimised libraries such as Intel Integrated Performance Primitives for standard image processing and computer vision algorithms (see figure 3, and also reference [1], chapter 8, for more alternatives).
Figure 3: A sampling of vision pipeline software optimisation tools, taken from reference [1].
Define the imaging requirements
When selecting feature descriptors for an application, there are usually several alternatives to reach the same goals. By defining the system requirements and goals in terms of performance and robustness attributes, the correct image pre-processing methods can be used in combination with the right set of feature description metrics to achieve the goals.
The choice of feature description metrics is guided by the requirements of the system, and tuned by the results of training and testing. Usually, several feature metrics are used together as a multivariate descriptor to provide robust recognition, for example the colour, texture, size and shape of the object. In addition, local interest points and features such as SIFT and ORB may be useful as well. In most cases, there are several alternatives to reach the requirements by tuning the weights and constraints among related feature metrics to recognise objects correctly, so testing, profiling and optimisations are required to arrive at the final solution.
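A multivariate descriptor with tunable weights and a recognition constraint, as described above, might be sketched as follows. The feature names, reference values, weights and distance threshold are all hypothetical, and the features are assumed to be normalised to the range 0 to 1. A real system would arrive at these values through training and testing.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two feature dictionaries."""
    return math.sqrt(sum(w * (a[k] - b[k]) ** 2 for k, w in weights.items()))


def classify(sample, references, weights, max_distance):
    """Return (label, distance) of the nearest reference.

    The label is None when even the nearest reference is further away
    than max_distance, which acts as the recognition constraint.
    """
    best_label, best_dist = None, float("inf")
    for label, ref in references.items():
        d = weighted_distance(sample, ref, weights)
        if d < best_dist:
            best_label, best_dist = label, d
    if best_dist > max_distance:
        return None, best_dist
    return best_label, best_dist


# Hypothetical reference descriptors for two machine parts
references = {
    "washer": {"hue": 0.10, "texture": 0.20, "size": 0.30, "circularity": 0.95},
    "bolt": {"hue": 0.12, "texture": 0.60, "size": 0.50, "circularity": 0.40},
}
# Hypothetical weights: shape is trusted most, colour least
weights = {"hue": 0.5, "texture": 1.0, "size": 1.0, "circularity": 2.0}

sample = {"hue": 0.11, "texture": 0.25, "size": 0.32, "circularity": 0.90}
label, distance = classify(sample, references, weights, max_distance=0.25)
print(label)  # washer
```

Changing the weights or the threshold changes which objects are accepted, which is why the final values need to be checked against ground truth data.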
By defining the requirements completely, including lighting, frame rate, frame resolution and pixel depth, and by assembling a complete test set of ground truth data, objective criteria become available to guide the selection of candidate feature metrics, and to tune the weights and recognition constraints as needed. Requirements must include robustness attributes, such as invariance to scale, rotation, blur, occlusion, and illumination. For example, scale invariance from 1x to 20x in size may be difficult to achieve, but scale invariance from 1x to 3x in size may be achievable.
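One way to turn such requirements into objective criteria is to record them explicitly and score test results against them. The sketch below assumes a hypothetical `Requirements` record and a list of ground truth results, each tagged with the scale factor at which the object was presented. The recall threshold and the sample data are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    min_scale: float   # smallest scale factor that must be supported
    max_scale: float   # largest scale factor that must be supported
    min_recall: float  # fraction of ground truth objects that must be found


def recall_by_scale(results):
    """Map each scale factor to the fraction of objects recognised.

    `results` is an iterable of (scale_factor, was_recognised) pairs.
    """
    totals, hits = {}, {}
    for scale, recognised in results:
        totals[scale] = totals.get(scale, 0) + 1
        hits[scale] = hits.get(scale, 0) + (1 if recognised else 0)
    return {s: hits[s] / totals[s] for s in totals}


def failing_scales(results, req):
    """List the in-range scale factors whose recall misses the requirement."""
    recall = recall_by_scale(results)
    return sorted(
        s for s, r in recall.items()
        if req.min_scale <= s <= req.max_scale and r < req.min_recall
    )


# Hypothetical ground truth test results
results = [
    (1.0, True), (1.0, True),
    (2.0, True), (2.0, False),
    (3.0, True), (3.0, True),
    (20.0, False),
]

print(failing_scales(results, Requirements(1.0, 3.0, 0.9)))   # [2.0]
print(failing_scales(results, Requirements(1.0, 20.0, 0.9)))  # [2.0, 20.0]
```

The same structure could be extended with rotation, blur, occlusion or illumination attributes, at the cost of a larger ground truth set.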
Adding more robustness attributes to the system, such as rotational invariance in addition to scale invariance, increases complexity, both in choosing the ground truth data and in testing and training. So it might be better to re-design the factory automation environment to reduce the range of variation in scale, illumination, and orientation of objects, rather than create complex robustness requirements that expect too much from the vision system.
In some applications, brute force feature matching against a database of features may be adequate for identification and tracking. In other applications, machine learning methods are used to learn the best features, or simply to increase search and matching performance (see reference [2] for more on machine learning).
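Brute force matching can be sketched in a few lines. The example below assumes binary descriptors stored as Python integers and compared by Hamming distance, which is a common choice for binary descriptors such as those produced by ORB. The 8-bit descriptor values and the distance threshold are hypothetical; real binary descriptors are much longer.

```python
def hamming(a, b):
    """Count the bits that differ between two binary descriptors."""
    return bin(a ^ b).count("1")


def brute_force_match(queries, database, max_distance):
    """Match each query against every database descriptor.

    Returns (query_index, database_index, distance) for each query whose
    nearest database descriptor is within max_distance.
    """
    matches = []
    for qi, q in enumerate(queries):
        best_index, best_dist = None, None
        for di, d in enumerate(database):
            dist = hamming(q, d)
            if best_dist is None or dist < best_dist:
                best_index, best_dist = di, dist
        if best_dist is not None and best_dist <= max_distance:
            matches.append((qi, best_index, best_dist))
    return matches


# Hypothetical 8-bit descriptors
database = [0b11110000, 0b00001111, 0b10101010]
queries = [0b11110001, 0b00000000]

print(brute_force_match(queries, database, max_distance=2))  # [(0, 0, 1)]
```

The cost grows with the number of queries multiplied by the size of the database, which is one motivation for the faster search methods mentioned above.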
References
[1] Krig, Scott, Computer Vision Metrics: Survey, Taxonomy and Analysis, Springer Apress, 2014.
[2] Prince, Simon, Computer Vision: Models, Learning, and Inference, Cambridge University Press, 2012.
--
Scott Krig has three decades of experience in computer vision, imaging, and graphics visualisation across numerous industries. In 1998 he founded Krig Research (http://www.krigresearch.com/), providing imaging and computer vision systems based on high-performance engineering workstations, supercomputers, and dedicated imaging hardware. Krig is the author of a new book Computer Vision Metrics: Survey, Taxonomy, and Analysis, published by Apress Media in partnership with Intel.