A New Algorithm for Identifying Pecan Weevils through Image Processing Techniques
Member, American Society of Agricultural and Biological Engineering (ASABE)
Member, Institute of Electrical and Electronics Engineers (IEEE)
Member, American Society of Agricultural and Biological Engineering (ASABE)
Pecan Weevil attacks the pecan nut, causes significant financial loss and can cause total crop failure. A traditional way of controlling this insect is by setting traps in the pecan orchard and regularly checking them for weevils. The objective of this study is to develop a recognition system that can serve in a wireless imaging network for monitoring pecan weevil. Recognition methods used in this study are based on template matching. The training set consisted of 205 pecan weevils and the testing set included 30 randomly selected pecan weevils and 75 other insects which typically exist in pecan habitat. Five recognition techniques were implemented in this study; namely, normalized cross-correlation, Fourier descriptors, moment invariants, string matching, and regional descriptors. Results indicate that a positive match from three of the five independent tests would yield reliable results; therefore, 100% recognition could be achieved by adopting the proposed algorithm.
The United States annual production of pecan nuts is about 177,300 million pounds. Nut losses from insects and diseases on pecans can cause significant financial losses and can be severe enough to result in total crop failure. The pecan tree can be attacked by more than twenty types of insects; however, the pecan weevil (Curculio caryae) is one of the most destructive pests of Oklahoma pecans. It is also considered the most serious “late-season” pest because it attacks the nut.
As soon as they emerge from the soil cells, adult pecan weevils move to the nearest tree. Research indicates that 77 percent of adults fly to the tree trunk at a height of 6 to 8 feet, 5 percent walk to the tree trunk and 15 percent fly directly to the canopy. Once in the canopy, the tasks of feeding and finding a mate begin. The current method of monitoring pecan weevils is to trap them while they emerge from the soil by using different types of trapping techniques. Then, based on the number of trapped pecan weevils entomologists and growers may decide about specific treatment methods.
A practical monitoring system that automatically provides accurate information about the presence and population of pecan weevils would be a useful tool for pecan growers. Such a system would not require intensive labor or periodic observation of the traps. In addition, it could be modified to recognize more than one type of insect and, by taking advantage of communication devices, would be very effective in large growing areas. Furthermore, it would be an even more effective method for monitoring dangerous insects whose first appearance entomologists need to know about in order to start treatment.
The goal of this study is to identify pecan weevil among other insects that are naturally present in the pecan habitat by implementing several image processing techniques. This is the first step toward building a wireless imaging system that can be commercially available and affordable for farmers.
2. Previous Works
Among the cited studies, the Digital Automated Identification System (DAISY), the Automated Bee Identification System (ABIS), Species Identification, Automated and web Accessible (SPIDA), and the Automated Insect Identification through Concatenated Histograms of Local Appearance (AIICHLA) were significant in the field. However, these systems have some limitations and may not be applicable for identifying all insects. The target group that DAISY was designed to identify is Ophioninae (Hymenoptera: Ichneumonidae). For accurate classification, the system required insects to be aligned before their images were captured. In other words, the system may not be applicable for field application where no human interaction is preferred. Furthermore, for insects that are closely related and similar in shape, a large number of training images would be required, especially with the random n-tuple classifier (NCC) used in this system. The system could be a good tool for routine identification of a targeted group of insects.
The ABIS system was designed specifically to identify bees based on differences in their forewings. It required user interaction for aligning the specimen’s wings before capturing its picture. Further, the system was limited to species with membranous wings, as the algorithm depends on a specific set of characters of the wing venation for identification purposes. In the SPIDA-Web system, manual manipulation of a spider specimen is required for proper image acquisition. User interaction is also required for region selection and pre-processing of images. Finally, the AIICHLA system was specifically designed to identify stonefly larvae that live in water. The operator has to ensure that the larvae are in the standard orientation for capturing their images appropriately.
To our knowledge, there is no fully automated imaging system for identifying insects at the field level. Furthermore, no study was conducted for identifying pecan weevils. The absence of such a system motivated the research of this project. The aim of this study is to develop the software part of a wireless network imaging system that can automatically identify pecan weevils in the field. This system is essential for monitoring and controlling pecan weevils and a useful tool for pest control management, in general. Furthermore, the software with minor modifications can be used to identify other insects.
3. Recognition Methods
3.1. Correlation-Based Template Matching
This method is a standard way of finding a match of a template in a given image. The template could be a part of an object or the whole object that needs to be found in an image. The task of correlation-based matching involves searching for the location of the template in the tested image. The correlation calculates the similarity between the template and the area in the input image. A large value of the correlation indicates a high similarity. In this method, no pre- or post-processing is required. The computational complexity of the correlation-based method depends on the size of the template and the image. When the searching area (image) is large, the searching process is expected to take longer. The computational cost will also be high when searching larger databases.
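The template, denoted $t$, is of size $N \times M$ and is to be matched with an image $f$, where the size of the template has to be less than or equal to the size of the image. The sum of squared differences (SSD) is a similarity measure widely used in computer vision. In a gray-level image, the sum of the squared differences between each template pixel and the corresponding input image pixel is taken as an indication of the similarity between the template and the searched area of the image. For a template placed at offset $(u, v)$ in the image:

$$SSD(u,v) = \sum_{x=1}^{N}\sum_{y=1}^{M}\left[f(x+u,\,y+v) - t(x,y)\right]^{2}$$

The cross-correlation can be derived from the SSD as

$$SSD(u,v) = \sum_{x=1}^{N}\sum_{y=1}^{M} f(x+u,\,y+v)^{2} + \sum_{x=1}^{N}\sum_{y=1}^{M} t(x,y)^{2} - 2\sum_{x=1}^{N}\sum_{y=1}^{M} f(x+u,\,y+v)\,t(x,y) \tag{1}$$

In (1), the energy of the searched area and the template are represented by the first and second terms respectively. The last term contains the cross-correlation (CC), which forms the correlation between the image and the template:

$$CC(u,v) = \sum_{x=1}^{N}\sum_{y=1}^{M} f(x+u,\,y+v)\,t(x,y) \tag{2}$$

The value of the CC ranges from zero (no match) to $N \cdot M \cdot 255^{2}$, the maximum value for 8-bit gray levels. The need for normalizing the cross-correlation (NCC) arises because the energy of the different searched areas in the image is usually not constant. The CC can be normalized as follows:

$$NCC(u,v) = \frac{\sum_{x=1}^{N}\sum_{y=1}^{M} f(x+u,\,y+v)\,t(x,y)}{\sqrt{\sum_{x=1}^{N}\sum_{y=1}^{M} f(x+u,\,y+v)^{2}\;\sum_{x=1}^{N}\sum_{y=1}^{M} t(x,y)^{2}}} \tag{3}$$

As (3) shows, the normalization is done by dividing the CC by the square root of the product of the energy of the searched area and the energy of the template. The NCC lies in the range [0, 1], where zero indicates no match and 1 indicates an identical match.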
In this study, NCC was used with a very simple algorithm to identify pecan weevils among other insects. First, the program reads the gray-level input image and the images of pecan weevils stored in the database. Then, the input image is treated as a template and normalized cross-correlation is performed between this template and the database images one by one. If the value of the correlation with any database image is greater than or equal to the threshold (0.75), the input image is recognized as a pecan weevil.
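The matching loop described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the input image and every database image are gray-level arrays of the same size, so the correlation is evaluated at a single zero offset, and the function names `ncc` and `is_pecan_weevil_ncc` are hypothetical.

```python
import numpy as np

def ncc(image, template):
    """Normalized cross-correlation of two equally sized gray-level arrays."""
    f = np.asarray(image, dtype=float)
    t = np.asarray(template, dtype=float)
    if f.shape != t.shape:
        raise ValueError("this sketch assumes equally sized images")
    # Denominator: square root of (energy of image) * (energy of template).
    denom = np.sqrt(np.sum(f * f) * np.sum(t * t))
    return float(np.sum(f * t) / denom) if denom > 0 else 0.0

def is_pecan_weevil_ncc(input_image, database, threshold=0.75):
    """True if the input correlates with any database image at or above the threshold."""
    return any(ncc(input_image, ref) >= threshold for ref in database)
```

For non-negative gray levels the returned value lies in [0, 1], and an image compared with itself gives 1.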
3.2. Geometric Moment Invariants
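For robust and reliable recognition results, it is always desirable to utilize methods which are invariant to translation, scale, orientation, and rotation. One of these methods that has been widely implemented in pattern recognition and image classification is moment invariants. First introduced by Hu [8], moment invariants can provide characteristics of an object that uniquely represent its shape.
The seven geometrical moment invariants derived by Hu [8] from the second and the third moments are:

$$
\begin{aligned}
\phi_1 &= \eta_{20} + \eta_{02} \\
\phi_2 &= (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2 \\
\phi_3 &= (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2 \\
\phi_4 &= (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2 \\
\phi_5 &= (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] \\
       &\quad + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] \\
\phi_6 &= (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03}) \\
\phi_7 &= (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] \\
       &\quad - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right]
\end{aligned}
\tag{4}
$$

where $\eta_{pq} = \mu_{pq} / \mu_{00}^{(p+q)/2 + 1}$ are the normalized central moments of order $p + q$ and $\mu_{pq}$ are the central moments of the image.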
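As an illustration of how Hu's seven invariants can be computed from normalized central moments of second and third order, a minimal NumPy sketch follows. It assumes the input is a 2-D gray-level or binary array with non-zero total mass; the function name `hu_moments` is hypothetical and is not taken from the study.

```python
import numpy as np

def hu_moments(image):
    """Return Hu's seven moment invariants of a 2-D array."""
    f = np.asarray(image, dtype=float)
    y, x = np.mgrid[:f.shape[0], :f.shape[1]]
    m00 = f.sum()
    xc, yc = (x * f).sum() / m00, (y * f).sum() / m00   # centroid

    def eta(p, q):
        # Central moment normalized by m00^((p+q)/2 + 1) for scale invariance.
        mu = ((x - xc) ** p * (y - yc) ** q * f).sum()
        return mu / m00 ** ((p + q) / 2 + 1)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03            # recurring sums
    c, d = n30 - 3 * n12, 3 * n21 - n03    # recurring differences
    return np.array([
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ])
```

Shifting a shape within the frame, or rotating the array by 90 degrees, leaves the seven values unchanged up to floating-point error, which is the property the method relies on.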
3.3. Zernike Moments
The Zernike moment descriptor has the properties of rotation invariance, robustness to noise, expression efficiency, fast computation and multi-level representation for describing the various shapes of patterns. In many comparison studies of moment-based methods [9, 11-17], Zernike moments outperformed the others, especially the geometrical moments.
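Zernike introduced a set of complex polynomials which form a complete orthogonal set over the interior of the unit circle $x^2 + y^2 = 1$. The computation of Zernike moments from an input image consists of three steps: computation of the radial polynomials, computation of the Zernike basis functions, and computation of the Zernike moments by projecting the image onto the basis functions. The form of these polynomials is as follows:

$$V_{nm}(x, y) = V_{nm}(r, \theta) = R_{nm}(r)\,e^{jm\theta} \tag{5}$$

where $j = \sqrt{-1}$, $n$ is called “order”, $m$ is a positive or negative integer (known as “repetition”) with the constraints that $n - |m|$ is even and $|m| \le n$, $r$ is the length of the vector from the origin to the pixel, $\theta$ is the angle between the vector $r$ and the x-axis in the counter-clockwise direction, and $R_{nm}$ is the radial polynomial defined as:

$$R_{nm}(r) = \sum_{s=0}^{(n-|m|)/2} (-1)^{s}\,\frac{(n-s)!}{s!\left(\frac{n+|m|}{2}-s\right)!\left(\frac{n-|m|}{2}-s\right)!}\; r^{\,n-2s} \tag{6}$$

These polynomials are orthogonal and satisfy the following orthogonality property:

$$\iint_{x^2+y^2\le 1} V_{nm}^{*}(x, y)\,V_{pq}(x, y)\,dx\,dy = \frac{\pi}{n+1}\,\delta_{np}\,\delta_{mq} \tag{7}$$

The Zernike moment of order $n$ with repetition $m$ for a continuous image function $f(x, y)$ that vanishes outside the unit circle is:

$$Z_{nm} = \frac{n+1}{\pi}\iint_{x^2+y^2\le 1} f(x, y)\,V_{nm}^{*}(x, y)\,dx\,dy \tag{8}$$

In (8), the integral can be replaced by summations, as all the images are digital, as follows:

$$Z_{nm} = \frac{n+1}{\pi}\sum_{x}\sum_{y} f(x, y)\,V_{nm}^{*}(x, y), \qquad x^2 + y^2 \le 1 \tag{9}$$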
The Zernike moments are computed for an image by taking the center of the image as the origin and mapping the pixel coordinates to the range of the unit circle. Pixels outside the unit circle are not included in the computation. The orthogonality implies no redundancy or overlap of information between moments with different orders and repetitions. In this case, each moment is a unique and independent representation of a given image.
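The three computation steps (radial polynomial, basis function, projection of the image) can be sketched in NumPy as below. This is only an illustration under stated assumptions: pixel coordinates are mapped linearly so that the image spans [-1, 1] in both directions with the image center at the origin, pixels outside the unit circle are skipped, and the image is at least 2 x 2. The function names are hypothetical.

```python
import numpy as np
from math import factorial

def radial_poly(n, m, r):
    """Zernike radial polynomial R_nm evaluated on an array of radii."""
    m = abs(m)
    out = np.zeros_like(r, dtype=float)
    for s in range((n - m) // 2 + 1):
        coef = ((-1) ** s * factorial(n - s)
                / (factorial(s)
                   * factorial((n + m) // 2 - s)
                   * factorial((n - m) // 2 - s)))
        out += coef * r ** (n - 2 * s)
    return out

def zernike_moment(image, n, m):
    """Discrete Zernike moment Z_nm of a 2-D array (complex number)."""
    if abs(m) > n or (n - abs(m)) % 2:
        raise ValueError("need |m| <= n and n - |m| even")
    f = np.asarray(image, dtype=float)
    rows, cols = f.shape
    y, x = np.mgrid[:rows, :cols]
    # Assumed mapping: image center -> origin, image extent -> [-1, 1].
    xn = (2 * x - (cols - 1)) / (cols - 1)
    yn = (2 * y - (rows - 1)) / (rows - 1)
    r = np.hypot(xn, yn)
    theta = np.arctan2(yn, xn)
    inside = r <= 1.0                       # ignore pixels outside the unit circle
    basis_conj = radial_poly(n, m, r) * np.exp(-1j * m * theta)
    return (n + 1) / np.pi * np.sum(f[inside] * basis_conj[inside])
```

Only the magnitude `abs(zernike_moment(img, n, m))` is rotation invariant; a rotation of the image changes the phase of the moment, so magnitudes are the natural choice for a feature vector.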
3.4. Fourier Descriptors
Fourier descriptors have been widely used as features for object recognition and classification applications. They provide a unique representation of an object and can be related to basic transformations of the shape, which makes it possible to obtain a boundary description that is independent of variations such as rotation, scale, and translation. For a given image, the Fourier descriptors are produced by the Fourier transform, which represents the shape in the frequency domain. The lower-frequency descriptors store the general information of the shape and the higher-frequency descriptors store the smaller details. Therefore, the lower-frequency components of the Fourier descriptors are sufficient for general shape description.
The boundary of a shape consists of $K$ points in the $xy$-plane. Tracing once around the boundary from an arbitrary starting point $(x_0, y_0)$ in the counter-clockwise direction, at a constant speed, produces a sequence of coordinate pairs $(x_0, y_0), (x_1, y_1), (x_2, y_2), \dots, (x_{K-1}, y_{K-1})$. For representing traversal at a constant speed, it is necessary to interpolate equidistant points around the boundary. The boundary can be represented as the sequence of coordinates:

$$s(k) = [x(k),\, y(k)], \qquad x(k) = x_k, \quad y(k) = y_k$$

for $k = 0, 1, 2, \dots, K-1$. Each coordinate pair of the shape boundary can be described as a complex number:

$$s(k) = x(k) + j\,y(k) \tag{10}$$

where $j = \sqrt{-1}$. This representation reduces the problem from the two-dimensional to the one-dimensional case.
The discrete Fourier transform of (10) is:

$$a(u) = \frac{1}{K}\sum_{k=0}^{K-1} s(k)\,e^{-j2\pi uk/K}, \qquad u = 0, 1, 2, \dots, K-1 \tag{11}$$

and the complex coefficients $a(u)$ are known as the Fourier descriptors of the boundary. The inverse Fourier transform of (11) is:

$$s(k) = \sum_{u=0}^{K-1} a(u)\,e^{j2\pi uk/K}, \qquad k = 0, 1, 2, \dots, K-1 \tag{12}$$

where $K$ is the number of points in the boundary and $s$ is the feature value obtained from the Fourier descriptors for object recognition and representation. As mentioned earlier, high-frequency components account for fine detail and low-frequency components determine global shape; therefore, not all Fourier descriptors are required for general object recognition. Instead, only the first $P$ coefficients should be used. In this case, (12) can be rewritten as:

$$\hat{s}(k) = \sum_{u=0}^{P-1} a(u)\,e^{j2\pi uk/K}, \qquad k = 0, 1, 2, \dots, K-1 \tag{13}$$
As a result, smaller P results in the finer details being lost on the boundary. On the other hand, the fewer components we include in calculations, the faster the algorithm would be.
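The transform and its truncated inverse can be sketched with NumPy's FFT as shown below. The sketch assumes the boundary is already given as K equidistant (x, y) points and places the 1/K factor in the forward transform, which is one common convention and may differ from the one used in the study. The function names are hypothetical.

```python
import numpy as np

def fourier_descriptors(boundary_xy):
    """Fourier descriptors a(u) of a boundary given as K rows of (x, y)."""
    pts = np.asarray(boundary_xy, dtype=float)
    s = pts[:, 0] + 1j * pts[:, 1]      # each boundary point as a complex number
    return np.fft.fft(s) / len(s)       # np.fft.fft has no 1/K factor, so add it

def reconstruct(descriptors, num_coeffs):
    """Approximate the boundary from the first `num_coeffs` descriptors."""
    a = np.asarray(descriptors, dtype=complex)
    K = len(a)
    kept = np.zeros(K, dtype=complex)
    kept[:num_coeffs] = a[:num_coeffs]  # discard the remaining coefficients
    return np.fft.ifft(kept) * K        # np.fft.ifft divides by K, so undo it
```

With this convention `a[0]` is the centroid of the boundary, which is why descriptors are often made translation invariant by ignoring that first coefficient. Keeping the first P coefficients follows the text; implementations that want the lowest frequencies of both signs would also keep coefficients from the end of the array.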
The idea of template matching is to measure the similarity between an input image and a database of known shapes. When comparing two images, say G and H, two sets of feature values $g = (g_1, \dots, g_n)$ and $h = (h_1, \dots, h_n)$ will be produced by the two images. The similarity (distance) between them can be measured as $d = g - h$. The smaller the distance is, the closer the two shapes are to each other, and vice versa. If the distance $d = 0$, then the two shapes are identical. In measuring similarity, it is always desired to represent the result by a single value instead of a set of values like in $d$. This can be done by treating $d$ as a vector in multi-dimensional space where the length of this vector represents the distance between the two compared images. The value of the distance can be obtained from the square root of the sum of the squares of the elements of $d$.
To calculate the degree of similarity between the corresponding features (for example, moment invariants) of an input insect’s image and the database of pecan weevils’ images, Euclidean Distance (ED) was utilized as the classifier measure. The ED’s equation can be written as:

$$ED(G, H) = \sqrt{\sum_{i=1}^{n} (g_i - h_i)^2} \tag{14}$$
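A minimal sketch of the distance-based decision is given below. It assumes each image has already been reduced to a feature vector of fixed length and that a smaller distance means a closer match; the names `euclidean_distance` and `min_distance_to_database` are hypothetical.

```python
import numpy as np

def euclidean_distance(g, h):
    """Length of the difference vector between two feature vectors."""
    d = np.asarray(g, dtype=float) - np.asarray(h, dtype=float)
    return float(np.sqrt(np.sum(d ** 2)))

def min_distance_to_database(features, database):
    """Smallest distance between the input features and any stored feature vector."""
    return min(euclidean_distance(features, ref) for ref in database)
```

An input would then be accepted as a pecan weevil when the minimum distance does not exceed the threshold chosen for the method in question.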
3.5. String Matching
String matching is one of the region-based descriptors in which the boundary of a shape can be represented by a string. Strings are one-dimensional structures representing the boundary of a two-dimensional shape. This representation requires an appropriate method for reducing the two-dimensional relations to one-dimensional form. The fundamental concept of using strings as descriptors is to extract the connected line segments from the shape to be recognized. The approach implemented in this study is to track the contour of an insect and code the result with segments of specified direction (angles).
In this method, the boundary of an insect is represented by a string which is generated by coding the interior angles of the polygons. Strings were generated from a given angle array by quantizing the angles into 45° increments, which produced strings whose elements were numbers between 1 and 8, increasing by one with each increment. Table 1 presents this relationship:
Table 1. Designated Integers for each angles’ range
Symbol Representing the Range
For an input image of an unknown insect and a pecan weevil, the two boundaries can be coded into strings denoted $a = a_1 a_2 \dots a_n$ and $b = b_1 b_2 \dots b_m$, respectively. If $\alpha$ represents the number of matches between the two strings, where a match takes place in the $k$th location if $a_k = b_k$, then the number of unmatched symbols can be described as $\beta = \max(|a|, |b|) - \alpha$, where $|a|$ and $|b|$ are the lengths of the strings representing the unknown insect and the pecan weevil images, respectively. In this case, the value of $\beta$ is equal to zero if the two images are identical.
Even though there are many definitions of string similarity, a simple measure between strings was implemented in this study which can be represented in the following ratio:

$$D = \frac{\alpha}{\beta} = \frac{\alpha}{\max(|a|, |b|) - \alpha} \tag{15}$$

The value of $D$ is equal to zero when none of the symbols in $a$ (unknown insect’s image) and $b$ (pecan weevil’s image) match; $D$ is infinite when the two images are identically matched. In string matching, a tested image is recognized as a pecan weevil if the $D$ value is greater than or equal to the value of the threshold (1.0).
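The coding and similarity ratio can be sketched as follows. Two assumptions are made here that the text does not state explicitly: angles are quantized in 45-degree steps to obtain the symbols 1 to 8, and the number of unmatched symbols is the length of the longer string minus the number of position-wise matches. The function names are hypothetical.

```python
import math

def angles_to_string(angles_deg, step=45):
    """Quantize angles (degrees) into integer symbols 1..8 (assumed 45-degree bins)."""
    return [int(a % 360 // step) + 1 for a in angles_deg]

def string_similarity(a, b):
    """Ratio of matching to non-matching symbols, compared position by position."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    unmatched = max(len(a), len(b)) - matches
    return math.inf if unmatched == 0 else matches / unmatched
```

For example, two strings of length four that agree in two positions give a ratio of 1.0, which is exactly the recognition threshold quoted above. Because the comparison is position by position, the result depends on where the boundary trace starts, so both strings need a consistent starting point.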
3.6. Regional Properties Descriptors
While the aim of this study is to identify pecan weevils among other insects, it is also desired to keep such a system as simple as possible. Regional properties belong to the regional descriptor approaches, as they deal with the region(s) of the image instead of its boundary. They offer a simple method for describing important properties of image regions such as the area, centroid, and orientation. Although there are many insects that are very close to pecan weevils in terms of shape description, one important feature can be utilized to distinguish pecan weevils from other insects. This feature is the pecan weevil’s rostrum. It is expected that a pecan weevil can be recognized by its long rostrum, which is ¾ the length of the male’s body and as long as the female’s body.
Because the pecan weevil is not the only insect that has a rostrum, utilizing this feature alone (major-axis length) may not be very effective. Therefore, this feature was related to other features in order to form a unique representation of pecan weevils. The area, major-axis length, and minor-axis length were used to describe pecan weevils in this project. The area of the selected region is defined as the number of pixels in that region. The major-axis length can be defined as the length (in pixels) of the major axis of the ellipse that has the same second moments as the region. Finally, the minor-axis length is the length (in pixels) of the minor axis of the ellipse that has the same second moments as the region.
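A possible way to compute the three descriptors from a binary region mask is sketched below. It assumes a single region per mask and follows a common convention in which each pixel is treated as a unit square (hence the 1/12 term added to the second moments) and the axis lengths are four times the square roots of the eigenvalues of the second-moment matrix. The function name `region_descriptors` is hypothetical.

```python
import numpy as np

def region_descriptors(mask):
    """Return [area, major-axis length, minor-axis length] of a binary region."""
    mask = np.asarray(mask, dtype=bool)
    ys, xs = np.nonzero(mask)
    area = xs.size                                   # number of pixels in the region
    coords = np.stack([xs, ys]).astype(float)
    # Normalized second central moments; 1/12 is the second moment of a unit pixel.
    cov = np.cov(coords, bias=True) + np.eye(2) / 12.0
    eigvals = np.linalg.eigvalsh(cov)                # ascending order
    minor, major = 4.0 * np.sqrt(eigvals)            # axes of the equivalent ellipse
    return np.array([area, major, minor])
```

The resulting three-element vector can be compared with stored vectors using the Euclidean distance; in practice the components may need rescaling so that the area does not dominate the two lengths.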
4. Results and Discussion
The threshold at which insects are recognized as pecan weevil was experimentally determined using 205 pecan weevils. For each individual method, a test was performed between each image of this data set and the rest of the pecan weevils. This step showed the similarity value of each individual weevil to the others which helped in choosing an appropriate overall threshold.
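The threshold-selection step can be illustrated with a small sketch. It assumes the pairwise similarities between all training weevils have been collected in a square matrix in which larger values mean greater similarity (as with NCC); for distance-based methods the comparison would be reversed. The function names are hypothetical.

```python
import numpy as np

def best_match_to_others(similarity):
    """For each training image, the highest similarity to any other training image."""
    s = np.array(similarity, dtype=float)   # copy, so the input is not modified
    np.fill_diagonal(s, -np.inf)            # exclude each image's match with itself
    return s.max(axis=1)

def fraction_recognized(similarity, threshold):
    """Share of training images with at least one match at or above the threshold."""
    return float(np.mean(best_match_to_others(similarity) >= threshold))
```

Evaluating `fraction_recognized` over a range of candidate thresholds shows how many training weevils would still be matched at each setting, which is the information needed to pick an overall threshold.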
4.1 Normalized cross-correlation
Results showed that 86% of the pecan weevils have at least one match that is greater than or equal to a correlation value of 0.75. As a result, the threshold was set to be at a correlation value of 0.75 or greater. In other words, an insect would be recognized as pecan weevil using normalized cross-correlation method if its correlation value with any pecan weevil individual (training set) is greater than or equal to 0.75.
The method was then tested using two data sets: the first consisted of 30 pecan weevils that were randomly selected from a group of 200 pecan weevils, and the second is a group of 19 different insects (74 specimens) that are naturally present in the pecan habitat. The results of this experiment showed that the average correlation value of pecan weevils was 0.79, which is above the threshold of recognition (0.75). Using the testing set, twenty-seven pecan weevils out of thirty were positively recognized. On the other hand, when correlating the second group of insects with the training set, seventy non-pecan weevil insects out of seventy-four were correctly classified.
Figure 1. Normalized cross-correlation method
Figure 1 illustrates the results of using the normalized cross-correlation method to identify pecan weevils among other insects. In this figure, pecan weevils are represented by solid circles while the other insects are represented by empty ones. It can be seen that this method could distinguish pecan weevils from other insects: 90% of the pecan weevils are above the experimentally determined threshold of 0.75. Further, the three pecan weevils which were below the threshold were very close (0.74) to the passing criterion and not far from being correctly distinguished.
4.2 Fourier descriptors
Figure 2 illustrates the results of the Fourier descriptors method. It can be seen that this method distinguished a good number of pecan weevils using an experimentally determined threshold of 1.0. The results showed that 80% of the pecan weevils were correctly classified whereas 51% of the non-pecan weevils were positively classified. We attribute the relatively poor performance of the Fourier descriptors method to the non-linear variation among the pecan weevils in terms of body size and part orientation. These results support the idea of implementing more than one recognition method, as a single method may not provide the desired result.
Figure 2. Fourier descriptors method
4.3 Geometrical Moment Invariants
The seven moment invariants of each insect were used for recognizing pecan weevils. The recognition criterion (threshold) for this method was found to be 0.3. Figure 3 presents the results of the two testing sets. All pecan weevils were correctly classified except one. In other words, 97% of the pecan weevils were positively identified. On the other hand, 59% of the non-pecan weevil insects were positively classified.
Figure 3. Geometrical moment invariants
The last result may appear to be rather low, but in fact it is very encouraging. It can be noticed that 39% of the data are well distinguished from the pecan weevils and their Euclidean distance is greater than 1.0. In short, the good performance of this method confirms the adoption of moment invariants to serve in this multi-recognition methods’ approach.
The Zernike moments method was explored in this study because many studies have shown that it outperforms geometrical moment invariants.
4.4 Zernike Moments
Due to the wide variation among pecan weevils in terms of body size and shape, the seven moment invariants may not capture all variation. Therefore, Zernike moments were implemented, as they provide more moments, which enables a more precise description of the tested insects and hence improves the recognition rate.
Figure 4. Zernike moment invariants method
It can be seen from Figure 4 that 93% of the pecan weevils are below the recognition threshold of 14.0. It can also be noticed that the pecan weevils are very close to each other and not widely scattered. This indicates that Zernike moments could incorporate the characteristics of pecan weevils better than the seven moment invariants.
The results showed that 69% of non-pecan weevil insects were correctly classified, which is 10 percentage points higher than the recognition rate of the geometrical moments method. The roughly 30% misclassification ratio can be attributed to the high similarity between pecan weevils and some of the insects in the non-pecan weevil group. This group has more than 19 insects that are in fact weevils which are very similar to the pecan weevil.
4.5 Region Properties Descriptors
The region properties method provides several descriptors that can be used in the area of image recognition, for example, the area, centroid, orientation, Euler number, and others. In this study, three measurements (descriptors) were adopted to represent pecan weevils: area, major-axis length, and minor-axis length. The major-axis length is defined as the length (in pixels) of the major axis of the ellipse that has the same second moments as the region. The minor-axis length is the length (in pixels) of the minor axis of the ellipse that has the same second moments as the region. The term area here refers to the number of pixels in a given region.
Pecan weevils belong to the superfamily Curculionoidea, which is the largest group (65,000 species) of the order Coleoptera. They, as members of this group, are most specialized in having rostrums which are used in preparing oviposition holes as well as in feeding. A vector that has the three selected region properties descriptors was formed for each insect as a unique representation. The major-axis length of the insects is the most significant descriptor, because in the case of pecan weevils the major axis of the body is almost always the length of the rostrum plus the length of the body (head and abdomen). However, this general rule has some exceptions, especially when an image of a pecan weevil was taken while its rostrum was reoriented or broken.
Figure 5. Region properties method
Figure 5 shows the result of applying the region properties method to the training data set of 205 pecan weevils. The results showed that 80% of the pecan weevils have at least one positive match with a minimum Euclidean distance less than one. The average number of positive matches for this group (threshold ≤ 1.0) was three matches per insect. Although 90% of the pecan weevils were positively identified below a threshold of 2.0, the stricter threshold of 1.0 was preferred.
With this criterion of recognition, two experiments were conducted. The first one was done on a randomly selected group of thirty pecan weevils. This testing set is the same sample that was used to test the performance of all other recognition methods. In the other experiment, the region properties method was applied to a data set consisting of 74 insects (non-pecan weevils). Using Euclidean distance for measuring similarity, the result of the first experiment showed that 27 pecan weevils out of 30 were positively identified. This 93% successful recognition rate was achieved with a threshold of less than or equal to 1.0. On the other hand, all insects in the non-pecan weevil group were correctly classified except five samples. In other words, more than 93% of the non-pecan weevil insects were accurately identified.
4.6 String matching
String matching is a simple, yet very effective method in recognizing pecan weevils. Recognition threshold for this method was set at 1.0. Using this method, 80% of pecan weevils were correctly classified. In the same experiment, 88% of the non-pecan weevil insects were positively identified.
Figure 6. String matching method
The performance of these five methods is compared in Figure 7. It illustrates the percentage of pecan weevils and non-pecan weevils successfully identified by each of the methods.
Figure 7. Summary of the results for all methods
The Type I and Type II error for each of the methods was also evaluated. Table 2 presents the Type I and Type II error observed in the five methods.
Table 2. Summary of performance of the methods in terms of Type I and II error
Type II Error
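The two error rates can be derived from the counts reported in the text, as the following sketch shows. It assumes that a Type I error means a non-pecan weevil insect accepted as a pecan weevil (false positive) and a Type II error means a pecan weevil that was not recognized (false negative); this assignment is an assumption, and the function name `error_rates` is hypothetical.

```python
def error_rates(weevils_recognized, weevils_total, others_rejected, others_total):
    """Return (false-positive rate, false-negative rate) from classification counts."""
    false_negative = 1 - weevils_recognized / weevils_total   # missed pecan weevils
    false_positive = 1 - others_rejected / others_total       # other insects accepted
    return false_positive, false_negative
```

Using the normalized cross-correlation counts quoted in Section 4.1 (27 of 30 pecan weevils recognized, 70 of 74 other insects correctly classified), `error_rates(27, 30, 70, 74)` gives a false-negative rate of 10% and a false-positive rate of about 5.4%.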
Based on the above results, it became evident that none of the five methods when used alone could yield the level of identification desired of a good system. Therefore, the application of all five methods in sequence is recommended at this stage. However, it may be possible in the future to optimize the number of methods used and preserve the quality of the results. The algorithm containing the implementation of these five methods is illustrated below in Figure 8:
Figure 8. Flow diagram of the algorithm for identifying Pecan Weevils
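The decision rule stated in the abstract, a positive match from three of the five independent tests, can be sketched as a simple vote. This is not a transcription of the flow diagram in Figure 8; it assumes each method has already produced a yes/no result against its own threshold, and the method names used as dictionary keys are illustrative only.

```python
def is_pecan_weevil(method_results, required_matches=3):
    """Accept the insect when enough individual methods report a positive match."""
    positives = sum(1 for matched in method_results.values() if matched)
    return positives >= required_matches

# Hypothetical outcome of the five tests for one input image.
example = {
    "normalized_cross_correlation": True,
    "fourier_descriptors": False,
    "moment_invariants": True,
    "string_matching": True,
    "regional_descriptors": False,
}
decision = is_pecan_weevil(example)   # three positives out of five
```

Keeping the vote separate from the individual tests makes it straightforward to change the required number of matches, or to drop a method, if the set of methods is optimized later.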
[1] United States Department of Agriculture (USDA), 2004. [Online]. Available: www.ers.usda.gov\briefing\fruitandtreenuts\fruitnutpdf\pecansfts304.pdf
[2] A. Watson, M. O’Neill and I. Kitching, “Automated identification of live moths (macrolepidoptera) using Digital Automated Identification System (DAISY)”, System Biodiv. 1, 2003, pp. 287–300.
[3] T. Arbuckle, S. Schroder, V. Steinhage and D. Wittmann, “Biodiversity informatics in action: identification and monitoring of bee species using ABIS”, in Proc. 15th Int. Symp. Informatics for Environmental Protection, ETH Zurich, 1, 2001, pp. 425-430.
[4] M. Mayo and T. Watson, “Automatic Species Identification of Live Moths”, Computer Science- Artificial Intelligence, 20(2), 2007, pp. 195-202.
[5] M. Do, J. Harp and K. Norris, “A test of a Pattern Recognition System for Identification of Spiders”, Bull. Entomology, Res. 89, 1999, pp. 217-224.
[6] N. Larios, N. Deng, W. Zhang, M. Sarpola, J. Yuen, R. Paasch, A. Moldenke, D.A. Lytle, C. Ruiz, E. Mortensen, L.G. Shapiro, T.G. Dietterich, “Automated Insect Identification through Concatenated Histograms of Local Appearance Features”, Institute of Electrical and Electronics Engineers, Application of Computer Vision, 2007 WACV’07.
[7] M. Storring and T.B. Moeslund, “An Introduction to Template Matching”, adapted from “Fixation and Tracking using Active Cameras with Foveated Wide-angle lenses”, M.S. thesis, 1997.
[8] M. Hu, “Visual Pattern Recognition by Moment Invariants”, IRE Trans. Inf. Theory, 8, 1962, pp. 179-187.
[9] S. Liao and M. Pawlak, “On Image Analysis by Moments”, IEEE Trans. On Pattern Analysis and Machine Intelligence, 18(3), 1996.
[10] W. Y. Kim and Y. S. Kim, “A Region-Based Shape Descriptor Using Zernike Moments”, Signal Processing: Image Communication, 16 (1-2), 2000, pp. 95-102.
[11] C.H. Teh, “On Image Analysis by the Methods of Moments”, IEEE Trans. On Pattern Analysis and Machine Intelligence, 10(4), 1988.
[12] T. W. Lin and Y. F. Chou, “A Comparative Study of Zernike Moments”, Proceedings of the IEEE/WIC International Conference on Web Intelligence, 2003, pp. 516-519.
[13] S. Belkasim, M. Shridhar, and M. Ahmadi, “Pattern Recognition with Moment Invariants: A Comparative Study and New Results”, Pattern Recognition, 24 (12), 1991, pp. 1117-1138.
[14] D. S. Zhang and G. J. Lu, “Review of Shape Representation and Description Techniques”, Pattern Recognition, 37(1), 2004, pp. 1-19.
[15] J. S. Park and T.Y. Kim, “Shape Image Retrieval Using Invariant Features”, Advances in Multimedia Information Processing, PCM-2004, Part 2, Proceedings Lecture Notes In Computer Science, 2004, 3332: 146-153.
[16] N. Ezer, E. Anarim and B. Sankur, “A Comparative Study of Moment Invariants and Fourier Descriptors in Planar Shape Recognition”, Institute of Electrical and Electronics Engineers, IEEE, 1, 1994, pp. 242-245.
[17] A. Padilla-Vivanco, G. Urcid-Serrano, F. Granados-Agustin, and A. Cornejo-Rodriguez, “Comparative Analysis of Pattern Reconstruction Using Orthogonal Moments”, Optical Engineering 46(1), 2007, 017002: pp. 1-15.
[18] M. Zhenjiang, “Zernike Moment-Based Image Shape Analysis and its Application”, Pattern Recognition Letters, 21 (2), 2000, pp. 169-177.
[19] S. K. Hwang and W.Y. Kim, “A Novel Approach to the Fast Computation of Zernike Moments”, Pattern Recognition, 39(11), 2006, pp. 2065-2076.
[20] M. Sarfraz, “Object Recognition Using Fourier Descriptors: Some Experiments and Observations”, Proceedings of the International Conference on Computer Graphics, Image Processing and Visualization, IEEE, 2006, pp. 1706-1708.
[21] R. Gonzalez and R. Woods, Digital Image Processing, 2nd Edition, Prentice Hall, Upper Saddle River, New Jersey, 2001.
[22] R. Gonzalez, R. Woods and S. Eddins, Digital Image Processing Using Matlab, Prentice Hall, Upper Saddle River, New Jersey, 2004.