Yield Prediction Using A Neural Network Classifier Trained Using Soil Landscape Features and Soil Fertility Data

S.A. Shearer, T.F. Burks, J.P. Fulton, S.F. Higgins
Department of Biosystems and Agricultural Engineering, University of Kentucky, Lexington, KY, USA

J.A. Thomasson
Department of Agricultural & Biological Engineering, Mississippi State University, Mississippi State, MS, USA

T.G. Mueller
Agronomy Department

S. Samson
Sociology Department, University of Kentucky, Lexington, KY, USA


Burks, T.F., S.A. Shearer, C.J. Sobolik, and J.P. Fulton. 2000. Combine Yield Monitor Test Facility Development. ASAE Paper No. 001084. Annual International Meeting, Midwest Express Center, Milwaukee, Wisconsin, July 9-12.

Abstract

This manuscript presents the application of artificial neural network classifiers to the task of predicting spatially varying yield within a single field. Data were obtained from the 1998 corn harvest in the Outer Bluegrass region of Kentucky to test this concept. Data features considered for inclusion in the various data models included fertility, elevation, electrical conductivity and satellite imagery. Four of the ten models considered show some promise as predictive tools for optimizing grain production using precision agriculture technologies. While many of the classifier models accurately predicted average yield levels, many fell short when predicting the range of yield variability. Model 6, which contained fertility, electrical conductivity and satellite image features, seemed to do the best overall job of predicting spatial yield variability. No attempt was made to account for temporal variations in this model.

Introduction

With the introduction of GPS, agricultural producers now have the opportunity to collect yield and soil landscape data that quantify the inherent variability associated with crop production. Managing this variability is now the objective of many producers. The focus is to maximize economic returns while reducing the risks associated with growing crops. As one might expect, crop models and other predictive tools will become an essential element of the technology known as "precision agriculture."

Two important trends are emerging for predicting crop performance. The more traditional approach is based on understanding the physiology of a particular crop and then fitting mathematical models to describe this crop growth. Crop models have been developed for many of the grain crops now produced in the U.S. A second and less accepted approach is the application of artificial intelligence for the prediction of crop response. The latter approach requires the specification and characterization of factors that influence crop growth, as is the case with more traditional physiology-based models. The differences between model types are the methods used to determine and quantify the interaction between the various input features.

In this manuscript, the authors attempt to identify several soil landscape features that may be useful for training a neural network classifier to predict grain yield. The intent is to demonstrate the use of artificial neural network classifiers as an alternative to more traditional physiology-based models. Therefore, the objectives of this work were:

  1. To develop an artificial neural network classifier to predict yield using a backpropagation (BP) training approach with data from a field transect.
  2. To evaluate the classification accuracy of the neural network approach by applying the classifier to predict grain yield on a 0.4 ha grid within the same field and cropping season.

 

Previous Work

Neural Network Classifiers

The semi-linear feedforward neural network has been reported to be an effective system for learning discriminant patterns for a number of different applications (Pao, 1989). The BP learning method is an iterative process used to train the feedforward neural network for minimal response error. An input pattern is applied to the network and forward propagated through the network using the initial node connection weights. The output error is determined and then back propagated to establish a new set of network connection weights. The process is continued until a prescribed minimum error is achieved. A typical topology for a feedforward network consists of an input layer, a hidden layer and an output layer.

The input layer of the feedforward network is generally fully connected to all nodes in the subsequent hidden layer. Input vectors may be binary, integer or real and are usually normalized to values between -1 and 1 (Pao, 1989). When inputs are applied to the network, they are scaled by the connecting weights between the input layer and the first hidden layer. These connection weights are denoted by wji, where i represents the ith input node and j represents the jth node in the first hidden layer. Each jth node of the hidden layer acts as a summing node for all scaled inputs and applies an output activation function. This concept is illustrated in MATLAB (1994). The summing function is given in equation 1, where wji is the connection weight between the jth node of the hidden layer and the ith node of the input layer; pi is the input value at the ith node; and bj is the bias at the jth node in the hidden layer.

$$\mathrm{net}_j = \sum_i w_{ji}\, p_i + b_j \tag{1}$$

One of the more popular activation functions used in classification problems is the sigmoidal function. This function is given in equation 2, where netj is the summed input from equation 1 and oj is the output of the jth node. The parameter θj serves as a bias and is used to shift the activation function along the horizontal axis. The parameter θ0 is used to modify the shape of the sigmoidal function, where low values will make the function look like a threshold-logic unit and high values result in a more gradually sloped function (Pao, 1989).

$$o_j = \frac{1}{1 + \exp\left[-(\mathrm{net}_j + \theta_j)/\theta_0\right]} \tag{2}$$

Subsequent layers work in the same manner. The output from the preceding layer is summed at each node using equation 1. The output of the node is determined using the activation function of equation 2. There is no limit on the number of nodes in a hidden layer, nor is there a limit on the number of hidden layers. However, practical experience shows that excessively large networks are difficult to train and require large amounts of memory and execution time.
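The layer-by-layer calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the MATLAB code used in the study: the sigmoid is assumed to take the form 1 / (1 + exp(-(net + θj)/θ0)), and the function names, network size and weight values are hypothetical.

```python
import math


def sigmoid(net, theta=0.0, theta0=1.0):
    """Assumed sigmoidal activation: 1 / (1 + exp(-(net + theta) / theta0))."""
    return 1.0 / (1.0 + math.exp(-(net + theta) / theta0))


def layer_output(inputs, weights, biases):
    """Sum the weighted inputs at each node, add the bias, apply the sigmoid.

    weights[j][i] connects node i of the preceding layer to node j of this layer.
    """
    outputs = []
    for w_j, b_j in zip(weights, biases):
        net_j = sum(w * p for w, p in zip(w_j, inputs)) + b_j
        outputs.append(sigmoid(net_j))
    return outputs


def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Propagate one input pattern through a single-hidden-layer network."""
    hidden = layer_output(inputs, hidden_w, hidden_b)
    return layer_output(hidden, output_w, output_b)


# Hypothetical 2-input, 2-hidden-node, 1-output network with all-zero weights.
out = forward([0.5, -0.5], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [[0.0, 0.0]], [0.0])
print(out)  # every node sees net = 0, so each sigmoid returns 0.5
```

Lowering `theta0` in `sigmoid` steepens the curve toward a threshold unit, while raising it flattens the response, which matches the behavior described for the shape parameter.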

The BP training process begins by selecting a set of training input vectors along with their corresponding output vectors. An input vector is applied to a network whose connection weights have been initialized to random values. The resulting output is determined by calculating the feedforward output at each node and forward propagating the layer results until the kth output layer nodes are activated. The actual output ok is then compared to the target output tk and the error is calculated using the sum of square error (SSE), or another selected criterion. The relationship representing the SSE for an input pattern p is given in equation 3, where k represents the kth node in the output layer.

$$\mathrm{SSE}_p = \sum_k \left(t_{pk} - o_{pk}\right)^2 \tag{3}$$

Once the error has been determined, the connection weights in the network are updated using a gradient descent approach, by back propagating changes in the network weights starting at the output layer and working back toward the input layer. The incremental change in wkj for a given pattern p is termed Δpwkj, and is proportional to -ΔSSE/Δwkj. The relationship for calculating the incremental change in connection weights between the last hidden layer and the output layer is given by equation 4 (Pao, 1989)

$$\Delta_p w_{kj}(m+1) = \eta\, \delta_{pk}\, o_{pj} + \alpha\, \Delta_p w_{kj}(m) \tag{4}$$

where
        m = iteration step in the training process
        η = learning rate
        α = momentum factor for previous iteration Δpwkj (m)
        opj = output of the jth node in the last hidden layer for pattern p

and where the δ value at the output layer is represented by equation 5

$$\delta_{pk} = \left(t_{pk} - o_{pk}\right) f'(\mathrm{net}_{pk}) \tag{5}$$

The subscript p represents the pattern number. The values of tpk and opk represent the target output and the actual output, respectively, of the pth pattern at the kth output node, and f' is the derivative of the activation function, which for the sigmoid of equation 2 is o(1 - o)/θ0. The incremental change in wji (between the hidden layer and the input layer) for a given pattern p is termed Δpwji, and is identical to equation 4, with the exception that the delta function is redefined as given in equation 6.

$$\delta_{pj} = f'(\mathrm{net}_{pj}) \sum_k \delta_{pk}\, w_{kj} \tag{6}$$

In general, the network connection weights are adjusted at the end of each training cycle. That is, after all training patterns have been presented to the network and the Δpwkj calculated, a summation of all pattern changes is applied to the network. This relationship is represented in equation 7.

$$\Delta w_{kj} = \sum_p \Delta_p w_{kj} \tag{7}$$

Once the network has been updated, the process is repeated until either a specified error limit is achieved or the total number of training cycles (epochs) has been completed.
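The training cycle described above (forward pass, output and hidden deltas, summation of the pattern changes once per epoch, and a momentum term) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the toolbox routine used in the study: the sigmoid has unit slope (θ0 = 1), each bias is treated as a weight on a constant input of 1, and the function names, the OR data set and the values of `eta`, `alpha` and `epochs` are hypothetical.

```python
import math
import random


def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))


def forward(x, w_hid, w_out):
    """Return hidden and output activations; a constant 1.0 input carries each bias."""
    hid = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hid]
    out = [sigmoid(sum(w * v for w, v in zip(row, hid + [1.0]))) for row in w_out]
    return hid, out


def total_sse(patterns, w_hid, w_out):
    """Sum of squared errors over all patterns and output nodes."""
    return sum((t_k - o_k) ** 2
               for x, t in patterns
               for t_k, o_k in zip(t, forward(x, w_hid, w_out)[1]))


def train(patterns, n_hidden=2, eta=0.1, alpha=0.5, epochs=2000, seed=0):
    """Batch back-propagation with momentum; returns (initial, trained) weights."""
    rng = random.Random(seed)
    n_in, n_out = len(patterns[0][0]), len(patterns[0][1])
    w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    start = ([row[:] for row in w_hid], [row[:] for row in w_out])
    # Weight changes from the previous epoch, needed for the momentum term.
    dw_hid = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
    dw_out = [[0.0] * (n_hidden + 1) for _ in range(n_out)]
    for _ in range(epochs):
        acc_hid = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        acc_out = [[0.0] * (n_hidden + 1) for _ in range(n_out)]
        for x, t in patterns:
            hid, out = forward(x, w_hid, w_out)
            # Output-layer delta: error times the sigmoid derivative o(1 - o).
            d_out = [(t_k - o_k) * o_k * (1.0 - o_k) for t_k, o_k in zip(t, out)]
            # Hidden-layer delta: back-propagate the output deltas through w_out.
            d_hid = [h * (1.0 - h) * sum(d_out[k] * w_out[k][j] for k in range(n_out))
                     for j, h in enumerate(hid)]
            for k in range(n_out):
                for j, v in enumerate(hid + [1.0]):
                    acc_out[k][j] += eta * d_out[k] * v
            for j in range(n_hidden):
                for i, v in enumerate(x + [1.0]):
                    acc_hid[j][i] += eta * d_hid[j] * v
        # Apply the summed pattern changes once per epoch, plus momentum.
        for acc, dw, w in ((acc_out, dw_out, w_out), (acc_hid, dw_hid, w_hid)):
            for r in range(len(w)):
                for c in range(len(w[r])):
                    dw[r][c] = acc[r][c] + alpha * dw[r][c]
                    w[r][c] += dw[r][c]
    return start, (w_hid, w_out)


# Hypothetical toy data: the logical OR of two inputs.
data = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]), ([1.0, 0.0], [1.0]), ([1.0, 1.0], [1.0])]
start, trained = train(data)
print(total_sse(data, *start), total_sse(data, *trained))
```

Comparing the two printed values shows the SSE before and after training. Because the pattern changes are summed before they are applied, the momentum term here acts on the whole-epoch weight change.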

The primary objective of this research was to determine if fertility, elevation, electrical conductivity, or satellite data features can be used in conjunction with standard BP neural network topologies to predict spatial variability in grain yields. Network training parameters will be evaluated and the training process checked for repeatability. The BP model used in this study is included in MATLAB's Neural Network Toolbox. Momentum was used in the Δpwkj equation to enhance the training process. As shown by equation 4, the momentum term helps the training process find the global minimum error by carrying Δpwkj through local minima. The various BP topologies, training parameters, and statistical analyses were executed using MATLAB program files. These program scripts helped manage the data files; execute the network initialization, training and simulation functions; and collect statistics from the classification test.

Soil Landscape Characterization

Producers now have the option of collecting massive quantities of data to describe crop performance. Yield monitoring is now commonplace on many farms. When coupled with DGPS, it is possible to see as many as 1,000 yield estimates per hectare. The fineness of this resolution usually exceeds our ability to characterize other landscape features such as soil fertility, elevation, slope, or aspect.

The field selected for this investigation was a 55.7 ha field located in Shelby County, which is in the Outer Bluegrass region of Central Kentucky. Soils in this region of the state are derived from limestone, where the "B" horizon is high in clay content. Terrain features include ridgetops with moderate side slopes. The productivity potential of the soils is often limited by lack of precipitation during the late summer months and the limited soil moisture holding capacity of these shallow soils. Soil mapping units for this field include predominantly Shelbyville and Nicholson silt loam soils.

Data were collected from this field during the 1998 cropping season. The field was planted to corn. Yield was characterized at harvest using Ag Leader 2000 yield monitors with DGPS receivers installed on John Deere 9610 and Gleaner R-70 model combines. Both machines utilized 12-row corn heads. Figure 1 is a summary of the yield data obtained at harvest. A portion of the southeast corner of the field was chopped for animal feed and, therefore, yield data are unavailable for this location. The remaining yield data provided the basis for training and evaluating the neural network classifier.

Soil fertility data were obtained by sampling on a 63.4 m (0.4 ha) grid. Five soil cores, collected at a 5.0 m radius, were composited for a single sample at each grid point. Figure 2 illustrates the grid sample point locations. Soil sampling depth was limited to 10.0 cm as this field has been in continuous no-till production for over 15 years.
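The correspondence between the 63.4 m grid spacing and the nominal 0.4 ha cell size can be checked with a short calculation, using only the spacing reported above and the definition of the hectare:

```latex
63.4\ \text{m} \times 63.4\ \text{m} = 4019.56\ \text{m}^2
\approx 0.40\ \text{ha}
\qquad (1\ \text{ha} = 10\,000\ \text{m}^2)
```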

Soil samples were submitted to the University of Kentucky's Regulatory Services for analysis. Sample results included both water and SMP buffer pH values. Carbon contents were obtained by burning the soil sample and sampling the by-product gases of the combustion process. Total carbon values were reported as percent organic matter. The remaining soil fertility values, P, K, Zn, Mg, and Ca, were obtained using the Mehlich III extraction process. These values were reported in ppm.

Elevation data were obtained using a survey grade Leica single frequency GPS system. The reported horizontal accuracy of this system was 1.0 cm, with an elevation accuracy of approximately 2.5 cm. Data were collected by walking the field and recording GPS signal data for over 650 locations or unique features within the boundary of the field. Data were downloaded and post-processed for differential correction using software supplied by the receiver manufacturer. Point data were then used to construct an ArcView shape file for export to ArcInfo for further processing. A digital elevation model (DEM) was generated using standard interpolation techniques. Similarly, canned processes were used to generate surfaces describing slope, aspect and profile. Figure 3 illustrates the DEM used to describe the elevation features of Field 16.

Soil electrical conductivity measurements were made using a Veris 3100 Soil EC Mapping System. A total of six straight coulters were arranged in an array that enabled conductivity readings at 0-30 cm and 0-90 cm depths. The Veris machine was operated at approximately 13 km/h in parallel paths spaced 9.0 m apart. Data readings were logged at 1.0 s intervals for both the shallow and deep sensing arrays. A total of 22,000 data points were available to characterize the field. Figure 4 illustrates the soil conductivity response for the 0-90 cm sensing array.

Two scenes of Landsat satellite imagery were purchased from EOSAT. The first scene was obtained on May 19, 1998, while the second was obtained on August 21, 1998. In both cases the quality of the images was exceptional, as cloud cover was practically non-existent. The Landsat imagery has a resolution of 25 m. The volume set was provided in band sequential format. Because the data were being used in a neural network classifier, the original digital numbers (DN) were used. These digital numbers ranged from 0 to 255, with 255 indicating the highest radiance value. No attempt was made to correct the DNs to radiance values, as the May and August scenes were not compared directly. Aside from projecting the satellite image data to NAD 83 longitude and latitude coordinates for entry into ArcView for analysis, no attempt was made to correct geometric errors or to register the image with ground features. All seven spectral Landsat bands from both the May and August scenes were considered for inclusion and training of the neural network classifier.

Experimental Design

Figure 1 shows the boundary of Field 16 at Worth and Dee Ellis Farms in Shelby County, Kentucky. This 55.7 ha field was planted to corn during the 1998 cropping season. A transect was placed diagonally across the subject field from the northeast to the southwest corner. Sample sites were located at a 12.2 m spacing along this transect for a total of 63 observations. Data from this transect were used to train the artificial neural network classifier. This transect is depicted as a series of circular regions centered at the sampling sites. For the test case, data were collected from a square grid placed over the field. The square grid had a resolution of 63.4 m or 0.4 ha. The test data grid resulted in 137 total observations for the test data case. The sampling grid is shown in Figure 2.

Preprocessing of the data was required to ensure representative values for each of the landscape, fertility and spectral reflectance attributes. In all cases point data were interpolated using an inverse distance approach. For each of the sample sites denoted above, circular polygons with a radius of 9.1 m were centered at the sample points. The 9.1 m radius was selected to ensure that data from a minimum of two passes of the combine could be used to determine an area estimate of grain yield. Twelve-row headers were attached to the combines used to harvest the field. At a row spacing of 76 cm, this resulted in an effective header width of approximately 9.1 m. The number of test data set observations was reduced to 117, as the owners and operators of the farm elected to chop a portion of the field for silage to support the dairy enterprise of the operation. No yield monitor was available to estimate the grain yield of these locations within the field (refer to Figure 1). All preprocessed data were exported from the GIS package used to build the coverages for analysis and were then accumulated in a spreadsheet package for further processing. Summary statistics for each class of features for both training and test data sets are shown in Tables 1 through 5.
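An inverse distance estimate at a sample site can be sketched as shown below. The exact procedure in the GIS package is not described here, so this is only an illustration: restricting the estimate to points inside the 9.1 m circle and using a distance exponent of 2 are assumptions, and the function name and sample points are hypothetical.

```python
import math


def idw_estimate(site, points, radius=9.1, power=2.0):
    """Inverse-distance-weighted mean of point values within `radius` of `site`.

    site   -- (x, y) coordinates in metres
    points -- iterable of (x, y, value) tuples
    Returns None when no point falls inside the circle.
    """
    num = 0.0
    den = 0.0
    for x, y, value in points:
        d = math.hypot(x - site[0], y - site[1])
        if d > radius:
            continue  # outside the circular polygon
        if d == 0.0:
            # A point exactly at the site determines the estimate.
            return value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den if den > 0.0 else None


# Hypothetical yield points (Mg/ha) around a sample site at the origin.
yield_points = [(3.0, 0.0, 8.0), (0.0, 6.0, 6.0), (20.0, 0.0, 2.0)]
estimate = idw_estimate((0.0, 0.0), yield_points)
print(estimate)
```

In this example the point 20 m away lies outside the circle and is ignored, and the point at 3 m receives four times the weight of the point at 6 m.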

To test the artificial neural network training approach, the same classifier was used to predict the yield associated with each location in the test data set. Further, to determine the classification power of the various features, several models were constructed. The foundation of this process was to exclude various classes of data in an attempt to identify the minimal data set required to predict corn yield for the 1998 cropping season. Table 6 summarizes the models developed for the neural network evaluation phase of this work.

Artificial neural network training and classifier accuracy testing were performed using the MATLAB software package with the Neural Network TOOLBOX (MATLAB, 1994). A network topology with one hidden layer of nodes was selected for the implementation. Back-propagation training with momentum rate, learning rate, and adaptive learning options was used to evaluate each model. Again, the reader is reminded that the focus of this work was to evaluate the classification power of the data using a neural network approach, and not to evaluate the numerous training options available to train neural networks. Training for each of the data models proceeded by first selecting a target sum of square error (SSE) value. Training was continued until the SSE target was met, or until either the SSE or root mean square (RMS) error values revealed a trend to degrade the classification power of the data model. Provisions were made to step through these results epoch by epoch. When an acceptable training level was reached for each of the training data sets, the classifier was applied to the test data set for evaluation of the classification power.
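The stopping logic described above can be expressed as a simple rule over a history of per-epoch error values. The sketch below is only one possible reading: the study stepped through results epoch by epoch rather than applying a fixed rule, so the `patience` parameter (the number of consecutive increases taken to signal a degrading trend), the function name and the sample error histories are all hypothetical.

```python
def stopping_epoch(errors, target, patience=3):
    """Return the index of the epoch at which training would stop.

    errors   -- per-epoch error values (SSE or RMS), in training order
    target   -- stop as soon as the error is at or below this value
    patience -- stop after this many consecutive epoch-to-epoch increases
    """
    if not errors:
        raise ValueError("errors must contain at least one epoch")
    rising = 0
    for epoch, err in enumerate(errors):
        if err <= target:
            return epoch
        if epoch > 0 and err > errors[epoch - 1]:
            rising += 1
            if rising >= patience:
                return epoch
        else:
            rising = 0
    # Neither condition was met; training ran to the last recorded epoch.
    return len(errors) - 1


# Hypothetical error histories.
met_target = stopping_epoch([9.0, 6.0, 4.0, 1.9], target=2.0)
degrading = stopping_epoch([9.0, 6.0, 6.5, 7.0, 7.4, 8.0], target=2.0)
print(met_target, degrading)
```

The first history stops when the target is reached, and the second stops once the error has risen for three epochs in a row.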

Discussion of Results

Table 7 is a summary of the yield predictions associated with each of the previously identified data models. For comparison purposes, similar statistics are included for the training and test data sets. It should be noted that the statistics of the Training data set nearly match those of the Test data set, with the exception of minimum yield, which differs by 1.7 Mg/ha. This difference leads to speculation that the transect approach employed to train the classifier may be inappropriate.

In all cases the neural network classifier predicted average yields that closely approximated those of the Training and Test data sets. Unfortunately, the neural network classifiers did not predict the spread of the actual observations, and in many cases the upper yield predictions did not surpass a threshold level. For example, while the predicted average yield of Model 4 (7.86 Mg/ha) nearly matches that of the Test data set (7.83 Mg/ha), the standard deviation of Model 4 predictions (0.68 Mg/ha) is only one-half that of the Test data set (1.38 Mg/ha). In general, all of the models did a good job of predicting the average yield. If one were interested in the ability of the classifier to predict the spread of the yield data, then Models 2, 3, 6 and 9 have some appeal, although it should be noted that all of the models fall short of predicting the range of variability in yield. This was a recurring problem encountered in training the network classifiers, and in fact was one of the driving forces in development of the training approach. The common traits of the models exhibiting the greatest spread in yield prediction include fertility and conductivity features.

Table 8 summarizes the SSE and RMS terms associated with the trained neural network classifiers applied to the task of predicting the yield of the Test data set. The RMS values differ little when comparing the various data models. A review of the SSE terms revealed that Models 5, 6, 8 and 10 did better at predicting yield than the remaining models, with Model 8 having the lowest SSE. While the SSE and RMS errors in Table 8 indicate that some classification models outperformed others, plots of the predicted versus actual yields illustrated the limitation of the predictive ability of the artificial neural network classifiers. Figures 7 through 10 visually illustrate that Models 5, 8 and 10 show a limitation in the ability of the network classifiers to predict higher yields, while the same clustering in Model 6 is not quite as evident. One of the characteristics of all four of these models was the inclusion of fertility and conductivity features. This illustrates the potential impact these features have on yield, at least during the 1998 cropping season.
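The comparison statistics discussed above can be computed as in the sketch below. The formula used for RMS error is not stated in this section, so taking it as the square root of the SSE divided by the number of observations is an assumption; the function names and the three-point data set are hypothetical and are not values from Tables 7 or 8.

```python
import math
import statistics


def sse(actual, predicted):
    """Sum of squared differences between actual and predicted yields."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


def rms(actual, predicted):
    """Root mean square error, assumed here to be sqrt(SSE / N)."""
    return math.sqrt(sse(actual, predicted) / len(actual))


# Hypothetical yields (Mg/ha): the predictions match the mean of the
# observations but show only half of their spread.
actual = [6.0, 8.0, 10.0]
predicted = [7.0, 8.0, 9.0]

print("SSE:", sse(actual, predicted))
print("RMS:", rms(actual, predicted))
print("means:", statistics.mean(actual), statistics.mean(predicted))
print("std devs:", statistics.stdev(actual), statistics.stdev(predicted))
```

The example is constructed so that the predicted and actual means agree while the predicted standard deviation is half the actual value, the same pattern reported for Model 4.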

When SSE and RMS are considered together with the yield prediction statistics, it appears that Model 6 does the best job of predicting yield. Included in this model were fertility, conductivity and spectral features. A noteworthy item is the lack of directly measured elevation features in this data model. Just as many of the features were included in the data models because of availability, it is felt that some important controlling features may have been omitted, such as the quantification of soil compaction, or the specification of soil mapping units. Obviously these shortcomings provide the impetus for additional work in the area of neural network classifiers applied to the spatial prediction of yield.

Totally lacking from the effort identified in this document is the consideration of plant genetics, cultural practices, or temporal variations. The true value of any effort to predict spatial yield variability can only be realized if these latter factors are considered, as time and the accumulation of geographically referenced data permit. The value of the work contained in this manuscript lies in the demonstration of neural network classifiers applied to the task of predicting spatial yield. It is hoped that similar success is possible with the temporal aspects of the overall problem of predicting yield.

Conclusions

  1. Artificial neural network classifiers show promise for use in predicting spatial yield variability using fertility, elevation, electrical conductivity and spectral satellite image features that describe the actively growing corn crop.
  2. Limitations of the neural network classifier in this application include the inability to predict higher yields, thereby reducing the predictive power and utility of the classifier.
  3. While neural networks show promise for the application of predicting spatial yield, little insight was gained with respect to feature selection and incorporation into the model.

 

References

Acock, B. and Ya. Pachepsky. 1997. Holes in precision farming: mechanistic crop models. In Precision Agriculture ’97 Volume I: Spatial Variability in Soil and Crop, 413-420. Warwick University Conference Centre, UK, 7-10 September.

Ambuel, J. R., T. S. Colvin and D. L. Karlen. A fuzzy logic yield simulator for prescription farming. Transactions of the ASAE. 37(6): 1999-2009.

Audsley, E., B. J. Bailey, S. A. Beaulah, P. J. Maddaford, D. J. Parsons and R. P. White. 1997. In Precision Agriculture ’97 Volume II: Technology, IT and Management. 843-850. Warwick University Conference Centre, UK, 7-10 September.

Barnett, V., S. Landau, J. J. Colls, J. Craigon, R. A. C. Mitchell and R. W. Payne. 1997. Predicting wheat yields: the search for valid and precise models. In Precision Agriculture: Spatial and Temporal Variability of Environmental Quality. 79-99. Wageningen, Netherlands, 21-23 January.

Barrett, J. R. and B. M. Jacobson. 1995. Humanization of decision support for managing U. S. grain (soybean and corn) production. In Artificial Intelligence in Agriculture. 1-12. Wageningen, Netherlands, 29-31 May.

Center, B. and B. P. Verma. 1997. A fuzzy photosynthesis model for tomato. Transactions of the ASAE. 40(3): 815-821.

Diker, K., W. C. Bausch, and T. H. Podmore. GIS mapping of plant parameters and yield potential estimated by remote sensing. ASAE Paper No. 983143. St. Joseph, Mich.: ASAE.

Hanus, H. 1980. Regression agromet yield forecasting models. In Remote Sensing in Agriculture and Hydrology. 111-126. Ispra (Varese), Italy, 21 November-2 December.

Haskett, J. D., Y. A. Pachepsky and B. Acock. 1995. Agricultural Systems. 48: 73-86.

Jacobson, B. M. and J. W. Jones. 1996. Designing a decision support system for soybean management. In Sixth International Conference on Computers in Agriculture. 394-403. Cancun, Mexico.

Lechi, G. M. 1980. Survey of photointerpretation techniques in agricultural inventories. In Remote Sensing in Agriculture and Hydrology. 17-24. Ispra (Varese), Italy, 21 November-2 December.

Mallet, Ph. 1980. A short review of biological agromet models. In Remote Sensing in Agriculture and Hydrology. 127-131. Ispra (Varese), Italy, 21 November-2 December.

Maas, S. J. 1997. Structure and reflectance of irrigated cotton leaf canopies. Agronomy Journal. 89:54-59.

MATLAB. 1994. Neural Network TOOLBOX. Natick, MA: The Mathworks Inc.

Matthews, R. and S. Blackmore. 1997. Using crop models to determine optimum management practices in precision agriculture. In Precision Agriculture ’97 Volume I: Spatial Variability in Soil and Crop, 413-420. Warwick University Conference Centre, UK, 7-10 September.

McBratney, A.B., B.M. Whelan and T.M. Shatar. 1997. Variability and uncertainty in spatial, temporal and spatiotemporal crop-yield and related data. In Precision Agriculture ’97 : Spatial and Temporal Variability of Environmental Quality. 141-160. Wageningen, Netherlands, 21-23 January.

Meira, S., C. Hernandorena and E. Guevara. 1996. SUR95: decision support system for main crops. In Sixth International Conference on Computers in Agriculture. 899-904. Cancun, Mexico.

Pao, Y.H. 1989. Adaptive Pattern Recognition and Neural Networks. 120-127. New York, NY: Addison Wesley.

Steven, M. D. and C. Millar. 1997. Satellite monitoring for precision farm decision support. In Precision Agriculture ’97 Volume II: Technology, IT and Management. 697-704. Warwick University Conference Centre, UK, 7-10 September.

Wendroth, O., A. M. Al-Omran, C. Kirda, K. Reichardt, and D. R. Nielsen. 1992. State-space approach to spatial variability of crop yield. Soil Science Society of America Journal. 56:801-807.