Key Stages in Data Analytics

Ignacio Colonna

Developing new agricultural technologies involves running multiple trials across different product stages. This often produces large datasets compiled across multiple geographies over time. Innovators know there is meaningful information to be extracted but can be unsure where to start. How can they sort through data that is often messy and in multiple formats, analyze it appropriately, and then apply the results to improve and optimize the technology? At AgriThority®, we conduct data analytics for clients to address key questions at each stage of the product development process and to uncover actionable insights.

The data analytics process at AgriThority® encompasses five stages:

Stage 1: Developing questions and preliminary assessment of the dataset

When working with a large dataset for a new technology, it is important to establish clear key questions and to confirm that the data are valid for the analytical approach needed to answer them. These questions form a hierarchy, starting with the more general ones and advancing into “breakouts” that analyze specific subsets of the entire dataset. This process takes time and diligence. It’s not enough to simply list the questions; you must also have a clear plan for how each one will be analyzed to obtain an unbiased, statistically robust answer. It is easy to fall into hidden biases if this step is not followed carefully.

Stage 2: Dataset compilation and first-stage quality control

Once you have determined the questions that need to be answered, it is time to formally compile and quality-check the data. Data can come in multiple file formats and in a variety of units: imperial, metric, volume-based. Standardizing the data before the actual analysis takes a considerable amount of time. Once a homogeneous dataset is assembled, data quality is inspected through observation and formal statistical procedures to identify “outliers”: points that cannot be included in the analysis, either for statistical reasons (e.g. a large deviation from the other data points) or for practical ones (e.g. a highly unusual testing environment outside the study target). We must also identify which response variables are suitable for analysis and which are not, depending on the outcome of this quality check.
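As an illustration of this stage, here is a minimal Python sketch of the two steps described above: converting mixed units to one standard, then flagging statistical outliers. It is a sketch under stated assumptions, not AgriThority's actual procedure. The unit labels, the sample records, and the choice of Tukey's fences (1.5 times the interquartile range) are all assumptions made for the example, and the bushel conversion applies to corn only (56 lb per bushel).

```python
from statistics import quantiles

# Corn only: 56 lb per bushel, 0.45359237 kg per lb, 0.40468564224 ha per acre.
KG_HA_PER_BU_AC_CORN = 56 * 0.45359237 / 0.40468564224


def to_kg_per_ha(value, unit):
    """Convert a corn yield to kg/ha. The unit labels are hypothetical."""
    if unit == "kg/ha":
        return value
    if unit == "t/ha":
        return value * 1000.0
    if unit == "bu/ac":
        return value * KG_HA_PER_BU_AC_CORN
    raise ValueError(f"unknown unit: {unit}")


def flag_outliers(values, k=1.5):
    """Return True for each value lying outside Tukey's fences."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


# Invented records in mixed units, for illustration only.
records = [
    (160, "bu/ac"), (10.2, "t/ha"), (9800, "kg/ha"), (10500, "kg/ha"),
    (158, "bu/ac"), (9.9, "t/ha"), (2100, "kg/ha"),
]
yields = [to_kg_per_ha(value, unit) for value, unit in records]
flags = flag_outliers(yields)

for y, is_outlier in zip(yields, flags):
    print(f"{y:10.1f} kg/ha  {'OUTLIER' if is_outlier else 'ok'}")
```

A statistical flag is only a prompt for review. As noted above, the decision to exclude a point also depends on a practical assessment of the testing environment.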

Stage 3: Verify “balance” in data breakouts

Large datasets are crucial for analysis but can be misleading. It is important to ensure that each data subset aimed at answering a key question is reasonably balanced, so you know you are comparing apples to apples. A strong imbalance across product stages, groups or geographies can bias the answers and prevent valid comparisons because of “statistical confounding”: hidden factors that have a relevant effect on the final outcome of the analysis.
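One simple way to check balance is to count observations in every combination of two factors and list the cells that are empty or thinly populated. The sketch below assumes trial records stored as dictionaries; the field names (`treatment`, `region`), the sample rows, and the minimum of three observations per cell are hypothetical choices for the example.

```python
from collections import Counter


def thin_cells(rows, factor_a, factor_b, min_n=3):
    """Return {(level_a, level_b): count} for cells with fewer than min_n rows.

    Empty cells are included with a count of 0.
    """
    counts = Counter((row[factor_a], row[factor_b]) for row in rows)
    levels_a = sorted({row[factor_a] for row in rows})
    levels_b = sorted({row[factor_b] for row in rows})
    return {
        (a, b): counts[(a, b)]
        for a in levels_a
        for b in levels_b
        if counts[(a, b)] < min_n
    }


# Invented trial records: the control is barely represented in the South.
rows = (
    [{"treatment": "treated", "region": "North"}] * 4
    + [{"treatment": "control", "region": "North"}] * 4
    + [{"treatment": "treated", "region": "South"}] * 4
    + [{"treatment": "control", "region": "South"}] * 1
)

thin = thin_cells(rows, "treatment", "region")
print(thin)
```

In this invented dataset, a treated-versus-control comparison pooled over both regions would lean heavily on Northern controls, so any regional difference could be mistaken for a product effect.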

Stage 4: Data analysis – start general and advance to more specific breakouts

The data analysis stage begins by answering big questions, such as “What is the overall effect of a product on crop yield?”, and progresses to more specific ones, such as “How does this effect change across crops, field management practices or environments?” All findings and statements in a data analytics process must be based on statistical analysis. It is always tempting to identify a visual “trend,” but a trend that lacks statistical significance may well be noise, so actions based on it can be highly risky. It is thus critical to re-check the data subsets used in each breakout analysis and to run multiple analyses on different subsets to confirm the findings for the more specific questions. Statistical mixed models help improve the precision of analysis on unbalanced datasets, within limits.
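To show what testing a visual trend can look like, the sketch below runs an exact sign-flip permutation test on paired yield differences (treated minus control within each trial). This is a deliberately simple stand-in, not the mixed-model analysis mentioned above, and the difference values are invented for the example.

```python
from itertools import product
from statistics import fmean


def sign_flip_test(differences):
    """Exact two-sided sign-flip permutation test for a mean paired difference.

    Under the null hypothesis of no treatment effect, each difference is
    equally likely to be positive or negative, so every sign assignment is
    enumerated and compared with the observed mean.
    """
    observed = abs(fmean(differences))
    n_extreme = 0
    n_total = 0
    for signs in product((1, -1), repeat=len(differences)):
        permuted = abs(fmean([s * d for s, d in zip(signs, differences)]))
        if permuted >= observed - 1e-12:  # tolerance for float rounding
            n_extreme += 1
        n_total += 1
    return n_extreme / n_total


# Invented differences in t/ha, one per trial.
consistent = [0.31, 0.22, 0.40, 0.15, 0.28, 0.35, 0.19, 0.26]
noisy = [0.30, -0.25, 0.10, -0.20, 0.05, -0.10]

print(f"consistent response: p = {sign_flip_test(consistent):.4f}")
print(f"noisy response:      p = {sign_flip_test(noisy):.4f}")
```

Enumerating every sign assignment is only practical for a modest number of trials, since the count doubles with each added trial. Larger or unbalanced datasets call for model-based methods such as the mixed models noted above.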

Stage 5: Geographical and environmental analysis

The last step of data analysis is to bring in additional data to determine how the technology performs across different geographies and environments. Checking these variables will strengthen your analysis and allow you to develop preliminary product positioning strategies. This typically involves managing massive amounts of soil and weather information, selecting the most relevant variables for the analysis and establishing a tentative model of how the environment modulates the efficacy of the products under analysis. Ultimately, a model is fit that supports a predictive strategy for preliminary product positioning and a quantitative risk assessment of the product’s variability across geographies and seasons.
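As a toy version of such a model, the sketch below fits an ordinary least-squares line relating the product's yield response to a single environmental variable and uses it to predict the response at a new site. Real analyses of this kind involve many soil and weather variables; the single covariate (seasonal rainfall), the function names, and the data values here are assumptions made for illustration.

```python
from statistics import fmean


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x_new):
    """Predict the response at x_new from a (slope, intercept) pair."""
    slope, intercept = model
    return intercept + slope * x_new


# Invented data: seasonal rainfall (mm) and yield response (t/ha) per site.
rainfall_mm = [310, 420, 480, 560, 640, 720]
response_t_ha = [0.12, 0.18, 0.27, 0.30, 0.41, 0.44]

model = fit_line(rainfall_mm, response_t_ha)
print(f"slope = {model[0]:.5f} t/ha per mm, intercept = {model[1]:.3f} t/ha")
print(f"predicted response at 500 mm: {predict(model, 500):.2f} t/ha")
```

A fitted line like this is only as reliable as the range of environments it was built on, so predictions outside the sampled rainfall range should be treated with caution.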

When analyzing data for our clients, AgriThority® provides a robust, neutral assessment of efficacy across several events and monitoring factors. We then provide a critical diagnosis of development gaps, with an actionable plan for moving forward.

Reach out to AgriThority® for help generating credible, robust data and analyzing existing data for insights, so your data doesn’t go to waste.
