# Data Labeling and Annotation Process

> Understand how Shovels ensures high-quality, accurately classified permit data through our rigorous annotation and validation methodology.

Shovels classifies permit data using a labeling process built on three practices: independent human annotation, validation sampling proportionate to each category, and a reusable golden dataset.

## Our Approach

### Multiple Independent Annotators

Each record is labeled by **multiple independent annotators**. When their responses diverge, we manually review and resolve the discrepancies.
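
As a rough illustration of this rule, the sketch below separates records whose annotators all agree from records that need manual review. It is not Shovels' internal tooling: the function name `split_by_agreement`, the tuple layout, and the sample labels are all hypothetical.

```python
from collections import defaultdict


def split_by_agreement(annotations, min_annotators=2):
    """Split records into agreed labels and records needing manual review.

    `annotations` is an iterable of (record_id, annotator_id, label) tuples.
    The tuple layout and `min_annotators` are assumptions made for this sketch.
    """
    labels_by_record = defaultdict(list)
    for record_id, _annotator_id, label in annotations:
        labels_by_record[record_id].append(label)

    agreed = {}
    needs_review = {}
    for record_id, labels in labels_by_record.items():
        distinct = sorted(set(labels))
        if len(labels) >= min_annotators and len(distinct) == 1:
            # Every annotator independently gave the same label.
            agreed[record_id] = distinct[0]
        else:
            # Diverging labels (or too few annotators): send to a human reviewer.
            needs_review[record_id] = distinct
    return agreed, needs_review


# Hypothetical sample data.
annotations = [
    ("permit-1", "annotator-a", "roofing"),
    ("permit-1", "annotator-b", "roofing"),
    ("permit-2", "annotator-a", "solar"),
    ("permit-2", "annotator-b", "electrical"),
]
agreed, needs_review = split_by_agreement(annotations)
```

Here `permit-1` lands in `agreed`, while `permit-2` is queued in `needs_review` along with the competing labels so a reviewer can resolve the discrepancy.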

### Validation Sample Size

The validation sample size is proportionate to each category's representation in the dataset:

* Typically **1-5% of overall data**
* Ensures adequate validation points for every category
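
One way to picture proportionate sampling is to compute a per-category sample size from a fixed rate, with a floor so that small categories still receive enough validation points. This is a minimal sketch under stated assumptions: the 2% rate is simply one value inside the 1-5% range above, and the floor of 30 records, the function name, and the category counts are hypothetical rather than Shovels' actual parameters.

```python
import math


def validation_sample_sizes(category_counts, rate=0.02, min_per_category=30):
    """Return how many records to validate in each category.

    `rate` and `min_per_category` are illustrative assumptions, not
    documented Shovels settings.
    """
    sizes = {}
    for category, count in category_counts.items():
        # Round before ceil so floating-point noise cannot add a stray record.
        proportional = math.ceil(round(count * rate, 9))
        # Apply the floor, but never ask for more records than exist.
        sizes[category] = min(count, max(proportional, min_per_category))
    return sizes


# Hypothetical category counts.
counts = {"roofing": 50_000, "solar": 4_000, "demolition": 20}
sizes = validation_sample_sizes(counts)
```

With these inputs, large categories are sampled at the proportional rate, while a very small category such as `demolition` is validated in full because it has fewer records than the floor.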

## Golden Dataset Methodology

A key aspect of our methodology is having annotators **independently solve the task** rather than validate model outputs.

This approach:

* **Prevents annotator bias**, because annotators never see a model's prediction before choosing their own label
* **Creates a "golden dataset"** of correct answers
* **Enables benchmarking** of new model outputs across iterations without requiring fresh human validation each time
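
The benchmarking idea can be sketched as a comparison of each model iteration against the same fixed set of human-labeled answers. The example below is an assumption-laden illustration: `benchmark_accuracy`, the record IDs, and the labels are hypothetical, and plain accuracy is used only because it is the simplest metric to show.

```python
def benchmark_accuracy(golden, predictions):
    """Return the fraction of golden records the model labeled correctly.

    `golden` and `predictions` map record_id -> label. A record missing
    from `predictions` counts as incorrect.
    """
    if not golden:
        raise ValueError("golden dataset is empty")
    correct = sum(
        1 for record_id, expected in golden.items()
        if predictions.get(record_id) == expected
    )
    return correct / len(golden)


# Hypothetical golden labels produced by independent annotators.
golden = {
    "permit-1": "roofing",
    "permit-2": "solar",
    "permit-3": "hvac",
    "permit-4": "electrical",
}

# Hypothetical outputs from two model iterations.
model_v1 = {"permit-1": "roofing", "permit-2": "electrical",
            "permit-3": "hvac", "permit-4": "plumbing"}
model_v2 = {"permit-1": "roofing", "permit-2": "solar",
            "permit-3": "hvac", "permit-4": "plumbing"}

v1_accuracy = benchmark_accuracy(golden, model_v1)
v2_accuracy = benchmark_accuracy(golden, model_v2)
```

Because the golden labels stay fixed, the two scores are directly comparable, and no new human annotation is needed to evaluate the second iteration.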

## Why This Matters

This approach is particularly effective for accurately classifying permit descriptions, which often contain:

* Industry-specific terminology
* Abbreviations
* Inconsistent formatting

## Accuracy Results

In our case study on using specialist participants for data labeling, we achieved **98% accuracy** in our classifications by incorporating a panel of experts from the construction industry.

<Info>
  Learn more in our [blog post on data labeling with construction industry specialists](https://www.shovels.ai/blog/).
</Info>

## Related Articles

* [Data verification methods](/docs/knowledge-base/data/quality/verification-methods)
* [Key differentiators](/docs/knowledge-base/getting-started/key-differentiators)
