Address Verification
For address standardization, we cross-reference details using four different address sources:
- National Address Dataset from the US Census
- Open Address dataset
- Simple Maps
- ESRI
Cross-referencing these sources is intended to ensure:
- Consistent formatting
- Accurate geocoding
- Reliable location identification
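As an illustration of the cross-referencing described above, here is a minimal Python sketch. It assumes each source has already returned a candidate address string for the same record. The source keys, the `SUFFIXES` table, and the `min_agreement` threshold are hypothetical choices made for this example, not a description of our production pipeline.

```python
import re
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical abbreviation table; a real one would be far more complete.
SUFFIXES = {"street": "st", "avenue": "ave", "road": "rd",
            "boulevard": "blvd", "drive": "dr"}


def normalize_address(raw: str) -> str:
    """Lowercase, drop punctuation, and abbreviate common street suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", raw.lower()).split()
    return " ".join(SUFFIXES.get(token, token) for token in tokens)


def cross_reference(
    candidates: Dict[str, str], min_agreement: int = 2
) -> Tuple[Optional[str], List[str]]:
    """Return the most common normalized address and the sources that agree.

    Returns (None, []) when fewer than `min_agreement` sources agree.
    """
    if not candidates:
        return None, []
    normalized = {src: normalize_address(addr) for src, addr in candidates.items()}
    best, count = Counter(normalized.values()).most_common(1)[0]
    if count < min_agreement:
        return None, []
    return best, sorted(src for src, addr in normalized.items() if addr == best)
```

Calling `cross_reference` with one candidate per source returns a single standardized form plus the list of sources that support it. A record with no agreement comes back as `None` so it can be handled separately.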
Contractor Verification
We match contractor information against:
- Publicly available state license files
- Business registration records
This helps confirm that contractors are:
- Properly licensed
- Legitimate operators in their respective fields
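The matching step above could be sketched roughly as follows. This is an illustrative example only: the `LicenseRecord` fields, the `"active"` status value, and the list of legal suffixes are assumptions, since state license files differ in layout and vocabulary.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class LicenseRecord:
    """Hypothetical shape of one row from a state license file."""
    license_number: str
    business_name: str
    status: str  # assumed values such as "active" or "expired"


# Legal-form suffixes ignored when comparing business names (assumed list).
LEGAL_SUFFIXES = {"llc", "inc", "corp", "co", "ltd"}


def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation, and remove legal-form suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def match_contractor(
    name: str, license_number: Optional[str], records: Iterable[LicenseRecord]
) -> Optional[LicenseRecord]:
    """Prefer an exact license-number match, then fall back to the name."""
    records = list(records)
    if license_number:
        wanted = license_number.strip().upper()
        for record in records:
            if record.license_number.strip().upper() == wanted:
                return record
    wanted_name = normalize_name(name)
    for record in records:
        if normalize_name(record.business_name) == wanted_name:
            return record
    return None


def is_properly_licensed(record: Optional[LicenseRecord]) -> bool:
    """A contractor counts as licensed only if matched and currently active."""
    return record is not None and record.status == "active"
```

Matching on the license number first avoids false positives from businesses that share a name. The name fallback covers permits that omit the number.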
Data Labeling Process
For permit classification, we employ a rigorous annotation process:
- Multiple independent annotators label each record
- Manual review resolves divergent responses
- The validation sample is 1-5% of the overall data
Our approach has achieved 98% accuracy in classifications, validated by construction industry experts.
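The annotation steps above can be sketched in Python as follows. The sketch assumes that any disagreement among annotators sends a record to manual review, and that the validation sample is drawn uniformly at random. Both are simplifying assumptions, and the function names are hypothetical.

```python
import random
from typing import Dict, Iterable, List, Tuple


def split_by_agreement(
    annotations: Dict[str, List[str]]
) -> Tuple[Dict[str, str], List[str]]:
    """Separate records with unanimous labels from those needing manual review."""
    agreed: Dict[str, str] = {}
    needs_review: List[str] = []
    for record_id, labels in annotations.items():
        if len(set(labels)) == 1:
            agreed[record_id] = labels[0]
        else:
            # Divergent (or missing) labels are resolved by a human reviewer.
            needs_review.append(record_id)
    return agreed, needs_review


def validation_sample(
    record_ids: Iterable[str], fraction: float = 0.02, seed: int = 0
) -> List[str]:
    """Draw a reproducible random sample covering 1-5% of the records."""
    if not 0.01 <= fraction <= 0.05:
        raise ValueError("fraction must be between 0.01 and 0.05")
    ids = sorted(record_ids)
    if not ids:
        return []
    sample_size = max(1, round(len(ids) * fraction))
    return random.Random(seed).sample(ids, sample_size)
```

A fixed seed keeps the validation sample stable between runs, which makes repeated accuracy checks comparable.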
Golden Dataset Methodology
Annotators independently solve tasks rather than validating model outputs. This:
- Prevents annotator bias
- Creates a “golden dataset” of correct answers
- Enables benchmarking of new model outputs without fresh validation
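To show how a golden dataset supports benchmarking without fresh validation, here is a minimal sketch. It assumes both the golden labels and the model outputs are keyed by record ID. The `benchmark` function and its result keys are hypothetical names for this example.

```python
from typing import Dict, List, Union

Report = Dict[str, Union[float, int, List[str]]]


def benchmark(predictions: Dict[str, str], golden: Dict[str, str]) -> Report:
    """Score model outputs against golden labels for the records both cover."""
    scored = [rid for rid in golden if rid in predictions]
    if not scored:
        raise ValueError("no overlap between predictions and golden dataset")
    mismatches = [rid for rid in scored if predictions[rid] != golden[rid]]
    return {
        "accuracy": (len(scored) - len(mismatches)) / len(scored),
        "n_scored": len(scored),
        "mismatches": mismatches,
        # Golden records the model produced no output for.
        "missing": [rid for rid in golden if rid not in predictions],
    }
```

Because the golden labels were produced independently of any model, the same dataset can score each new model version, and the `mismatches` list points reviewers at the records worth inspecting.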
