Why run a gap analysis before you ask for a quote?
Most enrichment projects go wrong in the scoping phase, not the delivery phase. A sales director forwards a CRM export and asks for "direct dials and emails appended to everything." The supplier runs a match pass, returns a file with 28% coverage, and the buyer is surprised. The issue was not the supplier's data quality. It was that the buyer never checked how many records were matchable in the first place.
A gap analysis takes two to four hours of analyst time and prevents that conversation entirely. It also changes the economics. If your CRM holds 40,000 contacts but only 22,000 have enough identity data to match, you are pricing 22,000 enrichments, not 40,000, which is a 45% cut in billable volume. The matched output is also higher quality, because the supplier is not returning partial appends against weak records.
There is a compliance angle too. Under UK GDPR, data enrichment is a form of data processing. Knowing what fields are missing from your records, and documenting that you identified the gap, forms part of the paper trail that supports your Legitimate Interests Assessment (LIA) for B2B contacts.
What are the five dimensions of a CRM gap analysis?
1. Field completeness
Completeness is the simplest dimension: for each field, what percentage of records carry a non-null value? Count nulls, blanks, and placeholder values ("N/A", "Unknown", "-") as missing. A completeness score of 55% on the email field means 45% of your contacts cannot be reached by email from your own data today.
Fields worth measuring in any UK B2B CRM audit, in rough order of commercial impact:
- Business email address
- Direct telephone number (DDI) or mobile
- Job title
- Seniority / decision-maker flag
- Company name
- UK SIC 2007 code or sector
- Royal Mail postcode
- LinkedIn URL
- Number of employees or revenue band
- Last verified or last active date
2. Field validity
A field can be populated without being valid. An email address can be perfectly well formed, but if the individual left the company 18 months ago it is functionally worthless. Validity checking has two layers: format validation (does the value look like an email, a UK phone number, a correctly formatted postcode?) and freshness validation (when was this field last verified, and does the underlying contact still hold the role?).
Format validation is quick. Run a regex against email addresses to flag anything without an @ symbol or a valid top-level domain. For UK telephone numbers, first normalise the value (strip spaces and punctuation, and convert a +44 prefix to a leading 0), then check that it is 11 digits and starts with 01, 02, 03, 07, or 08. For postcodes, validate against the Royal Mail postcode format. A field that fails format validation is as useless as a null field for enrichment matching.
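The three format checks can be sketched in Python as follows. This is a minimal illustration, not a production validator: the function names are hypothetical, the email pattern only checks for the something@domain.tld shape, the postcode pattern covers the standard outward-plus-inward shape (it ignores special cases such as GIR 0AA), and the phone rule is exactly the 11-digit, 01/02/03/07/08 rule described above.

```python
import re

# Deliberately simple shape checks (assumption: not full RFC 5322 or
# Ofcom numbering-plan validation): something@domain.tld, TLD of 2+ letters.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")
# UK postcode shape: outward code (e.g. SW1A, M1, B33) + inward code (e.g. 1AA).
POSTCODE_RE = re.compile(r"[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}")
# 11 digits starting 01, 02, 03, 07 or 08, per the rule in the text.
UK_PHONE_RE = re.compile(r"0[12378][0-9]{9}")


def valid_email(value: str) -> bool:
    return EMAIL_RE.fullmatch(value.strip()) is not None


def normalise_uk_phone(value: str) -> str:
    """Keep digits only and convert a +44 / 0044 prefix to a leading 0."""
    digits = re.sub(r"\D", "", value)
    if digits.startswith("0044"):
        digits = digits[4:]
    elif digits.startswith("44"):
        digits = digits[2:]
    else:
        return digits
    # "+44 (0)20 ..." already carries the trunk 0, so do not add a second one.
    return digits if digits.startswith("0") else "0" + digits


def valid_uk_phone(value: str) -> bool:
    return UK_PHONE_RE.fullmatch(normalise_uk_phone(value)) is not None


def valid_postcode(value: str) -> bool:
    return POSTCODE_RE.fullmatch(value.strip().upper()) is not None
```

Normalising before validating matters: "+44 (0)20 7946 0000" and "020 7946 0000" are the same number, and a raw digit count would reject the first.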
3. Match-key sufficiency
Match-key sufficiency is the most underappreciated dimension. An enrichment supplier matches your records against their reference file using one or more identity signals. The richness of those signals determines whether a match is possible at all, and how confident the match result will be.
For UK B2B enrichment, the match-key hierarchy looks like this:
- Strong key: business email address plus company name (deterministic match, high confidence)
- Good key: first name, last name, job title, company name, postcode (probabilistic, 85% to 95% confidence when all five present)
- Weak key: first name, last name, company name only (confidence below 75%, high false-positive risk)
- Unmatchable: first name only, or company name without any personal identifier
Count how many records fall into each tier. The strong and good tiers together are your realistic enrichable universe. The weak and unmatchable tiers are where you either do manual research or accept a data gap permanently.
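A sketch of the tier classification in Python, assuming each record is a dictionary keyed by hypothetical column names (`email`, `company_name`, `first_name`, `last_name`, `job_title`, `postcode`) and that placeholder values have already been blanked. It applies the hierarchy top-down, so a record qualifies for the highest tier whose fields are all present.

```python
from collections import Counter
from typing import Mapping, Optional

# Hypothetical field names; map them to your CRM export's column headers.
GOOD_KEY_FIELDS = ("first_name", "last_name", "job_title", "company_name", "postcode")
WEAK_KEY_FIELDS = ("first_name", "last_name", "company_name")


def _has(record: Mapping[str, Optional[str]], field: str) -> bool:
    value = record.get(field)
    return value is not None and str(value).strip() != ""


def match_key_tier(record: Mapping[str, Optional[str]]) -> str:
    if _has(record, "email") and _has(record, "company_name"):
        return "strong"
    if all(_has(record, f) for f in GOOD_KEY_FIELDS):
        return "good"
    if all(_has(record, f) for f in WEAK_KEY_FIELDS):
        return "weak"
    # Anything else, including an email with no company name, lands here.
    return "unmatchable"


def tier_counts(records) -> dict[str, int]:
    return dict(Counter(match_key_tier(r) for r in records))
```

Records that fit none of the listed tiers, such as an email address with no company name, fall through to "unmatchable" in this sketch; you may prefer to review those by hand.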
4. Duplicate volume
Duplicates inflate your apparent CRM size and waste enrichment budget. A record appearing three times in your system means you pay three times for the append, then merge the records later and discard two of the enriched copies. For a CRM of 30,000 records with a 12% duplicate rate, that is 3,600 wasted enrichment credits before you start.
Run two passes. The first is deterministic: flag every pair of records that share the same email address or the same phone number. These are almost always duplicates. The second pass is probabilistic: normalise first name, last name, and company name (lower-case, strip punctuation and legal suffixes such as "Ltd" or "plc", and map common nicknames to a canonical form), then flag pairs that match above a threshold score (typically a Levenshtein distance below 2 on each field). Normalisation matters because edit distance alone will not catch nicknames: "Rob" and "Robert" are three edits apart. With it, the probabilistic pass catches records entered with slight spelling variations or nickname differences, such as "Rob Smith at Acme" and "Robert Smyth at Acme Ltd".
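The probabilistic pass can be sketched in Python with a hand-rolled Levenshtein distance. The nickname map and legal-suffix list below are tiny illustrative assumptions (a real nickname table would be far larger), and the field names are hypothetical. The threshold of at most one edit per field mirrors "distance below 2".

```python
import re
from itertools import combinations

# Tiny illustrative nickname map (assumption); extend for real use.
NICKNAMES = {"rob": "robert", "bob": "robert", "bill": "william", "liz": "elizabeth"}
# Common UK legal suffixes removed from company names before comparison.
LEGAL_SUFFIXES = {"ltd", "limited", "plc", "llp"}


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def normalise_name(value: str) -> str:
    name = re.sub(r"[^a-z]", "", value.lower())
    return NICKNAMES.get(name, name)


def normalise_company(value: str) -> str:
    words = re.sub(r"[^a-z0-9 ]", " ", value.lower()).split()
    return " ".join(w for w in words if w not in LEGAL_SUFFIXES)


def is_probable_duplicate(r1: dict, r2: dict, max_distance: int = 1) -> bool:
    pairs = [
        (normalise_name(r1["first_name"]), normalise_name(r2["first_name"])),
        (normalise_name(r1["last_name"]), normalise_name(r2["last_name"])),
        (normalise_company(r1["company_name"]), normalise_company(r2["company_name"])),
    ]
    return all(levenshtein(a, b) <= max_distance for a, b in pairs)


def probable_duplicate_pairs(records: list[dict]) -> list[tuple[int, int]]:
    # Compares every pair (O(n^2)): fine for a sample, slow for a large CRM.
    return [(i, j) for (i, r1), (j, r2) in combinations(enumerate(records), 2)
            if is_probable_duplicate(r1, r2)]
```

The all-pairs comparison is the simplest correct version; on a full CRM you would normally compare only records that share a cheap "blocking" key, such as the normalised company name, to keep the run time manageable.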
5. Field-by-field decay rate
Decay rate is the hardest dimension to measure from within your CRM alone, but it is critical for understanding how quickly you need to re-enrich after the first pass. UK B2B contact data decays at roughly 25% to 35% per year, driven primarily by job moves, redundancies, company restructures, and domain changes.
To estimate your current decay rate without sending a campaign, cross-reference your CRM against a reference dataset (or run a validation pass through an email verification service or telephone verification service). The proportion of records that come back as invalid gives you a point-in-time decay estimate. Divide that figure by the number of years since the records were last verified to get an annualised rate. A CRM where 30% of emails come back as invalid and records were last verified two years ago implies roughly 15% annual decay (about 16% if you treat decay as compounding), which is below the 25% to 35% average and suggests good CRM hygiene historically.
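The annualisation step is simple arithmetic. The Python sketch below (function name hypothetical) returns both the straight-line estimate and a compound estimate, which assumes a constant share of the still-valid records decays each year.

```python
def annualised_decay(invalid_share: float, years_since_verified: float) -> tuple[float, float]:
    """Return (linear, compound) annual decay estimates from one validation pass."""
    if years_since_verified <= 0:
        raise ValueError("years_since_verified must be positive")
    linear = invalid_share / years_since_verified
    # Compound view: the share still valid after n years is (1 - annual_rate) ** n.
    compound = 1 - (1 - invalid_share) ** (1 / years_since_verified)
    return linear, compound


linear, compound = annualised_decay(0.30, 2)
print(f"linear {linear:.1%}, compound {compound:.1%}")  # linear 15.0%, compound 16.3%
```

The two estimates diverge more as the gap since last verification grows, so for records untouched for three or more years the compound figure is the more realistic one.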
How to run the analysis: SQL, export, and audit
If your CRM exports to a flat file, most of the analysis is a set of SQL queries or pivot-table operations. For a Salesforce or HubSpot export, a single spreadsheet with one row per contact and one column per field is enough to run completeness and format-validity checks in under an hour.
Completeness query pattern
For each field you want to audit, the logic is:
COUNT(records where field is not null and not blank) / COUNT(all records) * 100
In SQL on an exported table called crm_contacts:
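```sql
-- NULLIF(TRIM(col), '') turns blank strings into NULL, so COUNT(col)
-- counts only genuinely populated values.
-- No WHERE clause: every percentage must use the full table as its denominator.
SELECT
  COUNT(*) AS total_records,
  ROUND(100.0 * COUNT(NULLIF(TRIM(email), '')) / COUNT(*), 1) AS email_completeness_pct,
  ROUND(100.0 * COUNT(NULLIF(TRIM(phone_direct), '')) / COUNT(*), 1) AS phone_completeness_pct,
  ROUND(100.0 * COUNT(NULLIF(TRIM(job_title), '')) / COUNT(*), 1) AS job_title_completeness_pct,
  ROUND(100.0 * COUNT(NULLIF(TRIM(linkedin_url), '')) / COUNT(*), 1) AS linkedin_completeness_pct
FROM crm_contacts;
```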
Adjust for your field names and for the placeholder values your team uses (filter them the same way as nulls). If you are working in Excel rather than SQL, a COUNTIF formula with a "not blank" condition gives the same result per column.
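If the export is a CSV file, the same completeness calculation, including placeholder handling, can be sketched in Python. The placeholder set and the file and column names are assumptions; replace them with the values your team actually uses.

```python
import csv
from typing import Mapping, Optional

# Placeholder values treated as missing (assumption: extend to match your data).
PLACEHOLDERS = {"", "n/a", "na", "unknown", "-", "none", "null"}


def load_rows(path: str) -> list[dict[str, str]]:
    """Read a CRM export with one row per contact and one column per field."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


def is_populated(value: Optional[str]) -> bool:
    return value is not None and str(value).strip().lower() not in PLACEHOLDERS


def completeness(rows: list[Mapping[str, Optional[str]]], fields: list[str]) -> dict[str, float]:
    """Percentage of rows with a real (non-null, non-blank, non-placeholder) value."""
    total = len(rows)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: round(100.0 * sum(is_populated(row.get(field)) for row in rows) / total, 1)
        for field in fields
    }


# Example with a hypothetical file name:
# rows = load_rows("crm_contacts.csv")
# print(completeness(rows, ["email", "phone_direct", "job_title", "linkedin_url"]))
```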
Duplicate detection without specialist tools
Sort the export by email address, then use a COUNTIF to flag any email that appears more than once. Do the same by phone number. Then sort by company name plus last name and eyeball the first 500 rows. You will find most of the high-confidence duplicates in that pass. For a CRM of over 20,000 contacts, a proper deduplication tool or a short data-cleansing commission from your enrichment supplier is worth the cost.
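The COUNTIF step is the deterministic pass, and it can be sketched in Python as a grouping by exact key. The column names `email` and `phone` are hypothetical. Emails are compared case-insensitively after trimming, and phone numbers after stripping everything except digits; a fuller version would also convert a +44 prefix to a leading 0 before comparing.

```python
import re
from collections import defaultdict
from typing import Optional


def _email_key(value: Optional[str]) -> str:
    return (value or "").strip().lower()


def _phone_key(value: Optional[str]) -> str:
    return re.sub(r"\D", "", value or "")


def deterministic_duplicate_groups(rows: list[dict]) -> dict[tuple[str, str], list[int]]:
    """Map (field, key) to the row indexes sharing that exact email or phone."""
    groups: dict[tuple[str, str], list[int]] = {}
    for field, key_fn in (("email", _email_key), ("phone", _phone_key)):
        by_key: defaultdict[str, list[int]] = defaultdict(list)
        for idx, row in enumerate(rows):
            key = key_fn(row.get(field))
            if key:  # blanks are missing data, not a shared identity
                by_key[key].append(idx)
        for key, idxs in by_key.items():
            if len(idxs) > 1:
                groups[(field, key)] = idxs
    return groups
```

Skipping blank keys matters: without that check, every record with an empty email would be flagged as a duplicate of every other.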
CRM fields audit: typical UK B2B benchmarks
The table below gives benchmark completeness ranges for UK B2B CRMs that have not been enriched in the past 24 months. Your figures may sit outside these ranges depending on how thoroughly your sales team captures data at point of entry, and how much inbound demand you receive versus outbound prospecting.
| CRM field | Typical completeness (UK B2B) | Enrichable? | Notes |
|---|---|---|---|
| Company name | 90%+ | No (usually present) | Needed as match key; low absence rate |
| Postcode / county | 65%–85% | Yes, via Companies House | Can be derived from registered address |
| UK SIC 2007 code | 20%–50% | Yes, via Companies House | Often blank if CRM was not set up to capture it |
| Job title | 40%–65% | Yes | Quality varies; titles need normalisation |
| Business email | 30%–60% | Yes | Often the highest-value field to enrich |
| Direct-dial number | 15%–40% | Yes | DDIs are harder to source than switchboard numbers |
| Mobile number | 10%–30% | Yes | Business mobiles increasingly available via public sources |
| LinkedIn URL | 10%–25% | Yes | High commercial value for ABM targeting |
| Employee count / revenue | 25%–55% | Yes, via Companies House | Annual filed accounts give employee and turnover bands |
| Duplicate records | 5%–15% of total | Resolve before enriching | Higher in CRMs with multiple data-entry points |
In our experience, the gap between what sales teams think their CRM completeness is and what the audit reveals is usually 15 to 25 percentage points. Email completeness feels like 70% because everyone enters an email at the point a deal is opened, but the CRM also holds thousands of older contacts, imported list segments, and inbound leads that were never fully qualified.
How the gap analysis shapes enrichment scope and price
Once you have your five dimensions measured, you can translate them into an enrichment brief. The brief answers four questions:
- Which fields need appending? Prioritise by commercial impact. Direct dials and business emails drive the highest return on enrichment cost for outbound campaigns.
- How many records are in scope? Your matchable universe (strong and good match-key tiers only). Give this number to your supplier, not the total CRM size.
- What is the realistic match rate? For a UK B2B file enriched against a well-sourced reference dataset, expect 40% to 75% match on email and 25% to 55% on direct dial, depending on job seniority and industry sector. C-suite contacts in regulated sectors typically have lower match rates than mid-level roles in sectors with high staff turnover.
- What fields need validation rather than appending? Records that already carry a value but may be stale need a verification pass, not an append. Email verification and telephone verification are priced differently from append work and should be scoped separately.
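The four answers can be captured as a small data structure so that every supplier receives the same scope. The Python sketch below is one hypothetical shape for such a brief; it converts the matchable universe and the quoted match-rate ranges into an expected number of returned records per field.

```python
from dataclasses import dataclass, field


@dataclass
class EnrichmentBrief:
    # Strong + good match-key tiers only, not the total CRM size.
    matchable_records: int
    # Field to append -> (low, high) expected match rate.
    append_fields: dict[str, tuple[float, float]] = field(default_factory=dict)
    # Fields to verify on records that already hold a value; priced separately.
    verify_fields: list[str] = field(default_factory=list)

    def expected_yield(self) -> dict[str, tuple[int, int]]:
        """Rough range of records expected back with each field appended."""
        return {
            name: (round(self.matchable_records * lo), round(self.matchable_records * hi))
            for name, (lo, hi) in self.append_fields.items()
        }


brief = EnrichmentBrief(
    matchable_records=22_000,
    append_fields={"business_email": (0.40, 0.75), "direct_dial": (0.25, 0.55)},
    verify_fields=["business_email", "direct_dial"],
)
print(brief.expected_yield())
# {'business_email': (8800, 16500), 'direct_dial': (5500, 12100)}
```

The rates above are the ranges quoted in this section, applied to the 22,000-record example from earlier; substitute your own supplier's figures.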
Supplying this brief means your supplier can provide a fixed price rather than an estimate. It also means you can compare quotes accurately because every supplier is working from the same scope. Without it, one supplier may quote on total CRM size and another on matchable records, and the numbers will be incomparable.
For more on what enrichment delivers commercially once the gaps are filled, see our piece on CRM enrichment ROI for UK B2B teams.
Deduplicating before enriching: the order matters
Resolve duplicates before you enrich, and base the final gap-analysis figures on the deduplicated file. Running enrichment before deduplication is a common mistake and a costly one. Consider a CRM of 25,000 contacts with 2,500 duplicates. You commission an enrichment project, pay for 25,000 appends, and the supplier returns a file with 18,000 enriched records. You then run your deduplication pass and find that 2,200 of those enriched records form 1,100 duplicate pairs. Merging each pair keeps one enriched record and discards the other, so you have paid for 1,100 enrichments you immediately threw away.
Deduplication also improves match rate. When a supplier's matching algorithm encounters two records for the same individual, it sometimes splits the match evidence across both records rather than consolidating it. One record gets the email, the other gets the phone number, and neither gets both. Merging the records first presents the matching algorithm with a single, richer target, which raises confidence and completeness in the output.
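Merging a duplicate group into one richer record needs a survivorship rule: which value wins when records disagree. The Python sketch below uses one common rule, stated here as an assumption: prefer the most recently verified record, and fill its blanks from older records. The field name `last_verified` (an ISO date string) and the placeholder set are hypothetical.

```python
from typing import Mapping, Optional

# Values treated as blank when merging (assumption: extend to match your data).
PLACEHOLDERS = {"", "n/a", "unknown", "-"}


def _populated(value: Optional[str]) -> bool:
    return value is not None and str(value).strip().lower() not in PLACEHOLDERS


def merge_duplicates(group: list[Mapping[str, Optional[str]]]) -> dict[str, str]:
    """Collapse one duplicate group into a single record.

    Survivorship rule (assumption): take each field from the most recently
    verified record, falling back to older records only where it is blank.
    """
    # ISO dates (YYYY-MM-DD) sort correctly as strings; missing dates sort last.
    ordered = sorted(group, key=lambda r: r.get("last_verified") or "", reverse=True)
    merged: dict[str, str] = {}
    for record in ordered:
        for name, value in record.items():
            if name not in merged and _populated(value):
                merged[name] = value
    return merged
```

In the example of one record holding the email and its duplicate holding the phone number, the merged record carries both, which is the single, richer target the matching algorithm needs.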
Practical sequencing checklist
- Export all contacts from CRM to a flat file.
- Run completeness counts for each priority field.
- Run format-validity checks (email regex, phone format, postcode format).
- Classify records by match-key tier (strong, good, weak, unmatchable).
- Run duplicate detection (deterministic pass, then probabilistic pass).
- Resolve duplicates and re-import the clean file.
- Document gap analysis output (field by field, with counts and percentages).
- Submit clean file and gap analysis to enrichment supplier for scoped quote.
