Data Audit & Data Management

The Impact

This analysis gave the team a clear view of how widespread duplicate company records were inside HubSpot and revealed the data completeness issues contributing to them. With Petavue’s structured process, the findings translate directly into improvements that support revenue teams and operational efficiency.

Improved visibility into 2,773 duplicate groups, enabling targeted cleanup efforts.
Clear identification of missing fields and inconsistent data entry patterns, paving the way for stronger CRM data integrity.
Actionable recommendations for building a more accurate and unified company database.

DATA SOURCE

Salesforce

HubSpot

Craft the Plan

Petavue starts by designing a clear, reviewable plan for the analysis and confirming any key assumptions before it runs.

In this case, the initial goal was to find duplicate companies in HubSpot using company name and address. When Petavue inspected the HubSpot schema, it found that Street Address wasn’t an available field in the Companies table. Rather than picking a substitute field on its own, it surfaced the alternatives for the user:

Company Name + City
Company Name + State/Region
Company Name + Postal Code
Company Name only

Petavue then asked the user to choose which option they preferred for identifying duplicates.

The user selected Company Name + State/Region as the most meaningful combination.

With that clarification captured, Petavue generated a concrete analysis plan:

Pull all records from the HubSpot Companies table (hubspot_companies)
Use name and state/region as the grouping keys
Count how many companies appear in each name + state/region group
Filter to show only groups with more than one record (potential duplicates)
Sort the groups in descending order by duplicate count
Return both the grouped data and the counts for full visibility
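
The plan above can be sketched in a few lines of Python. This is only an illustration of the grouping logic, not the code Petavue generates: the record layout (dicts with `name` and `state` keys), the function name `find_duplicate_groups`, and the sample rows are all assumptions. It also assumes exact matching on both keys and treats a missing State/Region as a grouping value in its own right.

```python
from collections import Counter

def find_duplicate_groups(companies):
    """Return (name, state, count) tuples for groups with more than one record."""
    # Group on the two keys from the plan: company name + state/region.
    counts = Counter((c.get("name"), c.get("state")) for c in companies)
    # Keep only groups with more than one record (potential duplicates).
    groups = [(name, state, n) for (name, state), n in counts.items() if n > 1]
    # Sort by duplicate count, largest first; break ties by name, then state.
    groups.sort(key=lambda g: (-g[2], g[0] or "", g[1] or ""))
    return groups

# Hypothetical sample rows standing in for the hubspot_companies table.
sample = [
    {"name": "Acme Inc", "state": "Colorado"},
    {"name": "Acme Inc", "state": "Colorado"},
    {"name": "Acme Inc", "state": "Utah"},
    {"name": "Globex", "state": None},
    {"name": "Globex", "state": None},
    {"name": "Globex", "state": None},
    {"name": "Initech", "state": "Georgia"},
]
duplicate_groups = find_duplicate_groups(sample)
```

In this sample, "Acme Inc" in Utah and "Initech" drop out because each appears only once for its name and state/region combination.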

Nothing is executed until the user approves this plan, so they know exactly what logic will be applied to their data.

02 / VERIFY

Ensure Accurate Execution

After the user approved the plan, Petavue ran the analysis and validated the output.

The verification step surfaced key results:

2,773 groups of companies shared the same name and state/region
Duplicate groups ranged from 2 to 6 records each
The company names with the most duplicates, each appearing 6 times, included:
- Hyatt LLC
- Bayer LLC
- Grimes and Sons
55.1% of records in these duplicate groups were missing State/Region, indicating a major data completeness issue
States like Colorado, Utah, and Georgia showed the highest concentrations of duplicates
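
Figures like the ones above could be derived from the grouped output with a short summary pass. The sketch below is a hedged illustration only: the tuple layout `(name, state, count)`, the function name `summarize_groups`, and the sample values are assumptions, and the numbers it produces have no relation to the results reported in this analysis.

```python
from collections import Counter

def summarize_groups(groups):
    """Summarize (name, state, count) duplicate groups."""
    if not groups:
        return {"group_count": 0}
    sizes = [n for _, _, n in groups]
    total_records = sum(sizes)
    # Records whose state/region is empty or missing.
    missing_state = sum(n for _, state, n in groups if not state)
    # Duplicate records per state, ignoring missing values.
    per_state = Counter()
    for _, state, n in groups:
        if state:
            per_state[state] += n
    return {
        "group_count": len(groups),
        "min_size": min(sizes),
        "max_size": max(sizes),
        "pct_missing_state": round(100 * missing_state / total_records, 1),
        "top_states": per_state.most_common(3),
    }

# Hypothetical grouped output.
groups = [
    ("Globex", None, 3),
    ("Acme Inc", "Colorado", 2),
    ("Initech", "", 3),
    ("Umbrella", "Utah", 4),
]
summary = summarize_groups(groups)
```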

Petavue evaluates patterns, highlights missing or inconsistent data, and distinguishes between likely true duplicates and cases that may be legitimate multi-location companies.

03 / PRESENT

Surface the Insights

Once the analysis is verified, Petavue packages the results into an insight-rich, action-ready view.

Key Findings

Total duplicate groups: 2,773
Typical group size: 2 records (median), with some as high as 6
A high rate of missing State/Region values is making deduplication and segmentation harder
Certain states show a disproportionate share of duplicate companies

Business Impact

Petavue explains what this means for day-to-day operations:

Sales teams may contact the same company multiple times from different records
Account ownership and reporting become fragmented
Marketing campaigns can underperform when companies and contacts are split across duplicates
Leadership dashboards and account-based metrics lose reliability

Recommendations

To close the loop, Petavue offers concrete next steps:

Clean Up Duplicates – Merge the 2,773 duplicate groups, starting with those that have 5–6 records.
Tighten Data Entry – Make State/Region a required field to lower the risk of incomplete records.
Automate Ongoing Management – Turn on and tune HubSpot’s duplicate-detection and merge tools.
Improve Process & Training – Educate CRM users to search for existing companies before creating new ones.
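
The first recommendation, starting with the largest groups, amounts to splitting the grouped results at a size threshold. A minimal sketch, assuming the same hypothetical `(name, state, count)` tuples and using 5 as the cutoff to match the 5–6 record groups mentioned above; the function name and sample data are illustrative only.

```python
def prioritize_cleanup(groups, high_priority_min=5):
    """Split duplicate groups into high-priority and remaining, largest first."""
    def by_count_desc(group):
        return -group[2]
    high = sorted((g for g in groups if g[2] >= high_priority_min), key=by_count_desc)
    rest = sorted((g for g in groups if g[2] < high_priority_min), key=by_count_desc)
    return high, rest

# Hypothetical grouped output.
groups = [
    ("Acme Inc", "Colorado", 2),
    ("Globex", "Utah", 6),
    ("Initech", "Georgia", 3),
    ("Umbrella", "Colorado", 5),
]
merge_first, merge_later = prioritize_cleanup(groups)
```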

Supporting Table

Alongside the narrative, Petavue provides a detailed results table showing:

Company name
Company state/region
Duplicate count for each group

Users can export this table directly into their cleanup workflows or use it to prioritize which duplicates to resolve first.
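
For teams that post-process the exported table themselves, a CSV with the three columns listed above is easy to produce or consume. The sketch below writes such a file with Python's standard `csv` module; the column headers, the function name `export_groups_csv`, and the sample rows are assumptions, not Petavue's actual export format.

```python
import csv
import io

def export_groups_csv(groups, out):
    """Write (name, state, count) duplicate groups to a file-like object as CSV."""
    writer = csv.writer(out)
    writer.writerow(["company_name", "state_region", "duplicate_count"])
    for name, state, count in groups:
        # Missing state/region values are written as empty cells.
        writer.writerow([name, state or "", count])

# Write to an in-memory buffer here; a real workflow might open a file instead.
buffer = io.StringIO()
export_groups_csv([("Globex", None, 3), ("Acme Inc", "Colorado", 2)], buffer)
csv_text = buffer.getvalue()
```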