This analysis gave the team a clear view of how widespread duplicate company records were inside HubSpot and revealed the data completeness issues contributing to them. With Petavue’s structured process, the findings translate directly into improvements that support revenue teams and operational efficiency.
- Improved visibility into 2,773 duplicate groups, enabling targeted cleanup efforts.
- Clear identification of missing fields and inconsistent data entry patterns, paving the way for stronger CRM data integrity.
- Actionable recommendations that move the team toward a more accurate, unified company database.
Craft the Plan
Petavue starts by designing a clear, reviewable plan for the analysis and confirming key assumptions before any analysis runs.
In this case, the initial goal was to find duplicate companies in HubSpot using company name and address. When Petavue inspected the HubSpot schema, it detected that Street Address wasn’t an available field in the Companies table. Instead of forcing a choice, it surfaced alternative options for the user:
- Company Name + City
- Company Name + State/Region
- Company Name + Postal Code
- Company Name only
Petavue then asked the user to choose which option they preferred for identifying duplicates.
The user selected Company Name + State/Region as the most meaningful combination.
With that clarification captured, Petavue generated a concrete analysis plan:
- Pull all records from the HubSpot Companies table (hubspot_companies)
- Use name and state/region as the grouping keys
- Count how many companies appear in each name + state/region group
- Filter to show only groups with more than one record (potential duplicates)
- Sort the groups in descending order by duplicate count
- Return both the grouped data and the counts for full visibility
Nothing is executed until the user approves this plan, so they know exactly what logic will be applied to their data.
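To make the approved logic concrete, here is a minimal pandas sketch of the same grouping steps. It assumes the hubspot_companies table has been exported to a CSV with "name" and "state" columns; the column names and file name are illustrative, not Petavue's internal API.

```python
import pandas as pd

# Minimal sketch of the approved plan, assuming hubspot_companies has been
# exported to a CSV with illustrative "name" and "state" columns.
companies = pd.read_csv("hubspot_companies.csv")

# Group by the keys the user selected: Company Name + State/Region.
grouped = (
    companies
    .groupby(["name", "state"], dropna=False)  # keep groups with missing state visible
    .size()
    .reset_index(name="duplicate_count")
)

# Keep only groups with more than one record (potential duplicates),
# sorted so the largest duplicate groups come first.
duplicate_groups = (
    grouped[grouped["duplicate_count"] > 1]
    .sort_values("duplicate_count", ascending=False)
)

print(f"{len(duplicate_groups)} potential duplicate groups found")
print(duplicate_groups.head(10))
```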
Ensure Accurate Execution
After the user approved the plan, Petavue ran the analysis and validated the output.
The verification step surfaced key results:
- 2,773 groups of companies shared the same name and state/region
- Duplicate groups ranged from 2 to 6 records each
- The highest-duplicate company names included (each appearing 6 times):
  - Hyatt LLC
  - Bayer LLC
  - Grimes and Sons
- 55.1% of records in these duplicate groups were missing State/Region, indicating a major data completeness issue
- States like Colorado, Utah, and Georgia showed the highest concentrations of duplicates
Petavue evaluates patterns, highlights missing or inconsistent data, and distinguishes between likely true duplicates and cases that may be legitimate multi-location companies.
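As a rough illustration of what this kind of verification pass involves, the sketch below continues the example DataFrames from the plan above. It computes the missing State/Region rate inside duplicate groups, summarizes group sizes, and flags groups that span multiple cities as possible multi-location companies; the "city" column is an assumption for illustration, and none of this represents Petavue's actual implementation.

```python
# Records that fall inside duplicate groups (group size > 1 on name + state).
group_sizes = companies.groupby(["name", "state"], dropna=False)["name"].transform("size")
in_dupe_groups = companies[group_sizes > 1]

# Share of those records missing State/Region.
missing_state_rate = in_dupe_groups["state"].isna().mean()
print(f"Missing State/Region in duplicate groups: {missing_state_rate:.1%}")

# Distribution of duplicate-group sizes (median, max, etc.).
print(duplicate_groups["duplicate_count"].describe())

# Groups whose records span more than one city (assumes an illustrative
# "city" column); these may be legitimate multi-location companies
# rather than true duplicates.
city_variety = in_dupe_groups.groupby(["name", "state"], dropna=False)["city"].nunique()
print(f"{(city_variety > 1).sum()} duplicate groups span multiple cities")
```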
Surface the Insights
Once the analysis is verified, Petavue packages the results into an insight-rich, action-ready view.
Key Findings
- Total duplicate groups: 2,773
- Typical group size: 2 records (median), with some as high as 6
- A high rate of missing State/Region values is making deduplication and segmentation harder
- Certain states show a disproportionate share of duplicate companies
Business Impact
Petavue explains what this means for day-to-day operations:
- Sales teams may contact the same company multiple times from different records
- Account ownership and reporting become fragmented
- Marketing campaigns can underperform when companies and contacts are split across duplicates
- Leadership dashboards and account-based metrics lose reliability
Recommendations
To close the loop, Petavue offers concrete next steps:
- Clean Up Duplicates – Merge the 2,773 duplicate groups, starting with those that have 5–6 records.
- Tighten Data Entry – Make State/Region a required field to lower the risk of incomplete records.
- Automate Ongoing Management – Turn on and tune HubSpot’s duplicate-detection and merge tools.
- Improve Process & Training – Train CRM users to search for existing companies before creating new ones.
Supporting Table
Alongside the narrative, Petavue provides a detailed results table showing:
- Company name
- Company state/region
- Duplicate count for each group
Users can export this table directly into their cleanup workflows or use it to prioritize which duplicates to resolve first.
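As a sketch of that hand-off, the grouped results from the earlier example could be exported largest-first for a cleanup queue. The column renames and file name are illustrative, not a specific Petavue export format.

```python
# Export the duplicate groups for a cleanup workflow, largest groups first
# so the 5-6 record groups are tackled before the long tail of pairs.
cleanup_queue = duplicate_groups.rename(
    columns={"name": "company_name", "state": "state_region"}
)
cleanup_queue.to_csv("duplicate_companies_to_review.csv", index=False)
```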