The normalisation engine
A seven-stage pipeline that transforms raw DC asset CSVs into clean, classified, enriched, and scored records — ready for any DCIM, ITSM, or CMDB platform.
CSV Pre-processing
Struktive detects and skips preamble rows, maps non-standard column headers to canonical fields using an extensive alias library, and handles real-world quoting and encoding issues. The pre-processor returns a clean, typed record set before any normalisation begins.
- Preamble row detection (title rows, blank rows, metadata blocks)
- Broad column header alias coverage for financial, lifecycle, and location fields
- Multi-encoding detection (UTF-8, Latin-1, Windows-1252)
- Resilient CSV parsing with automatic delimiter detection
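The pre-processor itself isn't exposed, but a minimal Python sketch shows the shape of this stage. The alias map, the header heuristic, and the `preprocess` function below are hypothetical stand-ins for illustration, not Struktive's actual code:

```python
import csv
import io

# Hypothetical alias map: non-standard headers -> canonical field names.
HEADER_ALIASES = {
    "mfr": "vendor", "manufacturer": "vendor", "make": "vendor",
    "model no": "model", "model number": "model",
    "s/n": "serial", "serial no": "serial",
    "loc": "location", "site/room": "location",
    "purchase cost": "cost", "po amount": "cost",
}

def looks_like_header(row):
    """Crude preamble heuristic: treat a row as the header when most of its
    cells map to a known alias or canonical field name."""
    cells = [c.strip().lower() for c in row if c.strip()]
    if not cells:
        return False
    hits = sum(1 for c in cells if c in HEADER_ALIASES or c in HEADER_ALIASES.values())
    return hits >= max(2, len(cells) // 2)

def preprocess(raw_bytes):
    # Try common encodings in order; Latin-1 decodes any byte sequence,
    # so it acts as the final fallback.
    for enc in ("utf-8-sig", "utf-8", "cp1252", "latin-1"):
        try:
            text = raw_bytes.decode(enc)
            break
        except UnicodeDecodeError:
            continue

    # Sniff the delimiter, then skip preamble rows (title rows, blanks,
    # metadata blocks) until something that looks like a header appears.
    try:
        dialect = csv.Sniffer().sniff(text[:2048], delimiters=",;\t|")
    except csv.Error:
        dialect = csv.excel  # default to comma-separated
    rows = list(csv.reader(io.StringIO(text), dialect))
    start = next(i for i, row in enumerate(rows) if looks_like_header(row))

    # Map headers to canonical names and return dict records.
    header = [HEADER_ALIASES.get(c.strip().lower(), c.strip().lower()) for c in rows[start]]
    return [dict(zip(header, row)) for row in rows[start + 1:] if any(cell.strip() for cell in row)]
```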
Vendor Normalisation
A curated alias table maps hundreds of manufacturer name variations, abbreviations, and acquisition-history names to canonical vendor names. The table is applied deterministically, so every variation of a vendor name resolves to the same canonical form.
- Extensive vendor alias rules including acquisition history (e.g. legacy brand names mapped to current parent companies)
- Case-insensitive matching with punctuation normalisation
- Confidence-scored output: exact match, fuzzy match, and unknown tiers
- Unknown vendors preserved as-is with a low-confidence flag
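A minimal sketch of the same lookup pattern, assuming a tiny made-up alias set and a simple fuzzy tier (the real table and its tiering are Struktive's own):

```python
import re
from difflib import get_close_matches

# Tiny, made-up alias set; the production table covers hundreds of variations,
# including acquisition-history names.
VENDOR_ALIASES = {
    "hp": "HPE", "hewlett packard": "HPE", "hewlett packard enterprise": "HPE",
    "emc": "Dell Technologies", "dell emc": "Dell Technologies",
    "cisco": "Cisco", "cisco systems": "Cisco", "cisco systems inc": "Cisco",
}

def normalise_key(name):
    # Case-insensitive matching with punctuation normalisation.
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()

def normalise_vendor(raw):
    key = normalise_key(raw)
    if key in VENDOR_ALIASES:
        return VENDOR_ALIASES[key], "exact"
    # Fuzzy tier: near-miss spellings of a known alias.
    close = get_close_matches(key, VENDOR_ALIASES, n=1, cutoff=0.85)
    if close:
        return VENDOR_ALIASES[close[0]], "fuzzy"
    # Unknown vendors are preserved as-is with a low-confidence flag.
    return raw, "unknown"

print(normalise_vendor("Hewlett-Packard"))     # ('HPE', 'exact')
print(normalise_vendor("Cisco Systms, Inc."))  # ('Cisco', 'fuzzy')
print(normalise_vendor("Acme Racks Ltd"))      # ('Acme Racks Ltd', 'unknown')
```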
Location Hierarchy Parsing
The location parser extracts a structured Site → Building → Floor → Room → Row → Rack → U Position hierarchy from any free-text location format. It handles NetBox-style paths, structured colo codes, natural language, and partial inputs. A site default can be applied at upload time to fill gaps.
- NetBox path format support: 'NYC > DC1 > Row A > Rack 03 > U24'
- Structured colo code parsing
- Natural language location extraction
- Site default applied at upload time to records without explicit site
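As an illustration of the NetBox-style path case only, a small sketch with hypothetical field names and patterns (colo codes and natural-language parsing are omitted):

```python
import re

# Illustrative parser for the NetBox-style path format only.
FIELDS = ("site", "building", "floor", "room", "row", "rack", "u_position")

PATTERNS = {
    "row": re.compile(r"^row\s+(.+)$", re.I),
    "rack": re.compile(r"^rack\s+(.+)$", re.I),
    "u_position": re.compile(r"^u\s*(\d+)$", re.I),
    "floor": re.compile(r"^(?:floor|fl)\s+(.+)$", re.I),
    "room": re.compile(r"^room\s+(.+)$", re.I),
}

def parse_location(text, default_site=None):
    result = dict.fromkeys(FIELDS)
    positional = iter(("site", "building"))  # unlabelled leading segments
    for segment in (s.strip() for s in text.split(">")):
        for field, pattern in PATTERNS.items():
            m = pattern.match(segment)
            if m:
                result[field] = m.group(1)
                break
        else:
            # No keyword prefix: assign to the next positional slot.
            slot = next(positional, None)
            if slot:
                result[slot] = segment
    if result["site"] is None and default_site:
        result["site"] = default_site  # site default applied at upload time
    return result

print(parse_location("NYC > DC1 > Row A > Rack 03 > U24"))
# {'site': 'NYC', 'building': 'DC1', 'floor': None, 'room': None,
#  'row': 'A', 'rack': '03', 'u_position': '24'}
```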
Asset Classification
A multi-tier classification engine assigns every asset to one of eight categories: Compute, Storage, Networking, Power, Cooling, Infrastructure, Monitoring, or Security. Deterministic rules handle the majority of assets. Ambiguous assets — edge-case controllers, hybrid appliances, custom hardware — are resolved by AI inference with full confidence scoring.
- 8 asset categories covering the full DC infrastructure stack
- Out-of-scope detection: non-DC assets (laptops, VoIP phones, printers) are excluded
- AI inference for ambiguous assets with confidence scoring
- Classification confidence reported as High, Medium, or Low per record
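A rules-first, AI-fallback flow could be sketched roughly as below; the keyword rules and the `infer_with_ai` hook are invented for illustration and are not the production rule set or model:

```python
CATEGORIES = ("Compute", "Storage", "Networking", "Power", "Cooling",
              "Infrastructure", "Monitoring", "Security")

# Hypothetical keyword rules; the production rule set is far larger.
RULES = [
    (("server", "blade", "node"), "Compute"),
    (("san", "nas", "jbod", "array"), "Storage"),
    (("switch", "router", "firewall"), "Networking"),
    (("ups", "pdu", "rectifier"), "Power"),
    (("crac", "crah", "chiller"), "Cooling"),
]

OUT_OF_SCOPE = ("laptop", "voip", "printer", "desk phone")

def classify(record, infer_with_ai=None):
    text = f"{record.get('vendor', '')} {record.get('model', '')} {record.get('description', '')}".lower()

    # Out-of-scope detection: non-DC assets are excluded up front.
    if any(term in text for term in OUT_OF_SCOPE):
        return {"category": None, "confidence": "High", "out_of_scope": True}

    # Deterministic rules handle the majority of assets.
    for keywords, category in RULES:
        if any(kw in text for kw in keywords):
            return {"category": category, "confidence": "High", "out_of_scope": False}

    # Ambiguous assets fall through to AI inference (stubbed here).
    if infer_with_ai:
        category, confidence = infer_with_ai(text)  # e.g. ("Infrastructure", "Medium")
        return {"category": category, "confidence": confidence, "out_of_scope": False}
    return {"category": "Infrastructure", "confidence": "Low", "out_of_scope": False}

print(classify({"vendor": "Dell", "model": "PowerEdge R740", "description": "2U rack server"}))
# {'category': 'Compute', 'confidence': 'High', 'out_of_scope': False}
```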
NetBox DeviceType Enrichment
Validated model records are matched against thousands of device definitions from the NetBox DeviceType Library. Matched devices receive U-height, weight, interface counts, and the canonical slug used for NetBox import. Unmatched devices are flagged for manual library creation.
- Matching against 5,000+ device definitions from the NetBox DeviceType Library
- Slug-based matching with manufacturer prefix handling
- Enriches: U-height, weight, interface counts, device type slug
- Unmatched devices flagged in pre-flight validation report
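A sketch of slug-based matching against an in-memory index; the slugging rule, index entries, and figures shown are illustrative stand-ins for the library's YAML definitions, not the actual matching code:

```python
import re

def slugify(text):
    """Approximate NetBox-style slug: lower-case, alphanumerics joined by hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

# Tiny in-memory stand-in for the DeviceType Library index; values are illustrative.
DEVICE_TYPES = {
    "dell-poweredge-r740": {"u_height": 2, "weight_kg": 28.0, "interfaces": 4},
    "cisco-nexus-93180yc-ex": {"u_height": 1, "weight_kg": 10.0, "interfaces": 54},
}

def enrich(record):
    vendor, model = record["vendor"], record["model"]
    candidates = [
        slugify(f"{vendor} {model}"),  # manufacturer-prefixed slug
        slugify(model),                # model-only slug, if the model already embeds the vendor
    ]
    for slug in candidates:
        if slug in DEVICE_TYPES:
            return {**record, **DEVICE_TYPES[slug], "device_type_slug": slug, "matched": True}
    # Unmatched devices are flagged for the pre-flight validation report.
    return {**record, "matched": False}

print(enrich({"vendor": "Dell", "model": "PowerEdge R740"}))
```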
Duplicate Detection
A multi-signal duplicate detector identifies likely duplicates using serial number matching, fuzzy serial matching with sequential batch serial exclusion logic, IP address matching, and composite key matching across vendor, model, rack, and U position.
- Exact serial number matching (case-insensitive, punctuation-normalised)
- Fuzzy serial matching with sequential batch serial exclusion logic
- IP address duplicate detection
- Vendor + model + rack + U composite key flagging
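A minimal sketch of the multi-signal approach; the thresholds and the trailing-number heuristic used for batch-serial exclusion are assumptions made for illustration:

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

def norm_serial(s):
    return re.sub(r"[^a-z0-9]", "", (s or "").lower())

def sequential_batch(a, b):
    """Heuristic stand-in for batch-serial exclusion: serials that share a prefix
    and differ only in a trailing number (e.g. SRV001 / SRV002) are treated as
    distinct units from one purchase batch, not duplicates."""
    ma, mb = re.match(r"^(.*?)(\d+)$", a), re.match(r"^(.*?)(\d+)$", b)
    return bool(ma and mb and ma.group(1) == mb.group(1) and ma.group(2) != mb.group(2))

def duplicate_signals(r1, r2):
    signals = []
    s1, s2 = norm_serial(r1.get("serial")), norm_serial(r2.get("serial"))
    if s1 and s1 == s2:
        signals.append("exact_serial")
    elif s1 and s2 and not sequential_batch(s1, s2) \
            and SequenceMatcher(None, s1, s2).ratio() >= 0.9:
        signals.append("fuzzy_serial")
    if r1.get("ip") and r1.get("ip") == r2.get("ip"):
        signals.append("ip_match")
    composite = ("vendor", "model", "rack", "u_position")
    if all(r1.get(k) and r1.get(k) == r2.get(k) for k in composite):
        signals.append("composite_key")
    return signals

def find_duplicates(records):
    return [(i, j, sig) for (i, r1), (j, r2) in combinations(enumerate(records), 2)
            if (sig := duplicate_signals(r1, r2))]
```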
Quality Scoring
Every record receives a 0–100 quality score across multiple weighted factors covering completeness, classification confidence, location depth, and data integrity signals. Records scoring 70 or above are considered import-ready. The score drives the Data Quality Heat Map in the Capacity Summary and the exception thresholds in the Compliance Audit Pack.
- Multi-factor scoring: completeness, classification confidence, location depth, integrity signals
- Proprietary weighting reflects DCIM ingestion requirements
- Score thresholds: ≥85 Excellent · 70–84 Import-ready · <70 Needs review
- Duplicate and out-of-scope flags applied as score penalties
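Because the actual weighting is proprietary, the sketch below uses made-up weights and factor definitions purely to show the shape of the calculation, the threshold bands, and the flag penalties:

```python
# Illustrative weights only; the production weighting is proprietary.
WEIGHTS = {"completeness": 40, "classification": 25, "location_depth": 20, "integrity": 15}
CORE_FIELDS = ("vendor", "model", "serial", "location", "category")
LOCATION_LEVELS = ("site", "building", "floor", "room", "row", "rack", "u_position")
CONFIDENCE = {"High": 1.0, "Medium": 0.6, "Low": 0.2}

def quality_score(record):
    completeness = sum(1 for f in CORE_FIELDS if record.get(f)) / len(CORE_FIELDS)
    classification = CONFIDENCE.get(record.get("classification_confidence"), 0.0)
    location_depth = sum(1 for f in LOCATION_LEVELS if record.get(f)) / len(LOCATION_LEVELS)
    integrity = 1.0  # placeholder for integrity signals (serial format checks, date sanity, etc.)

    score = (WEIGHTS["completeness"] * completeness
             + WEIGHTS["classification"] * classification
             + WEIGHTS["location_depth"] * location_depth
             + WEIGHTS["integrity"] * integrity)

    # Duplicate and out-of-scope flags are applied as score penalties.
    if record.get("duplicate_flag"):
        score -= 20
    if record.get("out_of_scope"):
        score -= 40
    score = max(0, min(100, round(score)))

    band = "Excellent" if score >= 85 else "Import-ready" if score >= 70 else "Needs review"
    return score, band

print(quality_score({"vendor": "Dell", "model": "PowerEdge R740", "serial": "ABC123",
                     "location": "NYC", "category": "Compute",
                     "classification_confidence": "High",
                     "site": "NYC", "rack": "03", "u_position": "24"}))
# (89, 'Excellent')
```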
Quality score factors
Each record receives a score from 0 to 100 based on multiple weighted factors. Records scoring 70 or above are considered import-ready. The exact weighting across factors is proprietary and reflects the requirements of DCIM, ITSM, and CMDB ingestion pipelines.
Column alias detection
Struktive maps non-standard column headers to canonical field names automatically. You don't need to rename your columns before uploading.
Request a sample report
Not ready to upload your own data? We'll run a representative DC asset dataset through the full pipeline and email you the complete output — all seven report types included.