The normalisation engine
A seven-stage deterministic pipeline that transforms raw DC asset CSVs into clean, classified, enriched, and scored records — ready for any DCIM, ITSM, or CMDB platform.
CSV Pre-processing
Struktive detects and skips preamble rows, maps non-standard column headers to canonical fields using 200+ alias patterns (including 'Cab Location', 'Cabinet', 'Asset Tag', 'S/N'), and handles real-world quoting and encoding issues. The pre-processor returns a clean, typed record set before any normalisation begins.
- Preamble row detection (title rows, blank rows, metadata blocks)
- 200+ column header alias patterns for financial, lifecycle, and location fields
- UTF-8, Latin-1, and Windows-1252 encoding detection
- Resilient CSV parsing with configurable delimiter detection
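The pre-processing steps above can be sketched roughly as follows. This is a minimal illustration, not Struktive's implementation: `KNOWN_HEADERS`, `decode_bytes`, and `read_records` are hypothetical names, and the header-row heuristic (a row containing at least two known field names) is an assumption.

```python
import csv
import io

# Hypothetical canonical field names used to recognise the header row.
KNOWN_HEADERS = {"hostname", "serial", "model", "vendor", "location"}


def decode_bytes(raw: bytes) -> str:
    """Try UTF-8 first, then Windows-1252, then Latin-1 (which accepts any byte)."""
    for encoding in ("utf-8-sig", "cp1252", "latin-1"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("undecodable input")  # not reachable: latin-1 never fails


def read_records(raw: bytes, min_header_hits: int = 2) -> list[dict[str, str]]:
    """Skip preamble rows, detect the delimiter, and return rows as dicts."""
    text = decode_bytes(raw)
    # Assumed heuristic: the right delimiter is the first one that yields
    # a row containing enough known header names.
    for delimiter in (",", ";", "\t", "|"):
        rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
        for index, row in enumerate(rows):
            header = [cell.strip().lower() for cell in row]
            if len(set(header) & KNOWN_HEADERS) >= min_header_hits:
                # Everything before the header row is treated as preamble.
                return [
                    dict(zip(header, data_row))
                    for data_row in rows[index + 1:]
                    if any(cell.strip() for cell in data_row)  # drop blank rows
                ]
    raise ValueError("no header row found")


# A Windows-1252 export with a title row, a blank row, and ';' delimiters.
sample = "Café site export\n\nhostname;serial;model\nsrv01;ABC123;R740\n".encode("cp1252")
print(read_records(sample))
```

Trying encodings in a fixed order is a simplification; a statistical detector would be more robust on short or ambiguous files.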
Vendor Normalisation
A curated alias table maps 250+ manufacturer name variations, abbreviations, and acquisition-history names to canonical vendor names. The table is applied deterministically — no LLM involved at this stage — so every 'dell inc.', 'DELL INC', and 'Dell, Inc.' maps to 'Dell Technologies' consistently.
- 250+ vendor alias rules with acquisition history (Brocade→Broadcom, Liebert→Vertiv, EMC→Dell Technologies)
- Case-insensitive matching with punctuation normalisation
- Confidence score: 1.0 for exact alias match, 0.85 for fuzzy match
- Unknown vendors preserved as-is with confidence 0.5
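As an illustration of the lookup order described above, here is a minimal sketch. The five-entry `VENDOR_ALIASES` table, the function names, and the fuzzy similarity cutoff are assumptions for the example; only the confidence values (1.0, 0.85, 0.5) and the example mappings come from the description above. Fuzzy matching here uses Python's `difflib`, which may differ from whatever the engine actually uses.

```python
import difflib
import re

# Tiny illustrative alias table; keys are already normalised.
VENDOR_ALIASES = {
    "dell": "Dell Technologies",
    "dell inc": "Dell Technologies",
    "emc": "Dell Technologies",
    "brocade": "Broadcom",
    "liebert": "Vertiv",
}


def _alias_key(name: str) -> str:
    """Lower-case, strip punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(cleaned.split())


def normalise_vendor(raw: str, fuzzy_cutoff: float = 0.9) -> tuple[str, float]:
    """Return (canonical_name, confidence) for a raw vendor string."""
    key = _alias_key(raw)
    if key in VENDOR_ALIASES:
        return VENDOR_ALIASES[key], 1.0  # exact alias match
    close = difflib.get_close_matches(key, list(VENDOR_ALIASES), n=1, cutoff=fuzzy_cutoff)
    if close:
        return VENDOR_ALIASES[close[0]], 0.85  # fuzzy match
    return raw, 0.5  # unknown vendor preserved as-is


for name in ("dell inc.", "DELL INC", "Dell, Inc.", "Lieber", "Acme Widgets"):
    print(name, "->", normalise_vendor(name))
```

Because the table is a plain dictionary and the matching is rule-based, the same input always produces the same output.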
Location Hierarchy Parsing
The location parser extracts a structured Site→Building→Floor→Room→Row→Rack→U Position hierarchy from free-text location fields. It handles NetBox-style paths, structured colo codes (A-12-24 → Row A, Rack 12, U24), natural language, and partial inputs. A site default can be applied at upload time to fill in a missing site.
- NetBox path format: 'NYC > DC1 > Row A > Rack 03 > U24'
- Colo code format: 'A-12-24' → Row A, Rack 12, U24
- Natural language: 'Rack 3, Row B, Floor 2, Building HQ'
- Site default applied at upload time to records without explicit site
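The three formats above could be handled along these lines. This is a simplified sketch: the `Location` dataclass, the regular expressions, and the rule that leading unlabelled path segments are read as site then building are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Location:
    site: Optional[str] = None
    building: Optional[str] = None
    floor: Optional[str] = None
    room: Optional[str] = None
    row: Optional[str] = None
    rack: Optional[str] = None
    u_position: Optional[int] = None


COLO_CODE = re.compile(r"^([A-Za-z]+)-(\d+)-(\d+)$")  # e.g. A-12-24
LABELLED = re.compile(r"^(site|building|floor|room|row|rack)\s+(\S.*)$", re.IGNORECASE)
U_POSITION = re.compile(r"^U\s*(\d+)$", re.IGNORECASE)


def parse_location(text: str, site_default: Optional[str] = None) -> Location:
    loc = Location()
    text = text.strip()
    colo = COLO_CODE.match(text)
    if colo:
        loc.row, loc.rack = colo.group(1).upper(), colo.group(2)
        loc.u_position = int(colo.group(3))
    else:
        unlabelled = []
        for part in re.split(r"\s*[>,/]\s*", text):
            if not part:
                continue
            if (u := U_POSITION.match(part)):
                loc.u_position = int(u.group(1))
            elif (labelled := LABELLED.match(part)):
                setattr(loc, labelled.group(1).lower(), labelled.group(2).strip())
            else:
                unlabelled.append(part)
        # Assumption: unlabelled leading segments are site, then building.
        for field_name, value in zip(("site", "building"), unlabelled):
            if getattr(loc, field_name) is None:
                setattr(loc, field_name, value)
    if loc.site is None:
        loc.site = site_default  # upload-time default fills the gap
    return loc


print(parse_location("NYC > DC1 > Row A > Rack 03 > U24"))
print(parse_location("A-12-24", site_default="LON1"))
print(parse_location("Rack 3, Row B, Floor 2, Building HQ"))
```

Rack identifiers are kept as strings so that leading zeros such as `03` survive, while the U position is parsed to an integer.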
Asset Classification
A two-tier classification engine assigns every in-scope asset to one of eight categories: Compute, Storage, Networking, Power, Cooling, Infrastructure, Monitoring, or Security. Keyword rules handle 80% of assets deterministically. The remaining 20% (for example Liebert iCOM controllers, Fibre Channel switches, busways, and custom appliances) are resolved by LLM inference, and each inferred category carries a confidence score.
- 8 asset categories with 400+ keyword rules
- Out-of-scope detection: VoIP phones, laptops, printers, label makers
- LLM inference for ambiguous assets with confidence score
- Classification confidence: High (≥0.85), Medium (≥0.65 and <0.85), Low (<0.65)
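A rough sketch of the two-tier flow and the confidence bands follows. The keyword table, the out-of-scope terms as substring checks, the confidence of 1.0 for rule hits, and the `llm_fallback` callable are all assumptions for illustration; the LLM tier is represented by a stub rather than a real model call.

```python
from typing import Callable, Optional

# Illustrative keyword rules only; the description above mentions 400+.
KEYWORD_RULES = {
    "poweredge": "Compute",
    "pdu": "Power",
    "crac": "Cooling",
    "catalyst": "Networking",
}
OUT_OF_SCOPE_TERMS = ("voip phone", "laptop", "printer", "label maker")

LlmFallback = Callable[[str], tuple[str, float]]


def confidence_band(score: float) -> str:
    """Map a confidence score to the High / Medium / Low bands."""
    if score >= 0.85:
        return "High"
    if score >= 0.65:
        return "Medium"
    return "Low"


def classify(description: str, llm_fallback: Optional[LlmFallback] = None) -> tuple[str, float, str]:
    """Return (category, confidence, band) for an asset description."""
    text = description.lower()
    if any(term in text for term in OUT_OF_SCOPE_TERMS):
        return "Out of scope", 1.0, "High"  # assumed confidence for rule hits
    # Tier 1: deterministic keyword rules.
    for keyword, category in KEYWORD_RULES.items():
        if keyword in text:
            return category, 1.0, "High"
    # Tier 2: defer ambiguous assets to an inference function, if provided.
    if llm_fallback is not None:
        category, score = llm_fallback(description)
        return category, score, confidence_band(score)
    return "Unclassified", 0.0, "Low"


def stub_llm(description: str) -> tuple[str, float]:
    # Stand-in for a real model call; returns a fixed answer for the demo.
    return "Cooling", 0.72


print(classify("Dell PowerEdge R740"))
print(classify("Liebert iCOM controller", llm_fallback=stub_llm))
```

Substring matching is used here for brevity; production rules would likely need word boundaries to avoid accidental hits.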
NetBox DeviceType Enrichment
Validated model records are matched against 5,000+ device definitions from the NetBox DeviceType Library. Matched devices receive U-height, weight, interface counts, and the canonical slug used for NetBox import. Unmatched devices are flagged for manual library creation.
- 5,000+ device definitions from the NetBox DeviceType Library
- Slug-based matching with manufacturer prefix handling
- Enriches: U-height, weight, interface counts, device type slug
- Unmatched devices flagged in pre-flight validation report
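Slug-based matching with manufacturer prefix handling might look something like the sketch below. The one-entry `DEVICE_LIBRARY`, the `MANUFACTURER_SLUGS` mapping, and the `manufacturer-model` slug shape are assumptions for the example; the real library is far larger and its entries carry more attributes.

```python
import re

# Illustrative stand-in for the device library, keyed by slug.
DEVICE_LIBRARY = {
    "dell-poweredge-r740": {"u_height": 2},
}

# Hypothetical mapping from canonical vendor name to slug prefix.
MANUFACTURER_SLUGS = {"Dell Technologies": "dell"}


def slugify(text: str) -> str:
    """Lower-case and replace runs of non-alphanumerics with single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def match_device_type(vendor: str, model: str) -> dict:
    prefix = MANUFACTURER_SLUGS.get(vendor, slugify(vendor))
    slug = slugify(model)
    # Prefix handling: add the manufacturer only if the model lacks it,
    # so 'PowerEdge R740' and 'Dell PowerEdge R740' resolve to the same slug.
    if not slug.startswith(prefix + "-"):
        slug = f"{prefix}-{slug}"
    entry = DEVICE_LIBRARY.get(slug)
    if entry is None:
        # Unmatched devices would be flagged for manual library creation.
        return {"matched": False, "slug": None}
    return {"matched": True, "slug": slug, **entry}


print(match_device_type("Dell Technologies", "PowerEdge R740"))
print(match_device_type("Dell Technologies", "Dell PowerEdge R740"))
print(match_device_type("Acme", "Widget 9000"))
```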
Duplicate Detection
A multi-signal duplicate detector flags likely duplicates using four signals: exact serial number matching, fuzzy serial matching, exact IP address matching, and vendor+model+rack+U composite key matching. Fuzzy serial matching requires an edit distance of 1 and at least 12 characters, and it ignores pairs that differ only in the last character so that sequential batch serials are not reported as false positives.
- Exact serial number matching (case-insensitive, punctuation-normalised)
- Fuzzy serial matching: edit distance 1, min 12 chars, excludes sequential batch serials
- IP address duplicate detection
- Vendor + model + rack + U composite key flagging
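The fuzzy serial rule is the least obvious of the four signals, so here is a minimal sketch of it. The function names are hypothetical, and one interpretation is assumed: "last-character-only difference" is read as two equal-length serials whose only mismatch is the final character.

```python
import re


def normalise_serial(serial: str) -> str:
    """Upper-case and strip punctuation and whitespace."""
    return re.sub(r"[^A-Za-z0-9]", "", serial).upper()


def edit_distance_is_one(a: str, b: str) -> bool:
    """True if a and b differ by exactly one substitution, insertion, or deletion."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # single substitution at position i
    return a[i:] == b[i + 1:]  # single extra character in b at position i


def is_fuzzy_duplicate(serial_a: str, serial_b: str, min_length: int = 12) -> bool:
    a, b = normalise_serial(serial_a), normalise_serial(serial_b)
    if min(len(a), len(b)) < min_length:
        return False  # short serials collide too easily
    if not edit_distance_is_one(a, b):
        return False
    if len(a) == len(b) and a[:-1] == b[:-1]:
        return False  # likely sequential batch serials (...031, ...032)
    return True


print(is_fuzzy_duplicate("CN7O16A4B2031", "CN7016A4B2031"))  # letter O vs digit 0
print(is_fuzzy_duplicate("CN7016A4B2031", "CN7016A4B2032"))  # sequential batch
```

Exact matches (distance 0) are deliberately not reported here, since the exact serial check covers them.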
Quality Scoring
Every record receives a 0–100 quality score across seven weighted factors. Records scoring ≥70 are considered import-ready. The score drives the Data Quality Heat Map in the Capacity Summary and the exception thresholds in the Compliance Audit Pack.
- Completeness: hostname, serial, model, vendor, location (weighted)
- Classification confidence contribution
- Location depth: site, rack, U position each add points
- Duplicate flag penalty: −15 pts; low classification confidence: −10 pts
Quality score formula
Each record receives a score from 0 to 100. Records scoring 70 or above are considered import-ready. The score is capped at 100.
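To make the shape of the calculation concrete, here is a sketch under stated assumptions. Only the two penalties (15 and 10 points), the 0 to 100 range, and the import-ready threshold of 70 come from the description above. The point weights, the field names, the reading of "low classification confidence" as below 0.65, and the function names are hypothetical and chosen only so the example adds up to 100.

```python
# Assumed weights: the actual factor weights are not given here.
COMPLETENESS_POINTS = {"hostname": 15, "serial": 15, "model": 15, "vendor": 10, "location": 10}
LOCATION_DEPTH_POINTS = {"site": 5, "rack": 5, "u_position": 5}
CLASSIFICATION_MAX_POINTS = 20

# Values taken from the description above.
DUPLICATE_PENALTY = 15
LOW_CONFIDENCE_PENALTY = 10
IMPORT_READY_THRESHOLD = 70

# Assumption: "low" means the Low classification band.
LOW_CONFIDENCE_CUTOFF = 0.65


def _present(record: dict, field: str) -> bool:
    return record.get(field) not in (None, "")


def quality_score(record: dict) -> int:
    score = 0.0
    score += sum(pts for f, pts in COMPLETENESS_POINTS.items() if _present(record, f))
    score += sum(pts for f, pts in LOCATION_DEPTH_POINTS.items() if _present(record, f))
    confidence = record.get("classification_confidence", 0.0)
    score += CLASSIFICATION_MAX_POINTS * confidence
    if record.get("is_duplicate"):
        score -= DUPLICATE_PENALTY
    if confidence < LOW_CONFIDENCE_CUTOFF:
        score -= LOW_CONFIDENCE_PENALTY
    return max(0, min(100, round(score)))  # clamp to the 0-100 range


def is_import_ready(record: dict) -> bool:
    return quality_score(record) >= IMPORT_READY_THRESHOLD


complete = {
    "hostname": "srv01", "serial": "CN7016A4B2031", "model": "PowerEdge R740",
    "vendor": "Dell Technologies", "location": "NYC > DC1 > Row A > Rack 03 > U24",
    "site": "NYC", "rack": "03", "u_position": 24, "classification_confidence": 1.0,
}
print(quality_score(complete), is_import_ready(complete))
print(quality_score({**complete, "is_duplicate": True}))
```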
Column alias detection
Struktive maps non-standard column headers to canonical field names automatically. You don't need to rename your columns before uploading.
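As a sketch of how such a mapping could work: headers are normalised and then looked up in an alias table. The alias examples come from the pre-processing description above, but the canonical field names they map to (`location`, `rack`, `asset_tag`, `serial`) and the function names are assumptions for illustration.

```python
import re

# Illustrative subset of an alias table; keys are normalised headers.
HEADER_ALIASES = {
    "cab location": "location",
    "cabinet": "rack",
    "asset tag": "asset_tag",
    "s n": "serial",
    "serial number": "serial",
}


def normalise_header(header: str) -> str:
    """Lower-case and reduce punctuation and whitespace to single spaces."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", header.lower()).split())


def map_headers(headers: list[str]) -> dict[str, str]:
    """Map each original header to a canonical field name."""
    mapping = {}
    for header in headers:
        key = normalise_header(header)
        # Unknown headers fall back to a snake_case form of themselves.
        mapping[header] = HEADER_ALIASES.get(key, key.replace(" ", "_"))
    return mapping


print(map_headers(["Hostname", "S/N", "Asset Tag", "Cab Location"]))
```

Normalising before lookup means `S/N`, `s/n`, and `S N` all hit the same alias entry.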