How It Works

The normalisation engine

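A seven-stage pipeline that transforms raw data centre asset CSVs into clean, classified, enriched, and scored records, ready to import into DCIM, ITSM, or CMDB platforms. Every stage is rule-based and deterministic except one: assets that keyword rules cannot classify are resolved by LLM inference.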

01

CSV Pre-processing

Struktive detects and skips preamble rows, maps non-standard column headers to canonical fields using 200+ alias patterns (including 'Cab Location', 'Cabinet', 'Asset Tag', and 'S/N'), and handles the quoting and encoding problems common in real-world exports. The pre-processor returns a clean, typed record set before any normalisation begins.

  • Preamble row detection (title rows, blank rows, metadata blocks)
  • 200+ column header alias patterns for financial, lifecycle, and location fields
  • UTF-8, Latin-1, and Windows-1252 encoding detection
  • Resilient CSV parsing with configurable delimiter detection
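A minimal Python sketch of these pre-processing steps follows. It assumes an encoding fallback order of UTF-8, then Windows-1252, then Latin-1; uses the standard library's `csv.Sniffer` for delimiter detection; and treats the first row containing at least two known header names as the header row, so everything above it is preamble. `KNOWN_HEADERS`, `min_hits`, and the helper names are hypothetical, not Struktive's actual rules.

```python
import csv
import io

# Hypothetical subset of known header names, used only to locate the header row.
KNOWN_HEADERS = {
    "hostname", "host name", "device name", "serial", "serial number", "s/n",
    "model", "vendor", "manufacturer", "location", "cab location", "cabinet",
    "asset tag", "ip address",
}
CANDIDATE_DELIMITERS = ",;\t|"


def decode_bytes(raw: bytes) -> str:
    """Try UTF-8 (with optional BOM), then Windows-1252; Latin-1 decodes any byte."""
    for encoding in ("utf-8-sig", "cp1252"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return raw.decode("latin-1")


def sniff_delimiter(text: str, candidates: str = CANDIDATE_DELIMITERS) -> str:
    """Guess the delimiter from lines that contain a candidate; default to a comma."""
    # Title rows usually contain no delimiter, so leaving them out keeps the guess stable.
    lines = [line for line in text.splitlines() if any(c in line for c in candidates)]
    try:
        return csv.Sniffer().sniff("\n".join(lines[:50]), delimiters=candidates).delimiter
    except csv.Error:
        return ","


def find_header_row(rows: list[list[str]], min_hits: int = 2) -> int:
    """Index of the first row with at least `min_hits` recognised header names."""
    for index, row in enumerate(rows):
        hits = sum(1 for cell in row if cell.strip().lower() in KNOWN_HEADERS)
        if hits >= min_hits:
            return index
    raise ValueError("no recognisable header row found")


def preprocess(raw: bytes) -> list[dict[str, str]]:
    """Decode, parse, skip preamble and blank rows, and return one dict per record."""
    text = decode_bytes(raw)
    rows = list(csv.reader(io.StringIO(text), delimiter=sniff_delimiter(text)))
    header_index = find_header_row(rows)  # rows above this index are preamble
    header = [cell.strip() for cell in rows[header_index]]
    records = []
    for row in rows[header_index + 1:]:
        if not any(cell.strip() for cell in row):
            continue  # blank or separator row
        records.append(dict(zip(header, (cell.strip() for cell in row))))
    return records
```

For example, a file whose first line is a report title and whose second line is blank still yields one dict per asset, keyed by the original header text; canonical field mapping happens afterwards.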
02

Vendor Normalisation

A curated alias table maps 250+ manufacturer name variations, abbreviations, and pre-acquisition names to canonical vendor names. The table is applied deterministically, with no LLM involved at this stage, so 'dell inc.', 'DELL INC', and 'Dell, Inc.' all map consistently to 'Dell Technologies'.

  • 250+ vendor alias rules with acquisition history (Brocade→Broadcom, Liebert→Vertiv, EMC→Dell Technologies)
  • Case-insensitive matching with punctuation normalisation
  • Confidence score: 1.0 for exact alias match, 0.85 for fuzzy match
  • Unknown vendors preserved as-is with confidence 0.5
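The matching rules above can be sketched as follows. This assumes keys are normalised by lower-casing and turning punctuation into spaces, and uses the standard library's `difflib.get_close_matches` with a 0.85 cutoff as a stand-in for the fuzzy matcher, whose actual algorithm this page does not describe. `VENDOR_ALIASES` is a hypothetical excerpt built from the examples on this page.

```python
import difflib
import re

# Hypothetical excerpt of the alias table, keyed by normalised name.
VENDOR_ALIASES = {
    "dell": "Dell Technologies",
    "dell inc": "Dell Technologies",
    "dell technologies": "Dell Technologies",
    "emc": "Dell Technologies",
    "brocade": "Broadcom",
    "liebert": "Vertiv",
    "vertiv": "Vertiv",
}


def normalise_key(name: str) -> str:
    """Lower-case, turn punctuation into spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", name.lower()).split())


def normalise_vendor(raw: str, cutoff: float = 0.85) -> tuple[str, float]:
    """Return (canonical vendor, confidence) using the confidence levels listed above."""
    key = normalise_key(raw)
    if key in VENDOR_ALIASES:
        return VENDOR_ALIASES[key], 1.0  # exact alias match
    close = difflib.get_close_matches(key, VENDOR_ALIASES.keys(), n=1, cutoff=cutoff)
    if close:
        return VENDOR_ALIASES[close[0]], 0.85  # fuzzy match
    return raw.strip(), 0.5  # unknown vendor kept as-is
```

Because 'dell inc.', 'DELL INC', and 'Dell, Inc.' all normalise to the key `dell inc`, they hit the exact-match path with confidence 1.0; a typo such as 'Brocde' falls through to the fuzzy path.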
03

Location Hierarchy Parsing

The location parser extracts a structured Site→Building→Floor→Room→Row→Rack→U Position hierarchy from free-text location strings. It handles NetBox-style paths, structured colo codes (A-12-24 → Row A, Rack 12, U24), natural language, and partial inputs. A site default can be applied at upload time to records that lack an explicit site.

  • NetBox path format: 'NYC > DC1 > Row A > Rack 03 > U24'
  • Colo code format: 'A-12-24' → Row A, Rack 12, U24
  • Natural language: 'Rack 3, Row B, Floor 2, Building HQ'
  • Site default applied at upload time to records without explicit site
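A simplified parser for the three formats above is sketched below. It assumes segments are separated by `>` or `,`, labelled segments look like `Row A`, `Rack 03`, or `U24`, and unlabelled leading segments are read as site and then building. The `Location` dataclass and `parse_location` function are hypothetical, and floor and room are only filled from labelled segments.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Location:
    site: Optional[str] = None
    building: Optional[str] = None
    floor: Optional[str] = None
    room: Optional[str] = None
    row: Optional[str] = None
    rack: Optional[str] = None
    u_position: Optional[int] = None


COLO_CODE = re.compile(r"([A-Za-z]+)-(\d+)-(\d+)")   # e.g. "A-12-24"
LABELLED = re.compile(r"(building|floor|room|row|rack)\s*([\w-]+)", re.IGNORECASE)
U_POSITION = re.compile(r"U\s*(\d+)", re.IGNORECASE)  # e.g. "U24" or "U 24"
BARE_SEGMENT_ORDER = ("site", "building")             # assumed meaning of unlabelled segments


def parse_location(text: str, site_default: Optional[str] = None) -> Location:
    loc = Location()
    colo = COLO_CODE.fullmatch(text.strip())
    if colo:
        loc.row, loc.rack, loc.u_position = colo.group(1), colo.group(2), int(colo.group(3))
    else:
        bare = []
        # NetBox paths use ">" and natural language uses ","; split on either.
        for part in re.split(r"\s*[>,]\s*", text.strip()):
            if not part:
                continue
            labelled = LABELLED.fullmatch(part)
            u_match = U_POSITION.fullmatch(part)
            if labelled:
                setattr(loc, labelled.group(1).lower(), labelled.group(2))
            elif u_match:
                loc.u_position = int(u_match.group(1))
            else:
                bare.append(part)
        for field, value in zip(BARE_SEGMENT_ORDER, bare):
            setattr(loc, field, value)
    if loc.site is None:
        loc.site = site_default  # upload-time default fills the gap
    return loc
```

Rack identifiers are kept as written ('03' stays '03'), so a later stage can decide whether to strip leading zeros.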
04

Asset Classification

A two-tier classification engine assigns every in-scope asset to one of eight categories: Compute, Storage, Networking, Power, Cooling, Infrastructure, Monitoring, or Security. Keyword rules classify 80% of assets deterministically. The remaining 20% (Liebert iCOM controllers, Fibre Channel switches, busways, custom appliances) are resolved by LLM inference, and every inferred classification carries a confidence score.

  • 8 asset categories with 400+ keyword rules
  • Out-of-scope detection: VoIP phones, laptops, printers, label makers
  • LLM inference for ambiguous assets with confidence score
  • Classification confidence: High (≥0.85), Medium (0.65–0.84), Low (<0.65)
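The two-tier flow can be sketched as below, assuming whole-word keyword matching, a fixed confidence of 0.9 for keyword hits, and an `llm_classify` callable supplied by the caller in place of the real LLM call. The keyword lists are hypothetical and far smaller than the 400+ production rules; `confidence_band` applies the thresholds listed above.

```python
import re
from typing import Callable

# Hypothetical keyword rules; the production engine has 400+ rules across 8 categories.
KEYWORD_RULES = {
    "Compute": ("poweredge", "proliant", "server"),
    "Storage": ("storage array", "nas", "tape library"),
    "Networking": ("router", "access point", "load balancer"),
    "Power": ("pdu", "ups", "ats"),
    "Cooling": ("crac", "crah", "chiller"),
    "Infrastructure": ("patch panel", "cable tray", "rack enclosure"),
    "Monitoring": ("environmental sensor", "temperature probe"),
    "Security": ("firewall", "cctv", "badge reader"),
}
OUT_OF_SCOPE = ("voip phone", "ip phone", "laptop", "printer", "label maker")
KEYWORD_CONFIDENCE = 0.9  # assumed confidence for a deterministic keyword hit


def has_keyword(text: str, keyword: str) -> bool:
    """Whole-word match, so "ups" does not fire inside "groups"."""
    return re.search(rf"\b{re.escape(keyword)}\b", text) is not None


def confidence_band(score: float) -> str:
    if score >= 0.85:
        return "High"
    if score >= 0.65:
        return "Medium"
    return "Low"


def classify(description: str,
             llm_classify: Callable[[str], tuple[str, float]]) -> tuple[str, float]:
    """Tier 1: keyword rules. Tier 2: defer ambiguous assets to the supplied LLM hook."""
    text = description.lower()
    if any(has_keyword(text, kw) for kw in OUT_OF_SCOPE):
        return "Out of scope", 1.0
    for category, keywords in KEYWORD_RULES.items():
        if any(has_keyword(text, kw) for kw in keywords):
            return category, KEYWORD_CONFIDENCE
    return llm_classify(description)  # hypothetical LLM call returning (category, confidence)
```

Out-of-scope checks run first so that, for example, an IP phone is never scored as a Networking asset.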
05

NetBox DeviceType Enrichment

Records with a validated model are matched against 5,000+ device definitions from the NetBox DeviceType Library. Matched devices are enriched with U-height, weight, interface counts, and the canonical device type slug used for NetBox import. Unmatched devices are flagged so that a device type can be created in the library manually.

  • 5,000+ device definitions from the NetBox DeviceType Library
  • Slug-based matching with manufacturer prefix handling
  • Enriches: U-height, weight, interface counts, device type slug
  • Unmatched devices flagged in pre-flight validation report
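Slug matching with manufacturer prefix handling might look like this sketch. It assumes slugs follow a `manufacturer-model` pattern such as `dell-poweredge-r640`, and that canonical vendor names from stage 02 are mapped to library manufacturer slugs through a hypothetical `MANUFACTURER_SLUGS` table. `DEVICE_TYPES` is a stand-in for an index loaded from the library.

```python
import re
from typing import Optional

# Hypothetical mapping from canonical vendor names to library manufacturer slugs.
MANUFACTURER_SLUGS = {"Dell Technologies": "dell", "Broadcom": "broadcom", "Vertiv": "vertiv"}

# Stand-in for an index loaded from the DeviceType Library, keyed by slug.
# Only u_height is shown; weight and interface counts would sit alongside it.
DEVICE_TYPES = {"dell-poweredge-r640": {"u_height": 1}}


def slugify(value: str) -> str:
    """Lower-case and replace runs of non-alphanumeric characters with one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")


def device_type_slug(vendor: str, model: str) -> str:
    prefix = MANUFACTURER_SLUGS.get(vendor, slugify(vendor))
    model_slug = slugify(model)
    # Avoid "dell-dell-poweredge-r640" when the model string already names the vendor.
    if model_slug.startswith(prefix + "-"):
        model_slug = model_slug[len(prefix) + 1:]
    return f"{prefix}-{model_slug}"


def enrich(vendor: str, model: str) -> Optional[dict]:
    """Return enrichment fields, or None so the caller can flag the record as unmatched."""
    slug = device_type_slug(vendor, model)
    attributes = DEVICE_TYPES.get(slug)
    if attributes is None:
        return None
    return {"device_type_slug": slug, **attributes}
```

Both 'PowerEdge R640' and 'Dell PowerEdge R640' resolve to the same slug, which is the point of the prefix handling.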
06

Duplicate Detection

A multi-signal duplicate detector flags likely duplicates using four signals: exact serial number matches; fuzzy serial matches (edit distance 1, minimum 12 characters, ignoring pairs that differ only in the final character, which are usually sequential serials from the same batch); exact IP address matches; and a vendor + model + rack + U composite key.

  • Exact serial number matching (case-insensitive, punctuation-normalised)
  • Fuzzy serial matching: edit distance 1, min 12 chars, excludes sequential batch serials
  • IP address duplicate detection
  • Vendor + model + rack + U composite key flagging
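The four signals can be sketched as below. This assumes serials are normalised by upper-casing and removing non-alphanumeric characters, treats a "last-character-only difference" as two equal-length serials that differ only in their final character, and compares every pair of records, which is fine for a few thousand rows but would need bucketing for larger files. The record keys (`serial`, `ip`, `vendor`, `model`, `rack`, `u_position`) are hypothetical.

```python
import re
from itertools import combinations


def normalise_serial(serial) -> str:
    """Upper-case and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", (serial or "").upper())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute
            ))
        previous = current
    return previous[-1]


def fuzzy_serial_match(a: str, b: str, min_length: int = 12) -> bool:
    if a == b or min(len(a), len(b)) < min_length:
        return False
    if len(a) == len(b) and a[:-1] == b[:-1]:
        return False  # only the last character differs: likely sequential serials
    return edit_distance(a, b) == 1


COMPOSITE_KEY = ("vendor", "model", "rack", "u_position")


def find_duplicates(records: list[dict]) -> set[tuple[int, int, str]]:
    """Return (i, j, reason) for every pair of record indices that looks duplicated."""
    flags = set()
    for i, j in combinations(range(len(records)), 2):
        a, b = records[i], records[j]
        serial_a, serial_b = normalise_serial(a.get("serial")), normalise_serial(b.get("serial"))
        if serial_a and serial_a == serial_b:
            flags.add((i, j, "exact serial"))
        elif fuzzy_serial_match(serial_a, serial_b):
            flags.add((i, j, "fuzzy serial"))
        if a.get("ip") and a.get("ip") == b.get("ip"):
            flags.add((i, j, "ip address"))
        if all(a.get(k) for k in COMPOSITE_KEY) and all(a.get(k) == b.get(k) for k in COMPOSITE_KEY):
            flags.add((i, j, "vendor+model+rack+U"))
    return flags
```

With this rule, 'CN7XK2M4P9Q1R' and 'CN7XK2N4P9Q1R' are flagged as a fuzzy match, while 'CN7XK2M4P9Q1R' and 'CN7XK2M4P9Q1S' are not, because they look like consecutive units from one batch.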
07

Quality Scoring

Every record receives a 0–100 quality score across seven weighted factors. Records scoring ≥70 are considered import-ready. The score drives the Data Quality Heat Map in the Capacity Summary and the exception thresholds in the Compliance Audit Pack.

  • Completeness: hostname, serial, model, vendor, location (weighted)
  • Classification confidence contribution
  • Location depth: site, rack, U position each add points
  • Duplicate flag penalty: −15 pts; low classification confidence: −10 pts

Quality score formula

Each record receives a score from 0 to 100. Records scoring 70 or above are considered import-ready. The score is capped at 100.

Factor                                  Points
Hostname present                        +5
Serial number present                   +15
Model present                           +10
Vendor normalised (exact match)         +10
Site resolved                           +10
Rack resolved                           +10
U position resolved                     +5
Classification confidence ≥0.85         +15
Classification confidence 0.65–0.84     +8
NetBox DeviceType matched               +10
Duplicate flagged                       −15
Out of scope                            0 (excluded)

≥85: Excellent
70–84: Import-ready
<70: Needs review
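The points table translates directly into a scoring function. This sketch assumes a hypothetical `ScoredRecord` shape carrying the outputs of the earlier stages, clamps the result to the 0 to 100 range (the cap is stated on this page; the floor at 0 is an assumption), and returns `None` for out-of-scope records. Confidence below 0.65 earns no points because the table has no row for it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScoredRecord:
    """Hypothetical record shape carrying the outputs of the earlier stages."""
    hostname: Optional[str] = None
    serial: Optional[str] = None
    model: Optional[str] = None
    vendor_exact_match: bool = False
    site: Optional[str] = None
    rack: Optional[str] = None
    u_position: Optional[int] = None
    classification_confidence: float = 0.0
    netbox_matched: bool = False
    duplicate_flagged: bool = False
    out_of_scope: bool = False


def quality_score(r: ScoredRecord) -> Optional[int]:
    """Apply the points table; None means the record is excluded from scoring."""
    if r.out_of_scope:
        return None
    score = 0
    score += 5 if r.hostname else 0
    score += 15 if r.serial else 0
    score += 10 if r.model else 0
    score += 10 if r.vendor_exact_match else 0
    score += 10 if r.site else 0
    score += 10 if r.rack else 0
    score += 5 if r.u_position is not None else 0
    if r.classification_confidence >= 0.85:
        score += 15
    elif r.classification_confidence >= 0.65:
        score += 8  # confidence below 0.65 earns no points in the table
    score += 10 if r.netbox_matched else 0
    score -= 15 if r.duplicate_flagged else 0
    return max(0, min(score, 100))  # cap at 100 (stated); floor at 0 (assumed)


def quality_band(score: int) -> str:
    if score >= 85:
        return "Excellent"
    if score >= 70:
        return "Import-ready"
    return "Needs review"
```

With every positive factor present, the sketch returns 90 (5 + 15 + 10 + 10 + 10 + 10 + 5 + 15 + 10), which falls in the Excellent band.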

Column alias detection

Struktive maps non-standard column headers to canonical field names automatically. You don't need to rename your columns before uploading.

Canonical field    Recognised aliases (sample)
hostname           hostname, host name, device name, asset name, server name, name
serial             serial, serial number, serial no, s/n, sn, service tag
model              model, model number, model no, device model, part number
vendor             vendor, manufacturer, make, brand, oem
location           location, loc, cab location, cabinet, rack location, physical location, site, position
ip_address         ip, ip address, ip addr, management ip, mgmt ip, oob ip
owner              owner, owned by, cost centre, cost center, department, tenant
purchase_date      purchase date, purchased, acquisition date, install date, po date
warranty_expiry    warranty, warranty expiry, warranty end, support end, eos date
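A minimal header-mapping sketch using the sample aliases from this table follows. It assumes headers are compared after lower-casing and reducing every run of punctuation or whitespace to a single space, so `Mgmt_IP` matches `mgmt ip` and `S/N` matches `s/n`. `map_headers` is a hypothetical helper: unrecognised headers are left out of the mapping, and when two headers map to the same field the first one wins.

```python
import re

# Sample aliases from the table above; the production list is longer.
COLUMN_ALIASES = {
    "hostname": ["hostname", "host name", "device name", "asset name", "server name", "name"],
    "serial": ["serial", "serial number", "serial no", "s/n", "sn", "service tag"],
    "model": ["model", "model number", "model no", "device model", "part number"],
    "vendor": ["vendor", "manufacturer", "make", "brand", "oem"],
    "location": ["location", "loc", "cab location", "cabinet", "rack location",
                 "physical location", "site", "position"],
    "ip_address": ["ip", "ip address", "ip addr", "management ip", "mgmt ip", "oob ip"],
    "owner": ["owner", "owned by", "cost centre", "cost center", "department", "tenant"],
    "purchase_date": ["purchase date", "purchased", "acquisition date", "install date", "po date"],
    "warranty_expiry": ["warranty", "warranty expiry", "warranty end", "support end", "eos date"],
}


def header_key(header: str) -> str:
    """Lower-case and reduce every run of non-alphanumeric characters to one space."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", header.lower()).split())


# Aliases are normalised with the same function, so both sides compare like for like.
ALIAS_LOOKUP = {
    header_key(alias): field
    for field, aliases in COLUMN_ALIASES.items()
    for alias in aliases
}


def map_headers(headers: list[str]) -> dict[str, str]:
    """Map recognised headers to canonical fields; the first header for a field wins."""
    mapping: dict[str, str] = {}
    for header in headers:
        field = ALIAS_LOOKUP.get(header_key(header))
        if field and field not in mapping.values():
            mapping[header] = field
    return mapping
```

For example, `map_headers(["Device Name", "S/N", "Make", "Notes"])` maps the first three headers to `hostname`, `serial`, and `vendor` and ignores `Notes`.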

See it in action

Upload the 250-row enterprise sample file to see the full pipeline output — including all seven report types.