
The four data challenges that break category management workflows


The gap between how category management data is supposed to work and how it actually arrives is one of those problems that rarely get discussed at the same seniority level as the decisions it affects. Range reviews happen. Range decisions get made. And somewhere underneath all of that, a category analyst has spent three hours on Monday morning untangling POS exports before the real analysis could begin.

This piece is about the structural data problems that slow down category management workflows: not the occasional one-off data quality issue, but the recurring, predictable failure modes that every team encounters when working across POS systems, EDI feeds, and distributor exports. Understanding these problems clearly is a prerequisite to designing around them.

Timing desynchronisation across data sources

A category manager working with data from four sources — a POS export from their retail system, a weekly distributor sales file, a monthly panel report, and a promotional uplift feed from their promotional planning tool — is working with four different representations of the same underlying reality, measured at four different points in time.

The POS export might cover sales through Tuesday. The distributor file covers the prior full week. The panel report covers the prior four weeks. The promotional feed is current but covers only the promoted lines. When you try to combine these to understand what's happening in a category right now, you're not adding up consistent signals — you're reconciling temporally misaligned views of the same market.

The practical impact is that trend analysis becomes unreliable at the boundaries. A spike in sell-through that appears on the distributor file might have already normalised by the time the category manager sees it, because the POS export they're comparing it against is already three days newer. Weeks where the timing boundaries between sources fall differently — around holidays, promotional events, or year-end — create consistent analytical blind spots that experienced analysts learn to work around manually, rather than solve structurally.
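One way to make the misalignment visible, rather than handling it by eye, is to record the window each source actually covers and compute the stretch they all share. The Python sketch below does that for the four-source example above. The dates, the source names, and the choice to measure overhang in days are illustrative assumptions, not a description of any particular tool.

```python
from datetime import date

# Illustrative coverage windows (inclusive start and end dates) for the four
# sources in the example above. All dates are assumptions; 4 March 2024 is a Monday.
SOURCE_WINDOWS = {
    "pos_export": (date(2024, 3, 4), date(2024, 3, 12)),          # sales through Tuesday
    "distributor_weekly": (date(2024, 3, 4), date(2024, 3, 10)),  # prior full week
    "panel_report": (date(2024, 2, 12), date(2024, 3, 10)),       # prior four weeks
    "promo_feed": (date(2024, 3, 4), date(2024, 3, 12)),          # current, promoted lines only
}


def common_window(windows):
    """Return the inclusive (start, end) range every source covers, or None if there is none."""
    start = max(s for s, _ in windows.values())
    end = min(e for _, e in windows.values())
    return (start, end) if start <= end else None


def days_outside(windows, shared):
    """Count, per source, the reported days that fall outside the shared window."""
    start, end = shared
    return {
        name: max((start - s).days, 0) + max((e - end).days, 0)
        for name, (s, e) in windows.items()
    }


shared = common_window(SOURCE_WINDOWS)
print(shared[0], shared[1])  # 2024-03-04 2024-03-10
print(days_outside(SOURCE_WINDOWS, shared))
# {'pos_export': 2, 'distributor_weekly': 0, 'panel_report': 21, 'promo_feed': 2}
```

A source with overhang isn't necessarily wrong. If the panel figure arrives as a single four-week aggregate, it can't be trimmed to the shared week, so the honest options are to compare it against four weeks of the other sources or to flag it as not directly comparable.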

EDI feed gaps and what they actually represent

EDI transaction data is one of the most reliable signals in retail supply chains for what was ordered — but it is frequently misread as a signal for what was sold. These are different things, and the gap between them is meaningful for category analysis.

An EDI 850 purchase order tells you what a buyer ordered from a supplier. An EDI 856 advance ship notice tells you what shipped. Comparing the two gives the fill rate, the share of the ordered quantity that actually shipped, which is important but separate from whether those goods reached a shelf and sold through to a consumer. For category managers trying to understand true sell-out velocity, using inbound EDI orders as a demand proxy conflates supply chain activity with consumer demand.
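To make the distinction concrete, here is a minimal Python sketch of the fill-rate calculation. It assumes the 850 and 856 documents have already been translated into ordered and shipped quantities per order line; the PO number, SKU codes, and quantities are hypothetical. Note that nothing in it touches consumer sales.

```python
# Hypothetical order and shipment quantities, keyed by (PO number, SKU). This
# assumes the X12 850 and 856 documents have already been translated into
# simple quantities; real ship notices also carry a pack and carton hierarchy.
ordered = {("PO1001", "SKU-A"): 120, ("PO1001", "SKU-B"): 60}
shipped = {("PO1001", "SKU-A"): 108, ("PO1001", "SKU-B"): 60}


def line_fill_rates(ordered, shipped):
    """Shipped quantity as a share of ordered quantity, per order line."""
    return {key: shipped.get(key, 0) / qty for key, qty in ordered.items() if qty > 0}


def overall_fill_rate(ordered, shipped):
    """Total shipped over total ordered; lines shipped without an order are ignored."""
    total_ordered = sum(ordered.values())
    total_shipped = sum(shipped.get(key, 0) for key in ordered)
    return total_shipped / total_ordered if total_ordered else None


print(line_fill_rates(ordered, shipped))              # {('PO1001', 'SKU-A'): 0.9, ('PO1001', 'SKU-B'): 1.0}
print(round(overall_fill_rate(ordered, shipped), 3))  # 0.933
```

A 0.9 fill rate on an order line says the supplier shipped 90% of what was ordered. It says nothing about whether those units sold, which is why it shouldn't stand in for sell-out velocity.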

The more acute problem is that EDI feeds are frequently incomplete even within their own scope. Large grocery retailers often operate across multiple EDI formats, some legacy, some current, some partially migrated. A supplier might have full EDI visibility into one banner's distribution centres but only email-based reporting from another. A distributor serving independent retailers might have consolidated weekly sales data but no SKU-level granularity. When an analyst tries to assemble a category-wide picture from these sources, the data coverage is uneven in ways that aren't always visible from the outside — gaps look like quiet periods until someone manually verifies them.
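The practical defence against gaps that masquerade as quiet periods is to keep "no data arrived" and "zero units sold" as different values all the way through the pipeline. A minimal sketch, assuming each feed can be reduced to units per week; the source names, week labels, and figures are hypothetical.

```python
# Hypothetical feeds, each reduced to units per week. A week that is absent
# means no data arrived for it, which is not the same as a report of zero sales.
EXPECTED_WEEKS = ["2024-W10", "2024-W11", "2024-W12"]
received = {
    "banner_a_edi": {"2024-W10": 540, "2024-W11": 512, "2024-W12": 498},
    "banner_b_email": {"2024-W10": 210, "2024-W12": 0},  # W12 is a reported zero
    "distributor_x": {"2024-W10": 95, "2024-W11": 88},
}


def coverage_grid(received, weeks):
    """Per-source series with None (never 0) for missing weeks, plus a list of gaps."""
    grid, gaps = {}, []
    for source, by_week in received.items():
        grid[source] = [by_week.get(week) for week in weeks]
        gaps.extend((source, week) for week in weeks if week not in by_week)
    return grid, gaps


def category_total(grid, week_index):
    """Sum across sources for one week, or None if any source has a gap that week."""
    values = [series[week_index] for series in grid.values()]
    return None if any(v is None for v in values) else sum(values)


grid, gaps = coverage_grid(received, EXPECTED_WEEKS)
print(gaps)  # [('banner_b_email', '2024-W11'), ('distributor_x', '2024-W12')]
print([category_total(grid, i) for i in range(len(EXPECTED_WEEKS))])  # [845, None, None]
```

Returning None for an incomplete week is deliberately unhelpful: it forces someone to decide how to handle the gap, instead of letting a missing email report quietly read as a sales dip.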

SKU proliferation and the deduplication burden

Every retailer that manages its own private label estate alongside branded ranges knows the deduplication problem firsthand. The same product can appear in a single retailer's data under multiple item numbers: the result of system migrations, regional range variations, packaging changes, or simply inconsistent item setup at the time of introduction.

A chilled ready meal that launched as item 4471829 gets reformulated and relaunched as 6238044. Both item numbers may still appear in historical POS data as separate lines. An analyst running a velocity analysis across the range needs to know they are the same product for the comparison to be valid. If the deduplication hasn't been done cleanly (and in most retail data environments, it hasn't), the analysis either double-counts the product as two range lines or understates the original line's contribution by cutting its history off at the re-code, depending on which direction the error runs.
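Mechanically, deduplication comes down to maintaining a crosswalk from item numbers to a canonical product and rolling sales up through it before any velocity calculation. The sketch below uses the two item numbers from the example; the canonical product ID and the weekly figures are invented for illustration, and the hard part in practice, building and maintaining the crosswalk, is assumed to be done already.

```python
from collections import defaultdict

# Hypothetical crosswalk from retailer item numbers to a canonical product ID.
# Building and maintaining this table is the real deduplication work; the
# rollup below only applies it.
ITEM_TO_PRODUCT = {
    "4471829": "chilled-ready-meal-01",  # original launch
    "6238044": "chilled-ready-meal-01",  # reformulated relaunch
}

# Illustrative weekly POS lines: (week, item_number, units).
pos_lines = [
    ("2024-W01", "4471829", 300),
    ("2024-W02", "4471829", 280),
    ("2024-W03", "4471829", 140),  # old stock still selling through
    ("2024-W03", "6238044", 150),
    ("2024-W04", "6238044", 310),
]


def roll_up(lines, crosswalk):
    """Sum units per (canonical product, week); report item numbers with no mapping."""
    totals = defaultdict(int)
    unmapped = set()
    for week, item, units in lines:
        product = crosswalk.get(item)
        if product is None:
            unmapped.add(item)
            product = item  # keep the line visible under its own code
        totals[(product, week)] += units
    return dict(totals), unmapped


totals, unmapped = roll_up(pos_lines, ITEM_TO_PRODUCT)
print(totals[("chilled-ready-meal-01", "2024-W03")])  # 290
print(unmapped)  # set()
```

Unmapped item numbers pass through under their own code and are reported rather than silently dropped, so holes in the crosswalk surface as a to-do list.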

For category management specifically, the deduplication problem is most acute during range rationalisation reviews. When a category manager is trying to identify underperforming SKUs for potential delisting, the velocity data they're reviewing needs to be at the true product level, not the item number level. A SKU that appears to have declining velocity because it was re-coded halfway through the measurement period will generate a false delisting signal. Over time, these false signals erode confidence in the data, and category managers start overriding the analytical output with personal judgment — not because their judgment is better, but because they know the data isn't right.
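The false delisting signal is easy to reproduce. This sketch compares average weekly units in the second half of a period against the first half, once per item number and once per canonical product. The 30% drop threshold and the data are hypothetical, chosen only to mirror the re-coding example above.

```python
# Hypothetical weekly units for one product that was re-coded after week 3.
weekly_units_by_item = {
    "4471829": [300, 290, 310, 0, 0, 0],  # original item number, retired
    "6238044": [0, 0, 0, 305, 295, 300],  # new item number, same product
}
ITEM_TO_PRODUCT = {"4471829": "chilled-ready-meal-01", "6238044": "chilled-ready-meal-01"}
DELIST_THRESHOLD = -0.3  # hypothetical: flag a drop of 30% or more


def half_period_change(series):
    """Relative change in average weekly units, second half vs first half."""
    if len(series) < 2:
        return None
    mid = len(series) // 2
    first = sum(series[:mid]) / mid
    second = sum(series[mid:]) / (len(series) - mid)
    return (second - first) / first if first else None  # no baseline, no verdict


def product_series(by_item, crosswalk):
    """Add up weekly series (assumed equal length) for items mapping to one product."""
    combined = {}
    for item, series in by_item.items():
        product = crosswalk.get(item, item)
        if product in combined:
            combined[product] = [a + b for a, b in zip(combined[product], series)]
        else:
            combined[product] = list(series)
    return combined


def delist_candidates(series_by_key):
    """Keys whose half-period change is at or below the threshold."""
    flagged = []
    for key, series in series_by_key.items():
        change = half_period_change(series)
        if change is not None and change <= DELIST_THRESHOLD:
            flagged.append(key)
    return sorted(flagged)


print(delist_candidates(weekly_units_by_item))                                   # ['4471829']
print(delist_candidates(product_series(weekly_units_by_item, ITEM_TO_PRODUCT)))  # []
```

At item level, 4471829 looks like a line whose sales collapsed, and 6238044 has no first-half baseline at all. Rolled up to the product, velocity is flat.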

Distributor data contradictions and why they happen

A scenario that comes up with predictable regularity: a category manager receives monthly sales data from two regional distributors covering overlapping territories. The data covers the same period, nominally the same channels, and materially the same SKU set. The aggregate sell-out figures don't reconcile — one distributor shows a 7% volume increase in a subcategory where the other shows flat performance.

Before concluding that the market is behaving differently in different parts of the territory (which is possible but rarely the whole explanation), consider the more common causes: different stock-holding behaviours, different cut-off dates disguised by identical reporting period labels, different treatment of returns and credits, or different interpretations of what counts as "sold" versus "dispatched to outlet." Distributor reporting conventions vary widely, and the conventions are rarely documented.

We're not saying distributor data is unreliable — it's often the most granular sell-in signal category teams have access to. What we're saying is that distributor data without explicit reconciliation logic applied should not be treated as directly comparable across sources. The contradictions are not random noise; they have identifiable causes, and once those causes are mapped, the data becomes usable even before it's perfect.
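As a sketch of what explicit reconciliation logic can look like, suppose both distributors can supply record-level extracts and the team has mapped each distributor's own terms onto a shared vocabulary (here just dispatch and return). Recomputing both figures under one agreed convention, a calendar-month period with returns netted off, takes cut-off and returns differences out of the comparison. The records, the event names, and the convention itself are assumptions for illustration.

```python
from datetime import date

# Shared convention for this sketch (an assumption): calendar-month period,
# returns and credits netted off, units counted on dispatch to outlet.
PERIOD = (date(2024, 3, 1), date(2024, 3, 31))

# Hypothetical record-level extracts, already mapped from each distributor's own
# terms onto a common vocabulary: "dispatch" (to outlet) and "return".
records = {
    "distributor_north": [
        (date(2024, 2, 26), "dispatch", 120),  # before the shared period; a different cut-off might still label it March
        (date(2024, 3, 6), "dispatch", 400),
        (date(2024, 3, 19), "dispatch", 380),
        (date(2024, 3, 27), "return", 40),
    ],
    "distributor_south": [
        (date(2024, 3, 4), "dispatch", 410),
        (date(2024, 3, 18), "dispatch", 350),
        (date(2024, 3, 29), "return", 15),
    ],
}


def normalised_units(rows, period):
    """Net dispatched units inside the shared period, with returns subtracted."""
    start, end = period
    total = 0
    for day, event, units in rows:
        if event not in ("dispatch", "return"):
            # Refuse to guess: an unmapped event type means the convention isn't documented yet.
            raise ValueError(f"unmapped event type: {event!r}")
        if not start <= day <= end:
            continue
        total += units if event == "dispatch" else -units
    return total


for name, rows in records.items():
    print(name, normalised_units(rows, PERIOD))
# distributor_north 740
# distributor_south 745
```

In this invented example the two figures land within a few units of each other once both are measured the same way. With real data, whatever difference survives normalisation is the part worth investigating as a genuine regional difference.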

The manual reconciliation tax

The cumulative cost of the problems described above is what might be called a reconciliation tax on category management. It's the time spent every week — commonly four to six hours for a mid-size category team — doing data preparation work that contributes no analytical insight. Deduplicating item numbers. Cross-referencing distributor and POS timing windows. Flagging EDI gaps to trading partners. Chasing a data contact at a distributor for a corrected file.

This time is largely invisible in the category management process because it happens before the "analysis" officially starts. It doesn't appear in the range review deck. It doesn't surface in the meeting with buyers. But it shapes what gets analysed: when reconciliation takes three hours, there are topics that simply don't get investigated because there isn't time. Not because the category manager doesn't know they matter, but because the data infrastructure makes them too expensive to reach.

The structural response to this problem isn't to hire faster analysts. It's to build or adopt a data layer that handles the timing reconciliation, deduplication, and gap-filling logic systematically, before category managers interact with the data. The analytical output is only as trustworthy as the foundation it rests on, and most category management tools are built on the assumption that incoming data is already clean: an assumption that, in real retail environments, reliably does not hold.

What "good enough" data actually requires

A category manager doesn't need perfect data to make good range decisions. They need data that is good enough to distinguish signal from noise — which is a lower bar than perfection, but a substantially higher bar than most unprocessed retail data feeds provide by default.

Good enough, in practice, means: consistent measurement windows across sources, deduplicated SKU records, explicit documentation of coverage gaps rather than silent zeros, and a clear distinction between sell-in and sell-out signals. None of these require exotic technical infrastructure. They require deliberate design decisions at the data pipeline level — decisions that prioritise the analytical consumer's needs over the convenience of the source system's native export format.
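Those four conditions are concrete enough to check automatically before data reaches a category manager. The sketch below is one hypothetical way to do it: the CategoryRow structure and its field names are assumptions, a units value of None marks a documented gap, and the deduplication check only catches repeated records, since it assumes product_id has already been resolved through a crosswalk.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class CategoryRow:
    source: str
    product_id: str            # canonical product, assumed already resolved via a crosswalk
    window: Tuple[date, date]  # inclusive (start, end) the figure covers
    signal: str                # "sell_in" or "sell_out"
    units: Optional[int]       # None = documented gap, never a silent 0


def good_enough_issues(rows: Sequence[CategoryRow], expected_sources, expected_products) -> List[str]:
    """Check the four 'good enough' conditions; return a list of problems found."""
    issues = []
    # 1. Consistent measurement windows across sources.
    windows = sorted({r.window for r in rows})
    if len(windows) > 1:
        issues.append(f"inconsistent measurement windows: {windows}")
    # 2. Deduplicated records: one row per source, product and window.
    seen = set()
    for r in rows:
        key = (r.source, r.product_id, r.window)
        if key in seen:
            issues.append(f"duplicate record: {r.source}/{r.product_id}")
        seen.add(key)
    # 3. Explicit coverage gaps: every expected source/product pair has a row,
    #    even if its units are None.
    present = {(r.source, r.product_id) for r in rows}
    issues.extend(
        f"undocumented gap: {s}/{p}"
        for s in expected_sources for p in expected_products
        if (s, p) not in present
    )
    # 4. Every figure labelled as sell-in or sell-out.
    issues.extend(
        f"unlabelled signal type: {r.source}/{r.product_id}"
        for r in rows if r.signal not in ("sell_in", "sell_out")
    )
    return issues


week = (date(2024, 3, 4), date(2024, 3, 10))
rows = [
    CategoryRow("pos_export", "chilled-ready-meal-01", week, "sell_out", 612),
    CategoryRow("distributor_x", "chilled-ready-meal-01", week, "sell_in", None),  # file not received
]
print(good_enough_issues(rows, ["pos_export", "distributor_x"], ["chilled-ready-meal-01"]))  # []
```

An empty list doesn't mean the data is perfect, only that it clears the "good enough" bar described above.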

The teams that operate well despite imperfect data have usually developed those habits over time, through hard-won knowledge of where their specific data stack fails and how to compensate. The question worth asking is whether that accumulated knowledge should live in the heads of individual analysts, or whether it should be systematised into the intelligence layer that the whole category team relies on.

