Database normalisation: a solution for HR data quality issues

Jul 2, 2021 | Analytics

Chronic data quality problems? 

 

 

“What’s wrong with our data?” or  “We’ve been through several data cleanses but still we are getting problems” or “We hired on a data car to fix this problem, but it still happens – what are we doing wrong?” I’ve heard it in many organisations, where HR people ask “can we hire a consultant to fix this, or as I heard one Chief People Officer (who shall remain nameless) say more despairingly “how come our HR data is such a mess?”. There are many possible reasons – to name a few –

 

  • Line managers fail to advise HR of changes e.g. employee surname, change of address, marital status, promotion or transfer, left the company, etc – expecting that HR should somehow mind-read what happens in the lives of hundreds or thousands of employees
  • Line managers advise HR of changes, but the system is not updated to reflect the changes
  • Maybe you’re new into the situation, and finding that there’s a legacy of data mismanagement that you have to deal with
  • The HR data cleanses that happen regularly were only partially accurate, and there are still many data items out of date
  • Many of the current employees were TUPE’d in but the new data was either incorrectly or not fully captured

 

And there might be many other reasons. But suppose none of these things was ever a problem, and the data still isn’t right? What then? Well, it might be. something deeper, something  more structural. If you truly want to get a long term solution, it helps if you understand some of fundamental data and data quality concepts – and in this piece I’ll explore the issue of data normalisation.

 

 

“How come our HR data is such a mess?” – Chief People Officer

 

 

This might at first strike you as a more technical topic than most Compensation and HR professionals are likely to encounter or to require knowledge of, but if you have chronic data quality problems in your HRIS, the reason might be that the database tables were not set up using referential integrity (to keep the bad data out) and data normalisation in the first place. I’ve seen this happen even in a company that make their money from selling HR software, and the paying clients are often left scratching their heads, simply puzzled by what’s going wrong. HRIS people probably need to know this stuff though – especially if buying a new HR system.

 

 

Database normalisation – or data normalisation – helps to organise the contents of the tables in your HR database or in your data warehouse. It’s a fundamental part of successful database design, and if it’s ignored, your HR database can be inaccurate, perform slowly and inefficiently, and may often fail to reduce the data you expect. I’ll go into some detail of database normalisation – so if this helps you in your work, read on and nerd out.

 

Think a new HR system would solve your problems?

 

 

You might have a database system that your company bought from a mainstream vendor, or you might have to create a database to hold the data. How you create a database is really a whole other topic, but the database tables (which are 2-dimensional structures using rows and columns) need to be set up using referential integrity to enforce the rules on what constitutes valid data. This is done to keep illogical, impossible or bad data out.

 

If you’re more accustomed to spreadsheets like Excel, know that a database table looks like a spreadsheet, but also know that there are many tables in the database system. You might have one table called EMPLOYEE that contains employee first name, employee last name, home address, phone number and so on, with each bit of data living in a cell at the intersection of a row and column. You might have another table called POSITION that contains job title, job family, grade, salary minima and maxima, and so on. These tables, EMPLOYEE and POSITION, would be linked in a relationship where one POSITION might be performed by many EMPLOYEEs.

You’ll need to understand a couple of terms –

 

“…and nothing but the Key”

The following conditions have to be satisfied for us to say that the database is in 3NF –

  • It is in second normal form
  • There is no transitive functional dependency

 

What do we mean by transitive functional dependency? It means we have the following relationships in the table: A is functionally dependent on B, and B is functionally dependent on C. In this case, C is transitively dependent on A via B. The columns depend on primary key and nothing else. Hence “The Key, the Whole Key, and Nothing but the Key”.

 

Consider another example –

 

JOB_DETAILS

 

Job_ID Contract_ID Contract_Type Salary_Band
3074 1 FT Perm 14
3075 2 FT Fixed term 16
3076 3 PT Perm 20
3077 4 PT Fixed Term 8
3078 1 FT Perm 8

 

In the above table, Job_ID determines Contract_ID and Contract_ID determines Contract_Type. Therefore Job_Id determines Contract_Type and we have transitive functional dependency, which means that this structure doesn’t satisfy 3NF.

 

To bring this table to third normal form, we split the table into two as follows:

SALARY_BAND

Job_ID Contract_ID Salary_Band
3074 1 14
3075 2 16
3076 3 20
3077 4 8
3078 1 8

 

and…..

CONTRACT_TYPE

Contract_Type Contract_Type
1 FT Perm
2 PT Perm
3 FT Fixed Term
4 PT Fixed Term

 

Now all the non-key attributes are fully functional dependent only on the primary key. In the table JOB_DETAILS, both the Job_ID and Salary_Band are only dependent on Job_ID. In the table SALARY_BAND, Contract_Type is only dependent on Contract_ID.

 

 

 

OK, but I work in HR and you’re bamboozling me with tech stuff…

 

Not really. That’s kind of a weird attitude born of techno-fear. There’s a real benefit in learning about database normalisation and SQL though.Why be dependent on your IT team to grasp what data you need to add value in your job? Even if you aren’t working with data on a daily basis, learning database normalisation and SQL will probably help you to become a more systematic thinker. As with any programming language, you have to be logical and structured, and you have to develop solutions to problems. Knowing this stuff can also help you in areas like project management, procedure development and communications. Database normalisation and SQL are much easier for the average person with no coding knowledge – basically it’s simpler than programming languages, and most people have a basic grasp of how tables work.

 

Conclusion

 

Recent posts

Is this the “Moneyball” moment for HR & Reward?

Is this the “Moneyball” moment for HR & Reward?

    The movie Moneyball tells the real-life story of how a new manager at baseball club Oakland Athletics, Billy Beane used analytics to make massive improvements in the performance and results. Guys with spreadsheets and maths degrees showed their approach...

Who drafts the Job Description?

Who drafts the Job Description?

Writing job descriptions   Question is - how do you get consistently well-written, high-quality and compelling job descriptions for all the roles in your organisation? Back when I was starting out in HR, I had to conduct job analysis interviews, and capture what...