The Following dataset is available from Kaggle.com I used raw TXT which is typically how information would come from getting a file capture from VISTA or a Linux/Unix System
https://www.kaggle.com/datasets?search=human+resources

This data before being put into Excel or other location there are various transformation techniques straight to PowerBI Tableau. My method is Excel just because I like to be able to clean the data up a bit and it is my preferred method. You can download the raw text file here https://drive.google.com/file/d/1_WE-Gh1aKRFbEES5S8c3WtgL4BHY9E54/view?usp=drive_link

Most of the data is fairly clean there is no mis alignment in categories. Maybe some of the data needs to be put into proper format like Employee ID pay rates Zipcodes etc so that the data is consistent. I could create a macro so that if this is the only ETL method of using this data it is automated a bit faster. I could also use DAX coding to clean up the data as well. Usually I slow walk it just to examine the data. I also examine the data against the text file just to see how the pasting may have changed the data. The data set has 3311 Rows and 32 Columns.
Employee_Name, EmpID, MarriedID, MaritalStatusID, GenderID, EmpStatusID, DeptID, PerfScoreID, FromDiversityJobFairID, PayRate, Termd, PositionID, Position, State, Zip, DOB, Sex, MaritalDesc, CitizenDesc, HispanicLatino, RaceDesc, DateofHire, DateofTermination,_
Each of these fields need to be consistent and proper formatting
The Pay Rate was not in Dollar format the Zipgodes had missing zeros due to not being pasted in proper context I updated that as well. The same with the DOB. You can download the Excel file here already cleaned up. https://docs.google.com/spreadsheets/d/1_PUYroqcxkR0EdYNUh_uYa6DZMGxyNpJ/edit?usp=drive_link&ouid=110131933652277476625&rtpof=true&sd=true

Case Study Draft
Title: From Messy HR Data to Decision-Ready Insights
Context:
An HR dataset was provided as an evaluation/training tool. The file intentionally included multiple inconsistencies—mixed date formats, duplicate IDs, categorical codes alongside free-text descriptions, and performance scores bundled with personally identifiable information (PII).
Problem Statement:
Raw HR data is rarely analysis-ready. To derive reliable insights (turnover trends, performance outcomes, diversity metrics), data must be standardized, cleaned, and modeled into a relational structure.
Key Challenges in the Dataset:
- Dates: DOB, Date of Hire, and Termination had multiple formats (
MM/DD/YYYY,DD-MM-YY, text strings). - Identifiers: Columns like
MarriedID,PerfScoreID, andDeptIDlacked lookup tables, requiring manual mapping. - Categoricals: Redundant columns (e.g.,
GenderID+Sex) had conflicting values. - Structural risk: Sensitive fields (DOB, Race, Marital Status) were co-mingled with performance data—unrealistic in production, but useful for training.

0) Ingest (from TXT to a query you can govern)
- Source:
HR Data.txt(tab-delimited HR training set with DOB/Hire/Term, pay, performance, etc.).
Action: Power BI → Get data → Text/CSV → delimiter set correctly → Transform Data (opens Power Query).
1) Power Query (ETL you can refresh)
Data type normalization
- Dates were chaos (
DD-MM-YY, slashes, text). You fixed them with Change Type → Using Locale…- Parsed the “Old” text dates (DMY) and coalesced them into real
Datecolumns (DOB, DateofHire, DateofTermination).
- Parsed the “Old” text dates (DMY) and coalesced them into real
PayRate→ Decimal Number (stripped symbols where needed).
Lookup & recoding
- Built small dimension/lookup tables (Marital Status, Department, Performance Score) and replaced raw IDs with readable labels.
Standardization
- Gender:
M/F→Male/Female. - States: normalized to 2-char codes.
- Trimmed whitespace, fixed odd casing.
Derived features (still in PQ)
Salary Annual=PayRate * 2080.TerminationFlag= not blankDateofTermination.- Optional:
Tenure (Months)usingDate.MonthDifference([DateofHire], COALESCE([DateofTermination], Today)).
Close & Apply → model loads cleanly. Data contract achieved.
2) Data model (DAX that leadership can slice)
Calendar (disconnected)
- Modeling → New table:
Date =
CALENDAR(
MINX(Clean, Clean[DateofHire]),
MAX( TODAY(), MAXX(Clean, Clean[DateofTermination]) )
)
- Columns:
YearMonth = FORMAT('Date'[Date],"YYYY-MM")MonthStart = DATE(YEAR('Date'[Date]), MONTH('Date'[Date]), 1) - Mark as date table (Table tools).
Core measures (headcount/turnover)
Headcount EOP =
VAR d = MAX(‘Date'[Date])
RETURN COUNTROWS(
FILTER(Clean, Clean[DateofHire] <= d && (ISBLANK(Clean[DateofTermination]) || Clean[DateofTermination] > d))
)
Terminated (Period) =
VAR d1 = MIN(‘Date'[Date])
VAR d2 = MAX(‘Date'[Date])
RETURN COUNTROWS(
FILTER(Clean, NOT ISBLANK(Clean[DateofTermination]) &&
Clean[DateofTermination] >= d1 && Clean[DateofTermination] <= d2))
Avg Headcount (Period) =
AVERAGEX(VALUES(‘Date'[MonthStart]), [Headcount EOP])
Turnover Rate = DIVIDE([Terminated (Period)], [Avg Headcount (Period)])
Terminated (12M) =
CALCULATE([Terminated (Period)],
DATESINPERIOD(‘Date'[Date], MAX(‘Date'[Date]), -12, MONTH))
Avg Headcount (12M) =
CALCULATE([Avg Headcount (Period)],
DATESINPERIOD(‘Date'[Date], MAX(‘Date'[Date]), -12, MONTH))
Turnover Rate (12M) =
DIVIDE([Terminated (12M)], [Avg Headcount (12M)])
Format rates as % (1 decimal); counts as whole numbers; currency as $ with units.
3) Visual build (from data to story)
Executive snapshot page (your screenshot)
- Treemap: Average of Salary Annual by Department.
- KPI cards:
Sum Salary Annual,Average Salary Annual, Headcount (ideallyDISTINCTCOUNT(EmpID)).- Cards formatted with Display units = Millions/Thousands, 0–2 decimals.
- Slicers:
- Time:
Date[MonthStart]set to Between (orYearMonthdropdown sorted byMonthStart). - Department/Manager slicers for quick segmentation.
- Time:
- Turnover trend page: Line chart with
Turnover Rate (12M)byMonthStart(Continuous), plus the same slicers.
Interactions
- If a visual looked blank, you used Format → Edit Interactions so the treemap doesn’t over-filter sibling visuals.
4) QA & governance (so no one torpedoes your readout)
- Spot-check: pick a month → Termination rows in the data vs. Terminated (Period).
- Exclude statuses like “Future Start/Has not started yet” from headcount until hire date (EOP measure already guards this).
- Keep the whole flow refreshable: one click Refresh re-runs PQ + DAX on the raw file.
5) What you shipped, in business terms
- A single source of truth: ~35 standardized fields, ready for slicing by Department/Manager/Source/DEI attributes.
- Turnover telemetry that leadership can trust (period and rolling-12).
- Compensation lens (sum/avg salary, by org unit) that pairs with the turnover view for action.
You can download a PBIX file for the PowerBI slide here https://drive.google.com/file/d/11SvAzb3bMLuF9Z4v8pdgrgU7JifLVQU0/view?usp=drive_link