How a data-driven approach improved training for a pharma company
A major pharmaceutical company needed to improve how its medical staff learned about new products. Its manual process for analyzing training test results was slow and labor-intensive. We developed an AI system that pinpoints weaknesses in training materials, allowing for quick, precise improvements. The result was an increase in knowledge retention and a more efficient training process.
Challenge: Inefficient training and a guessing game for improvement
The global pharmaceutical company regularly trains its medical staff on new products, services, and technologies. After these sessions, employees take short tests to confirm their understanding.
This process generates large amounts of data, but the company lacked a way to use it effectively. Manually reviewing results was slow and labor-intensive, causing long delays between detecting knowledge gaps and updating the training materials. As a result, three major issues persisted:
- Low first-time success: The percentage of correct answers on the initial test attempt was lower than desired.
- Rigid content: Training followed a “one-size-fits-all” model, ignoring different levels of prior knowledge.
- High cognitive load: Materials were often unnecessarily complex, overwhelming learners and reducing knowledge retention.
Because the company couldn’t clearly identify which topics caused confusion, or whether issues stemmed from the test itself or the training content, improving the program was essentially a guessing game. This led to inefficient training, wasted resources, and frustrated employees.
They needed a clear, data-driven view of training performance to replace guesswork with actionable insights.
Solution: Pinpointing problems with AI
Before any development began, we conducted an AI Readiness Workshop with the client. This step was crucial. It helped them evaluate their data, identify capability gaps, and prioritize this specific training optimization project as a high-impact, feasible AI use case. The workshop provided a clear roadmap, allowing us to move directly from a validated concept to a production-ready system.
We then designed a system that replaces guesswork with precise, data-driven insights. The AI platform automates the collection and analysis of test results, flagging questions that fewer than 70% of employees answer correctly and linking them to the relevant training sections. Dashboards visualize the key statistics, making gaps and trends immediately clear. The system then generates targeted recommendations for course authors, such as clearer wording or restructured content. This creates a continuous cycle of measurement and refinement that directly improves training effectiveness.
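The flagging rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the production implementation: the record fields (`question_id`, `is_correct`), the `question_to_section` mapping, and the function name are assumptions made for the example. Only the 70% threshold comes from the description above.

```python
from collections import defaultdict

# Threshold taken from the text: flag questions that fewer than 70% answer correctly.
PASS_RATE_THRESHOLD = 0.70


def flag_weak_questions(responses, question_to_section, threshold=PASS_RATE_THRESHOLD):
    """Return questions whose correct-answer rate is below the threshold.

    responses: iterable of dicts with hypothetical keys
        "question_id" and "is_correct".
    question_to_section: hypothetical mapping from question ID to the
        training section that covers it.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for response in responses:
        qid = response["question_id"]
        totals[qid] += 1
        correct[qid] += int(response["is_correct"])

    flagged = []
    for qid in sorted(totals):
        rate = correct[qid] / totals[qid]
        if rate < threshold:
            flagged.append(
                {
                    "question_id": qid,
                    "correct_rate": rate,
                    # Questions without a mapping are reported rather than dropped.
                    "section": question_to_section.get(qid, "unmapped"),
                }
            )
    return flagged
```

A question answered correctly by exactly 70% of employees is not flagged, matching the "fewer than 70%" wording. Each flagged entry carries the linked training section, so a course author can go straight to the material that needs revision.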
Technology behind the solution
Our expertise in AI and data engineering for the Healthcare, Pharmaceutical, and Life Sciences sectors was key to building a robust system.
- Data collection: Automated ETL processes pull test results from the company’s Learning Management System (LMS) into a Snowflake data warehouse. We use incremental data loading to ensure seamless operation, prevent duplicates, and minimize the system's workload.
- Data processing & analytics: Using Python, we handle data preprocessing: cleaning, standardizing answer formats, and calculating metrics (response time, question difficulty, correct answer rate). SQL (Snowflake) aggregates statistics across user groups, topics, and difficult questions. Data quality checks are run automatically during data loading and before calculations to ensure reliability.
- AI engine: The engine identifies underperforming questions and links them directly to the relevant training content. It then generates clear recommendations for authors.
- Visualization: Interactive Power BI dashboards allow learning managers to get a clear view of answer distributions, track learning progress over time, and compare results across different teams.
- Updates automation & CI/CD: The entire pipeline is automated using Jenkins, which orchestrates daily updates. This includes triggering the ETL processes, executing SQL procedures in Snowflake, and publishing the refreshed Power BI reports. This CI/CD approach keeps all data and dashboards consistently up to date.
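To illustrate the incremental loading mentioned under data collection, the sketch below shows one common pattern: a timestamp watermark combined with deduplication on a unique ID. It is an assumption-laden example, not the client's pipeline. The field names `attempt_id` and `submitted_at`, the function name, and the in-memory dictionary standing in for the warehouse table are all hypothetical. In Snowflake, the same idea is typically expressed with a MERGE statement keyed on a unique identifier.

```python
from datetime import datetime


def incremental_load(warehouse, lms_rows, watermark):
    """Load only rows at or after the watermark and skip IDs already present.

    warehouse: dict keyed by attempt ID (stand-in for a warehouse table).
    lms_rows: iterable of dicts with hypothetical keys
        "attempt_id" and "submitted_at" (a datetime).
    watermark: latest "submitted_at" value seen by previous runs.
    Returns the new watermark to store for the next run.
    """
    new_watermark = watermark
    for row in lms_rows:
        if row["submitted_at"] < watermark:
            continue  # Already covered by an earlier run.
        new_watermark = max(new_watermark, row["submitted_at"])
        if row["attempt_id"] in warehouse:
            continue  # Same attempt delivered twice: do not duplicate it.
        warehouse[row["attempt_id"]] = row
    return new_watermark


# Example run with made-up data.
table = {}
first_batch = [
    {"attempt_id": "a1", "submitted_at": datetime(2024, 1, 1, 9, 0)},
    {"attempt_id": "a2", "submitted_at": datetime(2024, 1, 1, 10, 0)},
]
mark = incremental_load(table, first_batch, datetime.min)

# The second export overlaps the first; only the new attempt is added.
second_batch = first_batch + [
    {"attempt_id": "a3", "submitted_at": datetime(2024, 1, 2, 8, 30)},
]
mark = incremental_load(table, second_batch, mark)
```

Rows whose timestamp equals the watermark are re-examined rather than skipped, so attempts that share a timestamp with the last loaded row are not lost. The ID check keeps them from being inserted twice.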
Value and results
The impact was both immediate and measurable.
- Improved learning outcomes: The percentage of correct answers on follow-up tests increased by 20-25%. The material became clearer and easier to understand.
- Faster iterations: The feedback cycle for course authors shrank from several weeks to just 2-3 days. This speed allows for constant, incremental improvements.
- Time savings: The automated analysis freed up hundreds of hours for instructional designers and training managers, who no longer had to manually process results.
- Higher engagement: Employees reported greater confidence. The refined materials reduced cognitive load, allowing them to learn more effectively.
Future perspectives
This system provides a strong foundation for the future of training within the company. We see clear paths for expansion:
- Predictive analysis: Integrating ML models that can analyze question wording before a test is deployed, predicting which items might be problematic.
- Instant feedback: The system could be integrated with a chatbot to give employees personalized study tips right after they complete a test.
- Broader application: The same framework can be scaled to other types of training across the organization, from compliance to new hire onboarding.