Updated: May 12
A Human-Centered and Data Science Focus on Organizational Assessments and Process Improvement
In today’s professional world, especially in increasingly virtual environments, the “suggestion box” has largely been abandoned because leaders view it as an ineffective way to promote change. However, the lack of a feedback mechanism means company leadership can miss improvements that affect the bottom line every year. Without an emphasis on such change, issues are not flagged early and grow into deeper, long-term organizational challenges.
Capturing these details and addressing systemic issues that often go unresolved requires a reinvented suggestion box that organizational leaders can actually use. Such a solution rests on a trinity of effective management: a framework that combines Lean Six Sigma (LSS), human-centered design, and a data science perspective in organizational evaluations.
Level 1: Lean Six Sigma (LSS) principles
Process feedback from employees and stakeholders often goes unheard in large organizations, not only because the recommendations are hard to quantify, but also because the feedback often improves the daily operations of only a small group of stakeholders. Only after performing a thorough, holistic process evaluation can one determine the relevance of each of these recommendations to the greater organization. Evaluating the feasibility of every suggestion, however, is impractical because it requires significant resources.
As process improvement experts, we are enamored with the layers of qualitative and quantitative data we collect and analyze. We build process maps and apply DMAIC to each of our evaluations. We remove process waste to help clients hit key performance indicators (KPIs) and quarterly targets. However, we often get so engulfed in the data that we fail to account for the relationships between people, processes, and technology and the role they play in achieving an organization’s goals. In our attempt to quantify data as it relates to an organization’s processes and profitability, the resulting findings and recommendations may be limited by the type of data collected, who it is collected from, the transparency and anonymity the collection allows for, and the holistic impact the evaluation will have on the firm.
Level 2: Human-Centered Design
To expand the focus of a data-driven evaluation, we add a layer of human-centered design. This design methodology uses behavioral and scientific principles to modify processes and organizational design with people in mind, including designing around the tendencies, comforts, and preferences of each person affected. Applying this mindset to process improvement and change management gives end-users richer, more dynamic insight by allowing process experts to understand the context in which suggestions and improvement recommendations were made. Practicing human-centered design methods leads to deeper collaboration and better solutions, adding layers of meaning through multiple perspectives on an organization. Given that a firm’s culture is driven by human factors, it must be a focus of any organizational process improvement.
Level 3: Data Science Concepts
Data science tools also provide useful means of working with data to identify and optimize efficiencies. We will walk through a case study to demonstrate resources that can be applied to a Human Resources suggestion box.
Company X is a domestic manufacturing company with about 25,000 employees. Throughout the year, the company collects suggestions from employees via physical suggestion boxes as well as an online platform. Over the past five years, the company has collected roughly 80,000 suggestions. However, the team responsible for reading these suggestions to identify key issues numbers fewer than 20 people and has struggled to process all of these responses manually.
Given Company X’s problem, our goal is to build a system that helps the administrative team process and organize suggestions while identifying key firmwide issues more efficiently and with minimal manual effort.
Three key Machine Learning (ML) components can be used to solve Company X’s problem: 1) Optical Character Recognition (OCR), 2) Natural Language Processing (NLP), and 3) Topic Modeling. The following content discusses how these methods can be applied to solve a common human resource problem.
Optical Character Recognition (OCR)
OCR is a tool that mimics a human’s ability to read, much as humans use their eyes to render visual inputs for the brain to interpret. OCR enables the recognition of characters through an optical mechanism: an OCR engine converts document inputs captured by a digital scanner and renders the text as editable, searchable data. Inputs can range from hand-written text to PDF files or images captured by a digital camera. This capability is key for our case study, as Company X’s inputs include both hand-written text and digital forms.
How it works
There are several OCR tools and methodologies available today, but the workflow generally involves six key steps: 1) image acquisition, 2) pre-processing, 3) segmentation, 4) feature extraction, 5) training a neural network, and 6) model refinement [Figure 2].
The first step in OCR is to acquire the text data as scanned images. In the case of Company X, there are five years of text data to scan and organize into a central corpus. After the text data is uploaded, the next step is pre-processing the corpus to make the raw data computer friendly; hand-written feedback typically needs more cleaning and standardization than electronic text. Extracting clean characters feeds the next step, where characters are grouped and segmented to form meaning. Feature extraction then splits the grouped characters into recognizable patterns, and the characters are classified into classes to form a robust dataset. This dataset is used to train a neural network (NN) to recognize characters. Finally, post-processing is key to any OCR methodology, as revisiting and refining the earlier steps yields a reliable and accurate output.
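The segmentation-and-classification idea above can be illustrated with a deliberately tiny sketch: each character is a small binary bitmap, and recognition picks the stored template with the fewest differing pixels. The 3x3 glyphs below are invented for demonstration; production engines replace hand-made templates with trained neural networks.

```python
# Toy OCR classification step: match a character bitmap against stored
# templates by pixel distance. The 3x3 "glyphs" are invented examples.
TEMPLATES = {
    "I": (0, 1, 0, 0, 1, 0, 0, 1, 0),
    "L": (1, 0, 0, 1, 0, 0, 1, 1, 1),
    "T": (1, 1, 1, 0, 1, 0, 0, 1, 0),
}

def classify(bitmap):
    """Return the template label with the fewest differing pixels."""
    def distance(a, b):
        return sum(x != y for x, y in zip(a, b))
    return min(TEMPLATES, key=lambda label: distance(TEMPLATES[label], bitmap))

# A noisy "T" with one flipped pixel still classifies correctly.
noisy_t = (1, 1, 1, 0, 1, 0, 0, 0, 0)
print(classify(noisy_t))  # → T
```

In a real pipeline, the pre-processing and segmentation steps would produce these per-character bitmaps from scanned pages before classification.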
This tool would help the administrative team minimize the manual parsing and organizing of all 80,000 recommendations by cutting the time it takes to extract useful information. One of the most popular OCR platforms used by developers today is Tesseract, an open-source tool with a worldwide development community and a good starting point for learning more about OCR and its wide range of applications.
Natural Language Processing (NLP)
NLP takes the OCR process a step further by making the relevant context in a given corpus recognizable. At its core, NLP is about understanding the rules of a given language. It helps computers understand natural language by representing and assigning meaning to a linguistic input, then recognizing how a given word or phrase relates to the context of the entire corpus. NLP applies at the level of a sentence or an entire document, where it can be used to determine grammar, order, and meaning. This is useful for our case study, because parsing a large dataset requires not only character recognition but also meaning extraction in context: the same word or phrase may carry different meanings depending on the sentiment of the surrounding document.
How it works
NLP is used to structure large bodies of text in order to retrieve knowledge for a specific purpose. As with OCR, there are several approaches to the NLP workflow. Key elements include 1) exploratory data analysis (EDA), 2) text processing, 3) feature engineering, and 4) topic modeling.
Exploratory tools used in NLP include regular expressions (“regex”) for identifying repeated patterns in a given text. This step also allows for removal of punctuation and undesired characters or symbols, eliminating character noise and yielding a more robust dataset. Another common EDA method is removing stop words, such as conjunctions and prepositions like “the”, “is”, and “are”. Stop words are usually removed because they provide minimal semantic value. Libraries like the Natural Language Toolkit (NLTK) are popular for stop-word manipulation.
Text pre-processing usually follows EDA, once the data is clean enough to tokenize and vectorize. Tokenization splits the text into individual units (tokens), typically alongside normalizing punctuation and capitalization. One way to generate an accurate frequency count is to lowercase the entire document, which prevents the same word with different capitalization or punctuation from being counted separately. Word vectorization is then used to determine the frequency of each word in a given text, providing insight into semantic patterns across a large corpus. This is especially useful in our case study, as it helps identify key words and sentiments that are common across the dataset. In addition to tokenizing single words, feature engineering methods can tokenize adjacent words that carry a single meaning, such as “social media” or “come across”. Several other pre-processing methods can be applied depending on the volume and context of a given corpus, but the main purpose of pre-processing is to transform raw text into an output the computer can work with. These steps produce a reliable dataset for applying machine learning (ML) and classifying outputs into pre-defined classes. Depending on the insight the analytics team is interested in, classes can be broken down by category, including operations, accounting, IT, and so on. Given the five-year span of the data collected, an annual or quarterly breakdown might also provide additional information.
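Tokenization, lowercasing, frequency counting, and adjacent-word features can be sketched with the standard library alone. The suggestion texts below are invented for illustration.

```python
from collections import Counter

# Hypothetical suggestion texts for illustration only.
suggestions = [
    "More parking near the east entrance",
    "Parking is hard to find on Mondays",
]

counts = Counter()
bigrams = []
for text in suggestions:
    # Lowercasing first keeps "Parking" and "parking" from being
    # counted as two different words.
    words = text.lower().split()
    counts.update(words)
    # Adjacent-word pairs ("bigrams") capture multi-word terms such
    # as "social media" as a single feature.
    bigrams.extend(zip(words, words[1:]))

print(counts["parking"])  # → 2
```

The resulting counts are a simple word vectorization; libraries extend the same idea with weighting schemes such as TF-IDF.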
Topic modeling is a form of unsupervised machine learning used to mine valuable information from a large corpus. It combines NLP, machine learning, and statistical analysis to efficiently provide useful insight. Unlike supervised modeling, where class outputs are fixed in advance, topic modeling provides significant insight without prior class definitions. It brings together all the key pre-processing components mentioned above to surface information that goes beyond the limitations of supervised learning.
How it works
As discussed above, topic modeling relies heavily on how well a given corpus is pre-processed. This method uses NLP algorithms to mine contextually dependent meanings from words, phrases, sentences, and entire documents. The pre-processed raw text files and their NLP-generated findings are combined with their indexed outcomes and filtered through topic modeling algorithms. There are several unsupervised models and open-source tools that can be applied to a large corpus; as shown in Figure 3, there are generally four key elements to generating an output.
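One common open-source route is Latent Dirichlet Allocation (LDA) as implemented in scikit-learn. The sketch below, with invented suggestion texts and an assumed topic count of two, shows the vectorize-then-fit pattern; it is one possible implementation, not the only one.

```python
# Hedged topic-modeling sketch using scikit-learn's LDA. The texts and
# the choice of two topics are assumptions for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

suggestions = [
    "parking lot needs more spaces near the east gate",
    "the parking garage fills up before nine",
    "cafeteria food options should include vegetarian meals",
    "more healthy food in the cafeteria please",
]

# Vectorize the corpus, then fit an unsupervised model with two topics.
vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(suggestions)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(doc_term)

# Each row is a per-document distribution over the two topics.
print(doc_topics.shape)  # → (4, 2)
```

On Company X’s real corpus, the number of topics would be tuned, and the top-weighted words per topic would be reviewed by the administrative team to label the themes.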
This methodology also incorporates methods to evaluate the quality and performance of the output, a useful step in ensuring the accuracy of the algorithm. Statistical measures such as precision (P) and the F-score are among the many ways to assess integrity: precision measures the fraction of retrieved items that are actually relevant, while the F-score combines precision and recall into a single measure of retrieval success. This approach allows for an open-ended analysis to derive knowledge that might otherwise be overlooked by supervised learning. Furthermore, topic modeling requires minimal supervision while incorporating additional inputs to refine future outputs.
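A small worked example makes the metrics concrete. The retrieved and relevant sets below are hypothetical; in practice, “relevant” would come from a reviewer’s judgment of a sample of suggestions.

```python
# Hypothetical evaluation: which suggestions did the model flag, and
# which did a human reviewer actually consider relevant?
retrieved = {"s1", "s2", "s3", "s4"}  # flagged by the model
relevant = {"s2", "s3", "s5"}         # flagged by the reviewer

tp = len(retrieved & relevant)        # 2 true positives
precision = tp / len(retrieved)       # 2/4 = 0.5
recall = tp / len(relevant)           # 2/3 ≈ 0.667

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # → 0.571
```

Tracking these numbers over time shows whether refinements to the pre-processing and modeling steps are actually improving retrieval quality.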
In summary, state-of-the-art tools like machine learning can help alleviate human resource constraints. Although a staff of 20 is allocated to process and analyze five years’ worth of data, a machine learning system gives users the means to efficiently scan, ingest, and autonomously classify suggestions that might otherwise be overlooked in a manual approach.
Applying the trinity of effective management results in a holistic evaluation. Involving engineers and designers in process improvement isn’t a revolutionary idea. But how often have you included insight from anthropologists and behavioral psychologists? What about data scientists? These perspectives should be a critical factor in all strategic initiatives to improve the success of organization-led transformations. We challenge you to evaluate the relationships between people, processes, and technology in future evaluations of your organization.