Helping businesses monitor and explain their machine learning models
Product Problem
Businesses are increasingly adopting machine learning (ML) models to drive innovation and gain a competitive edge. However, ensuring the reliability and performance of these models after deployment remains a significant challenge. Traditional monitoring methods lack the depth needed to understand complex model behavior.
Censius empowers enterprise data science teams and ML engineers with a comprehensive platform for monitoring and explaining their production machine learning models.
My Role
I had the privilege of joining Censius as their very first product designer. It was an exhilarating challenge: I spearheaded the design efforts and laid the user experience (UX) foundation for the Censius Platform.
I worked with the founder (Ayush Patel), our engineering team, and a small team of ML practitioners to lead the end-to-end design of the Censius Platform, shipping the MVP in Mar 2022 and the alpha version in Jan 2023.
Ensuring Your Predictions Stay on Beat: Introducing Censius
Imagine the complex algorithms behind Spotify's music recommendations. These are essentially "ML models," powerful programs trained on vast amounts of data about music, listening habits, and user preferences (training data). They learn to predict songs you'll likely enjoy (make predictions). But music tastes evolve, and so do listening patterns. Censius helps ensure these predictions stay fresh and relevant.
Here's how:
Monitoring: Censius Platform keeps an eye on your model's performance. It detects potential issues, like the model becoming outdated on new music trends, before they impact your recommendations.
Explainability: Ever wondered why a specific song popped up on your "Discover Weekly" playlist? Explainability lifts the hood on the model's decision-making process and explains the "why".
Censius Platform operates after the deployment stage of the machine learning lifecycle and acts as a guardian, ensuring your models continue to function optimally and deliver the kind of personalized experiences you expect.
High Level Overview of MLOps Cycle
Research & wireframe
Understanding the user and their needs
The leadership recognized a need in the market for a strong machine-learning monitoring solution. To build the MVP, we began conversing with engineers to understand their pain points and workflows.
Our initial conversations targeted data scientists, machine learning engineers, product managers, and leadership teams.
Throughout our research, I actively collaborated with the team to catalog user concerns, requirements, and pain points, and compiled a comprehensive list of feedback from these sessions. This allowed us to prioritize the features and functionality that would deliver the most value to our target audience, and ultimately shaped the Censius product roadmap.
Snippets from Technical Research and User Flows
This research led us to define two user groups:
Technical - ML engineers, data scientists
Non-Technical - product managers, business analysts
We also found that existing solutions, often designed by engineers for engineers, didn't cater to this broader audience.
We recognized this common pitfall and identified a clear opportunity for differentiation. Our key focus was to prioritize ease of use and create a visually appealing and intuitive UI.
Design Objective
Our design objective was to create an accessible, intuitive, and user-friendly machine-learning monitoring solution. This meant prioritizing an efficient user experience for a diverse user base: both technical users (data scientists, ML engineers) and non-technical users (product managers, business analysts, stakeholders).
Design Principles
With the findings from interviews and internal discussions, we arrived at high-level design principles to help us prioritize a user-friendly interface with intuitive workflows.
Improve clarity - Feedback from users made it clear that our existing solution lacked easy-to-understand monitor configurations, intuitive data visualizations, and clear explanations of alerts and of how information was organized and displayed. To address these shortcomings, the design of the Censius Platform prioritizes clarity.
Increase efficiency - We aimed to optimize efficiency in several key areas, letting users gain a richer understanding of model performance without getting bogged down in complex processes and interfaces.
Leverage standard UX - We explored various options for component interactions and functionalities in the initial stage and recognized the value of leveraging established, familiar UX patterns from popular SaaS platforms to minimize the learning curve and ensure the platform could be used by non-technical people as well.
Design Research
I spent much of the early days discussing and sketching ideas and directions for the platform with the engineering team to understand product features, design requirements, and the step-by-step user journey. I also studied competitors and popular SaaS platforms to collect successful design patterns and inspiration, and to learn common design pitfalls.
Ayush (the founder), the engineering team, and I had regular meetings where we presented our ideas and progress; with the engineering team's input and suggestions on product and feature architecture, we came up with wireframes and flows.
Design Research, Wireframe planning
We initially set out to build the monitoring solution for the MVP; later we incorporated explainability and dashboards into the app, which I'll talk about in a bit.
Monitoring your models
First, let's understand what monitoring is: just as a smoke detector warns you of fire, model monitoring keeps an eye on your models.
If something unexpected happens or the model starts acting strangely, it sends an alert so you can investigate.
Censius simplifies model monitoring by letting users manage their models with a one-level folder structure: all models related to a specific goal are grouped within a single project, fostering clarity and streamlining workflows.
Imagine a bank, e.g. HDFC, using models to predict which offers to show its users. In that case, it would have a project named "Offer Recommender" under which all the related models and datasets live.
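To make the one-level hierarchy concrete, here's a minimal TypeScript sketch of how a project could group models and datasets (hypothetical names and shapes for illustration, not the actual Censius schema):

```typescript
// Hypothetical shapes illustrating the one-level project hierarchy.
interface Dataset {
  id: string;
  name: string; // e.g. "card_transactions_2022"
}

interface ModelVersion {
  version: string; // e.g. "v1.2"
  createdAt: Date;
}

interface Model {
  id: string;
  name: string;             // e.g. "Credit Card Upgrade"
  versions: ModelVersion[]; // a model can have multiple versions
}

interface Project {
  name: string;        // e.g. "Offer Recommender"
  models: Model[];     // every model serving this business goal
  datasets: Dataset[]; // the datasets those models use
}
```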
Projects Tab
Individual project cards display key information like names, models housed within, and datasets used. This allows for quick identification and understanding of each project's scope and complexity.
Some of the Iterations of Project Card
Within each project card, users can easily browse and locate its models and datasets.
Model and Dataset View
The project detail screen shows all the models and datasets related to that project, as seen above.
I used tabs to let users quickly switch between models and datasets. For example, one of the models could be a credit card upgrade model that predicts which customers the bank should offer a credit card upgrade.
A model can have multiple versions, each optimized for an updated dataset, for better predictions, or just for testing. For the active "Models" tab's list I decided on a two-column layout: users click a model in the left column's list, and the right column shows all the versions of the selected model along with their key information.
When a user first lands on this screen, a model is selected by default and all its versions are shown on the right.
The two-column layout wasn't my first choice here. I explored this screen's navigation using side sheets and pop-ups, but since we didn't want to hide and reveal information, I settled on a two-column layout where all the information stays visible.
This intuitive layout enables direct browsing to the desired model, eliminating the need for navigating through multiple folders or levels.
By clicking a model version in the right column, users see all the model details on a dedicated screen. Let's talk about it…
Model details
This model/model version detail screen allows users to analyze, investigate, and get actionable insights about their models.
The challenge in designing this screen was showing the vast amount of information related to all the data points in a simple, efficient manner that helps users take action and make decisions.
Through discussions with the team, I came to understand all the data points we wanted to show. The goal was to organize the information in a layout that simplifies findability.
I started thinking in terms of components and user interactions, explored a wide range of layout options, from pop-ups and side windows to accordions and tables, and came up with rough mockups to get feedback from the team.
Finally, with the engineering team's active feedback and our design principles in mind, we decided on the layout below.
Full view of Monitors, Model Performance, Dataset, Explain, Fairness, and Iterations
By understanding all the data points and prioritizing discoverability and user mental models, I implemented a streamlined navigation system that gives users an intuitive navigation experience.
I used contextual navigation components, interface elements that adapt to the user's current view, to facilitate efficient investigation. Elements like links and buttons, breadcrumbs, categories, tags and filters, tabs, dropdowns, in-context search, and tables of contents appear dynamically to guide users as they investigate their models and make decisions.
Let's talk about what's happening in the monitors tab.
Monitors View
The active Monitors tab shows the list of monitors associated with the model/model version.
A monitor is simply a way to track a specific data point in the monitoring pipeline. For example, an engineer at HDFC bank working on the credit card upgrade model can set up a monitor that sends an alert if the mean credit score falls below a specific threshold value like 650.
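As an illustration, such a monitor might be described by a shape like the one below (a hypothetical TypeScript sketch, not the actual Censius API):

```typescript
// Hypothetical monitor definition (illustrative only).
interface Monitor {
  name: string;
  type: "data_drift" | "performance" | "data_quality";
  modelId: string;
  modelVersion: string;
  feature: string; // the data point being tracked
  condition: {
    metric: "mean" | "min" | "max";
    operator: "<" | ">";
    threshold: number;
  };
}

// The HDFC example: alert when the mean credit score drops below 650.
const creditScoreMonitor: Monitor = {
  name: "Mean credit score floor",
  type: "data_quality",
  modelId: "credit-card-upgrade",
  modelVersion: "v1",
  feature: "credit_score",
  condition: { metric: "mean", operator: "<", threshold: 650 },
};
```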
Censius supports three types of monitors: Data Drift, Performance, and Data Quality. I used an accordion component with a list component to group all the monitors by type, and added a search input to let users quickly jump to any specific monitor and perform their actions.
This helps users focus on one thing at a time instead of being bombarded with an endless list of monitors, and lets them navigate seamlessly to specific monitors.
Each monitor type displays a table summarising violations (instances where a monitor detects a condition that falls outside acceptable parameters).
Monitor's Violation View
I used a circle icon to indicate the severity of each violation: red means the data point has violated multiple times and demands immediate attention, orange means moderate-priority violations, and green means violations exist but sit very close to the configured threshold and can be addressed at low priority.
This visual approach empowers data scientists to quickly prioritize and address potential issues within their models.
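The traffic-light logic could be sketched roughly like this (the counts and margins here are invented for illustration; the real rules lived in the product):

```typescript
type Severity = "red" | "orange" | "green";

// Rough sketch of the severity mapping described above.
function violationSeverity(
  violationCount: number,
  distanceFromThreshold: number // how far past the threshold, as a fraction
): Severity {
  if (violationCount > 3) return "red";             // repeated violations: act now
  if (distanceFromThreshold > 0.1) return "orange"; // clearly past the threshold
  return "green";                                   // barely past it: low priority
}
```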
To summarise, my design decisions to enhance information clarity and support informed decisions on the Monitors tab:
Used icons with accompanying text in table rows to show violation severity
Incorporated contextual navigation components
Used a two-column layout to visually separate information
Date Component
The date component stays in a constant position across all the model-related views and allows users to view and compare information over historical time periods for deep analysis and investigation of their models.
I brainstormed and collected ideas for what an ideal date component might look like, and there were two key requirements to consider in its design:
A list of the most frequently used periods
A custom option to choose any period
I tried multiple options, from letting users choose weeks, months, and quarters to direct date input, and with the internal team's feedback we finalized the design below, considering our design principles and ease of use.
The date picker shows the most used periods as default options (last 24 hours, 7 days, 1 month, 6 months, and 1 year) along with a calendar icon for choosing a custom period via direct date input. It also displays the exact dates of the selected period, providing clear context even if users switch tasks.
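Resolving those presets to concrete ranges is straightforward; a minimal sketch (the preset keys are illustrative):

```typescript
// Sketch: turn a preset period into an exact from/to range,
// which the picker then displays for clear context.
type Preset = "24h" | "7d" | "1m" | "6m" | "1y";

function resolvePreset(preset: Preset, now: Date = new Date()): { from: Date; to: Date } {
  const from = new Date(now);
  switch (preset) {
    case "24h": from.setHours(from.getHours() - 24); break;
    case "7d":  from.setDate(from.getDate() - 7); break;
    case "1m":  from.setMonth(from.getMonth() - 1); break;
    case "6m":  from.setMonth(from.getMonth() - 6); break;
    case "1y":  from.setFullYear(from.getFullYear() - 1); break;
  }
  return { from, to: now };
}
```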
Date Component
Monitor creation
Users can monitor any specific data point within their models by creating new monitors for their desired data points and defining violation conditions. The Censius app detects violations of these conditions and alerts users.
In existing tools, creating a new monitor was complicated and spanned multiple screens. This overwhelmed users and made it difficult to focus on any specific step.
We wanted to address this by simplifying the flow so users could do it from a single screen; I anticipated that designing for clarity and focus would be crucial for the monitor creation flow.
Based on the engineering team's input about all the data points needed to set up a monitor, Ayush and I brainstormed how to simplify the process and make setting up new monitors easier, and after multiple iterations designed a four-step monitor creation screen.
We implemented it in the app and asked users to go through the monitor creation flow and set up new monitors.
By asking questions like "How easy is it to create and edit an existing monitor?" and "Is anything missing from this screen that you'd like to see when setting up new monitors?", we received some good feedback.
Some of the feedback was technical and some could be addressed by design. For example, one user need was to see how a new violation condition would behave on historical data (design); another was the ability to integrate webhooks into the app (technical).
Based on user feedback and internal discussion, I addressed the points that could be solved by design: a three-step monitor creation process focused on progressive disclosure, clear visuals, and user control, with a step indicator that shows users where they are in their monitor creation journey.
Step 1. Create Monitor
Create Monitor is the first step of the journey, where users provide basic information about the monitor they want to set up: the model, model version, monitor type, and the feature to associate with the new monitor.
Step 1. Create Monitor
I used form inputs to collect this information and kept the form on the left, as opposed to the right in the previous version we shipped; I'll explain why in the next step.
For "Monitor Type" I initially used a dropdown to let users select the type, but feedback from the team was that we wanted to highlight the monitor types the app supports, so I decided on tabs that visibly show the types users can choose from.
Step 2. Configure Monitor
The Configure Monitor step allows users to set up conditions for their monitors. For example, an engineer at HDFC bank working on the credit card upgrade model can set up a condition like "send me an alert if the approval rate of credit card upgrades falls below 68%".
We call these conditions "Data Segments". A Data Segment is essentially a condition or group of conditions that tracks a specific data point.
Users can input one or more conditions based on their needs and add more condition inputs by clicking the "+ Add Condition" button.
Once we shipped this and tested with users, we learned that creating these conditions was a repetitive task every time they set up a new monitor. To address this, Ayush and I discussed it with our engineering team and made three changes to this step.
I added a new input to give a condition a name, a checkbox that lets users save these conditions, and tabs at the top that let them create a new condition or choose an existing one, repurposing conditions created in the past and reducing repetitive work.
These conditions, or rather "Data Segments", are available for reuse on a per-project basis.
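A saved Data Segment might look something like this (a hypothetical TypeScript shape, for illustration only):

```typescript
// Hypothetical shape of a reusable, saved Data Segment.
interface Condition {
  feature: string;           // e.g. "approval_rate"
  operator: "<" | ">" | "=";
  value: number;             // e.g. 0.68
}

interface DataSegment {
  name: string;            // the name input added after user feedback
  projectId: string;       // segments are reusable within a project
  conditions: Condition[]; // one or more, via "+ Add Condition"
  saved: boolean;          // the checkbox that persists it for reuse
}
```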
Step 2. Configure Monitor
I kept the step forms 600px wide and left-aligned on the screen to make space for the graph that appears in the configure step.
As users fill in the monitor configuration, a dynamic graph appears on the right side that serves three key purposes:
Visualize Threshold: See your chosen threshold level clearly displayed on the graph, helping you understand how strict or lenient your monitor will be.
Baseline Clarity: The graph displays the baseline data for the chosen metric, providing context for setting the threshold.
Historical Insights: This allows users to see how the chosen threshold would have performed in the past, fostering informed decision-making.
This real-time graph preview empowers users to configure their monitors with confidence, ensuring they effectively track the data point.
By choosing a Data Segment, the baseline to compare against, and a threshold value, users can configure their monitor.
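The "historical insights" part of the graph boils down to replaying the candidate threshold over past metric values; a minimal sketch, with hypothetical names:

```typescript
// Sketch: find the historical points that would have violated a candidate threshold.
interface MetricPoint { timestamp: Date; value: number }

function backtestThreshold(
  history: MetricPoint[],
  threshold: number,
  operator: "<" | ">"
): MetricPoint[] {
  return history.filter((p) =>
    operator === "<" ? p.value < threshold : p.value > threshold
  );
}

// e.g. backtestThreshold(approvalRateHistory, 0.68, "<").length tells a user
// how many alerts this monitor would have fired in the chosen period.
```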
Now let's talk about the final step of the monitor creation flow.
Step 3. Set up Alerts
Step 3. Set up Alerts
The final step lets users choose where they want to be notified if the Censius app detects a violation.
Users can add their preferred email and Slack notification channels using two dedicated multi-select inputs, and after giving the monitor a name, they are ready to monitor their model.
Explainability
Explain the "why" behind a model's decisions.
The "Explain" tab lets users dissect model decisions and understand the "why" behind their models' predictions, so they can make more informed decisions and optimize model performance for their specific needs.
The design objective for this screen: how can we let users understand and analyze the model's decisions without overloading them with information or drowning them in technical soup? Since non-technical users would also be interested in the model's decisions, we wanted to make investigating the "why" behind the model super simple.
Each model has certain features linked to its predictions. For example, a movie recommendation model at Netflix would have features like "genre preferences" and "actor preferences".
To explain the "why", we wanted to show two things:
Which features the model is actively considering to make predictions, and which it is not
How an individual model feature affected any specific prediction
To solve for this, we came up with two kinds of explainability: global and local…
Global Explainability
This helps users understand model-level decisions. In the case of the HDFC bank credit card upgrade offer, for instance, it shows which features, and how many, the model takes into account when predicting which users should get an upgrade offer.
In the context of HDFC bank, these features could be age, income, location, total spend, credit score, account tenure, digital banking usage, and so on.
Global Explainability
I opted for two collapsible sections – "Global Explainability" and "Local Explainability." This structure mirrors the user's thought process – starting with a big-picture view and then zooming in for specifics.
I used a bar chart to indicate the feature importance behind a model's predictions: blue bars on the right indicate features currently in active use, while red bars on the left indicate features not currently used to make predictions.
This reveals the key factors driving predictions and areas for optimizing the model, and users can investigate the model's feature importance history by changing the time period in the date component at the top of the screen.
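Since the platform standardized on Apache ECharts (see the chart library selection section below), a diverging bar like this takes only a small config; the feature names and values here are made up for illustration:

```typescript
import * as echarts from "echarts";

// Feature names/values invented for illustration.
const features = ["age", "location", "income", "total_spend", "credit_score"];
const importance = [-0.2, -0.12, 0.18, 0.31, 0.42]; // negative = not in active use

const option: echarts.EChartsOption = {
  yAxis: { type: "category", data: features },
  xAxis: { type: "value" },
  series: [{
    type: "bar",
    // blue bars for actively used features (right), red for unused (left)
    data: importance.map((v) => ({
      value: v,
      itemStyle: { color: v >= 0 ? "#4285f4" : "#ea4335" },
    })),
  }],
};

echarts.init(document.getElementById("global-explainability")!).setOption(option);
```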
Below the bar chart, users can access a table for detailed feature analysis, curated by each feature's influence on the output for the chosen data point. Since these data points can be numerous, we added sort and filter functionality so users can filter and sort the data points to speed up their investigations and perform actions.
Feature Analysis
Local explainability
While global explainability helps users understand the big picture behind the model's predictions, local explainability considers each instance in the data and the individual features that affect model predictions.
For example, in the case of the HDFC bank credit card upgrade offer, an engineer at HDFC can see how an individual feature such as "total spend" affected a specific prediction to offer a particular user a credit card upgrade.
The app does this by finding causal relationships between features and their use in predictions over time.
I used a scatter chart that allows users to select individual features for the X and Y axes to find relationships between different features.
By plotting different data points and checking the graph against historical data, users can investigate specific predictions.
The chart shows the distribution of data point importance; by comparing data points and finding causal relationships, users can understand how a specific model feature contributed to a specific prediction and why.
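The scatter is again a small ECharts config; a sketch with invented values, one point per prediction:

```typescript
import * as echarts from "echarts";

// Each point pairs a feature value with the model's prediction score
// for one customer (values invented for illustration).
const points: [number, number][] = [
  [3400, 0.22],
  [12000, 0.81],
  [25600, 0.93],
];

const option: echarts.EChartsOption = {
  xAxis: { type: "value", name: "total_spend" },
  yAxis: { type: "value", name: "prediction score" },
  series: [{ type: "scatter", data: points, symbolSize: 8 }],
};

echarts.init(document.getElementById("local-explainability")!).setOption(option);
```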
Local Explainability
Dashboards
Create internal reports and model insights
The current model reporting process is fragmented and inefficient: engineers rely on external tools to create reports and share model insights with their team. This increases the risk of errors and delays understanding of model performance; ultimately it hinders collaboration, slows down decision-making, and uses resources inefficiently.
The Dashboard feature transforms model reporting from a fragmented process into a collaborative one. With it, users can create reports covering model health, performance data, and explainability, and gain a holistic view of their models in one place.
Design Challenge
Our challenge was how to let users design dashboards, balancing powerful features for complex dashboards with a user-friendly interface that caters to users with varying technical skill sets. The interface should be intuitive for beginners but also offer enough flexibility for experienced data analysts.
Design Research
When I first heard from our founder that we wanted to build a feature that allows users to create dashboards, the first thing that came to mind was a lot of numbers, a couple of charts, and a cramped screen. I had a lot of questions, so I listed them and shared them with the team to align on the approach:
What information might users need to create a dashboard, and how can we scope it?
What kind of data will be displayed on the dashboards, how complex will it be, and how can we make it easy to use and consume?
How do users currently create reports and monitor models? What are the pain points and inefficiencies in the current process?
How can we balance powerful features with a clean, intuitive UI and a balanced information hierarchy?
How much flexibility should users have in customizing their dashboards (e.g., layout, widgets, filtering options)?
What data is essential to a user's journey, and what could be considered superfluous or concealed to streamline the user experience and prevent cognitive overload?
The engineering team suggested checking out Tableau and Retool for context and inspiration on dashboard creation. I sat with a data scientist on our team and explored both tools; Retool stood out to us in terms of ease of use.
Leveraging internal feedback and Retool's best practices, I formulated the following principles for the dashboard design.
Snippets from Dashboard Research
Simple ways of representing data (for example, what types of data visualization are used and how graphs are designed for easy reading)
Hierarchy of information (for example, how labels are written, how important numbers are highlighted)
Design for familiarity - Many users would likely have prior experience with other dashboarding tools. By building upon those commonalities, we could create a smooth learning curve and empower them to quickly get up and running with Censius
I brainstormed the dashboard flow in a call with the founder and lead data scientist to identify and finalize all the data points we'd like the analytics dashboard to show.
Design
Dashboard Tab
Censius keeps your dashboards front and center for optimal workflow. The left navigation bar provides quick access, allowing you to create, access, and manage your dashboards with ease.
To help users stay focused, I introduced categories like "Favorites," "Created by You," and "Shared with You." This intuitive organization cuts through the clutter and ensures users can navigate to the most relevant insights quickly.
After users define their dashboard details, they land on an open canvas-like screen where they can add widgets and visualize model data. The canvas becomes a familiar, inviting playground for exploration, where they can customise the dashboard layout and the information each widget shows.
Dashboard Playground
Top Navigation:
The top navigation bar stays constant throughout the dashboard creation process. It displays the project name, model association, and the current dashboard name, along with the data-fetch timestamp that helps users make informed decisions about the data they're analyzing.
Users can apply queries to the components on the canvas using the global Data Segment button, which applies to all components present on the canvas, and using the date component they can visualize how the dashboard would have looked over a historical time frame.
Right Navigation:
When users drag and drop a widget and select it on the canvas, the right navigation transforms into a data control panel. Here, users can update the data displayed within the widget, ensuring their visualizations represent the specific metrics they care about.
Right Panel
Additionally, I replaced the global left navigation with a hamburger menu to give users more screen real estate to focus on dashboard creation, making it easier to arrange and manipulate visualizations and ultimately improving the user experience.
Accidental loss of progress can be a major source of user frustration. To address this in Censius, I designed a strategic warning pop-up. It serves a critical UX function by preventing loss of unsaved work while minimizing user disruption.
Dashboard Warning
Editing on the canvas
The intuitive data editor empowers individuals to slice and dice data effortlessly, fostering deeper analysis and eliminating the need for external tools.
After many crucial product and UI/UX decisions about the types of widgets users could create, we decided to start with four widget types -
Text Box - displays textual information
Table - organize and present numerical data in a clear and structured format.
Chart - Data visualization using Pie, Bar, Line, and Scatter Charts for simplicity, familiarity, and ease of consumption.
Number - a single number with a descriptive label to highlight key metrics
We implemented a snap-to-grid system on the canvas that automatically aligns and resizes widgets to create a visually balanced layout. While designing the components, I kept an 8px margin from their edges, which creates breathing room when widgets are placed side by side.
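The snapping itself is a small rounding function; a sketch of the idea:

```typescript
// Sketch: snap a widget's position and size to the canvas grid.
const GRID = 8; // grid cell in px, matching the 8px component margin

interface Rect { x: number; y: number; width: number; height: number }

const snap = (n: number) => Math.round(n / GRID) * GRID;

function snapToGrid(rect: Rect): Rect {
  return {
    x: snap(rect.x),
    y: snap(rect.y),
    width: Math.max(GRID, snap(rect.width)),
    height: Math.max(GRID, snap(rect.height)),
  };
}

// A widget dropped at (37, 101) lands cleanly at (40, 104).
console.log(snapToGrid({ x: 37, y: 101, width: 243, height: 118 }));
```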
In a widget's selected state, the outer border has handles to resize the component and a label indicator at the top left for identification and organization; widgets can be moved by dragging the handle on the label indicator.
For our engineering team, I documented the interactions and behavior of all components to ensure design consistency throughout the development cycle and prevent deviations from the intended user experience.
Chart library selection
I collaborated closely with the engineering team to select a charting library for our visualizations. I started by providing the functionality requirements and nice-to-haves. The engineers compared several charting libraries and also weighed the pros and cons of building graphs from scratch. We decided on Apache ECharts for all the visualizations the platform uses, including dashboards. It had all the functionality we needed and was customizable, well-documented, and actively maintained.
Styles, Components and documentation
As I started designing the screens, I began building a collection of reusable styles and components. Inspired by Brad Frost's Atomic Design, I created a library named Phoenix that provided a modular foundation of styles and components, promoting reusability, consistency, and maintainability across the entire application.
This meant creating styles and components on the fly and integrating them into various screens, while refining them whenever a new use case or user interaction called for it.
By iteratively modifying both components and layouts, I reached a point of convergence where the Phoenix components effectively served the needs of the screens without further adjustments.
To speed up the front-end development workflow, we opted for Tailwind CSS. It allowed developers to focus on building the user interface quickly and efficiently.
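To give a flavour of how Tailwind utilities kept Phoenix components terse, here's a hypothetical button in the same spirit (illustrative, not the shipped Phoenix code):

```tsx
import React from "react";

// Hypothetical Phoenix-style button built with Tailwind utility classes.
type ButtonProps = React.ButtonHTMLAttributes<HTMLButtonElement> & {
  variant?: "primary" | "secondary";
};

export function Button({ variant = "primary", className = "", ...rest }: ButtonProps) {
  const base = "rounded-md px-4 py-2 text-sm font-medium focus:outline-none";
  const styles =
    variant === "primary"
      ? "bg-blue-600 text-white hover:bg-blue-700"
      : "border border-gray-300 bg-white text-gray-700 hover:bg-gray-50";
  return <button className={`${base} ${styles} ${className}`} {...rest} />;
}
```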
Few Components and Styles from Phoenix
I created and documented each component with its use cases, behaviour, and styling guidelines for all the screens, aiming for the highest degree of consistency between the Figma designs and the actual web app.
Below is an example of the Toast component from the component library.
For every screen, I prepared multiple states: ideal, empty, and loading, along with each component's states; in some cases there was more than one of each type of state.
Below is an example of the Projects screen.
Projects screen states