[InsuLearner: Free Software for Computing Insulin Pump Settings for Type 1 Diabetes]


April 19th, 2023. Estimated Read Time: 15 minutes

several pair plots of synthetic carb and insulin data

Pair plots of synthetic data used later in the article.



Disclaimer: I am not a doctor and this is not medical advice. I am sharing my ideas openly in case they are helpful to a diabetes community that has given a lot to help me and my family. Check with your doctor before making any changes to insulin pump settings.

Overview & Motivation

This is a follow-up to my article about how I compute insulin pump settings for my son who has Type 1 diabetes using machine learning. In that article, I described a methodology I developed for computing insulin pump settings, but I didn’t share my underlying code due to its raw state at the time. Since there has been significant interest in my work on this - all over the world, amazingly - I packaged up my code and released it freely for anyone to use. Diabetes sucks, and I hope this tool can ease the burden for others like it has for my family. Below I discuss how to get the software, how to run it, and some analysis for understanding how to make it work on real data.

I highly recommend reading my previous article and this entire article before attempting to use the software I describe here. The reason is that the estimated settings can be negatively affected by a variety of common data issues such as missing data or significant amounts of noise in the data. So knowing the fundamentals of this approach to computing settings (first article), being able to interpret results, and recognizing potential issues with real data (this article) are important for getting good settings estimates.



InsuLearner

InsuLearner is a software package that contains my Python code to estimate personalized insulin pump settings from historical insulin and carbohydrate information. The code implements a machine learning methodology I developed and currently use to estimate insulin pump settings for my son. The method is system-agnostic, i.e. it works for insulin dosing settings that are used for insulin pumps, closed-loop systems, or even manual daily injections (MDI). The method also does not rely on having good initial settings. Instead, it takes advantage of an assumption commonly met by individuals with Type 1 diabetes: they count carbs and give enough insulin to eventually bring blood glucose into range over time.

Currently, the software interfaces with Tidepool accounts to retrieve insulin and carbohydrate data for inputs into the algorithm. And the software uses machine learning to output estimates for:

  • Carbohydrate Ratio (CIR)
  • Basal Rate
  • Insulin Sensitivity Factor (ISF)
  • (optional) Carbohydrate Sensitivity Factor (CSF)

The software is installed and run at the command line (e.g. in the Terminal application on Mac or the PowerShell application on Windows), but these are simple tasks even for non-technical folks.

How to Install InsuLearner

To install InsuLearner, run this command:

pip install insulearner

If pip is not on the system, it can be installed using this command:

Linux & Mac: python -m ensurepip --upgrade
Windows: py -m ensurepip --upgrade

How to Run InsuLearner

Once installed, InsuLearner can be run at the command line. Below are a couple of example commands:

insulearner [tidepool_email] [tidepool_password] --num_days 60 --height_inches 72 --weight_lbs 200 --gender male

insulearner [tidepool_email] [tidepool_password] --num_days 30 --agg_period_window_size_hours 72 --CSF 4.2

The software takes a number of parameters and most have defaults, but it’s important to understand them and how they affect the settings estimation. To print a help message describing the parameters in the command window, use insulearner -h. Beyond the Tidepool username and password, the parameters available at the command line are:

  • --num_days
    • This is the number of days prior to the current day for which to retrieve insulin and carbohydrate data from Tidepool. In the first example, 60 days of data are retrieved and in the second example, 30 days of data are retrieved.
  • --height_inches, --weight_lbs, --gender
    • These are inputs to an algorithm I devised for estimating Carbohydrate Sensitivity Factor (CSF), which is required for computing the Insulin Sensitivity Factor (ISF).
    • Alternatively, if the CSF is known from experimentation, it can be input directly using --CSF, in which case height, weight, and gender are not needed.
  • --estimate_agg_boundaries
    • This defaults to True and enables an autocorrelation-like algorithm I devised that estimates the best time of day at which to place the aggregation boundaries.
  • --agg_period_window_size_hours
    • This defines the size of the windows in hours in which to aggregate carbohydrate and insulin data. It defaults to 24 hours, which works reasonably well for my son, but as I theorized in my first article, much longer time periods could yield better settings estimations depending on the nature of the data.
  • --agg_period_hop_size_hours
    • This defaults to 24 hours and is the amount of time that the aggregation window shifts for each aggregated data point. So if this value is less than the window size, the aggregation windows will overlap.
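
To make the window and hop parameters more concrete, here is a minimal sketch of windowed aggregation in plain Python. It is not InsuLearner's actual implementation: the function name aggregate_windows, the (hour, amount) event format, and the carb entries are all assumptions made up for illustration.

```python
def aggregate_windows(events, total_hours, window_hours=24, hop_hours=24):
    """Sum event amounts inside each aggregation window.

    events: list of (hour_offset, amount) tuples, e.g. carb grams or insulin units.
    Returns one total per window; a new window starts every `hop_hours` hours.
    """
    totals = []
    start = 0
    while start + window_hours <= total_hours:
        end = start + window_hours
        totals.append(sum(amount for hour, amount in events if start <= hour < end))
        start += hop_hours
    return totals


# Hypothetical carb entries over 4 days: (hours since start, grams)
carbs = [(8, 40), (19, 60), (32, 50), (44, 70), (56, 30), (80, 90)]

# 24-hour window with a 24-hour hop: one non-overlapping total per day.
daily = aggregate_windows(carbs, total_hours=96)

# 72-hour window with a 24-hour hop: consecutive windows share 48 hours of data.
overlapping = aggregate_windows(carbs, total_hours=96, window_hours=72, hop_hours=24)

print(daily)
print(overlapping)
```

With the hop smaller than the window, the same carb entry contributes to more than one aggregated data point, which is the overlap described above.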

InsuLearner Location

My GitHub account contains the source code for InsuLearner. I highly recommend digging into the source code to better understand the details. The Python package server PyPI hosts the InsuLearner package. I will publish any future updates to the code at these locations.

Interpreting Results

It’s important to interpret the outputs of the software to ensure good settings estimation, especially the plot of the data and fitted model. I have run my code on dozens of people’s data, and I have seen common data issues for some individuals that will negatively affect settings. For example, how carbs are counted or whether data was uploaded fully can significantly affect the settings estimation and may be different from person to person. There are potentially ways to mitigate this through additional algorithm development, but I am making this available for free in my spare time and don’t have the bandwidth to do so (let me know if you want to contribute). In the following sections, I’ll first describe the outputs of the software in detail and then discuss a series of things I look for or do as safety checks in case of data issues.

Outputs

When InsuLearner is run at the command line, it will do two things:

  1. print a bunch of useful information in the command window
  2. open a plot with the data and a model fitted to the data

Here’s an example of the information printed in the command window:

2023-04-12 16:42:54,569 - __main__ - DEBUG - Args:
2023-04-12 16:42:54,569 - __main__ - DEBUG - estimation_window_size_days: 25
2023-04-12 16:42:54,569 - __main__ - DEBUG - height_inches: 64.0
2023-04-12 16:42:54,569 - __main__ - DEBUG - weight_lbs: 125.0
2023-04-12 16:42:54,569 - __main__ - DEBUG - gender: female
2023-04-12 16:42:54,569 - __main__ - DEBUG - CSF: None
2023-04-12 16:42:54,569 - __main__ - DEBUG - estimate_agg_boundaries: True
2023-04-12 16:42:54,569 - __main__ - DEBUG - agg_period_window_size_hours: 72
2023-04-12 16:42:54,569 - __main__ - DEBUG - agg_period_hop_size_hours: 24
2023-04-12 16:42:54,569 - __main__ - INFO - CSF estimated to be 6.41 for height_inches 64.0, weight_lbs 125.0, and gender female
2023-04-12 16:42:54,569 - __main__ - INFO - Running for dates 2023-03-13 00:00:00 to 2023-04-07 00:00:00
2023-04-12 16:42:56,670 - InsuLearner.tidepool.tidepool_user_model - WARNING - Unparsed events: defaultdict([class 'int'], {'physicalActivity': 20})
2023-04-12 16:42:56,672 - InsuLearner.tidepool.tidepool_user_model - WARNING - De-duplicating CBG data...
2023-04-12 16:42:56,678 - InsuLearner.tidepool.tidepool_user_model - INFO - Zero duplicates found in CBG data.
2023-04-12 16:42:56,678 - InsuLearner.tidepool.tidepool_user_model - WARNING - De-duplicating Bolus data...
2023-04-12 16:42:56,679 - InsuLearner.tidepool.tidepool_user_model - INFO - Zero duplicates found in Bolus data.
2023-04-12 16:42:56,680 - InsuLearner.tidepool.tidepool_user_model - WARNING - De-duplicating Basal data...
2023-04-12 16:42:56,683 - InsuLearner.tidepool.tidepool_user_model - INFO - Zero duplicates found in Basal data.
2023-04-12 16:42:56,683 - InsuLearner.tidepool.tidepool_user_model - WARNING - De-duplicating Food data...
2023-04-12 16:42:56,683 - InsuLearner.tidepool.tidepool_user_model - INFO - Zero duplicates found in Food data.
2023-04-12 16:42:56,689 - InsuLearner.tidepool.tidepool_user_model - DEBUG - Circadian Hour Activity 6. Circadian Hour BG Velocity 9
2023-04-12 16:42:56,719 - __main__ - DEBUG - Mean of CGM Mean, 167.39
2023-04-12 16:42:56,719 - __main__ - DEBUG - Mean of CGM Geo Mean, 154.22
2023-04-12 16:42:56,719 - __main__ - DEBUG - Total Period Insulin Mean: 186.79
2023-04-12 16:42:56,719 - __main__ - DEBUG - 22 Data Rows
2023-04-12 16:42:56,720 - __main__ - INFO - Linear Model: Fit R^2 0.49. Intercept 100.98. Slope g/U [5.33]
2023-04-12 16:42:56,720 - __main__ - INFO - Total Period Basal=100.98U. (Mean %Daily Total: 54.77%)
2023-04-12 16:42:56,720 - __main__ - INFO -

    Settings Estimates:

2023-04-12 16:42:56,720 - __main__ - INFO -     Estimated CIR=5.33 g/U.
2023-04-12 16:42:56,720 - __main__ - INFO -     Estimated Hourly Basal=1.402 U/hr
2023-04-12 16:42:56,720 - __main__ - INFO -     CSF=6.41 mg/dL / g
2023-04-12 16:42:56,720 - __main__ - INFO -     Estimated ISF=34.15 mg/dL/ U

The printed output contains a bunch of information about how the data was collected and the parameters that were input. This includes pump or other device events that could not be parsed, detected duplicate events, the detected hour of least blood glucose velocity, and other statistics. The insulin pump settings estimates appear in the last few lines. However, it’s also very important to look closely at the plot. Here’s an example of a plot when I run my son’s data:

image of settings plot learned from data
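
As a rough cross-check, the settings at the end of the printed output can be approximately reproduced from the other numbers in the log. The sketch below assumes that the hourly basal is the fitted intercept divided by the aggregation window length, and that ISF follows the unit relationship ISF = CSF × CIR. Both are my reading of the log, not a quote of the InsuLearner source.

```python
# Values copied from the example log output above.
intercept_units = 100.98  # "Total Period Basal": fitted intercept per aggregation window (U)
window_hours = 72         # agg_period_window_size_hours
cir = 5.33                # Estimated CIR (g/U)
csf = 6.41                # CSF (mg/dL per g)

# Assumption: basal per window spread evenly over the hours in the window.
hourly_basal = intercept_units / window_hours

# Assumption: (mg/dL per g) * (g per U) = mg/dL per U.
isf = csf * cir

print(f"Hourly basal is roughly {hourly_basal:.2f} U/hr")
print(f"ISF is roughly {isf:.1f} mg/dL/U")
```

Both results land close to the logged 1.402 U/hr and 34.15 mg/dL/U. The small ISF difference is presumably because the log prints rounded values of CIR and CSF.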

Safety and Sanity Checks

Now that the InsuLearner software is running and producing outputs, it’s important to sanity and safety check them. First I’ll discuss obviously problematic settings outputs to sanity check. Then I’ll cover a few things I think about and do when I do this for my son.

Erroneous Settings

I did not develop any guardrails in the fitting of these models (perhaps that is future work, if anyone wants to help contribute). What that means is I sanity check that the settings computed by InsuLearner make common sense. For example:

  • Does the fitted line have a positive slope, i.e. CIR is positive? It should.
  • Is the y-intercept (basal insulin) of the line non-negative, i.e. basal insulin is non-negative? It should be.
  • Is the y-intercept (basal insulin) closer to 0% or 50% or 100% of the mean total daily insulin (red star on the plot)? Probably - though not certainly - it should be nearer to 50% than the extremes of 0% or 100%.
  • If estimated, is the CSF greater than 0 (larger body) and less than 20 (smaller body)? It probably should be.

If the answers to any of these are no or there are any other obvious problems, I would not use the computed settings. I would check my data carefully and then try running the software again.
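
The checks above could be collected into a small helper like the one below. This is only a sketch of my manual checklist, not something InsuLearner does: the function name sanity_check_settings and its arguments are hypothetical, and the values would be copied by hand from the printed output.

```python
def sanity_check_settings(cir, basal_intercept, mean_total_insulin, csf=None):
    """Return a list of warnings for obviously problematic settings estimates."""
    warnings = []
    if cir <= 0:
        warnings.append("CIR is not positive (fitted line slope is not positive).")
    if basal_intercept < 0:
        warnings.append("Basal insulin (y-intercept) is negative.")
    if mean_total_insulin > 0:
        basal_fraction = basal_intercept / mean_total_insulin
        # Nearer to 50% than to 0% or 100% means between 25% and 75%.
        if not 0.25 <= basal_fraction <= 0.75:
            warnings.append(
                f"Basal is {basal_fraction:.0%} of mean total insulin, far from 50%."
            )
    # CSF is only checked when it was provided or estimated.
    if csf is not None and not 0 < csf < 20:
        warnings.append("CSF is outside the range 0 to 20.")
    return warnings


# Values from the example log output earlier in the article: no warnings expected.
ok = sanity_check_settings(cir=5.33, basal_intercept=100.98,
                           mean_total_insulin=186.79, csf=6.41)

# Made-up values that fail every check.
bad = sanity_check_settings(cir=-1.0, basal_intercept=-5.0,
                            mean_total_insulin=40.0, csf=25.0)

print(ok)
print(bad)
```

An empty list only means none of these obvious problems were found. It says nothing about whether the underlying data is complete or accurate.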

Counting Carbs and Insulin

Another thing I’ll do is ask myself how accurate my records of carbohydrates and insulin are. The more accurate these are, the better the estimation. For most people on insulin pumps, insulin is tracked very closely and should be a non-issue. Be aware, though, that if a significant amount of insulin was erroneously recorded by the pump for some reason without impacting blood glucose (e.g. pump site failure where insulin was delivered but on top of the skin), this can also negatively affect settings estimation.

But for most people, more noise in the data records will come from carb values than insulin values. This machine learning method can tolerate some noise from incorrect carb counts, but if I am very loose in tracking carbs I should expect worse settings estimates. If counting carbs isn’t something I did regularly, one way to help this would be to count carbs very closely for some period prior to running this software. Then I would run the software on just the time period when I counted carbs closely. I would guess at a minimum 2 weeks, but longer is better, like 4-8 weeks.

Missing Data

If I feel I keep solid carb and insulin records, the next thing to check is whether those records are properly stored in Tidepool. If there are records that are not uploaded, duplicated, or incorrect, this will also negatively affect settings estimation. I might examine the data on the Tidepool website in the time period that InsuLearner will process to sanity check the availability and accuracy of the data for each day.

I designed the software to raise an error if the code detects an obvious data problem, for example if there is no data found on any single day within the retrieval window. However, if only some of the data is missing on a particular day, the software will not know this. So it’s very important to ensure that the data is available and correct.

Noise in the Data - Nonlinearity

So even after I feel that I have accurate records and no missing data, I still examine the plot from InsuLearner closely. The reason is that the insulin pump settings are using a simple linear model to represent a complex reality that is our biology. It may be that for some people this linear model is a good approximation and for others it is not. This does not mean that the settings computed with InsuLearner aren’t the best possible settings for someone in the latter group. It just means that more care should be given to computing and using settings from InsuLearner in this case.

To make sense of the plots, I think it helps to start from what data should look like under ideal circumstances from the perspective of the settings, and then consider how different types of noise might interfere to give us what we actually see in the plot. First, the insulin pump settings assume a linear relationship between carbohydrates and insulin (i.e. if someone needs 1U for 10g carbs, they’ll need 2U insulin for 20g carbs). In other words, when we see the InsuLearner plot, ideally we would see a straight line like the plot below. (Note: I created this synthetic dataset using this code in the InsuLearner software.)

image of settings plot learned from data

The perfectly linear plot of synthetic data above assumes that someone needs and receives the exact same amount of basal insulin each day and that the insulin required and given for carbs is perfectly linear. But with real data, these assumptions will likely be violated and all kinds of additional effects will push each of those data points around on the plot to form the less linear data clouds we actually see. For example, people may need more or less insulin (without associated carbs) due to a degrading pump site, exercise, sickness, hormones, etc., which will move the data points vertically. And when we don’t get the exact carb counts or our body metabolizes lower or higher percentages of the carbs we consume, for example, those data points will move around horizontally. So when there is a plot of real data, it looks a lot less like a line, such as in my son’s plot from above.
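
Here is a small independent sketch of that idea. It is not the synthetic data code that ships with InsuLearner: the function name make_synthetic_days, the 20 U basal, the 10 g/U carb ratio, and the noise levels are all made-up assumptions. Insulin noise moves points vertically and carb-counting noise moves them horizontally.

```python
import random


def make_synthetic_days(num_days, basal_units=20.0, cir=10.0,
                        insulin_noise_sd=0.0, carb_noise_sd=0.0, seed=0):
    """Generate one (recorded_carbs, total_insulin) pair per day."""
    rng = random.Random(seed)
    days = []
    for _ in range(num_days):
        true_carbs = rng.uniform(50, 300)
        # Ideal case: constant basal plus carbs divided by the carb ratio.
        insulin = basal_units + true_carbs / cir
        # Vertical noise: insulin needs that change without associated carbs
        # (degrading pump site, exercise, sickness, hormones, ...).
        insulin += rng.gauss(0, insulin_noise_sd)
        # Horizontal noise: recorded carbs differ from what was actually absorbed.
        recorded_carbs = true_carbs + rng.gauss(0, carb_noise_sd)
        days.append((recorded_carbs, insulin))
    return days


ideal = make_synthetic_days(30)
noisy = make_synthetic_days(30, insulin_noise_sd=4.0, carb_noise_sd=25.0)

# In the ideal data every point sits exactly on the line 20 + carbs / 10.
max_ideal_error = max(abs(ins - (20.0 + carbs / 10.0)) for carbs, ins in ideal)
max_noisy_error = max(abs(ins - (20.0 + carbs / 10.0)) for carbs, ins in noisy)
print(max_ideal_error, max_noisy_error)
```

Plotting the noisy pairs as a scatter plot gives a cloud rather than a line, even though the same underlying basal and carb ratio generated both datasets.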

So what does this mean? How can it be actionable for computing good pump settings? Mostly this affects the confidence in whether the outputs of the software are overly dominated by noise.

First, I have a hypothesis that there is higher confidence in the outputs of the InsuLearner algorithm for data that appears more linear. The linearity of the data is measured directly via the Coefficient of Determination of the linear regression fit, which is printed as R^2 in both the command window output and the plot. With perfectly positive linear data, this value has a maximum of 1.0. And in a very noisy cloud of data without any linear relationship, this value would be close to 0. Anecdotally, I start to be cautious of settings estimated where this value falls below 0.5.
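
For readers who want to see what R^2 measures, the sketch below fits an ordinary least-squares line and computes the Coefficient of Determination by hand. It assumes carbs on the x-axis and insulin on the y-axis, as in the plots. The data values are invented for illustration and the function fit_line is not part of InsuLearner.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


carbs = [100, 150, 200, 250, 300]        # grams per aggregation window
insulin_linear = [30, 35, 40, 45, 50]    # exactly 20 U basal + carbs / 10
insulin_noisy = [38, 29, 47, 36, 52]     # same carbs, made-up noisy insulin totals

slope, intercept, r2_linear = fit_line(carbs, insulin_linear)
_, _, r2_noisy = fit_line(carbs, insulin_noisy)

# With insulin on the y-axis, the slope is in U/g, so the carb ratio is 1 / slope.
print(f"intercept={intercept:.1f} U, CIR={1 / slope:.1f} g/U, R^2={r2_linear:.2f}")
print(f"noisy R^2={r2_noisy:.2f}")
```

The perfectly linear data recovers the 20 U intercept and 10 g/U ratio with an R^2 of essentially 1, while the noisy data gives a noticeably lower R^2 for the same carb values.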

Second, just because someone’s data is not very linear does not mean the InsuLearner settings are incorrect. As long as the data is representative of the individual’s biology, a fitted linear model that goes through the “middle” of this kind of data - even if it’s more of a cloud than a line - may still be the best predictor of insulin needs. In this case with a closed-loop system, for example, the fitted model minimizes the distance the closed-loop system has to “reach” above and below the line on the insulin axis (y-axis) to maintain blood glucose control. But the takeaway is that I take more care when using the settings from InsuLearner if the data is not very linear.

Noise in the Data - Bagging

Last, there are a couple of other things I do to further increase confidence in the outputs of InsuLearner. These are related to concepts such as bagging data in machine learning. Fortunately, they are very easy to do since they just require rerunning InsuLearner with different parameters.

First, I’ll consider how stable the estimated settings are over different periods of time. For example, I might change the parameter --num_days to 15, 30, or 45 and compare the results. If I have an intuitive sense that my son has had stable insulin needs in those periods, I look for consistency in the estimated settings values. If they vary considerably, this could imply the noise in the data is having an outsized effect on the results.

And second, I’ll look at the estimated settings with different aggregation window sizes using the --agg_period_window_size_hours parameter option. I often use aggregation windows of 24 hours with my son, which means that the insulin and carbohydrates are summed within 1-day periods. But in my first article, I mentioned that aggregating the insulin and carbs over longer periods of time may give a better “match” of carbs to insulin and potentially better settings. With other people, I have found that when aggregating over much larger periods of time (e.g. upwards of 30 days), their settings seem to converge on something stable that they have reported worked well for them.
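
One simple way to put a number on this kind of consistency is the relative spread of an estimate across reruns. The sketch below is just an illustration: the CIR values are hypothetical numbers standing in for results copied by hand from three runs with different --num_days values, and relative_spread is not an InsuLearner function.

```python
from statistics import mean, pstdev


def relative_spread(values):
    """Coefficient of variation: standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


# Hypothetical CIR estimates (g/U), keyed by the --num_days value used for each run.
cir_by_num_days = {15: 9.1, 30: 9.8, 45: 9.5}

spread = relative_spread(list(cir_by_num_days.values()))
print(f"CIR varies by about {spread:.1%} across runs")
```

A small spread suggests the estimate is not being driven by noise in any one period. There is no established threshold here, so I would treat the number as a rough guide alongside the plots rather than as a pass/fail test. The same comparison works for runs with different --agg_period_window_size_hours values.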

Conclusion

Like in the first article, there is a lot more that I didn’t cover here. If there’s interest I can provide more analysis I’ve done that may be useful for estimating settings or interpreting diabetes data in general. If that’s the case, let me know by sending me a message. I hope this is helpful for anyone looking to dial in their pump settings.