Set A Minimum Daily Step Goal

I had some major lower back and leg issues in 2015 that lasted at least a year. After doing some physical therapy, I was much better off, enough so that I could play Ultimate competitively again.

On Father’s Day weekend 2020, I tweaked my back again. This was a true Father’s Day weekend. My wife was working so I watched the kids the whole weekend. I was fairly deconditioned due to skipping the gym to try to avoid getting the then-novel Covid-19. Biking was sketchy for my back since the initial injury, but that weekend I took the kids around town in a bike trailer. The weight of the kids and their stuff, combined with the fact that I was minimally exercising at the time, was too much for my back to handle. I felt it tighten up, and then kept pushing. By the time we made it home, I knew I was in trouble.

Stress may play a role in pain (see On Pain and Hope), and 2020 was a stressful time both for me personally (two young children, helping run a startup that had just had its sales pipeline dry up and a cofounder leave) and for the world (U.S. election, pandemic, etc.)

My back wasn’t quite as bad off as the first time, but with the pain that I had, most days I was hardly getting out of the house. By December of that year, after starting to see the physical therapist again, I was feeling marginally better, and I decided to get a step tracker to try to walk more consistently to avoid future injury.

Now that I’ve kept it up for several years and am feeling better than ever, I wanted to write up some thoughts and tips. If you’re looking to do something like this, I hope that this article gets you off to a good start.

First watch

My first fitness watch was the Garmin Vivofit 4. Here’s my Amazon review of it.

The reason I got this watch was the best-in-class battery life (1 year or more) since I disliked taking past watches off to charge them. This one could stay on my wrist for basically a year.

Surprisingly, the best feature for me was the watch’s step goal streak counter. Streaks greatly motivate me. Every day I hit the step goal, it would cheerily chirp about it and show the streak counter advancing by one. Every five days in a row would result in a special animation. Small things, but they kept me going.

At first the watch set an automatically calculated step goal. This was motivating since I started at a low baseline. However, it was calibrated to increase when I hit the goal, and decrease when I missed it. This ended up being hard to reason about and resulted in needing to do more steps every day that I hit the goal, which was not very motivating. (“My reward for hitting my goal today is to make tomorrow even harder?”) So I took a look at my recent steps achieved and set an achievable goal of (I think) 5,000 steps.

For the last few years my step goal has been 6,000 steps. I have hit a few streaks of 250 consecutive days, and I likely hit the step goal 360 days out of the year, for a minimum of 2 million steps per year.

Set a floor, not a ceiling

I thought about increasing the step goal further, but instead decided to focus on hitting the goal consistently. I treated the step goal as a step minimum, not a stretch goal. Or, to put it another way, I set a floor, not a ceiling.

If you’re interested in doing something like this, my key piece of advice is to start and stay lower than you think you need to.

Going for a high daily goal commits you to a lot each day, which makes it much harder to stick with the habit. Sure, I could do the vaunted 10,000-step day on occasion, but the effort (and time!) needed to achieve that day after day is very high. The time commitment was one of the things I learned about while doing the fitness challenge.

When people pick exercise goals, they often imagine working out under ideal conditions. I would recommend imagining how many steps you’ll want to get when:

  • it’s going to rain all day
  • it’s 100 degrees outside and 90% humidity
  • there are a couple of inches of snow on the ground and your phone is turning off since it’s so cold
  • you have a slight injury or pain
  • work or family obligations are higher than normal
  • you’re traveling

Also, this is the step minimum. You may be doing other exercise like swimming or cycling or strength work that won’t show up.

Setting a step floor is probably most helpful for folks who work from home or have desk jobs. I feel that it keeps me honest. “Hmm, yeah I only got like 2,000 steps today, need to move around a little!”

Typically I pick up a few thousand steps just by walking around the house, doing errands, and so on. After doing the step goal for a while, I found out that I walk around 100 steps a minute. I know a few routes around my house that take 5, 10, 15, 20, 30, 45, or 60 minutes, which makes it easy to pick one based on how many more steps I need.
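That pace makes the remaining time simple arithmetic. Here’s a tiny sketch (the constant is just my rough measured pace; yours may differ):

```python
STEPS_PER_MINUTE = 100  # my rough walking pace; measure your own

def minutes_to_goal(goal: int, steps_so_far: int) -> int:
    # Minutes of walking needed to close the remaining gap, rounded up.
    remaining = max(goal - steps_so_far, 0)
    return -(-remaining // STEPS_PER_MINUTE)  # ceiling division

print(minutes_to_goal(6000, 2000))  # 40 more minutes of walking
```

So at 2,000 steps with a 6,000-step goal, the 40-minute route (or a couple of shorter ones) closes the gap.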

Setting yourself up to succeed

To maintain a long streak, you’ve got to minimize your chances to fail. Identify the most common reasons for failure and try to mitigate.

Forgetfulness is probably my biggest enemy. One thing leads to another and it just sort of slips away. Setting a few alarms/reminders at night and snoozing until I hit the goal has been the most helpful for me. After a while, it becomes part of a nightly mental check.

Since I can’t log steps if the watch is out of battery, I monitor the watch battery and bring the charger on trips or charge it before going if it’s getting close to empty.

Based on the walking volume, I try to get good shoes and purchase new ones every few months. If you get random aches (knees, back, etc.) then it might be a good time to switch.

After being cold a few times and reluctant to walk, I got some gear for the elements, like warm gloves and this balaclava, after reading this review:

I work at an airport in Indiana. As you can I imagine it is super cold in winter and the winds oh my. It is an airport so nothing blocking those winds. We all survive my dressing in layers and not just a few. On average we are wearing 5 layers minimum. This hat is the best combo the built in neck gator protects my nose and mouth. So that it does not hurt to breath. The hoodie in it keeps the wind out my face. Nothing worse then those super cold nights like we just had with the artic polar. I was working in -25 and the winds make you tear up the cold made those watery eyes freeze yes my eyelashes and all froze like ice cycles….

After a few years and a few Vivofit watches, I wanted something more durable. I ended up upgrading to the Garmin Instinct 2, and that has been a positive change: it adds heart rate tracking while still offering roughly month-long battery life on a single charge.

If I accidentally forget or miss the goal by a few steps, I try not to take the failure too seriously. I still got way more steps than I would have otherwise, I’m helping keep my body healthy, and I can look at it as a chance to start a new (potentially even longer) streak.

I could see a treadmill being in my future since it would enable me to get steps in inclement weather more easily.

Conclusion

Having a daily step floor has worked for me pretty well. I’ve missed a few days here and there, but feel like it’s been a positive change and hope to continue it for the rest of my life.

How I Fix Issues On Open Source Projects

Here’s a post detailing how I typically think about fixing issues for open source projects.

Identify an issue

There are two main ways that I identify issues on code repos.

When I evaluate a new (to me) project, I need to figure out whether it will meet my needs. Often this will be functionality, but I also want to determine how well supported the project is. I will usually look at:

  • the README / wiki
  • when the repo was created and when it was last updated
  • a few of the most recent commits
  • the repo’s issues / pull requests tab (including recently closed ones)

These give me a good sense of the level of activity on the project. For example:

  • Is it maintained by one person or a team, or has it been mostly abandoned? (To be fair, some older projects will have lower activity since they are more stable.)
  • Are there a lot of issues or pull requests that have lingered without resolution?
  • Does it seem to have enough performance, security, or other -ilities?
  • Also, the repo itself may be deprecated or point to other alternatives to consider.

As part of this investigation, I typically skim through some of the open issues to check if there are any critical things that I need to be aware of or that might impact my application. This post has a good example of where this habit was useful for identifying a known issue that impacted my project.

The second way to identify issues is to actually use the code. There are a handful of items that I commonly find:

  • unclear documentation or typos
  • broken documentation links
  • setup instruction improvements
  • unclear error messages
  • incompatibility with other packages or newer language versions

These are all helpful to fix: it helps out other developers for a small time commitment. Observing the number of these small frictions also provides useful information about the repo.

Find or create an issue

My goal at this point is to start or advance a conversation about the issue.

If there’s an existing open issue, I prefer using that to keep things centralized.

If the issue is a quick, obvious, or low-risk fix, then I usually create a pull request with that change. It can be quick: fork, make the change (even using the GitHub editor), and then submit a PR.

Otherwise, I typically create a new issue. This gives me a chance to start a conversation around whether this is something that needs to be fixed, whether a pull request would be welcome, etc. Other people might have run into this issue and know a fix, or will run into it in the future.

If I’m creating the issue, I write up as clear of a description of the issue as I can. General notes:

  • Start by thanking the maintainers for their work and saying how the project is useful to you or appears promising
  • Describe the issue at a high level
  • Include a minimal reproduction of the issue
  • Include any system details (other libraries, versions, platform, etc.)
  • If I’ve looked into the code, provide some ideas of where the problem lies

If the path to fixing is not clear, I like to ask if they also think this is an issue and whether they’d be open to a fix. This is useful since sometimes I won’t hear back, and sometimes I might learn something that would save time (it’s not a good fit for the project, it’s actually really hard, it is planned for the next version or already released, etc.)

Getting buy-in for a potential fix also increases the speed / likelihood of it eventually getting merged. I think it’s more likely that someone will review your changes if they’ve already agreed they would be useful.

General tone notes:

  • Never be demanding / condescending. Open source is typically unpaid work for someone, and may not be their current focus. The status quo is that most issues are not commented on, or even resolved, so any communication is appreciated.
  • I assume I am fallible and my issue could be something that’s due to my specific setup, ignorance, or other issues.

If you’re responding to an existing issue, you can follow some of these steps as well, especially if they were missing from the original post. I usually try to either add some details or move the conversation along. Comments like “+1!” aren’t usually helpful (emoji reactions cover that kind of feedback).

Consider fixing the problem

If the problem is a blocker or important and I have an angle to fix, I often try to fix the problem. Sometimes I don’t have enough knowledge. It’s possible that there’s another repo or a fork that will get around the issue, so I might not have to fix it. Even so, opening the issue is useful since someone else might be able to fix it.

Typically the reference to this library from our code will be in a package file like package.json, Gemfile, etc. The eventual goal is to be able to update to the latest version of the library. The more special cases we have in our package file (pointing to a fork, pointing to a specific branch/revision), the harder it will be to upgrade the library and the more likely we are to fall behind on security updates or new features.

In practice, the iteration usually looks like:

  • Point to a fork of the project with the fix
  • Create a PR against the upstream project
  • Point to the project’s master branch once the fix is merged
  • Point to the latest released version once it ships
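In package.json terms, those stages might look like the following (the library name, fork owner, branch, and version here are all hypothetical; Gemfile and other package formats have equivalent syntax):

```
// 1. While the fix is in review: point at your fork's branch
"some-library": "github:myuser/some-library#fix-widget-crash"

// 2. Once the fix is merged upstream: point at upstream's master
"some-library": "github:upstream-org/some-library#master"

// 3. Once a release ships: point at the released version
"some-library": "^1.4.0"
```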

So the first step is to create a fork of the project. You can usually point your main project development environment to that fork for testing. Worst case, we can test the change locally and potentially get a fix weeks or months before it’s officially merged. Best case, our code will get merged and make the world a better place.

Fixing the problem

Usually I first run the project’s tests. This helps make sure that we don’t break the code with our change. If there are broken tests or docs, I can fix those and make a separate PR. Here is a good example of a PR that fixes tests.

When making the fix, I try to consider how to write tests or change tests to cover the behavior in question. It’s typically a faster feedback loop than testing from my main application directly. This also adds to confidence, which helps get a more timely merge.

I like to consider whether there are any potentially breaking changes like API changes. This saves the maintainer a step and develops a bit of empathy for the review process.

When creating a pull request, reference the original issue. This will usually save time since the issue should be documented and you can discuss the solution and any tradeoffs.

Building confidence

After you’ve made a good change, the rest of the process is mostly about communication and trust-building. This includes the communication before the fix is ready.

My general approach involves imagining I am the maintainer of a system that hundreds or thousands of projects depend on. Those projects want the next release to work! And I am on the hook if something goes wrong. What message would you rather receive?

  1. Here are some changes to fix X. It works on my machine!

  2. Here are some changes to fix X. Here’s what I did and here’s why. I believe this should work since I added some tests and have been running this on my production project for the last couple of days with no issues. One thing I’m not sure about is Y, but I think this should not be a blocker given that it is also an issue in Z. I believe this does not contain any breaking API changes.

I far prefer the second message, as it demonstrates more thoughtfulness and knowledge of the system. (This reminds me a bit of some of the principles of “Turn the Ship Around!”)

Shepherd the PR

Sometimes your PR will be accepted quickly. But the reality is that most open source maintainers are busy, and this project may not be their highest priority.

So your goal at this point is to shepherd the PR through the merge process. Typically this is going to look like:

  • responding to review comments / questions / requested revisions
  • always having a next step or responsible person identified

If someone says they’ll look at it tonight, tomorrow, this weekend, etc., I like to follow up with them a day or two after the last date of the range passes. This gives them a bit of leeway so it doesn’t feel like you’re hounding them, while still being clear that they own the next step so it doesn’t fall through the cracks.

Also, you may need to follow up on getting a release cut; for example, going from v1.3 to v1.4 of the library. Just merging to the main branch is often insufficient, since people usually point to the latest released version of the code.

Caching OpenAI Embeddings API Calls In-memory

I recently posted about a job posting search engine I prototyped that used OpenAI’s Embeddings API.

As I tested this out in a Google Colab notebook, I guessed that the same text would always result in the same embedding. I compared embeddings of the same text and found that they were indeed identical. I also added a space or words to the text and saw that it resulted in a different embedding.

I started by saving the embeddings in the dataframe. This worked, but I would have to call the API again if I wanted the same embedding later (which happened a couple of times, since my code was not robust enough the first couple of runs.) I also wanted previously requested search queries to return faster.

Since I was going to embed many job postings and might run the notebook multiple times, I wanted to cache the results to save a little money and increase the speed of future runs. This was helpful when I was iterating on the code and running over many postings, since some of the postings caused my code to error.

One solution to this is to store the embeddings in a database, perhaps a vector database. This would be more persistent, and would be a more production-friendly approach. For the time being, I decided to keep things simple and just cache the results in memory until I saw that the overall approach would work.

After some research, I found that there are some decorators that can be used to cache the results of a function. In Python 3.9+, the functools module has a @cache decorator. However, I was using Python 3.8. The docs note that this is equivalent to using the lru_cache decorator with maxsize=None, so I tried that instead and it seemed to work.

# Python < 3.9 version
from functools import lru_cache
from openai.embeddings_utils import get_embedding

@lru_cache(maxsize=None)
def cached_get_embedding(string: str, engine: str):
  # print first 50 characters of string
  print(f'Hitting OpenAI embedding endpoint with "{string[0:50]}..."')
  return get_embedding(string, engine=engine)

# Python >= 3.9 version
from functools import cache

@cache
def cached_get_embedding(string: str, engine: str):
  # print first 50 characters of string
  print(f'Hitting OpenAI embedding endpoint with "{string[0:50]}..."')
  return get_embedding(string, engine=engine)

Then you can replace any get_embedding calls with cached_get_embedding calls. The first time you call the function, it will print and hit the API. Subsequent calls with the same arguments will return the cached result and not print anything.

Another way of doing this would be to wrap OpenAI’s get_embedding in your own function with the same name that applies the cache decorator or looks up the result in a database. Then you don’t need to change any other code in your project and you still get the benefits of caching. (It has a slightly higher chance of being surprising/confusing, though.)

Since the embeddings seemed whitespace-sensitive, you may also want to normalize leading/trailing/inner whitespace before calling the API to reduce cache misses, if that whitespace would not be meaningful for your case.
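A minimal normalization helper might look like this (a sketch; `normalize_whitespace` is a name I’m inventing here, and you should only apply it if whitespace carries no meaning in your inputs):

```python
def normalize_whitespace(text: str) -> str:
    # Trim and collapse runs of spaces/tabs/newlines to single spaces,
    # so trivially different strings share one cache entry.
    return " ".join(text.split())
```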

Overall this worked well for my use case. I wanted to share it since it seemed like an elegant, Pythonic way of caching API calls.

Creating A Job Posting Search Engine Using OpenAI Embeddings

I recently worked on a job posting search engine and wanted to share how I approached it and some findings.

Motivation

I had a data set of job postings and wanted to provide a way to find jobs using natural language queries. So a user could say something like “job posting for remote Ruby on Rails engineer at a startup that values diversity” and the search engine would return relevant job postings.

This would enable the user to search for jobs without having to know what filters to use. For example, if you wanted to search for remote jobs, typically you would have to check the “remote” box. But if you could just say “remote” in your query, that would be much easier. Also, you could query for more abstract terms like “has good work/life balance” or some of the attributes that something like { key: values } would give.

Approach

We could potentially use something like Elasticsearch or create our own job search engine with rules, but I wanted to see how well embeddings would work. These models are typically trained on internet-scale data, so they might capture some nuances of job postings that would be difficult for us to model.

When you embed a string of text, you get a vector that represents the meaning of the text. You can then compare the embeddings of two strings to see how similar they are. So my approach was to first get embeddings for a set of job postings. This could be done once per posting. Then, when a user enters a query, I would embed the user’s query and find the job posting vectors that were closest using cosine similarity.

One nice thing about ordering by similarity is that the most relevant job posting should be first, and then other similar job postings would be next. This matches how other search engines work.
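The comparison itself is only a few lines. Here’s a sketch using plain Python lists for the vectors (the OpenAI cookbook ships a similar `cosine_similarity` helper; `rank_postings` is a name I’m using for illustration):

```python
from math import sqrt

def cosine_similarity(a, b) -> float:
    # Cosine of the angle between two vectors: 1.0 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def rank_postings(query_vec, posting_vecs):
    # Indices of postings ordered by similarity to the query, best match first.
    sims = [cosine_similarity(query_vec, v) for v in posting_vecs]
    return sorted(range(len(posting_vecs)), key=sims.__getitem__, reverse=True)
```

With real embeddings, `query_vec` would be the embedded user query and `posting_vecs` the precomputed job posting embeddings.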

OpenAI recently came out with the text-embedding-ada-002 embedding engine, which is significantly cheaper and higher performing than previous versions. Notably, the token limit was also increased to 8191 tokens, which meant we could embed whole job postings. So I decided to use it for creating the embeddings.

The job postings data set that I had included some additional data, like company name. So I wanted to embed that too, so we could use that information when comparing to the user’s query:

# truncate to 8000 characters since more is not likely to yield signal and makes it less likely we'll run into token length issues
# could also do this by using tiktoken and truncating to 8191 tokens for that engine
df['for_embedding'] = df \
  .apply(lambda x: f"Job posting\nCompany: {x['company_name']}\nTitle: {x['title']}\nBody: {x['body'].strip()}"[:8000],
         axis=1)
df['embedding'] = df['for_embedding'].apply(lambda x: cached_get_embedding(x, engine='text-embedding-ada-002'))

Results

For my example query at the beginning of the post (“job posting for remote Ruby on Rails engineer at a startup that values diversity”), the search engine returned the following job posting body as the top result (emphasis mine):

… We are a fast-paced, user-first, technology company that’s passionate about building responsibly. We believe the future of work is a regenerative corporate environment where giving and receiving is in balance. When we build we don’t just think about maximizing profit, we believe you can be wildly profitable while also being socially and environmentally conscious. Our fully-remote team is comprised of 13 awesome people (and quickly growing!) in New York, Texas, and North Carolina. We are committed to developing diverse teams. Our current team is 35% POC and 60% women, and we continuously strive to add more diversity on our team.

Job Requirements and Responsibilities:

  • Strong front end experience and familiarity working in a Rails system
  • Design, build and test end-to-end features using Rails

Candidate Qualifications:

  • Familiarity with our stack: Rails and Angular sitting on top of Heroku using Postgres, Elasticsearch, Redis, and a variety of AWS services.
  • You have startup experience and you enjoy working in small teams

What You Get:

  • Fully remote role, so you can work from home
  • Stock Options

Pretty great fit! (Here’s a link to it, in case you’re interested!)

Some other interesting queries I ran:

“job posting for software engineer at consultancy in Washington State”

The first result was a job posting for a consultant in Bellevue, which is in Washington State. The posting didn’t mention Washington State anywhere. This is a good example of something that would be hard to do with traditional document search but works well with embeddings trained on internet data: there must be some signal in the embeddings that captures the fact that Bellevue is located in Washington State.

“job posting for software engineer at <company name>”

The top results for this were indeed job postings for that company. This reinforces the decision to embed some metadata about the job posting.

“remote machine learning and product engineer”

One useful result had “You’d work on product-oriented research for generative natural language detection, and tackle cutting-edge deep learning and NLP problems with an emphasis on classification and adversarial methods.” Seems interesting!

Queries around eligibility (visa, citizenship, etc.)

These seemed to work OK. It was sometimes hard to tell whether results were actually being filtered by eligibility or the posting merely mentioned it, and sometimes which country the citizenship requirement referred to.

Asking for specific salary ranges

This didn’t seem to work consistently well. Many postings didn’t list salary information, and it would sometimes get confused by other compensation or revenue numbers (“$10M ARR”).

Overall

Overall, this was a fun project and I was impressed with the results. It only cost me a few dollars to create the embeddings, and the search engine was pretty fast. Also, it only took a couple of hours thanks to using an off-the-shelf embedding engine.

Using TamperMonkey to Clean Up Websites

I’ve written a few Tampermonkey userscripts to improve websites that I regularly use, and I wanted to share some patterns that I have found useful.

Generally I iterate on the scripts in the Tampermonkey Chrome Extension editor, and then push to GitHub for versioning and backup.

Example

A good example is the script to clean up my preferred weather site (Weather Underground). The script removes ads, as well as removing a sidebar that takes up room but doesn’t add much value.

Before:

Before the Tampermonkey userscript

After:

After the Tampermonkey userscript

Setup

Most of the time in these scripts, I’m finding DOM elements to hide or manipulate. I could use plain element selectors, but I typically import jQuery to make this easier and more powerful:

// @require      https://code.jquery.com/jquery-3.6.0.min.js

Apparently $ as an alias for jQuery doesn’t automatically work in Tampermonkey, so add it:

const $ = window.$;

The Tampermonkey template for scripts uses an IIFE (Immediately Invoked Function Expression) to avoid polluting the global namespace. I like to add a use strict directive to avoid some simple JavaScript mistakes (it only takes effect as the first statement in the function) and log that the script is running to make debugging a little easier.

(function() {
  'use strict';
  console.log('in wunderground script');
  ...

Hiding page elements

Almost every script I make has a hideStuff() function. As the name implies, it hides elements on the page. Usually this is going to be for elements that I don’t want or need, or for ads that aren’t blocked by my ad blocker.

function hideStuff() {
  // use whole screen for hourly forecast table
  $('.has-sidebar').removeClass('has-sidebar');
  $('.region-sidebar').hide();

  // hide ads
  $('ad-wx-ws').hide();
  $('ad-wx-mid-300-var').hide();
  $('ad-wx-mid-leader').hide();

  // bottom ad content
  $('lib-video-promo').hide();
  $('lib-cat-six-latest-article').hide();
}

I usually call it in a setInterval. This helps handle cases where the page takes a while to load, or in case elements are loaded asynchronously. This could also work well for single-page apps where the page doesn’t reload.

setInterval(hideStuff, 250);

Sometimes if the page loads quickly I’ll put a couple of setTimeouts with small timeouts at the beginning and then a longer setInterval. It doesn’t really cost much either way, so I usually play around with the timing until it works well.

Keyboard shortcuts

I enjoy using keyboard shortcuts to zip around, but many sites don’t have them. In some more advanced scripts, I’ll add key handlers for custom keyboard shortcuts.

For example, here I’ve added shortcuts for the next and previous day, and to switch between the hourly and 10-day forecasts:

$("body").keypress(function(e) {
  if (e.key === '>' || e.key === '.') {
    $('button[aria-label="Next Day"]').click();
  } else if (e.key === '<' || e.key === ',') {
    $('button[aria-label="Previous Day"]').click();
  } else if (e.key === 'd') {
    $('a span:contains("10-Day")').click();
  } else if (e.key === 'h') {
    $('a span:contains("Hourly")').click();
  }
});

This could break if the page structure changes, but most pages don’t change that often. If they do, I’ll just update the script. Overall I feel like this is pretty easy to read.

My Shortcut.com script has a more involved example of this for adding labels and creating stories, including overriding some existing keybindings. For Feedbin, I implemented a way to scroll stories down half a page (only when the keyboard focus is in the “story” pane).

Conclusion

Overall I think this approach works well to make some of my favorite sites more usable.

It would be great to be able to automatically sync Tampermonkey and the GitHub repo. Has anyone seen an approach that works well for this?