How I Fix Issues On Open Source Projects

Here’s a post detailing how I typically think about fixing issues for open source projects.

Identify an issue

There are two main ways that I identify issues on code repos.

The first is evaluating a new (to me) project, where I need to figure out whether it will meet my needs. Often this comes down to functionality, but I also want to know how well supported the project is. I will usually look at:

  • the README / wiki
  • when the repo was created and when it was last updated
  • a few of the most recent commits
  • the repo’s issues / pull requests tab (including recently closed ones)

These give me a good sense of the level of activity on the project. For example:

  • Is it maintained by one person, a team, or has it been mostly abandoned? (To be fair, some older projects will have lower activity since they are more stable.)
  • Are there a lot of issues or pull requests that have lingered without resolution?
  • Does it seem to offer adequate performance, security, or other -ilities?
  • Is the repo itself deprecated, or does it point to alternatives worth considering?

As part of this investigation, I typically skim through some of the open issues to check if there are any critical things that I need to be aware of or that might impact my application. This post has a good example of where this habit was useful for identifying a known issue that impacted my project.

The second way to identify issues is to actually use the code. There are a handful of items that I commonly find:

  • unclear documentation or typos
  • broken documentation links
  • setup instruction improvements
  • unclear error messages
  • incompatibility with other packages or newer language versions

These are all worth fixing, since each one helps other developers for a small time commitment. The number of small frictions I run into also says something useful about the repo.

Find or create an issue

My goal at this point is to start or advance a conversation about the issue.

If there’s an existing open issue, I prefer using that to keep things centralized.

If the fix is quick, obvious, or low risk, I usually just create a pull request with that change. It can be fast: fork, make the change (even in the GitHub editor), and then submit a PR.

Otherwise, I typically create a new issue. This gives me a chance to start a conversation about whether this is something that needs to be fixed, whether a pull request would be welcome, etc. Other people might have run into this issue and know a fix, or will run into it in the future.

If I’m creating the issue, I write as clear a description of it as I can. General notes:

  • Start by thanking the maintainers for their work and saying how the project is useful to you or why it looks promising
  • Describe the issue at a high level
  • Provide a minimal reproduction of the issue
  • Include any system details (other libraries, versions, platform, etc.)
  • If I’ve looked into the code, I may provide some ideas of where the problem lies

If the path to a fix is not clear, I like to ask whether they also think this is an issue and whether they’d be open to a fix. This is worth doing since sometimes I won’t hear back, and sometimes I learn something that saves time (it’s not a good fit for the project, it’s actually really hard, it’s planned for the next version or already released, etc.).

Getting buy-in for a potential fix also increases the speed / likelihood of it eventually getting merged. I think it’s more likely that someone will review your changes if they’ve already agreed they would be useful.

General tone notes:

  • Never be demanding or condescending. Open source is typically unpaid work for someone, and may not be their current focus. The status quo is that most issues never get a comment, let alone a resolution, so any communication is appreciated.
  • I assume I’m fallible and that the issue could be due to my specific setup, my own ignorance, or something else.

If you’re responding to an existing issue, you can follow some of these steps as well, especially if they were missing from the original post. I usually try to either add some details or move the conversation along. Comments like “+1!” aren’t usually helpful (emoji reactions work for that kind of feedback).

Consider fixing the problem

If the problem is a blocker or otherwise important and I have an angle on a fix, I often try to fix it myself. Sometimes I don’t have enough knowledge to do that. Other times there’s another repo or a fork that gets around the issue, so I don’t have to fix it. Even so, opening the issue is useful since someone else might be able to fix it.

Typically the reference to this library from our code will be in a package file like package.json, Gemfile, etc. The eventual goal is to be able to update to the latest version of the library. The more special cases we have in our package file (pointing to a fork, pointing to a specific branch/revision), the harder it will be to upgrade the library and the more likely we are to fall behind on security updates or new features.

In practice, the iteration usually looks like:

  • Point to a fork of the project with the fix
  • Open a PR against the project
  • Point to the master branch of the project once it’s merged
  • Point to the latest released version once the fix ships

So the first step is to create a fork of the project. You can usually point your main project’s development environment at that fork for testing. Worst case, you can test the change locally and potentially get a fix weeks or months before it’s officially merged. Best case, your code gets merged and makes the world a better place.

Fixing the problem

Usually I first run the project’s tests. This helps make sure I don’t break anything while making my change. If there are broken tests or docs, I can fix those in a separate PR. Here is a good example of a PR that fixes tests.

When making the fix, I try to consider how to write or change tests to cover the behavior in question. It’s typically a faster feedback loop than testing from my main application directly. It also builds confidence in the change, which helps it get merged sooner.

I also like to consider whether the change introduces anything potentially breaking, such as an API change. This saves the maintainer a step and builds a bit of empathy for the review process.

When creating a pull request, reference the original issue. This usually saves time, since the problem is already documented there and the PR discussion can focus on the solution and any tradeoffs.

Building confidence

Once you’ve made a good change, the rest of the process is more about communication and trust-building. This includes the communication before the fix is ready.

My general approach involves imagining I am the maintainer of a system that hundreds or thousands of projects depend on. Those projects want the next release to work! And I am on the hook if something goes wrong. Which message would you rather receive?

  1. Here are some changes to fix X. It works on my machine!

  2. Here are some changes to fix X. Here’s what I did and here’s why. I believe this should work since I added some tests and have been running this on my production project for the last couple of days with no issues. One thing I’m not sure about is Y, but I think this should not be a blocker given that it is also an issue in Z. I believe this does not contain any breaking API changes.

I far prefer the second message, as it demonstrates more thoughtfulness and knowledge of the system. (This reminds me a bit of some of the principles of “Turn the Ship Around!”)

Shepherd the PR

Sometimes your PR will be accepted quickly. But the reality is that most open source maintainers are busy, or the project may not be a high priority for them.

So your goal at this point is to shepherd the PR through the merge process. Typically this is going to look like:

  • responding to review comments / questions / requested revisions
  • always having a next step or responsible person identified

If someone says they’ll look at it tonight, tomorrow, this weekend, etc., I like to follow up with them a day or two after the last date of the range passes. This gives them a bit of leeway so it doesn’t feel like you’re hounding them, while still being clear that they own the next step so it doesn’t fall through the cracks.

You may also need to follow up on getting a release, for example going from v1.3 to v1.4 of the library. Just merging to the main branch is often insufficient, since many people point to the latest released version of the code.
