Pyrefly v1.0 is here!

Today we are pleased to share that Pyrefly, our open source type checker and language server for Python, has reached stable version 1 status, meaning we are confident that Pyrefly is ready for production use.

A type checker, as its name suggests, catches type mismatches: things like passing a str to a function that expects an int. But to understand your code's types, a type checker also has to understand its structure: control flow, scoping, class hierarchies, and more. This lets it detect a surprisingly wide range of issues that have nothing to do with int vs. str.
Here are five real categories of bugs that Pyrefly catches, none of which are straightforward type mismatches.
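To make the idea concrete before diving in, here is a hedged sketch (not necessarily one of the post's five categories) of a structural bug a type checker reports even though no int-vs-str mismatch is involved:

```python
# A control-flow bug: the function can fall off the end of the loop
# and implicitly return None, contradicting the declared `-> int`
# return type. A checker flags this path even though every value
# involved is individually "well-typed".
def find_index(items: list[str], target: str) -> int:
    for i, item in enumerate(items):
        if item == target:
            return i
    # Missing `return` (or `raise`) here is the bug.

print(find_index(["a", "b"], "b"))  # 1
print(find_index(["a", "b"], "z"))  # None: the silent bug at runtime
```

Catching this requires the checker to reason about control flow, not just compare two types.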
Coding agents are writing more Python than ever. Tools like Claude, Copilot, Cursor, and Codex generate entire features with little-to-no user interaction. But in large projects, this generated code is prone to type errors, mismatched signatures, and subtle API misuse. Incorporating static analysis directly into the agentic loop can mean the difference between returning from your break with a production-ready feature or needing several more correction cycles.
Type checking sits right in the sweet spot for agents. It's fast enough for iterating small fixes, robust enough to catch issues of varying complexity, and actionable enough for an agent to make changes. In this post, we walk through how to integrate Pyrefly into your agentic workflow so that every piece of generated code can get type checked automatically.
TL;DR: We recommend:
- An AGENTS.md directive to ensure the project checks clean before finishing a feature.
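One possible shape for such a directive, as a sketch (the exact wording and placement are assumptions; adapt the command to your project):

```
## Type checking

Before declaring any feature complete, run `pyrefly check` from the
repository root. Fix every reported error; do not finish while the
project has outstanding type errors.
```

Because the instruction lives in AGENTS.md, every agent session picks it up automatically, with no per-prompt reminders needed.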
We frequently hear from developers who are excited about the new generation of checkers (Ty and Pyrefly) and want to know how they stack up against each other and the existing, established tools (Mypy and Pyright). In this comparison, we'll focus purely on performance (time to run a full check) and talk a little bit about how design choices, architecture, and features impact that latency.
Evaluating a type checker's performance presents a challenge due to many variables, including diverse evaluation metrics and varying results across different operating systems and hardware configurations. Furthermore, unlike the official test suite for typing specification conformance, there is no universally adopted benchmark for performance used by all type checker maintainers.
Nonetheless, in this blog post, we will attempt to compare speed and memory usage when checking several dozen packages from the command line. We use this performance data to catch regressions in Pyrefly changes that impact OSS packages — we previously only measured type checking performance on internal projects with Pyre.
Before we start, we'd like to caution that these numbers are only a snapshot at the time of publication and will be out of date quickly. Performance numbers can swing wildly from release to release, because the type checkers are under active development.
Jupyter notebooks have become an essential tool for Python developers. Their interactive, cell-based workflow makes them ideal for rapid prototyping, data exploration, and scientific computing: areas where you want to tweak a small part of the code and see the updated results inline, without waiting for the whole program to run. Notebooks are the primary way many data scientists and ML engineers write Python, and interactive workflows are highlighted in new data-science-oriented IDEs like Positron.
But notebooks have historically been second-class citizens when it comes to IDE features. Language servers, which implement the Language Server Protocol (LSP) to provide features like go-to-definition, hover, and diagnostics across editors, were designed with regular source files in mind. The language server protocol did not include notebook synchronization methods until five years after it was created, and the default Jupyter Notebook experience is missing many of the aforementioned IDE features.
In this post, we'll discuss how language servers have been adapted to work with notebooks, how the LSP spec evolved to support them natively, and how we implemented notebook support in Pyrefly.
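As a preview of what "notebook support in the protocol" looks like, here is a rough sketch of the shape of the `notebookDocument/didOpen` notification added in LSP 3.17. Field names follow the spec; the URIs and cell contents are invented for illustration:

```python
import json

did_open = {
    "jsonrpc": "2.0",
    "method": "notebookDocument/didOpen",
    "params": {
        # The notebook itself: a container of cells, versioned for sync
        "notebookDocument": {
            "uri": "file:///demo.ipynb",
            "notebookType": "jupyter-notebook",
            "version": 1,
            "cells": [
                # kind 2 = code cell; each cell references a text document
                {"kind": 2, "document": "file:///demo.ipynb#cell1"},
            ],
        },
        # The text of each code cell travels alongside the notebook,
        # so the server can type check cells like ordinary documents
        "cellTextDocuments": [
            {
                "uri": "file:///demo.ipynb#cell1",
                "languageId": "python",
                "version": 1,
                "text": "import math\nprint(math.pi)\n",
            }
        ],
    },
}

print(did_open["method"])
```

The key design point is that each cell is its own text document, letting servers reuse existing per-file machinery while the notebook message keeps the cells in sync as a group.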

At Pyrefly, we've always believed that type coverage is one of the most important indicators of code quality. Over the past year, we've worked closely with teams across large Python codebases here at Meta: improving performance, tightening soundness, and making type checking a seamless part of everyday development.
But one question kept coming up: What would it take to reach 100% type coverage?
Today, we're excited to share a breakthrough.
Pyrefly is a next-generation Python type checker and language server, designed to be extremely fast and featuring advanced refactoring and type inference capabilities. This isn’t the Pyrefly team’s first time building a type checker for Python: Pyrefly is a successor to Pyre, the previous type checker our team developed.
A lot of Pyrefly’s design comes directly from our experience with Pyre. Some things worked well at scale, while other things were harder to live with day-to-day. After running a type checker on massive Python codebases for a long time, we got a clearer sense of which trade-offs actually mattered to users.
This post is a write-up of a few lessons from Pyre that influenced how we approached Pyrefly.
When you write typed Python, you expect your type checker to follow the rules of the language. But how closely do today's type checkers actually follow the Python typing specification?
In this post, we look at what typing spec conformance means, how different type checkers compare, and what the conformance numbers don't tell you.
At the time of writing, pandas is one of the most widely used Python libraries. It is downloaded about half a billion times per month from PyPI, is supported by nearly all Python data science packages, and is generally required learning in data science curriculums. Despite the existence of modern alternatives, pandas' impact cannot be overstated.
In order to improve the developer experience for pandas' users across the ecosystem, we at Quansight Labs (with support from the Pyrefly team at Meta) decided to focus on improving pandas' typing. Why? Because better type hints mean:
Through this work with the pandas community, pandas' public API is now fully type-complete (as measured by Pyright), up from 47% when we started the effort last year. We'll tell the story of how it happened, but first we need to talk more about type completeness and how we measure it.
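To illustrate what "type completeness" measures, here is a hedged sketch: a public symbol counts as type-complete only when its interface is fully annotated. These functions are invented for the example and are not real pandas API:

```python
from typing import get_type_hints

def head(n=5):  # incomplete: no parameter or return annotations
    return list(range(n))

def head_typed(n: int = 5) -> list[int]:  # complete: fully annotated
    return list(range(n))

print(len(get_type_hints(head)))        # 0: nothing annotated
print(len(get_type_hints(head_typed)))  # 2: `n` and the return type
```

Pyright's `--verifytypes` mode computes this kind of score across an entire package's public API, which is where the 47% baseline came from.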
Empty containers like [] and {} are everywhere in Python. It's super common to see functions start by creating an empty container, filling it up, and then returning the result.
Take this, for example:
def my_func(ys: dict[str, int]):
    x = {}
    for k, v in ys.items():
        if some_condition(k):
            x.setdefault("group0", []).append((k, v))
        else:
            x.setdefault("group1", []).append((k, v))
    return x
This seemingly innocent coding pattern poses an interesting challenge for Python type checkers. Normally, when a type checker sees x = y without a type hint, it can just look at y to figure out x's type. The problem is that when y is an empty container (like x = {} above), the checker knows it's a dict (or a list, for []), but has no clue what will go inside it.
The big question is: How is the type checker supposed to analyze the rest of the function without knowing x's type?
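One escape hatch available to the user is to annotate the empty container up front, so the checker never has to guess. A sketch of the example above with that change (`some_condition` is a hypothetical predicate, invented here so the code runs):

```python
def some_condition(k: str) -> bool:
    # Hypothetical predicate, purely for illustration
    return k.startswith("a")

def my_func(ys: dict[str, int]) -> dict[str, list[tuple[str, int]]]:
    # Annotating x tells the checker exactly what the empty dict
    # will hold, so the rest of the body can be checked precisely
    x: dict[str, list[tuple[str, int]]] = {}
    for k, v in ys.items():
        if some_condition(k):
            x.setdefault("group0", []).append((k, v))
        else:
            x.setdefault("group1", []).append((k, v))
    return x

print(my_func({"apple": 1, "berry": 2}))
```

But requiring an annotation on every empty container would be a heavy burden, which is why checkers work hard to infer these types from later usage instead.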