Inside the NPPES database: how the U.S. provider registry is structured

Updated June 19, 2026

drfind is built openly on NPPES, the public registry that lists every U.S. health care provider. We don’t hide that; we think it’s the honest way to do it. But the raw data is a very different thing from the clean answer you get here. This is a first-hand tour of what the NPPES database actually is, file by file and column by column, and why we built a single search box in front of it.

What NPPES is

NPPES — the National Plan and Provider Enumeration System — is run by the Centers for Medicare & Medicaid Services (CMS). It assigns and stores the National Provider Identifier (NPI) for every U.S. provider and organization. CMS publishes the entire registry as the NPPES Data Dissemination: a free, public, FOIA-disclosable download of 8M+ active NPIs. It is the authoritative source — everything downstream, drfind included, ultimately derives from it.

The download is one zip — and many files

The monthly “full replacement” file is a ~1 GB ZIP that unzips to roughly 9 GB. Inside isn’t one tidy spreadsheet — it’s a bundle:

File              Approx. size   What it holds
npidata_pfile     ~9 GB          The main file: one row per NPI, 330+ columns
pl_pfile          ~100 MB        Additional practice locations
othername_pfile   ~50 MB         Other / former organization names
endpoint_pfile    ~120 MB        Health-IT (FHIR) endpoints
CodeValues.pdf    ~3 MB          What the coded values mean, as a PDF

There are also weekly incremental (“delta”) files. And one thing you’ll need is conspicuously not in the zip: the NUCC taxonomy code set that turns specialty codes into words — you have to fetch that separately from nucc.org.
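
As a rough illustration of that separate lookup, here is a minimal Python sketch that builds a code-to-label table from a NUCC-style CSV. The column names ("Code", "Classification", "Specialization") and the inline sample rows are assumptions made for the example; check them against the header of the file you actually download from nucc.org.

```python
import csv
import io

def load_taxonomy_labels(csv_file):
    """Build a {taxonomy_code: label} dict from a NUCC-style CSV file object."""
    labels = {}
    for row in csv.DictReader(csv_file):
        code = (row.get("Code") or "").strip()
        if not code:
            continue
        # Prefer the narrower specialization; fall back to the classification.
        specialization = (row.get("Specialization") or "").strip()
        classification = (row.get("Classification") or "").strip()
        labels[code] = specialization or classification
    return labels

# Tiny inline sample standing in for the real NUCC file (hypothetical rows).
SAMPLE = io.StringIO(
    "Code,Classification,Specialization\n"
    "208000000X,Pediatrics,\n"
    "207RC0000X,Internal Medicine,Cardiovascular Disease\n"
)

labels = load_taxonomy_labels(SAMPLE)
print(labels.get("207RC0000X", "Unknown specialty"))
```

Falling back to a placeholder such as "Unknown specialty" is one way to cope with codes that appear in the registry but not in the crosswalk version you happen to have loaded.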

The main file: 330+ columns, mostly empty

The npidata file has one row per provider and is over 330 columns wide; for any given provider, the vast majority of those columns are blank. The columns cluster into a handful of groups:
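
To make the sparsity concrete, here is a minimal Python sketch that reads such a file row by row and keeps only the columns that actually hold a value. The sample header and rows are invented for the example; with the real file you would pass an open file handle instead of the in-memory sample.

```python
import csv
import io

def iter_providers(csv_file):
    """Yield one dict per row, keeping only columns that hold a value."""
    for row in csv.DictReader(csv_file):
        # Rows are read one at a time, so memory use stays flat
        # no matter how large the file is.
        yield {
            col: val
            for col, val in row.items()
            if isinstance(val, str) and val.strip()
        }

# Hypothetical miniature of the main file: same shape, far fewer columns.
SAMPLE = io.StringIO(
    '"NPI","Entity Type Code","Provider First Name","Provider Credential Text"\n'
    '"1234567890","1","JANE",""\n'
    '"1098765432","2","",""\n'
)

rows = list(iter_providers(SAMPLE))
for row in rows:
    print(row)
```

For the real download, something like `open(path, newline="", encoding="utf-8")` would take the place of `SAMPLE`; the encoding is an assumption worth verifying against the file itself.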

Column group          What’s in it
Identity              NPI, and Entity Type Code (1 = individual, 2 = organization)
Names                 Legal name or org name, prefix/suffix, credential, gender
Taxonomy ×15          Up to 15 specialty slots, each with a code, license number, state, a “primary” flag, and a taxonomy group (~75 columns)
Practice address      Two lines, city, state, ZIP, phone, fax
Mailing address       A second, separate address block
Authorized official   For organizations: name, title, phone
Dates & status        Enumeration date, last update, deactivation and reactivation dates
Legacy identifiers    Up to 50 old “other provider identifier” slots; most of the column count lives here
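
The taxonomy slots are the group most readers end up writing code against. Below is a minimal Python sketch of scanning the 15 slots for the one flagged as primary. The column names follow the pattern we believe the NPPES header uses ("Healthcare Provider Taxonomy Code_1" and so on), but treat them as assumptions and confirm them against your copy of the file. Falling back to the first filled slot when nothing is flagged is our own choice for the example, not an NPPES rule.

```python
def primary_taxonomy(row):
    """Return the taxonomy code flagged primary, else the first filled slot."""
    first_seen = None
    for slot in range(1, 16):  # slots are numbered 1 through 15
        code = (row.get(f"Healthcare Provider Taxonomy Code_{slot}") or "").strip()
        if not code:
            continue
        if first_seen is None:
            first_seen = code
        switch = row.get(f"Healthcare Provider Primary Taxonomy Switch_{slot}") or ""
        if switch.strip().upper() == "Y":
            return code
    # Assumption: with no explicit flag, use the first populated slot.
    return first_seen

# Hypothetical row: two slots filled, the second one marked primary.
example = {
    "Healthcare Provider Taxonomy Code_1": "208000000X",
    "Healthcare Provider Primary Taxonomy Switch_1": "N",
    "Healthcare Provider Taxonomy Code_2": "207RC0000X",
    "Healthcare Provider Primary Taxonomy Switch_2": "Y",
}
print(primary_taxonomy(example))
```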

Why the raw data is hard to use

This is the honest part — the things that make a casual “just download it” turn into a project:

  1. It’s ~9 GB. It won’t open in Excel. You have to stream-parse it in code.
  2. Specialties are codes, not words. A pediatrician is 208000000X; a cardiologist 207RC0000X. To show a label you need the separate NUCC crosswalk (~870 codes) — which isn’t in the download.
  3. Up to 15 taxonomies per provider. You have to find the primary one via a parallel “Primary Taxonomy Switch” column.
  4. “Active” is computed, not a field. A provider counts as active unless they have a deactivation date with no later reactivation date — you derive status from two columns.
  5. People and organizations share one file. Type 1 uses name fields; Type 2 uses an org-name field; your parser has to branch on entity type.
  6. It’s split across files. Extra practice locations, other names, and endpoints all live in separate files you’d have to join on NPI.
  7. It goes stale. A full file monthly plus weekly deltas means staying current is a recurring ingest job, not a one-time download.
  8. The official lookup is one record at a time. npiregistry.cms.hhs.gov is authoritative, but it’s a multi-field form that returns a single result — no plain-English search, no “every cardiologist in this city.”
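
Items 4 and 5 are easiest to see in code. The following is a minimal Python sketch, not drfind's actual implementation. The column names and the MM/DD/YYYY date format are assumptions to check against the file header, and the helper names are made up for the example.

```python
from datetime import datetime

def parse_date(text):
    """Parse an MM/DD/YYYY string; return None for blank values."""
    text = (text or "").strip()
    return datetime.strptime(text, "%m/%d/%Y").date() if text else None

def is_active(row):
    """Active unless deactivated with no later reactivation date."""
    deactivated = parse_date(row.get("NPI Deactivation Date"))
    reactivated = parse_date(row.get("NPI Reactivation Date"))
    if deactivated is None:
        return True
    return reactivated is not None and reactivated > deactivated

def display_name(row):
    """Branch on entity type: 2 = organization, anything else = individual."""
    if (row.get("Entity Type Code") or "").strip() == "2":
        org = row.get("Provider Organization Name (Legal Business Name)") or ""
        return org.strip()
    parts = [
        row.get("Provider First Name"),
        row.get("Provider Last Name (Legal Name)"),
    ]
    return " ".join(p.strip() for p in parts if p and p.strip())

# Hypothetical rows for illustration.
person = {
    "Entity Type Code": "1",
    "Provider First Name": "JANE",
    "Provider Last Name (Legal Name)": "DOE",
    "NPI Deactivation Date": "01/15/2020",
    "NPI Reactivation Date": "03/01/2021",
}
clinic = {
    "Entity Type Code": "2",
    "Provider Organization Name (Legal Business Name)": "EXAMPLE CLINIC LLC",
    "NPI Deactivation Date": "01/15/2020",
}
print(display_name(person), is_active(person))
print(display_name(clinic), is_active(clinic))
```

How to treat a reactivation dated the same day as the deactivation is a judgment call; this sketch counts only a strictly later date as reactivated.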

How drfind handles all of this

We do the unglamorous part so you don’t. We stream-parse the ~9 GB file, resolve every NUCC code to a plain-English specialty, pick each provider’s primary taxonomy, compute active status, keep individuals and organizations straight, and refresh against the weekly and monthly files. Then we put one box in front of all of it. You type “pediatricians in Houston, TX” — or paste an NPI — and get a clean, current answer. The exact same official data, none of the plumbing.

FAQ

Where does the raw data come from?

The CMS NPPES Data Dissemination, a free public download. It is FOIA-disclosable government data.

How big is it?

About a 1 GB ZIP that unzips to roughly 9 GB — one main file of 330+ columns and 8M+ rows, plus several side files.

How often does it change?

CMS publishes a full replacement file monthly and incremental "delta" files weekly.

Is drfind official?

No. drfind is independent. We mirror the official source and link back to it; for authoritative checks, use npiregistry.cms.hhs.gov.