\b[0-9BCDFGHX][0-9A-HJ-NP-Z]{6}\b
False positives exist! E.g. "G0BBLDG" would be identified as an ICD-10-PCS code by this expression even though it isn't one.
But no false negatives (tested against all 2023 CMS-approved codes):
In [13]: import re
In [14]: with open("icd10pcs_codes_2023.txt") as pcs:
    ...:     pcs_codes = pcs.readlines()
    ...:
In [15]: len(pcs_codes)
Out[15]: 78530
In [16]: matches = 0
In [17]: for line in pcs_codes:
    ...:     if re.search(r'\b[0-9BCDFGHX][0-9A-HJ-NP-Z]{6}\b', line):
    ...:         matches += 1
    ...:
In [18]: matches
Out[18]: 78530
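The same check can be wrapped in a small reusable function. This is a minimal sketch rather than the exact session above: `PCS_PATTERN` and `count_matches` are names introduced here for illustration, and the three sample strings are made up to stand in for the real code file.

```python
import re

# Same expression as above, compiled once for reuse.
PCS_PATTERN = re.compile(r'\b[0-9BCDFGHX][0-9A-HJ-NP-Z]{6}\b')

def count_matches(lines, pattern=PCS_PATTERN):
    """Count how many lines contain at least one match for the pattern."""
    return sum(1 for line in lines if pattern.search(line))

# Hypothetical sample input; a real run would read the CMS code file instead.
# "G0BBLDG" has the right shape (the false positive from above),
# "0I16070" does not, because the letter I is excluded from positions 2-7.
sample = ["0016070\n", "G0BBLDG\n", "0I16070\n"]
print(count_matches(sample))  # 2
```

Since the expression only checks the shape of a string, ruling out false positives such as "G0BBLDG" would take a lookup against the actual code list.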
\b[A-TV-Z]\d[A-Z\d]\.?[A-Z\d]{0,4}\b
This follows the specification*:
- 3-7 characters
- Character 1 is alpha (all letters except U are used)
- Character 2 is numeric
- Characters 3-7 are alpha or numeric
- Use of decimal after 3 characters
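To see how each rule in the list maps onto the expression, here is a small sketch that runs it against a few strings. The strings are illustrative shapes chosen for this example, not a statement about which codes are valid in any given release. `CM_PATTERN` and `looks_like_icd10cm` are hypothetical names.

```python
import re

CM_PATTERN = re.compile(r'\b[A-TV-Z]\d[A-Z\d]\.?[A-Z\d]{0,4}\b')

def looks_like_icd10cm(text):
    """Return True if the whole string has the shape of an ICD-10-CM code."""
    return CM_PATTERN.fullmatch(text) is not None

# One string per rule: minimum length, decimal after 3 characters,
# decimal omitted, excluded first letter, non-numeric character 2, too short.
for candidate in ["A00", "S72.001A", "S72001A", "U07.1", "AB0", "A0"]:
    print(candidate, looks_like_icd10cm(candidate))
```

The first three strings pass and the last three fail. `fullmatch` is used here so the test applies to the whole string, whereas `re.search` looks for a match anywhere in a line.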
Brute-force validated as well, though there were 3 false negatives:
In [1]: import re
In [2]: with open("icd10cm_codes_2024.txt") as cm:
   ...:     icd10cm = cm.readlines()
   ...:
In [3]: len(icd10cm)
Out[3]: 74044
In [5]: matches = 0
In [6]: for line in icd10cm:
   ...:     if re.search(r'\b[A-TV-Z]\d[A-Z\d]\.?[A-Z\d]{0,4}\b', line):
   ...:         matches += 1
   ...:
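A count alone does not say which lines failed. A sketch like the following collects the non-matching lines for inspection. `find_misses` is a hypothetical helper, and the sample lines are made up for the example rather than taken from the CMS file. Because the first-character class is `[A-TV-Z]`, a line whose only code-like token starts with U will land in this list.

```python
import re

CM_PATTERN = re.compile(r'\b[A-TV-Z]\d[A-Z\d]\.?[A-Z\d]{0,4}\b')

def find_misses(lines, pattern=CM_PATTERN):
    """Return the stripped lines in which the pattern finds no match."""
    return [line.strip() for line in lines if not pattern.search(line)]

# Hypothetical sample input; a real run would pass the lines read from the file.
sample = ["A000\n", "S72001A\n", "U071\n"]
print(find_misses(sample))  # ['U071']
```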