Python Regular Expressions: The 10 Patterns I Use for 90% of My Text-Processing Needs

2026-02-28

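Hi everyone, I'm 晚枫, a programmer working hands-on with all kinds of AI projects.

Today's topic is a skill that intimidates beginners but is remarkably powerful once learned: regular expressions (regex).

You may think regex is cryptic and hard to memorize. In my experience, though, the 10 most common patterns are enough to handle roughly 90% of everyday text-processing needs.

This article sums up the regex techniques I use most often in data processing, to help you get started quickly.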

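Why learn regex?

Suppose you need to extract every email address from a piece of text:

# Without regex (painful)
def extract_emails(text):
    emails = []
    # ... dozens of lines of code to handle every case
    return emails

# With regex (one line)
import re
text = "Contact us at support@example.com"  # sample input
emails = re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', text)

Regex is the Swiss Army knife of text processing.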

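The regex module in Python

import re

# Commonly used functions
re.match(r'\d+', '42 apples')       # match only at the start of the string
re.search(r'\d+', 'buy 42 apples')  # find the first match anywhere
re.findall(r'\d+', '1 and 2')       # return all matches
re.sub(r'\d+', '#', '1 and 2')      # replace matches
re.split(r',\s*', 'a, b,c')         # split on matches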
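
The difference between these functions is easiest to see side by side. Here is a minimal sketch; the sample string and variable names are made up for illustration.

```python
import re

sample = "Order 66 shipped, order 67 pending"  # made-up sample text

# re.match only tries position 0, so it fails: the string starts with "Order"
at_start = re.match(r'\d+', sample)

# re.search scans forward and returns the first match object (or None)
first = re.search(r'\d+', sample)

# re.findall returns every non-overlapping match as a list of strings
all_numbers = re.findall(r'\d+', sample)

# re.sub replaces every match; re.split cuts the string at every match
masked = re.sub(r'\d+', '#', sample)
parts = re.split(r',\s*', sample)

print(at_start)        # None
print(first.group())   # 66
print(all_numbers)     # ['66', '67']
print(masked)          # Order # shipped, order # pending
print(parts)           # ['Order 66 shipped', 'order 67 pending']
```

Note that re.match and re.search return a match object or None, so check the result before calling .group().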

10 essential regex patterns

Pattern 1: Matching numbers

import re

# Match integers
re.findall(r'\d+', 'Age: 25, Score: 90')  # ['25', '90']

# Match decimals
re.findall(r'\d+\.\d+', 'Price: 19.99')  # ['19.99']

# Match negative numbers
re.findall(r'-?\d+', 'Temp: -5 to 30')  # ['-5', '30']
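
In real text, integers, decimals, and negative values are usually mixed together, so it can help to fold the three patterns into one. The sketch below is one way to do that; the name NUMBER and the sample text are assumptions for illustration.

```python
import re

# Optional minus sign, integer part, optional fractional part.
# (?:...) is a non-capturing group, so findall still returns whole matches.
NUMBER = re.compile(r'-?\d+(?:\.\d+)?')

text = "Temp: -5.5 to 30, price 19.99"  # made-up sample
numbers = NUMBER.findall(text)

print(numbers)                      # ['-5.5', '30', '19.99']
print([float(n) for n in numbers])  # [-5.5, 30.0, 19.99]
```

One caveat: the pattern cannot tell a minus sign from a hyphen, so a range written as "3-5" would come out as '3' and '-5'.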

Pattern 2: Matching email addresses

pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'

text = "Contact us at support@example.com or sales@company.co.uk"
emails = re.findall(pattern, text)
print(emails)  # ['support@example.com', 'sales@company.co.uk']
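
The same pattern can also validate a single value, as long as the whole string has to match rather than just a part of it. This is a rough sketch; the helper name is_plausible_email is hypothetical, and the pattern is a pragmatic approximation rather than a full implementation of the email address standards.

```python
import re

EMAIL = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

def is_plausible_email(value):
    """Return True if the entire string looks like an email address."""
    # fullmatch requires the pattern to cover the string from start to end
    return EMAIL.fullmatch(value) is not None

print(is_plausible_email("support@example.com"))       # True
print(is_plausible_email("mail support@example.com"))  # False
print(is_plausible_email("support@example"))           # False
```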

Pattern 3: Matching mobile numbers (mainland China)

pattern = r'1[3-9]\d{9}'

text = "Call me at 13800138000 or 15912345678"
phones = re.findall(pattern, text)
print(phones)  # ['13800138000', '15912345678']
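
One thing to watch for: the pattern above will also match 11 digits buried inside a longer number, such as an order ID. A possible fix, sketched here with made-up sample text, is to add lookarounds that reject a digit on either side.

```python
import re

loose = r'1[3-9]\d{9}'
# (?<!\d) and (?!\d) assert that no digit sits right before or after the match
strict = r'(?<!\d)1[3-9]\d{9}(?!\d)'

text = "Order 202413800138000, call 13800138000"  # made-up sample

loose_hits = re.findall(loose, text)
strict_hits = re.findall(strict, text)

print(loose_hits)   # ['13800138000', '13800138000']
print(strict_hits)  # ['13800138000']
```

The loose version picks up a false positive from inside the order number, while the strict version keeps only the standalone phone number.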

Pattern 4: Matching URLs

pattern = r'https?://[^\s<>"{}|\\^`[\]]+'

text = "Visit https://www.example.com or http://test.org"
urls = re.findall(pattern, text)
print(urls)  # ['https://www.example.com', 'http://test.org']
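
In running prose, a URL is often followed directly by a comma or full stop, and this pattern captures that punctuation too. The sketch below trims it afterwards and then uses the standard library's urllib.parse.urlsplit to pull out the host; the sample text is invented, and the set of characters to trim is an assumption you may want to adjust.

```python
import re
from urllib.parse import urlsplit

URL = re.compile(r'https?://[^\s<>"{}|\\^`[\]]+')

text = "Docs: https://www.example.com/docs?page=2, mirror: http://test.org."

raw = URL.findall(text)
# Sentence punctuation right after a URL is captured as well, so strip it
urls = [u.rstrip('.,;:!?') for u in raw]
hosts = [urlsplit(u).netloc for u in urls]

print(raw)    # ['https://www.example.com/docs?page=2,', 'http://test.org.']
print(urls)   # ['https://www.example.com/docs?page=2', 'http://test.org']
print(hosts)  # ['www.example.com', 'test.org']
```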

Pattern 5: Extracting HTML tag content

# Extract the content of the title tag
pattern = r'<title>(.*?)</title>'

html = "<title>My Website</title>"
match = re.search(pattern, html)
if match:
    print(match.group(1))  # My Website
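
Real pages are rarely this tidy: the tag may be uppercase, and the title may span several lines. The following sketch, using an invented HTML snippet, shows how the re.IGNORECASE and re.DOTALL flags handle both cases. For anything beyond a quick extraction, a proper HTML parser is likely the safer choice.

```python
import re

html = """<html><head>
<TITLE>
  My Website
</TITLE>
</head></html>"""

# Without flags, both the uppercase tag and the line breaks defeat the pattern
plain = re.search(r'<title>(.*?)</title>', html)
print(plain)  # None

# re.IGNORECASE matches <TITLE>; re.DOTALL lets . match newlines as well
match = re.search(r'<title>(.*?)</title>', html, re.IGNORECASE | re.DOTALL)
title = match.group(1).strip() if match else None
print(title)  # My Website
```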

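Pattern 6: Validating password strength

def check_password(password):
    """At least 8 characters, with lowercase, uppercase, and a digit."""
    pattern = r'^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$'
    # fullmatch also rejects a trailing newline, which match plus $ would accept
    return bool(re.fullmatch(pattern, password))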

print(check_password("Hello123"))  # True
print(check_password("hello"))     # False
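
Each (?=...) in the pattern is a lookahead: it checks a condition without consuming any characters, which is why three independent rules can be stacked in front of .{8,}. To make that visible, here is a roughly equivalent sketch that checks the rules one at a time and reports which ones fail. The names RULES and password_problems are hypothetical.

```python
import re

# One entry per lookahead in the combined pattern
RULES = {
    "lowercase letter": r'[a-z]',
    "uppercase letter": r'[A-Z]',
    "digit": r'\d',
}

def password_problems(password):
    """Return a list of reasons the password would be rejected."""
    problems = []
    if len(password) < 8:
        problems.append("fewer than 8 characters")
    for label, rule in RULES.items():
        # re.search succeeds if the character class appears anywhere
        if not re.search(rule, password):
            problems.append(f"missing {label}")
    return problems

print(password_problems("Hello123"))  # []
print(password_problems("hello"))
# ['fewer than 8 characters', 'missing uppercase letter', 'missing digit']
```

The single-pattern version is more compact; this version is easier to extend and gives the user actionable feedback.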

Pattern 7: Reformatting strings

# Convert camelCase to snake_case
def camel_to_snake(name):
    pattern = r'(?<!^)(?=[A-Z])'
    return re.sub(pattern, '_', name).lower()

print(camel_to_snake("myVariableName"))  # my_variable_name
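
The one-line pattern inserts an underscore before every capital letter, so names containing acronyms get split letter by letter. A common two-step alternative is sketched below; the function name camel_to_snake_acronyms is made up for this example.

```python
import re

def camel_to_snake_acronyms(name):
    """Variant that keeps runs of capitals (acronyms) together."""
    # Step 1: split before a capitalised word: "HTTPResponse" -> "HTTP_Response"
    step1 = re.sub(r'(.)([A-Z][a-z]+)', r'\1_\2', name)
    # Step 2: split between a lowercase letter or digit and a capital
    step2 = re.sub(r'([a-z0-9])([A-Z])', r'\1_\2', step1)
    return step2.lower()

print(camel_to_snake_acronyms("myVariableName"))    # my_variable_name
print(camel_to_snake_acronyms("HTTPResponseCode"))  # http_response_code

# For comparison, the single lookahead pattern on the same input:
print(re.sub(r'(?<!^)(?=[A-Z])', '_', "HTTPResponseCode").lower())
# h_t_t_p_response_code
```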

Pattern 8: Cleaning text

text = "  Hello!!!   World???  "

# Collapse extra whitespace
cleaned = re.sub(r'\s+', ' ', text).strip()
print(cleaned)  # "Hello!!! World???"

# Remove punctuation
no_punct = re.sub(r'[^\w\s]', '', cleaned)
print(no_punct)  # "Hello World"
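
A detail that matters for non-English text: on str patterns, \w matches Unicode letters and digits by default, so the punctuation filter keeps Chinese characters. Passing re.ASCII restricts \w to [a-zA-Z0-9_]. A small sketch with an invented mixed-language sample:

```python
import re

mixed = "Hello, 世界! 100%"  # made-up sample

# Default: \w is Unicode-aware, so 世界 survives and only punctuation is dropped
unicode_clean = re.sub(r'[^\w\s]', '', mixed)

# re.ASCII: \w is [a-zA-Z0-9_] only, so the Chinese characters are removed too
ascii_clean = re.sub(r'[^\w\s]', '', mixed, flags=re.ASCII)
ascii_clean = re.sub(r'\s+', ' ', ascii_clean).strip()

print(unicode_clean)  # Hello 世界 100
print(ascii_clean)    # Hello 100
```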

Pattern 9: Parsing logs

log_line = "2024-01-15 10:30:45 ERROR Connection timeout"

pattern = r'(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\w+) (.+)'
match = re.match(pattern, log_line)

if match:
    date, time, level, message = match.groups()
    print(f"[{level}] {date} {time}: {message}")
    # [ERROR] 2024-01-15 10:30:45: Connection timeout
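
When there are several fields, named groups tend to be easier to maintain than positional unpacking, because each field is looked up by name. The sketch below applies the same pattern to a few lines; the name LOG_LINE and the extra sample lines are assumptions for illustration.

```python
import re

LOG_LINE = re.compile(
    r'(?P<date>\d{4}-\d{2}-\d{2}) '
    r'(?P<time>\d{2}:\d{2}:\d{2}) '
    r'(?P<level>\w+) '
    r'(?P<message>.+)'
)

lines = [
    "2024-01-15 10:30:45 ERROR Connection timeout",
    "not a log line",
    "2024-01-15 10:31:02 INFO Retry succeeded",
]

records = []
for line in lines:
    m = LOG_LINE.match(line)
    if m:  # skip lines that do not fit the format
        records.append(m.groupdict())

errors = [r["message"] for r in records if r["level"] == "ERROR"]

print(len(records))  # 2
print(errors)        # ['Connection timeout']
```

Compiling the pattern once with re.compile also avoids repeating the pattern string on every line of a large log file.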

Pattern 10: Batch renaming

import os
import re

# Prefix every .txt file in the current directory with "2024_"
for filename in os.listdir('.'):
    if filename.endswith('.txt'):
        new_name = re.sub(r'^(.*)\.txt$', r'2024_\1.txt', filename)
        os.rename(filename, new_name)
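
Because renaming is hard to undo, it may be worth computing the new names first and inspecting them before touching any files. Here is a sketch of such a dry run; plan_renames and its prefix parameter are hypothetical names, and skipping files that already carry the prefix is an added assumption so the script can be run twice safely.

```python
import re

def plan_renames(filenames, prefix="2024_"):
    """Return (old, new) name pairs without touching the file system."""
    plan = []
    for name in filenames:
        # Skip non-.txt files and files that already carry the prefix
        if not name.endswith('.txt') or name.startswith(prefix):
            continue
        # A function replacement avoids backslash surprises in the prefix
        new_name = re.sub(r'^(.*)\.txt$',
                          lambda m: f"{prefix}{m.group(1)}.txt",
                          name)
        plan.append((name, new_name))
    return plan

plan = plan_renames(["notes.txt", "2024_old.txt", "photo.png"])
print(plan)  # [('notes.txt', '2024_notes.txt')]
```

Once the plan looks right, pass os.listdir('.') as the input and call os.rename(old, new) for each pair.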

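Regex syntax cheat sheet

Symbol	Meaning
`.`	Any character (except newline)
`\d`	Digit ([0-9]; on str patterns also other Unicode digits unless re.ASCII is set)
`\w`	Word character ([a-zA-Z0-9_]; on str patterns also other Unicode letters and digits unless re.ASCII is set)
`\s`	Whitespace character
`*`	0 or more times
`+`	1 or more times
`?`	0 or 1 time
`{n}`	Exactly n times
`{n,m}`	Between n and m times
`^`	Start of string
`$`	End of string (or just before a trailing newline)
`[]`	Character set
`()`	Group
`|`	Alternation (or)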
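
One more use of `?` appears in Pattern 5 but not in the table: placed after a quantifier, as in `.*?`, it makes the match lazy, meaning it stops as early as possible. A short sketch with an invented HTML fragment shows the difference:

```python
import re

html = "<b>one</b> and <b>two</b>"  # made-up sample

# Greedy: .* grabs as much as it can, then backs up to the LAST </b>
greedy = re.findall(r'<b>(.*)</b>', html)

# Lazy: .*? grabs as little as it can, stopping at the FIRST </b>
lazy = re.findall(r'<b>(.*?)</b>', html)

print(greedy)  # ['one</b> and <b>two']
print(lazy)    # ['one', 'two']
```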
