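Python's standard library is the clearest expression of its "batteries included" philosophy. From enhanced data structures, functional programming tools, and file path handling to concurrency, subprocess management, data serialization, and logging, it ships production-quality modules that work without installing any third-party dependency. This article surveys a dozen or so of the modules used most often in day-to-day development, organized by data structures, function tools, the file system, concurrency, system interaction, data formats, logging and configuration, and the type system. Each module comes with an API walkthrough and practical examples.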
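1. collections: Enhanced Data Structures

The collections module provides specialized data structures beyond Python's built-in containers (list, dict, tuple, set). In the scenarios they target, these types are more efficient and semantically clearer than the built-ins.

1.1 namedtuple: lightweight data classes

A class created by namedtuple combines the immutability of a tuple with attribute-style field access, and its instances use far less memory than instances of an ordinary class because they carry no per-instance __dict__. It suits lightweight records such as coordinates, RGB colors, and database rows.

    from collections import namedtuple

    # Two equivalent ways to declare the fields
    Point = namedtuple('Point', ['x', 'y', 'z'])
    Point = namedtuple('Point', 'x y z')

    p = Point(1, 2, 3)            # positional arguments
    p = Point(x=1, y=2, z=3)      # keyword arguments

    print(p.x, p[0])              # 1 1  (access by name or by index)
    print(p.y, p[1])              # 2 2
    print(p.z, p[2])              # 3 3
    x, y, z = p                   # unpacks like any tuple

    print(p._fields)              # ('x', 'y', 'z')
    print(p._asdict())            # {'x': 1, 'y': 2, 'z': 3} on Python 3.8+
    p2 = p._replace(x=10)         # new instance: Point(x=10, y=2, z=3)

    # defaults apply to the rightmost fields
    Point = namedtuple('Point', 'x y z', defaults=(0, 0))
    p = Point(1)                  # Point(x=1, y=0, z=0)
    p = Point._make([1, 2, 3])    # build from an iterable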
Typical usage in production code:

    from collections import namedtuple

    User = namedtuple('User', 'id name email created_at')
    users = [
        User(1, 'Alice', 'a@example.com', '2024-01-01'),
        User(2, 'Bob', 'b@example.com', '2024-01-02'),
    ]
    print(users[0].name)          # Alice

    # Typed alternative: same tuple behavior, plus annotations
    from typing import NamedTuple

    class User(NamedTuple):
        id: int
        name: str
        email: str
        created_at: str
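1.2 Counter: counting

Counter is a dict subclass that counts occurrences of hashable objects:

    from collections import Counter

    words = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
    c = Counter(words)
    print(c)                      # Counter({'apple': 3, 'banana': 2, 'orange': 1})

    c = Counter('abracadabra')    # any iterable works, including strings
    print(c)                      # Counter({'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1})

    c = Counter({'red': 4, 'blue': 2})   # from a mapping
    c = Counter(red=4, blue=2)           # from keyword arguments

    print(c.most_common(2))       # [('red', 4), ('blue', 2)]
    print(c['green'])             # 0 (missing keys count as zero, no KeyError)

    c.update(['red', 'green'])    # adds to the existing counts
    print(c['red'])               # 5
    c['green'] -= 1
    print(c['green'])             # 0
    c += Counter()                # drops entries whose count is zero or negative

    # Multiset arithmetic
    c1 = Counter(a=3, b=1)
    c2 = Counter(a=1, b=2)
    print(c1 + c2)                # Counter({'a': 4, 'b': 3})
    print(c1 - c2)                # Counter({'a': 2})  (only positive counts are kept)
    print(c1 & c2)                # Counter({'a': 1, 'b': 1})  (minimum of each count)
    print(c1 | c2)                # Counter({'a': 3, 'b': 2})  (maximum of each count)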
Real-world uses include counting IP address frequencies in log analysis and computing word frequencies in text.
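As a rough sketch of the log-analysis use case, the snippet below counts client IPs with Counter. The log lines and their layout (IP address as the first space-separated field) are made up for illustration, and `top_ips` is a hypothetical helper, not part of the standard library.

```python
from collections import Counter

# Hypothetical access-log lines; the "IP first" layout is an assumption.
LOG_LINES = [
    "10.0.0.1 GET /index.html 200",
    "10.0.0.2 GET /about.html 200",
    "10.0.0.1 POST /login 302",
    "10.0.0.3 GET /index.html 404",
    "10.0.0.1 GET /dashboard 200",
    "10.0.0.2 GET /index.html 200",
]

def top_ips(lines, n=2):
    """Return the n most frequent client IPs as (ip, count) pairs."""
    counts = Counter(line.split()[0] for line in lines if line.strip())
    return counts.most_common(n)

top = top_ips(LOG_LINES)
print(top)  # [('10.0.0.1', 3), ('10.0.0.2', 2)]
```

Feeding Counter a generator keeps memory usage proportional to the number of distinct IPs rather than the number of log lines.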
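1.3 deque: double-ended queue

deque (double-ended queue) supports O(1) insertion and removal at both ends, which makes it a good fit for queues, stacks, and sliding windows:

    from collections import deque

    dq = deque([1, 2, 3], maxlen=10)
    dq.append(4)                  # deque([1, 2, 3, 4])
    dq.appendleft(0)              # deque([0, 1, 2, 3, 4])
    dq.pop()                      # returns 4
    dq.popleft()                  # returns 0

    # With maxlen set, items fall off the opposite end when the deque is full
    dq = deque(maxlen=3)
    dq.extend([1, 2, 3, 4, 5])
    print(dq)                     # deque([3, 4, 5], maxlen=3)
    dq.extend([6, 7])             # deque([5, 6, 7], maxlen=3)
    dq.extendleft([1, 2])         # deque([2, 1, 5], maxlen=3)

    dq = deque([1, 2, 3, 4, 5])
    dq.rotate(2)                  # rotate right: deque([4, 5, 1, 2, 3])
    dq.rotate(-1)                 # rotate left:  deque([5, 1, 2, 3, 4])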
A classic sliding-window implementation (computing a moving average):

    from collections import deque
    import random

    def moving_average(iterable, window_size):
        dq = deque(maxlen=window_size)   # the oldest value is dropped automatically
        result = []
        for x in iterable:
            dq.append(x)
            if len(dq) == window_size:   # emit only once the window is full
                result.append(sum(dq) / window_size)
        return result

    data = [random.randint(0, 100) for _ in range(20)]
    print(moving_average(data, 5))
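1.4 defaultdict: a dictionary with default values

When a missing key is accessed, defaultdict calls the factory function it was given to produce a default value, which removes the need for repeated `if key in d` checks:

    from collections import defaultdict

    d = defaultdict(int)          # missing keys start at 0
    d['a'] += 1
    print(d['a'])                 # 1
    print(d['b'])                 # 0

    d = defaultdict(list)         # missing keys start as []
    d['users'].append('Alice')
    d['users'].append('Bob')
    print(d)                      # defaultdict(<class 'list'>, {'users': ['Alice', 'Bob']})

    d = defaultdict(set)          # missing keys start as set()
    d['tags'].add('python')
    d['tags'].add('numpy')

    d = defaultdict(lambda: 'unknown')   # any zero-argument callable works
    print(d['missing'])           # unknown

    # Grouping items by category
    items = [('fruit', 'apple'), ('fruit', 'banana'), ('veg', 'carrot'), ('fruit', 'orange')]
    groups = defaultdict(list)
    for category, item in items:
        groups[category].append(item)
    print(dict(groups))           # {'fruit': ['apple', 'banana', 'orange'], 'veg': ['carrot']}

    # Arbitrarily nested dictionary
    def nested_dict():
        return defaultdict(nested_dict)

    tree = nested_dict()
    tree['company']['department']['team'] = 'AI'
    print(tree['company']['department']['team'])   # AI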
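1.5 OrderedDict: ordered dictionary

Since Python 3.7, a plain dict is guaranteed to preserve insertion order. OrderedDict still offers things dict does not: move_to_end, popitem(last=False), and order-sensitive equality:

    from collections import OrderedDict

    od = OrderedDict()
    od['a'] = 1
    od['b'] = 2
    od['c'] = 3

    od.move_to_end('a')                 # move to the end
    print(list(od.keys()))              # ['b', 'c', 'a']
    od.move_to_end('a', last=False)     # move to the front
    print(list(od.keys()))              # ['a', 'b', 'c']

    od.popitem()                        # pops the last item: ('c', 3)
    od.popitem(last=False)              # pops the first item: ('a', 1)

    # Equality between OrderedDicts takes order into account
    od1 = OrderedDict([('a', 1), ('b', 2)])
    od2 = OrderedDict([('b', 2), ('a', 1)])
    print(od1 == od2)                   # False
    print(dict(od1) == dict(od2))       # True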
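One place where move_to_end and popitem(last=False) earn their keep is a small least-recently-used cache. The class below is only an illustrative sketch (the name `LRUCache` and the `capacity` parameter are hypothetical); for caching function results, functools.lru_cache is usually the simpler choice.

```python
from collections import OrderedDict

class LRUCache:
    """Tiny least-recently-used cache; `capacity` is a hypothetical parameter."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)          # new or updated keys are the most recent
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict the least recently used entry

    def keys(self):
        return list(self._data)

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" becomes the most recent entry
cache.put("c", 3)      # capacity exceeded, so "b" is evicted
print(cache.keys())    # ['a', 'c']
```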
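1.6 ChainMap: chained mappings

ChainMap links several mappings (dictionaries) into a single logical view. Lookups search the mappings in order, and no data is copied:

    from collections import ChainMap

    defaults = {'host': 'localhost', 'port': 8080, 'debug': False}
    env = {'host': 'prod.example.com', 'port': 443}
    cli = {'debug': True}

    # Priority: command line > environment > defaults
    config = ChainMap(cli, env, defaults)
    print(config['host'])           # prod.example.com (from env)
    print(config['port'])           # 443 (from env)
    print(config['debug'])          # True (from cli)
    print(config.maps)              # the underlying list of dicts

    # new_child pushes a temporary override layer on top
    config = config.new_child({'port': 9090})
    print(config['port'])           # 9090
    print(config.parents['port'])   # 443 (the view without the top layer)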
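2. itertools

itertools provides functions for building and combining iterators. They fall into three groups: infinite iterators, iterators that terminate on the shortest input sequence, and combinatoric iterators. In CPython these functions are implemented in C, so they are fast, and because they evaluate lazily they are also memory-efficient.

2.1 Infinite iterators

    import itertools

    counter = itertools.count(10, 2)             # start at 10, step 2
    print([next(counter) for _ in range(5)])     # [10, 12, 14, 16, 18]

    colors = itertools.cycle(['r', 'g', 'b'])    # repeats forever
    print([next(colors) for _ in range(7)])      # ['r', 'g', 'b', 'r', 'g', 'b', 'r']

    print(list(itertools.repeat('x', 5)))        # ['x', 'x', 'x', 'x', 'x']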
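2.2 Chaining and grouping

    import itertools

    a = [1, 2, 3]
    b = [4, 5]
    c = [6, 7, 8]
    print(list(itertools.chain(a, b, c)))               # [1, 2, 3, 4, 5, 6, 7, 8]

    nested = [[1, 2], [3, 4], [5]]
    print(list(itertools.chain.from_iterable(nested)))  # [1, 2, 3, 4, 5]

    # groupby groups *consecutive* items that share a key
    data = [('A', 1), ('A', 2), ('B', 3), ('B', 4), ('C', 5)]
    for key, group in itertools.groupby(data, key=lambda x: x[0]):
        print(key, list(group))
    # A [('A', 1), ('A', 2)]
    # B [('B', 3), ('B', 4)]
    # C [('C', 5)]

    # Pitfall: unsorted input splits one key into several groups
    unsorted = [('A', 2), ('B', 3), ('A', 1)]
    for key, group in itertools.groupby(unsorted, lambda x: x[0]):
        print(key, list(group))     # 'A' appears twice

    # Sort by the same key first
    unsorted.sort(key=lambda x: x[0])
    for key, group in itertools.groupby(unsorted, lambda x: x[0]):
        print(key, list(group))
    # A [('A', 2), ('A', 1)]
    # B [('B', 3)]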
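2.3 Cartesian products, permutations, and combinations

    import itertools

    # Cartesian product
    print(list(itertools.product([1, 2], ['A', 'B'])))
    # [(1, 'A'), (1, 'B'), (2, 'A'), (2, 'B')]
    print(list(itertools.product([1, 2], repeat=3)))    # 8 tuples, (1, 1, 1) through (2, 2, 2)

    # Permutations: order matters
    print(list(itertools.permutations([1, 2, 3], 2)))
    # [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]

    # Combinations: order does not matter
    print(list(itertools.combinations([1, 2, 3], 2)))
    # [(1, 2), (1, 3), (2, 3)]

    # Combinations that may repeat an element
    print(list(itertools.combinations_with_replacement([1, 2, 3], 2)))
    # [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]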
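2.4 Utilities: islice, accumulate, takewhile, dropwhile

    import itertools
    import operator
    from itertools import islice

    # islice: slicing for iterators (no negative indices)
    gen = (x**2 for x in range(100))
    print(list(islice(gen, 10, 20)))       # squares of 10 through 19
    counter = itertools.count()
    print(list(islice(counter, 5, 10)))    # [5, 6, 7, 8, 9]

    # accumulate: running totals, or any running binary operation
    a = [1, 2, 3, 4, 5]
    print(list(itertools.accumulate(a)))                         # [1, 3, 6, 10, 15]
    print(list(itertools.accumulate(a, operator.mul)))           # [1, 2, 6, 24, 120]
    print(list(itertools.accumulate([3, 1, 4, 1, 5, 9], max)))   # [3, 3, 4, 4, 5, 9]

    # takewhile / dropwhile: split at the first item that fails the predicate
    data = [1, 3, 5, 7, 2, 4, 6]
    print(list(itertools.takewhile(lambda x: x < 6, data)))      # [1, 3, 5]
    print(list(itertools.dropwhile(lambda x: x < 6, data)))      # [7, 2, 4, 6]

    # filterfalse: keep the items for which the predicate is false
    print(list(itertools.filterfalse(lambda x: x % 2, range(10))))   # [0, 2, 4, 6, 8]

    # pairwise (Python 3.10+): overlapping pairs
    print(list(itertools.pairwise([1, 2, 3, 4])))                # [(1, 2), (2, 3), (3, 4)]

    # starmap: unpack each tuple as arguments
    print(list(itertools.starmap(pow, [(2, 5), (3, 2), (10, 3)])))   # [32, 9, 1000]

    # tee: split one iterator into several independent ones
    it = iter([1, 2, 3, 4])
    it1, it2 = itertools.tee(it, 2)
    print(list(it1))                       # [1, 2, 3, 4]
    print(list(it2))                       # [1, 2, 3, 4]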
2.5 zip_longest

    from itertools import zip_longest

    a = [1, 2, 3]
    b = ['a', 'b']
    print(list(zip(a, b)))                               # [(1, 'a'), (2, 'b')]  (stops at the shortest input)
    print(list(zip_longest(a, b, fillvalue='MISSING')))  # [(1, 'a'), (2, 'b'), (3, 'MISSING')]
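3. functools

functools is the Swiss Army knife of functional programming in Python: it provides tools for decorating, wrapping, partially applying, and caching functions.

3.1 lru_cache: caching function results

lru_cache is a powerful optimization tool. It records a function's arguments and results, and when the function is called again with the same arguments it returns the cached result instead of recomputing it:

    from functools import lru_cache
    import time

    @lru_cache(maxsize=128)
    def fibonacci(n):
        """Without the cache, fib(35) takes seconds; with it, the call returns almost instantly."""
        if n <= 1:
            return n
        return fibonacci(n - 1) + fibonacci(n - 2)

    start = time.perf_counter()
    print(fibonacci(200))
    print(time.perf_counter() - start)

    print(fibonacci.cache_info())    # hits, misses, maxsize, currsize
    fibonacci.cache_clear()          # empty the cache

    @lru_cache(maxsize=None)         # unbounded cache: entries are never evicted
    def compute_expensive(x):
        pass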
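This caching is extremely handy in competitive programming and in backtracking or dynamic-programming problems, where it amounts to automatic memoization. Note that all arguments must be hashable, because they are used as cache keys.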
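To make the memoization point concrete, here is a small sketch of a top-down dynamic-programming solution. The problem (counting right/down paths across a grid) and the function name `grid_paths` are chosen purely for illustration.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def grid_paths(rows, cols):
    """Count paths across a rows x cols grid using only right and down moves."""
    if rows == 0 or cols == 0:
        return 1                      # a straight line has exactly one path
    # Each subproblem is computed once; repeat calls are served from the cache.
    return grid_paths(rows - 1, cols) + grid_paths(rows, cols - 1)

result = grid_paths(16, 16)
info = grid_paths.cache_info()
print(result)        # 601080390
print(info.misses)   # number of distinct subproblems that were actually computed
```

Without the decorator the same recursion would make hundreds of millions of calls; with it, the number of computed subproblems is bounded by the number of distinct (rows, cols) pairs.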
3.2 partial: partial application

partial pre-fills some of a function's arguments and returns a new callable that accepts the remaining ones. This is useful for callbacks and for pinning down parameters of multi-argument functions:

    from functools import partial

    def power(base, exponent):
        return base ** exponent

    square = partial(power, exponent=2)
    cube = partial(power, exponent=3)
    print(square(5))   # 25
    print(cube(5))     # 125

    # Fixing extra arguments for executor.map, which passes one item per call
    import concurrent.futures

    def process(url, timeout, retries):
        return len(url)

    with concurrent.futures.ThreadPoolExecutor() as executor:
        process_with_args = partial(process, timeout=10, retries=3)
        results = list(executor.map(process_with_args, ['http://a.com', 'http://b.com']))

    # partialmethod does the same for methods
    from functools import partialmethod

    class Server:
        def request(self, method, path, **kwargs):
            print(f"{method} {path} {kwargs}")

        get = partialmethod(request, 'GET')
        post = partialmethod(request, 'POST')

    s = Server()
    s.get('/users')                  # GET /users {}
    s.post('/login', body='{...}')   # POST /login {'body': '{...}'}
3.3 reduce: folding a sequence into one value

    from functools import reduce
    import operator

    data = [1, 2, 3, 4, 5]
    print(reduce(operator.add, data))             # 15
    print(reduce(operator.mul, data))             # 120
    print(reduce(lambda a, b: a * b, data, 1))    # 120 (the third argument is the initial value)

    # Flatten one level of nesting
    def flatten_one_level(acc, item):
        if isinstance(item, list):
            acc.extend(item)
        else:
            acc.append(item)
        return acc

    nested = [[1, 2], 3, [4, 5], 6]
    print(reduce(flatten_one_level, nested, []))  # [1, 2, 3, 4, 5, 6]
3.4 wraps: preserving the decorated function's metadata

When you write a decorator without wraps, the decorated function's __name__, __doc__, __module__, and other metadata are replaced by those of the decorator's inner function:

    from functools import wraps
    import time

    def timing(func):
        @wraps(func)                   # copy func's metadata onto wrapper
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} took {elapsed:.4f} s")
            return result
        return wrapper

    @timing
    def slow_sum(n):
        """Sum numbers from 1 to n."""
        return sum(range(1, n + 1))

    print(slow_sum.__name__)   # slow_sum (it would be 'wrapper' without @wraps)
    print(slow_sum.__doc__)    # Sum numbers from 1 to n.
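3.5 cached_property (Python 3.8+)

cached_property caches a method's result as an instance attribute: the value is computed on first access, and later accesses return the cached value directly:

    from functools import cached_property
    import json

    class Dataset:
        def __init__(self, filepath):
            self.filepath = filepath

        @cached_property
        def data(self):
            """Read and parse only once; the result is cached in self.__dict__."""
            print("Loading and parsing data...")
            with open(self.filepath) as f:
                return json.load(f)

    ds = Dataset('data.json')
    print(ds.data)   # first access: loads the file
    print(ds.data)   # second access: served from the cache, no "Loading" message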
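The difference between cached_property and stacking property on top of lru_cache is that cached_property writes the value straight into the instance's __dict__. The cache is therefore scoped to the instance, and it can be overwritten or deleted like any other instance attribute.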
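The following sketch illustrates that behavior: the cached value lives in the instance's __dict__, it does not track later changes to other attributes, and deleting the attribute forces a recomputation. The `Circle` class and its `computations` counter are hypothetical and exist only for this demonstration.

```python
from functools import cached_property

class Circle:
    def __init__(self, radius):
        self.radius = radius
        self.computations = 0      # counts how often `area` is really computed

    @cached_property
    def area(self):
        self.computations += 1
        return 3.14159 * self.radius ** 2

c = Circle(1.0)
first = c.area                     # computed, then stored in c.__dict__['area']
second = c.area                    # read from c.__dict__, no recomputation
cached = 'area' in c.__dict__

c.radius = 2.0
stale = c.area                     # still the old value: the cache ignores radius changes
del c.area                         # drop the cached entry
fresh = c.area                     # recomputed with the new radius
print(first, stale, fresh, c.computations)
```

Deleting the attribute is the usual way to invalidate the cache when the inputs it depends on have changed.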
3.6 singledispatch: single-dispatch generic functions

    from functools import singledispatch

    @singledispatch
    def serialize(obj):
        """Fallback for unsupported types."""
        raise TypeError(f"Unsupported type: {type(obj)}")

    @serialize.register(int)
    def _(obj):
        return str(obj)

    @serialize.register(float)
    def _(obj):
        return f"{obj:.6f}"

    @serialize.register(list)
    def _(obj):
        return '[' + ', '.join(serialize(x) for x in obj) + ']'

    @serialize.register(dict)
    def _(obj):
        return '{' + ', '.join(f'{k}: {serialize(v)}' for k, v in obj.items()) + '}'

    print(serialize(42))            # 42
    print(serialize([1, 2.5, 3]))   # [1, 2.500000, 3]
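4. pathlib: Object-Oriented Path Handling

pathlib, introduced in Python 3.4, provides the Path class as a replacement for the string manipulation of os.path. Path objects overload the / operator for joining paths and are cross-platform by design (Windows uses \, Unix uses /).

4.1 Basic operations

    from pathlib import Path

    p = Path('/home/user/Documents/report.pdf')
    print(p.name)       # report.pdf
    print(p.stem)       # report
    print(p.suffix)     # .pdf
    print(p.suffixes)   # ['.pdf']
    print(p.parent)     # /home/user/Documents
    print(p.parts)      # ('/', 'home', 'user', 'Documents', 'report.pdf')
    print(p.anchor)     # /
    print(p.as_uri())   # file:///home/user/Documents/report.pdf

    # Joining with /
    data_dir = Path('/data')
    train_dir = data_dir / 'train' / 'images'
    print(train_dir)    # /data/train/images

    print(p.with_suffix('.txt'))    # /home/user/Documents/report.txt
    print(p.with_stem('summary'))   # /home/user/Documents/summary.pdf (Python 3.9+)

    cwd = Path.cwd()    # current working directory
    home = Path.home()  # the user's home directory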
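4.2 Working with files and directories

    import shutil
    from datetime import datetime
    from pathlib import Path

    p = Path('some_file.txt')
    p.write_text('Hello, World!', encoding='utf-8')   # create the file so the reads below succeed

    print(p.exists())
    print(p.is_file())
    print(p.is_dir())
    print(p.is_symlink())

    if p.exists():
        stat = p.stat()
        print(stat.st_size)                            # size in bytes
        print(stat.st_mtime)                           # modification time as a timestamp
        print(datetime.fromtimestamp(stat.st_mtime))

    dir_path = Path('a/b/c')
    dir_path.mkdir(parents=True, exist_ok=True)        # like mkdir -p

    # Reading and writing
    content = p.read_text(encoding='utf-8')
    lines = p.read_text(encoding='utf-8').splitlines()
    data = p.read_bytes()

    # Traversal
    for item in Path('.').iterdir():                   # direct children only
        print(item)
    for py_file in Path('.').rglob('*.py'):            # recursive
        print(py_file)
    for py_file in Path('.').glob('**/*.py'):          # equivalent to the rglob call above
        print(py_file)
    for f in Path('.').rglob('data/**/*.csv'):
        print(f)

    # Deletion
    p.unlink()                      # delete the file; raises FileNotFoundError if it is missing
    p.unlink(missing_ok=True)       # Python 3.8+: no error if it is already gone
    dir_path.rmdir()                # removes 'c' only, and only if it is empty
    shutil.rmtree('a')              # removes a whole tree, empty or not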
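4.3 os.path vs pathlib at a glance

| Operation | os.path | pathlib |
| --- | --- | --- |
| Join | os.path.join('a', 'b') | Path('a') / 'b' |
| Current directory | os.getcwd() | Path.cwd() |
| Exists | os.path.exists(p) | Path(p).exists() |
| Is a file | os.path.isfile(p) | Path(p).is_file() |
| Is a directory | os.path.isdir(p) | Path(p).is_dir() |
| File name | os.path.basename(p) | Path(p).name |
| Parent directory | os.path.dirname(p) | Path(p).parent |
| Extension | os.path.splitext(p)[1] | Path(p).suffix |
| Absolute path | os.path.abspath(p) | Path(p).resolve() |

Path.resolve() additionally follows symlinks, so the last pair is not strictly identical.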
A practical traversal example:

    from pathlib import Path
    from collections import Counter

    # Count the files under /usr/lib by extension
    c = Counter(p.suffix for p in Path('/usr/lib').rglob('*') if p.is_file())
    print(c.most_common(5))
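Because /usr/lib differs from machine to machine, a self-contained variant can be easier to experiment with. The sketch below builds a throwaway directory tree with tempfile and applies the same rglob plus Counter pattern; the file names and the "<none>" label for files without an extension are made up for illustration.

```python
import tempfile
from collections import Counter
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src" / "pkg").mkdir(parents=True)
    # Hypothetical project layout, created only for this demonstration.
    for rel in ["src/a.py", "src/pkg/b.py", "src/pkg/notes.txt", "README"]:
        (root / rel).write_text("x", encoding="utf-8")

    # Same pattern as above: walk recursively, keep files, count suffixes.
    suffix_counts = Counter(
        p.suffix or "<none>" for p in root.rglob("*") if p.is_file()
    )

print(suffix_counts.most_common())
```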
5. concurrent.futures: Concurrent Execution

concurrent.futures provides a high-level interface for asynchronous execution. ThreadPoolExecutor (a thread pool) and ProcessPoolExecutor (a process pool) share the same API, which unifies the programming model for multithreading and multiprocessing.

5.1 ThreadPoolExecutor

Thread pools suit I/O-bound tasks (network requests, file reads and writes, database operations, and so on). Because of the GIL, CPU-bound tasks cannot be sped up with threads:

    from concurrent.futures import ThreadPoolExecutor, as_completed
    import time

    def download(url):
        """Simulate an I/O-bound task."""
        time.sleep(0.5)
        return f"Downloaded {url}"

    urls = ['http://example.com/a', 'http://example.com/b',
            'http://example.com/c', 'http://example.com/d']

    # Option 1: map returns results in input order
    with ThreadPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(download, urls))
    for url, result in zip(urls, results):
        print(f"{url} → {result}")

    # Option 2: submit + as_completed yields futures as they finish
    with ThreadPoolExecutor(max_workers=4) as executor:
        future_to_url = {executor.submit(download, url): url for url in urls}
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            try:
                result = future.result()
                print(f"{url} → {result}")
            except Exception as exc:
                print(f"{url} raised an exception: {exc}")

    # Option 3: submit and collect results in submission order
    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(download, url) for url in urls]
        for future, url in zip(futures, urls):
            print(f"{url} → {future.result()}")
5.2 ProcessPoolExecutor

Process pools suit CPU-bound tasks (numerical computation, image processing, and so on). Each process has its own Python interpreter and its own GIL, so the work truly runs in parallel:

    from concurrent.futures import ProcessPoolExecutor
    import math

    def cpu_intensive(n):
        """Count the primes below n (CPU-bound)."""
        count = 0
        for num in range(2, n):
            is_prime = True
            for i in range(2, int(math.sqrt(num)) + 1):
                if num % i == 0:
                    is_prime = False
                    break
            if is_prime:
                count += 1
        return count

    # The guard is required on platforms that spawn worker processes (Windows, macOS)
    if __name__ == '__main__':
        numbers = [10000, 15000, 20000, 25000, 30000, 35000]
        with ProcessPoolExecutor(max_workers=4) as executor:
            results = executor.map(cpu_intensive, numbers)
            for n, count in zip(numbers, results):
                print(f"n={n}: {count} primes")
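5.3 Impact of the GIL

In CPython, the global interpreter lock (GIL) ensures that only one thread executes Python bytecode at any given moment. In practice:

- I/O operations release the GIL, so multithreading can give a significant speedup for I/O-bound workloads.
- CPU-bound pure-Python code holds the GIL, so threads cannot run it in parallel and may even be slower because of context switching. Use ProcessPoolExecutor instead.
- C extensions such as NumPy and TensorFlow can release the GIL while running native code, so multithreaded NumPy workloads can genuinely benefit.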
    import time
    import threading

    def countdown(n):
        while n > 0:
            n -= 1

    # Serial: two calls, one after the other
    start = time.perf_counter()
    countdown(5_000_000)
    countdown(5_000_000)
    print(f"Serial: {time.perf_counter() - start:.3f} s")

    # Two threads: with the GIL this is no faster, and may be slower
    start = time.perf_counter()
    t1 = threading.Thread(target=countdown, args=(5_000_000,))
    t2 = threading.Thread(target=countdown, args=(5_000_000,))
    t1.start(); t2.start()
    t1.join(); t2.join()
    print(f"Threaded: {time.perf_counter() - start:.3f} s")
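5.4 Handling timeouts and cancellation

    from concurrent.futures import ThreadPoolExecutor, TimeoutError
    import time

    def potentially_slow(n):
        time.sleep(n)
        return f"Done after {n} s"

    with ThreadPoolExecutor() as executor:
        future = executor.submit(potentially_slow, 10)
        try:
            result = future.result(timeout=2)
        except TimeoutError:
            print("Timed out; attempting to cancel")
            # cancel() only succeeds for tasks that have not started yet. This one is
            # already running, so it returns False and the with-block still waits for it.
            future.cancel()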
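The distinction between a queued and a running task can be checked directly. The sketch below uses a single-worker pool so that the second submission is guaranteed to stay queued; the `started` and `release` events and the `blocker` function are hypothetical helpers that make the timing deterministic.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

started = threading.Event()
release = threading.Event()

def blocker():
    started.set()                 # signal that this task is now running
    release.wait(timeout=5)       # simulate a long-running task
    return "finished"

with ThreadPoolExecutor(max_workers=1) as executor:
    running = executor.submit(blocker)     # occupies the only worker
    started.wait(timeout=5)                # make sure it has really started
    queued = executor.submit(blocker)      # must wait in the queue

    cancelled_queued = queued.cancel()     # True: the task never started
    cancelled_running = running.cancel()   # False: a running task cannot be cancelled
    release.set()                          # let the running task finish
    outcome = running.result(timeout=5)

print(cancelled_queued, cancelled_running, outcome)  # True False finished
```

To stop work that is already running, the task itself has to cooperate, for example by checking a threading.Event periodically.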
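6. subprocess: Process Management

The subprocess module replaces the older os.system and os.spawn* functions and provides full control over creating child processes, managing their I/O, and inspecting their status.

6.1 run: the recommended high-level API

Since Python 3.5, subprocess.run is the recommended way to run a child process:

    import os
    import subprocess

    # Capture stdout and stderr as text
    result = subprocess.run(['ls', '-la', '/tmp'], capture_output=True, text=True)
    print(result.returncode)
    print(result.stdout)
    print(result.stderr)

    # Check the return code by hand
    result = subprocess.run(['git', 'status'], capture_output=True, text=True)
    if result.returncode == 0:
        print(result.stdout)
    else:
        print(f"Failed: {result.stderr}")

    # check=True raises CalledProcessError on a non-zero exit status
    try:
        subprocess.run(['false'], check=True)
    except subprocess.CalledProcessError as e:
        print(f"Command failed with return code {e.returncode}")

    # timeout raises TimeoutExpired and kills the child
    try:
        subprocess.run(['sleep', '10'], timeout=2)
    except subprocess.TimeoutExpired:
        print("Command timed out")

    # Feed data to the child's stdin
    result = subprocess.run(['grep', 'error'],
                            input='line1: ok\nline2: error: file not found\nline3: ok',
                            capture_output=True, text=True)
    print(result.stdout)            # line2: error: file not found

    # Custom environment. The override goes last so that it wins over os.environ.
    # printenv is used because, without a shell, '$HOME' would not be expanded.
    result = subprocess.run(['printenv', 'HOME'], capture_output=True, text=True,
                            env={**os.environ, 'HOME': '/custom/home'})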
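6.2 Popen: the low-level API

Use Popen when you need to interact with the child process, for example to read its output as a stream:

    import subprocess

    # Stream output line by line. stderr is merged into stdout so that an unread
    # stderr pipe cannot fill up and block the child.
    proc = subprocess.Popen(['ping', '-c', '5', 'google.com'],
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    for line in proc.stdout:
        print(f"[live] {line.strip()}")
    proc.wait()
    print(f"Exit code: {proc.returncode}")

    # Pipeline, equivalent to: ls -la | grep .py
    p1 = subprocess.Popen(['ls', '-la'], stdout=subprocess.PIPE, text=True)
    p2 = subprocess.Popen(['grep', '.py'], stdin=p1.stdout, stdout=subprocess.PIPE, text=True)
    p1.stdout.close()               # lets p1 receive SIGPIPE if p2 exits early
    output, _ = p2.communicate()
    print(output)

    # Start a background process and discard its output
    proc = subprocess.Popen(['python', 'long_running_script.py'],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    print(f"Child PID: {proc.pid}")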
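6.3 Security considerations

Never build a shell=True command string from user input; doing so opens a command-injection vulnerability:

    import subprocess

    user_file = "; rm -rf /"

    # DANGEROUS: the shell would run `ls` and then `rm -rf /`. Left commented out on purpose.
    # subprocess.run(f"ls {user_file}", shell=True)

    # Safe: the list form passes user_file as a single literal argument, with no shell involved
    subprocess.run(['ls', user_file])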
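A harmless way to convince yourself that the list form does not interpret shell metacharacters is to have the child process echo back the argument it received. In this sketch the child is the current Python interpreter (sys.executable), chosen only so that the example does not depend on any particular system command; the `hostile` string is a made-up stand-in for untrusted input.

```python
import subprocess
import sys

hostile = "; echo INJECTED"     # hypothetical untrusted input

# With a list and no shell, the whole string arrives as one literal argv entry.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", hostile],
    capture_output=True,
    text=True,
    check=True,
)
received = result.stdout.strip()
print(repr(received))           # '; echo INJECTED'
```

If a shell really is required, shlex.quote from the standard library can escape individual arguments, although avoiding shell=True altogether remains the simpler rule.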
7. json / csv / sqlite3: Data Serialization and Persistence

7.1 json

    import json
    from datetime import datetime, date

    data = {'name': 'Alice', 'age': 30, 'skills': ['Python', 'C++'], 'active': True}

    # Python object -> JSON string (ensure_ascii=False keeps non-ASCII text readable)
    json_str = json.dumps(data, indent=2, ensure_ascii=False)
    print(json_str)

    # JSON string -> Python object
    parsed = json.loads(json_str)
    print(parsed['name'])           # Alice

    # Files
    with open('data.json', 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    with open('data.json', 'r', encoding='utf-8') as f:
        loaded = json.load(f)

    # Custom encoder for types json does not know about
    class CustomEncoder(json.JSONEncoder):
        def default(self, obj):
            if isinstance(obj, (datetime, date)):
                return obj.isoformat()
            if hasattr(obj, '__dict__'):
                return obj.__dict__
            return super().default(obj)

    data = {'created_at': datetime.now(), 'event': 'start'}
    json_str = json.dumps(data, cls=CustomEncoder)
    print(json_str)

    # Custom decoding with object_hook
    def custom_decoder(dct):
        """Convert ISO date strings back to datetime."""
        for key, value in dct.items():
            if isinstance(value, str) and 'T' in value:
                try:
                    dct[key] = datetime.fromisoformat(value)
                except (ValueError, TypeError):
                    pass            # not a date after all; leave the string alone
        return dct

    parsed = json.loads(json_str, object_hook=custom_decoder)
    print(type(parsed['created_at']))   # <class 'datetime.datetime'>
7.2 csv

    import csv

    # Write rows as lists (newline='' is what the csv module expects for file objects)
    rows = [
        ['name', 'age', 'city'],
        ['Alice', '28', 'Beijing'],
        ['Bob', '32', 'Shanghai'],
    ]
    with open('output.csv', 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerows(rows)

    # Write rows as dictionaries
    with open('output.csv', 'w', newline='', encoding='utf-8') as f:
        fieldnames = ['name', 'age', 'city']
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerow({'name': 'Alice', 'age': '28', 'city': 'Beijing'})

    # Read rows as lists
    with open('data.csv', 'r', newline='', encoding='utf-8') as f:
        reader = csv.reader(f)
        header = next(reader)       # first row is the header
        for row in reader:
            print(row)

    # Read rows as dictionaries keyed by the header
    with open('data.csv', 'r', newline='', encoding='utf-8') as f:
        reader = csv.DictReader(f)
        for row in reader:
            print(row['name'], row['age'])

    # Other dialect options, passed the same way to reader or writer:
    #   csv.reader(f, delimiter='\t')                                # tab-separated values
    #   csv.reader(f, quotechar='"', quoting=csv.QUOTE_MINIMAL)      # quoting rules
7.3 sqlite3

SQLite is a zero-configuration, self-contained embedded relational database, and the sqlite3 module ships with the Python standard library. It suits single-user scenarios, prototyping, desktop applications, and mobile apps:

    import sqlite3

    conn = sqlite3.connect('example.db')
    cursor = conn.cursor()

    cursor.execute('''
        CREATE TABLE IF NOT EXISTS users (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT NOT NULL,
            email TEXT UNIQUE,
            created_at TEXT DEFAULT (datetime('now'))
        )
    ''')

    # Always use ? placeholders; never format values into the SQL string
    cursor.execute('INSERT INTO users (name, email) VALUES (?, ?)', ('Alice', 'alice@example.com'))
    cursor.execute('INSERT INTO users (name, email) VALUES (?, ?)', ('Bob', 'bob@example.com'))

    # Bulk insert
    users = [('Charlie', 'charlie@example.com'), ('Diana', 'diana@example.com')]
    cursor.executemany('INSERT INTO users (name, email) VALUES (?, ?)', users)
    conn.commit()

    # Queries
    cursor.execute('SELECT id, name, email FROM users WHERE name LIKE ?', ('A%',))
    for row in cursor.fetchall():
        print(row)

    cursor.execute('SELECT * FROM users WHERE id = ?', (1,))
    user = cursor.fetchone()
    print(user)

    # Row objects allow access by column name
    conn.row_factory = sqlite3.Row
    cursor = conn.cursor()
    cursor.execute('SELECT * FROM users')
    for row in cursor:
        print(dict(row))

    # Explicit transaction
    conn.execute('BEGIN')
    try:
        cursor.execute('UPDATE users SET email = ? WHERE id = ?', ('new@example.com', 1))
        conn.commit()
    except Exception:
        conn.rollback()
        raise

    # The connection as a context manager: commit on success, rollback on exception
    with conn:
        conn.execute('DELETE FROM users WHERE id = ?', (3,))

    cursor.close()
    conn.close()
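The rollback behavior of `with conn:` is easiest to see with an operation that fails halfway. The sketch below uses an in-memory database so that it leaves no file behind; the `accounts` table, its CHECK constraint, and the `transfer` helper are all hypothetical and exist only for this demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("alice", 100), ("bob", 50)])

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True on success."""
    try:
        with conn:   # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False   # the CHECK constraint rejected a negative balance

ok = transfer(conn, "alice", "bob", 30)
failed = transfer(conn, "alice", "bob", 500)   # the credit to bob is rolled back too
balances = dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
conn.close()
print(ok, failed, balances)   # True False {'alice': 70, 'bob': 80}
```

Note that `with conn:` manages the transaction only; it does not close the connection, which is why conn.close() is still called explicitly.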
8. logging: The Logging System

Python's logging module is a complete logging framework built from four components: Logger, Handler, Formatter, and Filter.

8.1 Basic usage

    import logging

    logging.basicConfig(
        level=logging.INFO,
        format='%(asctime)s [%(levelname)s] %(name)s: %(message)s',
        datefmt='%Y-%m-%d %H:%M:%S',
        handlers=[
            logging.FileHandler('app.log'),   # write to a file
            logging.StreamHandler(),          # and to the console (stderr)
        ]
    )

    logger = logging.getLogger(__name__)
    logger.debug('Debug details')             # suppressed: below the INFO threshold
    logger.info('Program started')
    logger.warning('A warning')
    logger.error('An error')
    logger.critical('A critical failure')
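8.2 Log levels

Log levels, from lowest to highest:

| Level | Value | Purpose |
| --- | --- | --- |
| DEBUG | 10 | Detailed diagnostic information |
| INFO | 20 | Confirmation that things are working normally |
| WARNING | 30 | A hint about a potential problem |
| ERROR | 40 | A serious problem prevented some function from completing |
| CRITICAL | 50 | The program itself may be unable to continue running |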
    import logging

    logger = logging.getLogger('myapp.database')
    logger.setLevel(logging.DEBUG)            # the logger lets everything through

    # Each handler applies its own threshold
    file_handler = logging.FileHandler('database.log')
    file_handler.setLevel(logging.WARNING)    # the file gets WARNING and above
    console_handler = logging.StreamHandler()
    console_handler.setLevel(logging.DEBUG)   # the console gets everything

    # Each handler can use its own format
    detailed_fmt = logging.Formatter(
        '%(asctime)s [%(levelname)-8s] %(name)s:%(lineno)d - %(message)s',
        datefmt='%Y-%m-%d %H:%M:%S'
    )
    simple_fmt = logging.Formatter('[%(levelname)-8s] %(message)s')
    file_handler.setFormatter(detailed_fmt)
    console_handler.setFormatter(simple_fmt)

    logger.addHandler(file_handler)
    logger.addHandler(console_handler)
    logger.propagate = False                  # do not pass records on to ancestor loggers

    logger.debug('Connection pool size: 10')           # console only
    logger.error('Database connection failed: timeout')   # console and file
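The interplay of the logger level and the handler level can be observed without touching the file system by pointing a StreamHandler at an in-memory buffer. This is a minimal sketch; the logger name "demo.levels" is arbitrary.

```python
import io
import logging

buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setLevel(logging.WARNING)               # handler threshold
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))

demo_logger = logging.getLogger("demo.levels")  # hypothetical logger name
demo_logger.setLevel(logging.INFO)              # logger threshold
demo_logger.addHandler(handler)
demo_logger.propagate = False                   # keep records away from the root logger

demo_logger.debug("dropped by the logger")      # below INFO, never reaches the handler
demo_logger.info("dropped by the handler")      # passes the logger, below WARNING
demo_logger.error("written")                    # passes both thresholds

lines = buffer.getvalue().splitlines()
print(lines)  # ['ERROR:written']
```

A record has to clear the logger's threshold first and then each handler's threshold separately, which is why one logger can feed a verbose console and a quieter file at the same time.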
8.4 Dictionary configuration (dictConfig)

For production projects, dictionary-based configuration is recommended. The dictionary can be stored as a JSON or YAML file (loading YAML requires a third-party parser such as PyYAML):

    import logging.config

    LOGGING_CONFIG = {
        'version': 1,
        'disable_existing_loggers': False,
        'formatters': {
            'verbose': {
                'format': '%(asctime)s [%(levelname)s] %(name)s:%(lineno)d: %(message)s',
                'datefmt': '%Y-%m-%d %H:%M:%S',
            },
            'simple': {
                'format': '[%(levelname)s] %(message)s',
            },
        },
        'handlers': {
            'console': {
                'class': 'logging.StreamHandler',
                'level': 'DEBUG',
                'formatter': 'simple',
            },
            'file': {
                'class': 'logging.handlers.RotatingFileHandler',
                'level': 'INFO',
                'formatter': 'verbose',
                'filename': 'app.log',
                'maxBytes': 10 * 1024 * 1024,   # rotate at 10 MB
                'backupCount': 5,               # keep five old files
            },
        },
        'loggers': {
            'myapp': {
                'handlers': ['console', 'file'],
                'level': 'DEBUG',
                'propagate': False,
            },
            'myapp.database': {
                'level': 'WARNING',             # inherits the handlers of 'myapp'
            },
        },
        'root': {
            'handlers': ['console'],
            'level': 'WARNING',
        },
    }

    logging.config.dictConfig(LOGGING_CONFIG)
    logger = logging.getLogger('myapp')
    logger.info('Application started')
9. argparse: Command-Line Argument Parsing

argparse gives Python scripts a professional command-line interface, including positional arguments, optional arguments, subcommands, type validation, and help text.

9.1 Basic usage

    import argparse

    parser = argparse.ArgumentParser(
        description='Process image files and generate thumbnails',
        epilog='Example: python thumbnail.py input.jpg -s 200 --quality 85'
    )

    # Positional argument (required)
    parser.add_argument('input_file', help='path to the input image')

    # Optional arguments
    parser.add_argument('-o', '--output', default='output.jpg',
                        help='output file path (default: output.jpg)')
    parser.add_argument('-s', '--size', type=int, default=128,
                        help='thumbnail size in pixels')
    parser.add_argument('-q', '--quality', type=int, default=85,
                        choices=range(1, 101), metavar='1-100',
                        help='JPEG quality (1-100)')

    # Boolean flag
    parser.add_argument('--grayscale', action='store_true', help='convert to grayscale')

    # Counted flag
    parser.add_argument('-v', '--verbose', action='count', default=0,
                        help='verbose output (-v verbose, -vv more verbose)')

    args = parser.parse_args()
    print(f"Input: {args.input_file}")
    print(f"Output: {args.output}, size: {args.size}px, quality: {args.quality}")
    print(f"Grayscale: {args.grayscale}")
    print(f"Verbosity: {args.verbose}")
9.2 Subcommands

A nested command structure in the style of git commit and git push:

    import argparse

    parser = argparse.ArgumentParser(description='Data management tool')
    subparsers = parser.add_subparsers(dest='command', required=True)

    # "import" subcommand
    import_parser = subparsers.add_parser('import', help='import data')
    import_parser.add_argument('source', help='path to the data source')
    import_parser.add_argument('--format', choices=['csv', 'json', 'xml'], default='csv')

    # "export" subcommand
    export_parser = subparsers.add_parser('export', help='export data')
    export_parser.add_argument('target', help='export destination path')
    export_parser.add_argument('--compress', action='store_true', help='compress the output')

    # "stats" subcommand
    stats_parser = subparsers.add_parser('stats', help='show statistics')
    stats_parser.add_argument('--columns', nargs='+', help='only include these columns')

    args = parser.parse_args()
    if args.command == 'import':
        print(f"Importing {args.source} as {args.format}")
    elif args.command == 'export':
        print(f"Exporting to {args.target}, compress={'yes' if args.compress else 'no'}")
    elif args.command == 'stats':
        cols = args.columns or ['all']
        print(f"Columns: {cols}")
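parse_args also accepts an explicit list of strings instead of reading sys.argv, which makes a parser easy to exercise from a test or an interactive session. The sketch below rebuilds a trimmed-down version of the parser above inside a hypothetical `build_parser` function; the program name "datatool" is made up.

```python
import argparse

def build_parser():
    """Build a reduced version of the subcommand parser (illustrative only)."""
    parser = argparse.ArgumentParser(prog="datatool")   # hypothetical program name
    sub = parser.add_subparsers(dest="command", required=True)

    imp = sub.add_parser("import")
    imp.add_argument("source")
    imp.add_argument("--format", choices=["csv", "json", "xml"], default="csv")

    stats = sub.add_parser("stats")
    stats.add_argument("--columns", nargs="+")
    return parser

parser = build_parser()
# Passing a list bypasses sys.argv entirely.
args1 = parser.parse_args(["import", "data.json", "--format", "json"])
args2 = parser.parse_args(["stats", "--columns", "age", "city"])
args3 = parser.parse_args(["import", "data.csv"])

print(args1.command, args1.source, args1.format)   # import data.json json
print(args2.columns)                               # ['age', 'city']
print(args3.format)                                # csv
```

Keeping parser construction in a function separate from the call to parse_args is a common way to make command-line tools testable.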
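10. dataclasses: Less Boilerplate

The dataclass decorator (Python 3.7+) generates __init__, __repr__, __eq__, and other methods automatically, removing a lot of repetitive code. The main differences from namedtuple are that instances are mutable by default, fields can use default factories, and the result is an ordinary class that is not a tuple and supports regular inheritance.

10.1 Basic usage

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class User:
        name: str
        age: int
        email: str = ''
        tags: Optional[List[str]] = None

        def __post_init__(self):
            """Called after __init__; used for validation and non-trivial defaults."""
            if self.tags is None:
                self.tags = []

    u1 = User('Alice', 28, 'alice@example.com')
    u2 = User('Alice', 28, 'alice@example.com')
    print(u1)          # User(name='Alice', age=28, email='alice@example.com', tags=[])
    print(u1 == u2)    # True (compared field by field)

    # frozen=True makes instances immutable (and hashable)
    @dataclass(frozen=True)
    class Point:
        x: float
        y: float

    p = Point(1.0, 2.0)
    # p.x = 3.0 would raise dataclasses.FrozenInstanceError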
10.2 Configuring fields with field()

    from dataclasses import dataclass, field
    import datetime

    @dataclass
    class Task:
        title: str
        priority: int = field(default=1, metadata={'min': 0, 'max': 10})   # metadata is free-form
        tags: list = field(default_factory=list)    # a fresh list for every instance
        created_at: str = field(init=False)         # not an __init__ parameter

        def __post_init__(self):
            self.created_at = datetime.datetime.now().isoformat()
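The reason default_factory exists is that a mutable default would otherwise be shared between instances; dataclass refuses such a default outright. The sketch below shows both sides. `Basket` and `Broken` are hypothetical classes used only for illustration.

```python
from dataclasses import asdict, dataclass, field

@dataclass
class Basket:
    owner: str
    items: list = field(default_factory=list)   # a new list per instance

a = Basket("alice")
b = Basket("bob")
a.items.append("apple")
print(b.items)      # []  (b has its own list)
print(asdict(a))    # {'owner': 'alice', 'items': ['apple']}

# A bare mutable default is rejected when the class is created.
try:
    @dataclass
    class Broken:
        items: list = []
    rejected = False
except ValueError:
    rejected = True
print(rejected)     # True
```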
11. typing: Type Hints

11.1 Basic annotations

    from typing import List, Dict, Set, Tuple, Optional, Union, Any, Callable

    def greet(name: str) -> str:
        return f"Hello, {name}"

    def process(items: List[int]) -> Dict[str, int]:
        return {"count": len(items), "sum": sum(items)}

    # Optional[X] means "X or None"
    def find_user(user_id: int) -> Optional[Dict[str, str]]:
        if user_id == 1:
            return {"name": "Alice"}
        return None

    # Union: any one of several types
    def parse(value: str) -> Union[int, float, str]:
        try:
            return int(value)
        except ValueError:
            try:
                return float(value)
            except ValueError:
                return value

    # Callable[[argument types], return type]
    def apply(func: Callable[[int, int], int], a: int, b: int) -> int:
        return func(a, b)

    print(apply(lambda x, y: x + y, 3, 4))   # 7
11.2 Generics: TypeVar, Generic, Protocol

    from typing import Generic, List, TypeVar

    T = TypeVar('T')

    def first(items: List[T]) -> T:
        return items[0]

    print(first([1, 2, 3]))     # 1    (T is inferred as int)
    print(first(['a', 'b']))    # a    (T is inferred as str)

    # Constrained TypeVar: must be exactly int or float
    Number = TypeVar('Number', int, float)

    def add(a: Number, b: Number) -> Number:
        return a + b

    # Generic class
    K = TypeVar('K')
    V = TypeVar('V')

    class Pair(Generic[K, V]):
        def __init__(self, key: K, value: V):
            self.key = key
            self.value = value

    pair: Pair[str, int] = Pair("age", 28)
11.3 Protocol: structural subtyping

Protocol defines an interface that classes can satisfy without inheriting from it explicitly (similar to interfaces in Go):

    from typing import Protocol

    class SupportsClose(Protocol):
        def close(self) -> None: ...

    class FileReader:
        def close(self) -> None:
            print("Closing file")

    class SocketConnection:
        def close(self) -> None:
            print("Closing socket")

    # Any object with a matching close() method is accepted by the type checker
    def cleanup(resource: SupportsClose) -> None:
        resource.close()

    cleanup(FileReader())          # Closing file
    cleanup(SocketConnection())    # Closing socket
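By default a Protocol is checked only by static type checkers. If a runtime check is also wanted, typing.runtime_checkable enables isinstance against the protocol. The sketch below redefines a minimal `SupportsClose` for that purpose; `NotCloseable` is a hypothetical counterexample.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class SupportsClose(Protocol):
    def close(self) -> None: ...

class FileReader:
    def close(self) -> None:
        pass

class NotCloseable:
    pass

# isinstance only checks that the method exists, not its signature.
checks = (
    isinstance(FileReader(), SupportsClose),
    isinstance(NotCloseable(), SupportsClose),
)
print(checks)  # (True, False)
```

The runtime check is deliberately shallow, so it complements static checking rather than replacing it.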
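12. Quick Reference for Common Utility Modules

12.1 os / os.path

    import os

    os.getcwd()                           # current working directory
    os.chdir('/path')                     # change directory
    os.listdir('.')                       # list directory entries
    os.makedirs('a/b/c', exist_ok=True)   # create nested directories
    os.rename('old', 'new')               # rename a file or directory
    os.remove('file.txt')                 # delete a file
    os.rmdir('empty_dir')                 # delete an empty directory
    os.environ                            # environment variables (mapping)
    os.environ.get('HOME')                # read a variable, None if unset

    os.path.join('a', 'b', 'c')           # join path components
    os.path.exists('path')
    os.path.isfile('p')
    os.path.isdir('p')
    os.path.abspath('p')                  # absolute path
    os.path.basename('/a/b/c.txt')        # 'c.txt'
    os.path.dirname('/a/b/c.txt')         # '/a/b'
    os.path.split('/a/b/c.txt')           # ('/a/b', 'c.txt')
    os.path.splitext('f.txt')             # ('f', '.txt')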
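12.2 shutil: high-level file operations

    import shutil

    shutil.copy('src', 'dst')                       # copy file contents and permissions
    shutil.copy2('src', 'dst')                      # copy and also try to preserve metadata
    shutil.copytree('src_dir', 'dst_dir')           # copy a directory recursively
    shutil.rmtree('dir')                            # delete a directory recursively
    shutil.move('src', 'dst')                       # move a file or directory
    shutil.disk_usage('/')                          # (total, used, free) in bytes
    shutil.make_archive('backup', 'zip', 'src_dir') # creates backup.zip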
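12.3 tempfile: temporary files

    import tempfile

    # Temporary file, deleted automatically when it is closed
    with tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=True) as f:
        f.write('Some data')
        print(f.name)       # path of the temporary file

    # Temporary directory, removed together with its contents on exit
    with tempfile.TemporaryDirectory() as tmpdir:
        print(tmpdir)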
12.4 hashlib: hash digests

    import hashlib

    md5 = hashlib.md5(b'hello').hexdigest()      # fine for checksums, not for security
    sha = hashlib.sha256(b'hello').hexdigest()

    # Hash a large file in chunks (the := operator needs Python 3.8+)
    hasher = hashlib.sha256()
    with open('large_file.bin', 'rb') as f:
        while chunk := f.read(8192):
            hasher.update(chunk)
    digest = hasher.hexdigest()

    print(hashlib.algorithms_available)          # algorithms supported on this system
12.5 base64

    import base64

    encoded = base64.b64encode(b'hello world')
    print(encoded.decode())     # aGVsbG8gd29ybGQ=
    decoded = base64.b64decode(encoded)
    print(decoded)              # b'hello world'

    # URL-safe variant: uses - and _ instead of + and /
    encoded_url = base64.urlsafe_b64encode(b'\xfb\xff')
    print(encoded_url)          # b'-_8='
12.6 enum: enumerations

    from enum import Enum, IntEnum, auto

    class Color(Enum):
        RED = 1
        GREEN = 2
        BLUE = 3

    print(Color.RED)          # Color.RED
    print(Color.RED.name)     # RED
    print(Color.RED.value)    # 1
    print(Color(2))           # Color.GREEN (lookup by value)

    # IntEnum members compare equal to plain integers; auto() numbers them from 1
    class Status(IntEnum):
        PENDING = auto()
        RUNNING = auto()
        SUCCESS = auto()

    if Status.SUCCESS == 3:
        print("Success")
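13. Summary

The Python standard library covers the vast majority of everyday development needs. Knowing these modules well lets you:

- Reduce third-party dependencies: in most situations the combination of collections, itertools, and functools is already powerful enough.
- Write idiomatic Python: pathlib instead of string-based path joining, dataclass instead of hand-written __init__, lru_cache instead of a hand-rolled cache.
- Handle concurrency correctly: understand when to choose ThreadPoolExecutor and when ProcessPoolExecutor (I/O-bound vs CPU-bound).
- Call subprocesses safely: use the list-argument form of subprocess.run and avoid the injection risk of shell=True.
- Structure your logging: use the Logger/Handler/Formatter hierarchy together with dictConfig for flexible logging in production.
- Develop with types: combine the typing module with mypy or pyright for static type checking and fewer runtime errors.

The standard library is the foundation of the Python ecosystem, and also the best entry point for understanding the Python philosophy (simple is better than complex, explicit is better than implicit).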