Commit ea6709d0 by 前钰

Upload New File

parent 9259429f
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
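"This assignment is based on a spam email classification task. You are required to extract text features and use the naive Bayes algorithm to identify spam (either by calling an existing library or by implementing it yourself)."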
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
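"### Task Overview\n",
"Email is an important Internet service that is widely used in study, work, and daily life. However, inboxes are often flooded with all kinds of spam. Statistics suggest that tens of billions to nearly a hundred billion spam emails are generated on the Internet every day. Spam filtering is therefore an essential feature for email service providers. The naive Bayes algorithm has long performed very well on spam detection, and many systems still use it as their baseline spam detection algorithm.\n",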
"\n",
"\n",
"\n",
"This assignment\n",
"\n",
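"Basic assignment (80 points):\n",
"1. Extract text features from the email body;\n",
"2. Split the data into a training set and a test set;\n",
"3. Use the naive Bayes algorithm to classify and predict spam, and report accuracy, precision, and recall on the test set;\n",
"4. Compare how the number of features (vocabulary size) affects model performance;\n",
"5. Submit the code and a lab report.\n",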
"\n",
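"Extension assignment (80 points):\n",
"1. Try using email header information to help identify spam;\n",
"2. Try implementing the details of the naive Bayes algorithm yourself."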
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.8.5 64-bit (virtualenv)",
"name": "python385jvsc74a57bd007efdcd4b820c98a756949507a4d29d7862823915ec7477944641bea022f4f62"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
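The tasks listed in the notebook (bag-of-words features, a capped vocabulary size, naive Bayes, and accuracy/precision/recall) can be sketched with the Python standard library alone. This is only a minimal illustration under stated assumptions, not the assignment's reference solution. The toy emails, the labels `spam`/`ham`, and every function name (`tokenize`, `build_vocab`, `train_nb`, `predict`, `evaluate`) are hypothetical and do not come from the assignment's dataset. The classifier shown is multinomial naive Bayes with Laplace smoothing.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocab(docs, max_features):
    """Keep the max_features most frequent tokens seen in the training docs."""
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    return {tok for tok, _ in counts.most_common(max_features)}


def train_nb(docs, labels, vocab, alpha=1.0):
    """Fit multinomial naive Bayes with add-alpha (Laplace) smoothing."""
    log_prior, log_likelihood = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(t for d in class_docs for t in tokenize(d) if t in vocab)
        total = sum(counts.values()) + alpha * len(vocab)
        log_likelihood[c] = {t: math.log((counts[t] + alpha) / total) for t in vocab}
    return log_prior, log_likelihood


def predict(model, doc):
    """Return the class with the highest log posterior; unknown tokens are skipped."""
    log_prior, log_likelihood = model
    scores = {}
    for c in log_prior:
        known = [t for t in tokenize(doc) if t in log_likelihood[c]]
        scores[c] = log_prior[c] + sum(log_likelihood[c][t] for t in known)
    return max(scores, key=scores.get)


def evaluate(y_true, y_pred, positive="spam"):
    """Compute (accuracy, precision, recall), treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Toy data invented for illustration only (not the assignment's dataset).
train_docs = [
    "win free money now",
    "free prize claim your money",
    "cheap loans win cash prize",
    "meeting agenda for monday",
    "please review the project report",
    "lunch on monday with the team",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
test_docs = ["claim your free cash prize", "project meeting on monday"]
test_labels = ["spam", "ham"]

# Compare several vocabulary sizes, as the basic assignment asks.
for size in (5, 10, 50):
    vocab = build_vocab(train_docs, size)
    model = train_nb(train_docs, train_labels, vocab)
    preds = [predict(model, d) for d in test_docs]
    acc, prec, rec = evaluate(test_labels, preds)
    print(f"vocab={len(vocab)} accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
```

On a real corpus the split would normally be random rather than hand-picked, and a library implementation (for example scikit-learn's `MultinomialNB`) could replace `train_nb` and `predict` for the basic assignment.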