Commit f2127926 by 前钰

Delete question.ipynb

parent 282d7555
{ ++ /dev/null
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
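"# Predicting Amazon Review Quality with Ensemble Learning\n",
"\n",
"## Case overview\n",
"When choosing products online, reviews are often something we pay close attention to. However, review quality on e-commerce sites currently varies widely, and there are even cases of paid posters inflating positive reviews or of malicious negative reviews, which seriously harms customers' shopping experience. Predicting review quality has therefore become a topic of growing interest for e-commerce platforms: if review quality can be assessed automatically, low-quality reviews can be kept from display based on the predictions. In this case study we use ensemble learning methods to predict review quality in a real-world Amazon setting.\n",
"\n",
"## Assignment description\n",
"In this case study you need to implement two ensemble learning algorithms (Bagging and AdaBoost), using both SVM and decision trees as base classifiers, so four sets of results must be compared in total (with AUC as the evaluation metric):\n",
"1. Bagging + SVM\n",
"\n",
"2. Bagging + decision tree\n",
"\n",
"3. AdaBoost + SVM\n",
"\n",
"4. AdaBoost + decision tree\n",
"\n",
"Note that the core ensemble learning algorithms must be implemented by hand; the base classifiers may come from libraries.\n",
"\n",
"### Basic assignment (80 points)\n",
"1. Design a feature representation based on the data format\n",
"\n",
"2. Report the AUC obtained for each combination\n",
"\n",
"3. Analyze the differences between the results in light of the characteristics of each ensemble learning algorithm\n",
"\n",
"(Points will be deducted at the grader's discretion for using ensemble learning algorithms from third-party libraries such as sklearn)\n",
"\n",
"### Extended assignment (20 points)\n",
"1. Try other base classifiers (such as k-NN, naive Bayes, or neural networks) and analyze the effect of different features\n",
"\n",
"2. Analyze the effect of the ensemble learning algorithms' parameters"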
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"train_df = pd.read_csv('train.xlsx', sep='\\t')\n",
"test_df = pd.read_csv('test.xlsx', sep='\\t',index_col=False)\n",
"train_df.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
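"Analyzing the dataset\n",
"\n",
"reviewID is the user ID\n",
"\n",
"asin is the product ID\n",
"\n",
"reviewText is the review content\n",
"\n",
"overall is the user's rating of the product\n",
"\n",
"votes_up is the number of votes that found the review helpful\n",
"\n",
"votes_all is the total number of votes the review received\n",
"\n",
"label is the label"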
]
}
],
"metadata": {
"kernelspec": {
"display_name": "pytorch_gpu",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.3"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
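The deleted notebook asks for Bagging to be implemented by hand and evaluated with AUC, while the base classifier may come from a library. The following is a minimal sketch of that idea, not the notebook author's solution. Everything in it is an assumption made for illustration: the names `Stump`, `bagging_fit`, `bagging_score`, and `auc` are hypothetical, the toy `Stump` classifier stands in for the SVM and decision tree the assignment calls for, and the data is synthetic rather than the Amazon review set.

```python
import random


class Stump:
    """Toy one-feature threshold classifier (hypothetical stand-in for an SVM or tree)."""

    def fit(self, X, y):
        best = (-1.0, 0, 0.0, 1)  # (accuracy, feature index, threshold, polarity)
        for j in range(len(X[0])):
            for t in sorted({row[j] for row in X}):
                for pol in (1, -1):
                    preds = [1 if pol * (row[j] - t) >= 0 else 0 for row in X]
                    acc = sum(p == yi for p, yi in zip(preds, y)) / len(y)
                    if acc > best[0]:
                        best = (acc, j, t, pol)
        _, self.j, self.t, self.pol = best
        return self

    def predict(self, X):
        return [1 if self.pol * (row[self.j] - self.t) >= 0 else 0 for row in X]


def bagging_fit(X, y, make_base, n_estimators=15, seed=0):
    """Train n_estimators base models, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        models.append(make_base().fit([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_score(models, X):
    """Score each row by the fraction of base models that vote for class 1."""
    votes = [m.predict(X) for m in models]
    return [sum(col) / len(models) for col in zip(*votes)]


def auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Synthetic two-class data: feature 0 is informative, feature 1 is noise.
data_rng = random.Random(42)
X = [[data_rng.gauss(label, 1.0), data_rng.gauss(0, 1.0)]
     for label in (0, 1) for _ in range(50)]
y = [label for label in (0, 1) for _ in range(50)]

models = bagging_fit(X, y, Stump, n_estimators=15)
scores = bagging_score(models, X)
print("training AUC:", auc(y, scores))
```

Because `bagging_fit` only needs a factory whose objects expose `fit` and `predict`, a library classifier with that interface could be swapped in for `Stump` without touching the ensemble logic. The AUC printed here is computed on the training data, so it says nothing about held-out performance.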