Commit f2127926, authored Sep 25, 2023 by 前钰
Delete question.ipynb
Parent: 282d7555
Showing 1 changed file with 1 addition and 88 deletions
人工智能系统实战第三期/实战代码/基于集成学习的Amazon用户评论质量预测/question.ipynb (deleted, 100644 → 0)
{
++ /dev/null
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
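"# Amazon Review Quality Prediction with Ensemble Learning\n",
"\n",
"## Case Overview\n",
"When choosing products online, reviews are often one of the things we pay most attention to. However, review quality on e-commerce sites varies widely, and paid posters may even flood a product with fake positive reviews or malicious negative ones, which seriously harms the shopping experience. Predicting review quality has therefore become an increasingly important topic for e-commerce platforms: if review quality can be assessed automatically, low-quality reviews can be hidden based on the predictions. In this case study we use ensemble learning to predict the quality of reviews from a real-world Amazon setting.\n",
"\n",
"## Assignment Description\n",
"In this case study you implement two ensemble learning algorithms (Bagging and AdaBoost), each with two base classifiers, SVM and decision tree. This gives four sets of results to compare, with AUC as the evaluation metric:\n",
"1. Bagging + SVM\n",
"\n",
"2. Bagging + decision tree\n",
"\n",
"3. AdaBoost + SVM\n",
"\n",
"4. AdaBoost + decision tree\n",
"\n",
"Note that the core ensemble learning algorithms must be implemented by hand; the base classifiers may come from a library.\n",
"\n",
"### Basic Assignment (80 points)\n",
"1. Design a feature representation based on the data format\n",
"\n",
"2. Report the AUC obtained for each combination\n",
"\n",
"3. Analyze the differences between the results in light of the characteristics of each ensemble learning algorithm\n",
"\n",
"(Points will be deducted at the grader's discretion for using ensemble learning algorithms from third-party libraries such as sklearn)\n",
"\n",
"### Extended Assignment (20 points)\n",
"1. Try other base classifiers (such as k-NN, naive Bayes, or neural networks) and analyze the effect of different features\n",
"\n",
"2. Analyze the effect of the ensemble learning algorithms' parameters"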
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"train_df = pd.read_csv('train.xlsx', sep='\\t')\n",
"test_df = pd.read_csv('test.xlsx', sep='\\t', index_col=False)\n",
"train_df.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
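"Dataset analysis\n",
"\n",
"reviewID is the user ID\n",
"\n",
"asin is the product ID\n",
"\n",
"reviewText is the review text\n",
"\n",
"overall is the user's rating of the product\n",
"\n",
"votes_up is the number of votes marking the review as helpful\n",
"\n",
"votes_all is the total number of votes the review received\n",
"\n",
"label is the label"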
]
}
],
"metadata": {
"kernelspec": {
"display_name": "pytorch_gpu",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.3"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
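The deleted notebook asks for Bagging and AdaBoost to be implemented by hand, with library base classifiers and AUC as the metric. The following is a minimal sketch of what that could look like with a decision tree as the base classifier. It is not the course's reference solution. The function names (`bagging_fit`, `bagging_score`, `adaboost_fit`, `adaboost_score`) are hypothetical, the hyperparameters are arbitrary, and the data is synthetic (`make_classification`) rather than the notebook's `train.xlsx`, whose feature design is left to the assignment.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def bagging_fit(X, y, n_estimators=10, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap indices
        tree = DecisionTreeClassifier(max_depth=5, random_state=0)
        models.append(tree.fit(X[idx], y[idx]))
    return models


def bagging_score(models, X):
    """Average the positive-class probabilities of all trees.

    Assumes every bootstrap sample contained both classes.
    """
    return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)


def adaboost_fit(X, y, n_estimators=20):
    """Discrete AdaBoost with decision stumps; y holds labels in {0, 1}."""
    y_pm = np.where(y == 1, 1, -1)           # map labels to {-1, +1}
    w = np.full(len(X), 1.0 / len(X))        # uniform initial sample weights
    models, alphas = [], []
    for _ in range(n_estimators):
        stump = DecisionTreeClassifier(max_depth=1, random_state=0)
        stump.fit(X, y, sample_weight=w)
        pred = np.where(stump.predict(X) == 1, 1, -1)
        err = np.sum(w[pred != y_pm])        # weighted training error
        if err >= 0.5:                       # no better than chance: stop
            break
        err = max(err, 1e-10)                # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y_pm * pred)    # up-weight misclassified samples
        w /= w.sum()
        models.append(stump)
        alphas.append(alpha)
    return models, alphas


def adaboost_score(models, alphas, X):
    """Weighted vote of the stumps; larger means more likely positive."""
    votes = [a * np.where(m.predict(X) == 1, 1, -1)
             for m, a in zip(models, alphas)]
    return np.sum(votes, axis=0)


# Synthetic stand-in data; the real features would come from the review fields.
X, y = make_classification(n_samples=600, n_features=10,
                           n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

bag = bagging_fit(X_tr, y_tr)
bag_auc = roc_auc_score(y_te, bagging_score(bag, X_te))

ada_models, ada_alphas = adaboost_fit(X_tr, y_tr)
ada_auc = roc_auc_score(y_te, adaboost_score(ada_models, ada_alphas, X_te))

print(f"Bagging + decision tree AUC:  {bag_auc:.3f}")
print(f"AdaBoost + decision tree AUC: {ada_auc:.3f}")
```

Both scoring functions return continuous scores rather than hard labels, since AUC ranks examples by score. Swapping in an SVM base classifier would need a model that exposes scores or probabilities, and for AdaBoost one that accepts `sample_weight` in `fit`.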