Skip to content
Projects
Groups
Snippets
Help
This project
Loading...
Sign in / Register
Toggle navigation
人
人工智能系统实战第三期
Overview
Overview
Details
Activity
Cycle Analytics
Repository
Repository
Files
Commits
Branches
Tags
Contributors
Graph
Compare
Charts
Issues
0
Issues
0
List
Board
Labels
Milestones
Merge Requests
0
Merge Requests
0
CI / CD
CI / CD
Pipelines
Jobs
Schedules
Charts
Wiki
Wiki
Snippets
Snippets
Members
Members
Collapse sidebar
Close sidebar
Activity
Graph
Charts
Create a new issue
Jobs
Commits
Issue Boards
Open sidebar
Charles
人工智能系统实战第三期
Commits
7bdcf60e
Commit
7bdcf60e
authored
Sep 25, 2023
by
前钰
Browse files
Options
Browse Files
Download
Email Patches
Plain Diff
Upload New File
parent
63bf3a3c
Hide whitespace changes
Inline
Side-by-side
Showing
1 changed file
with
250 additions
and
0 deletions
+250
-0
question.ipynb
人工智能系统实战第三期/实战代码/机器学习项目实战/基于回归分析的大学综合得分预测/question.ipynb
+250
-0
No files found.
人工智能系统实战第三期/实战代码/机器学习项目实战/基于回归分析的大学综合得分预测/question.ipynb
0 → 100644
View file @
7bdcf60e
{
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 基于回归分析的大学综合得分预测\n",
"\n",
"## 1.案例简介\n",
"\n",
"大学排名的问题具有显著的重要性,同时也充满了挑战和争议。一所大学的全方位能力包括科研、师资和学生等多个因素。现在,全球有多达百家的评估机构致力于评估并排列大学的综合评分,然而,这些机构的评分结果经常存在不一致的情况。在这些机构当中,世界大学排名中心(Center for World University Rankings,简称CWUR)以其评估教育质量、校友就业、研究产出和引用,而非依赖调查和大学提交的数据的方式而著名,其影响力十分显著。\n",
"在本项目中,我们将依据CWUR提供的全球知名大学的各项排名(包括师资和科研等)来进行工作。一方面,我们将通过数据可视化来探究各个大学的独特性。另一方面,我们希望利用机器学习模型(例如线性回归)来预测大学的综合得分。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.作业说明\n",
"我们将使用Kaggle的数据集,利用线性回归模型,依据大学各项排名的指标来预测其综合得分。可以使用 sk-learn 等第三方库,不要求自己实现线性回归.\n",
"\n",
"基础任务(80分):\n",
"- 1.观察和可视化数据,揭示数据的特性。\n",
"- 2.训练集和测试集应按照7:3的比例随机划分,采用RMSE(均方根误差)作为模型的评估标准,计算并获取测试集上的线性回归模型的RMSE值。\n",
"- 3.对线性回归模型中的系数进行分析。\n",
"- 4.尝试使用其他类型的回归模型,并比较其效果。\n",
"\n",
"进阶任务(20分):\n",
"- 1.尝试将地区的离散特征融入到线性回归模型中,然后比较并分析结果。\n",
"- 2.利用R2指标和VIF指标进行模型评价和特征筛选, 尝试是否可以增加模型精度。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. 数据展示"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" <th>1</th>\n",
" <th>2</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>world_rank</th>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>institution</th>\n",
" <td>Harvard University</td>\n",
" <td>Massachusetts Institute of Technology</td>\n",
" <td>Stanford University</td>\n",
" </tr>\n",
" <tr>\n",
" <th>region</th>\n",
" <td>USA</td>\n",
" <td>USA</td>\n",
" <td>USA</td>\n",
" </tr>\n",
" <tr>\n",
" <th>national_rank</th>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>quality_of_education</th>\n",
" <td>7</td>\n",
" <td>9</td>\n",
" <td>17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>citations</th>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>broad_impact</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>patents</th>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>score</th>\n",
" <td>100.0</td>\n",
" <td>91.67</td>\n",
" <td>89.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>year</th>\n",
" <td>2012</td>\n",
" <td>2012</td>\n",
" <td>2012</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>14 rows × 3 columns</p>\n",
"</div>"
],
"text/plain": [
" 0 \\\n",
"world_rank 1 \n",
"institution Harvard University \n",
"region USA \n",
"national_rank 1 \n",
"quality_of_education 7 \n",
"... ... \n",
"citations 1 \n",
"broad_impact NaN \n",
"patents 5 \n",
"score 100.0 \n",
"year 2012 \n",
"\n",
" 1 \\\n",
"world_rank 2 \n",
"institution Massachusetts Institute of Technology \n",
"region USA \n",
"national_rank 2 \n",
"quality_of_education 9 \n",
"... ... \n",
"citations 4 \n",
"broad_impact NaN \n",
"patents 1 \n",
"score 91.67 \n",
"year 2012 \n",
"\n",
" 2 \n",
"world_rank 3 \n",
"institution Stanford University \n",
"region USA \n",
"national_rank 3 \n",
"quality_of_education 17 \n",
"... ... \n",
"citations 2 \n",
"broad_impact NaN \n",
"patents 15 \n",
"score 89.5 \n",
"year 2012 \n",
"\n",
"[14 rows x 3 columns]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"import numpy as np \n",
" \n",
"pd.set_option('display.max_rows', 10) # 设置显示最大行 \n",
"np.set_printoptions(threshold=10)\n",
"\n",
"\n",
"data_df = pd.read_csv('./cwurData.csv') # 读入 csv 文件为 pandas 的 DataFrame\n",
"data_df.head(3).T # 观察前几列并转置方便观察"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"---\n",
"---\n",
"# <center>答案区</center>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tips\n",
"\n",
"# 1. 数据可视化和分析\n",
"# 2. 注意特征工程,特征的相关性分析,特征的组合,重建和舍弃\n",
"# 3. 注意不同回归模型的选择,模型的参数调整,模型的比较"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment