Mu Li, Dive into Deep Learning: Study Notes (4), Chapter 1 Preliminaries, Section 2 Data Preprocessing
2022-07-21 05:12:00 【Artificial Idiots】
Section 2: Data Preprocessing
1.2.1 Reading the Dataset
First, create a small artificial dataset and save it as a CSV file:
import os

os.makedirs(os.path.join('..', 'Chapter1', 'data'), exist_ok=True)
data_file = os.path.join('..', 'Chapter1', 'data', 'house_tiny.csv')
with open(data_file, 'w') as f:
    f.write('NumRooms,Alley,Price\n')  # column names
    f.write('NA,Pave,127500\n')  # each row is one data example
    f.write('2,FUCK,106000\n')
    f.write('4,NA,178100\n')
    f.write('NA,NA,140000\n')
Then read the CSV file back into a pandas DataFrame:
import pandas as pd
data = pd.read_csv(data_file)
print(data)
   NumRooms Alley   Price
0       NaN  Pave  127500
1       2.0  FUCK  106000
2       4.0   NaN  178100
3       NaN   NaN  140000
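By default, pd.read_csv treats the string NA as a missing value, which is why it appears as NaN above. As a rough, pandas-free sketch of that behavior, the snippet below parses the same table with Python's csv module, maps a small assumed set of missing-value markers to None, and counts missing entries per column (in pandas this count would be data.isna().sum()). The helpers read_with_missing and count_missing are hypothetical, and the marker set is only a subset of what pandas recognizes.

```python
import csv
import io

# Same contents as house_tiny.csv above, kept in memory so the sketch runs anywhere
CSV_TEXT = """NumRooms,Alley,Price
NA,Pave,127500
2,FUCK,106000
4,NA,178100
NA,NA,140000
"""

# Assumption: only a small subset of the markers pandas treats as NaN by default
MISSING = {"NA", ""}


def read_with_missing(text):
    """Parse CSV text into a list of dicts, mapping missing-value markers to None."""
    reader = csv.DictReader(io.StringIO(text))
    return [{k: (None if v in MISSING else v) for k, v in row.items()} for row in reader]


def count_missing(rows):
    """Count missing (None) entries per column, similar in spirit to data.isna().sum()."""
    counts = {}
    for row in rows:
        for col, val in row.items():
            counts[col] = counts.get(col, 0) + (val is None)
    return counts


rows = read_with_missing(CSV_TEXT)
print(count_missing(rows))  # {'NumRooms': 2, 'Alley': 2, 'Price': 0}
```

Note that every parsed value is still a string; pandas additionally infers column dtypes, which is how NumRooms becomes a float column.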
1.2.2 Handling Missing Data
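# iloc is pandas' position-based indexing, like indexing a multidimensional array by position
inputs, outputs = data.iloc[:, 0:2], data.iloc[:, 2]
# Replace NaN with the mean of the same column; this only works for numeric (continuous) columns.
# numeric_only=True skips the string column Alley (pandas >= 2.0 raises a TypeError without it)
inputs = inputs.fillna(inputs.mean(numeric_only=True))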
print(inputs)
   NumRooms Alley
0       3.0  Pave
1       2.0  FUCK
2       4.0   NaN
3       3.0   NaN
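DataFrame.mean skips NaN by default, so the fill value for NumRooms is the mean of the observed entries only: (2 + 4) / 2 = 3. A minimal plain-Python sketch of this mean imputation for a single column, with None standing in for NaN (fill_with_mean is a hypothetical helper, not a pandas function):

```python
import math


def fill_with_mean(values):
    """Replace None in a numeric column with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to average")
    mean = math.fsum(observed) / len(observed)
    return [mean if v is None else v for v in values]


num_rooms = [None, 2.0, 4.0, None]  # the NumRooms column, with NaN shown as None
print(fill_with_mean(num_rooms))  # [3.0, 2.0, 4.0, 3.0]
```

Averaging a string column like Alley has no meaning, which is why categorical columns are handled differently in the next step.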
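# For categorical values, treat NaN and every distinct category as its own column
# "dummy" here refers to dummy (indicator) variables
# dtype=int keeps the 0/1 output below (pandas >= 2.0 otherwise returns True/False)
inputs = pd.get_dummies(inputs, dummy_na=True, dtype=int)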
print(inputs)
   NumRooms  Alley_FUCK  Alley_Pave  Alley_nan
0       3.0           0           1          0
1       2.0           1           0          0
2       4.0           0           0          1
3       3.0           0           0          1
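What get_dummies with dummy_na=True does here is one-hot encoding: each distinct category gets its own 0/1 indicator column, and missing values get an extra _nan column. The sketch below reproduces the Alley columns above in plain Python. The function one_hot and its prefix argument are hypothetical, and sorting the categories is an assumption chosen to match the column order shown above.

```python
def one_hot(values, dummy_na=True, prefix="Alley"):
    """One-hot encode a categorical column; None gets its own '<prefix>_nan' column if dummy_na."""
    categories = sorted({v for v in values if v is not None})
    columns = [f"{prefix}_{c}" for c in categories]
    if dummy_na:
        columns.append(f"{prefix}_nan")
    encoded = []
    for v in values:
        # one indicator per known category: 1 where the value matches, 0 elsewhere
        row = [1 if v == c else 0 for c in categories]
        if dummy_na:
            row.append(1 if v is None else 0)
        encoded.append(row)
    return columns, encoded


cols, rows = one_hot(["Pave", "FUCK", None, None])
print(cols)  # ['Alley_FUCK', 'Alley_Pave', 'Alley_nan']
print(rows)  # [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 1]]
```

With dummy_na=False, a missing value simply produces an all-zero row, which matches what pandas does in that case.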
1.2.3 Conversion to the Tensor Format
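from mxnet import np

# Every entry is now numeric, so the features and the target can be converted to ndarrays
X, y = np.array(inputs.values), np.array(outputs.values)
X, y  # shown as the cell output in a Jupyter notebook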
(array([[3., 0., 1., 0.],
        [2., 1., 0., 0.],
        [4., 0., 0., 1.],
        [3., 0., 0., 1.]], dtype=float64),
 array([127500, 106000, 178100, 140000], dtype=int64))
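Conceptually, a dense 2-D tensor is a contiguous buffer of numbers with a single dtype plus a shape. The stdlib sketch below packs the preprocessed rows into a float64 buffer and looks up element (i, j) in row-major order. It is only a conceptual illustration, not MXNet's actual implementation; the helpers to_dense_matrix and get are hypothetical.

```python
from array import array


def to_dense_matrix(rows):
    """Pack equal-length numeric rows into a flat float64 buffer plus a (rows, cols) shape."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    if any(len(r) != n_cols for r in rows):
        raise ValueError("all rows must have the same length")
    buf = array('d')  # 'd' is a C double, i.e. float64
    for r in rows:
        buf.extend(float(v) for v in r)  # every entry must be convertible to float
    return buf, (n_rows, n_cols)


def get(buf, shape, i, j):
    """Row-major (C order) lookup: element (i, j) lives at offset i * n_cols + j."""
    return buf[i * shape[1] + j]


rows = [[3.0, 0, 1, 0], [2.0, 1, 0, 0], [4.0, 0, 0, 1], [3.0, 0, 0, 1]]
X, shape = to_dense_matrix(rows)
print(shape)  # (4, 4)
print(get(X, shape, 1, 1))  # 1.0
```

This is also why the categorical Alley column had to be one-hot encoded first: float('Pave') raises a ValueError, so a raw string column cannot go into a numeric buffer.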