Book Information

Mining of Massive Datasets

Authors: Anand Rajaraman; Jeffrey D. Ullman
Publisher: Cambridge University Press
ISBN: 1107015357
Publication year: 2012
Labeled page count: 316 pages
File size: 39 MB
File page count: 328 pages

Table of Contents

1 Data Mining 1

1.1 What is Data Mining? 1

1.2 Statistical Limits on Data Mining 4

1.3 Things Useful to Know 7

1.4 Outline of the Book 15

1.5 Summary of Chapter 1 16

1.6 References for Chapter 1 17

2 Large-Scale File Systems and Map-Reduce 18

2.1 Distributed File Systems 18

2.2 Map-Reduce 21

2.3 Algorithms Using Map-Reduce 26

2.4 Extensions to Map-Reduce 37

2.5 Efficiency of Cluster-Computing Algorithms 42

2.6 Summary of Chapter 2 49

2.7 References for Chapter 2 51

3 Finding Similar Items 53

3.1 Applications of Near-Neighbor Search 53

3.2 Shingling of Documents 57

3.3 Similarity-Preserving Summaries of Sets 60

3.4 Locality-Sensitive Hashing for Documents 67

3.5 Distance Measures 71

3.6 The Theory of Locality-Sensitive Functions 77

3.7 LSH Families for Other Distance Measures 83

3.8 Applications of Locality-Sensitive Hashing 88

3.9 Methods for High Degrees of Similarity 96

3.10 Summary of Chapter 3 104

3.11 References for Chapter 3 106

4 Mining Data Streams 108

4.1 The Stream Data Model 108

4.2 Sampling Data in a Stream 112

4.3 Filtering Streams 115

4.4 Counting Distinct Elements in a Stream 118

4.5 Estimating Moments 122

4.6 Counting Ones in a Window 127

4.7 Decaying Windows 133

4.8 Summary of Chapter 4 136

4.9 References for Chapter 4 137

5 Link Analysis 139

5.1 PageRank 139

5.2 Efficient Computation of PageRank 153

5.3 Topic-Sensitive PageRank 159

5.4 Link Spam 163

5.5 Hubs and Authorities 167

5.6 Summary of Chapter 5 172

5.7 References for Chapter 5 175

6 Frequent Itemsets 176

6.1 The Market-Basket Model 176

6.2 Market Baskets and the A-Priori Algorithm 183

6.3 Handling Larger Datasets in Main Memory 192

6.4 Limited-Pass Algorithms 199

6.5 Counting Frequent Items in a Stream 205

6.6 Summary of Chapter 6 209

6.7 References for Chapter 6 211

7 Clustering 213

7.1 Introduction to Clustering Techniques 213

7.2 Hierarchical Clustering 217

7.3 K-means Algorithms 226

7.4 The CURE Algorithm 234

7.5 Clustering in Non-Euclidean Spaces 237

7.6 Clustering for Streams and Parallelism 241

7.7 Summary of Chapter 7 247

7.8 References for Chapter 7 250

8 Advertising on the Web 252

8.1 Issues in On-Line Advertising 252

8.2 On-Line Algorithms 255

8.3 The Matching Problem 258

8.4 The Adwords Problem 261

8.5 Adwords Implementation 270

8.6 Summary of Chapter 8 273

8.7 References for Chapter 8 275

9 Recommendation Systems 277

9.1 A Model for Recommendation Systems 277

9.2 Content-Based Recommendations 281

9.3 Collaborative Filtering 291

9.4 Dimensionality Reduction 297

9.5 The NetFlix Challenge 305

9.6 Summary of Chapter 9 306

9.7 References for Chapter 9 308

Index 310