图书介绍

Mining of massive datasetspdf电子书版本下载

Mining of massive datasets
  • Anand Rajaraman ; Jeffrey D. Ullman 著
  • 出版社: Cambridge University Press
  • ISBN:1107015357
  • 出版时间:2012
  • 标注页数:316页
  • 文件大小:39MB
  • 文件页数:328页
  • 主题词:

PDF下载


点此进入-本书在线PDF格式电子书下载【推荐-云解压-方便快捷】直接下载PDF格式图书。移动端-PC端通用
种子下载[BT下载速度快] 温馨提示:(请使用BT下载软件FDM进行下载)软件下载地址页 直链下载[便捷但速度慢]   [在线试读本书]   [在线获取解压码]

下载说明

Mining of massive datasetsPDF格式电子书版下载

下载的文件为RAR压缩包。需要使用解压软件进行解压得到PDF格式图书。

建议使用BT下载工具Free Download Manager进行下载,简称FDM(免费,没有广告,支持多平台)。本站资源全部打包为BT种子。所以需要使用专业的BT下载软件进行下载。如 BitComet qBittorrent uTorrent等BT下载工具。迅雷目前由于本站不是热门资源。不推荐使用!后期资源热门了。安装了迅雷也可以迅雷进行下载!

(文件页数 要大于 标注页数,上中下等多册电子书除外)

注意:本站所有压缩包均有解压码: 点击下载压缩包解压工具

图书目录

1 Data Mining 1

1.1 What is Data Mining? 1

1.2 Statistical Limits on Data Mining 4

1.3 Things Useful to Know 7

1.4 Outline of the Book 15

1.5 Summary of Chapter 1 16

1.6 References for Chapter 1 17

2 Large-Scale File Systems and Map-Reduce 18

2.1 Distributed File Systems 18

2.2 Map-Reduce 21

2.3 Algorithms Using Map-Reduce 26

2.4 Extensions to Map-Reduce 37

2.5 Efficiency of Cluster-Computing Algorithms 42

2.6 Summary of Chapter 2 49

2.7 References for Chapter 2 51

3 Finding Similar Items 53

3.1 Applications of Near-Neighbor Search 53

3.2 Shingling of Documents 57

3.3 Similarity-Preserving Summaries of Sets 60

3.4 Locality-Sensitive Hashing for Documents 67

3.5 Distance Measures 71

3.6 The Theory of Locality-Sensitive Functions 77

3.7 LSH Families for Other Distance Measures 83

3.8 Applications of Locality-Sensitive Hashing 88

3.9 Methods for High Degrees of Similarity 96

3.10 Summary of Chapter 3 104

3.11 References for Chapter 3 106

4 Mining Data Streams 108

4.1 The Stream Data Model 108

4.2 Sampling Data in a Stream 112

4.3 Filtering Streams 115

4.4 Counting Distinct Elements in a Stream 118

4.5 Estimating Moments 122

4.6 Counting Ones in a Window 127

4.7 Decaying Windows 133

4.8 Summary of Chapter 4 136

4.9 References for Chapter 4 137

5 Link Analysis 139

5.1 PageRank 139

5.2 Efficient Computation of PageRank 153

5.3 Topic-Sensitive PageRank 159

5.4 Link Spam 163

5.5 Hubs and Authorities 167

5.6 Summary of Chapter 5 172

5.7 References for Chapter 5 175

6 Frequent Itemsets 176

6.1 The Market-Basket Model 176

6.2 Market Baskets and the A-Priori Algorithm 183

6.3 Handling Larger Datasets in Main Memory 192

6.4 Limited-Pass Algorithms 199

6.5 Counting Frequent Items in a Stream 205

6.6 Summary of Chapter 6 209

6.7 References for Chapter 6 211

7 Clustering 213

7.1 Introduction to Clustering Techniques 213

7.2 Hierarchical Clustering 217

7.3 K-means Algorithms 226

7.4 The CURE Algorithm 234

7.5 Clustering in Non-Euclidean Spaces 237

7.6 Clustering for Streams and Parallelism 241

7.7 Summary of Chapter 7 247

7.8 References for Chapter 7 250

8 Advertising on the Web 252

8.1 Issues in On-Line Advertising 252

8.2 On-Line Algorithms 255

8.3 The Matching Problem 258

8.4 The Adwords Problem 261

8.5 Adwords Implementation 270

8.6 Summary of Chapter 8 273

8.7 References for Chapter 8 275

9 Recommendation Systems 277

9.1 A Model for Recommendation Systems 277

9.2 Content-Based Recommendations 281

9.3 Collaborative Filtering 291

9.4 Dimensionality Reduction 297

9.5 The NetFlix Challenge 305

9.6 Summary of Chapter 9 306

9.7 References for Chapter 9 308

Index 310

精品推荐