Book Information

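Big Data Analytics with R (大数据分析 R语言实现)
  • Author: Simon Walkowiak (UK)
  • Publisher: Nanjing: Southeast University Press
  • ISBN: 9787564173616
  • Publication year: 2017
  • Labeled page count: 490 pages
  • File size: 61 MB
  • File page count: 503 pages
  • Subject terms: Programming languages - Programming - English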

Table of Contents

Preface 1

Chapter 1: The Era of Big Data 7

Big Data - The monster re-defined 7

Big Data toolbox - dealing with the giant 11

Hadoop - the elephant in the room 12

Databases 15

Hadoop Spark-ed up 16

R - The unsung Big Data hero 17

Summary 24

Chapter 2: Introduction to R Programming Language and Statistical Environment 25

Learning R 25

Revisiting R basics 28

Getting R and RStudio ready 28

Setting the URLs to R repositories 30

R data structures 32

Vectors 32

Scalars 35

Matrices 35

Arrays 37

Data frames 38

Lists 41

Exporting R data objects 42

Applied data science with R 47

Importing data from different formats 48

Exploratory Data Analysis 50

Data aggregations and contingency tables 53

Hypothesis testing and statistical inference 56

Tests of differences 57

Independent t-test example (with power and effect size estimates) 57

ANOVA example 60

Tests of relationships 63

An example of Pearson's r correlations 63

Multiple regression example 65

Data visualization packages 70

Summary 71

Chapter 3: Unleashing the Power of R from Within 73

Traditional limitations of R 74

Out-of-memory data 74

Processing speed 75

To the memory limits and beyond 76

Data transformations and aggregations with the ff and ffbase packages 76

Generalized linear models with the ff and ffbase packages 87

Logistic regression example with ffbase and biglm 89

Expanding memory with the bigmemory package 97

Parallel R 106

From bigmemory to faster computations 107

An apply() example with the big.matrix object 108

A for() loop example with the ffdf object 108

Using apply() and for() loop examples on a data.frame 109

A parallel package example 110

A foreach package example 113

The future of parallel processing in R 115

Utilizing Graphics Processing Units with R 115

Multi-threading with Microsoft R Open distribution 117

Parallel machine learning with H2O and R 118

Boosting R performance with the data.table package and other tools 118

Fast data import and manipulation with the data.table package 118

Data import with data.table 119

Lightning-fast subsets and aggregations on data.table 120

Chaining, more complex aggregations, and pivot tables with data.table 123

Writing better R code 126

Summary 127

Chapter 4: Hadoop and MapReduce Framework for R 129

Hadoop architecture 130

Hadoop Distributed File System 130

MapReduce framework 131

A simple MapReduce word count example 132

Other Hadoop native tools 134

Learning Hadoop 136

A single-node Hadoop in Cloud 137

Deploying Hortonworks Sandbox on Azure 138

A word count example in Hadoop using Java 159

A word count example in Hadoop using the R language 169

RStudio Server on a Linux RedHat/CentOS virtual machine 169

Installing and configuring RHadoop packages 177

HDFS management and MapReduce in R - a word count example 179

HDInsight - a multi-node Hadoop cluster on Azure 194

Creating your first HDInsight cluster 194

Creating a new Resource Group 195

Deploying a Virtual Network 197

Creating a Network Security Group 200

Setting up and configuring an HDInsight cluster 203

Starting the cluster and exploring Ambari 211

Connecting to the HDInsight cluster and installing RStudio Server 215

Adding a new inbound security rule for port 8787 218

Editing the Virtual Network's public IP address for the head node 221

Smart energy meter readings analysis example - using R on HDInsight cluster 229

Summary 241

Chapter 5: R with Relational Database Management Systems (RDBMSs) 243

Relational Database Management Systems (RDBMSs) 244

A short overview of used RDBMSs 244

Structured Query Language (SQL) 245

SQLite with R 247

Preparing and importing data into a local SQLite database 248

Connecting to SQLite from RStudio 250

MariaDB with R on an Amazon EC2 instance 255

Preparing the EC2 instance and RStudio Server for use 255

Preparing MariaDB and data for use 257

Working with MariaDB from RStudio 266

PostgreSQL with R on Amazon RDS 281

Launching an Amazon RDS database instance 281

Preparing and uploading data to Amazon RDS 290

Remotely querying PostgreSQL on Amazon RDS from RStudio 304

Summary 314

Chapter 6: R with Non-Relational (NoSQL) Databases 315

Introduction to NoSQL databases 315

Review of leading non-relational databases 316

MongoDB with R 319

Introduction to MongoDB 319

MongoDB data models 319

Installing MongoDB with R on Amazon EC2 322

Processing Big Data using MongoDB with R 325

Importing data into MongoDB and basic MongoDB commands 326

MongoDB with R using the rmongodb package 333

MongoDB with R using the RMongo package 346

MongoDB with R using the mongolite package 350

HBase with R 355

Azure HDInsight with HBase and RStudio Server 355

Importing the data to HDFS and HBase 363

Reading and querying HBase using the rhbase package 367

Summary 372

Chapter 7: Faster than Hadoop - Spark with R 373

Spark for Big Data analytics 374

Spark with R on a multi-node HDInsight cluster 375

Launching HDInsight with Spark and R/RStudio 375

Reading the data into HDFS and Hive 383

Getting the data into HDFS 385

Importing data from HDFS to Hive 386

Bay Area Bike Share analysis using SparkR 393

Summary 411

Chapter 8: Machine Learning Methods for Big Data in R 413

What is machine learning? 414

Supervised and unsupervised machine learning methods 415

Classification and clustering algorithms 416

Machine learning methods with R 417

Big Data machine learning tools 418

GLM example with Spark and R on the HDInsight cluster 419

Preparing the Spark cluster and reading the data from HDFS 419

Logistic regression in Spark with R 425

Naive Bayes with H2O on Hadoop with R 437

Running an H2O instance on Hadoop with R 437

Reading and exploring the data in H2O 441

Naive Bayes on H2O with R 446

Neural Networks with H2O on Hadoop with R 458

How do Neural Networks work? 458

Running Deep Learning models on H2O 461

Summary 469

Chapter 9: The Future of R - Big, Fast, and Smart Data 471

The current state of Big Data analytics with R 471

Out-of-memory data on a single machine 471

Faster data processing with R 473

Hadoop with R 475

Spark with R 476

R with databases 477

Machine learning with R 478

The future of R 478

Big Data 479

Fast data 480

Smart data 481

Where to go next 482

Summary 482

Index 483
