National Defense Digital Library (NDDL) Portal: Search

MARC

LDR
05941cam a2200661Ii 4500
001
000000532826
005
20210114162023
006
m d
007
cr unu||||||||
008
190509s2019 enka o 000 0 eng d
015
▼a GBB995016
▼2 bnb
016
▼a 019365492
▼2 Uk
019
▼a 1091701284
▼a 1096526626
020
▼a 1838648836
020
▼a 9781838648831
▼q (electronic bk.)
020
▼z 9781838644130
035
▼a 2094759
▼b (N$T)
035
▼a  (OCoLC)1100643398
▼z  (OCoLC)1091701284
▼z  (OCoLC)1096526626
037
▼a CL0501000047
▼b Safari Books Online
040
▼a  UMI
▼b  eng
▼e  rda
▼e  pn
▼c  UMI
▼d  TEFOD
▼d  EBLCP
▼d  MERUC
▼d  UKMGB
▼d  OCLCF
▼d  YDX
▼d  UKAHL
▼d  OCLCQ
▼d  N$T
▼d  248023
050
▼a QA76.73.S59
082
▼a 004.2
▼2 23
100
▼a Lai, Rudy,
▼e author.
245
▼a  Hands-on big data analytics with PySpark:
▼b  analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs /
▼c  Rudy Lai, Bartłomiej Potaczek.
260
▼a  Birmingham, UK:
▼b  Packt Publishing,
▼c  2019.
300
▼a 1 online resource:
▼b illustrations.
336
▼a  text
▼b  txt
▼2  rdacontent
337
▼a  computer
▼b  c
▼2  rdamedia
338
▼a  online resource
▼b  cr
▼2  rdacarrier
505
▼a Cover; Title Page; Copyright and Credits; About Packt; Contributors; Table of Contents; Preface; Chapter 1: Pyspark and Setting up Your Development Environment; An overview of PySpark; Spark SQL; Setting up Spark on Windows and PySpark; Core concepts in Spark and PySpark; SparkContext; Spark shell; SparkConf; Summary; Chapter 2: Getting Your Big Data into the Spark Environment Using RDDs; Loading data on to Spark RDDs; The UCI machine learning repository; Getting the data from the repository to Spark; Getting data into Spark; Parallelization with Spark RDDs; What is parallelization?
505
▼a Basics of RDD operation; Summary; Chapter 3: Big Data Cleaning and Wrangling with Spark Notebooks; Using Spark Notebooks for quick iteration of ideas; Sampling/filtering RDDs to pick out relevant data points; Splitting datasets and creating some new combinations; Summary; Chapter 4: Aggregating and Summarizing Data into Useful Reports; Calculating averages with map and reduce; Faster average computations with aggregate; Pivot tabling with key-value paired data points; Summary; Chapter 5: Powerful Exploratory Data Analysis with MLlib; Computing summary statistics with MLlib
505
▼a Using Pearson and Spearman correlations to discover correlations; The Pearson correlation; The Spearman correlation; Computing Pearson and Spearman correlations; Testing our hypotheses on large datasets; Summary; Chapter 6: Putting Structure on Your Big Data with SparkSQL; Manipulating DataFrames with Spark SQL schemas; Using Spark DSL to build queries; Summary; Chapter 7: Transformations and Actions; Using Spark transformations to defer computations to a later time; Avoiding transformations; Using the reduce and reduceByKey methods to calculate the results
505
▼a Performing actions that trigger computations; Reusing the same rdd for different actions; Summary; Chapter 8: Immutable Design; Delving into the Spark RDD's parent/child chain; Extending an RDD; Chaining a new RDD with the parent; Testing our custom RDD; Using RDD in an immutable way; Using DataFrame operations to transform; Immutability in the highly concurrent environment; Using the Dataset API in an immutable way; Summary; Chapter 9: Avoiding Shuffle and Reducing Operational Expenses; Detecting a shuffle in a process; Testing operations that cause a shuffle in Apache Spark
505
▼a Changing the design of jobs with wide dependencies; Using keyBy() operations to reduce shuffle; Using a custom partitioner to reduce shuffle; Summary; Chapter 10: Saving Data in the Correct Format; Saving data in plain text format; Leveraging JSON as a data format; Tabular formats -- CSV; Using Avro with Spark; Columnar formats -- Parquet; Summary; Chapter 11: Working with the Spark Key/Value API; Available actions on key/value pairs; Using aggregateByKey instead of groupBy(); Actions on key/value pairs; Available partitioners on key/value data; Implementing a custom partitioner; Summary
520
▼a In this book, you'll learn to implement some practical and proven techniques to improve aspects of programming and administration in Apache Spark. Techniques are demonstrated using practical examples and best practices. You will also learn how to use Spark and its Python API to create performant analytics with large-scale data.
588
▼a Online resource; title from title page (Safari, viewed May 9, 2019).
590
▼a Added to collection customer.56279.3
650
▼a SPARK (Computer program language)
650
▼a Application software
▼x Development.
650
▼a Big data.
650
▼a Electronic data processing.
650
▼a Python (Computer program language)
650
▼a  Application software
▼x  Development.
▼2  fast
▼0  (OCoLC)fst00811707
650
▼a  Big data.
▼2  fast
▼0  (OCoLC)fst01892965
650
▼a  Electronic data processing.
▼2  fast
▼0  (OCoLC)fst00906956
650
▼a  Python (Computer program language)
▼2  fast
▼0  (OCoLC)fst01084736
650
▼a  SPARK (Computer program language)
▼2  fast
▼0  (OCoLC)fst01922197
655
▼a Electronic books.
700
▼a Potaczek, Bartłomiej,
▼e author.
776
▼i  Print version:
▼a  Lai, Rudy.
▼t  Hands-On Big Data Analytics with Pyspark : Analyze Large Datasets and Discover Techniques for Testing, Immunizing, and Parallelizing Spark Jobs.
▼d  Birmingham : Packt Publishing Ltd, ©2019,
▼z  9781838644130
856
▼3 EBSCOhost
▼u http://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&db=nlabk&AN=2094759
938
▼a  Askews and Holts Library Services
▼b  ASKH
▼n  BDZ0039952975
938
▼a  ProQuest Ebook Central
▼b  EBLB
▼n  EBL5744445
938
▼a  YBP Library Services
▼b  YANK
▼n  16142491
938
▼a  EBSCOhost
▼b  EBSC
▼n  2094759
990
▼a 강리원
991
▼a eBook
994
▼a 92
▼b N$T

Material type :	eBook
ISBN :	1838648836
ISBN :	9781838648831
ISBN :	9781838644130 (print version)
Personal author :	Lai, Rudy, author.
Title / statement of responsibility :	Hands-on big data analytics with PySpark: analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs / Rudy Lai, Bartłomiej Potaczek.
Publication :	Birmingham, UK: Packt Publishing, 2019.
Physical description :	1 online resource: illustrations.
Contents note :	Cover; Title Page; Copyright and Credits; About Packt; Contributors; Table of Contents; Preface; Chapter 1: Pyspark and Setting up Your Development Environment; An overview of PySpark; Spark SQL; Setting up Spark on Windows and PySpark; Core concepts in Spark and PySpark; SparkContext; Spark shell; SparkConf; Summary; Chapter 2: Getting Your Big Data into the Spark Environment Using RDDs; Loading data on to Spark RDDs; The UCI machine learning repository; Getting the data from the repository to Spark; Getting data into Spark; Parallelization with Spark RDDs; What is parallelization?
Contents note :	Basics of RDD operation; Summary; Chapter 3: Big Data Cleaning and Wrangling with Spark Notebooks; Using Spark Notebooks for quick iteration of ideas; Sampling/filtering RDDs to pick out relevant data points; Splitting datasets and creating some new combinations; Summary; Chapter 4: Aggregating and Summarizing Data into Useful Reports; Calculating averages with map and reduce; Faster average computations with aggregate; Pivot tabling with key-value paired data points; Summary; Chapter 5: Powerful Exploratory Data Analysis with MLlib; Computing summary statistics with MLlib
Contents note :	Using Pearson and Spearman correlations to discover correlations; The Pearson correlation; The Spearman correlation; Computing Pearson and Spearman correlations; Testing our hypotheses on large datasets; Summary; Chapter 6: Putting Structure on Your Big Data with SparkSQL; Manipulating DataFrames with Spark SQL schemas; Using Spark DSL to build queries; Summary; Chapter 7: Transformations and Actions; Using Spark transformations to defer computations to a later time; Avoiding transformations; Using the reduce and reduceByKey methods to calculate the results
Contents note :	Performing actions that trigger computations; Reusing the same rdd for different actions; Summary; Chapter 8: Immutable Design; Delving into the Spark RDD's parent/child chain; Extending an RDD; Chaining a new RDD with the parent; Testing our custom RDD; Using RDD in an immutable way; Using DataFrame operations to transform; Immutability in the highly concurrent environment; Using the Dataset API in an immutable way; Summary; Chapter 9: Avoiding Shuffle and Reducing Operational Expenses; Detecting a shuffle in a process; Testing operations that cause a shuffle in Apache Spark
Contents note :	Changing the design of jobs with wide dependencies; Using keyBy() operations to reduce shuffle; Using a custom partitioner to reduce shuffle; Summary; Chapter 10: Saving Data in the Correct Format; Saving data in plain text format; Leveraging JSON as a data format; Tabular formats -- CSV; Using Avro with Spark; Columnar formats -- Parquet; Summary; Chapter 11: Working with the Spark Key/Value API; Available actions on key/value pairs; Using aggregateByKey instead of groupBy(); Actions on key/value pairs; Available partitioners on key/value data; Implementing a custom partitioner; Summary
Summary :	In this book, you'll learn to implement some practical and proven techniques to improve aspects of programming and administration in Apache Spark. Techniques are demonstrated using practical examples and best practices. You will also learn how to use Spark and its Python API to create performant analytics with large-scale data.
Subject :	SPARK (Computer program language)
Subject :	Application software -- Development.
Subject :	Big data.
Subject :	Electronic data processing.
Subject :	Python (Computer program language)
Personal author :	Potaczek, Bartłomiej, author.
Other format entry :	Print version: Lai, Rudy. Hands-On Big Data Analytics with Pyspark : Analyze Large Datasets and Discover Techniques for Testing, Immunizing, and Parallelizing Spark Jobs. Birmingham : Packt Publishing Ltd, ©2019, 9781838644130
Language :	English
