Designing And Building Big Data Applications Pdf


By Demócrito J.
16.04.2021 at 23:05


Today's market is flooded with an array of Big Data tools and technologies.

Big Data Analytics Books

With the explosion of social media sites and the proliferation of digital computing devices and Internet access, massive amounts of public data are being generated daily. Careful mining of these data can reveal many useful indicators of socioeconomic and political events, which can help in establishing effective public policies.

The focus of this study is to review the application of big data analytics for the purpose of human development. The emerging ability to use big data techniques for development (BD4D) promises to revolutionize healthcare, education, and agriculture; facilitate the alleviation of poverty; and help to deal with humanitarian crises and violent conflicts.

Despite these benefits, the large-scale deployment of BD4D faces several challenges arising from the massive size, rapidly changing, and diverse nature of big data. The most pressing concerns relate to efficient data acquisition and sharing, and to establishing context (e.g. …). In this study, we review existing BD4D work to study the impact of big data on the development of society. In addition to reviewing the important works, we also highlight important challenges and open issues.

In the modern world we are inundated with data, with companies such as Google and Facebook handling petabytes of it [1]. Google processes more than 24 petabytes of data per day, while Facebook, a company founded only a decade ago, receives more than 10 million photos per hour.

The glut of data, buoyed by fast-advancing technology, is growing exponentially due to the increasing digitization of all aspects of modern life through technologies such as the Internet of Things (IoT) [2]. IoT uses sensors, for example in wearable devices, to provide data on human activities and behavioral patterns.

It is estimated that we are generating 2… A lot of progress has been made in developing the capability to process, store, and analyze big data. Big data computing now makes it possible to process and store big data in a distributed fashion on a cluster of computers [4]. In addition, rapid advances in intelligent data analytics techniques, drawn from the emerging areas of artificial intelligence (AI) and machine learning (ML), make it possible to process the massive amounts of diverse, unstructured data generated daily and to extract valuable, actionable knowledge from it.
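As a rough illustration of distributed processing on a cluster, the sketch below counts words using the MapReduce pattern. MapReduce is our choice of example; the text does not name a specific framework. Everything here runs in a single Python process, and the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical. In a real cluster, the map and reduce calls would be spread across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(split: str) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all the counts emitted for one word."""
    return key, sum(values)


def word_count(splits: List[str]) -> Dict[str, int]:
    # On a cluster, each split would be mapped on a different worker node.
    intermediate = [pair for split in splits for pair in map_phase(split)]
    grouped = shuffle(intermediate)
    # Likewise, each key group could be reduced on a different worker.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data for development", "data for good", "Big ideas"]))
```

Because each map call only needs its own split and each reduce call only needs one key's values, both phases can be parallelized independently.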

This gives researchers a great opportunity to use this data to develop useful knowledge and insights [5]. From the perspective of big data for development (BD4D), an important quandary is gaining access to important people-related data, which is often held exclusively by governments in the form of paper documents. Governments worldwide (e.g. …). In addition, open source platforms have been developed that facilitate the creation and gathering of digital data from mobile platforms (e.g. …).

Open data can rightly be regarded as a subset of all the available big data; the nuance lies in the liquidity of big data [10]. Open data also promotes a culture of creativity and public wellbeing. This is evident from the various hackathons organized to tap the potential of open data for useful mobile applications (e.g. …). In this report, the importance of open data is highlighted for seven particular sectors: education, health, transportation, consumer products, electricity, oil and gas, and consumer finance.

The Global Pulse program aims to form a network of innovation centers, called Pulse Labs, around the world. In [14], Kirkpatrick, the director of the UN Global Pulse innovation initiative, presents the case for deploying big data techniques and analytics in the field of human development. He highlights that data, especially from mobile phones and social media, can be used to fight hunger, disasters and poverty.

The report also discusses the issues and challenges the UN faces in terms of data access, user privacy and the integration of big data techniques into its various humanitarian systems. The aim of this paper is to answer an important question: how can we harness big data technologies to transform and revolutionize the developing world?

To this end, we review the applications of big data techniques in the context of development and thereby highlight the development areas that can benefit from big data technology. We believe that, consistent with the huge impact of big data on all other facets of modern society [1, 3], big data also has immense potential for the field of international human development.

We will consider questions such as:

- How can we access and use, for development purposes, the data held on the isolated servers of companies and organizations?
- What are some of the well-known big data analytics techniques that can be applied in the BD4D context?

Because development is a nuanced subject, we have chosen not to approach BD4D only from a technological viewpoint. Instead, we adopt a multidisciplinary vantage point integrating technology, economics, and the social and development sciences.

For this paper, we have reviewed existing research literature, official documents, online projects, blogs and technical reports related directly or indirectly to BD4D. Apart from highlighting the immense potential of BD4D, our work also identifies some of the associated challenges and potential lurking harms that must be understood and countered.

Our paper differs from existing survey papers [5, 12]. Apart from highlighting the particular development areas that can benefit from big data, we also discuss various techniques for big data analytics and describe open issues and directions for future work.

All of this data, if harnessed intelligently, can truly realize the notion of the information age [5]. Actionable information can be extracted from the available data through intelligent processing and analytics. This section covers the techniques used to gather, store, process and analyze this vast amount of data, especially those related to machine learning.

We also try to link this discussion, and the examples used here to explain various concepts, to humanitarian development. The aim of this section is to give readers brief background on the relevant techniques and related work. This should help them understand the applications of these techniques when they are discussed in the context of humanitarian development. Machine learning (ML), a sub-field of artificial intelligence (AI), focuses on enabling computational systems to learn from data how to perform a desired task automatically.

Machine learning has many applications, including decision making, forecasting and prediction. It is a key enabling technology for deploying data mining and big data techniques in diverse fields such as healthcare, science, engineering, business and finance. Broadly speaking, ML tasks can be categorized into several major types, described below.

In supervised learning, a problem is called regression if the output or prediction belongs to a continuous set of values, and classification if the output takes discrete values. In the following, we briefly present a few classification techniques. These have been widely used for Internet traffic classification (e.g. …). Decision Trees (DT) are a popular, intuitive method for learning and predicting target features, for both quantitative and nominal target attributes.

Although DTs do not always perform very competitively, their main advantage is their intuitive interpretation. This is crucial when network operators have to analyze and interpret the classification method and its results. Support Vector Machines (SVM) are a widely used supervised learning technique, remarkable for being both practical and theoretically sound.
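To show why decision trees are easy to interpret, here is a minimal sketch of a depth-1 decision tree (a "decision stump") that learns a single threshold rule. The sketch makes two assumptions. First, it uses Gini impurity as the split criterion, one common choice among several. Second, it works on a made-up traffic feature, mean packet size in bytes, with hypothetical labels. Real decision-tree learners grow many levels and handle many features at once.

```python
from collections import Counter
from typing import Optional, Sequence


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a set of class labels (0.0 means all labels agree)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels: Sequence[str]) -> str:
    """Most frequent label (ties broken by first occurrence)."""
    return Counter(labels).most_common(1)[0][0]


def best_threshold(xs: Sequence[float], ys: Sequence[str]) -> Optional[float]:
    """Threshold on one numeric feature that minimises weighted Gini impurity."""
    best_t, best_score = None, gini(ys)
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # a split must send examples to both sides
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_t, best_score = t, score
    return best_t


class DecisionStump:
    """A depth-1 decision tree: 'if x <= threshold then left_label else right_label'."""

    def fit(self, xs: Sequence[float], ys: Sequence[str]) -> "DecisionStump":
        self.threshold = best_threshold(xs, ys)
        if self.threshold is None:  # no split helps: always predict the majority class
            self.left_label = self.right_label = majority(ys)
        else:
            self.left_label = majority([y for x, y in zip(xs, ys) if x <= self.threshold])
            self.right_label = majority([y for x, y in zip(xs, ys) if x > self.threshold])
        return self

    def predict(self, x: float) -> str:
        if self.threshold is None or x <= self.threshold:
            return self.left_label
        return self.right_label


if __name__ == "__main__":
    sizes = [100, 120, 150, 1200, 1300, 1400]  # hypothetical flows
    labels = ["web", "web", "web", "video", "video", "video"]
    stump = DecisionStump().fit(sizes, labels)
    print(f"if mean packet size <= {stump.threshold}: {stump.left_label} "
          f"else: {stump.right_label}")
```

On this made-up data, the learned model is a single readable rule with a threshold of 150 bytes. That kind of transparency is what makes trees attractive when people must audit the classifier.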

The SVM approach is rooted in statistical learning theory and is systematic (e.g. …). The basic method in unsupervised learning is clustering, which is used to find groups of inputs with similar characteristics. Intuitively, clustering is akin to unsupervised classification. Classification in supervised learning assumes the availability of a correctly labeled training set, whereas the unsupervised task of clustering seeks to identify the structure of the input data directly.
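Clustering can be sketched with k-means (Lloyd's algorithm). This is one common clustering method, chosen here for illustration; the text does not prescribe a particular algorithm. The 2-D points are hypothetical, and the fixed `seed` only makes the random initialization repeatable.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def assign(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Assignment step: label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids: List[Point] = rng.sample(list(points), k)  # k input points as seeds
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated: List[Point] = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # update step: move the centroid to its members' mean
                updated.append((sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members)))
            else:        # an empty cluster keeps its previous centroid
                updated.append(centroids[i])
        if updated == centroids:  # no centroid moved: converged
            break
        centroids = updated
    return centroids, assign(points, centroids)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centroids, labels = kmeans(pts, k=2)
    print(centroids, labels)
```

No labels are supplied: the two groups emerge purely from the distances between points, which is the sense in which clustering is "unsupervised classification".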

In reinforcement learning, a learner performs an action based on the input it receives, potentially affecting the environment around it. This action is then rewarded or punished.
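The reward-and-punishment loop can be illustrated with an epsilon-greedy multi-armed bandit. This is the simplest reinforcement learning setting: it has no environment state, which full reinforcement learning problems do. The reward probabilities, the +1/-1 reward scheme, `epsilon` and the number of steps are all illustrative assumptions.

```python
import random
from typing import List


def epsilon_greedy(reward_probs: List[float], steps: int = 5000,
                   epsilon: float = 0.1, seed: int = 0) -> List[float]:
    """Learn action values purely from reward/punishment feedback.

    reward_probs[a] is the hidden chance that action a is rewarded; the
    learner never sees it directly, only the feedback to its own choices.
    """
    rng = random.Random(seed)
    n = len(reward_probs)
    estimates = [0.0] * n  # learner's running estimate of each action's value
    counts = [0] * n       # number of times each action has been taken
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n)                           # explore
        else:
            action = max(range(n), key=lambda a: estimates[a])  # exploit
        # The environment rewards (+1) or punishes (-1) the chosen action.
        reward = 1.0 if rng.random() < reward_probs[action] else -1.0
        counts[action] += 1
        # Incremental average: nudge the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates


if __name__ == "__main__":
    print(epsilon_greedy([0.2, 0.5, 0.8]))
```

The `epsilon` parameter trades off trying new actions against repeating the best one found so far. Without exploration, the learner can lock onto the first action that happens to be rewarded.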

Deep learning (DL) is an ML technique based on deep, complex architectures [17, 18]. These architectures consist of multiple processing layers, each capable of generating a non-linear response to its input. Each layer consists of many small processing units, called neurons, that operate in parallel on the data provided. DL has proved effective in pattern recognition, image processing and natural language processing [19].
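The layered structure can be made concrete with a tiny feed-forward network with two layers of neurons. This is only a sketch. The weights are hand-chosen rather than learned; real deep learning fits its weights by gradient-based training, and the network here is far from deep. It computes XOR, which no single linear layer can represent, to show what the non-linear activation contributes. ReLU is our choice of activation.

```python
from typing import List, Sequence


def relu(v: Sequence[float]) -> List[float]:
    """Non-linear activation applied element-wise to a layer's output."""
    return [max(0.0, x) for x in v]


def dense(inputs: Sequence[float], weights: Sequence[Sequence[float]],
          biases: Sequence[float]) -> List[float]:
    """One fully connected layer: each neuron computes a weighted sum plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


# Hand-picked (not learned) weights for a 2-2-1 network that computes XOR.
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [0.0, -1.0]
OUTPUT_W = [[1.0, -2.0]]
OUTPUT_B = [0.0]


def forward(x: Sequence[float]) -> float:
    hidden = relu(dense(x, HIDDEN_W, HIDDEN_B))  # layer 1: two neurons
    return dense(hidden, OUTPUT_W, OUTPUT_B)[0]  # layer 2: one output neuron


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, forward([a, b]))
```

If `relu` were removed, the two layers would collapse into a single linear map, and no choice of weights could produce XOR.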

DL is applied across a very broad spectrum of domains, ranging from healthcare to the fashion industry [20]. Many technology giants, such as Google, IBM and Facebook, deploy DL techniques to create intelligent products.

Association rule learning is a method for discovering interesting relations between variables in large databases. Here, we seek to learn associations between the features present in examples.

Classification (supervised learning) strictly assigns each example to one discrete class. Association rule learning, by contrast, considers relations or associations among the various variables in a database of examples.

Consider the example in [21], which uses a weather dataset. The usual classification problem would be to predict whether or not a game will be played. The prediction would be based on the values of weather features or attributes in the dataset, such as temperature, outlook and wind conditions.

From the association learning perspective, however, we can look for rules relating different features or variables instead of always predicting the status of the game. For example, a rule might state that if the outlook is sunny and the game is being played, then the day is not windy.
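Association rules are usually scored by two measures. Support is how often all the items in the rule occur together. Confidence is how often the consequent holds when the antecedent does. The sketch below computes both for the rule in the text on a tiny table. These five records are invented for illustration and are not the dataset used in [21].

```python
from typing import Dict, List

Record = Dict[str, str]


def matches(record: Record, items: Dict[str, str]) -> bool:
    """True if the record has every attribute=value pair in `items`."""
    return all(record.get(k) == v for k, v in items.items())


def support(records: List[Record], items: Dict[str, str]) -> float:
    """Fraction of all records that contain the whole item set."""
    return sum(matches(r, items) for r in records) / len(records)


def confidence(records: List[Record], antecedent: Dict[str, str],
               consequent: Dict[str, str]) -> float:
    """Among records matching the antecedent, the fraction that match the consequent."""
    covered = [r for r in records if matches(r, antecedent)]
    if not covered:
        return 0.0
    return sum(matches(r, consequent) for r in covered) / len(covered)


# Hypothetical toy records, NOT the weather dataset from [21].
WEATHER: List[Record] = [
    {"outlook": "sunny", "windy": "false", "play": "yes"},
    {"outlook": "sunny", "windy": "false", "play": "yes"},
    {"outlook": "sunny", "windy": "true", "play": "no"},
    {"outlook": "rainy", "windy": "true", "play": "no"},
    {"outlook": "overcast", "windy": "false", "play": "yes"},
]

if __name__ == "__main__":
    rule_if = {"outlook": "sunny", "play": "yes"}
    rule_then = {"windy": "false"}
    print("support:", support(WEATHER, {**rule_if, **rule_then}))
    print("confidence:", confidence(WEATHER, rule_if, rule_then))
```

On these made-up records, the rule "sunny and play means not windy" has support 0.4 (2 of 5 records) and confidence 1.0. Algorithms such as Apriori search for all rules above chosen support and confidence thresholds rather than scoring a single rule.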

This type of learning can be particularly important for farmers planning their activities for the best possible crop production. In numeric prediction, we are not interested in predicting the discrete class or category to which an example belongs, but in a numeric quantity associated with it. As an example, consider once again the weather dataset used to explain association learning.

Now suppose that, instead of predicting from the given features whether a game will be played, we predict a numeric quantity (e.g. …). The same scenario can again matter to a farmer, who could predict a numeric quantity such as how long, or how much, rain will fall on a particular day. Data mining usually refers to automated pattern discovery and prediction from large volumes of data using ML techniques [21].
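Numeric prediction can be sketched with ordinary least-squares linear regression on a single feature. The humidity and rainfall figures are hypothetical, chosen to lie exactly on a line so the result is easy to check. A real rainfall model would need many more features and observations.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y approx slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: relative humidity (%) and rainfall (mm) on past days.
HUMIDITY = [60.0, 70.0, 80.0, 90.0]
RAINFALL = [2.0, 5.0, 8.0, 11.0]

if __name__ == "__main__":
    slope, intercept = fit_line(HUMIDITY, RAINFALL)
    # The output is a number (millimetres of rain), not a class label.
    print(f"predicted rainfall at 85% humidity: {slope * 85 + intercept:.1f} mm")
```

Here the fitted line is rainfall = 0.3 × humidity − 16, so a day at 85% humidity gets a prediction of 9.5 mm. A classifier, by contrast, could only answer a yes/no question such as "will it rain?".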

The term data mining is also used for online analytical processing (OLAP) or SQL queries, which retrospectively search a large database to answer a specific query.

OLAP queries, also known as decision-support queries, are typically complex and expensive: they take a long time to run and touch large amounts of data. This knowledge can take the form of brief, concise visual reports, a predicted value, or a model of a larger data-generating system [22].
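A decision-support query of this kind can be sketched with Python's built-in `sqlite3` module. The `harvest` table, its columns and its rows are hypothetical. The point is that the query aggregates over every row along several dimensions rather than fetching a single record.

```python
import sqlite3

# Hypothetical fact table: schema, regions, crops and figures are illustrative only.
ROWS = [
    ("north", "wheat", 2019, 120.0),
    ("north", "rice", 2019, 80.0),
    ("south", "wheat", 2019, 60.0),
    ("north", "wheat", 2020, 130.0),
    ("south", "rice", 2020, 90.0),
]

# A decision-support style query: it scans the whole table and aggregates it
# along two dimensions (region, year) instead of looking up one record.
AGGREGATE_SQL = """
    SELECT region, year, SUM(tonnes) AS total_tonnes
    FROM harvest
    GROUP BY region, year
    ORDER BY region, year
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding the hypothetical fact table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE harvest (region TEXT, crop TEXT, year INTEGER, tonnes REAL)")
    conn.executemany("INSERT INTO harvest VALUES (?, ?, ?, ?)", ROWS)
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    for region, year, total in conn.execute(AGGREGATE_SQL):
        print(region, year, total)
    conn.close()
```

On five rows this is instant. The same shape of query over billions of rows is what makes OLAP workloads slow and expensive.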

Data science is an interdisciplinary field in which different KDD techniques and processes are studied. With the advent of big data and Web 2.0 … This unstructured data differs from structured data in that it cannot be stored in an organized fashion in conventional relational databases.

Storing and accessing unstructured data requires different approaches and techniques. NoSQL, or non-relational, databases have been developed for this purpose [23]. Systems such as Amazon's Dynamo [24] and Google's Bigtable [25] adopt this approach for storing and accessing their data.

Besides storing unstructured data, the main advantage of these NoSQL databases is that they are distributed. This makes them easily scalable, fast and flexible compared with their relational counterparts. One concern in using NoSQL databases, though, is that they usually do not inherently support the ACID properties (atomicity, consistency, isolation and durability) that relational databases provide.
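One technique that makes such systems easy to scale out is consistent hashing. The Dynamo paper [24] describes using it to partition keys across storage nodes. The sketch below is a generic, simplified version of that idea, not Dynamo's implementation. The node names, the use of MD5 and the number of virtual points per node (`replicas`) are assumptions. Its key property is that removing a node only moves the keys that node owned.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    """Map a string to a position on the ring (MD5 gives a stable spread)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hashing: a key lives on the first node point clockwise from it."""

    def __init__(self, nodes: List[str], replicas: int = 50):
        self.replicas = replicas  # virtual points per node, to even out the load
        self._points: List[int] = []      # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove(self, node: str) -> None:
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            self._points.remove(point)
            del self._owner[point]

    def node_for(self, key: str) -> str:
        if not self._points:
            raise ValueError("ring has no nodes")
        # Walk clockwise to the next point, wrapping around the end of the ring.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    print(ring.node_for("user42"))
```

With plain `hash(key) % number_of_nodes`, changing the number of nodes would remap almost every key. Consistent hashing limits the movement to the keys owned by the added or removed node.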

Predictive analytics refers to a technology that aims to provide a competitive advantage by predicting some future occurrences or behavior using data mining and ML techniques based on past experience in the form of collected data.

Predictive analytics encompasses data science, machine learning, and predictive and statistical modeling, and outputs empirical predictions based on given input empirical data [26]. The underlying premise is that the future can be predicted on the basis of past experience. Predictive analytics finds applications in various humanitarian development fields ranging from healthcare to education.
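As a minimal example of predicting the future from past data, the sketch below uses simple exponential smoothing, one of the simplest forecasting methods. It is chosen here for illustration and is not taken from [26]. The smoothing factor `alpha` is an assumed parameter: larger values weight recent observations more heavily.

```python
from typing import Sequence


def exponential_smoothing_forecast(history: Sequence[float], alpha: float = 0.5) -> float:
    """One-step-ahead forecast with simple exponential smoothing.

    Each new observation pulls the running level toward itself by a factor
    alpha, so recent experience counts more than older observations.
    """
    if not history:
        raise ValueError("history must not be empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1.0 - alpha) * level
    return level  # forecast for the next period


if __name__ == "__main__":
    print(exponential_smoothing_forecast([10.0, 20.0, 30.0]))
```

For instance, with `alpha = 0.5`, the history `[10, 20, 30]` (say, hypothetical monthly clinic visits) gives a forecast of 22.5 for the next month.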

The applications of predictive analytics are discussed in more detail in the upcoming sections.

Crowdsourcing is different from outsourcing. In crowdsourcing, a task or job is outsourced not to a designated professional or organization but to the general public, in the form of an open call [27]. Crowdsourcing can be used to gather data from various sources such as text messages, social media updates and blogs.

This data can then be harmonized and analyzed to map disaster-struck regions and to help launch search operations. This technique helped during the Haiti earthquake [28]. The opportunities that social-media-based crowdsourcing offers for disaster relief, and the challenges faced in the process, are discussed in [29].

The Internet of Things (IoT) is an emerging field fueled by the interest in big data, the emergence of network science [30], the proliferation of digital communication devices, and ubiquitous Internet access among the general population.

In IoT, sensors and actuators are connected via a network to various computing systems, providing data from which actionable knowledge can be derived. In this way, IoT, big data and network science are all related.
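The sensor-to-action loop can be sketched as follows. A hypothetical temperature sensor streams readings, and a monitor switches an actuator (here, a ventilation fan) on only when the average over a short sliding window exceeds a threshold. That way, a single noisy spike does not trigger it. The window size, threshold and readings are illustrative assumptions.

```python
from collections import deque
from statistics import mean
from typing import Deque


class SensorMonitor:
    """Turns a raw stream of sensor readings into on/off actuator decisions."""

    def __init__(self, window: int = 3, threshold: float = 30.0):
        self.readings: Deque[float] = deque(maxlen=window)  # keeps the latest readings
        self.threshold = threshold

    def on_reading(self, value: float) -> bool:
        """Record one reading; return True if the actuator should be switched on."""
        self.readings.append(value)
        window_full = len(self.readings) == self.readings.maxlen
        return window_full and mean(self.readings) > self.threshold


if __name__ == "__main__":
    monitor = SensorMonitor(window=3, threshold=30.0)
    for reading in [25.0, 40.0, 22.0, 28.0, 33.0, 36.0]:  # hypothetical degrees C
        print(reading, "fan on" if monitor.on_reading(reading) else "fan off")
```

With these readings, the isolated 40.0 spike never switches the fan on; only the sustained rise at the end does.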

Architect and Build Big Data Applications

RDA Outputs are the technical and social infrastructure solutions that enable data sharing, exchange and interoperability.

Chapter 9 Survey on Big Data Applications

Voice-based services such as mobile banking, access to personal devices, and logging into soci… (Journal of Big Data 8, research article, published 2 March)

Massive amounts of sensor and textual data await stakeholders in the energy and transport sectors once the digital transformation of these sectors reaches its tipping point. This chapter defines big data application scenarios through examples in different segments of the energy and transport sectors. Merely using existing big data technologies, as employed by online businesses, will not be sufficient. Domain-specific big data technologies are needed for cyber-physical energy and transport systems, and the focus needs to move beyond big data to smart data technologies. Unless the need for privacy and confidentiality is satisfied, there will always be regulatory uncertainty and barriers to user acceptance of new data-driven offerings.

Big Data in the Energy and Transport Sectors

Big Data - Definition, Importance, Examples & Tools

Big data is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processing application software. Data with many entries (rows) offer greater statistical power, while data with higher complexity (more attributes or columns) may lead to a higher false discovery rate. Big data was originally associated with three key concepts: volume, variety, and velocity.

Building Big Data Applications helps data managers and their organizations make the most of unstructured data with an existing data warehouse. It provides readers with what they need to know to make sense of how Big Data fits into the world of Data Warehousing. Readers will learn about infrastructure options and integration, and come away with a solid understanding of how to leverage various architectures for integration. The book includes a wide range of use cases that will help data managers visualize reference architectures in the context of specific industries (healthcare, big oil, transportation, software, etc.).

Big data is the emerging field where innovative technology offers new ways to extract value from the tsunami of available information. As with any emerging area, terms and concepts can be open to different interpretations. The Big Data domain is no different. The Big Data Value Chain is introduced to describe the information flow within a big data system as a series of steps needed to generate value and useful insights from data. The value chain enables the analysis of big data technologies for each step within the chain.


…and then, based on a survey of around … cases of applications of big data and machine learning throughout all phases of building design.


They are not general purpose applications that typically run on general purpose file systems. Architecture of HBase. With this hands-on guide, you'll learn how to architect, design, and deploy your own HBase applications by examining real-world solutions.


The goal of this chapter is to shed light on the different types of big data applications needed in various industries, including healthcare, transportation, energy, banking and insurance, digital media and e-commerce, environment, safety and security, telecommunications, and manufacturing. In response to the problems of analyzing large-scale data, different tools, techniques, and technologies have been developed and are available for experimentation. In our analysis, we focused on literature review articles accessible via the Elsevier ScienceDirect service and the Springer Link service, mainly from the last two decades. For the selected industries, this chapter also discusses challenges that can be addressed and overcome using the semantic processing approaches and knowledge reasoning approaches discussed in this book. RQ1: What are the main application areas of big data analytics, and the specific data processing aspects that drive value, for a selected industry domain?

