Data remains as raw text until it is mined and the information contained within it is harnessed. Mining data to make sense out of it has applications in varied fields of industry and academia. In this article, we explore the best open source tools that can aid us in data mining.
Data mining, also known as knowledge discovery in databases, is the process of analysing enormous amounts of data and extracting information from it. Data mining can quickly answer business questions that would otherwise take a lot of time to resolve. Its applications include market segmentation (identifying the characteristics of customers who buy a certain product from a certain brand), fraud detection (identifying transaction patterns that are likely to indicate online fraud), and market basket and trend analysis (finding which products or services are frequently purchased together). This article focuses on the various open source options available and their significance in different contexts.
A brief look at mining tasks
For those who are new to data mining, let’s take a brief look at some of the common mining tasks.
Pre-processing: This involves all the preliminary tasks that prepare the data for the actual mining tasks. Pre-processing can include removing anomalies and noise from the data that's about to be mined, filling in missing values, normalising the data, or compressing it using techniques like generalisation and aggregation.
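Two of the pre-processing steps mentioned above, filling in missing values and normalising, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ages` column is made up, mean imputation is just one of several ways to fill gaps, and min-max scaling is just one of several normalisation schemes.

```python
from statistics import mean

def fill_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_normalise(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no range to scale over.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical column with one missing value (assumption for illustration).
ages = [20, None, 40, 60]
filled = fill_missing(ages)
scaled = min_max_normalise(filled)
print(filled)
print(scaled)
```

Real data sets usually need the fill strategy chosen per column; a median or a category-specific mean is often a safer choice when the data is skewed.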
Clustering: This partitions a large set of data into groups of related items, without the groups being defined in advance.
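As a rough illustration of clustering, the sketch below runs a bare-bones k-means (Lloyd's algorithm) on one-dimensional data. The article does not name a clustering algorithm, so k-means is an assumption here, as are the `spend` figures and the hand-picked starting centroids.

```python
def k_means_1d(points, centroids, iterations=10):
    """Group 1-D points around the given starting centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters, centroids

# Hypothetical monthly spend figures (assumption for illustration).
spend = [1, 2, 3, 10, 11, 12]
clusters, centres = k_means_1d(spend, centroids=[1, 12])
print(clusters)
print(centres)
```

Note that no labels are supplied: the two groups emerge from the distances between the points alone.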
Classification: This assigns data items to different user-defined categories.
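Classification can be sketched with a one-nearest-neighbour rule, which gives a new item the label of the most similar labelled example. The choice of algorithm, the two features and the segment names below are all assumptions made for illustration; the article does not prescribe a classifier.

```python
import math

def nearest_neighbour(train, query):
    """Return the label of the training example closest to the query point."""
    _features, label = min(train, key=lambda ex: math.dist(ex[0], query))
    return label

# Hypothetical labelled examples:
# (monthly visits, average basket value) -> customer segment.
train = [
    ((1, 10), "occasional"),
    ((2, 15), "occasional"),
    ((8, 60), "loyal"),
    ((9, 70), "loyal"),
]

print(nearest_neighbour(train, (7, 55)))
print(nearest_neighbour(train, (2, 12)))
```

Unlike clustering, the categories here are fixed in advance by whoever labelled the training examples.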
Outlier analysis: This helps in identifying data elements that deviate from, or lie far from, the rest of the elements in a data set. This can help in anomaly detection.
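One simple way to flag such deviant elements is a z-score test: measure how many standard deviations each value lies from the mean and report those beyond a threshold. The sketch below assumes this particular technique, a threshold of 2, and made-up transaction amounts; none of these come from the article.

```python
from statistics import mean, pstdev

def z_score_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing can be an outlier.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical transaction amounts (assumption for illustration).
amounts = [10, 12, 11, 13, 12, 11, 10, 95]
outliers = z_score_outliers(amounts)
print(outliers)
```

The z-score is sensitive to the very outliers it is looking for, because they inflate both the mean and the standard deviation. On small or heavily skewed samples, a median-based measure may behave better.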
Associative analysis: This helps in bringing out hidden relationships among data items in a large data set. It can help in predicting the occurrence of a particular item in a transaction or an event whenever some other item is present. You can think of this as a conditional probability.
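The conditional-probability view can be made concrete with the two measures commonly used for association rules: support (how often a set of items appears) and confidence (how often the consequent appears given the antecedent). The sketch below is a minimal illustration; the `baskets` data and the function names are assumptions, not part of any particular tool.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        # The antecedent never occurs, so the rule cannot be evaluated.
        return 0.0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base

# Hypothetical shopping baskets (assumption for illustration).
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "jam"},
    {"milk"},
]

print(support(baskets, {"bread"}))
print(confidence(baskets, {"bread"}, {"butter"}))
```

Here bread appears in three of the four baskets and bread with butter in two, so the rule "bread implies butter" holds in two out of three of the baskets that contain bread.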
This story is from the February 2017 edition of Open Source For You.