Making Python colorful using Rich Text Formatting
Rich is an open-source Python library used for beautiful text formatting. It contains a large variety of formatting options: whether it is the font, the color of the text, or the style of the text, Rich has it all covered. It works not only in Jupyter notebooks; Rich can make even your console colorful using different color schemes and functions.
Rich requires no additional dependencies and is easy to use. It supports all major operating systems and works in their terminals.
In this article, we will explore Rich and see how we can make our Jupyter notebook and console colorful. We will also explore the different functions and options that Rich provides. …
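As a small taste of what the article covers, here is a minimal sketch using Rich's `Console` (the strings and styles are just illustrative); `record=True` is used only so the rendered output can be inspected as plain text:

```python
from rich.console import Console

# record=True keeps a copy of everything printed so it can be exported later
console = Console(record=True)

# inline markup tags style individual words
console.print("Hello, [bold red]World[/bold red]!")

# the style= argument styles the whole line
console.print("Styled line", style="italic cyan")

# the recorded output as plain text, markup resolved and ANSI codes stripped
plain = console.export_text()
```

In a terminal or Jupyter cell you would normally use a plain `Console()` and simply watch the colored output appear.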
Pywedge is an open-source Python library and a complete package that helps you visualize data, pre-process it, and create baseline models that can then be tuned into the best machine learning model for the data.
It is a hassle-free package that saves the time and effort of creating different types of plots to analyze the data. It contains 8 different types of visualization, available through Pywedge's user-friendly GUI. It also takes care of all the pre-processing the data may require: whether it is cleaning the data, normalization, or handling class imbalance, Pywedge covers it all. …
A GAN (Generative Adversarial Network) is an unsupervised learning algorithm built on neural networks: it generates new samples resembling the training images while also learning to distinguish them from the originals. For this, it uses two neural networks: the first is a generator, and the second is a discriminator.
In this article, we will use a GAN to create a cartoon version of an image that we provide to the model. We will generate different cartoons by training the model for different numbers of epochs.
Before starting, kindly follow me on Medium to stay updated about my new articles in the field of data science. …
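The generator-versus-discriminator idea can be sketched in a toy one-dimensional setting. This is a plain-NumPy illustration of the adversarial training loop, not the cartoon-generating model from the article, and every number in it is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# generator G(z) = w_g * z + b_g tries to mimic real samples from N(3, 1);
# discriminator D(x) = sigmoid(w_d * x + b_d) tries to tell real from fake
w_g, b_g = 0.1, 0.0
w_d, b_d = 0.1, 0.0
lr = 0.05

for step in range(2000):
    real = rng.normal(3.0, 1.0, size=64)   # samples from the target distribution
    z = rng.normal(0.0, 1.0, size=64)      # generator input noise
    fake = w_g * z + b_g

    # discriminator ascent on log D(real) + log(1 - D(fake))
    d_real = sigmoid(w_d * real + b_d)
    d_fake = sigmoid(w_d * fake + b_d)
    w_d += lr * (np.mean((1 - d_real) * real) - np.mean(d_fake * fake))
    b_d += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # generator ascent on log D(fake) (the non-saturating objective)
    d_fake = sigmoid(w_d * fake + b_d)
    w_g += lr * np.mean((1 - d_fake) * w_d * z)
    b_g += lr * np.mean((1 - d_fake) * w_d)
```

After training, the generator's offset `b_g` should have drifted from 0 toward the real mean of 3: the adversarial game in miniature.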
In linear regression, we try to find the best-fit line that captures the relationship between the target and the feature variable. In simpler terms, it helps us find the value of a dependent variable Y for every value of X with the help of two estimators known as the slope and the intercept.
In machine learning, linear regression is considered the most basic problem to start with: because a linear regression model is so easily interpretable, almost every machine learning enthusiast begins by building one.
In this article, we will see how we can use a neural network to solve linear regression, but without Keras: we will create the model using only native Python and NumPy. …
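A minimal sketch of the idea, assuming a single synthetic feature: the "network" here is one weight and one bias, trained by gradient descent on mean squared error:

```python
import numpy as np

rng = np.random.default_rng(42)

# synthetic data: y = 4x + 2 plus a little noise
X = rng.uniform(-1, 1, size=200)
y = 4 * X + 2 + rng.normal(0, 0.1, size=200)

w, b = 0.0, 0.0   # the "network": a single weight (slope) and bias (intercept)
lr = 0.1

for epoch in range(500):
    y_pred = w * X + b                 # forward pass
    error = y_pred - y
    grad_w = 2 * np.mean(error * X)    # gradients of mean squared error
    grad_b = 2 * np.mean(error)
    w -= lr * grad_w                   # gradient descent step
    b -= lr * grad_b
```

After training, `w` and `b` should land close to the true slope 4 and intercept 2.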
Feature variables play an important role in creating predictive models, whether regression or classification. Having a large number of features is not good, because it may lead to overfitting, making our model fit only the specific data on which it was trained. A large number of features also brings on the curse of dimensionality, i.e. the features increase the dimensions of the problem's search space.
Feature importance is a technique that provides a relevance score for every feature variable, which we can use to decide which features are most important and which are least important for predicting the target variable. …
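One simple way to obtain such scores is permutation importance: shuffle one feature at a time and measure how much the model's error grows. A NumPy-only sketch on assumed synthetic data (a plain least-squares model stands in for whatever model you actually use):

```python
import numpy as np

rng = np.random.default_rng(0)

# three features, but only the first two actually drive the target
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(0, 0.1, size=500)

# fit ordinary least squares (with an intercept column)
design = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

def mse(features):
    pred = np.column_stack([features, np.ones(len(features))]) @ coef
    return np.mean((pred - y) ** 2)

baseline = mse(X)
importance = []
for j in range(X.shape[1]):
    shuffled = X.copy()
    shuffled[:, j] = rng.permutation(shuffled[:, j])  # break feature-target link
    importance.append(mse(shuffled) - baseline)       # error increase = importance
```

The strongest feature (coefficient 3) should show the largest error increase, and the irrelevant third feature a score near zero.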
OpenCV is an open-source computer vision and machine learning library with Python bindings. It is mainly aimed at real-time computer vision and image processing, and is used to perform different operations that transform images using different techniques.
It supports all the major languages and platforms: Python, C++, Java, Android, etc. It is easy to use and in demand due to its features, and is used for creating image-processing and rendering applications in different languages.
In this article, we will try to perform some image transformations using cv2, OpenCV's Python interface. …
Data visualization plays an important role in data analysis because, as soon as the human eye sees a chart or graph, it starts looking for patterns in it.
Data visualization means visually representing data using different plots, graphs, and charts to find the patterns, outliers, and relations between the different attributes of a dataset. It is a graphical representation of the data.
We can perform data visualization using PySpark but before that, we need to set it up on our local machine. …
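A minimal local setup might look like the following (the Java path is only an example and varies by machine; installing the `pyspark` package from PyPI bundles Spark itself):

```shell
# install PySpark into the current Python environment
pip install pyspark

# Spark needs a Java runtime; point JAVA_HOME at yours (example path)
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
```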
SQL is a language used to perform different operations on data, such as storing, manipulating, and retrieving it. It works on relational databases, in which data is stored in the form of rows and columns.
SQL commands can be classified into three types according to their properties:
1. DDL (Data Definition Language)
As the name suggests, DDL commands are used to define the data. The commands included in DDL are CREATE, ALTER, TRUNCATE, DROP, etc.
2. DML (Data Manipulation Language)
Data Manipulation commands are used to alter and update the data according to user requirements. …
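The DDL/DML distinction can be tried out directly with Python's built-in sqlite3 module (the table and column names are made up for this example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
cur = conn.cursor()

# DDL: CREATE defines the structure of the data
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# DML: INSERT and UPDATE manipulate the rows inside that structure
cur.execute("INSERT INTO users (name) VALUES (?)", ("Alice",))
cur.execute("UPDATE users SET name = ? WHERE id = ?", ("Bob", 1))
rows = cur.execute("SELECT name FROM users").fetchall()

# DDL again: DROP removes the table (structure and data) entirely
cur.execute("DROP TABLE users")
conn.close()
```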
Exploratory Data Analysis (EDA) is the most crucial part to begin with whenever we are working with a dataset. It allows us to analyze the data and explore initial findings, such as how many rows and columns there are, what the different columns are, and so on. EDA is an approach in which we summarize the main characteristics of the data using different methods, mainly visualization.
Let’s start EDA using PySpark. If you have not yet installed PySpark, kindly visit the link below and get it configured on your local machine.
Once we have configured PySpark on our machine, we can use a Jupyter notebook to start exploring it. In this article, we will perform EDA operations using PySpark on the Boston dataset, which can be downloaded from Kaggle. Let’s start by importing the required libraries and loading the dataset. …
Apache Spark is a platform for Big Data processing that provides the capability to process data at a huge scale. It is a data analytics engine for big data processing with built-in modules for SQL, machine learning, deep learning, and graph processing.
PySpark was released to provide an interface between Spark and Python, so that both can be used together for fast data processing. PySpark is a Python API for Spark.
In simpler terms, PySpark is a general-purpose distributed computation engine that can run across multiple servers in a coherent way, reading distributed data sets and processing that data based on the code you have written to run within the Spark “engine”. …