Discuss@GL4L

Python Packages



A module is a file containing Python definitions and statements. Modules define functions, methods, and new Python types that solve particular problems.

A package is a collection of modules organized in directories. Many packages are available for Python, covering different problems. For example, “NumPy”, “matplotlib”, “seaborn”, and “scikit-learn” are widely used data science packages.

  • “NumPy” is used for efficiently working with arrays
  • “matplotlib” and “seaborn” are popular libraries used for data visualization
  • “scikit-learn” is a powerful library for machine learning

Some packages come with Python by default, but many of the packages we need do not. To use a package, we must either have it installed already or install it with pip (the package installer for Python).
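Before installing anything, it can help to check whether a package is already available. The snippet below is a minimal sketch using the standard library's importlib.util.find_spec; the helper name is_installed and the made-up package name are assumptions for illustration only. Note that the import name can differ from the name used with pip (for example, scikit-learn is imported as sklearn).

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package with this import name can be found."""
    return importlib.util.find_spec(package_name) is not None


# "json" ships with Python; "not_a_real_package_xyz" is a made-up name.
for name in ["json", "numpy", "not_a_real_package_xyz"]:
    if is_installed(name):
        print(name, "-> installed")
    else:
        print(name, "-> missing, try: pip install", name)
```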

However, there is also something called “Anaconda”.

Anaconda Distribution is a free, easy-to-install package manager, environment manager and Python distribution with a collection of 1,000+ open source packages with free community support.

So, if you don’t want to install many packages one by one, I recommend using “Anaconda”. This distribution includes many useful packages.

Import Statements

Once you have installed the packages you need, you can import them into your Python files. We can import an entire package, a submodule, or specific functions from it. We can also give a package an alias. The examples below show the different forms of the import statement.

Simple Import Statement:
import numpy
numbers = numpy.array([3, 4, 20, 15, 7, 19, 0])

Import Statement With an Alias:
import numpy as np # np is an alias for the numpy package
numbers = np.array([3, 4, 20, 15, 7, 19, 0]) # works fine
numbers = numpy.array([3, 4, 20, 15, 7, 19, 0]) # NameError: name 'numpy' is not defined

Import Submodule From a Package With an Alias:
# import the "pyplot" submodule from the "matplotlib" package with alias "plt"
import matplotlib.pyplot as plt

Import Only One Function From a Package:
from numpy import array
numbers = array([3, 4, 20, 15, 7, 19, 0]) # works fine
numbers = numpy.array([3, 4, 20, 15, 7, 19, 0]) # NameError: name 'numpy' is not defined
type(numbers) # numpy.ndarray

We can also write from numpy import * . The asterisk means “import everything from that module”. This statement creates references in the current namespace to all public objects defined by the numpy module. In other words, we can use every public NumPy function by its bare name, without a prefix. For example, we can call NumPy’s absolute function as absolute() instead of numpy.absolute() .
However, I don’t recommend this because:

  • If we import everything from several modules this way, the current namespace fills up with names, and someone reading our code can’t easily tell which package a given function comes from.
  • If two modules define a function with the same name, the later import silently replaces the earlier one.
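The name clash is easy to reproduce with two standard-library modules. This is only an illustrative sketch: math and cmath are used here (instead of NumPy) because both define a function called sqrt.

```python
import math
import cmath

from math import *   # sqrt now refers to math.sqrt
from cmath import *  # sqrt is silently rebound to cmath.sqrt

# The bare name points at the function from the module imported last.
print(sqrt is cmath.sqrt)  # True
print(sqrt(4))             # (2+0j), a complex number rather than 2.0
print(math.sqrt(4))        # 2.0, the prefixed call is unambiguous
```

With prefixed calls such as math.sqrt() and cmath.sqrt(), both functions stay available and the reader can see where each one comes from.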

NumPy

NumPy is a fundamental package for scientific computing with Python. It’s very fast and easy to use. This package lets us perform calculations element-wise (element by element).

Regular Python lists don’t support element-wise operations. We can still use lists, but they’re slower for numeric work and need more code to get the same result. In most cases, NumPy is the better choice.
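To make the difference concrete, here is a small sketch that doubles every value in a list, first with plain Python and then with NumPy. The variable names are made up for illustration.

```python
import numpy as np

prices = [10, 20, 30]

# On a list, * means repetition, not multiplication.
repeated = prices * 2

# Element-wise work on a list needs an explicit loop or comprehension.
doubled_list = [p * 2 for p in prices]

# On a NumPy array, * is applied to every element.
doubled_array = np.array(prices) * 2

print(repeated)       # [10, 20, 30, 10, 20, 30]
print(doubled_list)   # [20, 40, 60]
print(doubled_array)  # [20 40 60]
```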

Unlike a regular Python list, a NumPy array always has a single element type. If we pass a list with elements of different types to np.array() , we can choose the type we want with the dtype parameter. If this parameter is not given, NumPy picks the minimum type required to hold all the objects.

NumPy Array Type Conversion:
np.array([False, 42, "Data Science"]) # array(['False', '42', 'Data Science'], dtype='<U21') on a typical 64-bit build; the string width may differ
np.array([False, 42], dtype=int) # array([ 0, 42])
np.array([False, 42, 53.99], dtype=float) # array([ 0.  , 42.  , 53.99])

Invalid Conversion:

np.array([False, 42, "Data Science"], dtype=float) # ValueError: could not convert string to float: 'Data Science'
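To see which type NumPy actually chose, we can inspect the array's dtype attribute. The following is a minimal sketch with made-up variable names; the exact integer width (for example int64 or int32) depends on the platform, so it is not shown as expected output.

```python
import numpy as np

ints = np.array([1, 2, 3])                     # all integers
mixed = np.array([1, 2.5, True])               # int, float and bool together
as_float = np.array([False, 42], dtype=float)  # type requested explicitly

print(ints.dtype)   # an integer type; the width depends on the platform
print(mixed.dtype)  # float64
print(as_float)     # [ 0. 42.]

# An impossible conversion raises ValueError, which we can catch.
conversion_failed = False
try:
    np.array([False, 42, "Data Science"], dtype=float)
except ValueError as error:
    conversion_failed = True
    print("ValueError:", error)
```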

A NumPy array comes with its own attributes and methods. Remember that operators in Python behave differently on different data types? In NumPy, the operators behave element-wise.

Operators on NumPy Array:
np.array([37, 48, 50]) + 1 # array([38, 49, 51])
np.array([20, 30, 40]) * 2 # array([40, 60, 80])
np.array([42, 10, 60]) / 2 # array([ 21., 5., 30.])

np.array([1, 2, 3]) * np.array([10, 20, 30]) # array([10, 40, 90])
np.array([1, 2, 3]) - np.array([10, 20, 30]) # array([ -9, -18, -27])

If we check the type of a NumPy array, the result is numpy.ndarray . Ndarray means n-dimensional array. The examples above used 1-dimensional arrays, but nothing stops us from making arrays with 2, 3, 4, or more dimensions. We can subset an array no matter how many dimensions it has. I’ll show you some examples with a 2-dimensional array.

Subsetting 2-Dimensional Arrays:
numbers = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12]
])

numbers[2, 1] # 8
numbers[-1, 0] # 10
numbers[0] # array([1, 2, 3])
numbers[:, 0] # array([ 1, 4, 7, 10])
numbers[0:3, 2] # array([3, 6, 9])
numbers[1:3, 1:3] # array([[5, 6],[8, 9]])
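The same indexing rules carry over to more dimensions: one index (or slice) per dimension, separated by commas. Here is a small sketch with a 3-dimensional array. It uses np.arange and reshape, which are not covered above, only as a quick way to build the example data; the name cube is made up.

```python
import numpy as np

# 24 consecutive numbers arranged as 2 blocks of 3 rows and 4 columns.
cube = np.arange(24).reshape(2, 3, 4)

first_block = cube[0]   # a 2-dimensional array with 3 rows and 4 columns
single = cube[1, 0, 2]  # block 1, row 0, column 2
corner = cube[:, 0, 0]  # first element of every block

print(first_block.shape)  # (3, 4)
print(single)             # 14
print(corner)             # [ 0 12]
```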

To see how many dimensions an array has and how many elements each dimension holds, we can use the shape attribute. For 2-dimensional arrays, the first element of the tuple is the number of rows and the second is the number of columns.

NumPy Shape Attribute:
numbers = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12],
    [13, 14, 15]
])

numbers.shape # (5, 3)
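Because shape is an ordinary tuple, we can unpack it, and its length equals the number of dimensions, which NumPy also exposes as the ndim attribute. A short sketch with made-up variable names:

```python
import numpy as np

vector = np.array([3, 4, 20])
table = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

print(vector.shape, vector.ndim)  # (3,) 1
print(table.shape, table.ndim)    # (2, 3) 2

# Unpack the shape of a 2-dimensional array into rows and columns.
rows, columns = table.shape
print(rows, columns)              # 2 3
```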