# Tricks with Boolean Series

This notebook discusses Boolean series, and various useful things you can do with them.

First let's import our libraries:

```
import pandas as pd
import numpy as np
```

```
rng = np.random.default_rng(20201103)
```

## Computing Probability

If we have a logical series:

```
xb = pd.Series([True, False, True, True, True, False, False, True, False, True])
xb
```

```
0 True
1 False
2 True
3 True
4 True
5 False
6 False
7 True
8 False
9 True
dtype: bool
```

We can **count** the number of `True` values with `sum`:

```
xb.sum()
```

```
6
```

We can compute the **fraction** of `True` values, or the probability of `True`, with `mean()`:

```
xb.mean()
```

```
0.6
```

The NumPy equivalents also work:

```
np.mean(xb)
```

```
0.6
```
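Both results follow from `True` counting as 1 and `False` as 0 in arithmetic. The sketch below rebuilds the same series so it runs on its own and spells the relationship out; the names `n_true`, `n_total`, `frac_true`, and `fractions` are introduced here only for illustration.

```python
import pandas as pd

xb = pd.Series([True, False, True, True, True, False, False, True, False, True])

n_true = int(xb.sum())        # True counts as 1, False as 0
n_total = len(xb)
frac_true = n_true / n_total  # the same quantity that xb.mean() reports

# value_counts(normalize=True) gives the fraction of each distinct value
fractions = xb.value_counts(normalize=True)

print(n_true, n_total, frac_true)
print(fractions)
```

`value_counts(normalize=True)` is handy when you want the fractions of `True` and `False` side by side.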

## Creating with Logical Operations

If you apply a comparison operator (`==`, `!=`, `<`, `>`, `<=`, or `>=`) to a series, comparing it to either a fixed value or another series, you will get a Boolean series.

This is very useful for creating the outcome vector for a logistic regression: the Boolean series will be treated as 1 (`True`) and 0 (`False`), and can be predicted with the logistic regression in either StatsModels or scikit-learn, or with another classifier.
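If a particular tool prefers an explicit numeric outcome, the Boolean series can be converted with `astype(int)`. This is a minimal sketch on invented data: the `scores` values and the pass mark of 60 are hypothetical, not part of this notebook.

```python
import pandas as pd

# Hypothetical data: exam scores with an invented pass mark of 60
scores = pd.Series([48, 72, 65, 59, 90, 60])

# Comparing against a fixed value yields a Boolean outcome series
passed = scores >= 60

# Explicit 0/1 encoding of the same outcome
passed01 = passed.astype(int)

print(passed.tolist())
print(passed01.tolist())
```

The Boolean and integer versions carry the same information, so which one to pass along is mostly a matter of what the downstream library expects.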

For example, if we draw some random numbers:

```
xs = pd.Series(rng.random(size=1000))
xs
```

```
0 0.163352
1 0.333021
2 0.054169
3 0.662442
4 0.517284
...
995 0.882045
996 0.897472
997 0.161948
998 0.850922
999 0.222618
Length: 1000, dtype: float64
```

```
xs.hist()
```

```
<matplotlib.axes._subplots.AxesSubplot at 0x22e1f5277f0>
```

```
xs.describe()
```

```
count 1000.000000
mean 0.491382
std 0.285871
min 0.000805
25% 0.235146
50% 0.489305
75% 0.737922
max 0.997661
dtype: float64
```

We can get a series that is `True` when the number is at least 0.8:

```
xs_ge = xs >= 0.8
xs_ge
```

```
0 False
1 False
2 False
3 False
4 False
...
995 True
996 True
997 False
998 True
999 False
Length: 1000, dtype: bool
```

We can verify that the smallest selected value is at least 0.8:

```
xs[xs_ge].min()
```

```
0.8050954039782324
```

Since the values are drawn uniformly from the range \([0,1)\), approximately 20% of them should be at least 0.8 (\(P(X \ge 0.8) = 0.2\)). Let's check:

```
xs_ge.mean()
```

```
0.183
```
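The observed 0.183 is a little below 0.2, which is ordinary sampling variation. One rough way to judge the gap, sketched here as a back-of-the-envelope check rather than a formal test, is the standard error of a sample proportion, \(\sqrt{p(1-p)/n}\). The variable names are illustrative.

```python
import math

n = 1000          # number of draws, as above
p = 0.2           # theoretical P(X >= 0.8) for a uniform draw on [0, 1)
observed = 0.183  # the fraction computed above

# Standard error of a sample proportion: sqrt(p * (1 - p) / n)
se = math.sqrt(p * (1 - p) / n)

# How many standard errors the observed fraction sits from the expected one
z = (observed - p) / se

print(round(se, 4), round(z, 2))
```

The standard error comes out near 0.013, so the observed fraction is about 1.3 standard errors below the expected value, well within what a sample of 1000 would typically produce.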

## Logical Operations

The bitwise negation operator, `~`, negates a Boolean series:

```
~xb
```

```
0 False
1 True
2 False
3 False
4 False
5 True
6 True
7 False
8 True
9 False
dtype: bool
```

The `np.logical_not` function does the same thing:

```
np.logical_not(xb)
```

```
0 False
1 True
2 False
3 False
4 False
5 True
6 True
7 False
8 True
9 False
dtype: bool
```
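The `~` operator is needed because Python's `not` keyword asks for a single truth value, which a multi-element series does not have. A small sketch of the difference, on a short series made up for this example:

```python
import pandas as pd

xb_small = pd.Series([True, False, True])

# `not` needs one truth value for the whole series, so pandas raises ValueError
try:
    not xb_small
    raised = False
except ValueError:
    raised = True

# `~` works elementwise and returns a new Boolean series
negated = ~xb_small

print(raised)
print(negated.tolist())
```

The same reasoning applies to `and` and `or`, which is why the elementwise operators are used for series.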

We can combine Boolean series with bitwise and (`&`) or bitwise or (`|`).

Let's find all the values between 0.8 and 0.9:

```
xs_in_range = xs_ge & (xs < 0.9)
xs[xs_in_range].describe()
```

```
count 87.000000
mean 0.849424
std 0.028517
min 0.805095
25% 0.826880
50% 0.845260
75% 0.878704
max 0.897472
dtype: float64
```
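The parentheses around `xs < 0.9` matter, because `&` binds more tightly than the comparison operators in Python. For a range check like this one, `Series.between` is an alternative. The sketch below redraws the numbers so it runs on its own, and it assumes pandas 1.3 or newer, where `inclusive` accepts string values such as `"left"`.

```python
import numpy as np
import pandas as pd

# Redraw the numbers so this block is self-contained
rng = np.random.default_rng(20201103)
xs = pd.Series(rng.random(size=1000))

# Each comparison is parenthesized because & has higher precedence than >= and <
mask = (xs >= 0.8) & (xs < 0.9)

# inclusive="left" keeps the lower bound and drops the upper: 0.8 <= x < 0.9
# (assumes pandas 1.3+, where `inclusive` takes a string)
mask_between = xs.between(0.8, 0.9, inclusive="left")

print(mask.equals(mask_between))
```

Both masks select the same rows; `between` just states the interval in one call.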

We can find everything *except* \([0.2,0.8)\):

```
xs_lohi = xs_ge | (xs < 0.2)
xs[xs_lohi].hist()
```

```
<matplotlib.axes._subplots.AxesSubplot at 0x22e1edadeb0>
```

Look, we cut out the middle!
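The "everything except" mask can also be written as the negation of the "inside" mask, by De Morgan's law: not (A and B) is the same as (not A) or (not B). A minimal sketch, redrawing the numbers so it is self-contained; `xs_mid`, `same`, and `total` are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Redraw the numbers so this block is self-contained
rng = np.random.default_rng(20201103)
xs = pd.Series(rng.random(size=1000))

xs_lohi = (xs >= 0.8) | (xs < 0.2)  # outside [0.2, 0.8)
xs_mid = (xs >= 0.2) & (xs < 0.8)   # inside [0.2, 0.8)

# De Morgan: negating the "inside" mask gives the "outside" mask
same = xs_lohi.equals(~xs_mid)

# Every value falls in exactly one group, so the two fractions add up to 1
total = xs_lohi.mean() + xs_mid.mean()

print(same, total)
```

Writing the mask as `~xs_mid` can be easier to read when the excluded region is simpler to describe than what remains.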