In data analysis and data science, consistent column names are essential because most of your work starts with selecting columns by name.
In practice, however, column names are often inconsistent and have issues such as:
- Inconsistent Cases
- Presence of Spaces Between Words
- Unnecessary Characters in Column Names
- Etc.
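To make these issues concrete, here is a small sketch of a DataFrame that shows each of them. The data and the column names are made up for illustration and are not taken from raw_dataset.csv.

```python
import pandas as pd

# Hypothetical data: these column names are invented for illustration
messy = pd.DataFrame({
    "First Name": ["Ann"],   # space between words
    "last name": ["Lee"],    # inconsistent case
    "SALARY($)": [5000],     # unnecessary characters
})

print(messy.columns.tolist())
```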
Let us see how to deal with these issues. The following is an outline of what we will discuss in this tutorial.
- How to check columns
- How to rename columns
- How to put underscore in all column names
- How to replace a character or empty space in column names
- How to uppercase/lowercase columns
- How to select all column names except one
- How to select column names that match a particular pattern or phrase (df.filter)
- How to select a group of column names
Let us begin
In [1]:
# Load Packages
import pandas as pd
In [2]:
# Load Dataset
df = pd.read_csv("raw_dataset.csv")
In [3]:
# First Rows
df.head()
Out[3]:
In [4]:
# Columns
df.columns
Out[4]:
In [5]:
## Features of Columns
dir(df.columns)
Out[5]:
In [6]:
### Get The Columns As an Array
df.columns.values
Out[6]:
In [7]:
### Get The Columns As List
df.columns.tolist()
Out[7]:
In [8]:
### To View Column Names
df.columns.view()
Out[8]:
In [9]:
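### To View a Summary of the Column Names
# Note: Index.summary() was deprecated and later removed from pandas, so this call needs an older version
df.columns.summary()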
Out[9]:
In [10]:
# Convert the Column Names To Series
df.columns.to_series()
Out[10]:
In [11]:
# Convert the Column Names To DataFrame
df.columns.to_frame()
Out[11]:
In [12]:
# Check to see if a column name exists (Index.contains was removed in pandas 1.0; use `in` instead)
'First Name' in df.columns
Out[12]:
In [13]:
# Check to see if column names are duplicated
df.columns.duplicated()
Out[13]:
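As a rough sketch of how the duplicate check can be put to use, the boolean array it returns can be negated to keep only the first occurrence of each column name. The small DataFrame below is hypothetical and only stands in for a dataset with repeated names.

```python
import pandas as pd

# Hypothetical frame with a repeated column name
dup = pd.DataFrame([[1, 2, 3]], columns=["id", "name", "id"])

# True marks a name that has already appeared earlier in the Index
mask = dup.columns.duplicated()

# Keep only the first occurrence of each column name
deduped = dup.loc[:, ~mask]
print(deduped.columns.tolist())
```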
In [14]:
### Attributes and Methods of Str
dir(df.columns.str)
Out[14]:
In [15]:
### Making Column Name Lower Case
df.columns.str.lower()
Out[15]:
In [16]:
### Making Column Name Upper Case
df.columns.str.upper()
Out[16]:
In [17]:
### Making Column Name Title Case
df.columns.str.title()
Out[17]:
In [18]:
### Replacing Empty spaces with underscore
df.columns.str.replace(' ','_')
Out[18]:
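Note that the `.str` methods return a new Index and leave the DataFrame unchanged, so the result has to be assigned back to `df.columns` to take effect. A minimal sketch that chains several of these steps, using a hypothetical empty DataFrame named `demo`:

```python
import pandas as pd

# Hypothetical column names, chosen only to illustrate the cleanup
demo = pd.DataFrame(columns=["First Name", " Last Name ", "STREET Address"])

# Each .str call returns a new Index, so assign the final result back
demo.columns = (
    demo.columns
    .str.strip()                         # drop leading/trailing spaces
    .str.lower()                         # make the case consistent
    .str.replace(" ", "_", regex=False)  # literal space to underscore
)

print(demo.columns.tolist())
# ['first_name', 'last_name', 'street_address']
```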
In [19]:
### Renaming Column Name
df.rename(columns={'Age':'Date of Birth'})
Out[19]:
In [20]:
### Renaming Column Name /Inplace
df.rename(columns={'Age':'Date of Birth'},inplace=True)
In [21]:
df.columns
Out[21]:
In [22]:
len(df.columns.values)
Out[22]:
In [49]:
# Renaming a Column Name by Position
df.columns.values[7] = 'Email Address'
In [23]:
df.columns
Out[23]:
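Writing into `df.columns.values` modifies the underlying array directly, and pandas treats an Index as immutable, so the change may not always be picked up reliably. Two alternatives that go through the public API are sketched below; the `demo` frame and its column names are hypothetical.

```python
import pandas as pd

demo = pd.DataFrame(columns=["Name", "Age", "Mail"])

# Option 1: look up the current name by position, then rename by label
renamed = demo.rename(columns={demo.columns[2]: "Email Address"})

# Option 2: edit a plain list copy of the names and assign it back
names = demo.columns.tolist()
names[2] = "Email Address"
demo.columns = names

print(renamed.columns.tolist())
print(demo.columns.tolist())
```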
In [24]:
### Selecting All Columns Except One
df.columns[df.columns != 'SALARY']
Out[24]:
In [25]:
### Selecting All Columns Except One
df.loc[:, df.columns != 'SALARY'].columns
Out[25]:
In [26]:
# Select Column Names Except One Using Difference
df.columns.difference(['SALARY'])
Out[26]:
In [27]:
# Select Column Names Except One Using Negation of isin
df.loc[:,~df.columns.isin(['SALARY'])].columns
Out[27]:
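The approaches above do not all behave the same way: by default `difference` returns the remaining labels sorted, while the boolean-mask versions keep the original column order. A small sketch with a hypothetical `demo` frame shows the difference, along with `drop` as one more order-preserving option:

```python
import pandas as pd

demo = pd.DataFrame(columns=["Name", "SALARY", "Age"])

kept_mask = demo.columns[demo.columns != "SALARY"]   # keeps original order
kept_diff = demo.columns.difference(["SALARY"])      # sorted by default
kept_drop = demo.drop(columns="SALARY").columns      # keeps original order

print(kept_mask.tolist())  # ['Name', 'Age']
print(kept_diff.tolist())  # ['Age', 'Name']
print(kept_drop.tolist())  # ['Name', 'Age']
```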
In [28]:
### Select Column Names that Contain a Word or Character
df.filter(like='STREET').columns
Out[28]:
In [29]:
### Select Column Names that Begins with a Word or Character
df.loc[:,df.columns.str.startswith('STREET')].columns
Out[29]:
In [30]:
### Select Column Names that ENDS with a Word or Character
df.loc[:,df.columns.str.endswith('ame')].columns
Out[30]:
In [31]:
### Select Column Names that ENDS with a Word or Character Using Filter and Regex ame$
df.filter(regex='ame$',axis=1).columns
Out[31]:
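It helps to compare these matching options side by side: `like` matches the text anywhere in the name, while a regex anchored with `^` or `$` restricts the match to the start or the end. The column names in this sketch are hypothetical.

```python
import pandas as pd

demo = pd.DataFrame(columns=["STREET Name", "MAIN STREET", "First Name", "City"])

contains = demo.filter(like="STREET").columns.tolist()    # anywhere in the name
starts = demo.filter(regex="^STREET").columns.tolist()    # only at the start
ends = demo.filter(regex="ame$").columns.tolist()         # only at the end

print(contains)  # ['STREET Name', 'MAIN STREET']
print(starts)    # ['STREET Name']
print(ends)      # ['STREET Name', 'First Name']
```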
In [32]:
### Select A Group of Column Names
df.columns.values[0:4]
Out[32]:
In [34]:
### Select A Group of Column Names
df.columns[0:4]
Out[34]:
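Once you have a group of column names, you will usually want the matching data as well. A minimal sketch with a hypothetical `demo` frame, showing that slicing by position with `iloc` and indexing with the sliced names give the same result:

```python
import pandas as pd

demo = pd.DataFrame([[1, 2, 3, 4, 5]], columns=list("abcde"))

# Slice the first four columns by position
first_four = demo.iloc[:, 0:4]

# Equivalent: slice the names first, then select by label
same = demo[demo.columns[0:4]]

print(first_four.columns.tolist())  # ['a', 'b', 'c', 'd']
```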