Data Cleaning In Python – Working with Column Names

When dealing with data analysis and data science, it is essential to have consistent column names to enable you do most of the selection of the columns for your analysis.

But in certain cases, the name of your columns may not be consistent and may have issues such as:

  • Inconsistent Cases
  • Presence of Spaces Between Words
  • Unnecessary Characters in Column Names
  • +Etc

Let us see how to deal with these issues. The following are the outline of what we will be discussing in this tutorial.

  • How to check columns
  • How to rename columns
  • How to put underscore in all column names
  • How to replace a character or empty space in column names
  • How to uppercase/lowercase columns
  • How to select all column names except one
  • How to select column names of a particular order or phrase(df.filter)
  • How to select a group of column name

Let us begin

 
In [1]:
# Load Dataset
import pandas as pd
In [2]:
# Load Dataset
df = pd.read_csv("raw_dataset.csv")
In [3]:
# Firt Rows
df.head()
Out[3]:
First Name Last name Age SALARY STREET Address1 STREET Address2 STREET Address3 email
0 Joel Padilla 10/28/2019 $92.32</td> <td>431-6530 Eu, Rd.</td> <td>364-2264 Augue Rd.</td> <td>P.O. Box 864, 3882 Orci Street</td> <td>eu@nibh.com</td> </tr> <tr> <th>1</th> <td>Fritz</td> <td>Tyler</td> <td>09/27/2019</td> <td>$83.91 Ap #377-2267 Ac Av. 979-2228 Vel Ave 9865 Eu Av. est.ac.mattis@malesuadafringilla.net
2 Wing Phelps 02/18/2019 $17.15</td> <td>Ap #545-5786 Pulvinar Ave</td> <td>Ap #973-5781 Sagittis Avenue</td> <td>9959 Ut St.</td> <td>dolor@cubilia.net</td> </tr> <tr> <th>3</th> <td>Ryan</td> <td>Ross</td> <td>05/21/2019</td> <td>$45.97 634-7858 Id Road 907-8824 Fringilla Ave 318-5271 In Ave interdum.libero.dui@vitaeerat.com
4 Drake Day 01/09/2020 $84.38 999-8221 Tempor, St. 297-6939 Turpis. Ave P.O. Box 638, 6932 Laoreet Rd. nulla.Integer.vulputate@liberoat.ca
In [4]:
# Columns
df.columns
Out[4]:
Index(['First Name', 'Last name', 'Age', 'SALARY', 'STREET Address1',
       'STREET Address2', 'STREET Address3', 'email'],
      dtype='object')
In [5]:
## Features of Columns
dir(df.columns)
Out[5]:
['T',
 '__abs__',
 '__add__',
 '__and__',
 '__array__',
 '__array_priority__',
 '__array_wrap__',
 '__bool__',
 '__bytes__',
 '__class__',
 '__contains__',
 '__copy__',
 '__deepcopy__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__divmod__',
 '__doc__',
 '__eq__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__init__',
 '__init_subclass__',
 '__inv__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__module__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__nonzero__',
 '__or__',
 '__pos__',
 '__pow__',
 '__radd__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rfloordiv__',
 '__rmul__',
 '__rpow__',
 '__rsub__',
 '__rtruediv__',
 '__setattr__',
 '__setitem__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__truediv__',
 '__unicode__',
 '__weakref__',
 '__xor__',
 '_accessors',
 '_add_comparison_methods',
 '_add_logical_methods',
 '_add_logical_methods_disabled',
 '_add_numeric_methods',
 '_add_numeric_methods_add_sub_disabled',
 '_add_numeric_methods_binary',
 '_add_numeric_methods_disabled',
 '_add_numeric_methods_unary',
 '_assert_can_do_op',
 '_assert_can_do_setop',
 '_assert_take_fillable',
 '_attributes',
 '_can_hold_identifiers_and_holds_name',
 '_can_hold_na',
 '_can_reindex',
 '_cleanup',
 '_coerce_scalar_to_index',
 '_coerce_to_ndarray',
 '_comparables',
 '_concat',
 '_concat_same_dtype',
 '_constructor',
 '_convert_arr_indexer',
 '_convert_can_do_setop',
 '_convert_for_op',
 '_convert_index_indexer',
 '_convert_list_indexer',
 '_convert_listlike_indexer',
 '_convert_scalar_indexer',
 '_convert_slice_indexer',
 '_convert_tolerance',
 '_data',
 '_defer_to_indexing',
 '_deprecations',
 '_dir_additions',
 '_dir_deletions',
 '_engine',
 '_engine_type',
 '_evaluate_with_datetime_like',
 '_evaluate_with_timedelta_like',
 '_filter_indexer_tolerance',
 '_format_attrs',
 '_format_data',
 '_format_native_types',
 '_format_space',
 '_format_with_header',
 '_formatter_func',
 '_get_attributes_dict',
 '_get_fill_indexer',
 '_get_fill_indexer_searchsorted',
 '_get_grouper_for_level',
 '_get_level_number',
 '_get_level_values',
 '_get_loc_only_exact_matches',
 '_get_names',
 '_get_nearest_indexer',
 '_get_reconciled_name_object',
 '_get_string_slice',
 '_get_unique_index',
 '_has_complex_internals',
 '_id',
 '_infer_as_myclass',
 '_inner_indexer',
 '_invalid_indexer',
 '_is_homogeneous_type',
 '_is_memory_usage_qualified',
 '_is_numeric_dtype',
 '_is_strictly_monotonic_decreasing',
 '_is_strictly_monotonic_increasing',
 '_isnan',
 '_join_level',
 '_join_monotonic',
 '_join_multi',
 '_join_non_unique',
 '_join_precedence',
 '_left_indexer',
 '_left_indexer_unique',
 '_map_values',
 '_maybe_cast_indexer',
 '_maybe_cast_slice_bound',
 '_maybe_promote',
 '_maybe_update_attributes',
 '_mpl_repr',
 '_na_value',
 '_nan_idxs',
 '_ndarray_values',
 '_outer_indexer',
 '_reduce',
 '_reindex_non_unique',
 '_reset_cache',
 '_reset_identity',
 '_scalar_data_error',
 '_searchsorted_monotonic',
 '_set_names',
 '_shallow_copy',
 '_shallow_copy_with_infer',
 '_simple_new',
 '_sort_levels_monotonic',
 '_string_data_error',
 '_summary',
 '_to_safe_for_reshape',
 '_try_convert_to_int_index',
 '_typ',
 '_unpickle_compat',
 '_update_inplace',
 '_validate_for_numeric_binop',
 '_validate_for_numeric_unaryop',
 '_validate_index_level',
 '_validate_indexer',
 '_validate_names',
 '_validate_sort_keyword',
 '_values',
 '_wrap_joined_index',
 '_wrap_setop_result',
 'all',
 'any',
 'append',
 'argmax',
 'argmin',
 'argsort',
 'array',
 'asi8',
 'asof',
 'asof_locs',
 'astype',
 'contains',
 'copy',
 'delete',
 'difference',
 'drop',
 'drop_duplicates',
 'droplevel',
 'dropna',
 'dtype',
 'dtype_str',
 'duplicated',
 'empty',
 'equals',
 'factorize',
 'fillna',
 'format',
 'get_duplicates',
 'get_indexer',
 'get_indexer_for',
 'get_indexer_non_unique',
 'get_level_values',
 'get_loc',
 'get_slice_bound',
 'get_value',
 'get_values',
 'groupby',
 'has_duplicates',
 'hasnans',
 'holds_integer',
 'identical',
 'inferred_type',
 'insert',
 'intersection',
 'is_',
 'is_all_dates',
 'is_boolean',
 'is_categorical',
 'is_floating',
 'is_integer',
 'is_interval',
 'is_lexsorted_for_tuple',
 'is_mixed',
 'is_monotonic',
 'is_monotonic_decreasing',
 'is_monotonic_increasing',
 'is_numeric',
 'is_object',
 'is_type_compatible',
 'is_unique',
 'isin',
 'isna',
 'isnull',
 'item',
 'join',
 'map',
 'max',
 'memory_usage',
 'min',
 'name',
 'names',
 'nbytes',
 'ndim',
 'nlevels',
 'notna',
 'notnull',
 'nunique',
 'putmask',
 'ravel',
 'reindex',
 'rename',
 'repeat',
 'searchsorted',
 'set_names',
 'set_value',
 'shape',
 'shift',
 'size',
 'slice_indexer',
 'slice_locs',
 'sort',
 'sort_values',
 'sortlevel',
 'str',
 'summary',
 'symmetric_difference',
 'take',
 'to_flat_index',
 'to_frame',
 'to_list',
 'to_native_types',
 'to_numpy',
 'to_series',
 'transpose',
 'union',
 'unique',
 'value_counts',
 'values',
 'view',
 'where']
In [6]:
### Get The Columns As an Array
df.columns.values
Out[6]:
array(['First Name', 'Last name', 'Age', 'SALARY', 'STREET Address1',
       'STREET Address2', 'STREET Address3', 'email'], dtype=object)
In [7]:
### Get The Columns As  List
df.columns.tolist()
Out[7]:
['First Name',
 'Last name',
 'Age',
 'SALARY',
 'STREET Address1',
 'STREET Address2',
 'STREET Address3',
 'email']
In [8]:
### To View Columns Names
df.columns.view()
Out[8]:
Index(['First Name', 'Last name', 'Age', 'SALARY', 'STREET Address1',
       'STREET Address2', 'STREET Address3', 'email'],
      dtype='object')
In [9]:
### To View a Summary of the Column Names
df.columns.summary()
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:2: FutureWarning: 'summary' is deprecated and will be removed in a future version.
  
Out[9]:
'Index: 8 entries, First Name to email'
In [10]:
# Convert the Column Names To Series/ DataFrame
df.columns.to_series()
Out[10]:
First Name              First Name
Last name                Last name
Age                            Age
SALARY                      SALARY
STREET Address1    STREET Address1
STREET Address2    STREET Address2
STREET Address3    STREET Address3
email                        email
dtype: object
In [11]:
# Convert the Column Names To DataFrame
df.columns.to_frame()
Out[11]:
0
First Name First Name
Last name Last name
Age Age
SALARY SALARY
STREET Address1 STREET Address1
STREET Address2 STREET Address2
STREET Address3 STREET Address3
email email
In [12]:
# Check to see if column names contains a phrase
df.columns.contains('First Name')
Out[12]:
True
In [13]:
# Check to see if column names are duplicated
df.columns.duplicated()
Out[13]:
array([False, False, False, False, False, False, False, False])
In [14]:
### Attributes and Methods of Str
dir(df.columns.str)
Out[14]:
['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__frozen',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_freeze',
 '_get_series_list',
 '_is_categorical',
 '_make_accessor',
 '_orig',
 '_parent',
 '_validate',
 '_wrap_result',
 'capitalize',
 'cat',
 'center',
 'contains',
 'count',
 'decode',
 'encode',
 'endswith',
 'extract',
 'extractall',
 'find',
 'findall',
 'get',
 'get_dummies',
 'index',
 'isalnum',
 'isalpha',
 'isdecimal',
 'isdigit',
 'islower',
 'isnumeric',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'len',
 'ljust',
 'lower',
 'lstrip',
 'match',
 'normalize',
 'pad',
 'partition',
 'repeat',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'slice',
 'slice_replace',
 'split',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'wrap',
 'zfill']
In [15]:
### Making Column Name Lower Case
df.columns.str.lower()
Out[15]:
Index(['first name', 'last name', 'age', 'salary', 'street address1',
       'street address2', 'street address3', 'email'],
      dtype='object')
In [16]:
### Making Column Name Upper Case
df.columns.str.upper()
Out[16]:
Index(['FIRST NAME', 'LAST NAME', 'AGE', 'SALARY', 'STREET ADDRESS1',
       'STREET ADDRESS2', 'STREET ADDRESS3', 'EMAIL'],
      dtype='object')
In [17]:
### Making Column Name Title Case
df.columns.str.title()
Out[17]:
Index(['First Name', 'Last Name', 'Age', 'Salary', 'Street Address1',
       'Street Address2', 'Street Address3', 'Email'],
      dtype='object')
In [18]:
### Replacing Empty spaces with underscore
df.columns.str.replace(' ','_')
Out[18]:
Index(['First_Name', 'Last_name', 'Age', 'SALARY', 'STREET_Address1',
       'STREET_Address2', 'STREET_Address3', 'email'],
      dtype='object')
In [19]:
### Renaming Column Name
df.rename(columns={'Age':'Date of Birth'})
Out[19]:
First Name Last name Date of Birth SALARY STREET Address1 STREET Address2 STREET Address3 email
0 Joel Padilla 10/28/2019 $92.32 431-6530 Eu, Rd. 364-2264 Augue Rd. P.O. Box 864, 3882 Orci Street eu@nibh.com
1 Fritz Tyler 09/27/2019 $83.91</td> <td>Ap #377-2267 Ac Av.</td> <td>979-2228 Vel Ave</td> <td>9865 Eu Av.</td> <td>est.ac.mattis@malesuadafringilla.net</td> </tr> <tr> <th>2</th> <td>Wing</td> <td>Phelps</td> <td>02/18/2019</td> <td>$17.15 Ap #545-5786 Pulvinar Ave Ap #973-5781 Sagittis Avenue 9959 Ut St. dolor@cubilia.net
3 Ryan Ross 05/21/2019 $45.97</td> <td>634-7858 Id Road</td> <td>907-8824 Fringilla Ave</td> <td>318-5271 In Ave</td> <td>interdum.libero.dui@vitaeerat.com</td> </tr> <tr> <th>4</th> <td>Drake</td> <td>Day</td> <td>01/09/2020</td> <td>$84.38 999-8221 Tempor, St. 297-6939 Turpis. Ave P.O. Box 638, 6932 Laoreet Rd. nulla.Integer.vulputate@liberoat.ca
5 Lucian Wynn 06/11/2019 $0.54</td> <td>P.O. Box 650, 6721 Ut, Av.</td> <td>P.O. Box 963, 2210 Est St.</td> <td>1926 Posuere, Rd.</td> <td>molestie@nuncsitamet.com</td> </tr> <tr> <th>6</th> <td>Kasper</td> <td>Villarreal</td> <td>02/05/2019</td> <td>$47.69 6449 Ultrices Av. 979-7052 Parturient Rd. P.O. Box 395, 3413 Tellus St. nec.orci@mi.com
7 Emerson Pratt 03/01/2020 $13.08</td> <td>405-6163 Mollis St.</td> <td>P.O. Box 963, 1079 Lorem, St.</td> <td>Ap #460-9797 Velit Road</td> <td>semper@Proinmi.net</td> </tr> <tr> <th>8</th> <td>Dustin</td> <td>Fleming</td> <td>08/30/2019</td> <td>$61.60 9205 Maecenas Ave P.O. Box 919, 9529 Donec Avenue P.O. Box 926, 8939 Fusce St. vulputate.velit.eu@lacusCras.co.uk
9 Addison Juarez 07/02/2019 $17.32</td> <td>322-2727 Lacinia Rd.</td> <td>P.O. Box 952, 7768 Sed Road</td> <td>Ap #644-6787 Pellentesque Rd.</td> <td>mollis.non@Quisquetinciduntpede.org</td> </tr> <tr> <th>10</th> <td>Xanthus</td> <td>Colon</td> <td>05/29/2019</td> <td>$39.79 653-2325 Aliquam Rd. P.O. Box 239, 1241 Diam Rd. 9441 Duis Ave Sed.et@tristique.co.uk
11 Raphael Leonard 12/01/2018 $81.86</td> <td>Ap #915-7047 Aliquam Ave</td> <td>Ap #438-382 Ac Street</td> <td>7773 Odio. Ave</td> <td>mollis.dui.in@scelerisque.co.uk</td> </tr> <tr> <th>12</th> <td>Oliver</td> <td>Hartman</td> <td>03/15/2019</td> <td>$7.36 Ap #840-7597 Risus. Ave Ap #424-8954 Erat. Avenue P.O. Box 859, 934 Eu Road tellus.Suspendisse@gravida.ca
13 Demetrius Hurst 07/08/2020 $42.07</td> <td>P.O. Box 595, 3299 Metus St.</td> <td>4391 Parturient St.</td> <td>P.O. Box 401, 8266 Dictum Street</td> <td>Integer.mollis.Integer@Phasellusvitaemauris.com</td> </tr> <tr> <th>14</th> <td>Kermit</td> <td>English</td> <td>01/23/2019</td> <td>$92.75 568-6252 Mus. Rd. P.O. Box 296, 1566 Cursus. Rd. 3912 At Ave imperdiet.dictum.magna@lobortisClass.org
15 Zahir Travis 11/03/2019 $27.64</td> <td>500-8214 Gravida. Street</td> <td>P.O. Box 873, 7725 Pede, Rd.</td> <td>Ap #507-2527 Mollis. St.</td> <td>Quisque@semegestas.net</td> </tr> <tr> <th>16</th> <td>Jelani</td> <td>Carroll</td> <td>11/21/2018</td> <td>$51.32 817-746 Mattis. St. P.O. Box 762, 9695 Nisi. Rd. 2847 Curabitur Road leo.elementum@nislMaecenasmalesuada.ca
17 Zachery Skinner 08/24/2018 $69.21</td> <td>750-2734 Eu St.</td> <td>1613 Interdum. Street</td> <td>5101 Nulla. Av.</td> <td>nulla.ante.iaculis@pede.co.uk</td> </tr> <tr> <th>18</th> <td>Jermaine</td> <td>Osborne</td> <td>08/07/2019</td> <td>$14.09 Ap #695-2496 Enim Avenue Ap #319-5081 Cras Ave 914-3943 Pede Road Sed.eu.nibh@sapienmolestie.net
19 Curran Gay 09/20/2018 $17.96</td> <td>Ap #334-1963 Gravida Street</td> <td>Ap #859-6577 Accumsan Rd.</td> <td>934-3060 Enim. St.</td> <td>id.erat.Etiam@elitpede.ca</td> </tr> <tr> <th>20</th> <td>Grady</td> <td>Schroeder</td> <td>12/26/2018</td> <td>$79.33 3832 Morbi Rd. 388-4974 Eleifend St. 5403 Donec Rd. odio@eleifendCrassed.org
21 Moses Barnett 11/26/2018 $30.79</td> <td>P.O. Box 337, 2447 Vestibulum, Road</td> <td>Ap #183-9990 Arcu. Rd.</td> <td>Ap #328-773 Magna. St.</td> <td>eleifend.nunc@milaciniamattis.org</td> </tr> <tr> <th>22</th> <td>Todd</td> <td>Richmond</td> <td>06/26/2020</td> <td>$64.85 9020 Ac Rd. 650-7083 Nec Rd. P.O. Box 658, 7970 Ac St. Duis.a.mi@nonummyFusce.net
23 Dante Summers 08/01/2019 $90.14</td> <td>651-6550 Felis, Av.</td> <td>Ap #613-7918 Dui, Rd.</td> <td>9110 Vestibulum, St.</td> <td>dui.Fusce@Etiamvestibulummassa.co.uk</td> </tr> <tr> <th>24</th> <td>Erasmus</td> <td>Nieves</td> <td>07/01/2019</td> <td>$0.22 626 Est. Avenue 892-8304 Pellentesque Rd. 8333 Pharetra. Av. Fusce.feugiat@ipsumdolor.ca
25 Reed Duke 11/25/2019 $30.94</td> <td>1903 Egestas St.</td> <td>6642 Fusce Ave</td> <td>Ap #788-4216 Ipsum Road</td> <td>Morbi.neque@ultrices.com</td> </tr> <tr> <th>26</th> <td>Abbot</td> <td>Love</td> <td>06/11/2020</td> <td>$29.24 Ap #660-2020 Tincidunt Av. P.O. Box 486, 8215 Cras St. 215-5661 Integer Avenue sed.tortor.Integer@quisarcuvel.ca
27 Gary Lowe 10/12/2018 $59.96</td> <td>1524 Malesuada Av.</td> <td>9038 Consectetuer Rd.</td> <td>Ap #931-7278 Ridiculus Av.</td> <td>sit@Pellentesque.edu</td> </tr> <tr> <th>28</th> <td>Eaton</td> <td>Webster</td> <td>11/22/2019</td> <td>$0.41 8240 Vitae Rd. Ap #804-4838 Vitae St. 334 Lacus. Rd. a@Integeraliquam.ca
29 Allen Odom 08/03/2019 $20.60</td> <td>942 Duis Road</td> <td>Ap #266-6330 Conubia Ave</td> <td>1887 Eget Rd.</td> <td>Nunc@felis.com</td> </tr> <tr> <th>…</th> <td>…</td> <td>…</td> <td>…</td> <td>…</td> <td>…</td> <td>…</td> <td>…</td> <td>…</td> </tr> <tr> <th>70</th> <td>Ira</td> <td>Francis</td> <td>11/06/2018</td> <td>$43.03 428-8034 Orci St. 8708 Donec Rd. 415 Neque Rd. blandit.Nam@Sed.net
71 Clinton Emerson 06/18/2019 $45.08</td> <td>Ap #857-9020 Ipsum. Street</td> <td>Ap #469-4153 Luctus St.</td> <td>P.O. Box 748, 8147 Faucibus. St.</td> <td>euismod@seddui.org</td> </tr> <tr> <th>72</th> <td>Thaddeus</td> <td>Workman</td> <td>05/10/2019</td> <td>$6.86 P.O. Box 925, 1576 Et St. P.O. Box 682, 3390 Leo. Ave Ap #962-5025 Ipsum Av. elit.erat.vitae@maurisipsumporta.com
73 Marshall Harris 02/04/2020 $6.35</td> <td>P.O. Box 593, 7578 Nunc St.</td> <td>9197 Rutrum Avenue</td> <td>155-9329 Bibendum St.</td> <td>mollis@DonecegestasAliquam.com</td> </tr> <tr> <th>74</th> <td>Jin</td> <td>Gould</td> <td>10/08/2019</td> <td>$48.66 587-6660 Vel Road P.O. Box 213, 7547 Pharetra, St. P.O. Box 886, 3113 Arcu. Road a.malesuada.id@Sed.ca
75 Ulric Alvarado 09/25/2019 $72.00</td> <td>217-9576 Libero. Street</td> <td>678-7906 Iaculis Rd.</td> <td>376-8891 Neque Avenue</td> <td>turpis.egestas@ultricesiaculisodio.com</td> </tr> <tr> <th>76</th> <td>Brennan</td> <td>Berry</td> <td>08/03/2019</td> <td>$60.00 Ap #892-1345 Tellus St. 8733 Ligula St. Ap #248-2366 Nunc Rd. Nulla@enimdiamvel.com
77 Driscoll Burch 04/10/2020 $42.83</td> <td>Ap #345-6769 Lacus Avenue</td> <td>574-6143 Et St.</td> <td>105-5673 Vehicula Ave</td> <td>sit@atvelit.ca</td> </tr> <tr> <th>78</th> <td>Dylan</td> <td>Figueroa</td> <td>02/10/2019</td> <td>$92.13 164-9154 Vitae Street Ap #215-459 Proin Av. Ap #914-6812 Tempor Rd. nascetur.ridiculus.mus@vulputatenisisem.edu
79 Knox Luna 10/29/2019 $5.63</td> <td>2523 In Street</td> <td>Ap #506-7675 Imperdiet Rd.</td> <td>Ap #321-3874 Vel Street</td> <td>magna.Phasellus.dolor@nectempus.org</td> </tr> <tr> <th>80</th> <td>Dean</td> <td>Dennis</td> <td>10/07/2019</td> <td>$58.39 Ap #961-521 Ipsum Av. 682-2710 Ac, Street 680-3228 Tristique Av. eleifend.Cras@loremeumetus.co.uk
81 Clark Andrews 06/03/2019 $94.58</td> <td>5076 Gravida. Rd.</td> <td>423-3585 Erat Street</td> <td>6084 Gravida St.</td> <td>nec@enim.co.uk</td> </tr> <tr> <th>82</th> <td>Jasper</td> <td>Wilder</td> <td>08/23/2019</td> <td>$65.66 9586 Lorem, Avenue P.O. Box 237, 128 Integer Rd. 6702 Libero Av. non.luctus@malesuadaiderat.ca
83 Abbot Riley 03/04/2020 $30.38</td> <td>284-2591 Non Avenue</td> <td>5189 Nulla Street</td> <td>4878 Tempus Road</td> <td>sollicitudin.orci@liberodui.net</td> </tr> <tr> <th>84</th> <td>Keith</td> <td>Meyer</td> <td>08/28/2018</td> <td>$44.97 P.O. Box 835, 3445 At, Avenue 756-3192 Nec Road 312-8233 Vehicula Avenue aliquet.magna.a@Suspendissecommodo.org
85 Kasimir Christensen 09/09/2019 $30.62</td> <td>P.O. Box 962, 2647 Purus. Street</td> <td>P.O. Box 753, 1217 Ut, St.</td> <td>Ap #661-6774 Commodo Street</td> <td>ornare@risusodioauctor.edu</td> </tr> <tr> <th>86</th> <td>Kane</td> <td>Woods</td> <td>11/20/2018</td> <td>$33.01 4968 A Ave 451-2416 Pede, St. 916-6138 Ultricies Av. morbi.tristique.senectus@euenimEtiam.co.uk
87 Honorato Ruiz 09/27/2019 $29.15</td> <td>P.O. Box 838, 2148 Ut Street</td> <td>P.O. Box 645, 3116 Neque. Road</td> <td>742-2419 Sem St.</td> <td>Nunc@natoquepenatibuset.net</td> </tr> <tr> <th>88</th> <td>Boris</td> <td>Ramirez</td> <td>04/11/2019</td> <td>$11.19 P.O. Box 468, 5418 Velit Av. Ap #986-7943 Diam Rd. Ap #115-7485 Vivamus Rd. massa.non.ante@dui.edu
89 Bruce Church 01/03/2020 $92.55</td> <td>280-3706 Adipiscing St.</td> <td>P.O. Box 396, 8141 Amet, Ave</td> <td>Ap #544-3906 Sed St.</td> <td>Nullam@Ut.ca</td> </tr> <tr> <th>90</th> <td>Darius</td> <td>Warren</td> <td>07/18/2019</td> <td>$59.48 255-1540 Dui. Rd. P.O. Box 999, 1286 Dolor. Ave 5712 Pretium Ave Cum.sociis.natoque@liberoat.com
91 August Cleveland 10/24/2019 $98.53</td> <td>239-187 Venenatis Avenue</td> <td>710-6545 Diam. St.</td> <td>P.O. Box 656, 3450 Dui, Street</td> <td>faucibus.leo.in@arcu.net</td> </tr> <tr> <th>92</th> <td>Stuart</td> <td>Martin</td> <td>10/25/2019</td> <td>$25.95 Ap #818-2389 Sed Rd. P.O. Box 922, 7474 Purus St. Ap #907-3803 Sed Road sem.semper.erat@milaciniamattis.net
93 Hashim Andrews 12/31/2018 $56.50</td> <td>Ap #961-4075 Vitae, Road</td> <td>8722 Ornare Rd.</td> <td>9402 Nec Rd.</td> <td>libero.nec@SuspendisseduiFusce.org</td> </tr> <tr> <th>94</th> <td>Adam</td> <td>Lowery</td> <td>03/10/2019</td> <td>$56.22 Ap #103-7642 Vivamus St. P.O. Box 146, 7542 Lacus. Rd. 788-6809 Habitant Ave aliquet@Proinvelarcu.edu
95 Victor Hobbs 05/24/2019 $54.56</td> <td>4034 Vitae St.</td> <td>P.O. Box 930, 1683 Eu Rd.</td> <td>P.O. Box 181, 3360 Mus. Rd.</td> <td>ipsum@dictumaugue.com</td> </tr> <tr> <th>96</th> <td>Neil</td> <td>Bradford</td> <td>02/07/2020</td> <td>$74.52 1434 Aliquet, Street 956-6627 Nunc Av. Ap #727-6109 Sapien. Av. sapien.Nunc@euodioPhasellus.net
97 Noble Conrad 10/29/2019 $43.99</td> <td>Ap #173-7049 Eget, St.</td> <td>Ap #620-2512 Ut Street</td> <td>8768 Aenean St.</td> <td>tellus.Nunc.lectus@ornare.org</td> </tr> <tr> <th>98</th> <td>Brody</td> <td>Whitaker</td> <td>08/09/2018</td> <td>$96.24 Ap #371-9803 Aliquam Rd. 8892 Euismod Street Ap #201-659 Libero. Street non.dapibus.rutrum@eumetus.co.uk
99 Alden Mccormick 07/27/2019 $2.66 Ap #375-1139 Risus. Road 7259 Duis Avenue 955-4058 Maecenas St. ut.erat@aceleifend.com

100 rows × 8 columns

In [20]:
### Renaming Column Name /Inplace
df.rename(columns={'Age':'Date of Birth'},inplace=True)
In [21]:
df.columns
Out[21]:
Index(['First Name', 'Last name', 'Date of Birth', 'SALARY', 'STREET Address1',
       'STREET Address2', 'STREET Address3', 'email'],
      dtype='object')
In [22]:
len(df.columns.values)
Out[22]:
8
In [49]:
# Renaming Column Names using select values
df.columns.values[7] = 'Email Address'
In [23]:
df.columns
Out[23]:
Index(['First Name', 'Last name', 'Date of Birth', 'SALARY', 'STREET Address1',
       'STREET Address2', 'STREET Address3', 'email'],
      dtype='object')
In [24]:
### Selecting All Columns Except One
df.columns[df.columns != 'SALARY']
Out[24]:
Index(['First Name', 'Last name', 'Date of Birth', 'STREET Address1',
       'STREET Address2', 'STREET Address3', 'email'],
      dtype='object')
In [25]:
### Selecting All Columns Except One
df.loc[:, df.columns != 'SALARY'].columns
Out[25]:
Index(['First Name', 'Last name', 'Date of Birth', 'STREET Address1',
       'STREET Address2', 'STREET Address3', 'email'],
      dtype='object')
In [26]:
# Select Column Names Except One Using Difference
df.columns.difference(['SALARY'])
Out[26]:
Index(['Date of Birth', 'First Name', 'Last name', 'STREET Address1',
       'STREET Address2', 'STREET Address3', 'email'],
      dtype='object')
In [27]:
# Select Column Names Except One Using Negation of isin
df.loc[:,~df.columns.isin(['SALARY'])].columns
Out[27]:
Index(['First Name', 'Last name', 'Date of Birth', 'STREET Address1',
       'STREET Address2', 'STREET Address3', 'email'],
      dtype='object')
In [28]:
### Select Column Names that Begins with a Word or Character
df.filter(like='STREET').columns
Out[28]:
Index(['STREET Address1', 'STREET Address2', 'STREET Address3'], dtype='object')
In [29]:
### Select Column Names that Begins with a Word or Character
df.loc[:,df.columns.str.startswith('STREET')].columns
Out[29]:
Index(['STREET Address1', 'STREET Address2', 'STREET Address3'], dtype='object')
In [30]:
### Select Column Names that ENDS with a Word or Character
df.loc[:,df.columns.str.endswith('ame')].columns
Out[30]:
Index(['First Name', 'Last name'], dtype='object')
In [31]:
### Select Column Names that ENDS with a Word or Character Using Filter and Regex name$
df.filter(regex='ame$',axis=1).columns
Out[31]:
Index(['First Name', 'Last name'], dtype='object')
In [32]:
### Select A Group of Column Names
df.columns.values[0:4]
Out[32]:
array(['First Name', 'Last name', 'Date of Birth', 'SALARY'], dtype=object)
In [34]:
### Select A Group of Column Names
df.columns[0:4]
Out[34]:
Index(['First Name', 'Last name', 'Date of Birth', 'SALARY'], dtype='object')

 

You can also get the video tutorial here as well as the entire dataset and jupyter notebook on github.

Video Tutorial

Thanks For Reading

Jesus Saves

By Jesse E.Agbe(JCharis)

2 thoughts on “Data Cleaning In Python – Working with Column Names”

  1. My brother suggested I might like this web site. He was totally right. This post actually made my day. You can not imagine just how much time I had spent for this information! Thanks!

Leave a Reply to jesse_jcharis Cancel Reply

Your email address will not be published. Required fields are marked *