Skip to main content

Data Cleaning Functions in Excel

 


The CLEAN function

Using the CLEAN function removes nonprintable characters text. For example,

if the text labels shown in a column are using crazy nonprintable characters

that end up showing as solid blocks or goofy symbols, you can use the

CLEAN function to clean up this text. The cleaned‐up text can be stored in

another column. You can then work with the cleaned text column.


The CLEAN function uses the following syntax:

CLEAN(text)

The text argument is the text string or a reference to the cell holding the text

string that you want to clean. For example, to clean the text stored in Cell A1,

use the following syntax:

CLEAN(A1)


The CONCATENATE function

The CONCATENATE function combines, or joins, chunks of text into a single

text string. The CONCATENATE function uses the following syntax:

CONCATENATE(text1,text2,text3,...)

The text1, text2, text3, and so on arguments are the chunks of text that

you want to combine into a single string. For example, if the city, state, and

ZIP code were stored in fields named city, state, and zip, you could

create a single text string that stores this information by using the following

syntax:

CONCATENATE(city,state,zip)

If city were Redmond, state were WA, and zip were 98052, this function

returns this text string:

RedmondWA98052

The smashed together nature of the concatenated city, state, and ZIP code

information isn’t a typographical mistake, by the way. To concatenate this

information but include spaces, you need to include spaces as function arguments.

For example, the following syntax:

CONCATENATE(“Redmond”, “ ”,“WA”, “ ”,“98052”)

returns the text string

Redmond WA 98052


The EXACT function

The EXACT function compares two text strings. If the two text strings are

exactly the same, the EXACT function returns the logical value for true, which


is 1. If the two text strings differ in any way, the EXACT function returns the

logical value for false, which is 0. The EXACT function is case‐sensitive. For

example, Redmond spelled with a capital R differs from redmond spelled with

a lowercase r.

The EXACT function uses the following syntax:

EXACT(text1,text2)

The text1 and text2 arguments are the text strings that you want to

compare.

For example, to check whether the two strings "Redmond" and

"redmond" are the same, use the following formula:

EXACT("Redmond","redmond")

This function returns the logical value for false, 0, because these two text

strings don’t match exactly. One begins with an uppercase R and the other

begins with a lowercase r.


The FIND function

The FIND function finds the starting character position of one text string

within another text string. For example, if you want to know at what position

within a text string the two‐letter state abbreviation WA starts, you could use

the FIND function.

The FIND function uses the following syntax:

FIND(find_text,within_text,start_num)

The find_text argument is the text that you’re looking for. The within_

text argument identifies where or what you’re searching. The start_num

argument tells Excel at what point within the string it should begin its search.

For example, to find at what point the two‐letter state abbreviation WA begins

in the string Redmond WA 98052, use the following formula:

FIND(“WA”,“Redmond WA 98052”,1)

The function returns the value 9 because WA begins at the ninth position

(because spaces are counted).

The start_num function argument is optional. If you omit this argument,

Excel begins searching at the very beginning of the string.


The FIXED function

The FIXED function rounds a value to specified precision and then converts

the rounded value to text. The function uses the following syntax:

FIXED(number,decimals,no_commas)

The number argument supplies the value that you want to round and convert

to text. The optional decimals argument tells Excel how many places to the

right of the decimal point that you want to round. The optional no_commas

argument needs to be either 1 (if you want commas) or 0 (if you don’t want

commas) in the returned text.

For example, to round to a whole number and convert to text the value

1234.56789, use the following formula:

FIXED(1234.56789,0,1)

The function returns the text 1,235.

The LEFT function

The LEFT function returns a specified number of characters from the left end

of a text string. The function uses the following syntax:

LEFT(text,num_chars)

The text argument either supplies the text string or references the cell holding

the text string. The optional num_chars argument tells Excel how many

characters to grab.

For example, to grab the leftmost seven characters from the text string

Redmond WA, use the following formula:

LEFT(“Redmond WA”,7)

The function returns the text Redmond.

The LEN function

The LEN function counts the number of characters in a text string. The function

uses the following syntax:

LEN(text)


The text argument either supplies the text string that you want to measure

or references the cell holding the text string. For example, to measure the

length of the text string in cell I81, use the following formula:

LEN(I81)

If cell I81 holds the text string Semper fidelis, the function returns the

value 14. Spaces are counted as characters, too.

The LOWER function

The LOWER function returns an all‐lowercase version of a text string. The

function uses the following syntax:

LOWER(text)

The text argument either supplies the text string that you want to convert

or references the cell holding the text string. For example, to convert the text

string PROFESSIONAL to professional, use the following formula:

LOWER(“PROFESSIONAL”)

The function returns professional.

The MID function

The MID function returns a chunk of text in the middle of text string. The

function uses the following syntax:

MID(text,start_num,num_char)

The text argument either supplies the text string from which you grab some

text fragment or it references the cell holding the text string. The start_num

argument tells Excel where the text fragment starts that you want to grab.

The num_char argument tells Excel how long the text fragment is. For example,

to grab the text fragment tac from the text string tic tac toe, use the

following formula:

=MID("tic tac toe",5,3)

The function returns tac.


The PROPER function

The PROPER function capitalizes the first letter in every word in a text string.

The function uses the following syntax:

PROPER(text)

The text argument either supplies the text string or references the cell

holding the text string. For example, to capitalize the initial letters in the text

string ambassador kennedy, use the following formula:

PROPER("ambassador kennedy")

The function returns the text string Ambassador Kennedy.

The REPLACE function

The REPLACE function replaces a portion of a text string. The function uses

the following syntax:

REPLACE(old_text,start_num,num_chars,new_text)

The old_text argument, which is case‐sensitive, either supplies the text

string from which you grab some text fragment or it references the cell holding

the text string. The start_num argument, which is the starting position,

tells Excel where the text starts that you want to replace. The num_chars

argument tells Excel the length of the text fragment (how many characters)

that you want to replace. The new_text argument, also case‐sensitive, tells

Excel what new text you want to use to replace the old text. For example, to

replace the name Chamberlain with the name Churchill in the text string

Mr. Chamberlain, use the following formula:

REPLACE("Mr. Chamberlain",5,11,"Churchill")

The function returns the text string Mr. Churchill.

The REPT function

The REPT function repeats a text string. The function uses the following

syntax:

REPT(text,number_times)

Chapter 3: Scrub‐a‐Dub‐Dub: Cleaning Data 73

The text argument either supplies the text string or references the cell holding

the text string. The number_times argument tells Excel how many times

you want to repeat the text. For example, the following formula:

REPT("Walla",2")

returns the text string WallaWalla.

The RIGHT function

The RIGHT function returns a specified number of characters from the right

end of a text string. The function uses the following syntax:

RIGHT(text,num_chars)

The text argument either supplies the text string that you want to manipulate

or references the cell holding the text string. The num_chars argument

tells Excel how many characters to grab.

For example, to grab the rightmost two characters from the text string

Redmond WA, use the following formula:

RIGHT(“Redmond WA”,2)

The function returns the text WA.

The SEARCH function

The SEARCH function calculates the starting position of a text fragment

within a text string. The function uses the following syntax:

SEARCH(find_text,within_text,start_num)

The find_text argument tells Excel what text fragment you’re looking for.

The within_text argument tells Excel what text string that you want to

search. The start_num argument tells Excel where to start its search. The

start_num argument is optional. If you leave it blank, Excel starts the search

at the beginning of the within_text string.

For example, to identify the position at which the text fragment Churchill

starts in the text string Mr. Churchill, use the following formula:

SEARCH("Churchill","Mr. Churchill",1)

The function returns the value 5.


The SUBSTITUTE function

The SUBSTITUTE function replaces occurrences of text in a text string. The

function uses the following syntax:

SUBSTITUTE(text,old_text,new_text,instances)

The text argument tells Excel what text string you want to edit by replacing

some text fragment. The old_text argument identifies the to‐be‐replaced

text fragment. The new_text supplies the new replacement text.

As an example of how the SUBSTITUTE function works, suppose that you

need to replace the word Senator with the word President in the text

string Senator Obama.

SUBSTITUTE("Senator Obama","Senator","President")

The function returns the text string President Obama.

The instances argument is optional, but you can use it to tell Excel for

which instance of old_text you want to make the substitution. For example,

the function

SUBSTITUTE("Senator Senator","Senator","President",1)

returns the text string President Senator.

The function

SUBSTITUTE("Senator Senator Obama","Senator","President",2)

returns the text string Senator President Obama.

If you leave the instances argument blank, Excel replaces each occurrence of

the old_text with the new_text. For example, the function

SUBSTITUTE("Senator Senator Obama","Senator","President")

returns the text string President President Obama.

The T function

The T function returns its argument if the argument is text. If the argument

isn’t text, the function returns nothing. The function uses the following syntax:

T(value)


For example, the formula T(123) returns nothing because 123 is a value.

The formula T(“Seattle”) returns Seattle because Seattle is a text

string.

The TEXT function

The TEXT function formats a value and then returns the value as text. The

function uses the following syntax:

TEXT(value,format_text)

The value argument is the value that you want formatted and returned as

text. The format_text argument is a text string that shows the currency

symbol and placement, commas, and decimal places that you want. For

example, the formula

=TEXT(1234.5678,"$##,###.00")

returns the text $1,234.57.

Note that the function rounds the value.

The TRIM function

The TRIM function removes extra spaces from the right end of a text string.

The function uses the following syntax:

TRIM(text)

The text argument is the text string or, more likely, a reference to the cell

holding the text string.

The UPPER function

The UPPER function returns an all‐uppercase version of a text string. The

function uses the following syntax:

UPPER(text)

The text argument either supplies the text string that you want to convert

or it references the cell holding the text string. For example, to convert

76 Part I: Getting Started with Data Analysis

the text string professional to PROFESSIONAL, you can use the following

formula:

UPPER(“professional”)

The function returns the text string PROFESSIONAL.

The VALUE function

The VALUE function converts a text string that looks like a value to a value.

The function uses the following syntax:

VALUE(text)

The text argument either supplies the text string that you want to convert

or it references the cell holding the text string. For example, to convert the

text string $123,456.78 — assume that this isn’t a value but a text string —

you can use the following formula:

VALUE(“$123,456.78”)

The function returns the value 123456.78.

Comments

Popular posts from this blog

20 Power BI Dax Measures

Power bi 20 dax measures 20 DAX (Data Analysis Expressions) measures in Power BI with examples: Total Sales: scss Copy code Total Sales = SUM (Sales[Amount]) Average Sales Price: scss Copy code Avg Sales Price = AVERAGE (Sales[Amount]) Total Units Sold: mathematica Copy code Total Units Sold = SUM ( Sales [ Quantity ] ) Total Customers: scss Copy code Total Customers = COUNTROWS (Customer) Total Products: mathematica Copy code Total Products = COUNTROWS ( Product ) Maximum Sales Amount: scss Copy code Max Sales Amount = MAX (Sales[Amount]) Minimum Sales Amount: scss Copy code Min Sales Amount = MIN (Sales[Amount]) Sales Growth Percentage: mathematica Copy code Sales Growth % = ( Total Sales - [ Total Sales Last Year ] ) / [ Total Sales Last Year ] Total Profit: scss Copy code Total Profit = SUM (Sales[Profit]) Total Orders: scss Copy code Total Orders = COUNTROWS (Orders) Total Customers with Sales: css Copy code Total Customers with Sales = COUNTROWS( FILTER ...

Year Over Year calculation using Pivot Table

Instead of creating a formula outside of the pivot table, you can do this inside the pivot table. Start from the image with column D empty. Drag Revenue a second time to the Values area. Look in the Columns section of the Pivot Table Fields panel. You will see a tile called Values that appears below Date. Drag that tile so it is below the Date field. Your pivot table should look like this: Double-click the Sum of Revenue2 heading in D4 to display the Value Field Settings dialog. Click on the tab for Show Values As. Change the drop-down menu to % Difference From. Change the Base Field to Date. Change the Base Item to (Previous Item). Type a better name than Sum of Revenue2 - perhaps % Change. Click OK. You will have a mostly blank column D (because the pivot table can't calculate a percentage change for the first year. Right-click the D and choose Hide.

Split Column in Power Query

  Split Column In Power Query, a column of text can be split into multiple columns and in a number of different ways to achieve the results you want. By default, the name of the new columns contains the same name as the original column with a suffix of a period (.) and a number that represents each split section from the original column. You can then rename the column. Split Column Types - Click on the link below for details Split a column by delimiter Split a column by number of characters Split a column by positions Split a column by letter case combinations Split a column by digit and non-digit combinations