Excel Power BI Boot Camp in May

March 23rd, 2015 by datapig No comments »

Hi Folks. I've been busy wrapping up the final touches to a book that will come out in May.

I don't have time for a full post, but there's always time for a crass commercial message.

.

I wanted to give you a heads up on our new BI training scheduled for May in Dallas Texas.

This 3-day event is aimed squarely at Excel analysts who find it increasingly necessary to become more efficient at working with the new Microsoft BI tools like Power Pivot and Power Query. 

Bill Jelen (MrExcel) and I will guide you through the mysterious world of Microsoft's new BI Tools from a business analyst's point of view, introducing you to the rich set of tools and reporting capabilities that can be leveraged to more effectively synthesize data into Business Intelligence Dashboards.

 Learn how to:

 * Analyze large amounts of data and report those results in a meaningful way
 * Get better visibility into data with new Data Mining tools
 * Perform advanced Clustering and Market Basket analytics
 * Add interactive controls to your dashboards without VBA
 * Automate repetitive data cleansing and transformation tasks
 * Create eye-catching visualizations and Dashboards
 * Create map-based dashboards
 * Access external data sources to expand your message.

   
If you or any of your team members register with the discount code BUDDY, you'll get a $100 discount off registration.

Click Here to Register for this Event

 

Pull your Global Address Book into Excel

March 6th, 2015 by datapig 5 comments »

A friend of mine recently asked if there is a way to create an Excel table with employee information from the Outlook Properties dialog box for all employees in his Global Address Book. The Outlook Properties dialog box (seen below) gets its data from the company Active Directory. Active Directory is a special kind of database that is typically used to store organizational data such as employee names, user IDs and location information.

.

It's often useful to have a table of employee names and contact info. I have pulled this kind of table together in the past with some VBA code. But today, we don't have to. We can use Microsoft's free Power Query Add-in to reach into the Active Directory and extract data like email address, employee, department, and telephone number.

.

Today, I'll show you the steps you need to take to create a refreshable alpha roster of employees and contact info.

.

Step 1: Install Power Query

First, you will need to download and install the free Power Query Add-in.

Note that Microsoft offers Power Query for both Excel 2010 and Excel 2013 in both 32 and 64 bit platforms. Be sure to download the version that matches your version of Excel as well as the platform (32 or 64 bit) of Office your PC is running. Once installed, you'll need to activate the Add-in by following these steps:

  • Open Excel and look for a Power Query tab on the Excel Ribbon. If you see it, then the Power Query Add-In is already activated. You can skip the remaining steps.
  • Go up to the Excel Ribbon and click File->Options.
  • Choose the Add-Ins option on the left then look at the bottom of the dialog box for the Manage dropdown. Select COM Add-Ins from that dropdown and then click Go.
  • Look for Power Query for Excel in the list of available COM Add-Ins. Check the box next to each one of these options and click OK.
  • Close Excel and restart.

A successful install will result in a new Power Query tab on the Excel Ribbon.

.

.

Step 2: Understand the Active Directory Model

The Active Directory database is huge; it has dozens of tables. It can be difficult to know which field stores which piece of data – especially since the Active Directory field names are often cryptic. To help, I've overlaid the Active Directory field names on top of the Outlook properties dialog box to give you an idea which fields we need to capture. It turns out that these fields can be pulled from just three of the tables in Active Directory.


.

.

Step 3: Start a Power Query Data Extract

Click on the Power Query tab and select the From Other Sources command. There, you will find the Active Directory option.


.

Power Query will recognize your Domain and populate the server name for your Global Address book


.

Pressing the OK button will populate the Navigator pane with all the available tables in your Active Directory. You'll see tons of tables here. Ignore all tables except the user table. Double-click the user table to pull it into the Power Query window.


.

At this point, you'll have a table that looks like this in your Power Query window. This table is made up of related data sets that can be drilled into.


.

.

Step 4: Remove Unnecessary Columns

The first thing you'll want to do is delete all columns except for:

  • displayName
  • organizationalPerson
  • person
  • mailRecipient


.

.

Step 5: Select Needed Fields from organizationalPerson

Now it's time to select the fields you need.

Start by clicking the drill button next to organizationPerson.

This will open up a filter menu.

Click the (Select All Columns) option to clear all the check boxes.

.

Now go down the list of fields and place a check next to:

  • co
  • company
  • department
  • employeeType
  • givenName
  • initials
  • l
  • mail
  • physicalDeliveryOfficeName
  • postalCode
  • st
  • streetAddress
  • title

.

.

Step 6: Select Needed Fields from Person

Next, click the drill button next to Person.

This will open up a filter menu.

Click the (Select All Columns) option to clear all the check boxes.

.

Now go down the list of fields and place a check next to:

  • sn
  • telephoneNumber

.

.

Step 7: Select Needed Fields from mailRecipient

Click the drill button next to mailRecipient.

This will open up a filter menu.

Click the (Select All Columns) option to clear all the check boxes.

.

Now go down the list of fields and place a check next to:

  • mailNickName
  • msExchangeAssistantName

.

.

Step 8: Remove "Dummy" Service Accounts

Because we want this extract to contain only employees, we will want to remove any service account. That is to say, dummy accounts used f6or group email boxes which aren't tied to individual people. To remove these accounts, find the employeeType field and click the drill down arrow. This will open up a filter menu where you can uncheck the (null) value.

.

.

Step 9: Load Result to Excel

At this point, you can choose to rename and reorder the columns as you see fit.

Once you're done, click the Close & Load command on the Home tab of the Power Query window.

.

After a few moments, the entire contents of your Global Address book will be imported to a new worksheet. Best of all, you can refresh this table by simply right-clicking anywhere inside this table and choosing the Refresh option.

.

Automatically Delete Pivot Table Drilldown Sheets

February 24th, 2015 by datapig 5 comments »

I don't know about you, but my favorite thing to do is to pass off a built in Excel feature as one of my own genius report development ideas.

Me: "I made it so that you can double-click on any number inside the Pivot Table to drill into details"

Client: "ooh….nice feature Mike. You're a genius!"

Me: "Well what can I say...I give because I love."

.

Well that nifty trick of double-clicking inside a Pivot Table to drill down has a side effect; it leaves behind lots of worksheets. In most cases, we don't want to keep these sheets. In fact, they often become a nuisance, forcing us to take the time to clean them up by deleting them. This is especially a problem when you distribute pivot table reports to users who frequently drill into details. There is no guarantee they will every clean up the drilldown sheets. Although it is unlikely these sheets will cause issues, they can clutter up the workbook.

.

Today, I'll share with you a little VBA that removes the junk drill-down sheets left behind from double-clicking on a Pivot Table.

.

The basic premise of this VBA is actually very simple. When the user clicks into details, outputting a drill down sheet, the macro follows along and simply renames the output sheet so that the first 10 characters are "PivotDrill". Then before the workbook closes, the macro will find any sheet that starts with "PivotDrill" and deletes it. The implementation does get a bit tricky because you essentially have to two pieces of code. One piece goes in the Worksheet_BeforeDoubleClick event, while the other piece goes into the Workbook_BeforeClose event.

.

The first thing you'll need to do is enter this code into the Worksheet_BeforeDoubleClick event (for the sheet your Pivot Table sits in).

.

  1. Private Sub Worksheet_BeforeDoubleClick(ByVal Target As Range, Cancel As Boolean)
  2.  
  3. 'Declare you Variables
  4. Dim pt As String
  5.  
  6. 'Exit if Double-Click did not occur on a pivot table
  7.  
  8. On Error Resume Next
  9.  
  10. If IsEmpty(Target) And ActiveCell.PivotField.Name <>"" Then
  11.  
  12. Cancel = True
  13.  
  14. Exit Sub
  15.  
  16. End If
  17.  
  18. 'Set the pivot table object
  19.  
  20. pt = ActiveSheet.Range(ActiveCell.Address).PivotTable
  21.  
  22. 'If Drilldowns are Enabled, Drill down
  23.  
  24. If ActiveSheet.PivotTables(pt).EnableDrilldown Then
  25.  
  26. Selection.ShowDetail = True
  27.  
  28. ActiveSheet.Name = Replace(ActiveSheet.Name, "Sheet", "PivotDrill")
  29.  
  30. End If
  31.  
  32. End Sub

.

Next, we set up the Worksheet_BeforeClose event. As the name suggests, this code will run when the workbook closes.

.

  1. Private Sub Workbook_BeforeClose(Cancel As Boolean)
  2.  
  3. 'Declare you Variables
  4.  
  5. Dim ws As Worksheet
  6.  
  7. 'Loop through worksheets
  8.  
  9. For Each ws In ThisWorkbook.Worksheets
  10.  
  11. 'Delete any sheet that starts with PivotDrill
  12.  
  13. If Left(ws.Name, 10) = "PivotDrill" Then
  14.  
  15. Application.DisplayAlerts = False
  16.  
  17. ws.Delete
  18.  
  19. Application.DisplayAlerts = True
  20.  
  21. End If
  22.  
  23. Next ws
  24.  
  25. End Sub

.

.

At this point, each time the workbook closes, the "PivotDrill" sheets will be deleted.

No more junk sheets left over from overzealous managers clicking all over your Pivot Tables.

Text Match and Fuzzy Lookup

January 28th, 2015 by datapig 5 comments »

I would bet that every one of us has seen Customer lists from separate sources that can't be matched up because of spelling mistakes and inconsistent naming conventions and abbreviations. In fact, there are consulting firms whose entire business revolves around helping organizations reconcile their "Master Data". I can't count how many times I've seen entire projects launched to create a "gold source for master data". Yet here we are; manually matching up addresses, customer names, or people names from different data sources.

.

Well today, I'd like to introduce you to a free tool from Microsoft that can help ease your pain a little. This tool is a free Excel Add-in called Fuzzy Lookup. The Fuzzy Lookup Add-in enables you to perform similarity analysis to match textual strings that mean the same thing but are not spelled exactly the same. With this tool, you can match things like customer addresses from two separate data sources, or even find imperfect duplicates in a single dataset.

.

Understanding the Jaccard Index of Similarity

The magic behind the Fuzzy Lookup Add-in comes from the Jaccard Index of Similarity.

The Jaccard index, also known as the Jaccard similarity coefficient, was developed by Paul Jaccard as a way to document the distribution of different types of flora (yawn). Jaccard's index gave him a statistical way to measure similarities between sample sets. The gist of Jaccard's index is this:

.

You take two sample sets. You count the attributes shared by both samples (call this Z). You count the attributes observed in only one sample (call this X). You then count the attributes observed in only the other sample (call this Y). Finally, you can calculate the similarity index by using the three counts in this operation: Z / (Z+X+Y).

.

Confused? Don't worry. I have an example.

Let's say we want to compare the similarities between these two Excel nerds.

We define some attributes, then we tag each nerd with a 1 when each attribute evaluates to TRUE.

.

We can now get the count of the intersections and then perform the math.

As you can see, the similarity index is .57 or 57%.

The more similar two sample sets are, the closer Jaccard's index will be to 1.

 

In its simplest form, the same kind of analysis can be done on Textual Strings using the Jaccard Index. You can see in this example, the words hair and hare result in an index of .60.

By the way, in terms of textual matches, 60% is not a great score. In my experience, most textual matches below 70% index is suspect.

 

Understanding the Fuzzy Lookup Transformations

Ok. Now you know that the Fuzzy Lookup algorithm is based on the Jaccard Index of Similarity. But it's important to note that Fuzzy Lookup does not run on your raw data. Before running the Jaccard Index, Fuzzy Lookup applies some built-in transformation algorithms designed to improve its matching results.

 

The first thing to understand is that Fuzzy Lookup converts data using something called a Tokenizer. For example, the record {"DataPig", "Bacon Boulevard"} might be tokenized into the set, {" DataPig", "Bacon", "Boulevard"}. Each word is considered to be a token. Tokens are assigned weights based on frequency of use. High weights are applied to tokens that are uncommon. Low weights are applied to tokens that come up frequently. For example, frequent words such as "Boulevard" are given a lower weight than less frequent words such as "DataPig". Fuzzy Lookup gives you the opportunity to override the default token weights by supplying your own weights via an Excel Table.

.

Fuzzy Lookup also has algorithms and built-in dictionaries that allow for the automatic correction of spelling mistakes, word merge scenarios, string split scenarios, and string prefix corrections. All of these automatic transformation can be turned on and off by via the Fuzzy Lookup Configure dialog box. You can add custom Transformation rules by pointing the TransformationRowsetName property to your own Excel Table of rules.

Note: The sample workbook installed with the Fuzzy Lookup Add-in includes a Customization tab that shows you how to set up your own custom transformation table.

.

Using the Fuzzy Lookup Add-In

To use the Fuzzy Lookup Add-in, you'll need to first convert your datasets into Excel Tables (click anywhere inside your data and press Ctrl+T).

In this example, we have two Excel Tables (IndProfile and CustInfo). We want to match up the customer names.

.

Click the Fuzzy Lookup command button to launch the tool.

.

In the Fuzzy Lookup dialog box, do the following:

1. Select the two tables you want matched.

2. Select the columns that hold the data you want matched.

3. Click the join button to drop your two columns down to the Match Columns section.

4. Select all the columns you want included in your output. Be sure to include the FuzzyLookupSimilarity column.

5. Define how many matches you want per record. I recommend setting this to 1.

6. Define the Similarity Threshold. This is the Jaccard Index number. As mentioned before, in my experience, any match with an index of less than .70 is typically not that accurate. I usually set this to .60 just to see the results.

.

Once you have defined your selections, be sure to place your cursor in a new blank worksheet. Be warned, Fuzzy Lookup WILL OVERWRITE DATA.

Press the Go button to output the results.

You will see a new table that includes the Output Columns you selected (in step 4 above). The Similarity column will show you the Jaccard index for each match. Any row with no matches will get an index of 0.

.

It's important to review any matches with a similarity index less than 1.

Don't take for granted that Fuzzy Lookup will get everything right.

For example, you can see in Row 7 below, Fuzzy Lookup gave us a .86 index score on the match for NVISION Inc. and WORLD VISION INC. An index score of .86 is typically a good match. However, we can see that in this case, the two names are not the same.

.

If you want to perform a Fuzzy search for duplicates within the same table, simply reference only that table when setting up the search parameters. For instance, in this example we are looking for Fuzzy duplicates in the Org Name column of the IndProfile table.

.

I guess that's about it.

Fuzzy Lookup is one of those tools you really need to experiment with to get a feel for how helpful it can be.

Although it's not perfect, it has saved me hours of scripting and manual matching.

So before you burn out your eyeballs staring at two data sets again, take some time to play with Fuzzy Lookup. It may become your new best friend.

Business Intelligence vs Data Science

January 26th, 2015 by datapig No comments »

Hello all! Belated Happy New Year! I've been busy working on a new book and a few other projects. It's time to get back to blogging.

.

I recently saw this article explaining the difference between BI Analysts and Data Scientists.

On the surface, this article highlights the operational differences between BI Analysts and Data Scientists. But at the core, it's a view into how the BI industry has changed over the last few years. Here's a graphic from that article.

.

As I look at this graphic, I personally recognize many of the tasks outlined here.

I can remember a time when I felt the needs of my customers change from simple trending dashboards to more predictive analytics. In the last few years, many Excel analysts have been asked to do more Data Science'y kind of stuff. Stuff like:

  • Analyze the correlation between key demographic data and transactional metrics such as number of purchases, purchase amounts, and frequency of visits
  • Score customers based on customer satisfaction levels, likelihood to recommend, and capacity utilization
  • Analyze association rules; for example, if Customer A buys Product A, we can predict (with 90% confidence level) that there is an 85% likelihood that this customer will buy Product B.
  • Predict outcomes based on certain independent variables found through regression analysis; for example, a machine's likelihood to break down given a number of interrelated variables.

.

In the past, business intelligence primarily consisted of dashboards showing historic data on an agreed upon set of key performance metrics. Organizations today are demanding more dynamic predictive analysis, the ability to iteratively perform data discovery, and the freedom to take the hard left and right turns on data presentation. These managers often turn to Excel analysts to provide the needed analytics and visualization tools.

.

Over the new few months, I'd like to focus my blog posts here on some of the more advanced Data Science'y kind of stuff that we'll all be asked to do.

.

This leads me to the Excel Power BI Boot Camp I'm hosting with Bill Jelen (Mr. Excel) in Dallas from May 20th – May 22nd.

.

This 3-day event is aimed squarely at Excel analysts who find it increasingly necessary to:

  • Analyze large amounts of data and report those results in a meaningful way
  • Get better visibility into data with new Data Mining tools
  • Perform advanced Clustering and Market Basket analytics
  • Add interactive controls to your dashboards without VBA
  • Automate repetitive data cleansing and transformation tasks
  • Create eye-catching visualizations and Dashboards
  • Create map-based dashboards
  • Access external data sources to expand your message

.

Bill and I only have 25 seats left in the class. If you sign up before March 1st, you'll get a $100 discount off registration.

.

I'll be back tomorrow with an exciting new series of posts on the SQL Server Data Mining Add-in.

See you then!

 

My Take on Linkedin

December 23rd, 2014 by datapig 6 comments »

The Case Against Maths

December 19th, 2014 by datapig 11 comments »

I didn't want to finish out the last full work week of the year without a blog entry.

So let me bring up something that has been bothering me for quite some time.

Take a look at these two statements, then point to the one that looks right to you.

.

If you're American, you probably chose the statement on the right. If you're English, I bet you chose the statement on the left. In researching articles for some of my statistics posts, I often encounter variations on how the abbreviated word for mathematics is used. American authors use the word math as in – "Do the Math". The English use the word maths as in – "Do the Maths".

.

As a proud overweight American, I find it odd to say "Do the Maths".

What's bothers me the most is the seemingly arbitrary plural designation given to the abbreviation for mathematics.

.

Here is the way I see it. Mathematics is not a pluralized word for Mathematic. You don't have 1 Mathematic and 5 Mathematics. You simply have Mathematics. In fact, the word Mathematics (the overarching name for the study of all things mathematical) is used solely in singular verb forms.

You would say "Mathematics is useful in business".

You would not say "Mathematics are useful in business".

.

Even the English agree.

They say "Maths is fun".

No one says "Maths are fun"

.

That's right, even though there is an S in Maths, it's not treated as a plural.

It's obvious that the English use the abbreviated term "maths" (with an S at the end) to account for the s in Mathematics. Ok, I see that. So the question now is why is there an S at the end of Mathematics in the first place? After all, we don't say Literatures or Musics.

.

Well, according to Wikipedia, The most probable reason is that the origin of the word Mathematics is the Greek word mathematika. In Greek, when a word ends in the letter A, that typically denotes that the word is plural. So when mathematika was being translated to English, the letter S was added to the end.

,

In short, the term Maths is a singular word that looks plural.

And that, my friends, gives me the heebee-jeebees.

.

Ahhh…it feels good to get that off my chest…finally.

Well, have a great weekend – and remember:

Office Update Breaks ActiveX Controls

December 11th, 2014 by datapig 7 comments »

I checked my email today and saw a bevy of emails from friends and clients claiming that their workbooks broke over-night. When one person contacts me saying something is broken, my general attitude is "they're on crack". But when I get a flood of emails, that's a horse of another color.

.

Apparently, a recent Office Update introduced security "fixes" that break any workbook or code involving ActiveX Controls. I have to admit, I didn't even notice it. I guess my workbooks are sorely lacking in ActiveX Controls.

.

The ever amazing Jan Karel Pieterse shows us how to fix the issue at Daily Dose of Excel.

.

His fix addresses the set of .exd files that prevent the addition or use of ActiveX Controls. These .exd files are added to your machine through the update.

The answer is to delete these .exd files (which are located under C:\users\[your name]\App Data\local\temp).

Here are the ones I found on my machine. I deleted the nasty buggers and my Excel is back to normal.

.

I bet it's a great day to be on the Microsoft Help Desk today.

.

Thanks Jan Karel for the fix!

 

 

Changing the Look of Embedded Objects

December 4th, 2014 by datapig 5 comments »

Most of us know that you can embed objects like Word Files, Outlook Messages, and even other Excel Files. Here's an example of a Word object I embedded in my spreadsheet. My customers can double-click on this object to open the Word document I've embedded. This is cool, but it sure does come out ugly. I'd like to make it more attractive.

.

.

Here are some steps I can take to make a better looking Embedded Object.

  1. Click Insert->Object to open the Object dialog box.
  2. In the Object dialog box, go the Create from File tab and click the Browse button to find the file I want embedded (I selected a document on the mating habits of koalas).
  3. Click the Display as Icon check box.


    .

  4. Click the Change icon button.
  5. In the Change Icon dialog box, select the icon that is just a blank space, and then clear the caption input so that no caption will be shown.

    .
    Don't see a blank icon option?

    You can click the Browse button on the Change Icon dialog box and choose any strange file type that doesn't have associated icons. My go-to file is the bootstat.dat file under the Windows directory (c:\windows\bootstat.dat). Again, you can pick any strange file type.
    .


    .
    .

  6. After pressing OK, I see an empty square. I can click on that square and go to the Format tab. On the Format tab, I click Shape Fill->Picture.


    .
    .

  7. I can select a picture I want to use instead of that lame Word icon. I've got a cool picture of a Koala. I choose that and press the Insert button.

    .

.

And voila! I've got a nice looking embedded object. When I double-click the object, my document on the mating habits of koalas opens.

.

Of course, you don't have to use pictures of over-sexed koalas.

You can use this technique to display your own application icons, info-graphics, or any other image that fits the theme of your report or dashboard. These embedded objects are great for distributing different kinds of files with your Excel workbook. You can use these to include things like help files, back-up documentation, change logs, etc.

.

New Comments Filter on Bacon Bits

December 3rd, 2014 by datapig 3 comments »

Over the last few days, I've been fighting an issue with the comments on this blog.

Apparently, the spam robots have been posting over 25 comments per second on my blog, causing time outs and overloads. I'll never understand why anyone would think this practice of shot gunning comments would be an effective way of driving revenue.

.

In any case, I had to do some back-end cleanup of my WordPress SQL database and install a new Captcha mechanism to hopefully curb the spam bots.

.

I typically don't like Captcha tools, but I found one that fits the irreverent style of this blog

If and when you enter a comment, you'll need to solve a small puzzle.

Simply look at the picture on the right and click the image on the left that best corresponds with the picture.

.

Here's to the end of my WordPress woes.

Happy commenting.