District Data Labs

Visual Diagnostics for More Informed Machine Learning: Part 3

Visual Evaluation and Parameter Tuning

Note: Before starting Part 3, be sure to read Part 1 and Part 2!

Welcome back! In this final installment of Visual Diagnostics for More Informed Machine Learning, we'll close the loop on visualization tools for navigating the different phases of the machine learning workflow. Recall that we are framing the workflow in terms of the . . .

Read More

May 25, 2016

Preparing for NLP with NLTK and Gensim

PyCon 2016 Tutorial on Sunday, May 29, 2016, at 9 a.m.

This post is designed to point you to the resources that you need in order to prepare for the NLP tutorial at PyCon this coming weekend! If you have any questions, please contact us according to the directions at the end of the post.

In this tutorial, we will explore the features of the NLTK library for text processing in order to build . . .

Read More

Posted in: nlp, python

May 25, 2016

Visual Diagnostics for More Informed Machine Learning: Part 2

Demystifying Model Selection

Note: Before starting Part 2, be sure to read Part 1!

When it comes to machine learning, ultimately the most important picture to have is the big picture. Discussions of (i.e. arguments about) machine learning are usually about which model is the best. Whether it's logistic regression, random forests, Bayesian methods, support vector . . .

Read More

May 23, 2016

Visual Diagnostics for More Informed Machine Learning: Part 1

Feature Analysis

How could they see anything but the shadows if they were never allowed to move their heads?

-- Plato, The Allegory of the Cave

Python and high-level libraries like Scikit-learn, TensorFlow, NLTK, PyBrain, Theano, and MLPY have made machine learning accessible to a broad programming community that might never have found it otherwise. . . .

Read More

May 19, 2016

Named Entity Recognition and Classification for Entity Extraction

Combining NERCs to Improve Entity Extraction

The overwhelming amount of unstructured text data available today, from traditional media sources as well as newer ones like social media, provides a rich source of information if the data can be structured. Named Entity Extraction is a core subtask in building knowledge from semi-structured and unstructured text sources. Some of the first . . .

Read More

May 11, 2016

Building a Classifier from Census Data

An end-to-end machine learning example using Pandas and Scikit-Learn

One of the machine learning workshops given to students in the Georgetown Data Science Certificate program asks them to build a classification, regression, or clustering model using one of the UCI Machine Learning Repository datasets. The idea behind the workshop is to ingest data from a website, perform some initial analyses to get a sense for what's . . .

Read More

May 02, 2016

Graph Analytics Over Relational Datasets with Python

The analysis of interconnection structures of entities connected through relationships has proven to be of immense value in understanding the inner workings of networks in a variety of data domains, including finance, health care, business, and computer science. These analyses have emerged in the form of Graph Analytics -- the . . .

Read More

March 12, 2016
