Linux for the Young Computer Scientist

Posted by mitch on April 07, 2015
career, education, software

So you’re about to graduate from college, and while you’re looking for a job, someone expresses surprise when you confess that you’re not well-versed in Linux. Uh oh.

Everyone who expects to work in computing should know some basics about Linux. So much of the world runs Linux these days–phones, thermostats, TVs, cars.

Here’s a list of tasks that the young computer scientist should be able to do with Linux. The goal isn’t for you to be able to get a job as a “sysadmin” but to have a general familiarity with enough different things that you can solve real-world problems with a Linux system. Of course, much of this applies to Mac OS X, too.

  1. Install CentOS or Ubuntu into a virtual machine on your Windows or Mac desktop/laptop. Open a Terminal window.
  2. You’ll probably get most of your help through Google searching, but on the command line, you can get help with specific commands by using the man command. E.g., man ls
  3. Basic file navigation: ls, cd, pwd, pushd, popd, dirs, df, du, mv. Be careful with rm.
  4. Basic editing with vim (open a file, save it, close it without saving, edit it, copy/paste with yank, jump to a specific line number, delete a word, delete a line, replace a letter.) (You can use nano while you’re coming up to speed on vim.)
  5. Use grep, less, cat, tail, head, diff commands. Use with pipes. Use of tail -f, less +F, tail -10, head -5 (or other numbers) is handy.
  6. tar and gzip to create and expand archives of files.
  7. Use sed and awk: replace the contents of a file, print a column of a file.
  8. Command-line git commands to check out code, edit files, commit, and push back to a remote repository (e.g., GitHub).
  9. Basic process navigation: ps, top, kill, fg, bg, jobs, pstree, Ctrl-Z, Ctrl-C.
  10. Unix permissions: chmod, chgrp, useradd, sudo, su; what do 777, a+rw, u+r mean, how to read the left column of ls -l / output.
  11. Simple bash scripts: Write a loop to grep a file for certain output, set command aliases
  12. Compile a simple C program with gcc. Use gdb to set breakpoints, view variables in a C program being debugged (where, bt, frame, p).
  13. Use tcpdump to watch HTTP traffic to a certain host.
  14. Understand /etc/rc.d and /etc/init.d scripts
  15. A basic understanding of /etc/rc.sysinit
  16. Attach a new disk and format it with fdisk or parted and mkfs.ext4. Run fsck. mount it. Check it with df.
  17. Know how to disable selinux and iptables for debugging. (service, chkconfig)
  18. How to use the route, ifconfig, arp, ping, traceroute, dig, nslookup commands.
  19. Write an iptables rule to forward a low-numbered port (e.g., 80) to a high-numbered port (e.g., 5000). Why would someone want to do this?
  20. A cursory understanding of the filesystem layout — what’s in /etc, /bin, /usr, /var, etc.
  21. A cursory understanding of what’s in /proc.
  22. Configure and use SSH keys for automatic login to another host.
  23. Forward a GUI window over SSH with X11
  24. Reboot and halt the machine safely (shutdown -h now, reboot, halt -p, init, and similar commands)
  25. yum and apt-* commands (CentOS and Ubuntu, respectively)
  26. Modify boot options in grub to boot single user, to boot to a bash shell
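A few of the items above (5, 6, 7, 10, and 11) fit in one short script. This is only a sketch to practice against: the log file, its contents, and the variable names are made up for illustration, and everything happens in a scratch directory created by mktemp.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: a tiny log file in a scratch directory.
workdir=$(mktemp -d)
log="$workdir/app.log"
printf '%s\n' \
  '2015-04-07 INFO  service started' \
  '2015-04-07 ERROR disk full' \
  '2015-04-07 INFO  request served' \
  '2015-04-07 ERROR timeout' > "$log"

# Item 5: grep with a pipe, counting the ERROR lines.
error_count=$(grep 'ERROR' "$log" | wc -l | tr -d ' ')

# Item 7: awk prints one column (field 2, the log level)...
levels=$(awk '{print $2}' "$log" | sort -u | tr '\n' ' ')

# ...and sed replaces text, writing a new file instead of editing in place.
sed 's/ERROR/WARN/' "$log" > "$workdir/app.relabelled.log"
warn_count=$(grep -c 'WARN' "$workdir/app.relabelled.log")

# Item 11: a loop that greps the file for several patterns in turn.
summary=""
for pattern in INFO ERROR; do
  n=$(grep -c "$pattern" "$log")
  summary="$summary$pattern=$n "
done

# Item 6: create a gzip-compressed tar archive, then list its contents.
tar -czf "$workdir/logs.tar.gz" -C "$workdir" app.log
archived=$(tar -tzf "$workdir/logs.tar.gz")

# Item 10: 640 means rw for the owner, r for the group, nothing for others.
chmod 640 "$log"
perms=$(ls -l "$log" | cut -c1-10)

echo "errors: $error_count"
echo "levels: $levels"
echo "warnings after sed: $warn_count"
echo "summary: $summary"
echo "archived: $archived"
echo "permissions: $perms"
```

Try changing the patterns, the field number in awk, or the mode passed to chmod, and predict the output before you run it.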

For extra credit:

  1. The find command is a complicated beast, but simple to get started with.
  2. Copy files over SSH with scp.
  3. The dd command is useful for dealing with a variety of tasks, such as grabbing an image of a disk, getting random data from /dev/urandom, or wiping out a disk, and so on. Also be aware of the special files /dev/zero and /dev/null.
  4. Figure out how to recover a forgotten root password.
  5. Disable X11 and be able to do these tasks without the GUI.
  6. Do the same tasks above on a FreeBSD machine.
  7. Without the GUI, configure the machine to use a static IP address instead of DHCP.
  8. Use screen to create multiple sessions. Logout and re-attach to an existing screen session.
  9. Write a simple Makefile for a group of C or C++ files.
  10. What does chmod +s do? Other special bits.
  11. netstat, ncat, ntop.
  12. ldd, strings, nm, addr2line, objdump
  13. Grep with regular expressions
  14. What’s in /etc/fstab?
  15. history, !<number>, !!, !$, Ctrl-R
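Three of the extra-credit items (1, 3, and 13) are easy to try safely in a scratch directory. The following is a minimal sketch; the file names and the sample text are invented for the example, and dd only ever writes to a small file, never to a real disk.

```shell
#!/usr/bin/env bash
# Hypothetical scratch tree to search.
workdir=$(mktemp -d)
mkdir -p "$workdir/src/sub"
touch "$workdir/src/main.c" "$workdir/src/sub/util.c" "$workdir/src/notes.txt"

# Item 1: find every .c file below a directory, however deeply nested.
c_files=$(find "$workdir/src" -name '*.c' | wc -l | tr -d ' ')

# Item 3: dd copies a fixed amount of data; here, 4 blocks of 1024 zero
# bytes from /dev/zero into an ordinary file. 2>/dev/null hides dd's
# transfer statistics.
dd if=/dev/zero of="$workdir/blank.img" bs=1024 count=4 2>/dev/null
img_bytes=$(wc -c < "$workdir/blank.img" | tr -d ' ')

# Item 13: grep -E with a regular expression that matches lines ending
# in something shaped like an IPv4 address.
printf '%s\n' 'eth0 192.168.1.10' 'lo 127.0.0.1' 'comment line' \
  > "$workdir/hosts.txt"
ip_lines=$(grep -cE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$' "$workdir/hosts.txt")

echo "C files found: $c_files"
echo "image size in bytes: $img_bytes"
echo "lines ending in an address: $ip_lines"
```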

Books to peruse:

  1. Unix Power Tools
  2. sed & awk
  3. bash Cookbook
  4. Learning Python. Every computing professional should know a simple scripting language that ties into the OS, for scripts more complex than are rational in bash; Python is an excellent place to start.
  5. Advanced Programming in the UNIX Environment, 3rd Edition (be sure to get the latest edition)
  6. If you’re interested in networking, be sure to read TCP/IP Illustrated, Volume 1: The Protocols (2nd Edition)
  7. You probably took an OS class. While Tanenbaum and Silberschatz write great books, if you want to know Linux internals better, Rubini’s device driver book, Linux Device Drivers, 3rd Edition, is an excellent read. There is a 4th edition coming later this year.


Conway’s Law and Your Source Tree

Posted by mitch on February 05, 2014

In the last post, I mentioned Conway’s Law:

organizations which design systems […] are constrained to produce designs which are copies of the communication structures of these organizations.

Dr. Conway was referring to people in his article, but what if we replace “organization” with your product’s source tree and “the communication structures” with how functions and data structures interact? Let’s talk more about Conway’s Law in the context of source tree layout.

Many products of moderate complexity involve multiple moving parts. Maybe a company has a cloud service (back end) and a Web UI (front end). Or a back end and a mobile front end. Or a daemon, some instrumentation scripts, and a CLI. Or a firmware and a cloud service.

I’ve had my hands in a number of companies at various depths of “hands in.” Those who lay out a source tree that fully acknowledges the complexity of the product as early as possible tend to be the ones who win. Often, a company is started to build a core product–such as an ability to move data–and the user interface, the start-up scripts, the “stuff” that makes the algorithm no longer a student project but a product worth many millions of dollars–is an afterthought. That’s fine until someone creates a source tree that looks like this:


What’s wrong here? Presumably, some of the code in util.c could be used in other places. Maybe some of the functions in error.c would be handy to abstract out as well. An arrangement like this, in which cool_product is one large monolithic app, likely means it’s going to be difficult to test any of the parts inside of it; modules and layering are rarely respected in a large monolithic app. (Note that I am not saying it’s impossible to get this right, but I am saying it’s unlikely that tired programmers will keep the philosophy in mind, day in and day out.)

A slightly different organization that introduces a library might look as follows:

[Source tree listing not preserved. Its surviving annotations: unit tests for the lib/ stuff; build scripts or related stuff required, code generation, etc.]

As a side effect, we can also improve testing of the core code, thus improving reliability and regression detection. Ideally, the cool_product is a small amount of code outside of libraries that can be unit tested independently.
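One way to picture the library-based arrangement is to scaffold it. This is a sketch, not the original layout: only cool_product, util.c, error.c, and lib/ are named in the text, so the lib/tests and build directories and the main.c file are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Scaffold a hypothetical tree in a scratch directory.
tree_root=$(mktemp -d)

# Shared code lives in lib/, next to its own unit tests.
mkdir -p "$tree_root/lib/tests"
touch "$tree_root/lib/util.c" "$tree_root/lib/error.c"

# The product itself is a thin layer on top of the library.
mkdir -p "$tree_root/cool_product"
touch "$tree_root/cool_product/main.c"

# Build scripts, code generation, and related tooling.
mkdir -p "$tree_root/build"

# List the files, sorted byte-wise so the order is stable across locales.
layout=$(cd "$tree_root" && find . -type f | LC_ALL=C sort)
echo "$layout"
```

The point of the split is that anything under lib/ can be built and tested without cool_product, and a second front end can link the same library instead of copying files out of the first one.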

More than once I’ve heard the excuse, “We don’t have time to do this right with the current schedule.”

“I don’t have time for this” means “This isn’t important to me.” When you say, “I don’t have time to clean up the garage,” we all know what you really mean.

I was incredibly frustrated working with a group who “didn’t have time” to do anything right. Years later, that company continues to ship buggy products that could have been significantly less buggy. A few weeks of investment at the beginning could have avoided millions of dollars of expense and numerous poor reviews from customers due to the shoddy product quality. And it all comes back to how hard (or easy) it is to use the existing code, i.e., the communication structure of the code.

If you don’t have time to get it right now, when will you have time to go back and do it right later?

Getting it right takes some time. But getting it wrong always takes longer.

Teams with poor source tree layout often end up copying and pasting code. Sometimes a LOT of code. Whole files. Dozens of files. And as soon as you do that, you’re done. Someone fixes a bug in one place and forgets to fix it in another–over time, the files diverge.
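Copied files are at least easy to spot mechanically. Here is a minimal sketch, assuming a made-up tree with daemon/ and cli/ directories: it flags byte-identical files by checksum, then shows how a one-sided fix makes the two copies drift apart.

```shell
#!/usr/bin/env bash
# Hypothetical tree in which cli/util.c was copied from daemon/util.c.
tree=$(mktemp -d)
mkdir -p "$tree/daemon" "$tree/cli"
printf 'int add(int a, int b) { return a + b; }\n' > "$tree/daemon/util.c"
cp "$tree/daemon/util.c" "$tree/cli/util.c"
printf 'int main(void) { return 0; }\n' > "$tree/cli/main.c"

# cksum prints "checksum size path". A checksum and size pair that shows
# up more than once marks a group of files that are almost certainly
# byte-identical copies.
dupes=$(find "$tree" -name '*.c' -exec cksum {} + |
  awk '{ key = $1 " " $2; count[key]++ }
       END { for (k in count) if (count[k] > 1) n++; print n + 0 }')

# Someone "fixes" one copy and forgets the other.
printf 'int add(int a, int b) { return b + a; }\n' > "$tree/cli/util.c"

# cmp -s is silent and exits non-zero when the files differ.
if cmp -s "$tree/daemon/util.c" "$tree/cli/util.c"; then
  diverged=no
else
  diverged=yes
fi

echo "duplicate groups before the fix: $dupes"
echo "copies diverged after the fix: $diverged"
```

Once the copies diverge, a checksum scan no longer finds them, which is exactly why the duplication is so expensive to undo later.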

If you’re taking money from investors and have screwed up the source tree layout, there are two ethical options:

  1. Fix it. A week or two now will be significantly cheaper than months of pain and glaring customer issues when you ship.
  2. Give the money back to the investors.

If you’re reading this and shaking your head because you can’t believe people paint themselves into a corner with their source tree layouts, I envy you! But if you’re reading this and trying to pretend you don’t face a similar position with your product, it might be time to stop hacking and start engineering by opening up the communication paths where they should be open and locking down the isolation and encapsulation where they should not. This holds true for any language and for any type of product.


Mac Software I Want in 2011

Posted by mitch on December 25, 2010
productivity, software

2011 is just a week away now and there are a few things I’d like to see come to the Mac in the next year.

This past year, we finally got something called ‘Outlook’ native on the Mac. It’s time that this Outlook got a few things that have been missing on the Mac for a while–for me, that means:

1. Google Apps Sync Engine for Outlook. Without this, Outlook on the Mac isn’t as exciting for Google Apps as it is for Windows.

2. Xobni for Mac Outlook. Xobni is a very handy tool and I have bought copies of it for both machines on which I use Windows. But I’d really like to have it on the Mac.

3. A plug-in for the Mac Outlook. Again, feature parity with Windows would be nice. Specifically, I want to easily tie emails onto contacts and opportunities in

In addition to these Outlook items, I have some other wishes:

4. OmniFocus needs a little email improvement. Right now, using email to send something to OmniFocus requires Apple Mail and custom rules. I’d like to see one of two solutions: (1) is to have OmniFocus check email with a dedicated account that the user configures. This seems confusing to explain to customers, so I am not sure that’s a good solution for Omni. (2) is for Omni to provide an email service, just like Evernote does. Evernote gets email integration right–just like Salesforce does–everyone should copy their approach.

5. More native DropBox support for iPad and iPhone apps. OK, this isn’t exactly Mac-specific, but I’d really like to see OmniFocus and Evernote applications able to browse DropBox contents easily. Native DropBox support for DAV as a front-end and DAV in apps with an eye towards simple DropBox integration would be handy.

6. Some better graphics tools. I use Photoshop, Illustrator, OmniGraffle; I’ve played with Pixelmator, DrawIt, and others–but somehow none of these quite do what I want for “marketing graphics”. I want the control of Illustrator and Photoshop to build widgets and something like OmniGraffle but with more intelligence to piece them together. This almost sounds like ClarisDraw… but EasyDraw isn’t the answer either.

7. More head-less and powerful virtualization. VMware Fusion guest processes are lost when the Mac window server goes away (e.g., killed via remote ssh). VMware Fusion needs support for multiple Ethernet interfaces without hacking around in random files. I’d like to see something marketed for a more professional workstation user with more of the Workstation features. I’d be happy to see a Fusion Pro or something at a higher price point if that is what’s needed.

8. VMware VI Client without having to run a Windows virtual machine in Fusion or investigate awkward WINE stuff.

I have some other wishes as well–I’d like to see Apple fix the broken iTunes sync with devices and limits around a single library. For example, music I buy on the road I cannot sync to an iPod or iPad or iPhone, since they all sync to my desktop–and my 256 GB MacBook Air isn’t big enough to hold my 1 TB iTunes library.


Mac Productivity Software Round-up

Posted by mitch on December 23, 2010

Recently I spent some down time playing with some new utilities to see what’s out there that might help me. I’ve added the following tools to my Mac toolbox:

TotalFinder ($15) — This is a plug-in for the Mac Finder that adds tabbed browsing and some other nifty hacks, such as a “two-up” tab view for looking at two directories concurrently in the same window. Finder window management has been a mess for 26 years, and perhaps a real problem for me for the last 20 years, so it’s nice to see someone working on this.

SecondBar (free) — This puts a second menubar up on a second monitor. Unfortunately, the name reflects reality–it only provides 1 more menubar, and not one per additional screen. However, it works well for what it does. I tried some other menu utilities, but they all miss the boat for my needs.

BetterTouchTool (free) — This does a number of things with the Magic Mouse that didn’t seem very useful to me. However, it also provides Windows 7-style window snapping, which is one of my favorite Windows 7 features. And it works better with multiple monitors than the Windows 7 feature does.

StoryMill ($50) — The killer feature for StoryMill is the full-screen mode. It’s significantly better than what Word or Pages have for full-screen and makes writing prose much easier for me. Of course, it brings a lot of organizational tools for writing very long documents (books!) as well. I love using StoryMill for cranking out raw text to be edited later.

Kaleidoscope ($40) — This is a very cool differ. There are certain things that I miss from my own differ, RoaringDiff, but my favorite part about Kaleidoscope is that I don’t have to fix the bugs. I just started looking at Kaleidoscope today, but expect I will be registering soon once I’ve confirmed that the CLI entry point will work well for my svn workflow.

Evernote (free or $45/yr for premium) — I’ve been using Evernote for a while now. Recently I added two new pieces to my Evernote ecosystem. One is that I’ve configured the email address book on my multi-function printer/copier/scanner like this one (the one I have is no longer available) so that I can scan paper into Evernote with 4 button clicks. The other is that I bought FastEver Snap ($2) for my iPhone, which lets me photograph whiteboards and upload them to Evernote immediately without having to take explicit action.
