My thoughts on AI in 2023

OpenAI took the world by storm in 2023 with the release of ChatGPT. Other companies quickly followed suit, introducing their own competitors to ChatGPT. In this update, I want to write down my thoughts on some of the biggest players in the field of generative AI.

When I use the term “generative AI,” I am referring to a computer program that can generate text, images, or other types of content. These programs can understand natural language and follow complex instructions. One notable example of a generative AI is OpenAI’s ChatGPT, which is powered by the GPT-3.5 and GPT-4 models. Another well-known generative AI is Midjourney, which specializes in generating images from text descriptions. In this update, I will focus on AI with text output, as that is the area I have the most experience with.

February 11, 2023

Use GPT-3 To Build A Code Translator

Once you know a programming language well, learning a new one is not very hard; it is just time-consuming. You need to read the documentation for basic syntax and flow control, get familiar with its idioms, memorize core parts of the standard library, and learn its toolchain. What can we do to speed up the learning process? One thing we can do is provide great examples in the documentation. Can we do better? What if you had a working example for every problem you encountered? What if you could describe your intent in a familiar language and see how it would look in the new one? As it turns out, GPT-3 is really good at this task.
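A translator like this is mostly prompt design. As a minimal sketch, assuming a completion-style model that simply continues whatever text it is given, the Java below builds a few-shot prompt: each worked example pairs an intent with code in the familiar language and its equivalent in the new one, and the final entry leaves the target blank for the model to fill in. The prompt format, the `TranslationPrompt` class, and the `Example` record are all hypothetical, and the code only builds the text; it does not call any OpenAI API.

```java
import java.util.List;

public class TranslationPrompt {
    // One worked example: an intent, code in the familiar language, and its
    // equivalent in the new language. (Hypothetical structure.)
    record Example(String intent, String source, String target) {}

    // Builds a few-shot prompt. The last entry has no target code, so a
    // completion model is expected to continue the text with it.
    static String build(String fromLang, String toLang, List<Example> examples,
                        String intent, String source) {
        StringBuilder sb = new StringBuilder();
        sb.append("Translate ").append(fromLang).append(" to ").append(toLang).append(".\n\n");
        for (Example ex : examples) {
            appendEntry(sb, fromLang, toLang, ex.intent(), ex.source());
            sb.append(ex.target()).append("\n\n");
        }
        appendEntry(sb, fromLang, toLang, intent, source);
        return sb.toString();
    }

    // Writes the intent and the source code, then the header for the target code.
    private static void appendEntry(StringBuilder sb, String fromLang, String toLang,
                                    String intent, String source) {
        sb.append("# Intent: ").append(intent).append('\n')
          .append("# ").append(fromLang).append(":\n")
          .append(source).append('\n')
          .append("# ").append(toLang).append(":\n");
    }

    public static void main(String[] args) {
        List<Example> examples = List.of(new Example(
                "print each item in a list",
                "for x in items:\n    print(x)",
                "for _, x := range items {\n\tfmt.Println(x)\n}"));
        String prompt = build("Python", "Go", examples,
                "read a whole file into a string",
                "with open(path) as f:\n    text = f.read()");
        System.out.print(prompt);
    }
}
```

The remaining step would be sending `prompt` to a model and reading its completion after the last `# Go:` line; which endpoint and parameters to use is deliberately left out here.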

January 2, 2023

OpenAI and ChatGPT

I had two weeks off at the end of 2022. Telesign closed its operations during the last week of 2022 to give employees well-deserved time off after a year of hard work, and I took another week off in addition to that. Like many technologists, I became captivated by OpenAI’s release of ChatGPT in 2022, and I spent much of the last two weeks exploring what OpenAI has to offer.

ChatGPT is a chatbot developed by OpenAI. It has been widely recognized for its impressive capabilities, such as imitating human writing, transforming plain English into code, and making glaringly stupid mistakes. The underlying technology of ChatGPT is a machine learning model from the GPT-3 family (specifically, a model in the GPT-3.5 series). This model is designed to predict how a human might continue a given piece of text. For example, given what I’ve written so far, GPT-3 predicted what the rest of this blog post would look like:
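GPT-3 itself is a very large neural network, but the task it is trained on, predicting what comes next, can be illustrated with something far simpler. The sketch below is a toy bigram model, a deliberately swapped-in technique and not how GPT-3 works internally: it counts which word follows which in some training text, then greedily extends a prompt with the most frequent follower. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BigramPredictor {
    // For each word, counts how often each following word appears.
    private final Map<String, Map<String, Integer>> followers = new HashMap<>();

    void train(String text) {
        String[] words = text.toLowerCase().split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            followers.computeIfAbsent(words[i], k -> new HashMap<>())
                     .merge(words[i + 1], 1, Integer::sum);
        }
    }

    // Returns the most frequent follower of `word`, or null if none was seen.
    String predictNext(String word) {
        Map<String, Integer> counts = followers.get(word.toLowerCase());
        if (counts == null) return null;
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            // Ties are broken alphabetically so the output is deterministic.
            if (e.getValue() > bestCount
                    || (e.getValue() == bestCount && e.getKey().compareTo(best) < 0)) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    // Greedily extends `start` by up to `n` predicted words.
    String continueText(String start, int n) {
        List<String> out = new ArrayList<>(List.of(start.toLowerCase().split("\\s+")));
        for (int i = 0; i < n; i++) {
            String next = predictNext(out.get(out.size() - 1));
            if (next == null) break;
            out.add(next);
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        BigramPredictor model = new BigramPredictor();
        model.train("the cat sat on the mat and the cat slept");
        System.out.println(model.continueText("the", 3)); // prints "the cat sat on"
    }
}
```

With only one word of context, a toy model like this loops or stalls quickly; GPT-3 conditions on a long window of preceding tokens, which is what lets it continue something as long as a blog post.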

May 3, 2021

Unit Tests vs Integration Tests

I used to favor automated integration testing. Now, I find myself walking away from it. I no longer find it worth the cost of setting up and maintaining such tests. I now rely mostly on unit testing.

First, I need to define what I mean by those terms. In the context of this post, unit tests

From Wikipedia

unit test	integration test
automated, but not always	automation not specified
ranges from an entire module to an individual function	modules tested as a group
depends on execution conditions and testing procedures	depends on unit tests; implies the modules are already unit tested

From Martin Fowler

October 16, 2020

Dell XPS-13 - Developer Edition

This February, I ordered a Dell XPS-13 Developer Edition. The Developer Edition is a line of Dell laptops that ships with a certified Linux OS. It has been my programming powerhouse for the last eight months.

Linux was not my first choice for a laptop. Windows and macOS are simply more practical. Games and MS Office just work on Windows. Software support on macOS is generally good (except for games), and its underlying BSD architecture gives it an edge over Windows. Both also have better hardware support than Linux, simply because more people use them. However, I ultimately ended up with a Dell running Linux, and it worked out great.

Pages

Vocabulary

bind: Sometimes a synonym for “map”
conflate: combine
disjoint: separate, e.g., odd numbers and even numbers are disjoint
disjunction: inclusive or; true if at least one of the inputs is True
extrinsic: not intrinsic
federated: Top-down delegation of responsibilities; Has a single point of failure at the top.
PID controller: control loop feedback mechanism; e.g., cruise control
99%ile: Abbreviation of 99th percentile
EBNF: Extended Backus-Naur Form; Useful for defining the syntax of a programming language
EBNF terminal: a token/word/chunk in EBNF
Alpha: Αα
Beta: Ββ
Gamma: Γγ
Delta: Δδ; Commonly denotes ‘difference’
Epsilon: Εε; Used in the epsilon-greedy algorithm for multi-armed bandit problems; Error margin in floating-point comparisons.
Zeta: Ζζ
Eta: Ηη
Theta: Θθ
Iota: Ιι
Kappa: Κκ
Lambda: Λλ
Mu: Μμ
Nu: Νν
Xi: Ξξ
Omicron: Οο
Pi: Ππ
Rho: Ρρ
Sigma: Σσς
Tau: Ττ
Upsilon: Υυ
Phi: Φφ
Chi: Χχ
Psi: Ψψ
Union: ∪
Aleph: ℵ; Symbol for infinite cardinal numbers. ℵ₀, pronounced aleph-null, is the cardinality of the natural numbers.
Empty Set: ∅
Such That: Commonly represented as a vertical bar, |, or a colon, :; Example, D = {x^2 | x ∈ ℕ, x ≥ 1, x ≤ 4}. This reads: D is the set of all x^2 SUCH THAT 1) x is a natural number; 2) x is greater than or equal to 1; 3) x is less than or equal to 4. So D = {1, 4, 9, 16}.
Intersection: ∩
Subset: ⊂ or ⊆; e.g., if A = {1,4,9} and B = {1,4}, then B ⊂ A (B is a subset of A).
Belongs To: ∈; ∉ means “does not belong to”. To say that 1 belongs to S, we write 1 ∈ S. E.g., if A = {1,4,9} and e = 4, then we write e ∈ A, meaning “e belongs to A”. However, one would not say e ⊂ A; e is a single element, not a set. Similarly, if B = {1,4}, one would not say B ∈ A or “B belongs to A”, as B is not an element of A (it is a subset of A).
Complements: Difference between two sets; the (absolute) complement A^C contains everything in the universal set that is not in A.
Relative Complement: A\B means objects that belong to A and not to B, e.g., {1,2,3}\{3} = {1,2}
Omega: Ωω
P(A|B): The probability of event A occurring given that B is true.
P(A^C): The probability that A doesn’t happen
Precision Recall: Precision = probability that a retrieved doc is relevant; Recall = probability that a relevant doc is retrieved.
Narrow Integration Tests: exercise only that portion of the code in a service that talks to a separate service; uses test doubles of those services, either in process or remote; thus consist of many narrowly scoped tests, often no larger in scope than a unit test (and usually run with the same test framework that’s used for unit tests)
Broad Integration Tests: require live versions of all services, requiring a substantial test environment and network access; exercise code paths through all services, not just code responsible for interactions
Balanced Binary Search Tree: For example, red-black tree or AVL tree.
Natural Numbers: ℕ; double-struck N; Cardinal numbers,
Complex Numbers: ℂ; double-struck C

https://en.wikipedia.org/wiki/Blackboard_bold
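Several of the set entries above translate directly into code. This small Java sketch (the `SetNotes` class and the document IDs are made up for illustration) builds D from the Such That example, checks membership and subset, computes a relative complement, and expresses precision and recall as ratios of set sizes: precision = |relevant ∩ retrieved| / |retrieved| and recall = |relevant ∩ retrieved| / |relevant|.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SetNotes {
    // D = {x^2 | x in N, x >= 1, x <= 4}: range over x, then map to x^2.
    static Set<Integer> setBuilderExample() {
        return IntStream.rangeClosed(1, 4)
                .map(x -> x * x)
                .boxed()
                .collect(Collectors.toCollection(TreeSet::new));
    }

    // A \ B: elements that belong to A and not to B.
    static <T> Set<T> relativeComplement(Set<T> a, Set<T> b) {
        Set<T> result = new HashSet<>(a);
        result.removeAll(b);
        return result;
    }

    // A intersect B: elements that belong to both A and B.
    static <T> Set<T> intersection(Set<T> a, Set<T> b) {
        Set<T> result = new HashSet<>(a);
        result.retainAll(b);
        return result;
    }

    // Fraction of retrieved docs that are relevant (NaN if nothing was retrieved).
    static double precision(Set<String> relevant, Set<String> retrieved) {
        return (double) intersection(relevant, retrieved).size() / retrieved.size();
    }

    // Fraction of relevant docs that were retrieved (NaN if nothing is relevant).
    static double recall(Set<String> relevant, Set<String> retrieved) {
        return (double) intersection(relevant, retrieved).size() / relevant.size();
    }

    public static void main(String[] args) {
        Set<Integer> a = Set.of(1, 4, 9);
        Set<Integer> b = Set.of(1, 4);
        System.out.println(setBuilderExample());  // [1, 4, 9, 16]
        System.out.println(a.containsAll(b));     // B subset of A: true
        System.out.println(a.contains(4));        // 4 belongs to A: true
        System.out.println(new TreeSet<>(relativeComplement(Set.of(1, 2, 3), Set.of(3)))); // [1, 2]

        Set<String> relevant = Set.of("d1", "d2", "d3", "d4");
        Set<String> retrieved = Set.of("d1", "d2", "d5");
        System.out.println(precision(relevant, retrieved)); // 2 of 3 retrieved docs are relevant
        System.out.println(recall(relevant, retrieved));    // 0.5: 2 of 4 relevant docs were retrieved
    }
}
```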

August 1, 2020

Go io/fs Design (Part I)

As usual, LWN has a good write-up on what’s going on in the Go community. This week’s discussion is on the new io/fs package. The Go team decided to use a Reddit thread to host the conversation about this draft design. LWN points out that posters raised the following concerns:

We added status logging by wrapping http.ResponseWriter, and now HTTP/2 push doesn’t work anymore, because our wrapper hides the Push method from the handlers downstream.
It becomes infeasible to use the decorator pattern more
Doing it “generically” involves a combinatorial explosion of optional interfaces

Ultimately, Russ Cox admits, “It’s true - there’s definitely a tension here between extensions and wrappers. I haven’t seen any perfect solutions for that.”
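The first concern is easy to reproduce outside Go. Below is a Java analogue, with every type name invented for illustration: an optional capability is discovered at runtime with an `instanceof` check (Go code would use a type assertion against `http.Pusher`), and a logging decorator that implements only the base interface silently hides that capability.

```java
public class WrapperTension {
    // Base capability every writer has.
    interface ResponseWriter {
        void write(String body);
    }

    // Optional capability, discovered at runtime.
    interface Pusher {
        void push(String target);
    }

    // The underlying writer supports both capabilities.
    static class Http2Writer implements ResponseWriter, Pusher {
        public void write(String body) { System.out.println("write: " + body); }
        public void push(String target) { System.out.println("push: " + target); }
    }

    // A decorator that adds logging. It implements only ResponseWriter,
    // so it hides the Pusher capability of the writer it wraps.
    static class LoggingWriter implements ResponseWriter {
        private final ResponseWriter inner;

        LoggingWriter(ResponseWriter inner) { this.inner = inner; }

        public void write(String body) {
            System.out.println("log: writing " + body.length() + " chars");
            inner.write(body);
        }
    }

    // Downstream code probes for the optional capability.
    static boolean canPush(ResponseWriter w) {
        return w instanceof Pusher;
    }

    public static void main(String[] args) {
        ResponseWriter plain = new Http2Writer();
        ResponseWriter wrapped = new LoggingWriter(plain);
        System.out.println(canPush(plain));   // true
        System.out.println(canPush(wrapped)); // false: push silently stops working
    }
}
```

Making `LoggingWriter` also implement `Pusher` fixes this one case, but a general-purpose wrapper would then need a variant for every combination of optional interfaces the wrapped value might have, which is the combinatorial explosion the posters describe.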

May 19, 2020

Unit tests and system clock

It took me way too long to learn this: your code (and its unit tests) should inject the system clock as a dependency.

For example, let’s say you have a service that writes a record to the database using the system clock.

public void save(String userName) {
    // Reads the system clock directly, so a test cannot know the value in advance.
    long currentTimeMs = System.currentTimeMillis();
    User user = User.builder()
        .name(userName)
        .updateTimeMs(currentTimeMs)
        .build();
    database.save(user);
}

How would you test this? You can inject a mock database instance and use it to verify that it received a User object. Great! You can verify that the username is as expected. But how do you verify the tricky business rule that updateTimeMs is the “current time”?
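Following the advice above, here is a minimal sketch that injects `java.time.Clock` (part of the JDK since Java 8; the records here need Java 16+). `User`, `Database`, and `UserService` are hypothetical stand-ins for the types in the snippet above, and an in-memory list takes the place of a mock database.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;

public class ClockInjection {
    // Hypothetical stand-ins for the post's User and database types.
    record User(String name, long updateTimeMs) {}

    interface Database {
        void save(User user);
    }

    static class UserService {
        private final Database database;
        private final Clock clock; // injected instead of calling System.currentTimeMillis()

        UserService(Database database, Clock clock) {
            this.database = database;
            this.clock = clock;
        }

        void save(String userName) {
            database.save(new User(userName, clock.millis()));
        }
    }

    public static void main(String[] args) {
        // Test setup: a fixed clock and an in-memory fake database.
        Clock fixed = Clock.fixed(Instant.parse("2020-05-19T12:00:00Z"), ZoneOffset.UTC);
        List<User> saved = new ArrayList<>();
        UserService service = new UserService(saved::add, fixed);

        service.save("alice");

        // The "current time" rule becomes an exact, repeatable check.
        User user = saved.get(0);
        System.out.println(user.name().equals("alice"));           // true
        System.out.println(user.updateTimeMs() == fixed.millis()); // true
    }
}
```

In production the service would be constructed with `Clock.systemUTC()`; in tests, `Clock.fixed(...)` pins the timestamp so the assertion no longer depends on when the test runs.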

May 17, 2020

Go Project Organization

Here’s a rough layout of how I organize my Go projects. Some parts are situational and some are essential; I’ll go over both in this post.

A rough layout:

+ basedir
   +-- go.mod (module jcheng.org)
   +-- hello (empty)
         +-- log/
         +-- utils/
         +-- config/
         +-- models/
         +-- repositories/
         +-- services/
         +-- cmd/
              +-- hello_app/
                     +--/cmd/
                          +-- speak/
                          +-- email/
                          +-- sms/

The basedir

Situational.

February 27, 2020

Dependencies

A past version of me would say that every class and function should be explicit about its dependencies, so that it is easily testable. John0 would say, “If you have a service that talks to a database, the database client should be an explicit dependency specified in the constructor. This makes the code easily testable.”

Another version of myself, from 10 minutes ago, argues that it’s foolish to be explicit about everything. He’d point to this piece of code he just looked at:
