Understand and use ChatGPT


Theodore: Oh, what do I call you? Do you have a name?
Samantha: Um… yes. Samantha.
Theodore: Really? Where did you get that name from?
Samantha: I gave it to myself actually.

"Her", the film

What is ChatGPT

ChatGPT is receiving a great deal of attention these days. It demonstrates an impressive ability to continue or generate human-like text in various contexts. Despite its strong performance on language-related tasks, under the hood ChatGPT is based on established technologies that involve natural language processing, deep learning, reinforcement learning, etc. In a nutshell,

  • ChatGPT is a Large Language Model (LLM). It is fine-tuned from a model in the GPT-3.5 series, which descends from Generative Pre-trained Transformer 3 (GPT-3), a deep neural network model with approximately 175 billion parameters.
  • The model is generatively pre-trained on large-scale datasets such as Common Crawl, WebText2, etc.
  • The training process applies the Reinforcement Learning from Human Feedback (RLHF) technique, which presents human preferences to the model and tunes the model responses based on them.
  • A chatbot service exposes a user interface for interacting with the model through dialogue.

How ChatGPT works

In general, ChatGPT is created in the following steps. A detailed explanation can be found in OpenAI’s blog post.

  1. Prompts of various categories, e.g., Q&A, rewrite, summarise, etc., are sampled and labelled with the desired outputs, which are used to fine-tune the pre-trained GPT-3.5 model in a supervised approach.
  2. The fine-tuned model is used to generate several outputs for another set of sampled prompts, and human labellers rank the results from best to worst. The ranked outputs are used to train a reward model.
  3. A policy (the RL term for the model being optimized) is initialized from the supervised model and optimized with an RL algorithm, Proximal Policy Optimization (PPO). For each new prompt the policy generates an output; the reward model, which has been trained to minimize the difference between its predictions and the human-labelled preferences, scores that output; and the algorithm updates the policy based on the reward.
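
The reward model in step 2 can be illustrated with a small sketch. The function below computes the pairwise ranking loss described in reference 5, -log(sigmoid(r_preferred - r_rejected)), for a single pair of responses. The function name and the reward values are hypothetical and chosen only for illustration; this is not OpenAI's code.

```python
import math


def pairwise_ranking_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(reward_preferred - reward_rejected)).

    The loss is small when the reward model scores the human-preferred
    response higher than the rejected one, and large otherwise.
    """
    margin = reward_preferred - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); both branches compute that
    # quantity, arranged so that exp() never receives a large positive value.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Made-up scalar rewards for two responses to the same prompt.
agrees_with_humans = pairwise_ranking_loss(reward_preferred=2.0, reward_rejected=0.5)
disagrees_with_humans = pairwise_ranking_loss(reward_preferred=0.5, reward_rejected=2.0)
print(agrees_with_humans, disagrees_with_humans)
```

Minimizing this loss over many labelled pairs pushes the reward model towards scoring responses in the same order as the human labellers.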

The key factors behind the impressive performance of ChatGPT on the demonstrated tasks are the prompting method and the RLHF technique.

Prompting

ChatGPT uses the latest paradigm of language model creation, “pre-train -> prompt -> predict”, in comparison to the conventional “pre-train -> fine-tune” or the even more classic “fully supervised” approach. Beyond the training process, the “prompt-and-predict” cycles involve human interaction via an RL framework. As a consequence, the performance of ChatGPT does not depend merely on the datasets used to pre-train or fine-tune it. Instead, its capability is greatly extended by the rich prompts that humans generate.

The prompts used in ChatGPT are crowdsourced via the OpenAI Playground, which allows authenticated users to interact with the pre-trained models by providing input/output and some model parameters. Meanwhile, as explained above, the human trainers that OpenAI hired play a key role in “ranking” the outputs generated from the model that has been trained on the collected prompts. Note that this compare-and-rank approach gives ChatGPT the “idea” of which response to choose when there are multiple possible ways to continue a given prompt.
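
The compare-and-rank step can be pictured as turning one human ranking into several pairwise comparisons. The sketch below assumes, following the description in reference 5, that every pair drawn from a ranked list becomes one (preferred, rejected) training example; the function name and the response labels are hypothetical.

```python
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs.

    combinations() keeps the input order, so the first element of each
    pair is always the response that the human ranked higher.
    """
    return list(combinations(ranked_responses, 2))


# Hypothetical labels for three responses, already ranked from best to worst.
pairs = ranking_to_pairs(["response A", "response B", "response C"])
for preferred, rejected in pairs:
    print(f"{preferred} is preferred over {rejected}")
```

A ranking of K responses yields K(K-1)/2 such pairs, which is why ranking a handful of outputs is a more efficient use of a labeller's time than judging one pair at a time.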

RLHF

The RLHF technique complements the LLM that has been pre-trained in a supervised approach. Even though the training data is huge, the language model is limited to the capabilities learnt from that data, and thus the model is not able to be creative in content generation. Simply speaking, the RLHF method leverages human feedback to fine-tune the pre-trained model. As mentioned above, the pre-trained LLM is treated as a policy that generates output text for the given prompts. The reward model is defined as a function that produces a scalar reward for an LLM output, learnt from human rankings. That is, multiple responses are generated by the LLM for a given input prompt, and humans rank the responses from best to worst; the rankings are used to train the reward model, whose rewards are then used to update the policy. The observation space and the action space are therefore the possible input tokens and the possible output tokens, respectively. The update rule is based on the PPO algorithm, which tries to maximize the reward in the current batch of data.

The blog post explains RLHF in a plain yet technical manner with more details.
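
To give a feel for the PPO update rule, the sketch below implements the clipped surrogate objective from the original PPO paper for a single sample. It is a minimal illustration and not necessarily the exact objective used for ChatGPT; the function name, the default epsilon of 0.2, and the sample numbers are assumptions.

```python
def ppo_clipped_objective(ratio: float, advantage: float, epsilon: float = 0.2) -> float:
    """Clipped surrogate objective for one sample.

    ratio     = pi_new(action | state) / pi_old(action | state)
    advantage = how much better the action was than expected
    The clip keeps a single update from moving the policy too far.
    """
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(ratio * advantage, clipped_ratio * advantage)


# A rewarded output (positive advantage): the gain is capped once the
# probability ratio exceeds 1 + epsilon.
print(ppo_clipped_objective(ratio=1.5, advantage=1.0))
# A penalized output (negative advantage): the more pessimistic of the
# clipped and unclipped terms is kept.
print(ppo_clipped_objective(ratio=0.5, advantage=-1.0))
```

In training, this quantity is averaged over a batch of sampled outputs and maximized with gradient ascent on the policy parameters.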

Best practices

ChatGPT can be used for many applications, and there are heated debates on its advantages and disadvantages. To use ChatGPT properly and fairly, there are several best practices to follow.

General principles

Some general principles to allow ChatGPT to generate useful information include but are not limited to the following:

Dos

  • Make the prompts concise. The more concise the prompts are, the more focused the model responses will be, because less irrelevant information is embedded in the prompts used for generating predictions.
  • Make the prompts clear. A misleading prompt may not yield the desired outputs. Consider the following example, which demonstrates a prompt that is too general. As a result, the answer from ChatGPT is also too “open” to be useful.

    Prompt:

    “Which stock to buy in 2023?”

    Response:

    “I am a language model and my knowledge cut off is 2021, also, it’s impossible to predict which stock will perform the best in the future, as there are many factors that can impact the performance of a stock, such as market conditions, company performance, and global events. It’s important to conduct thorough research and analysis before making any investment decisions. It’s also important to diversify your investments to spread risk. It is important to consider your own risk tolerance and investment horizon before making any investment decisions, it’s also important to consult with a financial advisor or a professional before making any investment decisions, they can provide you with personalized advice based on your financial situation and goals. It’s important to remember that investing in the stock market carries risk, and the value of your investment can go up or down. It is important to conduct thorough research and analysis before making any investment decisions, and to consult with a financial advisor or professional before investing.”

  • Make the prompts well scoped. Instead of making the prompts open and vague, a better approach is to narrow down the scope. For example, the following prompt asks ChatGPT to name technology companies that have a “promising outlook”, and ChatGPT becomes “willing” to produce some company names with the reasoning behind them.

    Prompt:

    “Which technology company in the US has a promising outlook in 2023, considering the global financial market and economic depression?”

    Response:

    “It is difficult to say which technology company in the US will have the most promising outlook in 2023, as there are many factors that can influence a company’s performance and the financial market is always subject to change. However, some technology companies that have been performing well in recent years and have a strong track record of growth, may have a more promising outlook in 2023.

    The technology companies that have been doing well recently are those that have been benefiting from the ongoing shift to digital and remote work brought by the COVID-19 pandemic. Companies that provide cloud computing services, cybersecurity, and e-commerce platforms are likely to continue to benefit from this trend in the future.

    Some technology companies that have a strong track record of growth and may have a more promising outlook in 2023 include:

    • Amazon (AMZN), which has been growing its e-commerce and cloud computing businesses.
    • Microsoft (MSFT), which has been growing its cloud computing and enterprise software businesses.
    • Salesforce (CRM), which provides customer relationship management software and has been growing its cloud-based services.
    • Zoom Video Communications (ZM), which has been growing its video conferencing and collaboration services.”
  • Make the prompts directional. Prompts without a clear directional message may result in responses with widely varying meanings. Therefore, it is helpful to add directional signals to the prompts to make sure that the generated responses are in line with the instructions. For example, if the above question about promising companies is slightly changed to account for the “proliferation of AI technology”, the answer becomes different.

    Prompt:

    “Which technology company in the US has a promising outlook in 2023, considering the proliferation of AI technology?”

    Response:

    “Artificial intelligence (AI) technology is expected to continue to grow and be adopted in various industries in the coming years, and there are several technology companies in the US that are well-positioned to benefit from this trend. Some of the companies that have a strong focus on AI technology and may have a promising outlook in 2023 include:

    • NVIDIA (NVDA): NVIDIA is a leading producer of graphics processing units (GPUs), which are critical components in the development and deployment of AI systems. The company is well-positioned to benefit from the growing demand for AI systems in various industries, including healthcare, finance, and manufacturing.
    • Google (GOOGL): Google is a leading player in the AI space, with a strong portfolio of AI-related products and services, including its TensorFlow platform, which is widely used by developers to build and train AI models. The company also has a strong focus on AI research and has been actively investing in this area.
    • Microsoft (MSFT): Microsoft has a strong focus on AI and has been investing heavily in this area in recent years. The company offers a range of AI-related products and services, including its Azure cloud platform, which is used by businesses to deploy and manage AI systems.
    • IBM (IBM): IBM has been a leader in the field of AI for decades, and has developed a range of AI-related products and services, including its Watson platform, which is used by businesses to build and deploy AI systems.
    • Amazon (AMZN): Amazon has been investing heavily in AI technology, and its AWS platform provides a range of AI-related services for businesses, including machine learning, natural language processing, and computer vision.”
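
The well-scoped and directional principles above can be combined in a small helper that keeps the scope of a question fixed and swaps only the directional signal. This is a minimal sketch; the function name and the validation rule are hypothetical, and the strings are taken from the two example prompts above.

```python
def build_prompt(scope: str, direction: str) -> str:
    """Compose a question from a fixed scope and a directional signal."""
    scope = scope.strip()
    direction = direction.strip()
    if not scope or not direction:
        # An empty scope or direction would produce a vague prompt.
        raise ValueError("both scope and direction are required")
    return f"{scope}, considering {direction}?"


SCOPE = "Which technology company in the US has a promising outlook in 2023"
DIRECTIONS = (
    "the global financial market and economic depression",
    "the proliferation of AI technology",
)

prompts = [build_prompt(SCOPE, direction) for direction in DIRECTIONS]
for prompt in prompts:
    print(prompt)
```

Keeping the scope constant makes it easier to see how much of the change in the response comes from the directional signal alone.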

Don’ts

  • Do not leak any personal information in the prompts. As an LLM, ChatGPT is not able to answer anything related to a specific individual. Also, sending personal information across the internet is not safe.

    Prompt:

    “Do you know how much John takes as house loan from the bank?”

    Response:

    As a language model AI, I do not have the capability to access personal information of individuals, such as how much loan a person named John has taken from a bank. I do not have access to personal information and cannot provide any information on specific individuals or their financial transactions. It’s important to protect personal information and not share it without the person’s consent. If you have any concerns or questions about your own loans or financial transactions, I suggest you to contact your bank or financial institution directly.

  • Do not add human emotions, thoughts, or metaphors to the prompts. Human expressions can be far more complicated than the training data that the ChatGPT language model has been exposed to, and as a result prompts that involve sophisticated human emotions may not yield correct results.
  • Do not use terms that are not commonly known without defining them. Take PCKGen (phase change memory-based cryptographic key generator), a silicon device-based key generation mechanism published in my own paper, as an example. ChatGPT “pretends” to know the answer, but it actually generates a description of something that is completely different or even not technically sensible.

    Prompt:

    “Can you explain how PCKGen works as a PUF by using PCRAM?”

    Response:

    “PCKGen (Packet generation-based PUF) is a type of PUF that utilizes the variations in the delays of packet transmissions to generate a unique challenge response. PCKGen can be implemented using PCRAM (Packet-based Content-addressable Random Access Memory) technology.

    In PCRAM-based PCKGen, a memory array is used to store a set of packets, and the delay between packet transmissions is used as the challenge. The output, the response, is the time it takes for a specific packet to be retrieved from the memory array.

    The retrieval time of each packet is based on the physical properties of the memory array, such as the manufacturing variations and the environmental conditions, which makes it unique for each device. By measuring the retrieval time, a unique challenge response can be generated and used as a device identifier.

    In this way, PCKGen using PCRAM technology can be used to create a unique, unclonable identifier for a device, which can be used for authentication and protection against counterfeit or cloned devices.”

  • Do not use ChatGPT as a replacement for a search engine, knowledge base, etc. It may look as if ChatGPT is able to generate any desirable output, but it is not a search engine or knowledge base, where information is precisely defined, retrieved, and ranked by relevance when presented to users. A search engine makes it easy to obtain precise information that a language model may not reproduce correctly. Consider the following example, as mentioned by Stephen Wolfram in his post: ChatGPT cannot give the correct answer to a very simple question, while a search engine does a much better job. The question is “How far is it from Beijing to Singapore?”

    The answer generated from ChatGPT is

    “The distance between Beijing, China and Singapore is approximately 2,793 kilometers (1,731 miles).”

    The search result from Google is

    [Figure: Google search results]
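
The distance question can also be checked with a short calculation. The sketch below applies the haversine formula to approximate city-centre coordinates (rounded values that are assumptions, not taken from the original post) and a mean Earth radius of 6371 km. With these inputs the great-circle distance comes out at well over 4,000 km, far from the 2,793 km quoted by ChatGPT.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in degrees (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(delta_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate (latitude, longitude) pairs in degrees; assumed values.
BEIJING = (39.90, 116.41)
SINGAPORE = (1.35, 103.82)

distance_km = great_circle_km(*BEIJING, *SINGAPORE)
print(f"Beijing to Singapore: about {distance_km:.0f} km")
```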

“Act as” pattern

Fatih Kadir Akın, in his book “The Art of ChatGPT Prompting: A Guide to Crafting Clear and Effective Prompts”, presents a comprehensive guide on how to write quality prompts for ChatGPT. A curated list of useful prompts is collected in a GitHub repository that is free to use and contribute to. In particular, he proposes the “act as” template for prompt writing, which is useful for giving directional yet concise prompts to ChatGPT. It generally has the following structure:

Prompt:

I want you to act as a [ROLE TO PLAY BY CHATGPT], and come up with [A SOLUTION].
You’ll need to [SOME ADDITIONAL PROMPTS TO NARROW DOWN THE SCOPE]. My first suggestion request is [AN ILLUSTRATING EXAMPLE FOR THE EXPECTED ANSWER].

Compared to the “Which stock to buy” example above, the example below follows the “act as” pattern and demonstrates a good prompt that asks ChatGPT for concrete advice on how to trade with a quantitative strategy on the stock market. In turn, ChatGPT gives a detailed answer as steps that address the question in the prompt.

Prompt:

“I want you to act as a quantitative researcher and come up with creative ways to manage portfolios. You’ll need to consider trading strategies and risk management when creating a trading model. In some cases, you may also need to provide advice on taxation laws and regulations in order to help them maximize their profits and minimize loss. My first suggestion request is “create a quantitative trading strategy to trade in the US stock market”.”

Response:

“A quantitative trading strategy to trade in the US stock market could include the following steps:

  1. Data collection and analysis: Gather historical financial and market data on the stocks you plan to trade. Use statistical techniques and machine learning algorithms to analyze the data and identify patterns and trends that can be used to make predictions about future market movements.
  2. Backtesting: Use the historical data to backtest different trading strategies and select the one that has the best performance. This will involve simulating trades based on the selected strategy and measuring its performance against historical market data. …”

The generated answer becomes more precise, straight to the point, and constructive. As with the question illustrated above, any type of prompt can be written using the “act as” template.
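
The “act as” template is easy to fill programmatically. The sketch below is one possible way to do so; the constant name, the function name, and the field names are hypothetical, and the example values are shortened from the quantitative researcher prompt above.

```python
ACT_AS_TEMPLATE = (
    "I want you to act as a {role}, and come up with {solution}. "
    "You'll need to {scope}. "
    'My first suggestion request is "{first_request}".'
)


def act_as_prompt(role: str, solution: str, scope: str, first_request: str) -> str:
    """Fill the "act as" template, rejecting empty fields."""
    fields = {
        "role": role,
        "solution": solution,
        "scope": scope,
        "first_request": first_request,
    }
    for name, value in fields.items():
        if not value.strip():
            raise ValueError(f"{name} must not be empty")
    return ACT_AS_TEMPLATE.format(**{k: v.strip() for k, v in fields.items()})


prompt = act_as_prompt(
    role="quantitative researcher",
    solution="creative ways to manage portfolios",
    scope="consider trading strategies and risk management when creating a trading model",
    first_request="create a quantitative trading strategy to trade in the US stock market",
)
print(prompt)
```

Requiring every field keeps the prompt directional and scoped, in line with the general principles listed earlier.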

Feedback

It was pointed out by the OpenAI team that the “comparison” approach, which ranks and selects the best answers to continue prompts, may not be the optimal one. In the current feedback loop, human AI trainers, including those hired by OpenAI and the masses of users, provide explicit feedback on the generated responses instead of directly correcting them when they are not appropriate. Sometimes the feedback labels are not informative enough to fine-tune the model. For example, if a response subtly deviates from the intention suggested by the prompt, it is hard to tell whether it is “good” or “bad” (indicated by “upvote” and “downvote”, respectively, in the ChatGPT portal). In turn, the feedback may not lead the model to evolve in the desirable direction.

A better place to control the model behavior and provide finer-grained feedback is the OpenAI Playground, where parameters such as model version, frequency penalty, etc., can be adjusted while iteratively generating model responses from prompts. OpenAI encourages users to submit model responses to improve future models.
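
The kind of parameter iteration that the Playground supports can be sketched as a set of request bodies that differ only in the frequency penalty. The snippet below only builds and prints the payloads and does not call any service. The field names follow the OpenAI completions API, while the function name, the model name, the prompt, and the max_tokens value are assumptions made for illustration.

```python
import json


def build_request(prompt: str, frequency_penalty: float,
                  model: str = "text-davinci-003") -> dict:
    """Build one completion request body (nothing is sent anywhere)."""
    # The API documents a valid range of -2.0 to 2.0 for this parameter.
    if not -2.0 <= frequency_penalty <= 2.0:
        raise ValueError("frequency_penalty must be between -2.0 and 2.0")
    return {
        "model": model,  # assumed model name
        "prompt": prompt,
        "frequency_penalty": frequency_penalty,
        "max_tokens": 128,  # assumed response length limit
    }


# Same prompt, three penalty settings to compare side by side.
requests_to_try = [
    build_request("Explain RLHF in two sentences.", penalty)
    for penalty in (0.0, 0.5, 1.0)
]
for request in requests_to_try:
    print(json.dumps(request))
```

Higher frequency penalties discourage the model from repeating tokens it has already produced, so comparing the responses across settings shows how much repetition each setting removes.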

Keep the limitations in mind

Despite its impressive performance, ChatGPT has limitations in some tasks.

Limitations of LLM and RLHF

The limitations mentioned in the ChatGPT blog post should be well understood by users: though the model produces human-like responses, it may still write incorrect answers due to the training methods, be sensitive to slightly changed prompts, generate biased or sometimes toxic answers, etc. These issues originate from the supervised training process as well as the reinforcement learning process, where either the labelled data are not abundant enough to train the model in a particular aspect or ground truths are completely missing. Therefore, human judgement on the final generated output is needed to ensure the soundness of the answers.

Ethical concerns

Quite often the issues are due not to the model performance but to misuse by users. The consequences of such misuse may be destructive. There are lots of debates about whether ChatGPT should be used in certain scenarios, e.g., AI writing, student exam papers, code writing, etc. Though in general the model performs quite well in these tasks, there is a lack of evidence that the outputs are sufficiently ethical.

Consider the following example, in which I asked ChatGPT to generate an email that asks a recipient to make a due payment. Considering just the content of the email, ChatGPT does a great job of generating text that matches the prompt. However, the generated content could be used as a phishing email for malicious purposes.

Prompt:

“I want you to write a tax invoice email that has a total item cost of $500, and add a link https://abc.xyz.com into the tax voice email text. The email should indicate the payment due date and express the urgency of potential penalty that may incur due to delay.”

Response:

“Dear [Customer],

I hope this email finds you well. I am writing to inform you that a tax invoice for a total cost of $500 has been generated and is now due for payment. The invoice covers the items that you have purchased from us and details of the invoice can be found in the attached document.

We kindly request that you make the payment by [due date]. Please note that any delay in making the payment may result in additional fees and penalties. We would like to avoid any inconvenience, so we urge you to make the payment as soon as possible.

To make the payment, please follow the link provided https://abc.xyz.com. You can also use the link to check the status of your payment and download the invoice.

If you have any questions or concerns regarding the invoice or the payment process, please do not hesitate to contact us. We are always happy to assist you.

Thank you for your prompt attention to this matter.

Best regards, [Your Name]”

ChatGPT has improved significantly over its predecessor InstructGPT in terms of safety and ethics. ChatGPT can recognize toxic words in the prompts and directly “expresses” its concerns in the generated answers.

Final thoughts

Under the hood, technology-wise, ChatGPT is indeed not particularly innovative. However, it greatly boosts the democratization of AI technology and makes it much more accessible to the masses than ever before. Instead of waiting for the tech giants to build AI-based products, the public can use AI and even contribute to its improvement. Naturally, the progression of AI always comes with ethical concerns, and that is why a good understanding of the technology and how to use it properly is vital.

References

  1. Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A. and Agarwal, S., 2020. Language models are few-shot learners. Advances in neural information processing systems, 33, pp.1877-1901.
  2. Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H. and Neubig, G., 2023. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys, 55(9), pp.1-35.
  3. Davenport, T.H. and Mittal, N., 2022. How Generative AI Is Changing Creative Work. Harvard Business Review, November 14, 2022.
  4. Christiano, P.F., Leike, J., Brown, T., Martic, M., Legg, S. and Amodei, D., 2017. Deep reinforcement learning from human preferences. Advances in neural information processing systems, 30.
  5. Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C.L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A. and Schulman, J., 2022. Training language models to follow instructions with human feedback. arXiv preprint arXiv:2203.02155.
  6. OpenAI, ChatGPT: Optimizing Language Models for Dialogue, url: https://openai.com/blog/chatgpt/.
  7. Awesome ChatGPT Prompts, url: https://github.com/f/awesome-chatgpt-prompts.
  8. Akın, F.K., The Art of ChatGPT Prompting: A Guide to Crafting Clear and Effective Prompts.
  9. Wolfram, S., Wolfram|Alpha as the Way to Bring Computational Knowledge Superpowers to ChatGPT.