Muhammad Osama neoblizz

@neoblizz
neoblizz / tcolorbox.tex
Created July 2, 2023 19:02
Colored Alert Boxes in LaTeX
% Pick a nice color
\usepackage{xcolor}
\definecolor{pastelpurple}{RGB}{221,199,255}
% For Alert Boxes
% \usepackage{awesomebox}
\usepackage[most]{tcolorbox}
% textmarker style from colorbox doc
\tcbset{textmarker/.style={%
neoblizz / hipifyme.sh
Created December 8, 2022 01:12
HIPIFY CUDA code (entire directory)
find . -type f \( -iname \*.cu -o -iname \*.cuh -o -iname \*.cpp -o -iname \*.hpp -o -iname \*.hxx -o -iname \*.cxx \) -exec hipify-perl -inplace -experimental -print-stats {} \;
neoblizz / doxygen-install.md
Created November 21, 2022 18:41
Install Latest Doxygen (UNIX)

I got Doxygen version 1.9.6 (6bc5f864d0c847a74944d6e9e4a42346e8c18b28) to build using the following instructions on Ubuntu 18.04.

sudo apt-get install graphviz bison flex
git clone https://github.com/doxygen/doxygen.git
cd doxygen
mkdir build
cd build
cmake -G "Unix Makefiles" ..
# Default installation at /usr/local/bin/doxygen
neoblizz / launch.cuh
Last active October 23, 2022 03:21
C++ wrapper around cooperative groups launch API.
#pragma once
// Includes CUDA
#include <cuda_runtime.h>
#include <cooperative_groups.h>
#include <utility>
namespace cg = cooperative_groups;
neoblizz / SoL-SpMM-GEMM.md
Last active March 26, 2022 02:01
Simple Speed-of-Light Analysis of SpMM and GEMM

Hardware Speed-of-Light Analysis

The following numbers are based on NVIDIA's Volta microarchitecture. To perform a similar analysis for a newer architecture, I recommend replacing the numbers below with the values reported by the deviceQuery CUDA sample or listed on the architecture's Wikipedia page.

CUDA Cores = SM * Cores per SM (SM = 80, Cores/SM = 64)
Maximum Clock Rate = Clock Rate (kHz) * 1e-6 (GHz)
Maximum Throughput (type == floats, doubles or half) =
    CUDA Cores * Maximum Clock Rate * Type Ratio (device properties) (GFLOP/s)

Maximum Memory Bandwidth = 
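
The compute-side formulas above can be sketched in plain C++ as follows. This is only an illustration under stated assumptions: the `device_t` struct and the function names are hypothetical, and the type ratio is left as a caller-supplied parameter because the source only says it comes from the device properties. The memory-bandwidth formula is cut off in the preview, so it is not reproduced here.

```cpp
#include <cassert>

// Hypothetical container for the values that deviceQuery (or a spec sheet) reports.
struct device_t {
  int sm_count;           // number of streaming multiprocessors (SM)
  int cores_per_sm;       // CUDA cores per SM
  double clock_rate_khz;  // maximum clock rate, in kHz
};

// CUDA Cores = SM * Cores per SM
int cuda_cores(const device_t& d) {
  return d.sm_count * d.cores_per_sm;
}

// Maximum Clock Rate (GHz) = Clock Rate (kHz) * 1e-6
double max_clock_ghz(const device_t& d) {
  return d.clock_rate_khz * 1e-6;
}

// Maximum Throughput (GFLOP/s) = CUDA Cores * Maximum Clock Rate * Type Ratio
// type_ratio is an assumption supplied by the caller (per data type).
double max_throughput_gflops(const device_t& d, double type_ratio) {
  assert(type_ratio > 0.0);
  return cuda_cores(d) * max_clock_ghz(d) * type_ratio;
}
```

With the Volta numbers from the text (SM = 80, Cores/SM = 64), `cuda_cores` returns 5120. The clock rate is not given in the preview, so any value passed in `clock_rate_khz` should be read off the actual device.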
neoblizz / sssp.cpp
Last active March 15, 2022 21:01
Parallel SSSP using C++20.
#include <vector>
#include <algorithm>
#include <execution>
#include <mutex>
#include <utility>
#include <ranges>
struct frontier_t {
// Underlying representation of frontier.
std::vector<int> active_vertices;
neoblizz / ignore_on_device.md
Last active January 17, 2022 19:45
Envisioning `__ignore__` support in NVCC with a simple example.

How crazy is it to imagine an NVCC-supported keyword, something like __ignore__, that causes whatever it is placed in front of (a function, variable, object, etc.) to be ignored on the device side (in __device__ and __global__ code)? This would solve the problem of complicated containers that support both host and device code: their constructors and destructors, which run on the host, would simply be ignored on the device when the container is passed as a member of a larger class or struct. For example:

__global__ void kernel(foo_t foo) {
  auto idx = threadIdx.x;
  auto ptr = foo.get_ptr();
  ptr[idx] = idx;
}
neoblizz / inheritance.md
Last active December 7, 2021 19:33
Capturing conditional inheritance in CPP (with pixel shaders as a toy example)

The Problem

We have a top-level object that the user wants to interact with, such as a pixel on the screen. Depending on what that pixel contains, we may want to color or shade it differently: if the pixel represents cloth, it should have the texture and color of cloth; if it represents metal, it should look shiny and metallic... you get the point. To represent this object in C++, we have a number of options. The most obvious one is a single function that colors (or applies some sort of texture to) the pixel and handles each material or color in its own branch within that function.

Obvious approach

void apply_texture(pixel_t* p, texture_t t) {
  if (t == texture_t::cloth) {
    // apply cloth
  } else if (t == texture_t::skin) {
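
Since the preview above is cut off, here is a minimal self-contained sketch of the same branching approach. The `pixel_t` layout, its `shading` field, and the `metal` enumerator are assumptions made for illustration; the original only shows the `cloth` and `skin` branches.

```cpp
#include <cassert>
#include <string>

// Hypothetical material tags: cloth and skin appear in the original snippet,
// metal is added here because the text mentions it.
enum class texture_t { cloth, skin, metal };

// Hypothetical pixel type that just records which shading was applied.
struct pixel_t {
  std::string shading = "none";
};

// The "obvious approach": one function with one branch per material.
void apply_texture(pixel_t* p, texture_t t) {
  assert(p != nullptr);
  if (t == texture_t::cloth) {
    p->shading = "cloth";  // apply cloth
  } else if (t == texture_t::skin) {
    p->shading = "skin";   // apply skin
  } else if (t == texture_t::metal) {
    p->shading = "metal";  // apply metal
  }
}
```

Every new material means another branch in this one function, which is the maintenance cost the rest of the note sets out to address.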
neoblizz / sparsify.cu
Last active May 4, 2021 11:47
CUDA-based implementation to introduce sparsity.
#include <stdio.h>
#include <stdlib.h>
#include <ctime>
#include <random>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/transform.h>
#include <thrust/iterator/counting_iterator.h>
neoblizz / effective_modern_cmake.md
Created February 9, 2021 12:15 — forked from mbinna/effective_modern_cmake.md
Effective Modern CMake

Effective Modern CMake

Getting Started

For a brief user-level introduction to CMake, watch C++ Weekly, Episode 78, Intro to CMake by Jason Turner. LLVM’s CMake Primer provides a good high-level introduction to the CMake syntax. Go read it now.

After that, watch Mathieu Ropert’s CppCon 2017 talk Using Modern CMake Patterns to Enforce a Good Modular Design (slides). It provides a thorough explanation of what modern CMake is and why it is so much better than “old school” CMake. The modular design ideas in this talk are based on the book [Large-Scale C++ Software Design](https://www.amazon.de/Large-Scale-Soft